Omdia view

Summary

Data drives machine learning (ML) within the enterprise. Unfortunately, the algorithms, models, and other artifacts used in building predictive ML outcomes share very little architecturally with the data upon which they depend. In response, database vendor Splice Machine, with its recent 3.0 release, hopes to bring these two highly interdependent worlds closer together to both accelerate and simplify the ML lifecycle.

Table functions: new life for an old idea

For even the most straightforward enterprise ML project, such as predicting customer churn, data scientists must marshal an incredibly complex and highly iterative workflow. These efforts, often carried out within a web-based notebook environment such as Jupyter Notebook, require a constant back and forth with supporting database assets, which becomes both more difficult and more crucial as the ML project nears and enters production.

In response, technology providers are beginning to roll out more lifecycle-complete artificial intelligence (AI) development tools, as with the AWS SageMaker suite. Likewise, database vendors are tackling this issue by pushing ML assets and executables directly into the database.

With its recent 3.0 release, database vendor Splice Machine has introduced an interesting take on this approach using an age-old database technique (stored procedures). The vendor has embedded its own implementation of a Jupyter Notebook environment, one that can support multiple languages beyond Python and R. This environment leverages an in-product implementation of the popular ML lifecycle management platform MLflow. Alternatively, models can be deployed to AWS SageMaker or Microsoft Azure ML.

Using these tools, developers can call a single “deploy()” function to push their final model to production. This function takes the final ML model plus the supporting database table and automatically generates code as a table function (similar to a stored procedure) within the database. Every time new records come into the database, this native table resource executes, generating new output predictions.
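To make the pattern concrete, here is a minimal sketch of in-database scoring on insert. It is not Splice Machine's API or generated code: it uses Python's built-in sqlite3 module, with a user-defined SQL function and a trigger standing in for the table function described above. The churn_score formula, the table names, and this local deploy() helper are all hypothetical and exist only for illustration.

```python
import sqlite3


def churn_score(tenure_months, support_tickets):
    """Hypothetical stand-in for a trained model; returns a score in [0, 1]."""
    raw = 0.5 + 0.1 * support_tickets - 0.02 * tenure_months
    return max(0.0, min(1.0, raw))


def deploy(conn):
    """Illustrative helper: register the model with the database and
    create a trigger that scores each newly inserted row."""
    # Expose the Python function to SQL under the name churn_score.
    conn.create_function("churn_score", 2, churn_score)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS customers (
            id INTEGER PRIMARY KEY,
            tenure_months INTEGER,
            support_tickets INTEGER
        );
        CREATE TABLE IF NOT EXISTS predictions (
            customer_id INTEGER,
            score REAL
        );
        -- Runs inside the database every time a new record arrives.
        CREATE TRIGGER IF NOT EXISTS score_new_customer
        AFTER INSERT ON customers
        BEGIN
            INSERT INTO predictions (customer_id, score)
            VALUES (NEW.id,
                    churn_score(NEW.tenure_months, NEW.support_tickets));
        END;
        """
    )


conn = sqlite3.connect(":memory:")
deploy(conn)

# New records arrive; no separate scoring service is called.
insert_sql = "INSERT INTO customers (tenure_months, support_tickets) VALUES (?, ?)"
conn.execute(insert_sql, (3, 4))
conn.execute(insert_sql, (48, 0))

rows = conn.execute(
    "SELECT customer_id, score FROM predictions ORDER BY customer_id"
).fetchall()
print(rows)
```

The point of the sketch is that the application only inserts rows; the prediction is produced where the data lives, which is the property the performance and simplicity claims rest on.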

Advantages inherent in this methodology include a greatly simplified model deployment routine, better visibility into key post-deployment issues such as model drift, performance gains because the data doesn’t have to leave the database, and a lower management burden since the solution incorporates its own container management platform. Translated into business outcomes, this means a significant reduction in the amount of time it takes to deliver an ML solution and a reduction in the cost of maintaining that solution over time.

Is this the only solution to the problem of operationalizing ML in the enterprise? Of course not. As with any real-world ML implementation, the desired outcome depends upon a happy marriage between the tools selected and the available resources (both technology and expertise). Fortunately, because the ML marketplace relies heavily on open source software, enterprise buyers can mix and match languages, libraries, orchestrators, and so on.

This means, too, that buyers can invest in a database-based solution such as Splice Machine and still make use of familiar resources. At present, for example, version 3.0 can push models to AWS SageMaker and Microsoft Azure ML. And in the future, Splice Machine may well allow users to run other ML orchestration tools such as Kubeflow alongside or in place of MLflow.

Omdia expects further investments in this approach from the broader database community. Oracle recently began pushing many ML features into Oracle Autonomous Database; Microsoft has done so as well. More will follow, particularly as the industry begins tackling more ML use cases in extreme environments (manufacturing, medicine, oil and gas, etc.), where time and performance are of the utmost importance.

Appendix

Further reading

“Oracle Autonomous Database expands its umbrella,” INT002-000271 (February 2020)

“Oracle looks to take the ‘Oops!’ out of data science,” INT002-000273 (February 2020)

2020 Trends to Watch: Analytics and Data Management, INT002-000272 (February 2020)

Author

Bradley Shimmin, Distinguished Analyst, Data Management and Analytics

[email protected]
