As AI and ML become pervasive across the technology landscape, there is a huge benefit in letting your Data Scientists focus on models rather than doing Data Engineering work or, worse, waiting on Data Engineers.
Many terms, such as DataOps, Data Engineering, Feature Engineering, and even MLOps, are used to describe improvements in process and engineering. For building and deploying ML to production at scale, a key step is to bring DevOps automation, namely Continuous Integration (CI) and Continuous Deployment (CD), to the ML world.
Many organizations have brought, or are in the process of bringing, automation to AI/ML Engineering. However, even with a shiny DevOps CI/CD pipeline that automates the whole process, they still face tremendous challenges with data. This paper provides a DevOps solution to these ML Engineering problems.
Data is the new oil, and many organizations hold their data in production systems that support business-critical applications. That data may not be available to ML Engineering, because ML workloads could overload the database to the point where the critical system can no longer function. This leaves ML Engineering teams at the mercy of data availability. It is the big conundrum for ML Engineering teams: they exist purely because of data, yet without access to the actual data they cannot function properly.
The architecture and design presented in this paper come from our experience implementing ML projects in organizations that have large data volumes and heavy regulation, as well as from our own application development efforts.
In this paper, we outline the challenges facing ML projects and present Zenlab’s solution: Data Engineering with our proprietary toolset, and ML Engineering automation with DevOps.