MLGovernance and MLOps: why your organization should embrace it

Jorge De Corte
Control your generative AI at ReBatch
6 min read · Feb 21, 2022


Building a machine learning model that solves a business case is hard. It takes a highly trained data scientist weeks to months of effort and experimentation to go from an idea to a trained model with acceptable results. It would be a shame if the model never got implemented, wasting all that effort. Surprisingly, that is what happens in the majority of cases.

Only 22% of all models get implemented in production

This means that 78% of all data scientists' effort leads to nothing, which is not a good sign. It's not the fault of the data scientists, but rather of technical and organizational challenges.

Most data scientists have no experience in deploying complex systems

Data scientists are expensive and increasingly hard to find, hire, and retain. Yet companies are letting an increasingly large portion of that scarce capacity go to model deployment, maintenance, and management. Data scientists are neither good at nor do they like doing this stuff. This is just crazy. The companies that get this right will leave the ones that continue on this course in their dust. — H.P. Bunaes, Founder of AI Powered Banking

When a model gets deployed in production, it usually runs in its own silo: the only person who knows how it works is the data scientist who built it. This creates risk for the organization. Knowledge about existing experiments and machine learning models should flow freely and transparently through the organization.

MLGovernance and MLOps have an answer to these concerns and are guided by the following principles:

  • Verifiability and transparency: Models need to be tested and their risks need to be mapped.
  • Reproducibility: Every experiment should be reproducible by any data scientist who has access to it.
  • Continuous integration and collaboration: Teams of data scientists need the infrastructure to collaborate on ideas and experiments.
  • Effortlessness: All of the principles above should have minimal impact on the workflow that data scientists and business users already use.

Most of these principles are well known in the DevOps community, but they take on a broader meaning in a data science context. Data scientists can use the tools provided by MLGovernance and MLOps to collaborate on, reproduce, verify, and deploy their experiments in an effortless process.

MLGovernance: This concerns the verifiability and transparency of the models. Will the model continue to work in production? MLGovernance makes sure the ML model has the desired effect on the business case and that no unnecessary investments are made.

MLOps: Just as with code and software releases, machine learning pipelines need to consider CI/CD, deployment, tooling, data and model versioning, and reproducibility of experiments. Investing in a strong MLOps infrastructure results in faster throughput of experiments and more efficient ML models, thanks to more streamlined collaboration between data scientists.
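
To make the CI/CD part concrete, a simple quality gate in the pipeline can block a merge or deployment when a candidate model misses the agreed KPI. The sketch below is a minimal illustration using pytest and scikit-learn; the dataset, model, and 0.90 threshold are placeholders for a project's real training code and business KPI, not a prescribed setup.

```python
# test_model_quality.py -- minimal CI quality-gate sketch (run with: pytest)
# The dataset, model, and threshold are illustrative placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

KPI_THRESHOLD = 0.90  # the clear-cut KPI agreed on with the business


def test_candidate_model_meets_kpi():
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
    accuracy = accuracy_score(y_val, model.predict(X_val))
    # Fail the CI build (and block the deployment) if the model misses the KPI.
    assert accuracy >= KPI_THRESHOLD, f"accuracy {accuracy:.3f} below {KPI_THRESHOLD}"
```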

We present a Model Development Lifecycle that embraces the iterative nature of machine learning with a framework based on MLGovernance and MLOps. This lifecycle leads to a higher throughput of experiments, better results, and lower-risk deployment. It consists of three interconnected stages:

  • Development: Understand the business case, assess feasibility, and set clear-cut KPIs. Train and test the model using a centralized pipeline. Map the risks and explainability of the model.
  • Operations: Deploy the model, implement metrics and logging, and save predictions for retraining. This is where MLOps shines; many DevOps principles can be found here.
  • Monitoring: Detect data or concept drift, explain predictions, and make decisions based on predictions.

Implementing MLOps in the development phase is crucial to ensure the reproducibility and transparency of the experiments done by data scientists. Data versions, code versions, experiments, and their results all need to be controlled, so that every experiment is tracked and easily reproducible by any data scientist who has access to it. A strong MLOps infrastructure in the development phase makes it easy to govern ML models and experiments, because the results and risks of every experiment are mapped to the code and data that were used.
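
One common way to get this kind of traceability is an experiment tracker such as the open-source MLflow library. The sketch below is a minimal illustration of that pattern; the experiment name, tags, and parameter values are placeholders, and other trackers follow a very similar flow.

```python
# Minimal experiment-tracking sketch with MLflow (pip install mlflow scikit-learn).
# Experiment name, tags, and parameters are illustrative placeholders.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

params = {"n_estimators": 200, "max_depth": 5, "random_state": 42}

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

mlflow.set_experiment("churn-prediction")  # hypothetical experiment name
with mlflow.start_run():
    # Log everything needed to reproduce this run: parameters, data version, code version.
    mlflow.log_params(params)
    mlflow.set_tag("data_version", "v1.3")   # e.g. a DVC or data-lake snapshot id
    mlflow.set_tag("git_commit", "abc1234")  # e.g. injected by the CI pipeline

    model = RandomForestClassifier(**params).fit(X_train, y_train)
    accuracy = accuracy_score(y_val, model.predict(X_val))

    # Log the result and the trained model so any colleague can inspect or reuse the run.
    mlflow.log_metric("val_accuracy", accuracy)
    mlflow.sklearn.log_model(model, "model")
```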

During this phase, the risks and possible failure modes of the model must be well documented; no ML model is 100% accurate, and mistakes will happen. By documenting the situations in which a failure is likely, action can be taken to prevent them, reducing the risk and lowering the barrier to go from trained model to deployed model. In essence, the effect the ML model will have on the organization should be known before it is deployed!

An ML project can only bring value if it gets deployed

An example of a centralized training pipeline that tests and explains the results

In the operations phase, the selected models that have achieved the predefined KPIs need to be deployed in production. Using methodologies from DevOps, ML models should be deployed to production with minimal effort. By using CI/CD and logging frameworks, models can be tracked in real time.
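
As a rough illustration of what this can look like, the sketch below serves a model over HTTP with FastAPI and logs every prediction. The endpoint name, feature layout, and inline-trained model are illustrative stand-ins; a real deployment would load a versioned model from a registry and ship the logs to a monitoring backend.

```python
# Minimal model-serving sketch with FastAPI (pip install fastapi uvicorn scikit-learn).
# Endpoint, feature layout, and the inline-trained model are illustrative placeholders.
import logging

from fastapi import FastAPI
from pydantic import BaseModel
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model-service")

# Placeholder model trained at import time; a real service would load a versioned artifact.
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=42).fit(X, y)

app = FastAPI()


class Features(BaseModel):
    values: list[float]  # one row of input features


@app.post("/predict")
def predict(features: Features):
    prediction = int(model.predict([features.values])[0])
    # Log every prediction so it can be monitored and saved for retraining later.
    logger.info("input=%s prediction=%s", features.values, prediction)
    return {"prediction": prediction}

# Run locally with: uvicorn service:app --reload  (assuming this file is named service.py)
```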

The monitoring phase is the most overlooked phase in the development lifecycle, but it is arguably the most important one. A model is trained on a fixed dataset, but the world around us keeps changing. Your machine learning model should reflect that change, or its predictions will become suboptimal. Detecting this drift is crucial, and action needs to be taken when it happens. When drift is detected, data scientists can fall back on a strong MLOps infrastructure, reducing the time and effort it takes to retrain and redeploy an improved ML model.
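
As a minimal illustration of one common approach, the sketch below compares a feature's recent production values against its training distribution with a two-sample Kolmogorov–Smirnov test from scipy. The simulated production data and the 0.05 significance level are illustrative; in practice the check would run on a schedule over the logged inputs.

```python
# Minimal data-drift check: compare a live feature against the training distribution
# with a two-sample Kolmogorov-Smirnov test. The simulated data and the 0.05
# significance level are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Reference data the model was trained on, and recent production data (shifted here).
training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
production_feature = rng.normal(loc=0.4, scale=1.0, size=1_000)

statistic, p_value = ks_2samp(training_feature, production_feature)
if p_value < 0.05:
    # In a real setup this would raise an alert and possibly trigger retraining.
    print(f"Drift detected (KS statistic={statistic:.3f}, p={p_value:.3g})")
else:
    print("No significant drift detected")
```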

Model-agnostic explanation frameworks can be a useful tool for detecting drift
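
One way such a framework can help is sketched below, under the assumption of a tree-based model and the SHAP package: compare the average feature attributions on training data with those on recent data, and flag features whose contribution has changed a lot. The dataset, the simulated shift, and the 50% change threshold are illustrative.

```python
# Sketch of explanation-based drift monitoring with SHAP (pip install shap scikit-learn).
# Dataset, simulated shift, and the 50% change threshold are illustrative.
import numpy as np
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)
model = RandomForestRegressor(random_state=42).fit(X, y)

# Simulate recent production inputs where one feature (BMI, column 2) has shifted.
X_recent = X[:200].copy()
X_recent[:, 2] += 0.1  # a large shift relative to the standardized feature scale

explainer = shap.TreeExplainer(model)
attr_train = np.abs(explainer.shap_values(X[:200])).mean(axis=0)
attr_recent = np.abs(explainer.shap_values(X_recent)).mean(axis=0)

# Flag features whose average attribution changed substantially.
relative_change = np.abs(attr_recent - attr_train) / (attr_train + 1e-9)
for i, change in enumerate(relative_change):
    if change > 0.5:
        print(f"Feature {i}: mean |SHAP| changed by {change:.0%} -- possible drift")
```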

Conclusion

When an organization wants to use machine learning, it is crucial to implement a process with MLGovernance and MLOps as its cornerstone. This reduces risk, optimizes the workflow, and delivers better results through collaboration. It removes the barrier between a theoretical model and a production system. Overlooking this can result in an unnecessarily complex collection of machine learning models that only a few people understand, which must be avoided to ensure a stable and healthy machine learning infrastructure.

As ML has come to the mainstream, it's become apparent that MLOps is the key to driving efficiency and scale for organizations of any size. ML is both an important and emerging category that C-level leaders will prioritize if they want to stay competitive. — Tim Tully, CTO of Splunk

At ReBatch we are a team of data scientists and MLOps engineers who build reproducible, stable, and efficient machine learning software that solves business cases. Interested in what we can do for your organization? Don't hesitate to reach out or send an email to jorge.decorte@rebatch.be.
