Machine learning models – from Jupyter to production

by Laura Zucchetti
03 May 2022 · 3 min read

Recently, a client asked us how we can use a machine learning model within a web application. We think that’s a fantastic question.

Many machine learning tutorials you find online are focused only on the “training” part of machine learning. That is, things like extracting data, preparing data, feature engineering, training, and evaluation.

However, unless there is a way to use your model elsewhere, your awesome machine learning model stays in your Jupyter notebook (or whatever you were using for training).

So, how do you actually deploy this model to production so that your app can use it? And do you need to hire people with a different skillset to do it?

What is a model, exactly?

It might be helpful to review what exactly a machine learning model is.

From a machine learning standpoint, a model can be defined by:

Hyperparameters. For example, in deep learning, your hyperparameters are the architecture of your neural network (number of layers, nodes at each layer, activation functions, etc.).

Learned parameters. For example, in deep learning these are the weights of your network.

If we know both of these things, we can fully define a model. So, all we need to do is save this information in a format that we can later use to reproduce the model. This process is called serialisation.
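To make this concrete, here is a minimal sketch using only the Python standard library. The tiny linear model, its field names, and its values are all made up for illustration; real frameworks use their own formats. The point is simply that hyperparameters plus learned parameters are enough to rebuild a model that gives the same predictions.

```python
import json


def predict(model, features):
    """Apply a tiny linear model: weighted sum of the inputs plus a bias."""
    weights = model["parameters"]["weights"]
    bias = model["parameters"]["bias"]
    return sum(w * x for w, x in zip(weights, features)) + bias


# Hypothetical model: the hyperparameters describe its shape,
# the learned parameters hold the values found during training.
model = {
    "hyperparameters": {"n_features": 3, "fit_intercept": True},
    "parameters": {"weights": [0.5, -1.0, 2.0], "bias": 0.25},
}

serialised = json.dumps(model)     # serialise to text (could be written to a file)
restored = json.loads(serialised)  # rebuild the model definition elsewhere

# The restored model behaves exactly like the original one.
same = predict(model, [1.0, 2.0, 3.0]) == predict(restored, [1.0, 2.0, 3.0])
```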


For the purposes of this blog post, we will assume you are mainly using Python tools.

Unfortunately, the serialisation process is often library-specific, with each library or framework recommending its own serialisation methods.

For example, Keras (a high-level deep learning framework) can serialise into a format called the Hierarchical Data Format (with the file extension .h5 or .hdf5). The recommended method for PyTorch, another popular deep learning framework, involves pickling a Python dictionary containing the model’s weights, known as the state_dict (see the documentation).

Pickling is also recommended by scikit-learn, as described here. There are problems involved with using pickle, such as the lack of guaranteed compatibility between versions (i.e. a model pickled with one version of scikit-learn may fail to load, or behave differently, under another version), and also the fact that we are constrained to using Python.
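One common way to soften the version problem is to store the library version next to the pickled weights and check it when loading. The sketch below is only an illustration: the function names and the version string "1.0.2" are hypothetical, and a plain dictionary stands in for a real trained model. In a real project the version could be read from something like `sklearn.__version__`.

```python
import pickle
import sys


def dump_model(weights, library_version):
    """Pickle the learned weights together with the version that produced them."""
    bundle = {
        "weights": weights,
        "library_version": library_version,
        "python_version": list(sys.version_info[:2]),
    }
    return pickle.dumps(bundle)


def load_model(blob, library_version):
    """Unpickle a bundle, refusing to use it if the library version differs."""
    bundle = pickle.loads(blob)
    if bundle["library_version"] != library_version:
        raise RuntimeError(
            f"model was saved with {bundle['library_version']}, "
            f"but {library_version} is installed"
        )
    return bundle["weights"]


# Hypothetical weights and version string, for illustration only.
blob = dump_model({"coef": [0.5, -1.0], "intercept": 0.25}, library_version="1.0.2")
weights = load_model(blob, library_version="1.0.2")
```

Failing loudly at load time is usually easier to debug than a model that loads but quietly predicts something different.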

As these tools mature, we can expect to see better support for cross-framework and cross-platform serialisation.

Deploying the model

In many situations, the simplest way to deploy a model is to move the serialised file to a web service built with Django, Flask, or any other Python web framework. Ideally, this service should be hosted separately from your main web application for performance and memory reasons, and your main application should interact with it via HTTP requests. (Further reading: here’s a good tutorial to do this for Keras.)

Once the file is in the filesystem, we can load the model with Python, use it to make predictions within the web service code, and return the result in an HTTP response. This method works quite well and is suitable for applications that do not involve high traffic. It is relatively straightforward to achieve, and a suitable person to do it would be a backend developer with some familiarity with Python web applications.
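As a rough sketch of what such a service does, here is a bare WSGI application written with the standard library only (Django and Flask apps are served through the same WSGI interface, and either would make this much shorter). Everything specific here is an assumption for illustration: the pickled file holds a dictionary with "coef" and "intercept", the request body is JSON with a "features" list, and the names `load_weights`, `make_app` and `serve` are our own.

```python
import json
import pickle
from wsgiref.simple_server import make_server


def load_weights(path):
    """Read the serialised model from the filesystem (done once at start-up)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def make_app(weights):
    """Build a WSGI app that answers a POSTed feature list with a prediction."""
    def app(environ, start_response):
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
        features = payload.get("features", [])
        # Hypothetical linear model: weighted sum of the features plus intercept.
        prediction = sum(w * x for w, x in zip(weights["coef"], features))
        prediction += weights["intercept"]
        body = json.dumps({"prediction": prediction}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app


def serve(path, host="127.0.0.1", port=8000):
    """Load the model file and serve predictions until interrupted."""
    make_server(host, port, make_app(load_weights(path))).serve_forever()
```

Calling `serve("model.pkl")` (a placeholder path) would start the service locally. Note that the model is loaded once when the service starts, not on every request.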

For high-performance, high-traffic applications that use deep learning models, another good option is TensorFlow Serving. This is a highly optimised implementation of a deep learning web service, with more industrial-grade features.
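From the main application's side, talking to TensorFlow Serving is still just an HTTP request. The sketch below builds a call to its REST predict endpoint using the standard library. The model name "my_model", the host, and the input values are placeholders, and port 8501 is only the port conventionally used for the REST API, so adjust these to match your own deployment.

```python
import json
import urllib.request


def build_predict_request(instances, model_name, host="localhost", port=8501):
    """Build a POST request for TensorFlow Serving's REST predict endpoint."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict(instances, model_name, **kwargs):
    """Send the request and return the "predictions" list from the JSON reply."""
    request = build_predict_request(instances, model_name, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["predictions"]


# Placeholder model name and a single made-up input row.
request = build_predict_request([[1.0, 2.0, 3.0]], model_name="my_model")
```

Only `predict` touches the network, so the request-building part can be checked without a running server.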


In summary, to deploy a machine learning model to production, we first serialise the model into a file with a format that can reproduce the model. Then we copy that file to a web service written in Python, which will read the file and rebuild the model. We then interact with the Python web service via HTTP to make predictions.

If you need help with your machine learning needs, we would love to hear from you and discuss your situation!
