Retrieval Enhanced Machine Learning Highlights a Vacuum Deep Learners Have Been Ignoring for Years
Large-scale language modelling has been all the rage in the past couple of years, with the release of incremental updates to the GPT-X architecture, as well as a host of competing models published (primarily) by big-tech research departments. At a recent lunch with a colleague, I expressed my frustration on behalf of researchers who don't have access to big-tech mega infrastructure and dollars, but who, from 2020 onwards, still hope to do innovative work in language modelling and AI more generally. Even running these giant NLP models for inference is prohibitively expensive and comes with substantial infrastructural overhead. This is an alarming trend, and one that has only compounded since 2018.

By way of contrast, in 2018 at Zalando Research ("medium tech") we developed Flair embeddings and trained all RNN models on a single machine with one GPU, yielding highly useful embeddings that were leveraged to achieve state-of-the-art results on a number of core NLP tasks. I would personally like to see more work seeking to reestablish this type of agile experimental spirit, rather than researchers having to resort to something of the form "look what I got by querying the OpenAI GPT-X API".
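To give a concrete sense of how lightweight that style of work remains, here is a minimal sketch of loading the publicly released Flair embeddings with the flair library and embedding a sentence on a single modest machine; the specific checkpoint name "news-forward" is just one of the released models, chosen here purely for illustration.

```python
# Minimal sketch: load pre-trained Flair embeddings and embed a sentence
# on a single CPU/GPU machine. The "news-forward" checkpoint is one of the
# publicly released character-level language models, used here as an example.
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

# Download (on first use) and load the pre-trained embedding model.
embedding = FlairEmbeddings("news-forward")

# Embed a toy sentence; each token receives a contextual embedding vector.
sentence = Sentence("Retrieval enhanced machine learning is worth revisiting.")
embedding.embed(sentence)

for token in sentence:
    print(token.text, token.embedding.shape)
```

The point of the sketch is simply that this entire pipeline fits comfortably on commodity hardware, in contrast to the infrastructure needed even to serve inference for the largest current language models.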