Should you use your company data to train a LLM? And is there any benefit in this vs just fine-tuning an existing one?
There’s a prevailing lack of understanding around what exactly “training” is, and overexcitement about getting models deployed and running in businesses can lead to all sorts of problems.
Executive Excitement
I was in a client meeting the other day and one of the executives presented their AI strategy. It was a good presentation with many interesting insights into how AI could help this business. One thing in particular, though, stuck out: the proposal was to train a Large Language Model on their data and then deploy it to their employees to use in their day-to-day work.
Sounds like a potentially great idea, but I, being somewhat facetious, posed the question: do you have $20 million to spend? This was a finger-in-the-air estimate based on how much it costs to train an LLM. Given that this particular company has revenues of about $100m, it seems pretty unlikely they’re going to be able to unlock that level of capex any time soon. And that assumes they even have enough internal data to justify training an LLM in the first place. Given that the amount of data you need is roughly 20 training tokens for every parameter in the model, it’s unlikely they have anywhere near enough to train one on. So the question really becomes not “should you train your own LLM?” but “what are you hoping to achieve by doing so?”
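To make the scale of that data requirement concrete, here’s a rough back-of-the-envelope calculation in Python of the ~20-tokens-per-parameter rule of thumb. The model sizes are purely illustrative.

```python
# Rough back-of-the-envelope: ~20 training tokens per model parameter.
# The model sizes below are illustrative, not a recommendation.
def tokens_needed(num_parameters: int, tokens_per_param: int = 20) -> int:
    """Approximate number of training tokens for a given parameter count."""
    return num_parameters * tokens_per_param

for name, params in [("1B model", 1_000_000_000),
                     ("7B model", 7_000_000_000),
                     ("70B model", 70_000_000_000)]:
    print(f"{name}: ~{tokens_needed(params) / 1e9:,.0f}B tokens of training data")

# 1B model: ~20B tokens of training data
# 7B model: ~140B tokens of training data
# 70B model: ~1,400B tokens of training data
```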
It’s very easy to get overexcited by the possibilities of LLMs without really thinking about whether your very own one is even needed. Business leaders tend to get carried away by the possibilities but, when presented with the costs and complexities of training a model from scratch, lose enthusiasm and think about canning the project. This is a shame, because this surface-level understanding can make businesses pivot away from AI when it could actually be very useful to them with significantly less effort and cost.
What to do
Many people are confused about the difference between three key things: training, fine-tuning and inference.
Training – building a model from scratch
Fine-tuning – tuning an existing model to do a specific task
Inference – running inputs through the model
Let’s begin at the end with inference. Inference is simply what you do when the model is trained and you want it to make some prediction for you (e.g. classify some articles). I’ll cover inference in more detail in a future post, because there are some interesting things you can do with vectorisation and inference with existing models.
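As a flavour of how simple inference can be, here’s a minimal sketch using the Hugging Face transformers library (my choice of toolkit for illustration; the text and labels are made up).

```python
# Minimal inference sketch: run an input through an existing, already-trained
# model. Hugging Face transformers is an assumed toolkit choice here, and the
# example text and candidate labels are made up for illustration.
from transformers import pipeline

# Zero-shot classification: no training or fine-tuning involved at all.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

result = classifier(
    "Quarterly revenue grew 12% on the back of strong subscription sales.",
    candidate_labels=["finance", "sport", "technology"],
)
print(result["labels"][0])  # highest-scoring label, e.g. "finance"
```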
Fine-tuning is pretty much what it sounds like. You take a pre-trained model, like BERT, and run some “light” training to get it to make accurate predictions on a specific dataset. Fine-tuning can be very quick or it can take a long time and be very computationally expensive, depending on what you want the model to predict and how much data you have to train it on. With fine-tuning, all of the limitations of the initial model (the one that was pre-trained and that you’re fine-tuning from) carry over into the final model. This is something to think about when selecting models to fine-tune: you wouldn’t exactly choose a network that’s been trained on images to do text classification, and vice versa.
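For a sense of what that “light” training looks like in practice, here’s a compressed sketch of fine-tuning BERT for classification with the Hugging Face Trainer. The dataset, sample counts and hyperparameters are all placeholders, not recommendations.

```python
# Sketch of fine-tuning a pre-trained BERT for binary text classification,
# using Hugging Face transformers/datasets (assumed toolkit). The dataset,
# sample size and hyperparameters are placeholders for illustration.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# The starting point is the pre-trained model: its existing weights (and
# limitations) carry over, and only "light" training happens from here.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

dataset = load_dataset("imdb")  # stand-in for your own labelled data

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train = dataset["train"].shuffle(seed=42).select(range(2000)).map(
    tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-finetuned",
                           num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=train,
)
trainer.train()
```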
Training is starting a model from scratch. In this case you start with a blank neural network: randomly initialised weights in every layer, and hyperparameters that you choose yourself. The advantage of training from scratch is that the model will be highly tailored to your specific needs, but this can come at great expense and computational cost. Of course, a model you’ve trained can subsequently be fine-tuned further.
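To make the distinction concrete, here’s a short sketch contrasting a blank, randomly initialised network with one loaded from pre-trained weights, again assuming the transformers library. The config values are illustrative hyperparameter choices.

```python
# "From scratch" vs pre-trained, made concrete with the transformers library
# (assumed toolkit). The config values are illustrative hyperparameters.
from transformers import GPT2Config, GPT2LMHeadModel

# Training from scratch: you define the architecture via a config you choose,
# and every weight starts off randomly initialised; nothing has been learned.
config = GPT2Config(n_layer=6, n_head=8, n_embd=512)
blank_model = GPT2LMHeadModel(config)

# Fine-tuning starting point, by contrast: weights already learned on a huge
# corpus are loaded, bringing that model's knowledge (and limitations) along.
pretrained_model = GPT2LMHeadModel.from_pretrained("gpt2")

print(sum(p.numel() for p in blank_model.parameters()))  # parameter count
```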
So should you do it?
Unfortunately there’s no clear-cut answer to this; it really depends on your requirements. I’m going to go out on a limb, though, and say that the vast majority of businesses do not need to train their own LLMs. It’s expensive, the risk of overfitting is high, and the advantages over fine-tuning an existing model are rarely worth it.
Contact us if you’d like us to discuss which approach might be best for you.