TL;DR: You don’t need a shop-vac for every clean-up job, and you don’t need large AI models for every use case.
Earlier today, my dog brought a frisbee full of dirt and dumped it in the garage. I had a shop-vac, an upright, a handheld, and dustpan & broom to choose from. They all would have worked, and they were all steps away from me – I went with the dustpan & broom because it was the fastest way of getting the job done. OK, I know that’s not a vacuum, but it’d suck to lose the snappy title (see what I did there?).
It got me thinking about machine learning models and how everyone assumes transformer models like GPT-4 are the only answer… but they’re unequivocally not. Just as you don’t need a shop-vac to clean up a small pile of dirt, you don’t always need the largest, most complex machine learning model to solve your problem.
Each piece of equipment [model] has its purpose and its place. Each tool is suited to a specific range of tasks, and using the wrong tool can lead to wasted time, effort, and resources. AI models come in a range of sizes and complexities: you have your simple linear regression models, your decision trees, your basic neural networks, and your massive transformer models like GPT-4.
A simple linear regression model is like the dust-buster of machine learning. It’s not the most powerful or complex model, but it’s perfect for simpler tasks or for gaining a quick understanding of a problem.
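To give a sense of how lightweight the dust-buster really is, here’s a minimal sketch using scikit-learn’s LinearRegression. The dataset (square footage vs. monthly energy cost) is entirely made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: square footage vs. monthly energy cost (illustrative only)
X = np.array([[600], [850], [1100], [1400], [1800]])  # square feet
y = np.array([55, 72, 90, 110, 140])                  # dollars

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)   # slope and intercept of the fitted line
print(model.predict([[1000]]))         # quick estimate for 1,000 sq ft
```

A few lines, trains in milliseconds on a laptop, and the coefficients tell you exactly why it predicted what it did.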
Decision trees (and gradient-boosted ensembles of them, like XGBoost) are like your standard upright vacuum. They’re more complex and powerful than linear regression, capable of handling more intricate problems and capturing more complex relationships.
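If you want to see the upright in action, here’s a minimal sketch using XGBoost’s scikit-learn-style API on a built-in toy dataset. The hyperparameters are illustrative, not tuned:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier  # assumes the xgboost package is installed

# Built-in toy dataset; any tabular classification problem would do
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Illustrative hyperparameters, not a tuned configuration
model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out split
```

Still trains in seconds on a laptop, and for most tabular data this is about as much vacuum as you’ll ever need.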
Finally, transformer models like GPT-4 are the shop-vac of the machine learning world. They’re large, complex, and capable of performing tasks that would be beyond the scope of smaller models.
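You can’t pull GPT-4 off a shelf and run it yourself, so as a stand-in, here’s a minimal sketch that loads a much smaller pretrained transformer (distilgpt2) through Hugging Face’s pipeline API. Even this “small” transformer dwarfs the models above in size and compute:

```python
from transformers import pipeline  # assumes Hugging Face transformers is installed

# distilgpt2 is a small stand-in; GPT-4 itself is only reachable via an API
generator = pipeline("text-generation", model="distilgpt2")
result = generator("The right model for the job is", max_new_tokens=20)
print(result[0]["generated_text"])
```

Even this toy example downloads hundreds of megabytes of weights before it generates a single word, which is exactly the shop-vac trade-off: enormous capability, but you pay for it in setup, compute, and opacity.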
However, just like with vacuums, bigger isn’t always better when it comes to machine learning models. A larger, more complex model can certainly provide more nuanced and accurate predictions – if the problem at hand requires that level of complexity.
For simpler tasks, a larger model can be overkill: it’s computationally expensive to train and run, the rationale behind any specific prediction is difficult to explain, and it can overfit when used on small or simple datasets.
Understand the scope of the problem you’re trying to solve. Consider the resources you have available – do you have the computational power and data to train a large model? Is it worth the time and effort to bring out the shop-vac?