“Do You Even Lift?” – A Jaxon Guide to Training Classifiers

Once you’ve used Jaxon to label your training set and you’re ready to start training classifiers, it might seem like you’re in the home stretch with little work left to do. To obtain a high-performing classifier, though, that isn’t the case. The data you use to train a classifier is vital, but it turns out that how you use that data matters just as much.

Children learn in a pretty predictable way, at least most of the time. For example, first they will learn to count, then to add, then to do algebra, then calculus, and so on. And unless they have an eidetic memory, they will likely need to see each of these concepts several times in order to properly learn them. If a child hasn’t learned one of the concepts, they will not be able to learn the more advanced topics that rely on the foundation. It can be helpful to imagine that neural networks learn in a similar way – the order in which information is presented to them, as well as repetition, affects their overall accuracy as a classifier. 

This is why Jaxon allows for both transfer learning and training schedules for the neural network. For a crash course, let’s start with curriculum training and work our way back to transfer learning. 

Once you have labeled data, it’s not just a matter of running it through your classifier once and washing your hands of it. Neural networks are complex architectures that benefit greatly from seeing information many times – just like humans. The order in which the examples are presented can also have an effect. Add multiple training sets to the mix (if you’re lucky enough to have them!) and it can get complicated very quickly.
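
To make the ordering idea concrete, here is a minimal sketch of a curriculum-style ordering in Python, assuming a simple length-based difficulty proxy; the helper names and the notion that shorter texts are “easier” are illustrative assumptions, not Jaxon’s actual scheduling logic.

```python
# Minimal curriculum-ordering sketch: present presumed-easier examples first.
# The length-based difficulty proxy and helper names are illustrative
# assumptions, not Jaxon's actual scheduling logic.

def curriculum_order(examples):
    """Sort labeled examples from short (presumed easy) to long (presumed hard)."""
    return sorted(examples, key=lambda ex: len(ex["text"].split()))

def curriculum_batches(examples, batch_size=32):
    """Yield mini-batches in curriculum order."""
    ordered = curriculum_order(examples)
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]

train_set = [
    {"text": "great product", "label": 1},
    {"text": "the shipping took far longer than the listing promised", "label": 0},
]
for batch in curriculum_batches(train_set, batch_size=2):
    pass  # the forward/backward pass for each batch would go here
```

In practice the difficulty signal could be something richer than length, such as model confidence; the point is simply that batch order is a knob worth turning.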

Training the neural net over several epochs lets it run through the data, check how well it did, refine its parameters, and try again, repeating until it has completed all the epochs specified. As the neural net sees the data again and again, its parameters become more and more tuned to the particular set it is training on. Typically, the F1 score rises over successive epochs until the neural net reaches peak performance, but exactly how many epochs are needed depends on the data and the neural net being used.
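
As a concrete illustration, here is a minimal sketch of multi-epoch training with per-epoch F1 tracking on a held-out validation split; the stand-in scikit-learn model, tiny dataset, and 20-epoch cap are illustrative assumptions rather than anything Jaxon does internally.

```python
# Minimal sketch: train for several epochs and track validation F1 each epoch.
# The stand-in model, tiny dataset, and epoch cap are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score

train_texts = ["great product", "terrible support", "loved it", "broken on arrival"]
train_labels = [1, 0, 1, 0]
val_texts, val_labels = ["really great", "awful experience"], [1, 0]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_val = vectorizer.transform(val_texts)

clf = SGDClassifier(random_state=0)
best_f1, best_epoch = 0.0, 0
for epoch in range(1, 21):                                   # how many epochs to run is data-dependent
    clf.partial_fit(X_train, train_labels, classes=[0, 1])   # one full pass over the data
    f1 = f1_score(val_labels, clf.predict(X_val))            # check how well it did
    if f1 > best_f1:
        best_f1, best_epoch = f1, epoch                      # remember the peak
print(f"peak F1 {best_f1:.2f} at epoch {best_epoch}")
```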

Jaxon also provides other training options to increase classifier accuracy, such as training on datasets augmented with synthetic labels, or freezing neural net layers for several epochs so that the task-specific layers can begin learning the dataset before the pretrained foundation is allowed to drift. Each of these techniques can greatly increase classifier accuracy, but it is usually difficult and time-consuming to figure out manually which will work best for a particular training set. Jaxon makes these processes as easy as selecting from a list or clicking a checkbox and then letting Jaxon work its black magic juju. Jaxon also makes it easy to try these techniques on several different neural architectures.
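
As a rough sketch of what layer freezing looks like in practice, here is a toy PyTorch example in which `encoder` stands in for the pretrained foundation and `head` for the task-specific layer; the three-epoch freeze window and the random placeholder data are illustrative assumptions, not Jaxon’s defaults.

```python
# Minimal layer-freezing sketch: keep the pretrained encoder fixed for the first
# few epochs so the new task head can adapt, then let the whole model train.
# The toy model, random data, and 3-epoch freeze window are assumptions.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU())   # stand-in "pretrained" layers
head = nn.Linear(128, 2)                                  # fresh task-specific layer
model = nn.Sequential(encoder, head)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
features = torch.randn(64, 300)                           # placeholder feature vectors
labels = torch.randint(0, 2, (64,))                       # placeholder labels

FREEZE_EPOCHS = 3
for epoch in range(10):
    frozen = epoch < FREEZE_EPOCHS
    for param in encoder.parameters():
        param.requires_grad = not frozen                  # frozen early, trainable later
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optimizer.step()
```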

Transfer learning primes the neural network for the language and patterns it’s about to see, allowing it to begin setting its parameters for the particular dataset before it even sees any labels. It’s not just a useful boost when labeled sets are small; the benefit shows up even with huge amounts of labeled data, since the early examples the neural net is exposed to have a big influence on the outcome of training. It has even been found that pretrained neural networks learn altogether different features from the labeled training sets than non-pretrained networks do.
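
To show what that priming looks like in code, here is a minimal transfer-learning sketch using the Hugging Face transformers library: the network starts from pretrained weights, so its parameters already encode general language patterns before it sees a single label. The distilbert-base-uncased checkpoint, the two-example batch, and the hyperparameters are illustrative assumptions, not necessarily what Jaxon uses under the hood.

```python
# Minimal transfer-learning sketch: start from a pretrained language model and
# fine-tune it on the labeled set. Checkpoint and hyperparameters are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2    # pretrained body + freshly initialized head
)

texts = ["great product", "broken on arrival"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):                         # a few fine-tuning epochs
    optimizer.zero_grad()
    outputs = model(**batch, labels=labels)    # loss against the gold labels
    outputs.loss.backward()
    optimizer.step()
```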

By making transfer learning so easy to implement, Jaxon amplifies the expertise of data scientists and analysts, enabling them to supervise and focus on the jobs that AI is not (yet!) capable of. Fine-tuning on domain-related datasets as well as on the training data itself builds a very specialized vocabulary that boosts both the learning speed and the ultimate performance ceiling of a neural model. In the end, Jaxon optimizes model accuracy with great speed and efficiency, allowing classifiers to reach production much more quickly than has ever been possible before.
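
One common way to get that kind of domain specialization, assuming a Hugging Face-style pipeline, is to continue masked-language-model pretraining on unlabeled domain text before the supervised fine-tuning step; the checkpoint, the sample sentences, and the single update shown below are illustrative assumptions, not Jaxon’s actual pipeline.

```python
# Minimal domain-adaptation sketch: continue masked-LM pretraining on unlabeled
# domain text before supervised fine-tuning. All specifics here are assumptions.
import torch
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
mlm_model = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

domain_texts = [
    "deductibles apply per occurrence under the policy",
    "subrogation recoveries are credited to the policy year",
]
encodings = [tokenizer(text, truncation=True) for text in domain_texts]
batch = collator(encodings)                    # pads, masks tokens, and adds labels

optimizer = torch.optim.AdamW(mlm_model.parameters(), lr=5e-5)
mlm_model.train()
outputs = mlm_model(**batch)                   # masked-LM loss on domain text
outputs.loss.backward()
optimizer.step()
```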

– Charlotte Ruth, Director of Linguistics