Jaxon automates the process of creating training data for AI.
Taking Human Labelers out of the Loop
What used to take months to do now takes days!
People can be creative powerhouses, but when it comes to simple repetitive tasks, machines rule the day. Modern manufacturing relies on people to design the factory, but assembly lines are automated, which makes them cheaper, faster, and more consistent.
The same goes for machine learning. People must be involved with training pipeline design, but they shouldn’t be a part of the pipeline.
With Jaxon, users design the factory; they don’t work on the assembly line!
For a given classification algorithm, model accuracy generally improves as the number of labeled training examples grows, but labeled training data is difficult and expensive to acquire at scale.
Jaxon uses a small number of manual Ground Truth labels to synthesize labels for the remainder of the training data, creating a larger training dataset. While a training dataset composed entirely of Ground Truth labels is ideal for maximizing F1, incorporating synthetic labels dramatically reduces the number of manual labels a classifier needs while still yielding accuracy comparable to that of a classifier trained on a 100% Ground Truth dataset.
Completely unsupervised machine learning misses out on a lot of knowledge that humans have. A great way to encapsulate human knowledge about a domain is to write heuristics. Heuristics represent this knowledge while requiring minimal human involvement and are a powerful way to fill gaps in a model's coverage. They are likely to give lift in areas where classical and neural models underperform.
Jaxon leverages ensembling techniques, such as Snorkel, to combine human-provided heuristics in the form of regular expressions with cutting-edge machine learning models. These work together to label large datasets automatically and improve training outcomes. Our hybrid approach allows human understanding to augment machine learning and lets the machines do the heavy lifting, resulting in a powerful combination.
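As a rough illustration of combining regular-expression heuristics, the sketch below lets each heuristic vote for a label or abstain and then takes a simple majority. This is not Jaxon's implementation and not Snorkel's API; the majority vote stands in for a learned ensemble, and the patterns, label names, and function names are all hypothetical.

```python
import re
from collections import Counter

ABSTAIN = None  # a heuristic that does not match casts no vote

# Hypothetical heuristics: (compiled pattern, label to vote for when it matches).
HEURISTICS = [
    (re.compile(r"\b(refund|money back)\b", re.I), "billing"),
    (re.compile(r"\b(invoice|charged?)\b", re.I), "billing"),
    (re.compile(r"\b(crash(es|ed)?|error|bug)\b", re.I), "technical"),
]


def heuristic_votes(text):
    """Return one vote per heuristic: its label on a match, else ABSTAIN."""
    return [label if pattern.search(text) else ABSTAIN
            for pattern, label in HEURISTICS]


def majority_label(votes):
    """Combine votes by simple majority; abstain if nothing fired or on a tie."""
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def label_corpus(texts):
    """Assign a synthetic label (or ABSTAIN) to every text."""
    return [majority_label(heuristic_votes(t)) for t in texts]
```

Examples on which the heuristics abstain or disagree are the natural candidates to hand to a trained model or to a human reviewer.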
Unsupervised Data Augmentation
Deep Learning requires a lot of data: language models learn from millions (sometimes billions) of examples just to learn how language is structured. Sometimes the data you have available is not enough. Data Augmentation addresses this shortage by creating a larger and more diverse dataset, generating extra examples based on what you already have.
In addition to supervised learning, Jaxon's Data Augmentation uses unlabeled data to generate new, augmented unlabeled examples. With this method, Jaxon can create dozens, sometimes even hundreds, of new examples from a single unlabeled example. These augmented examples then help train a model to be consistent in its predictions.
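The idea of generating many variants from one unlabeled example and rewarding consistent predictions can be sketched as follows. This is a minimal illustration under stated assumptions: word dropout and a single adjacent swap stand in for whatever augmentation Jaxon actually applies, the `predict` callable is a hypothetical model that returns class probabilities, and all function names are invented for this example.

```python
import math
import random


def augment(text, n_variants, seed=0, drop_prob=0.15):
    """Create noisy copies of `text` by dropping words and swapping one adjacent pair."""
    rng = random.Random(seed)
    words = text.split()
    variants = []
    for _ in range(n_variants):
        # Drop each word with probability drop_prob; fall back to all words if none survive.
        kept = [w for w in words if rng.random() > drop_prob] or list(words)
        if len(kept) > 1:
            i = rng.randrange(len(kept) - 1)
            kept[i], kept[i + 1] = kept[i + 1], kept[i]
        variants.append(" ".join(kept))
    return variants


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete probability distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(predict, text, variants):
    """Mean KL divergence between the prediction on the original and on each variant."""
    p = predict(text)
    return sum(kl_divergence(p, predict(v)) for v in variants) / len(variants)
```

No labels are involved: the loss is zero when the model predicts the same distribution for the original and every variant, and grows as the predictions drift apart.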
Neural Architecture Design
Jaxon provides a common interface to rapidly train, tune, and evaluate neural text classifiers. Users can craft and apply problem-specific data and knowledge without writing code (even on their own models), view intermediate results, and make adjustments along the way. Jaxon enables users to focus on the high-level creative parts of model design, not the plumbing.
Jaxon leverages state-of-the-art architectures such as:
As new algorithms and architectures are discovered, Jaxon’s patent-pending technology incorporates them.
A key factor when working with a limited labeling budget is efficiently identifying which examples have the least confident predictions and would benefit most from user verification. Jaxon leverages active learning to identify them: it prioritizes the order in which users label and incrementally builds up a set of valuable labels. To avoid brute-force labeling of an entire training corpus, only a few examples should be selected for manual verification, prioritized with a:
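Selecting the least confident predictions for verification can be sketched as follows. This is a minimal illustration of least-confidence sampling, one common active learning strategy, and is not necessarily the prioritization Jaxon uses; the function name and the `budget` parameter are hypothetical.

```python
def least_confident(probabilities, budget):
    """Return indices of the `budget` examples whose top predicted probability is lowest.

    `probabilities` holds one list of class probabilities per example.
    """
    confidence = [max(p) for p in probabilities]
    order = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return order[:budget]
```

For example, `least_confident([[0.9, 0.1], [0.55, 0.45], [0.6, 0.4]], budget=2)` returns `[1, 2]`, the two examples the model is least sure about.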
Human-produced training labels are dirty. The old adage “Garbage In, Garbage Out” applies well to data science. Training and validating against human-produced labels can culminate in misleading, dirty results.
Jaxon internalizes label quality:
Noise reduction techniques help utilize Gold labels to improve the training utility of Silver and Bronze labels. Jaxon strategically utilizes different label classes to maximize the signal available throughout all available data while minimizing the noise in Silver and Bronze labels.
Pretraining fits parameters (weights) to a large, unlabeled dataset in order to gain general skills in a problem domain such as computer vision or NLP. Once this initial training is complete, the pretrained model is reused to solve other specific tasks within the same problem domain. This is a form of transfer learning that allows a large amount of unlabeled data to support a much smaller labeled dataset.
To solve a target task, a pretrained model can either be fine-tuned on datasets particular to the new task (e.g., classification), or used as a fixed transformation whose output becomes the input to a new model trained on the new dataset. When fine-tuning, the pretrained model parameters are treated as initial values for the new task-specific model. When transforming, the output of the pretrained model serves as abstracted feature vectors for training a separate, and usually different, model that solves the new task.
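The difference between fine-tuning and transforming can be made concrete with a toy sketch. It assumes a tiny linear "encoder" matrix `W` standing in for a pretrained model and a logistic head `h` as the task-specific model; these names and the training loop are hypothetical and far simpler than a real neural pipeline.

```python
import copy
import math


def encode(W, x):
    """Apply the (pretrained) linear encoder W to input x to get a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def predict(W, h, x):
    """Probability of the positive class from the head h applied to encoded x."""
    z = sum(hj * fj for hj, fj in zip(h, encode(W, x)))
    return 1.0 / (1.0 + math.exp(-z))


def train(pretrained_W, data, fine_tune, epochs=50, lr=0.1):
    """Train a head on top of the encoder; update the encoder only if fine_tune is True."""
    W = copy.deepcopy(pretrained_W)  # pretrained weights are the starting point
    h = [0.0] * len(W)
    for _ in range(epochs):
        for x, y in data:
            f = encode(W, x)
            err = predict(W, h, x) - y  # gradient of the log loss w.r.t. the logit
            if fine_tune:
                # Fine-tuning: the encoder weights also move with the task gradient.
                for j, row in enumerate(W):
                    for k in range(len(row)):
                        row[k] -= lr * err * h[j] * x[k]
            # Both modes: the task-specific head is always trained.
            h = [hj - lr * err * fj for hj, fj in zip(h, f)]
    return W, h
```

With `fine_tune=False` the returned encoder is identical to the pretrained one and only the head has learned; with `fine_tune=True` both sets of weights have changed.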
In addition to utilizing pretrained language models like the BERT family, Jaxon enables refinement of a pretrained model by continuing pretraining with problem-specific datasets. This task-specific pretraining helps extract the nuances of domain-specific language: specialized vocabulary, syntax, and semantics. This enhances the learning speed and raises the ultimate performance ceiling for a neural model.
Custom Training Schedules
Deep Neural Networks are trained over several epochs. During each epoch, the model processes every example in the training dataset and a loss is calculated. Typically, the training context remains the same throughout all epochs: parameters, dataset, loss function, layer management, feature vocabulary, etc. When the loss has converged, training stops.
Jaxon implements custom multi-stage Training Schedules that support changing context:
How you train is just as important as what you train.
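A multi-stage schedule in which the training context changes between stages might be organized along these lines. This is a sketch under assumptions, not Jaxon's schedule format: the `Stage` fields (`learning_rate`, `frozen_layers`, `min_improvement`) and the `train_one_epoch` callback are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    max_epochs: int
    learning_rate: float
    frozen_layers: tuple = ()       # layers that are not updated during this stage
    min_improvement: float = 1e-3   # stage ends once the loss improves by less than this


def run_schedule(stages, train_one_epoch):
    """Run each stage in order; train_one_epoch(stage) must return that epoch's loss."""
    history = []
    for stage in stages:
        previous = float("inf")
        for epoch in range(stage.max_epochs):
            loss = train_one_epoch(stage)
            history.append((stage.name, epoch, loss))
            if previous - loss < stage.min_improvement:
                break  # loss has converged for this stage; move on to the next one
            previous = loss
    return history
```

A schedule could, for instance, first train only a classification head with the encoder frozen, then unfreeze everything at a lower learning rate: `[Stage("head_only", 5, 1e-3, ("encoder",)), Stage("full", 2, 1e-5)]`.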
Analyze results with Jaxon. Model evaluation begins with an F1 score, but with Jaxon, users can visualize the data, identify areas of weakness, and iterate to improve performance.
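For reference, the F1 score that evaluation starts from is the harmonic mean of precision and recall. The sketch below computes it per class and as a macro average in plain Python; the function names are hypothetical, and looking at per-class scores is one simple way to spot the areas of weakness mentioned above.

```python
def f1_per_class(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen in either list."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)
```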
© Copyright 2021. All rights reserved.