Covariate Drift & SmartSplit Technology
As we go through the process of training machine learning models and start thinking about how we’re going to prepare our training data, one of the first questions that always comes to mind is “How am I going…
Jaxon’s patent-pending SmartSplit Technology is a proprietary means of splitting a dataset (e.g. into training and holdout datasets) in such a way as to avoid covariate drift and other latent differences between those datasets. Specifically, it aims to improve…
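As a rough illustration of the problem described above (and not of SmartSplit itself, which is proprietary and not detailed here), the sketch below measures how differently a single numeric feature is distributed in a training set and a holdout set, using the two-sample Kolmogorov-Smirnov statistic. The helper names (`ks_statistic`, `random_split`, `sorted_split`) and the synthetic Gaussian feature are assumptions made for this example only.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of samples a and b (0 = identical, 1 = disjoint)."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a_sorted, x) / len(a_sorted)
            - bisect_right(b_sorted, x) / len(b_sorted))
        for x in a_sorted + b_sorted
    )


def random_split(values, holdout_fraction=0.2, seed=0):
    """Shuffle, then hold out a fraction of the data (hypothetical helper)."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * holdout_fraction)
    return shuffled[:cut], shuffled[cut:]


def sorted_split(values, holdout_fraction=0.2):
    """A deliberately bad split: the holdout gets only the largest values."""
    ordered = sorted(values)
    cut = len(ordered) - round(len(ordered) * holdout_fraction)
    return ordered[:cut], ordered[cut:]


# Synthetic feature, assumed for illustration only.
rng = random.Random(42)
feature = [rng.gauss(0.0, 1.0) for _ in range(1000)]

train_r, hold_r = random_split(feature)
train_s, hold_s = sorted_split(feature)

drift_random = ks_statistic(train_r, hold_r)
drift_sorted = ks_statistic(train_s, hold_s)

print(f"random split drift: {drift_random:.3f}")
print(f"sorted split drift: {drift_sorted:.3f}")
```

The sorted split puts every holdout value above every training value, so its statistic reaches the maximum, while the random split stays well below it. A check like this only flags drift in one feature at a time; it says nothing about how a split that avoids drift should be constructed.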
The Jaxon team hosted a webinar covering Data Augmentation where we discussed what data augmentation is, how it works for images, how it works for text, and why text augmentation is so much more difficult than image augmentation:…
Let's break applied machine learning problems down into a simple taxonomy. These problems can be classified according to a (non-exhaustive) two-dimensional model of data type and problem type. Data Types: Text (Natural Language), Images, Tabular, Video, Sensor/Time-Series. Problem Types: Classification, Regression, Information Extraction, Transformation (seq2seq, generative). We can even plot the intersection…
When people first learn about Jaxon, a common question is how we are able to train a model to produce data that will train a better model. Isn't that first model already the model we want if it…
In an ideal world, machine learning pipelines would build themselves. As it stands, though, this tedious process falls on the shoulders of data scientists and engineers. As noted in “The Gotchas of ML/NLP”, the key to successfully…
Once you've used Jaxon to label your training set and you're ready to embark on training classifiers, it would seem that you’re in the home stretch and there’s not much more work to be done. However, to obtain a high-performing classifier,…
I first read about Lean Thinking in the book The Machine That Changed the World, based on MIT’s $5M, five-year study on the future of auto manufacturing. The concept of ‘lean’ embraces ideas like just-in-time delivery, elimination of waste…
A couple of years ago, we had the hubris to think you could create accurate machine learning (ML) models completely unsupervised. It turns out some human supervision really is needed: just enough human knowledge to train the model properly. The trick is…
Building machine learning models can be exhilarating: finding that optimal combination of technologies and piecing them together into a final, smooth end result gives a unique sense of accomplishment that only data scientists and engineers really understand.…