Synthetic Data Generator

Data isn’t always cheap; it can be time-consuming or costly to obtain. The Flask helps sidestep this problem by generating synthetic data from what you already have.

The Flask takes your real data and applies an array of heuristic and generative deep learning methods to create synthetic data. Generative AI can produce large amounts of diverse, realistic training data, which you can then use to train custom machine learning models.

The Flask’s synthetic training data helps improve the accuracy of your ML models. The time you save on data generation can go toward improving your model.

Free-Form Text Augmentation

[Screenshot: the Flask synthetic data creator, text augmentation]

LM (Language Model) Text Generation

Jaxon uses a modular generative AI system based on large language models to generate new examples in the style of an existing dataset.

Frequent Terms

Replaces a percentage of words based on how frequently they appear in the dataset. Words are scored with TF-IDF (term frequency–inverse document frequency).
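To make the idea concrete, here is a minimal sketch of TF-IDF-guided replacement in plain Python. It is not the Flask's implementation: the function names, the `fraction` parameter, the choice to swap the lowest-scoring words, and drawing replacements at random from the corpus vocabulary are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def tfidf_scores(docs):
    """Return one {word: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def replace_frequent_terms(docs, index, fraction=0.3, seed=0):
    """Replace the lowest-scoring `fraction` of words in docs[index].

    Assumption: words with a low TF-IDF score carry the least meaning, so
    they are swapped for random words from the corpus vocabulary.
    """
    rng = random.Random(seed)
    doc = docs[index]
    scores = tfidf_scores(docs)[index]
    vocabulary = sorted({word for d in docs for word in d})
    n_replace = int(len(doc) * fraction)
    # Positions ordered by ascending TF-IDF score; ties keep document order.
    positions = sorted(range(len(doc)), key=lambda i: (scores[doc[i]], i))
    augmented = list(doc)
    for i in positions[:n_replace]:
        augmented[i] = rng.choice(vocabulary)
    return augmented
```

In this sketch, a word that appears in every document (such as "the") scores zero and is replaced first, while rarer, more distinctive words are left alone.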

Synonyms

Replaces words in the original example with synonyms.
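As a rough illustration, synonym replacement can be sketched with a small lookup table. The `SYNONYMS` dictionary, the function name, and the `probability` parameter below are hypothetical; a real system would likely draw on a full thesaurus or lexical database.

```python
import random

# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "cheerful"],
    "big": ["large", "huge"],
}


def replace_with_synonyms(text, synonyms=SYNONYMS, probability=1.0, seed=0):
    """Swap each word that has known synonyms with the given probability."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        options = synonyms.get(word.lower())
        if options and rng.random() < probability:
            out.append(rng.choice(options))
        else:
            # Words without a known synonym pass through unchanged.
            out.append(word)
    return " ".join(out)
```

Lowering `probability` keeps more of the original wording, which may help when the label depends on specific terms.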

Random Words

Replaces a word from the example with another word selected at random from the same corpus of text.
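A minimal sketch of this technique, assuming whitespace tokenization and a corpus given as a list of strings (the function name and `seed` parameter are illustrative, not part of the Flask):

```python
import random


def replace_random_word(example, corpus, seed=0):
    """Replace one randomly chosen word with a random word from the corpus."""
    rng = random.Random(seed)
    words = example.split()
    vocabulary = sorted({w for text in corpus for w in text.split()})
    if not words or not vocabulary:
        return example
    position = rng.randrange(len(words))
    # The replacement may occasionally equal the original word.
    words[position] = rng.choice(vocabulary)
    return " ".join(words)
```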

Tabular Data Augmentation

VAE (Variational Autoencoder)

Compresses existing data and expands it again using deep learning. The reconstruction is purposefully noisy, creating variation.

Gaussian STDEV (Standard Deviation)

Sets the maximum distance (standard deviation) from the original value. The higher the standard deviation, the greater the allowable difference between the original and generated values.

Gaussian Noise

Changes numerical values, assuming that the distribution of existing values fits a bell curve.
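The two Gaussian settings above can be sketched together: each numeric value gets noise drawn from a bell curve centered on zero, and the standard deviation controls how wide that curve is. The function name and parameters are assumptions for illustration. In this sketch the standard deviation sets the typical spread of the noise rather than a hard cap.

```python
import random


def add_gaussian_noise(values, stdev, seed=0):
    """Return a copy of `values` with zero-mean Gaussian noise added."""
    rng = random.Random(seed)
    # A larger stdev allows generated values to drift further from the originals.
    return [value + rng.gauss(0.0, stdev) for value in values]
```

A reasonable starting point might be to pick `stdev` as a small fraction of the column's own standard deviation, so that synthetic rows stay plausible.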

Categorical

Changes a categorical value to another value used in the dataset (e.g., in a Countries column, changing “United States” to “Canada”).
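A minimal sketch of categorical swapping, assuming rows are plain dictionaries and that every row's value is replaced (the function name, the `column` parameter, and the always-swap behavior are illustrative assumptions):

```python
import random


def swap_categorical(rows, column, seed=0):
    """Replace each row's value in `column` with a different observed value."""
    rng = random.Random(seed)
    # Only values already present in the dataset are candidates.
    categories = sorted({row[column] for row in rows})
    augmented = []
    for row in rows:
        alternatives = [c for c in categories if c != row[column]]
        new_row = dict(row)  # copy so the original rows are left untouched
        if alternatives:
            new_row[column] = rng.choice(alternatives)
        augmented.append(new_row)
    return augmented
```

Because replacements come only from values already in the column, the synthetic rows never contain a category the dataset has not seen.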

[Screenshot: the Flask synthetic data creator, tabular augmentation]

More about Synthetic Data

Greater outputs with fewer inputs.

Jaxon rivaled state-of-the-art results with over 1,000x fewer manually labeled examples.

Greg Harman, CTO of Jaxon, shares his thoughts on synthetic data, tradeoffs between noise and quantity, and how to improve your models.

Learn how to augment data in Jaxon with the Flask, and read a breakdown of the techniques that the Flask uses to augment data.