Synthetic Data: Noise vs. Quantity

Synthetic data is artificially created from real datasets using generative AI, both to expand them in general and to acquire more long-tail data points. This lets you augment your machine learning models without being restricted by data collection. However, adding more synthetic data also adds more noise, which can degrade model quality.

Greg Harman, CTO of Jaxon, shares his thoughts on how to balance the tradeoffs between noise and quantity.



So I’d like to talk about synthetic data, the inherent trade-off between the noise and the quantity of that synthetic data, and how you can use this trade-off to improve your first-cut machine learning models.

Training a Model

So let’s say that you started training a model with the data you had available, and you ended up with a loss curve that looked something like this.

That is, the training loss goes down and minimizes as you expect, and the test set loss does the same thing, only it levels out with a substantial gap between the two loss values. The Y axis here is loss, and the X axis is training iterations. The question becomes: how can we improve this? Obviously, if you had more data available, that would be the way to improve it.
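To make that concrete, here is a minimal sketch of measuring the train/test gap from two loss curves. The curves and the `generalization_gap` helper are hypothetical illustrations, not from the talk; the shapes mimic the chart described above, where both curves flatten but the test loss plateaus higher.

```python
import numpy as np

def generalization_gap(train_loss, test_loss, tail=10):
    """Estimate the train/test gap from the tail (plateau) of two loss curves."""
    return float(np.mean(test_loss[-tail:]) - np.mean(train_loss[-tail:]))

# Hypothetical curves: both decay and level out, test loss plateaus higher.
iters = np.arange(100)
train = 0.1 + 0.9 * np.exp(-iters / 15)
test = 0.4 + 0.6 * np.exp(-iters / 15)

gap = generalization_gap(train, test)  # roughly 0.3 for these curves
```

A large, persistent gap like this is the signal that more (or better) data, or more regularization, is needed.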

Adding Synthetic Data

But what if I don’t have more labeled data? Then I start thinking about how I could add synthetic data.

Now, there are a couple of different approaches one can take to add synthetic data. And I’m not gonna go deeply into those, but as a couple of examples…

I could add pseudo labels, or synthetic labels, which has the effect of predicting a label Y for examples from my unlabeled X values. If you happen to have a lot of unlabeled data left over, and just didn’t have the budget or the time to hand-label it, you might take an approach like pseudo labeling. Many semi-supervised methods rely on this approach.
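A common variant of pseudo labeling keeps only the examples the current model labels confidently. This is a sketch under assumptions: the `pseudo_label` helper, the confidence threshold of 0.9, and the toy one-dimensional "model" are all illustrative, not part of the talk.

```python
import numpy as np

def pseudo_label(model_proba, unlabeled_X, threshold=0.9):
    """Pseudo-label unlabeled data, keeping only confident predictions.

    model_proba: callable returning class probabilities, shape (n, k).
    Returns (X_new, y_new) to append to the labeled training set.
    """
    proba = model_proba(unlabeled_X)
    confidence = proba.max(axis=1)
    keep = confidence >= threshold
    return unlabeled_X[keep], proba[keep].argmax(axis=1)

# Toy 1-D model: class 1 if x > 0, confidence grows with |x|.
def toy_proba(X):
    p1 = 1.0 / (1.0 + np.exp(-5.0 * X[:, 0]))
    return np.column_stack([1.0 - p1, p1])

X_unlabeled = np.array([[-2.0], [-0.1], [0.05], [3.0]])
X_new, y_new = pseudo_label(toy_proba, X_unlabeled)
# Only the two far-from-boundary points survive the threshold.
```

Raising the threshold trades quantity for noise directly: you keep fewer pseudo-labeled examples, but the labels you keep are less likely to be wrong.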

The other approaches you might think of are generative. Things like GPT-3 or GANs may come into play here. In this case, the techniques tend to work in the opposite direction: given a Y, given a label, can I generate a brand-new synthetic X?
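The "given a label, generate a new X" direction can be illustrated with a much simpler generative model than GPT-3 or a GAN: a per-class Gaussian fit to the real data. The helpers and the two-class toy dataset below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_class_conditional(X, y):
    """Fit a per-class Gaussian (mean, std) as a minimal generative model."""
    return {c: (X[y == c].mean(axis=0), X[y == c].std(axis=0))
            for c in np.unique(y)}

def generate(params, label, n):
    """Given a label Y, sample n brand-new synthetic X values for that class."""
    mu, sigma = params[label]
    return rng.normal(mu, sigma, size=(n, mu.shape[0]))

# Two well-separated 2-D classes: class 0 near the origin, class 1 near (5, 5).
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

params = fit_class_conditional(X, y)
X_synthetic = generate(params, label=1, n=10)
```

The synthetic points land near the class-1 cluster, but any mismatch between the fitted model and the true distribution shows up as exactly the label noise discussed next.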

Just a couple more examples: as opposed to strictly generative labeling, you could also do things like unsupervised data augmentation or algorithmic data augmentation, depending on what your domain is. These are all means of acquiring more data. But what’s going to happen in your loss chart? The synthetic examples you generate are probably not perfect, so you’re going to introduce more noise than your original data set had.
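Algorithmic augmentation can be as simple as perturbing each real example slightly while keeping its label. The `jitter_augment` helper and its parameters below are a hypothetical sketch; for images or text you would substitute domain-appropriate transforms.

```python
import numpy as np

rng = np.random.default_rng(1)

def jitter_augment(X, y, copies=3, scale=0.05):
    """Algorithmic augmentation: add small Gaussian noise to each example.

    Returns the original data plus `copies` perturbed versions of it,
    with labels carried over unchanged.
    """
    X_aug = np.vstack([X + rng.normal(0, scale, X.shape) for _ in range(copies)])
    y_aug = np.tile(y, copies)
    return np.vstack([X, X_aug]), np.concatenate([y, y_aug])

X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([0, 1])
X_big, y_big = jitter_augment(X, y)  # 2 originals + 6 jittered copies
```

If `scale` is too large the perturbed points cross class boundaries, which is one concrete way augmentation injects the noise this section warns about.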


And so what you’re going to end up with now is a loss chart where training proceeds as normal. My training loss looks good like before, and I might have improved what my test set does: its loss starts out decreasing like the training loss, but at some point it turns back up, and I start to overfit.

Now, I may have made some improvements over where we were before. But I want to bring that test loss down and account for the noise, because this gap is due to the extra noise I’ve introduced with my pseudo labeling or generative labeling.


And so, across all these different techniques, how do we fix this kind of chart? We regularize. And you know, in the end, a lot of deep learning really is just: apply enough data, then regularize, assuming our model is large enough to potentially model the actual situation. We just need to bring the test loss down and keep the model from overfitting. There are a whole lot of ways to regularize, including things like L2 weight decay and dropout. When you regularize, you reduce the overfitting and bring this chart back toward something that looks a little more like our ideal chart.
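Of the regularizers just mentioned, L2 weight decay is easy to show end to end. This is a minimal sketch on linear regression rather than a deep network, with hypothetical data and hyperparameters: the penalty appears as the extra `weight_decay * w` term in the gradient.

```python
import numpy as np

def train_ridge_gd(X, y, weight_decay=0.0, lr=0.01, steps=2000):
    """Gradient descent on squared error with an L2 (weight decay) penalty.

    The `weight_decay * w` term in the gradient continually shrinks the
    weights toward zero, which is the regularizing effect.
    """
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n + weight_decay * w
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, 100)

w_plain = train_ridge_gd(X, y, weight_decay=0.0)
w_reg = train_ridge_gd(X, y, weight_decay=1.0)
# The regularized weight vector has smaller norm than the unregularized one.
```

The same idea carries over to deep models, where it is usually exposed as a `weight_decay` setting on the optimizer; dropout, by contrast, regularizes by randomly zeroing activations during training.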

Now, it’s always possible that, even so, you’ll start to get a little divergence near the end. But that’s okay, because we have early stopping, which, it has been argued, is in fact just another form of regularization, in particular if you start viewing the duration of your training as itself a hyperparameter.
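Early stopping with patience is simple enough to sketch directly. The `early_stop_index` helper and the hypothetical validation curve below are illustrative, not from the talk; the curve decreases, then diverges, mimicking the overfitting chart described earlier.

```python
import numpy as np

def early_stop_index(val_loss, patience=5):
    """Return the iteration with the best validation loss, stopping once
    `patience` consecutive iterations pass without improvement."""
    best, best_i, bad = np.inf, 0, 0
    for i, v in enumerate(val_loss):
        if v < best:
            best, best_i, bad = v, i, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_i

# Hypothetical validation curve: decreases, then turns back up (overfitting).
val = [1.0, 0.8, 0.6, 0.5, 0.45, 0.44, 0.46, 0.5, 0.55, 0.6, 0.7, 0.8]
stop = early_stop_index(val)  # index of the minimum, before the divergence
```

Treating the stopping point as a hyperparameter, as the talk suggests, means `patience` itself becomes something you can tune alongside the other regularizers.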


So there you have it. That’s one way to think about the quantity of your training data versus the noisiness of that data, and how you can actually use this trade-off to improve your modeling as you iteratively continue to refine your model and your training process.