Let’s break applied machine learning problems down into a simple taxonomy. These problems can be classified according to a (non-exhaustive) two-dimensional model: data type (e.g., text/natural language) and problem type (e.g., transformation: seq2seq, generative).
We can even plot the intersection of these dimensions and derive some common use cases, for example:

- Intent detection, spam detection
- “How cute is that cat?”
- “Where is the cat?”
- Deepfakes (I’m really a dog person)
This is a good start, but real-world problems (the kind you get paid to solve) tend to be more complex than these individual cells. Counter-intuitively, we need to collapse these dimensions down in order to add complexity.
Multimodal models help us collapse and combine datasets. Are we really confined to a single data type? Much more likely, raw data exists in many forms, and the data types we identified in our first dimension are more a reflection of tactical computational concerns than of the actual business problem at hand. Example: if I am attempting to perform spam detection, wouldn’t it be useful to look beyond the email body and take a peek at the headers?
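The spam-detection example above can be sketched as a simple fusion of two modalities: text features from the email body and structured features from the headers. This is a minimal illustration, not a production detector; the feature names, keyword lists, and weights are all hypothetical.

```python
# Sketch: multimodal spam scoring that fuses body text with header metadata.
# All features and weights below are hypothetical, chosen for illustration.

def body_features(body: str) -> dict:
    """Simple text-modality features from the email body."""
    words = body.lower().split()
    return {
        "has_spam_words": any(w.strip("!?.,") in {"winner", "free", "urgent"} for w in words),
        "exclamation_ratio": body.count("!") / max(len(body), 1),
    }

def header_features(headers: dict) -> dict:
    """Structured-modality features from the email headers."""
    return {
        "reply_to_mismatch": headers.get("Reply-To", "") not in ("", headers.get("From", "")),
        "many_recipients": len(headers.get("To", "").split(",")) > 20,
    }

def spam_score(body: str, headers: dict) -> float:
    """Fuse both modalities into one score (hypothetical weights)."""
    features = {**body_features(body), **header_features(headers)}
    weights = {
        "has_spam_words": 0.4,
        "exclamation_ratio": 0.2,
        "reply_to_mismatch": 0.3,
        "many_recipients": 0.1,
    }
    return sum(weights[k] * float(v) for k, v in features.items())
```

The point is structural: neither modality alone sees the whole picture, but a single model (here, a weighted sum standing in for a trained classifier) can consume features from both.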
Solution graphs can help with the other dimension. While these basic machine learning tasks are useful building blocks, they rarely translate directly into the needs of real-world applications and business problems. Composing complex graphs from basic machine learning tasks yields higher-level solutions that map more closely to direct needs. Example: a basic chatbot needs to perform both intent detection (classification) and slot detection (information extraction). More sophisticated chatbots might use an ensemble of specialized classifiers, slot detectors, and transformations (translations, text generation, etc.).
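The chatbot example can be sketched as a tiny two-node solution graph: an intent classifier and a slot detector composed into a single higher-level handler. The keyword rules below stand in for trained models and are purely hypothetical.

```python
# Sketch: composing two low-level ML tasks (classification + information
# extraction) into one higher-level chatbot turn. Rules are hypothetical
# stand-ins for trained models.

def detect_intent(utterance: str) -> str:
    """Classification task: map an utterance to an intent label."""
    text = utterance.lower()
    if "weather" in text:
        return "get_weather"
    if "book" in text or "reserve" in text:
        return "make_reservation"
    return "unknown"

def detect_slots(utterance: str) -> dict:
    """Information-extraction task: pull structured slot values from text."""
    slots = {}
    words = utterance.lower().split()
    for i, word in enumerate(words):
        if word == "in" and i + 1 < len(words):
            slots["location"] = words[i + 1].strip("?.,!")
    return slots

def handle(utterance: str) -> dict:
    """Solution graph: run both tasks and merge their outputs."""
    return {"intent": detect_intent(utterance), "slots": detect_slots(utterance)}
```

A more sophisticated graph would add nodes (translation, response generation) and edges routing each utterance through only the relevant specialists, but the composition pattern stays the same.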
Ultimately, machine learning tasks are low-level building blocks. Solving complex, real-world problems requires unifying these blocks across the dimensions of data type and task.
– Greg Harman, CTO