Term Frequency – Inverse Document Frequency (TFIDF)

A technique for representing text that weights each word by how often it appears in a document relative to how many documents in the collection contain it. In TFIDF, each word’s importance is the product of two factors: its frequency in the current document (Term Frequency) and its rarity across all documents (Inverse Document Frequency), the latter usually computed as the logarithm of the total number of documents divided by the number of documents containing the word. This weighting reduces the influence of words that are common across the collection and increases the influence of rare, contextually significant ones, giving a more informative representation of the text. TFIDF is widely used in information retrieval, text mining, and natural language processing to improve the relevance and discriminative power of features extracted from text.
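
To make the two factors concrete, here is a minimal sketch in Python. It assumes one common formulation that the text above does not specify: term frequency as the word's count divided by the document length, and inverse document frequency as the unsmoothed natural logarithm of (number of documents / number of documents containing the word). The function name `tf_idf` and the toy documents are illustrative only; libraries typically apply smoothed or normalized variants, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed formulation (one of several in common use):
      tf  = count of term in document / number of tokens in document
      idf = ln(number of documents / number of documents containing term)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus, tokenized by whitespace (hypothetical example data).
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
scores = tf_idf(docs)

# "the" occurs in every document, so its idf is ln(3/3) = 0 and its weight vanishes.
# "mat" occurs in one document only, so it outweighs "cat", which occurs in two.
for term in ("the", "cat", "mat"):
    print(term, round(scores[0][term], 4))
```

In the first document, "the" is the most frequent word yet receives a weight of zero, while the rarer "mat" scores higher than "cat". This is the downweighting of common words and upweighting of rare ones described above.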