TF-IDF Vectorization

Machine learning models work with numbers, not raw text. Because movie genres are written as text, they must be converted into numbers before a model can use them.
TF-IDF (Term Frequency–Inverse Document Frequency) is a technique used to convert textual genre information into numerical values. It assigns a weight to each genre: the weight increases with how frequently the genre appears in a movie's genre list and decreases with how many movies share that genre, so a rare genre counts for more than one that appears almost everywhere.
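As a rough illustration, one common textbook formulation of this weighting is shown below. The symbols are assumptions introduced here and do not come from the original text: $g$ is a genre, $m$ is a movie, $N$ is the total number of movies, and $\mathrm{df}(g)$ is the number of movies tagged with $g$. Libraries often apply smoothing or normalization on top of this, so exact values may differ in practice.

```latex
\mathrm{tf}(g, m) = \frac{\text{number of times } g \text{ appears in } m}{\text{total number of genre terms in } m}
\qquad
\mathrm{idf}(g) = \log\frac{N}{\mathrm{df}(g)}
\qquad
\text{tf-idf}(g, m) = \mathrm{tf}(g, m) \cdot \mathrm{idf}(g)
```

Under this formulation, a genre shared by every movie gets $\mathrm{idf}(g) = \log 1 = 0$ and so contributes nothing to the comparison.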
Using TF-IDF, each movie is represented as a numerical vector with one entry per genre. These vectors make it possible to compare movies mathematically and measure how similar their genres are.
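The idea can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used here: the movie titles, the pipe-separated genre format, the unsmoothed `log(N / df)` weighting, and the choice of cosine similarity as the comparison measure are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical sample data: titles mapped to pipe-separated genre strings.
movies = {
    "Movie A": "Action|Adventure|Sci-Fi",
    "Movie B": "Action|Adventure",
    "Movie C": "Romance|Comedy",
}


def tfidf_vectors(genre_strings):
    """Return (vocabulary, vectors) using tf = count / length, idf = ln(N / df)."""
    docs = [s.split("|") for s in genre_strings]
    vocab = sorted({genre for doc in docs for genre in doc})
    n_docs = len(docs)
    # Document frequency: in how many movies each genre appears.
    df = Counter(genre for doc in docs for genre in set(doc))
    idf = {genre: math.log(n_docs / df[genre]) for genre in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[genre] / len(doc) * idf[genre] for genre in vocab])
    return vocab, vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


titles = list(movies)
vocab, vectors = tfidf_vectors(movies.values())
vec = dict(zip(titles, vectors))

sim_ab = cosine_similarity(vec["Movie A"], vec["Movie B"])
sim_ac = cosine_similarity(vec["Movie A"], vec["Movie C"])
print(f"A vs B: {sim_ab:.3f}")
print(f"A vs C: {sim_ac:.3f}")
```

In this sample, Movie A and Movie B share two genres and get a positive similarity, while Movie A and Movie C share none and score zero. A library vectorizer would typically add smoothing and normalization, so its numbers would not match this sketch exactly.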