Topic Modeling on my Watched Movies


The project is available on my GitHub and as an article on Medium (Analytics Vidhya).

I rate every movie I watch on IMDb, and the site lets you download a .csv file with all of your ratings. This file contains only basic information about each movie, but topic modeling needs text: the plots and/or summaries of the movies. I grab that text from Wikipedia and use it to enrich the IMDb dataset. Then I run LDA (Latent Dirichlet Allocation) on the combined plots and summaries to find 6 topics. The purpose of this project is to:

  • Use Wikipedia to grab the movies, and more specifically their summaries and plots.
  • Merge the IMDb data with the Wikipedia data.
  • Build, evaluate, and visualize an LDA model.
