I rate every movie I watch on IMDb, and the website lets you download a .csv file of all your ratings. This .csv contains only basic information about each movie. To perform topic modeling, I need the plots and/or summaries of the movies, so I will grab this information from Wikipedia and use it to enrich the IMDb dataset. Then I will fit a Latent Dirichlet Allocation (LDA) model on the combined plots and summaries to find 6 topics. The purpose of this project is to:
- Use Wikipedia to retrieve the movies, and more specifically their summaries and plots.
- Merge the IMDb data with the Wikipedia data.
- Build, evaluate, and visualize an LDA model.
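The merge step can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code. It assumes the IMDb export has `Title` and `Year` columns, and that the Wikipedia text has already been collected into a dict keyed by `(normalized title, year)`. The helper names `normalize_title` and `merge_ratings_with_plots` are hypothetical, and the sample rows are toy data.

```python
import csv
import io
import re


def normalize_title(title: str) -> str:
    """Casefold a title and collapse punctuation/whitespace runs into single
    spaces, so IMDb and Wikipedia spellings of the same film compare equal.
    Unicode letters (e.g. accented characters) are kept."""
    return re.sub(r"[\W_]+", " ", title.casefold()).strip()


def merge_ratings_with_plots(ratings_csv: str, plots: dict) -> list:
    """Attach Wikipedia text to each IMDb rating row.

    Assumption: `ratings_csv` has "Title" and "Year" columns, and `plots`
    maps (normalized title, year as a string) -> plot/summary text.
    Rows with no matching text are dropped, like an inner join.
    """
    merged = []
    for row in csv.DictReader(io.StringIO(ratings_csv)):
        key = (normalize_title(row["Title"]), row["Year"])
        text = plots.get(key)
        if text is not None:
            merged.append({**row, "Text": text})
    return merged


if __name__ == "__main__":
    # Toy data for illustration only.
    ratings = "Your Rating,Title,Year\n8,The Matrix,1999\n7,Heat,1995\n"
    plots = {("the matrix", "1999"): "A hacker discovers reality is simulated."}
    for row in merge_ratings_with_plots(ratings, plots):
        print(row["Title"], "->", row["Text"])
```

Keying on title plus year, rather than title alone, avoids merging remakes that share a name. Films whose titles differ between the two sources would still need a manual or fuzzy match.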