Efficiently tuning topic models

This project was created for the 2020 Nashville Analytics Summit hosted by the Nashville Technology Council.

Abstract: In natural language processing, topic models are used to extract meaningful, human-interpretable topics from a corpus. However, tuning topic models for large corpora can be time-consuming and computationally expensive. By monitoring topic coherence as a function of corpus size, we can determine how to create a high-quality topic model efficiently. In this project, we demonstrate this technique using the English Wikipedia corpus.
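The idea in the abstract, fitting topic models on growing slices of a corpus and watching how coherence changes, can be sketched in plain Python. This is a minimal, dependency-free sketch under stated assumptions, not the code from the project repository. It swaps in a tiny collapsed Gibbs sampler for LDA and the UMass coherence measure. The helper names (`train_lda`, `umass_coherence`, `coherence_by_corpus_size`) and the two-theme toy corpus are hypothetical. For a corpus the size of Wikipedia, a library such as gensim (its `LdaModel` and `CoherenceModel` classes) would be the practical choice.

```python
import math
import random
from collections import Counter


def train_lda(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with a tiny collapsed Gibbs sampler; return per-topic word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed full conditional.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def umass_coherence(top_words, docs):
    """UMass coherence of one topic's top words, ranked most probable first."""
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            # Pair a lower-ranked word with each higher-ranked word.
            score += math.log(
                (doc_freq(top_words[i], top_words[j]) + 1) / doc_freq(top_words[j])
            )
    return score


def coherence_by_corpus_size(docs, sizes, num_topics, top_n=5, seed=0):
    """Mean UMass coherence of a model fit on the first `size` docs, per size."""
    results = {}
    for size in sizes:
        subset = docs[:size]  # assumes `docs` is already in random order
        topic_word = train_lda(subset, num_topics, seed=seed)
        scores = []
        for counts in topic_word:
            top = [w for w, c in counts.most_common() if c > 0][:top_n]
            scores.append(umass_coherence(top, subset))
        results[size] = sum(scores) / len(scores)
    return results


if __name__ == "__main__":
    # Hypothetical toy corpus standing in for Wikipedia articles: two themes.
    rng = random.Random(42)
    sports = ["game", "team", "score", "season", "coach", "player"]
    music = ["album", "song", "band", "guitar", "tour", "singer"]
    docs = [rng.choices(sports if i % 2 else music, k=10) for i in range(100)]
    for size, score in coherence_by_corpus_size(docs, [20, 50, 100], 2).items():
        print(f"{size:>4} docs: mean UMass coherence {score:.3f}")
```

If mean coherence levels off well before the full corpus is used, that plateau suggests a smaller sample may be enough for tuning choices such as the number of topics, with only the final model trained on the whole corpus.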

View my work on my GitHub repository.

Watch my virtual presentation at the 2020 Nashville Analytics Summit.