Twitter Sockpuppet Analysis with NLP
Millions of tweets. Thousands of fake accounts. A sophisticated disinformation campaign.
In natural language processing, topic models extract meaningful, human-interpretable topics from a corpus. However, tuning topic models for large corpora can be time-consuming and computationally expensive. By monitoring topic coherence as a function of corpus size, we can determine how to create a high-quality topic model efficiently. In this project, we demonstrate this technique using the English Wikipedia corpus.
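As a rough illustration of monitoring coherence against corpus size, the sketch below scores topics on progressively larger slices of a tokenized corpus and reports where the curve flattens. Several parts are assumptions rather than details from the project: the UMass coherence measure is used only because it is simple to compute from document co-occurrence counts, `toy_fit_topics` is a hypothetical stand-in for a real topic model trainer (such as LDA), and the `tol` threshold for detecting a plateau is arbitrary.

```python
import math
from collections import Counter


def umass_coherence(topic_words, reference_docs, eps=1.0):
    """Mean UMass coherence of one topic.

    For each pair (w_i, w_j) with w_j ranked above w_i, the score is
    log((D(w_i, w_j) + eps) / D(w_j)), where D counts reference documents
    containing the given words. Values closer to zero are better.
    """
    doc_sets = [set(doc) for doc in reference_docs]

    def doc_freq(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    scores = []
    for i in range(1, len(topic_words)):
        for j in range(i):
            w_i, w_j = topic_words[i], topic_words[j]
            d_j = doc_freq(w_j)
            if d_j == 0:
                continue  # word never appears in the reference docs; skip pair
            scores.append(math.log((doc_freq(w_i, w_j) + eps) / d_j))
    return sum(scores) / len(scores) if scores else float("nan")


def coherence_curve(docs, sizes, fit_topics, reference_docs=None):
    """Train on the first n docs for each n in sizes; return [(n, mean coherence)].

    Coherence is always measured against the same reference_docs so that
    points on the curve are comparable.
    """
    reference_docs = docs if reference_docs is None else reference_docs
    curve = []
    for n in sizes:
        topics = fit_topics(docs[:n])
        mean_c = sum(umass_coherence(t, reference_docs) for t in topics) / len(topics)
        curve.append((n, mean_c))
    return curve


def plateau_size(curve, tol=0.05):
    """Smallest size after which the next step changes coherence by less than tol."""
    for (n_prev, c_prev), (_, c_next) in zip(curve, curve[1:]):
        if abs(c_next - c_prev) < tol:
            return n_prev
    return curve[-1][0]  # no plateau found; fall back to the largest size tried


def toy_fit_topics(docs, top_n=3):
    """Hypothetical placeholder trainer: one 'topic' of the most frequent words.

    Swap in a real topic model here; it only needs to return a list of
    topics, each a ranked list of words.
    """
    counts = Counter(w for doc in docs for w in dict.fromkeys(doc))
    return [[w for w, _ in counts.most_common(top_n)]]


if __name__ == "__main__":
    corpus = [["cat", "dog"], ["cat", "dog"], ["cat", "fish"], ["dog", "fish"]]
    curve = coherence_curve(corpus, sizes=[2, 4], fit_topics=toy_fit_topics)
    for n, c in curve:
        print(n, round(c, 3))
    print("plateau at", plateau_size(curve))
```

In practice the sizes would span orders of magnitude (for example thousands to millions of articles), and training would stop growing the corpus once the extra data no longer improves coherence by a meaningful margin.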