Abstract:
Twitter has grown into a vast network of short, informal texts, and navigating it is often difficult. Here, we explore Natural Language Processing (NLP) approaches to simplify the topic classification of tweets. Our use case is filtering non-profit tweets into categories that are arranged in a hierarchy. This paper proposes an efficient pipeline for filtering relevant tweets and a novel data augmentation strategy for sparse datasets. Our data augmentation technique yields a substantial improvement in the training metrics; on the test data, accuracy increases by 9.52% and the F1-score by 24.82%.