Language Identification and Context-based Analysis of Code-switching Behaviors in Social Media Discussions
Date
2019
Publisher
IEEE
Abstract
Social media discussions draw multilingual participants, who often alternate between languages within a single post (code-switching) to communicate effectively. This paper attempts to characterize such discussions in order to analyze contextual factors related to multilingual communities. Features extracted from the posts are used to train a conditional random field (CRF) based sequence labeling algorithm for language identification in an intra-sentential code-switching scenario. The relevance of a sentence to its discussion is modeled through Term Frequency-Inverse Document Frequency (TF-IDF). Further context of a multilingual sentence with respect to the discussion, such as agreement and questioning between pairs of posts, is also modeled.
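The abstract does not specify how TF-IDF is turned into a relevance score, so the following is only a minimal sketch under stated assumptions: each post is already tokenized, IDF is smoothed, and relevance is taken to be the cosine similarity between a post and the rest of the discussion. The function names and the example posts are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(posts):
    """Return one sparse TF-IDF vector (a dict) per tokenized post."""
    n = len(posts)
    # Document frequency: number of posts that contain each term.
    df = Counter(term for post in posts for term in set(post))
    vectors = []
    for post in posts:
        tf = Counter(post)
        # Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1.
        vectors.append({
            term: (count / len(post)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def relevance(i, posts):
    """Assumed relevance: similarity of post i to all other posts combined."""
    vectors = tfidf_vectors(posts)
    rest = Counter()
    for j, vec in enumerate(vectors):
        if j != i:
            rest.update(vec)  # adds the TF-IDF weights term by term
    return cosine(vectors[i], rest)


# Hypothetical romanized Hindi-English discussion, already tokenized.
posts = [
    ["yeh", "movie", "bahut", "achhi", "thi"],
    ["haan", "movie", "really", "achhi", "thi"],
    ["cricket", "match", "kal", "hai"],
]
scores = [relevance(i, posts) for i in range(len(posts))]
```

In this toy discussion the third post shares no terms with the others, so its score is zero, while the first two posts score above zero because they share vocabulary.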
Keywords
Computer Science, Code-switching, Data mining, Language identification, CRF