Abstract:
With the growth of the internet, the volume of social media
text has been increasing day by day, and user-generated content
(such as tweets and blogs) in Indian languages is written using
Roman script for various socio-cultural and technological
reasons. A majority of these posts are multilingual in nature
and many involve code-mixing, where lexical items and
grammatical features from two languages appear in one sentence.
In this multilingual scenario, code-mixed cross-script
(i.e., non-native script) data gives rise to a new problem
and presents serious challenges to automatic Question
Answering (QA). Question classification, an important step
towards QA, is therefore required. This
paper proposes an approach to cross-script question
classification, an important task in question analysis
that detects the category of the question.