Department of Computer Science and Information Systems
Permanent URI for this collection: http://localhost:4000/handle/123456789/1928
8 results
Item FAID: Feature Aftermath for Irony Discernment (IEEE, 2019) Sharma, Yashvardhan
This paper addresses the problem of identifying sarcasm in social media text, which can be used to improve sentiment analysis techniques. After thorough analysis, some features were identified that could help in recognizing sarcasm. In prior state-of-the-art work, features were extracted from data sets consisting of standalone sentences. The proposed algorithm analyzes the impact of these features, and of combinations of them, on a review data set in which each review has three or more sentences, so that the context of a sentence is also taken into consideration before a review is classified.

Item Encoder-Decoder Architectures for Generating Questions (Elsevier, 2018) Sharma, Yashvardhan
With textual data exploding on the internet in the form of e-books, legal documents, and product information, there is an opportunity to harness it for applications that aid human tasks. Question generation systems can be used to build lists of frequently asked questions, create school quizzes, and serve the purpose of unified AI. This study explores various encoder-decoder architectures for generating questions from text inputs, using Stanford's SQuAD dataset for the training, development, and test sets. Evaluation metrics such as BLEU and ROUGE, together with training time, were used to compare the effectiveness of the models.
The article builds on the current end-to-end system by using gated recurrent units in place of long short-term memory, which gives similar accuracy with less training time. It also shows the successful use of a convolution-based encoder for this task, which gives results comparable to the current state-of-the-art system with much less training time.

Item Bits_Pilani@INLI-FIRE-2017: Indian Native Language Identification using Deep Learning (CEUR, 2017) Sharma, Yashvardhan
The task of Native Language Identification involves identifying the prior or first-learnt language of a user based on their writing technique and/or analysis of speech and phonetics in a second language. There is a surplus of such data on social media sites and in organised datasets from bodies like the Educational Testing Service (ETS), which can be exploited to develop language learning systems and for forensic linguistics. In this paper we propose a deep neural network for this task, using a hierarchical paragraph encoder with an attention mechanism to identify relevant features over the tendencies and errors a user makes in the second language, for the INLI task at FIRE 2017. The task involves six Indian languages as the prior/native set and English as the second language, with text collected from users' social media accounts.

Item Catchphrase Extraction from Legal Documents Using LSTM Networks (CEUR, 2017-12) Sharma, Yashvardhan
Legal texts usually have a complex structure, and reading through them is a time-consuming and strenuous task. Hence it is essential to provide legal practitioners with a concise representation of the text. Catchphrases are phrases that state the important issues present in the text, thus effectively characterizing it. This paper proposes an approach for subtask 1 of the IRLed (Information Retrieval from Legal Documents) task, FIRE 2017.
The proposed algorithm uses a three-step approach for extracting catchphrases from legal documents.

Item Named Entity Recognition for Code Mixing in Indian Languages using Hybrid Approach (CEUR, 2016-12) Sharma, Yashvardhan
Automating Named Entity Recognition in social media text has received a lot of attention over the past few years. Named entities are real-world objects such as Person, Organization, Product, and Location. Identifying these entities in social media text is an important and challenging task due to the informal nature of the text. One challenge faced in recognizing named entities in Indian social media text is code mixing, the use of more than one language in a sentence. India being a multilingual country, its people tend to know more than one language, which in turn results in code mixing when they express their opinions. This paper describes the proposed approach for the shared task CMEE-IL (Code Mix Entity Extraction in Indian Language), FIRE 2016. The proposed algorithm uses a hybrid of a dictionary-based approach and supervised classification to identify entities in code-mixed text of Indian languages such as Hindi-English and Tamil-English.

Item Sentiment analysis for mixed script Indic sentences (IEEE, 2016) Sharma, Yashvardhan
India is a multi-lingual and multi-script country. Developing natural language processing techniques for Indic languages is an active area of research. With the advent of social media, there has been an increasing trend of mixing different languages to convey thoughts in social media text. Users are more comfortable in their regional language and tend to express their thoughts by mixing words from multiple languages. In this paper, we attempt to develop a system for mining sentiments from code-mixed sentences that combine English with four other Indian languages (Tamil, Telugu, Hindi and Bengali).
Due to the complex nature of the problem, the technique is divided into two stages, viz. Language Identification and Sentiment Mining. The evaluated results are compared to a baseline obtained from sentences machine-translated into English, and are found to be around 8% better in terms of precision. The proposed approach is flexible and robust enough to handle additional languages for identification as well as anomalous foreign or extraneous words.

Item Query Labelling for Indic Languages using a hybrid approach (CEUR, 2015) Sharma, Yashvardhan
With the boom in the internet, social media text has been increasing day by day. Much of the user-generated content on the internet is written in a very informal way. People usually tend to write text on social media using an indigenous script, and understanding a script different from one's own is a difficult task. Moreover, a large number of the queries now received by search engines are transliterated text. Hence providing a common platform to deal with the problem of transliterated text becomes really important. This paper presents our approach to labelling queries as part of the FIRE 2015 shared task on Mixed-Script Information Retrieval. Tokens in the query are labelled using a hybrid approach that involves rule-based and machine learning techniques. Each annotation has been dealt with separately but sequentially.

Item TwiBiNG: A Bipartite News Generator Using Twitter (CEUR, 2014) Sharma, Yashvardhan
Online journalism is being seen as the future of journalism. News professionals are vying to capture newsworthy stories that emerge from the crowd. Live social media, especially Twitter, generates enormous volumes of data every minute. It becomes difficult to select, from among the rest, the credible and relevant tweets that may form quality news. The problem intensifies because of the informal language that Twitter permits. Headlines generated by solving this problem may still not be relevant and may face questions of authenticity.
Given a set of keywords and a time period, this problem becomes manageable and can be solved efficiently. We propose a bipartite algorithm that clusters authentic tweets based on key phrases and ranks the clusters based on trends in each timeslot.
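
The idea of clustering tweets by key phrase and ranking the clusters per timeslot can be illustrated with a minimal sketch. It is not the algorithm from the TwiBiNG paper: the function name `rank_clusters`, the substring matching, the fixed-width time slots, and the use of cluster size as the trend score are all assumptions made for illustration, and the authenticity filtering mentioned in the abstract is left out.

```python
from collections import defaultdict


def rank_clusters(tweets, key_phrases, slot_seconds=3600):
    """Group tweets by matched key phrase within each time slot, then rank.

    tweets: iterable of (unix_timestamp, text) pairs.
    Returns {slot_index: [(phrase, [texts, ...]), ...]}, largest cluster first.
    All names and the scoring rule here are hypothetical, not from the paper.
    """
    # slot -> phrase -> tweets. Each (phrase, tweet) match is one edge of a
    # bipartite graph between key phrases and tweets.
    slots = defaultdict(lambda: defaultdict(list))
    for timestamp, text in tweets:
        lowered = text.lower()
        slot = int(timestamp // slot_seconds)
        for phrase in key_phrases:
            if phrase.lower() in lowered:
                slots[slot][phrase].append(text)

    ranked = {}
    for slot, clusters in slots.items():
        # Assumed trend score: cluster size in the slot; ties broken by phrase.
        ranked[slot] = sorted(
            clusters.items(), key=lambda item: (-len(item[1]), item[0])
        )
    return ranked


if __name__ == "__main__":
    sample = [
        (0, "Flood warning in Chennai"),
        (10, "Chennai flood relief begins"),
        (20, "Election results today"),
        (4000, "Election results announced"),
    ]
    for slot, clusters in sorted(rank_clusters(sample, ["flood", "election results"]).items()):
        print(slot, [(phrase, len(texts)) for phrase, texts in clusters])
```

A real system would replace the substring match with proper key-phrase extraction and would use a trend measure that compares a phrase's volume across neighbouring timeslots rather than raw counts.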