Department of Computer Science and Information Systems
Permanent URI for this collection: http://localhost:4000/handle/123456789/1928
35 results
Search Results
Item
Composite Sequential Modeling for Identifying Fake Reviews
(De Gruyter, 2018-04) Sharma, Yashvardhan
This paper presents a comprehensive analysis and comparison of various proposed sequential models based on different deep networks such as the convolutional neural network, long short-term memory, and recurrent neural network. The different sequential models are analyzed based on the number of layers, the number of output dimensions, order, and the combination of different deep network architectures. The proposed approach is compared to a baseline model based on traditional machine learning techniques.

Item
Neural Network-Based Architecture for Sentiment Analysis in Indian Languages
(De Gruyter, 2018-06) Sharma, Yashvardhan
Sentiment analysis refers to determining the polarity of the opinions represented by text. The paper proposes an approach to determine the sentiments of tweets in one of the Indian languages (Hindi, Bengali, and Tamil). Thirty-nine sequential models have been created using three different neural network layers [recurrent neural networks (RNNs), long short-term memory (LSTM), convolutional neural network (CNN)] with optimum parameter settings (to avoid over-fitting and error accumulation). These sequential models have been investigated for each of the three languages. Experiments with the proposed sequential models identify how the hidden layers affect the overall performance of the approach. A comparison has also been performed with existing approaches to find out if neural networks have an added advantage over traditional machine learning techniques.

Item
Language Identification and Context-based Analysis of Code-switching Behaviors in Social Media Discussions
(IEEE, 2019) Sharma, Yashvardhan
Social media discussions see the participation of multilingual individuals, who tend to alternate languages within a single post (code-switching) for effective communication in a discussion.
This paper attempts to characterize such discussions to analyze contextual factors related to multilingual communities. Features extracted from the posts are used to train a CRF-based sequence labeling algorithm for language identification in an intra-sentential code-switching scenario. The context of a sentence in a discussion is modeled by defining relevance through Term Frequency-Inverse Document Frequency (TF-IDF). Further context of a multilingual sentence with respect to the discussion, such as agreement and questioning between pairs of posts, is also modeled.

Item
HADCLEAN: A hybrid approach to data cleaning in data warehouses
(IEEE, 2012) Challa, Jagat Sesh; Sharma, Yashvardhan
Data cleaning is an essential step in populating and maintaining data warehouses. Owing to likely differences in conventions between the external sources and the target data warehouse, as well as a variety of errors, data from external sources may not conform to the standards and requirements of the data warehouse. Therefore, data has to be transformed and cleaned before it is loaded into the warehouse so that downstream data analysis is reliable and accurate. This is usually accomplished through an Extract-Transform-Load (ETL) process. Typical data cleaning tasks include record matching, de-duplication, and column segmentation, which often go beyond traditional relational operators. This has led to the development of a broad range of methods intended to enhance the accuracy and thereby the usability of existing data. Data cleansing is the first and most critical step in a Business Intelligence (BI) or Data Warehousing (DW) project, yet easily the most underestimated. T. Redman [1] suggests that the cost associated with poor quality data is about 8-12% of the revenue of a typical organization.
Thus, it is very important to perform data cleaning when building any enterprise data warehouse.

Item
Thin Servers for the Internet of Things
(SSRN, 2019-03) Sharma, Yashvardhan
The Internet of Things involves a lot of server/gateway solutions, specifically for home automation. A specific Constrained Application Protocol has been developed by the IETF to provide for machine-to-machine communication in these solutions. However, the communication paths defined by these solutions using the protocol have room for improvement and currently subject the server to far more load than necessary. In these paths, all the sensor and actuator communication travels via the cloud gateway. Better paths exist directly between the sensor and the actuator. The work presents a solution in which sensors and actuators are bound together while the cloud gateway initiates the interaction. It also involves the concept of RESTlets, which enable the actuator to do some form of processing on the sensor input, keeping the interaction with the cloud gateway to a minimum, thereby reducing the load on it and making it thinner.

Item
FAID: Feature Aftermath for Irony Discernment
(IEEE, 2019-03) Sharma, Yashvardhan
This paper deals with the problem of identifying sarcasm in social media text, which can be used to improve sentiment analysis techniques. After thorough analysis, some features were identified that could help in the recognition of sarcasm. In the state of the art, features have been extracted from data sets comprising standalone sentences.
The proposed algorithm analyzes the impact of these features, and of combinations of them, on a review data set in which reviews had three or more sentences, so that the context of a sentence is also taken into consideration by the machine before classifying a review.

Item
FAID: Feature Aftermath for Irony Discernment
(IEEE, 2019) Sharma, Yashvardhan
This paper deals with the problem of identifying sarcasm in social media text, which can be used to improve sentiment analysis techniques. After thorough analysis, some features were identified that could help in the recognition of sarcasm. In the state of the art, features have been extracted from data sets comprising standalone sentences. The proposed algorithm analyzes the impact of these features, and of combinations of them, on a review data set in which reviews had three or more sentences, so that the context of a sentence is also taken into consideration by the machine before classifying a review.

Item
Deep Learning Approaches for Question Answering System
(Elsevier, 2018) Sharma, Yashvardhan
A Question Answering (QA) system is very useful, as most deep learning related problems can be modeled as question answering problems. Consequently, the field is one of the most researched in computer science today. The last few years have seen considerable developments and improvement in the state of the art, much of which can be credited to the rise of Deep Learning. In this paper, various approaches are discussed, starting from basic NLP and algorithm-based approaches and eventually building towards the recently proposed Deep Learning methods. Implementation details and various tweaks in the algorithms that produced better results have also been discussed.
The proposed models were evaluated on the twenty tasks of Facebook's bAbI dataset.

Item
Encoder-Decoder Architectures for Generating Questions
(Elsevier, 2018) Sharma, Yashvardhan
With textual data exploding on the internet in the form of e-books, legal documents, and product information, there is an opportunity to harness it for applications that can aid human tasks. Systems for question generation can be used for producing frequently asked questions, creating school quizzes, and serving the purpose of unified AI. In this study, various encoder-decoder architectures for generating questions from text inputs have been explored, using Stanford’s SQuAD dataset for the training, development, and test sets. Evaluation metrics such as BLEU and ROUGE, along with training time, were used to compare the effectiveness of the models. The article builds upon the work of the current end-to-end system by using gated recurrent units in place of long short-term memory, which give similar accuracy with less training time. It also shows the successful use of a convolution-based encoder for this task, which gives results comparable to the current state-of-the-art system with much less training time.

Item
Composite Sequential Modeling for Identifying Fake Reviews
(De Gruyter, 2018-04) Sharma, Yashvardhan
This paper presents a comprehensive analysis and comparison of various proposed sequential models based on different deep networks such as the convolutional neural network, long short-term memory, and recurrent neural network. The different sequential models are analyzed based on the number of layers, the number of output dimensions, order, and the combination of different deep network architectures. The proposed approach is compared to a baseline model based on traditional machine learning techniques.