Browsing by Author "Mitra, Satanik"

An Approach to Identify Provocative and Problematic Content with Social Nociceptor
(ACM Digital Library, 2021-01) Mitra, Satanik
In recent times, a gradual decline in public confidence in mainstream media has been observable. According to a survey, 36% of US citizens trust news organizations to deliver factually correct and straightforward information, compared with 54% in mid-1989 [1]. In this post-trust era, news is driven by individual beliefs and emotions, crowding out accurate information. In such a scenario, “stories of uncertain provenance or accuracy” are accepted by people as fact [2], [3]. Inaccurate or compromised news poses potential threats to communal harmony and can fuel political motives and hatred among different cultural or religious communities. Zimdars (2016) compiled a bibliography of websites along with 11 categories of misleading or fake news, including hate news, junk science, and politics. However, today's world calls for automatic detection of such misleading information. Human psychology reveals the effects of confirmation bias, an inclination to process information in conformity with an individual's preconceptions. Social influence is another factor, which results in herd behaviour [4], [5]. These notions vary across cultures, religions, education levels, economic statuses and languages. Hence, precise categorization of the topics that are highly sensitive for people will be effective. We term such topics “social nociceptors”. Nociceptor is a biological term for a type of sensory neuron that signals externally caused damage to the body. We extend this term to the social context.
Experiments on Fraud Detection use case with QML and TDA Mapper
(IEEE, 2021-11) Mitra, Satanik
In the era of online financial transactions, it is important for credit card firms to be able to identify fraudulent credit card transactions. This work covers the study and implementation of two approaches for developing a credit card fraud detection model. The first uses hybrid quantum neural networks. In recent times, Quantum Computers (QC) have been making their mark in the AI/ML domain. Quantum neural networks (QNNs) hybridized with classical neural networks have been used in various tasks such as natural language processing and image processing. The second approach uses Topological Data Analysis (TDA). Finding topological structure in the input data is also relevant from the perspective of noise reduction. The visualization capabilities of TDA can aid in the classification of credit card fraud as well. TDA is implemented here with a mapper-based method. For the hybrid QNN, we cover a reference implementation from Xanadu’s StrawberryFields, where a classical network processes the input to be fed into a QNN model. Although the two approaches are drastically different in technique, for the sake of generalization we implement both TDA and the hybrid QNN on a publicly available credit card fraud detection dataset. We tested with balanced fraud and genuine features: the hybrid QNN model provides an accuracy of 89.5%, whereas the TDA mapper with our novel classification approach provides an accuracy of 94%.
Helpfulness of online consumer reviews: A multi-perspective approach
(Elsevier, 2021-05) Mitra, Satanik
Helpful online reviews attract the attention of many researchers because they significantly affect purchase decisions. However, consumers’ perception of helpfulness remains an open problem due to a lack of semantic analysis of review content and an unreliable voting mechanism. In this work, we propose three qualitative perspectives for assessing helpfulness (lexical, sequential and structural) that consider both semantic and syntactic features of review content. The N-gram based semantic relation among words is explored with a d-CNN model to predict helpfulness from the lexical perspective. The sequential perspective is analysed with an LSTM model, which predicts helpfulness by comprehending the sequence of words. The structural perspective is addressed with fourteen syntactic statistical features, which are used to predict the helpfulness of a review. These three qualitative-perspective models are trained with the “X of Y” ratio of helpfulness voting. To mitigate the unreliability of the helpfulness voting mechanism and unveil the human perception of helpfulness, a manual scoring approach is applied to a sample of reviews. Through experimentation, we show that a linear relationship exists between the perspectives and the human-perceived helpfulness score. It is observed that all these perspectives have an impact on consumers’ perception of the helpfulness of a review. Five product categories from a benchmark dataset are used for experimentation. A sample of 2000 reviews from the five categories is used for human scoring of helpfulness. Finally, we estimate the weight of each perspective in consumers’ perception of helpfulness from online reviews and discuss the significant theoretical and practical implications.
Horie: helpfulness of online reviews with improved embedding
(Springer, 2024-07) Mitra, Satanik
Consumer review helpfulness plays a significant role in purchase decision making in an online shopping environment. Deep learning modules with pre-trained word embeddings are predominantly used to assess review helpfulness. Pre-trained word embeddings are trained on generic corpora and fail to incorporate the domain knowledge and sentiment information of a word. Moreover, pre-trained embeddings fail to capture the subtle change in the semantics of the same word across different parts of speech. In this work, we propose HORIE (Helpfulness of Online Reviews with Improved Embedding), which improves pre-trained embeddings with domain, sentiment and parts-of-speech information and analyses helpfulness as a classification problem. In HORIE, domain knowledge is acquired from domain-specific corpora. The average of the pre-trained and domain-specific embeddings is combined with vectorized sentiment information, extracted from lexical dictionaries, along with POS tag information. We then apply a dual CNN based model to classify reviews. We test HORIE on five different domains and compare its performance with existing embeddings. We also compare our approach with handcrafted feature sets and an existing helpfulness classification technique. AUROC is used as the metric. Our approach shows improvement over existing approaches.
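The embedding combination described in the HORIE abstract above can be illustrated with a minimal sketch. It assumes that "combined" means concatenation, that POS information is one-hot encoded, and that sentiment is a short numeric vector; the abstract confirms none of these details. The function names, the tag set and the toy values are hypothetical.

```python
from typing import List, Sequence

# Hypothetical, tiny POS tag set used only for illustration.
POS_TAGS = ["NOUN", "VERB", "ADJ", "ADV"]


def average(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Element-wise mean of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def one_hot(tag: str, tags: Sequence[str]) -> List[float]:
    """One-hot encoding of a POS tag (an assumed encoding)."""
    return [1.0 if tag == t else 0.0 for t in tags]


def improved_embedding(
    pretrained: Sequence[float],
    domain: Sequence[float],
    sentiment: Sequence[float],
    pos_tag: str,
) -> List[float]:
    """Average the two word vectors, then append sentiment and POS parts.

    Concatenation is an assumption; the abstract only says "combined".
    """
    return average(pretrained, domain) + list(sentiment) + one_hot(pos_tag, POS_TAGS)


# Toy word: 2-d embeddings, 1-d sentiment score, tagged as an adjective.
vec = improved_embedding([1.0, 3.0], [3.0, 5.0], [0.5], "ADJ")
print(vec)
```

In this sketch the resulting vector has the embedding dimension plus the sentiment dimension plus the number of POS tags.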
Hybrid Improved Document-level Embedding (HIDE)
(arXiv, 2020-06) Mitra, Satanik
In recent times, word embeddings have taken on a significant role in sentiment analysis. As the generation of word embeddings needs huge corpora, many applications use pretrained embeddings. In spite of their success, word embeddings suffer from certain drawbacks: they do not capture the sentiment information of a word, contextual information in terms of parts-of-speech tags, or domain-specific information. In this work we propose HIDE, a Hybrid Improved Document-level Embedding, which incorporates domain information, parts-of-speech information and sentiment information into existing word embeddings such as GloVe and Word2Vec. It combines the improved word embeddings into document-level embeddings. Further, Latent Semantic Analysis (LSA) is used to represent documents as vectors. HIDE is generated by combining LSA with the document-level embeddings computed from the improved word embeddings. We test HIDE with six different datasets and show considerable improvement over the accuracy of existing pretrained word vectors such as GloVe and Word2Vec. We further compare our work with two existing document-level sentiment analysis approaches. HIDE performs better than existing systems.
OBIM: A computational model to estimate brand image from online consumer review
(Elsevier, 2020-06) Mitra, Satanik
Brand image is perceived in consumers’ minds through the favourability, strength, and uniqueness of brand associations. In this paper, a model is proposed to quantify Online Brand IMage (OBIM) from consumer reviews. We consider product aspects as brand associations. Natural language processing techniques are used to extract those associations. The favourability, strength, and uniqueness of the extracted associations are computed using sentiment and co-word network analysis. Finally, the multiplicative sum of these values is taken as the OBIM score. It can be used as a measure of consumer perception, which captures the relations between the associations and their changes over time. The proposed model is demonstrated using a dataset of five mobile phones crawled from Amazon. Two applications of the OBIM score are proposed: Association Based SWOT analysis and the Senti-Concept Mapper technique for discovering hidden concepts. These applications show how the techniques can support the decision-making process of marketers.
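One possible reading of the "multiplicative sum" in the OBIM abstract above is a sum, over brand associations, of the product of favourability, strength and uniqueness. The sketch below follows that reading as an assumption; the abstract does not give the exact formula, and the class name, field names and toy values are hypothetical.

```python
from typing import Iterable, NamedTuple


class Association(NamedTuple):
    """One brand association (a product aspect) with its three measures."""
    name: str
    favourability: float
    strength: float
    uniqueness: float


def obim_score(associations: Iterable[Association]) -> float:
    """Sum of favourability * strength * uniqueness over all associations.

    This sum-of-products form is an assumed interpretation of
    "multiplicative sum", not a formula confirmed by the abstract.
    """
    return sum(a.favourability * a.strength * a.uniqueness for a in associations)


# Toy values, invented purely for illustration.
phone = [
    Association("battery", favourability=0.5, strength=2.0, uniqueness=1.0),
    Association("camera", favourability=1.0, strength=3.0, uniqueness=0.5),
]
print(obim_score(phone))
```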
Sarcasm Detection in News Headlines using Supervised Learning
(IEEE, 2022) Mitra, Satanik
Nowadays, social media carries an enormous amount of news content with a sarcastic message. Sarcasm is often expressed in both verbal and non-verbal forms. In this paper, the authors aim to identify sarcasm in news headlines using supervised learning. We address this task with bag-of-words features, context-independent features, and context-dependent features. Specifically, the authors employ seven supervised learning models, namely, Naïve Bayes-support vector machine, logistic regression, bidirectional gated recurrent units, Bidirectional Encoder Representations from Transformers (BERT), DistilBERT, and RoBERTa. Our experimental results indicate that RoBERTa achieves better performance than the others.
SentiCon: A Concept Based Feature Set for Sentiment Analysis
(IEEE, 2018) Mitra, Satanik
Selection and extraction of appropriate numerical features for performing sentiment analysis on text data with greater accuracy remains an open problem. In supervised machine learning based sentiment analysis, Term Frequency-Inverse Document Frequency (TF-IDF) scores are used as features for classifying the polarity of text data. TF-IDF features are a high-dimensional representation of the importance of a word in the document. TF-IDF features are sparse and do not consider the correlation among the words, which constructs the latent concepts in the document. Latent Semantic Analysis (LSA) removes the sparseness of the TF-IDF features by representing them in a low-dimensional matrix and extracts those hidden concepts. On the other hand, a natural property of a text document is its information content. The quantitative estimation of Parts-of-Speech tags, negation words, sentiment lexicons etc. represents the quality of information shared in text data. In this work, we propose an approach to generate a concept-based, domain-specific feature set, SentiCon, by consolidating LSA with the quality of information of the corpus. We apply Singular Value Decomposition (SVD) to the TF-IDF features to obtain the LSA. We test SentiCon on two benchmark datasets, the IMDB movie review dataset and the Epinion Cars and Books datasets, using four well-known classifiers: Decision Tree, Random Forest, Support Vector Machine, and K-Nearest Neighbour. We use the standard performance measures precision, recall and F-measure to analyze the results.
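The TF-IDF features that the SentiCon abstract above starts from can be sketched in plain Python. This is an illustration only: it assumes one common TF-IDF variant (relative term frequency times the natural log of N/df, no smoothing), which may differ from the weighting the authors used, and it stops before the SVD step that yields the LSA representation. The function name and toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: tf-idf} mapping per tokenized document.

    tf  = term count / document length
    idf = ln(number of documents / documents containing the term)
    This unsmoothed variant is an assumption; libraries often smooth idf.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append(
            {
                term: (count / len(doc)) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return scores


# Two toy reviews, already tokenized.
reviews = [["good", "movie"], ["bad", "movie"]]
weights = tf_idf(reviews)
print(weights[0])
```

A term that occurs in every document ("movie" here) gets weight zero under this variant, which reflects the sparsity that the abstract says LSA is meant to address.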
Suicidal Intention Detection in Tweets Using BERT-Based Transformers
(IEEE, 2022) Mitra, Satanik
Suicidal intention or ideation detection is one of the evolving research fields in social media. People use these platforms to share their thoughts, tendencies, opinions, and feelings about suicide. However, the task is challenging due to unstructured and noisy texts. In this paper, we propose five BERT-based pre-trained transformer models, namely, BERT, DistilBERT, ALBERT, RoBERTa, and DistilRoBERTa, for the task of suicidal intention detection. The performance of these models is evaluated using the standard classification metrics. Specifically, we use the one-cycle learning rate policy to train all models. Our results show that the RoBERTa model achieves better performance than the other BERT-based models. The model attains 99.23%, 96.35%, and 95.39% accuracy for training, validation, and testing, respectively.