Abstract:
Sentiment analysis is the interpretation and classification of emotions conveyed by text data. While there have been many attempts to classify the sentiment of a given text, few models can do so when given non-English data exhibiting sarcasm or irony. This paper compares various sarcasm detection techniques and determines which method works best for datasets of different sizes and types. The models have been tested on datasets in three non-English languages: Arabic, French, and a Hindi-English code-mix. The presented models are not language-specific and can be run on data in any language. The comparison covers a sub-word model, an approach using Term Frequency-Inverse Document Frequency (TF-IDF) and neural networks, a Long Short-Term Memory (LSTM) model, and machine learning techniques such as Nearest Neighbors, Decision Tree, Random Forest, AdaBoost, Naive Bayes (NB), and Support Vector Machine (SVM) with linear, radial basis function (RBF), and sigmoid kernels. The output for each language and model has been evaluated based on its F1-score, accuracy, precision, and recall.
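As a minimal sketch of how the four evaluation metrics named above relate to one another, the following computes them from confusion counts. It assumes binary labels in which 1 marks a sarcastic text; that labeling convention and the function name `binary_metrics` are illustrative assumptions, not details taken from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumption: `positive` (default 1) marks the sarcastic class.
    """
    pairs = list(zip(y_true, y_pred))
    # Confusion counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))
```

Reporting all four values matters when the sarcastic class is rare, since accuracy alone can stay high for a model that seldom predicts sarcasm.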