Abstract:
Current events in the Arab world have given bloggers and commenters a wide range of subjects to
discuss. As a result, Arabic content on social media is growing continuously. An informal written
form of spoken Arabic called Arabizi has recently emerged as a commonly used language in the Arabic space,
attracting great interest for sentiment analysis tasks. However, few sentiment resources exist for it, and
state-of-the-art language models such as BERT and FastText do not yet support Arabizi. This paper presents
the first version of ArabiziVec, a set of pre-trained distributed word representations. ArabiziVec provides six
different word embedding models to deal with Arabizi sentiment analysis challenges. In every experiment,
the presented models outperform all baselines, whether the test set comes from a previously
published dataset or an extracted one. To the best of our knowledge, this is one of the first resources to
deal with Arabizi content and semantics in the context of sentiment analysis.