Abstract:
Paraphrasing means conveying the same meaning of a sentence
or text using different words or a different arrangement of
words. Paraphrase detection is challenging, especially in
Indian languages like Hindi, because it requires
understanding the semantics of the language.
Detecting paraphrases is relevant in practice because it
supports applications such as Information Retrieval,
Information Extraction and Text Summarization. This paper
focuses on using Machine Learning classification techniques
to detect paraphrases in Hindi for the DPIL Task at FIRE
2016. A feature-vector-based approach is used. The task
involves checking whether a given pair of sentences conveys
the same information and meaning even if the sentences are
written in different forms.
Given a pair of sentences in Hindi, the proposed technique
labels the pair as Paraphrases (P), Semi-Paraphrases (SP) or
Not Paraphrases (NP).
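
As a rough illustration of what a feature-vector-based approach to three-way paraphrase labelling can look like, the sketch below turns a sentence pair into a small numeric vector and assigns the label of the nearest class centroid. The abstract does not state which features or which classifier the paper uses, so everything here is an assumption made for illustration: the three features (word overlap, character-bigram overlap, length ratio), the nearest-centroid rule, the function names, and the toy training vectors are all hypothetical and are not the authors' method or data.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard overlap between two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def char_bigrams(sentence):
    """Character bigrams of the sentence with spaces removed."""
    s = sentence.replace(" ", "")
    return [s[i:i + 2] for i in range(len(s) - 1)]


def features(s1, s2):
    """Hypothetical feature vector for a sentence pair (assumed features)."""
    t1, t2 = s1.split(), s2.split()
    length_ratio = min(len(t1), len(t2)) / max(len(t1), len(t2), 1)
    return [
        jaccard(t1, t2),                              # word overlap
        jaccard(char_bigrams(s1), char_bigrams(s2)),  # character-bigram overlap
        length_ratio,                                 # shorter / longer length
    ]


def train_centroids(X, y):
    """Average the feature vectors of each label (nearest-centroid training)."""
    counts = Counter(y)
    sums = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


if __name__ == "__main__":
    # Invented toy feature vectors, one per label; not taken from the DPIL data.
    X = [[0.9, 0.9, 1.0], [0.5, 0.6, 0.8], [0.1, 0.2, 0.6]]
    y = ["P", "SP", "NP"]
    centroids = train_centroids(X, y)

    # Invented example pair, used only to show the call pattern.
    pair = ("राम ने खाना खाया", "राम ने भोजन किया")
    print(predict(centroids, features(*pair)))
```

In practice the toy vectors would be replaced by features extracted from labelled training pairs, and the centroid rule by whichever classifier performs best on held-out data.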