Identifying document-level text plagiarism: a two-phase approach
Vani, K. (1), Deepa Gupta (2).
The rapid growth of information content and its ease of access have made
research and academia highly vulnerable to plagiarism. Plagiarism is an act
of intellectual theft and information breach that must be curbed to ensure
educational integrity. Plagiarism checking usually requires exhaustive
comparison of a document against large repositories and databases. This paper
presents a two-phase document retrieval approach that effectively reduces
the search space for the plagiarism detection task. An initial heuristic
retrieval step is carried out before the exhaustive analysis to retrieve the
documents that are globally similar to, or near duplicates of, the suspected
document. The work proposes this two-phase candidate retrieval approach for an
offline plagiarism detection system that can identify plagiarized sources of
varying complexity. The source document database is assumed to be available
offline, so the work does not address online source retrieval. It integrates
document ranking with the vector space model in the first phase and N-gram
models in the second phase, which serves as the candidate refinement stage. The
proposed approach is evaluated on the standard plagiarism corpus provided by
the PAN-14 text alignment data set, and its efficiency is analyzed using the
standard IR measures, viz., precision, recall and F1-score. It is compared
against the vector space model and N-gram models to analyze its performance.
Further statistical analysis is done using a paired t-test on the mean
F1-scores of these techniques over samples extracted from the PAN-14 set.
Experimental results show that the proposed two-phase candidate selection
approach outperforms the compared models, particularly in the comparison and
retrieval of complex and manipulated text.
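
The two-phase idea described in the abstract can be illustrated with a minimal
sketch. It is not the authors' implementation: the abstract does not give the
weighting scheme, the N-gram size, the similarity measures or any thresholds,
so the smoothed TF-IDF weights, cosine ranking, word-bigram containment, and
the parameters `top_k`, `n` and `threshold` below are all assumptions chosen
for illustration, as are the function names.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    stripped = (t.strip(".,;:!?()\"'") for t in text.lower().split())
    return [t for t in stripped if t]


def tfidf_vectors(docs_tokens):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n_docs = len(docs_tokens)
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    # Smoothed IDF (an assumption): terms found in every document keep weight.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()}
            for tokens in docs_tokens]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def ngrams(tokens, n):
    """Set of word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(suspect_tokens, source_tokens, n):
    """Fraction of the suspect's word n-grams that also occur in the source."""
    suspect_grams = ngrams(suspect_tokens, n)
    if not suspect_grams:
        return 0.0
    return len(suspect_grams & ngrams(source_tokens, n)) / len(suspect_grams)


def two_phase_retrieve(suspect, sources, top_k=3, n=2, threshold=0.2):
    """Return indices of candidate sources for one suspected document."""
    tokens = [tokenize(suspect)] + [tokenize(s) for s in sources]
    vectors = tfidf_vectors(tokens)
    # Phase 1: rank all sources by vector space similarity and keep the top_k.
    ranked = sorted(range(len(sources)),
                    key=lambda i: cosine(vectors[0], vectors[i + 1]),
                    reverse=True)[:top_k]
    # Phase 2: refine the shortlist with word n-gram overlap.
    return [i for i in ranked
            if containment(tokens[0], tokens[i + 1], n) >= threshold]


def precision_recall_f1(retrieved, relevant):
    """Standard IR measures for one query, given index collections."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Calling `two_phase_retrieve(suspect_text, list_of_source_texts)` returns the
indices of the sources that survive both phases, and `precision_recall_f1`
scores that list against a known set of true source indices. Only the cheap
vector comparison touches every source; the N-gram comparison runs on the
shortlist alone, which is the search-space reduction the abstract refers to.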
Affiliation:
- Amrita University, India
- Amrita University, India
Indexation:
- MyJurnal (2019): H-Index 0; Immediacy Index 0.000; Rank 0
- Scopus (SCImago Journal Rankings 2016): Impact Factor -; Rank Q3 (Engineering (miscellaneous)); Additional Information 0.193 (SJR)