A Comparison of the Vector Space Model Method and Winnowing Algorithm to Measure the Similarity of Documents

Eva Y Puspaningrum, Budi Nugroho, Firza Prima A

Abstract


Information and communication technology has grown significantly from year to year. A growing problem is the number of documents that are copied and pasted. The amount of text data in cyberspace is constantly increasing, so anyone can easily find the documents they need. Because of this, measuring the similarity of two documents is necessary and is fundamental to detecting plagiarism across many different documents. In this work, we compare the effectiveness of algorithms used to measure the similarity between two documents. The Winnowing and VSM algorithms are widely used to compare documents because their flow is easy to understand and easy to use. The experimental results show that the performance of fingerprinting and winnowing is better than that of VSM. Moreover, the winnowing algorithm is more stable than the others.


Keywords: Vector Space Model, Winnowing, Similarity of Documents






PROCEEDING INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND BUSINESS

Managed by: Lembaga Penelitian dan Pengabdian kepada Masyarakat (LPPM)

Publisher: Institut Informatika dan Bisnis Darmajaya
Address: Jl. Z.A. Pagar Alam No. 93, Gedong Meneng, Bandar Lampung, Lampung
Website: jurnal.darmajaya.ac.id

Email: ProceedingICITB@darmajaya.ac.id


 

IC-BITERA is licensed under a Creative Commons Attribution 4.0 International License.