Mentor
Zubair Shafiq
Participation year
2017
Project title

Spun document

Abstract

The battle between spammers and search engines has been going on for some time. Spammers now use article-spinning software to conceal their articles from plagiarism detectors. Text rewriting is a technique used in Search Engine Optimization (SEO) and in other applications. Users of article spinners rewrite articles to lower the similarity ratio associated with redundant pages or pages with minimal content. Spinning works by automatically restructuring sentences and substituting words and phrases with their synonyms. Earlier work on detecting spun content is limited by its dependence on knowledge of the dictionary used by the spinning software. In this work, we present a two-step approach that detects spun content and its seed without relying on a dictionary reference. Spinning software introduces lexical and stylometric artifacts into spun documents, which we leverage through intrinsic analysis to identify them. We then use extrinsic analysis to identify the original source of spun documents. We implement and evaluate our proposed method on a corpus of spun documents created with popular text-rewriting software. This research describes the experiments conducted to identify spun documents and their seed articles without any dictionary reference. Our findings indicate that the approach is successful across several article-rewriting tools. In this experiment, documents were spun using Spinnerchief to illustrate the accuracy of the approach.
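The synonym-substitution and seed-matching ideas described above can be illustrated with a toy sketch. This is not the project's method: the function names (`spin`, `jaccard`, `closest_seed`), the tiny synonym table, and the use of Jaccard word overlap for the extrinsic step are all assumptions chosen only to make the idea concrete.

```python
import re


def tokens(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def spin(text, synonyms):
    """Toy spinner: replace each word that has an entry in `synonyms`.

    Real spinning software also restructures sentences; this hypothetical
    helper only performs word-level substitution.
    """
    return " ".join(synonyms.get(word, word) for word in tokens(text))


def jaccard(a, b):
    """Jaccard similarity between two token collections (as sets)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def closest_seed(spun, candidates):
    """Return the key of the candidate whose vocabulary overlaps most.

    Assumption: word overlap stands in for the extrinsic analysis here;
    the words a spinner leaves untouched still tie a spun text to its seed.
    """
    spun_tokens = tokens(spun)
    return max(
        candidates,
        key=lambda name: jaccard(spun_tokens, tokens(candidates[name])),
    )


seeds = {
    "a": "The quick brown fox jumps over the lazy dog",
    "b": "Search engines rank pages by relevance and quality",
}
spun = spin(seeds["a"], {"quick": "fast", "lazy": "idle"})
print(spun)
print(closest_seed(spun, seeds))
```

Note that the matching step never consults the synonym table, which mirrors the dictionary-free goal stated in the abstract, although a real detector would need far richer features than word overlap.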

Gabriel Akanni
Education
Towson University