Ajou University repository

Machine learning-based advertisement banner identification technique for effective piracy website detection process
Citations
SCOPUS: 4

Publication Year
2022-01-01
Publisher
Tech Science Press
Citation
Computers, Materials and Continua, Vol.71, pp.2883-2899
Keyword
Advertisement banners; Copyright infringement; Machine learning; Online advertisement; Piracy website detection; Support vector machine; Word embedding; Word2vec
Mesh Keyword
Advertisement banner; Copyright infringement; Detection process; Embeddings; Identification techniques; Online advertisements; Piracy website detection; Support vectors machine; Word embedding; Word2vec
All Science Classification Codes (ASJC)
Biomaterials; Modeling and Simulation; Mechanics of Materials; Computer Science Applications; Electrical and Electronic Engineering
Abstract
Copyrighted digital content faces significant challenges from copyright infringement, which causes billions of dollars in losses annually. The approach currently believed to be most effective is blocking infringing websites, particularly through affiliated government bodies. An effective detection mechanism is a necessary first step. Researchers have used various approaches to analyze the common features of suspected piracy websites. For instance, most of these websites serve online advertisements, which are considered their main source of revenue. These advertisements also share attributes that distinguish them from advertisements posted on normal or legitimate websites: they usually contain keywords such as click-words (words that redirect to install malicious software) and words frequently used in illegal gambling, illegal sexual acts, and so on. This makes them well suited as a key feature for detecting websites involved in copyright infringement. Research has been conducted to identify advertisements served on suspected piracy websites. However, these studies use a static approach that relies mainly on manual scanning for the aforementioned keywords. This limits their ability to cope with the dynamic and ever-changing behavior of advertisements posted on these websites. Therefore, we propose a technique that can continuously fine-tune itself and effectively identify advertisement (Ad) banners extracted from suspected piracy websites. We do this by leveraging machine learning algorithms, specifically the support vector machine with the word2vec word-embedding model.
After applying the proposed technique to 1015 Ad banners collected from 98 suspected piracy websites and 90 normal or legitimate websites, we identified Ad banners extracted from suspected piracy websites with an accuracy of 97%. We present this technique in the hope that it will be a useful tool for piracy website detection approaches. To our knowledge, this is the first approach that uses machine learning to identify Ad banners served on suspected piracy websites.
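The abstract names the two components (word2vec embeddings and a support vector machine) but does not describe the pipeline. The following is a minimal sketch of one plausible arrangement, assuming banner text is tokenized on whitespace, each known token is mapped to a vector, the vectors are averaged, and a linear SVM is trained on the averages. The embedding table is a hand-made stand-in for a trained word2vec model, and every word, vector, label, and function name here is hypothetical, chosen only for illustration. It is not the authors' implementation.

```python
# Hypothetical 2-d vectors standing in for a trained word2vec model.
EMBEDDINGS = {
    "casino": (0.9, 0.1), "bet": (0.8, 0.2), "jackpot": (0.9, 0.0),
    "download": (0.7, 0.3), "click": (0.8, 0.1),
    "shoes": (0.1, 0.9), "sale": (0.2, 0.8), "laptop": (0.0, 0.9),
    "insurance": (0.1, 0.8), "travel": (0.2, 0.9),
}
DIM = 2


def embed(text):
    """Average the vectors of known tokens; return a zero vector if none match."""
    vecs = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Full-batch subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 or -1. Returns the weight vector and the bias.
    """
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, text):
    """Return +1 (piracy-site style banner) or -1 (ordinary banner)."""
    score = sum(wj * xj for wj, xj in zip(w, embed(text))) + b
    return 1 if score >= 0 else -1


# Invented toy banners: +1 = suspected piracy site, -1 = legitimate site.
TRAIN = [
    ("click bet casino", 1),
    ("jackpot casino download", 1),
    ("click download now", 1),
    ("laptop sale", -1),
    ("travel insurance", -1),
    ("shoes sale today", -1),
]
X = [embed(text) for text, _ in TRAIN]
y = [label for _, label in TRAIN]
w, b = train_linear_svm(X, y)

for banner in ("bet jackpot", "travel shoes"):
    print(banner, "->", predict(w, b, banner))
```

In a realistic setting the lookup table would presumably be replaced by vectors learned from a banner-text corpus (for example with gensim's `Word2Vec`) and the hand-rolled trainer by a library SVM such as scikit-learn's `SVC`. Retraining both on newly collected banners is one way the "continuously fine-tune itself" property described above could be approached.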
Language
eng
URI
https://dspace.ajou.ac.kr/dev/handle/2018.oak/32420
DOI
https://doi.org/10.32604/cmc.2022.023167

Type
Article
Funding
This research project was supported by the Ministry of Culture, Sports, and Tourism (MCST) and the Korea Copyright Commission in 2021 (2019-PF-9500).

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

KWAK, JIN (곽진)
Department of Cyber Security
