ThinkMind // International Journal On Advances in Security, volume 11, numbers 1 and 2, 2018 // View article sec_v11_n12_2018_10


Empirical Analysis of Domain Blacklists

Authors:
Tran Phuong Thao
Akira Yamada
Ayumu Kubota

Keywords: Web Security, Empirical Analysis, Blacklist, Malicious Domain, Whois Information, HTML Document, Text Mining

Abstract:
Malicious content has grown along with the explosion of the Internet. Therefore, many organizations construct and maintain blacklists to help web users protect their computers. There are many kinds of blacklists, of which domain blacklists are the most popular. Existing empirical analyses of domain blacklists have several limitations, such as using only outdated blacklists, omitting important blacklists, or focusing only on simple aspects of blacklists. In this paper, we analyze the top 14 blacklists downloaded on 2017/02/28, including popular and up-to-date blacklists such as Google Safe Browsing and urlblacklist.com. We are the first to filter out old entries in the blacklists using a very large dataset of user browsing history. Besides analyzing the intersections between blacklists and the registration information from Whois (such as top-level domain, domain age and country), we also use machine learning to build two classification models: one for web content categories (e.g., education, business) and one for malicious categories (i.e., landing and distribution). Our main findings are as follows. First, Safe Browsing versions 3 and 4 are deployed separately and have independent databases with diverse entries, although they belong to the same organization. Second, the blacklist dsi.ut-capitole.fr is almost a subset of urlblacklist.com: 98% of its entries also appear in urlblacklist.com. Third, more blacklist entries were created in 2000 (6.08%) than in any other year, and more come from the United States (24.28%) than from any other country. Fourth, Safe Browsing version 4 detects younger domains than the other blacklists do. Fifth, Tech & Computing is the dominant web content category in all the blacklists, and blacklists within the same group (small public, large public, or private blacklists) are more strongly correlated in web content than blacklists in different groups.
Sixth, the number of landing domains is larger than that of distribution domains by at least 75% in large public blacklists and by at least 60% in the other blacklists. In addition, we collected and analyzed updated versions of 11 public blacklists downloaded on 2017/11/09, more than seven months after the first versions (downloaded on 2017/02/28), and found new results: for example, the number of malicious domains injected by ransomware increased significantly (6.67-fold), and many new generic Top-Level Domains (TLDs), such as .forsale, .institute and .church, appear in the new blacklist versions. We also discuss several challenges: measuring the registration time of malicious domains in each blacklist, determining whether a domain is malicious, classifying malicious domains with text mining on Whois documents, and standardizing the extraction of Whois attributes.
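The overlap findings in the abstract (for example, one blacklist being almost a subset of another with 98% of its entries shared) and the comparison between the two blacklist versions rest on simple set measures. The Python sketch below shows one plausible way to compute them; it is an illustration under stated assumptions, not the authors' code. The function names `containment`, `jaccard`, `tld` and `new_tlds` are hypothetical, each blacklist is assumed to be a set of normalized lower-case domain names, and TLDs are extracted naively from the last label (no Public Suffix List handling).

```python
"""Hypothetical sketch of blacklist overlap and TLD comparison measures.

Assumptions (not from the paper): each blacklist is a set of lower-case
domain names, and the TLD is naively the label after the last dot.
"""
from __future__ import annotations


def containment(a: set[str], b: set[str]) -> float:
    """Fraction of the entries of `a` that also appear in `b`.

    A value close to 1.0 (e.g. 0.98) means `a` is almost a subset of `b`.
    """
    if not a:
        return 0.0
    return len(a & b) / len(a)


def jaccard(a: set[str], b: set[str]) -> float:
    """Symmetric overlap |a & b| / |a | b| between two blacklists."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tld(domain: str) -> str:
    """Naive TLD extraction: the last label, ignoring a trailing dot."""
    return domain.rstrip(".").rsplit(".", 1)[-1].lower()


def new_tlds(old: set[str], new: set[str]) -> set[str]:
    """TLDs that occur in the new blacklist version but not in the old one."""
    return {tld(d) for d in new} - {tld(d) for d in old}


if __name__ == "__main__":
    # Toy data with hypothetical domains, not taken from any real blacklist.
    small = {"a.example", "b.example", "c.test", "d.test"}
    large = {"a.example", "b.example", "c.test", "e.org", "f.net"}
    print(containment(small, large))  # 0.75
    print(jaccard(small, large))  # 0.5
    print(sorted(new_tlds({"x.com", "y.net"}, {"x.com", "z.forsale", "w.church"})))
    # ['church', 'forsale']
```

Containment is asymmetric on purpose: a small blacklist can be almost entirely contained in a large one while the Jaccard index between them stays low, which is why a subset-style claim is better expressed as containment than as a symmetric overlap.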

Pages: 127 to 137

Copyright: Copyright (c) to authors, 2018. Used with permission.

Publication date: June 30, 2018

Published in: journal

ISSN: 1942-2636
