NexTech 2021 Congress
October 03, 2021 to October 07, 2021 - Barcelona, Spain

  • UBICOMM 2021, The Fifteenth International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies
  • ADVCOMP 2021, The Fifteenth International Conference on Advanced Engineering Computing and Applications in Sciences
  • SEMAPRO 2021, The Fifteenth International Conference on Advances in Semantic Processing
  • AMBIENT 2021, The Eleventh International Conference on Ambient Computing, Applications, Services and Technologies
  • EMERGING 2021, The Thirteenth International Conference on Emerging Networks and Systems Intelligence
  • DATA ANALYTICS 2021, The Tenth International Conference on Data Analytics
  • GLOBAL HEALTH 2021, The Tenth International Conference on Global Health Challenges
  • CYBER 2021, The Sixth International Conference on Cyber-Technologies and Cyber-Systems

SoftNet 2021 Congress
October 03, 2021 to October 07, 2021 - Barcelona, Spain

  • ICSEA 2021, The Sixteenth International Conference on Software Engineering Advances
  • ICSNC 2021, The Sixteenth International Conference on Systems and Networks Communications
  • CENTRIC 2021, The Fourteenth International Conference on Advances in Human-oriented and Personalized Mechanisms, Technologies, and Services
  • VALID 2021, The Thirteenth International Conference on Advances in System Testing and Validation Lifecycle
  • SIMUL 2021, The Thirteenth International Conference on Advances in System Simulation
  • SOTICS 2021, The Eleventh International Conference on Social Media Technologies, Communication, and Informatics
  • INNOV 2021, The Tenth International Conference on Communications, Computation, Networks and Technologies
  • HEALTHINFO 2021, The Sixth International Conference on Informatics and Assistive Technologies for Health-Care, Medical Support and Wellbeing

NetWare 2021 Congress
November 14, 2021 to November 18, 2021 - Athens, Greece

  • SENSORCOMM 2021, The Fifteenth International Conference on Sensor Technologies and Applications
  • SENSORDEVICES 2021, The Twelfth International Conference on Sensor Device Technologies and Applications
  • SECURWARE 2021, The Fifteenth International Conference on Emerging Security Information, Systems and Technologies
  • AFIN 2021, The Thirteenth International Conference on Advances in Future Internet
  • CENICS 2021, The Fourteenth International Conference on Advances in Circuits, Electronics and Micro-electronics
  • ICQNM 2021, The Fifteenth International Conference on Quantum, Nano/Bio, and Micro Technologies
  • FASSI 2021, The Seventh International Conference on Fundamentals and Advances in Software Systems Integration
  • GREEN 2021, The Sixth International Conference on Green Communications, Computing and Technologies

TrendNews 2021 Congress
November 14, 2021 to November 18, 2021 - Athens, Greece

  • CORETA 2021, Advances on Core Technologies and Applications
  • DIGITAL 2021, Advances on Societal Digital Transformation

 


ThinkMind // International Journal On Advances in Software, volume 10, numbers 3 and 4, 2017 // View article soft_v10_n34_2017_25


Effects of an Apriori-based Data-mining Algorithm for Detecting Type 3 Clones

Authors:
Yoshihisa Udagawa

Keywords: Code clone; Apriori-based algorithm; Maximal frequent sequence; Longest common subsequence(LCS) algorithm; Java source code.

Abstract:
A code clone is a fragment of source code that appears at least twice in software source code. Code clones introduce difficulties in software maintenance because an error in one fragment is reproduced in code clones. It is significant to detect every code clones for making software maintenance easy and reliable. This paper describes software clone detection techniques using an Apriori-based sequential data mining algorithm. The Apriori-based algorithm is used because it is designed to find all frequent items that occur no less than a user-specified threshold named the minimum support (minSup). Since clones are slightly modified by adding, removing, or changing source code in general, the algorithm for detecting code clones has to deal with both match and mismatch portions of source code. The essential idea of the proposed approach is a combination of a partial string match using the longest-common-subsequence (LCS) and an Apriori-based algorithm for finding frequent sequences. Generally, Apriori-based algorithms extract vast numbers of frequent sequences especially when the minSup is small, creating an obstacle to the detection of code clones. The novelties of our approach include pruning processes that depend on characteristics of a programming language, techniques to reduce the number of frequent sequences, and functions to control repetitive subsequences. We evaluate the effectiveness of the proposed algorithm based on experimental results using the source code of the Java SDK SWING graphics package. The results show that the proposed sequential data mining algorithm maintains the performance at a practical level until the minSup reaches two. This paper also shows some mined sequences and source code to demonstrate that the proposed algorithm works from short sequences to long ones.

Pages: 477 to 489

Copyright: Copyright (c) to authors, 2017. Used with permission.

Publication date: December 31, 2017

Published in: journal

ISSN: 1942-2628

SERVICES CONTACT
2010 - 2017 © ThinkMind. All rights reserved.
Read Terms of Service and Privacy Policy.