Measuring Source Code Similarity by Finding Similar Subgraph with an Incremental Genetic Algorithm

Cited 5 times in Web of Science; cited 7 times in Scopus

Kim, Jinhyun; Choi, HyukGeun; Yun, Hansang; Moon, Byung-Ro

Issue Date
GECCO '16: Proceedings of the Genetic and Evolutionary Computation Conference 2016, pp. 925-932
Subject: Interdisciplinary Studies
Keywords: code similarity; subgraph isomorphism problem; incremental genetic algorithm; program dependence graph
Measuring the similarity between source codes has many applications, such as code plagiarism detection, code clone detection, and malware detection. A variety of measurement methods have been developed, and methods based on program dependence graphs are known to work well against disguise techniques. However, these methods usually rely on solving NP-hard problems, which causes scalability issues. In this paper, we propose a genetic algorithm that measures the similarity between two codes by solving an error-correcting subgraph isomorphism problem on their dependence graphs. We propose a new cost function for this problem that reflects the characteristics of source code. An incremental genetic algorithm is used to solve the problem: the size of the graph to be searched gradually increases during the evolutionary process. We developed new operators for the algorithm and tested the overall system on real-world data. Experimental results show that the system works successfully for code plagiarism detection and malware detection, and that the similarity it computes properly reflects the similarity between the codes.
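The incremental idea in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the graph encoding (a label list plus a set of directed edges), the toy graphs `PATTERN` and `TARGET`, the cost function (one unit per label mismatch and per missing edge), the crossover and mutation operators, and all parameter names. The paper's own cost function and operators are not described in the abstract.

```python
import random

# Toy dependence graphs: (node labels, set of directed edges). Hypothetical data.
PATTERN = (["assign", "assign", "add", "return"], {(0, 2), (1, 2), (2, 3)})
TARGET = (["assign", "call", "assign", "add", "return"],
          {(0, 3), (2, 3), (3, 4), (1, 4)})

def cost(mapping, pattern, target):
    """Assumed cost: 1 per label mismatch + 1 per pattern edge missing in target.
    mapping[i] is the target node assigned to pattern node i (a prefix is allowed)."""
    p_labels, p_edges = pattern
    t_labels, t_edges = target
    k = len(mapping)
    label_cost = sum(p_labels[i] != t_labels[mapping[i]] for i in range(k))
    edge_cost = sum((mapping[u], mapping[v]) not in t_edges
                    for (u, v) in p_edges if u < k and v < k)
    return label_cost + edge_cost

def similarity(mapping, pattern, target):
    """Assumed normalisation of the cost of a full mapping into [0, 1]."""
    worst = len(pattern[0]) + len(pattern[1])
    return 1.0 - cost(mapping, pattern, target) / worst

def crossover(a, b, n_target, rng):
    """One-point crossover that repairs duplicates to keep the mapping injective."""
    cut = rng.randrange(len(a) + 1)
    child = a[:cut]
    for t in b[cut:]:
        if t in child:
            t = rng.choice([x for x in range(n_target) if x not in child])
        child.append(t)
    return child

def mutate(mapping, n_target, rng):
    """Reassign one pattern node; swap if the chosen target node is already used."""
    child = list(mapping)
    i = rng.randrange(len(child))
    new = rng.randrange(n_target)
    if new in child:
        child[child.index(new)] = child[i]
    child[i] = new
    return child

def incremental_ga(pattern, target, pop_size=30, gens_per_stage=40, seed=0):
    rng = random.Random(seed)
    n_p, n_t = len(pattern[0]), len(target[0])
    assert n_p <= n_t, "sketch assumes the pattern is no larger than the target"
    # Stage 1 maps only the first pattern node; later stages add one node each.
    pop = [[rng.randrange(n_t)] for _ in range(pop_size)]
    for size in range(1, n_p + 1):
        if size > 1:  # grow every individual by one unused target node
            pop = [m + [rng.choice([t for t in range(n_t) if t not in m])]
                   for m in pop]
        for _ in range(gens_per_stage):
            pop.sort(key=lambda m: cost(m, pattern, target))
            elite = pop[: pop_size // 2]  # truncation selection
            children = [mutate(crossover(rng.choice(elite), rng.choice(elite),
                                         n_t, rng), n_t, rng)
                        for _ in range(pop_size - len(elite))]
            pop = elite + children
    best = min(pop, key=lambda m: cost(m, pattern, target))
    return best, cost(best, pattern, target)

best, best_cost = incremental_ga(PATTERN, TARGET)
print(best, best_cost, similarity(best, PATTERN, TARGET))
```

Growing the mapping one node at a time keeps the early search space small, which is the motivation the abstract gives for the incremental scheme. A real system would first extract program dependence graphs from the source code, a step this sketch omits.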
Appears in Collections:
College of Engineering/Engineering Practice School > Dept. of Computer Science and Engineering > Journal Papers