Measuring Source Code Similarity by Finding Similar Subgraph with an Incremental Genetic Algorithm
- Kim, Jinhyun; Choi, HyukGeun; Yun, Hansang; Moon, Byung-Ro
- Published in: GECCO '16: Proceedings of the Genetic and Evolutionary Computation Conference 2016, pp. 925-932
- Keywords: Code similarity; subgraph isomorphism problem; incremental genetic algorithm; program dependence graph; interdisciplinary studies
- Abstract: Measuring the similarity between pieces of source code has many applications, such as code plagiarism detection, code clone detection, and malware detection. A variety of methods for this measurement have been developed, and methods based on program dependence graphs are known to work well against disguise techniques. However, these methods usually rely on solving NP-hard problems, which causes scalability issues. In this paper, we propose a genetic algorithm that measures the similarity between two programs by solving an error-correcting subgraph isomorphism problem on their dependence graphs. We propose a new cost function for this problem that reflects the characteristics of source code. An incremental genetic algorithm is used to solve the problem: the size of the graph to be searched increases gradually during the evolutionary process. We developed new operators for the algorithm and tested the overall system on real-world data. Experimental results showed that the system works successfully for code plagiarism detection and malware detection, and that the similarity it computes properly reflects the similarity between the programs.
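The abstract does not give the paper's cost function or genetic operators, so the following is only a minimal Python sketch, under stated assumptions, of how an incremental genetic algorithm for error-correcting subgraph matching between two labelled dependence graphs might be structured. Everything in it is hypothetical and not the authors' method: the `Graph` type, the unit costs in `cost` (one per label mismatch, one per query edge with no counterpart), the normalisation in `similarity`, the repair-based `crossover`, the `mutate` operator, and the schedule in `incremental_ga`, which assumes query nodes are numbered so that a growing prefix (e.g. BFS order) forms a connected subgraph.

```python
import functools
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    """Hypothetical labelled digraph, e.g. a PDG with statement kinds as labels."""
    labels: tuple[str, ...]
    edges: frozenset[tuple[int, int]]


def cost(query: Graph, target: Graph, mapping: list[int]) -> int:
    """Assumed unit-cost model for mapping query nodes 0..k-1 into target.

    +1 per node whose label differs from its image's label, and +1 per query
    edge between mapped nodes whose image is not an edge of the target.
    """
    k = len(mapping)
    c = sum(query.labels[i] != target.labels[mapping[i]] for i in range(k))
    for u, v in query.edges:
        if u < k and v < k and (mapping[u], mapping[v]) not in target.edges:
            c += 1
    return c


def similarity(query: Graph, target: Graph, mapping: list[int]) -> float:
    """Normalise the cost into [0, 1]; 1.0 means an exact labelled match."""
    k = len(mapping)
    worst = k + sum(1 for u, v in query.edges if u < k and v < k)
    return 1.0 if worst == 0 else 1.0 - cost(query, target, mapping) / worst


def crossover(a: list[int], b: list[int], n_target: int,
              rng: random.Random) -> list[int]:
    """One-point crossover that repairs duplicates so the mapping stays injective."""
    cut = rng.randrange(len(a) + 1)
    child = a[:cut]
    used = set(child)
    for gene in b[cut:]:
        if gene in used:  # conflict: pick a still-unused target node instead
            gene = rng.choice(sorted(set(range(n_target)) - used))
        child.append(gene)
        used.add(gene)
    return child


def mutate(mapping: list[int], n_target: int, rng: random.Random,
           rate: float) -> list[int]:
    """Per gene: move it to an unused target node, or swap it with another gene."""
    m = list(mapping)
    for i in range(len(m)):
        if rng.random() < rate:
            unused = sorted(set(range(n_target)) - set(m))
            if unused and rng.random() < 0.5:
                m[i] = rng.choice(unused)
            else:
                j = rng.randrange(len(m))
                m[i], m[j] = m[j], m[i]
    return m


def incremental_ga(query: Graph, target: Graph, pop_size: int = 30,
                   gens_per_stage: int = 40, step: int = 2,
                   seed: int = 0) -> tuple[list[int], float]:
    """Evolve injective mappings for a growing prefix of the query's nodes."""
    n_q, n_t = len(query.labels), len(target.labels)
    if not 1 <= n_q <= n_t or pop_size < 3:
        raise ValueError("need 1 <= |query| <= |target| and pop_size >= 3")
    rng = random.Random(seed)
    fit = functools.partial(cost, query, target)  # lower is better
    k = min(step, n_q)
    pop = [rng.sample(range(n_t), k) for _ in range(pop_size)]
    while True:
        for _ in range(gens_per_stage):
            pop.sort(key=fit)
            nxt = pop[:2]  # elitism: the two best survive unchanged
            while len(nxt) < pop_size:
                a = min(rng.sample(pop, 3), key=fit)  # tournament selection
                b = min(rng.sample(pop, 3), key=fit)
                nxt.append(mutate(crossover(a, b, n_t, rng), n_t, rng, 1 / k))
            pop = nxt
        if k == n_q:
            break
        # Incremental step: enlarge the searched subgraph and extend every
        # individual by mapping the new query nodes to unused target nodes.
        new_k = min(k + step, n_q)
        pop = [m + rng.sample(sorted(set(range(n_t)) - set(m)), new_k - k)
               for m in pop]
        k = new_k
    best = min(pop, key=fit)
    return best, similarity(query, target, best)


if __name__ == "__main__":
    # Toy graphs with made-up statement-kind labels.
    q = Graph(("decl", "assign", "call"), frozenset({(0, 1), (1, 2)}))
    t = Graph(("call", "decl", "assign", "return"),
              frozenset({(1, 2), (2, 0), (0, 3)}))
    print(incremental_ga(q, t))
```

In this sketch the early stages search mappings for only a few query nodes, so the population can settle on a good partial alignment cheaply before larger subgraphs are considered; the repair step in `crossover` and the swap/reassign moves in `mutate` keep every individual a valid injective mapping. A real system would build the labelled graphs from a program dependence graph extractor, which is not shown here.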