Applied Sciences, Vol. 13, Issue 21 (2023)

TCCCD: Triplet-Based Cross-Language Code Clone Detection

Yong Fang    
Fangzheng Zhou    
Yijia Xu and Zhonglin Liu    

Abstract

Code cloning is a common practice in software development: developers reuse existing code to speed up programming and improve work efficiency. Existing clone-detection methods, however, focus mainly on clones within a single programming language. To address the code clones that arise in cross-platform development, we propose TCCCD (Triplet-Based Cross-Language Code Clone Detection), a machine-learning approach that accurately detects clone instances across different programming languages. We use the pre-trained model UniXcoder to map programs written in different languages into a shared vector space and learn their code representations. We then fine-tune the model with triplet learning to improve its effectiveness in cross-language clone detection. To assess the approach, we ran comparative experiments on the dataset released with CLCDSA (Cross Language Code Clone Detection using Syntactical Features and API Documentation). TCCCD significantly outperformed the state-of-the-art baselines, achieving a precision of 0.96, a recall of 0.91, and an F1-measure of 0.93.
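To make the triplet-learning idea more concrete, here is a minimal sketch. An encoder such as UniXcoder maps code from any language to a vector. A triplet consists of an anchor, a positive (a clone of the anchor, possibly in another language), and a negative (a non-clone). Training pushes the anchor closer to the positive than to the negative by at least a margin. The paper's exact distance function, margin, and decision threshold are not stated here. The cosine distance, the `margin=0.5` and `threshold=0.8` values, and the toy three-dimensional vectors (stand-ins for real UniXcoder embeddings) are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm vector has no direction")
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance in [0, 2]: 0 for identical directions (assumed metric)."""
    return 1.0 - cosine_similarity(a, b)


def triplet_loss(anchor, positive, negative, margin: float = 0.5) -> float:
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the clone (positive) is closer to the anchor
    than the non-clone (negative) by at least `margin` (hypothetical value).
    """
    return max(0.0, cosine_distance(anchor, positive)
               - cosine_distance(anchor, negative) + margin)


def is_clone(a, b, threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: call a pair a clone above a similarity threshold."""
    return cosine_similarity(a, b) >= threshold


# Toy vectors standing in for encoder outputs (hypothetical, not real embeddings).
java_anchor = [1.0, 0.0, 0.0]      # e.g. a Java method
python_clone = [0.9, 0.1, 0.0]     # its Python re-implementation
python_non_clone = [0.0, 1.0, 0.0]  # unrelated Python code

print(triplet_loss(java_anchor, python_clone, python_non_clone))  # 0.0: already well separated
print(is_clone(java_anchor, python_clone), is_clone(java_anchor, python_non_clone))
```

Swapping the positive and the negative gives a large loss. During fine-tuning, that gradient signal is what pulls cross-language clones together in the shared vector space. At inference time, only the similarity between two embeddings is needed to decide whether a pair is a clone.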

Similar articles

Baskhad Idrisov and Tim Schlippe    
Our paper compares the correctness, efficiency, and maintainability of human-generated and AI-generated program code. For that, we analyzed the computational resources of AI- and human-generated program code using metrics such as time and space complexit...
Journal: Algorithms

 
Subin Kim, Heejin Hwang, Keunyeong Oh and Jiuk Shin    
The seismically deficient column details in existing reinforced concrete buildings affect the overall behavior of the building depending on the failure type of the column. The purpose of this study is to develop and validate a machine-learning-based pred...
Journal: Applied Sciences

 
Zikang Jin, Zonghan Yu, Fanshuo Meng, Wei Zhang, Jingzhi Cui, Xiaolong He, Yuedi Lei and Omer Musa    
The parametric design method is widely utilized in the preliminary design stage for hypersonic vehicles; it ensures the fast iteration of configuration, generation, and optimization. This study proposes a novel parametric method for a wide-range, wing-mo...
Journal: Aerospace

 
Gang Yao, Guifeng Wang, Lihai Tan, Yinfeng Zhang, Ruizhi Wang and Xiaohan Yang    
To study the influence of inclusions on the fracture evolution and mechanical properties of mortar structures, a series of uniaxial compression tests for mortar samples containing cylinder inclusions of varying mechanical properties were conducted. The d...
Journal: Applied Sciences

 
Hui Li, Yi-Kun Ba, Ning Zhang, Yong-Jian Liu and Wei Shi    
In regions with severe cold and high latitudes, concrete structures are susceptible to cracking and displacement due to uneven temperature stress, which directly impacts their normal utilization. Therefore, to investigate the temperature distribution cha...
Journal: Applied Sciences