
    Embeddings or LLMs: What’s Best for Detecting Code Clones Across Languages?

    August 14, 2024

Cross-lingual code clone detection has become an important and difficult task as the complexity of modern software development grows and projects routinely combine several programming languages. The term refers to the process of finding identical or functionally equivalent code fragments written in different programming languages.

Recent advances in Artificial Intelligence and Machine Learning, especially the introduction of Large Language Models (LLMs), have enabled tremendous progress on many computing tasks. Owing to their exceptional Natural Language Processing skills, LLMs have attracted attention for code-related tasks such as code clone identification. Building on these advancements, a team of researchers from the University of Luxembourg has re-examined the problem of cross-lingual code clone detection and studied the effectiveness of both LLMs and pre-trained embedding models in this setting.

The research assesses the performance of four different LLMs in combination with eight distinct prompts designed for cross-lingual code clone detection. It also evaluates a pre-trained embedding model that produces vector representations of code fragments; pairs of fragments are then classified as clones or non-clones based on these representations. Two popular cross-lingual datasets, CodeNet and XLCoST, are used for the evaluations.
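The embedding-based pipeline can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `embed` function here is a toy bag-of-tokens stand-in for a real pre-trained model that maps code from any language into a shared vector space, and the similarity threshold is an arbitrary choice.

```python
import math
from collections import Counter

def embed(code: str) -> Counter:
    # Toy stand-in for a pre-trained code embedding model:
    # a bag-of-tokens sparse vector. A real cross-lingual model
    # would map code from any language into one shared vector space.
    return Counter(code.replace("(", " ").replace(")", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def is_clone(code_a: str, code_b: str, threshold: float = 0.6) -> bool:
    # Classify a pair of fragments as clone / non-clone by comparing
    # their embeddings against a similarity threshold.
    return cosine(embed(code_a), embed(code_b)) >= threshold
```

With a genuine multilingual embedding model in place of `embed`, the same comparison works across language pairs, since functionally similar fragments land near each other in the shared space.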

The study’s findings highlight both the benefits and the drawbacks of LLMs in this setting. On simple programming examples such as those in the XLCoST dataset, the LLMs attained high F1 scores of up to 0.98. On more difficult programming tasks, however, their performance dropped. This decline suggests that LLMs struggle to fully capture the subtle semantics of code clones, particularly in a cross-lingual context where understanding the functional equivalence of code across languages is crucial.

In contrast, the research has shown that embedding models, which represent code fragments from many programming languages within a single vector space, offer a stronger basis for identifying cross-lingual code clones. By training a basic classifier on these embeddings, the researchers attained results that surpassed all evaluated LLMs, with an improvement of about two percentage points on the XLCoST dataset and about 24 percentage points on the more complicated CodeNet dataset.
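The "basic classifier on embeddings" idea can be sketched like this. The pair features and the tiny logistic regression below are illustrative assumptions, not the paper's exact setup; any off-the-shelf classifier over pair representations would play the same role.

```python
import math

def pair_features(emb_a, emb_b):
    # Represent a pair of code embeddings as the element-wise absolute
    # difference plus the element-wise product -- a common recipe for
    # sentence-pair classification, used here as an assumption.
    diff = [abs(x - y) for x, y in zip(emb_a, emb_b)]
    prod = [x * y for x, y in zip(emb_a, emb_b)]
    return diff + prod

def train_logreg(X, y, lr=0.5, epochs=200):
    # Plain stochastic gradient descent on the logistic loss.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wi * v for wi, v in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # gradient of the loss w.r.t. z
            w = [wi - lr * g * v for wi, v in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    # 1 = clone, 0 = non-clone.
    return 1 if sum(wi * v for wi, v in zip(w, x)) + b >= 0 else 0
```

Given embeddings for labeled clone and non-clone pairs from a cross-lingual embedding model, `train_logreg(X, y)` fits the classifier and `predict` labels new pairs.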

    The team has summarized their primary contributions as follows.

The work broadly analyzes the capabilities of LLMs in identifying cross-lingual code clones, with a particular emphasis on Java paired with ten distinct programming languages. It applies several LLMs to a wide range of cross-lingual datasets and assesses the effects of several prompt engineering methods, providing a distinct viewpoint from previous research.

The study offers insight into how well LLMs perform in code clone identification. It emphasizes how strongly the closeness of two programming languages influences LLMs’ ability to identify clones, particularly when given straightforward prompts. The effects of language differences are lessened when prompts focus on reasoning and logic. The generalizability and universal effectiveness of LLMs in cross-lingual code clone detection tasks are also discussed.
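A reasoning-oriented prompt of the kind described might look like the sketch below; the exact wording is hypothetical, not taken from the paper's prompt set.

```python
def build_clone_prompt(code_a: str, lang_a: str,
                       code_b: str, lang_b: str) -> str:
    # A reasoning-first prompt: asking the model to describe what each
    # snippet does before judging equivalence de-emphasizes surface
    # (syntactic) differences between the two languages.
    return (
        "First, describe step by step what each snippet does.\n"
        "Then decide whether they are functionally equivalent "
        "(code clones).\n\n"
        f"Snippet 1 ({lang_a}):\n{code_a}\n\n"
        f"Snippet 2 ({lang_b}):\n{code_b}\n\n"
        "Answer 'yes' or 'no' with a one-sentence justification."
    )
```

A "straightforward" prompt variant would instead ask directly for a yes/no answer, which, per the findings above, makes the judgment more sensitive to how similar the two languages look.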

The study also contrasts LLM performance with traditional ML techniques built on learned code representations. The experimental findings indicate that LLMs might not fully capture the meaning of clones in the context of code clone detection, suggesting that conventional techniques may still be superior in this regard.

In conclusion, the results imply that while LLMs are highly capable, especially at handling simple code examples, they may not be the most effective method for cross-lingual code clone detection, particularly in more complicated circumstances. Embedding models, by contrast, provide consistent and language-neutral representations of code and are therefore better suited to attaining state-of-the-art performance in this domain.


    The post Embeddings or LLMs: What’s Best for Detecting Code Clones Across Languages? appeared first on MarkTechPost.
