Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      In-House vs Outsourcing for React.js Development: Understand What Is Best for Your Enterprise

      July 17, 2025

      Tiny Screens, Big Impact: The Forgotten Art Of Developing Web Apps For Feature Phones

      July 16, 2025

      Kong AI Gateway 3.11 introduces new method for reducing token costs

      July 16, 2025

      Native vs hybrid vs cross-platform: Resolving the trilemma

      July 16, 2025

      Microsoft’s AI CEO says Google nearly launched “ChatGPT” before OpenAI — but brutal skeptics, fears of disrupting search, and safety concerns thwarted the plan

      July 17, 2025

      You’ve got to try these 5 premium Minecraft add-ons — Dinosaurs, security systems, and more really shake up Bedrock Edition

      July 17, 2025

      This Microsoft pay scale reveals AI pros are making bank — with compensation packages reaching up to $336,000/year

      July 17, 2025

      ZeniMax QA testers face whiplash and “rancid” work morale following Microsoft’s gaming layoffs — but the union still fights

      July 17, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      The details of TC39’s last meeting

      July 17, 2025
      Recent

      The details of TC39’s last meeting

      July 17, 2025

      Vector Search Embeddings and RAG

      July 16, 2025

      Python Meets Power Automate: Trigger via URL

      July 16, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      Microsoft’s AI CEO says Google nearly launched “ChatGPT” before OpenAI — but brutal skeptics, fears of disrupting search, and safety concerns thwarted the plan

      July 17, 2025
      Recent

      Microsoft’s AI CEO says Google nearly launched “ChatGPT” before OpenAI — but brutal skeptics, fears of disrupting search, and safety concerns thwarted the plan

      July 17, 2025

      You’ve got to try these 5 premium Minecraft add-ons — Dinosaurs, security systems, and more really shake up Bedrock Edition

      July 17, 2025

      This Microsoft pay scale reveals AI pros are making bank — with compensation packages reaching up to $336,000/year

      July 17, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»Optimizing Assembly Code with LLMs: Reinforcement Learning Outperforms Traditional Compilers

    Optimizing Assembly Code with LLMs: Reinforcement Learning Outperforms Traditional Compilers

    May 24, 2025

    LLMs have shown impressive capabilities across various programming tasks, yet their potential for program optimization has not been fully explored. While some recent efforts have used LLMs to enhance performance in languages like C++ and Python, the broader application of LLMs to optimize code, especially in low-level programming contexts, remains limited. Existing LLM benchmarks largely focus on code generation from natural language or solving GitHub issues, as seen in HumanEval, MBPP, APPS, SWE-bench, and SWE-agent. Moreover, models such as Codex, AlphaCode, and Code Llama primarily aim to improve code generation quality rather than performance. However, select research has begun addressing optimization, including parallelization and code efficiency improvements, though many of these approaches are constrained by the need for formal verification, limiting scalability.

    In contrast, some newer methods embrace test-based validation, allowing optimization of more complex programs with loops. Learning-based strategies in compiler optimization—like AutoPhase, which uses reinforcement learning for pass sequencing, and Coreset, which applies graph neural networks—have shown promise in improving performance. Superoptimization techniques aim to find the most efficient version of a program but are typically restricted to small-scale problems. Additionally, frameworks like AutoTVM and Ansor have focused on optimizing GPU kernel code through statistical modeling and search. Recently, LLM-driven optimization has gained attention, with reinforcement learning approaches guiding LLMs using feedback from test cases. Techniques like CodeRL and PPOCoder leverage policy optimization methods to fine-tune models for better performance, even across resource-constrained programming languages like Verilog. 

    Stanford, UIUC, CMU, and Visa Research researchers explore using LLMs to optimize assembly code performance—an area traditionally handled by compilers like GCC. They introduce a reinforcement learning framework using Proximal Policy Optimization (PPO), guided by a reward balancing correctness and speedup over the gcc -O3 baseline. Using a dataset of 8,072 real-world programs, their model, Qwen2.5-Coder-7B-PPO, achieves a 96.0% test pass rate and a 1.47× average speedup, outperforming 20 other models, including Claude-3.7-sonnet. Their results show that with RL training, LLMs can effectively outperform conventional compiler optimizations. 

    The methodology involves optimizing compiled C programs for performance using an RL approach. Given a C program C, it is compiled to assembly P using gcc -O3. The goal is to generate a new assembly program P’ that is functionally equivalent but faster. Correctness is verified using a test set, and speedup is measured by execution time improvement. Using CodeNet as the dataset, the authors apply PPO to train a language model that generates improved code. Two reward functions—Correctness-Guided Speedup and Speedup-Only—are used to guide training based on program validity, correctness, and performance gains. 

    The study evaluates various language models on optimizing assembly code, revealing that most models struggle with low test pass rates and minimal speedups. However, Qwen2.5-Coder-7B-PPO, trained with reinforcement learning, significantly outperforms others, achieving 96% accuracy and a 1.47× average speedup. Ablation studies show that using gcc -O3 as a reference aids performance, while removing it leads to sharp declines. Notably, models like Claude-3.7-sonnet can surpass compilers by identifying hardware-specific optimizations, such as replacing loops with a single popcnt instruction, demonstrating their ability to perform semantic-level code transformations beyond traditional compiler capabilities. 

    In conclusion, the study explores using LLMs to optimize assembly code, a domain where traditional compilers struggle due to the complexity of low-level performance tuning. The authors fine-tune Qwen2.5-Coder-7B using PPO, rewarding both correctness (via test cases) and speedup over gcc -O3. They introduce a benchmark of 8,072 real-world C programs to evaluate performance. The model achieves a 96.0% test pass rate and a 1.47× average speedup, outperforming 20 other models, including Claude-3.7-sonnet. While effective, limitations include a lack of formal correctness guarantees and variability in hardware performance across systems. 


    Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 95k+ ML SubReddit and Subscribe to our Newsletter.

    The post Optimizing Assembly Code with LLMs: Reinforcement Learning Outperforms Traditional Compilers appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleStep-by-Step Guide to Build a Customizable Multi-Tool AI Agent with LangGraph and Claude for Dynamic Agent Creation
    Next Article A Comprehensive Coding Guide to Crafting Advanced Round-Robin Multi-Agent Workflows with Microsoft AutoGen

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    July 17, 2025
    Machine Learning

    Accenture scales video analysis with Amazon Nova and Amazon Bedrock Agents

    July 16, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    CVE-2024-41446 – Alkacon OpenCMS Stored XSS Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    CVE-2025-28036 – TOTOLINK A950RG Remote Command Execution Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    ServiceNow and Nvidia’s new reasoning AI model raises the bar for enterprise AI agents

    News & Updates

    How to Build Autonomous Agents using Prompt Chaining with AI Primitives (No Frameworks)

    Development

    Highlights

    CVE-2025-20985 – Microsoft Xbox ThemeManager Privilege Escalation Vulnerability

    June 4, 2025

    CVE ID : CVE-2025-20985

    Published : June 4, 2025, 5:15 a.m. | 2 hours, 18 minutes ago

    Description : Improper privilege management in ThemeManager prior to SMR Jun-2025 Release 1 allows local privileged attackers to reuse trial items.

    Severity: 5.5 | MEDIUM

    Visit the link for more details, such as CVSS details, affected products, timeline, and more…

    8 Key Questions Every CEO Should Ask Before Hiring a Node.js Development Company in 2025

    July 11, 2025

    I tried a PCIe 5.0 NVMe SSD that reads twice as fast as my Samsung 990 Pro, and it proves better value in one area

    May 13, 2025

    Introducing sub-issues: Enhancing issue management on GitHub

    April 11, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.