    Meta Introduces KernelLLM: An 8B LLM that Translates PyTorch Modules into Efficient Triton GPU Kernels

    May 20, 2025

    Meta has introduced KernelLLM, an 8-billion-parameter language model fine-tuned from Llama 3.1 Instruct that translates PyTorch modules into efficient Triton GPU kernels. The goal is to lower the barrier to GPU programming by automating much of the kernel development process.

    Technical Overview

    KernelLLM is trained on approximately 25,000 paired examples of PyTorch modules and their corresponding Triton kernel implementations. The dataset, known as KernelBook, combines filtered code from The Stack with synthetic samples generated using torch.compile() and additional prompting techniques.
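
To make the idea of a "paired example" concrete, here is a minimal sketch of what one record could look like. The `KernelPair` class, its field names, and the element-wise add example are hypothetical illustrations; the actual KernelBook schema is not described in this article. The Triton source follows the standard vector-add pattern (program ID, block offsets, masked load and store) and is stored as text only, so the snippet runs without a GPU or a Triton install.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelPair:
    """Hypothetical record: a PyTorch module and a Triton kernel for the same op."""
    pytorch_source: str
    triton_source: str


# PyTorch side: a module that adds two tensors element-wise.
PYTORCH_ADD = '''
import torch


class Add(torch.nn.Module):
    def forward(self, x, y):
        return x + y
'''

# Triton side: each program instance handles one block of BLOCK_SIZE elements.
TRITON_ADD = '''
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the final, partially filled block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)
'''

pair = KernelPair(PYTORCH_ADD.strip(), TRITON_ADD.strip())
```

A model trained on records of this shape sees the PyTorch source as input and is asked to produce the Triton source as output.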

    The model was trained with supervised instruction tuning, using prompt templates that include format examples during both training and evaluation. Training ran for 10 epochs with a batch size of 32 on 16 GPUs and took approximately 12 hours (192 GPU hours).
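
The training figures can be sanity-checked with a few lines of arithmetic. This is a rough sketch that assumes the batch size of 32 is the global batch size, that no gradient accumulation is used, and that the final partial batch of each epoch is kept; none of these details are stated in the article, so the step counts are estimates.

```python
import math

# Figures quoted in the article.
NUM_EXAMPLES = 25_000      # approximate size of KernelBook
EPOCHS = 10
BATCH_SIZE = 32            # assumed to be the global batch size
NUM_GPUS = 16
WALL_CLOCK_HOURS = 12

# GPU hours are devices multiplied by wall-clock time.
gpu_hours = NUM_GPUS * WALL_CLOCK_HOURS

# Estimated optimizer steps, keeping the last partial batch of each epoch.
steps_per_epoch = math.ceil(NUM_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

print(f"GPU hours:       {gpu_hours}")        # 192, matching the article
print(f"Steps per epoch: {steps_per_epoch}")  # 782
print(f"Total steps:     {total_steps}")      # 7820
```

Under these assumptions the run amounts to roughly 7,800 optimizer steps, which is small by LLM training standards and consistent with a fine-tuning job rather than pretraining.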

    Performance Evaluation

    KernelLLM’s performance was assessed using KernelBench-Triton, a benchmark designed to evaluate the generation of Triton kernels from PyTorch modules. The model achieved a Pass@1 score of 20.2, outperforming much larger models such as GPT-4o (reported as ~200B parameters) and DeepSeek V3 (671B parameters), which scored 15 and 16 respectively. When multiple candidates are sampled per task, KernelLLM’s Pass@10 and Pass@20 scores reach 51.8 and 57.1, meaning that at least one of the sampled kernels is correct for more than half of the benchmark tasks.
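
For readers unfamiliar with Pass@k, the sketch below shows the unbiased estimator commonly used in code-generation benchmarks: given n samples for a task of which c are correct, it estimates the probability that at least one of k randomly chosen samples is correct. The article does not say exactly how KernelBench-Triton computes its scores, so treat this as an illustration of the metric rather than the benchmark's own implementation. The function names and the sample counts in the usage lines are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples (drawn from n, c correct) passes."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        return 1.0  # too few failures to fill k draws, so one draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, where each task is an (n, c) pair."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical task: 20 generated kernels, 2 of them correct.
print(round(pass_at_k(20, 2, 1), 4))   # 0.1
print(round(pass_at_k(20, 2, 10), 4))  # 0.7632
```

The values are fractions between 0 and 1; scores such as 20.2 or 51.8 correspond to the benchmark-wide average expressed as a percentage. The example also shows why Pass@10 can be far higher than Pass@1: a task that succeeds only 10% of the time per sample is still solved about 76% of the time when ten samples are allowed.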

    Implications for GPU Programming

    By automating the generation of Triton kernels from PyTorch modules, KernelLLM has the potential to streamline the development of GPU-accelerated applications. This could be particularly beneficial for developers seeking to optimize performance without delving into the complexities of manual kernel programming.

    If the generated kernels are efficient in practice, the model could also make GPU resources easier to use well, with potential benefits for deep learning training and inference workloads.


    The model is available on Hugging Face. All credit for this research goes to the researchers of this project.

    The post Meta Introduces KernelLLM: An 8B LLM that Translates PyTorch Modules into Efficient Triton GPU Kernels appeared first on MarkTechPost.
