
    Meta Introduces KernelLLM: An 8B LLM that Translates PyTorch Modules into Efficient Triton GPU Kernels

    May 20, 2025

    Meta has introduced KernelLLM, an 8-billion-parameter language model fine-tuned from Llama 3.1 Instruct that automates the translation of PyTorch modules into efficient Triton GPU kernels. The goal is to lower the barrier to GPU programming by simplifying kernel development.

    Technical Overview

    KernelLLM is trained on approximately 25,000 paired examples of PyTorch modules and their corresponding Triton kernel implementations. The dataset, known as KernelBook, combines filtered code from The Stack with synthetically generated samples produced using torch.compile() and additional prompting techniques.

    The model was trained with supervised instruction tuning, using prompt templates that include format examples during both training and evaluation. Training ran for 10 epochs with a batch size of 32 on 16 GPUs and took approximately 12 hours (192 GPU hours).
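
The reported training budget can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: it assumes the batch size of 32 is the global batch size and that a final partial batch still counts as a step, neither of which the article states. The function names are illustrative.

```python
import math

# Figures reported in the article (the dataset size is approximate).
NUM_EXAMPLES = 25_000
EPOCHS = 10
BATCH_SIZE = 32        # assumed to be the global batch size
NUM_GPUS = 16
WALL_CLOCK_HOURS = 12


def gpu_hours(num_gpus: int, hours: float) -> float:
    """Total GPU hours: number of GPUs times wall-clock hours."""
    return num_gpus * hours


def optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Steps over all epochs, counting a final partial batch as one step."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    print("GPU hours:", gpu_hours(NUM_GPUS, WALL_CLOCK_HOURS))
    print("Optimizer steps:", optimizer_steps(NUM_EXAMPLES, BATCH_SIZE, EPOCHS))
```

Under these assumptions the run comes to 192 GPU hours, matching the article, and roughly 7,820 optimizer steps. The step count would differ if 32 is a per-GPU batch size or if gradient accumulation was used.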

    Performance Evaluation

    KernelLLM’s performance was assessed using KernelBench-Triton, a benchmark designed to evaluate the generation of Triton kernels from PyTorch modules. The model achieved a Pass@1 score of 20.2, outperforming much larger models such as GPT-4o (reported here as ~200B parameters, a figure OpenAI has not officially disclosed) and DeepSeek V3 (671B parameters), which scored 15 and 16 respectively. When multiple candidates are sampled per task, KernelLLM’s Pass@10 and Pass@20 scores reach 51.8 and 57.1, meaning that a correct kernel is found for more than half of the benchmark tasks within 10 to 20 attempts.
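
For readers unfamiliar with Pass@k, the sketch below shows one common way to compute it: the unbiased estimator widely used for code-generation benchmarks. The article does not say which estimator KernelBench-Triton uses, so treat this as an illustration of the metric and not the benchmark's actual scoring code. The function names are hypothetical.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct.

    n: samples generated for a task
    c: how many of those samples passed the correctness check
    k: attempt budget being scored (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # samples must contain at least one correct sample.
        return 1.0
    # 1 minus the probability that all k drawn samples are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, each given as (n_samples, n_correct)."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)
```

With one sample per task this reduces to the fraction of tasks solved on the first try. Larger values of k reward a model that produces a correct kernel somewhere among its attempts, which is why Pass@10 and Pass@20 are higher than Pass@1.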

    Implications for GPU Programming

    By automating the generation of Triton kernels from PyTorch modules, KernelLLM has the potential to streamline the development of GPU-accelerated applications. This could be particularly beneficial for developers seeking to optimize performance without delving into the complexities of manual kernel programming.

    The model’s ability to produce efficient kernels may also contribute to more accessible and efficient utilization of GPU resources, potentially impacting areas such as deep learning model training and inference.


    Check out the model on Hugging Face. All credit for this research goes to the researchers of this project.

    The post Meta Introduces KernelLLM: An 8B LLM that Translates PyTorch Modules into Efficient Triton GPU Kernels appeared first on MarkTechPost.
