Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      How To Prevent WordPress SQL Injection Attacks

      June 16, 2025

      This week in AI dev tools: Apple’s Foundations Model framework, Mistral’s first reasoning model, and more (June 13, 2025)

      June 13, 2025

      Open Talent platforms emerging to match skilled workers to needs, study finds

      June 13, 2025

      Java never goes out of style: Celebrating 30 years of the language

      June 12, 2025

      The 5 gadgets that got me through marathons and obstacle races (and why they work)

      June 16, 2025

      This beastly 500W charger replaced every other charger I had – with six ports to boot

      June 16, 2025

      Mac Mini won’t power on? Apple will fix it for you – for free

      June 16, 2025

      Why I’m switching to VS Code. Hint: It’s all about AI tool integration

      June 16, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      From Concept to Code: Final Year PHP Projects with Reports for Smart Submissions

      June 16, 2025
      Recent

      From Concept to Code: Final Year PHP Projects with Reports for Smart Submissions

      June 16, 2025

      Building Construction suppliers in India

      June 16, 2025

      Neutralinojs v6.1 released

      June 16, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      Microsoft Edge’s Quiet Shift to AVIF: Why It Matters

      June 16, 2025
      Recent

      Microsoft Edge’s Quiet Shift to AVIF: Why It Matters

      June 16, 2025

      Windows 11 test builds are accidentally playing the Windows Vista startup sound

      June 16, 2025

      Leaked: ROG Xbox Ally and Xbox Ally X pre-orders set for August, launch in October

      June 16, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»JetBrains Open Sources Mellum: A Developer-Centric Language Model for Code-Related Tasks

    JetBrains Open Sources Mellum: A Developer-Centric Language Model for Code-Related Tasks

    May 2, 2025

    JetBrains has officially open-sourced Mellum, a purpose-built 4-billion-parameter language model tailored for software development tasks. Developed from the ground up, Mellum reflects JetBrains’ engineering-first approach, offering a domain-specialized model trained for practical usage across codebases and programming environments. With its release on Hugging Face under the Apache 2.0 license, JetBrains extends an invitation to the broader research and developer community to experiment, adapt, and advance Mellum’s capabilities.

    A Focal Model for Code Understanding

    Unlike general-purpose LLMs, Mellum is classified by JetBrains as a “focal model”—a term they use to describe models with a narrow yet deep specialization. Mellum is optimized specifically for programming-related tasks such as autocompletion, infilling, and structural understanding of source code. This focused design avoids the overhead of broader linguistic modeling and enables the model to perform efficiently in IDE-like environments.

    The model supports a wide array of languages including Java, Kotlin, Python, Go, PHP, C, C++, C#, JavaScript, TypeScript, CSS, HTML, Rust, and Ruby—reflecting the polyglot nature of modern development teams.

    Model Architecture and Training Pipeline

    Mellum follows a LLaMA-style architecture and was trained from scratch using over 4.2 trillion tokens drawn from code-rich sources such as The Stack, StarCoder, CommitPack, and English Wikipedia. It features an 8K token context window and was trained using bf16 mixed precision across a high-throughput cluster of 256 NVIDIA H200 GPUs connected via Infiniband.

    The training process spanned approximately 20 days and leveraged modern infrastructure for scalable model development. The architecture and training procedure were designed with reproducibility and deployment flexibility in mind, making Mellum usable in both cloud inference setups (e.g., vLLM) and on local environments (e.g., llama.cpp, Ollama).

    Benchmarking and Evaluation

    JetBrains evaluated Mellum across a range of benchmarks that reflect its primary use cases—code infilling and completion. The model’s performance indicates strong alignment with the design goals:

    • RepoBench v1.1 (8K context):
      • Python EM: 27.97%
      • Java EM: 31.08%
    • SAFIM (Syntax-Aware Fill-in-the-Middle):
      • pass@1: 38.11%
    • HumanEval Infilling:
      • Single-line: 66.21%
      • Multi-line: 38.52%
      • Random-span: 29.70%

    These results reflect Mellum’s specialization for structured code understanding, especially in scenarios involving partial or interrupted code, which are common in real-world development workflows.

    Rationale for Open Sourcing

    JetBrains’ decision to release Mellum as open-source is grounded in several practical motivations:

    • Transparency: Enables scrutiny of both training data and architectural decisions.
    • Reusability: Supports integration in custom development environments and research experiments.
    • Community Collaboration: Facilitates contribution from external developers to refine model behavior.
    • Pedagogical Value: Provides educators and students with a hands-on artifact for understanding how domain-specific LLMs are constructed and applied.

    The release includes both the base model (Mellum-4b-base) and a fine-tuned variant for Python (Mellum-4b-sft-python).

    Implications for Developer Tooling

    The availability of a compact, performant model optimized for source code opens new opportunities in the IDE space and beyond. JetBrains envisions Mellum as part of a broader strategy involving multiple focal models, each optimized for specific programming tasks such as diff generation or code review assistance. This approach aligns with the growing need for deployable, cost-effective, and context-aware AI tooling that can augment developer productivity without introducing opaque or oversized general-purpose models.

    Conclusion

    Mellum represents a deliberate shift toward smaller, specialized language models that prioritize utility, transparency, and efficiency. By making the model openly available, JetBrains offers a high-quality foundation for building the next generation of AI-assisted developer tools. Its architecture, training methodology, and benchmark performance signal a practical step forward in the evolving space of LLMs tailored for software engineering.


    The release includes both the base model (Mellum-4b-base) and a fine-tuned variant for Python (Mellum-4b-sft-python). Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 90k+ ML SubReddit.

    🔥 [Register Now] miniCON Virtual Conference on AGENTIC AI: FREE REGISTRATION + Certificate of Attendance + 4 Hour Short Event (May 21, 9 am- 1 pm PST) + Hands on Workshop

    The post JetBrains Open Sources Mellum: A Developer-Centric Language Model for Code-Related Tasks appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleGet faster and actionable AWS Trusted Advisor insights to make data-driven decisions using Amazon Q Business
    Next Article Meta and Booz Allen Deploy Space Llama: Open-Source AI Heads to the ISS for Onboard Decision-Making

    Related Posts

    Machine Learning

    Extend your Amazon Q Business with PagerDuty Advance data accessor

    June 16, 2025
    Machine Learning

    How Apollo Tyres is unlocking machine insights using agentic AI-powered Manufacturing Reasoner

    June 16, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    The next chapter of our Gemini era

    Artificial Intelligence

    CVE-2025-43973 – GoBGP Buffer Overflow Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    Fortifying Debian With SELinux by Enforcing Mandatory Access Control for Ultimate System Security

    Learning Resources

    CVE-2025-4298 – Tenda AC1206 Buffer Overflow Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    Highlights

    Beyond the Hype: Google’s Practical AI Guide Every Startup Founder Should Read

    April 30, 2025

    In 2025, AI continues to reshape how startups build, operate, and compete. Google’s Future of…

    7 strategic insights business and IT leaders need for AI transformation in 2025

    April 11, 2025

    Xinbi Telegram Market Tied to $8.4B in Crypto Crime, Romance Scams, North Korea Laundering

    May 14, 2025

    CVE-2025-43865 – React Router HTTP Header Injection Vulnerability

    April 24, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.