
    JetBrains Open Sources Mellum: A Developer-Centric Language Model for Code-Related Tasks

    May 2, 2025

    JetBrains has officially open-sourced Mellum, a purpose-built 4-billion-parameter language model tailored for software development tasks. Developed from the ground up, Mellum reflects JetBrains’ engineering-first approach, offering a domain-specialized model trained for practical usage across codebases and programming environments. With its release on Hugging Face under the Apache 2.0 license, JetBrains extends an invitation to the broader research and developer community to experiment, adapt, and advance Mellum’s capabilities.

    A Focal Model for Code Understanding

    Unlike general-purpose LLMs, Mellum is classified by JetBrains as a “focal model”—a term they use to describe models with a narrow yet deep specialization. Mellum is optimized specifically for programming-related tasks such as autocompletion, infilling, and structural understanding of source code. This focused design avoids the overhead of broader linguistic modeling and enables the model to perform efficiently in IDE-like environments.

    The model supports a wide array of languages including Java, Kotlin, Python, Go, PHP, C, C++, C#, JavaScript, TypeScript, CSS, HTML, Rust, and Ruby—reflecting the polyglot nature of modern development teams.

    Model Architecture and Training Pipeline

    Mellum follows a LLaMA-style architecture and was trained from scratch on more than 4.2 trillion tokens drawn from code-rich sources such as The Stack, StarCoder, CommitPack, and English Wikipedia. It features an 8K-token context window and was trained in bf16 mixed precision on a high-throughput cluster of 256 NVIDIA H200 GPUs connected via InfiniBand.

    Training took approximately 20 days and leveraged modern infrastructure for scalable model development. The architecture and training procedure were designed with reproducibility and deployment flexibility in mind, making Mellum usable both in cloud inference setups (e.g., vLLM) and in local environments (e.g., llama.cpp, Ollama).
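    As a minimal sketch of the cloud-style path, the snippet below loads the base model with vLLM's offline inference API and asks for a short completion. The Hugging Face repository id JetBrains/Mellum-4b-base and the sampling settings are assumptions used for illustration, not values documented here.

    # Minimal sketch: run Mellum through vLLM's offline inference API.
    # The repo id and sampling settings are assumptions for illustration.
    from vllm import LLM, SamplingParams

    llm = LLM(model="JetBrains/Mellum-4b-base")        # downloads weights on first run
    params = SamplingParams(temperature=0.2, max_tokens=64)

    prompt = "def fibonacci(n):\n    "                 # partial function to complete
    outputs = llm.generate([prompt], params)
    print(outputs[0].outputs[0].text)                  # the generated continuation

    For fully local use, the same weights can instead be run through llama.cpp or Ollama once converted to a compatible format; the exact packaging depends on what JetBrains publishes alongside the checkpoints.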

    Benchmarking and Evaluation

    JetBrains evaluated Mellum across a range of benchmarks that reflect its primary use cases—code infilling and completion. The model’s performance indicates strong alignment with the design goals:

    • RepoBench v1.1 (8K context):
      • Python exact match (EM): 27.97%
      • Java exact match (EM): 31.08%
    • SAFIM (Syntax-Aware Fill-in-the-Middle):
      • pass@1: 38.11%
    • HumanEval Infilling:
      • Single-line: 66.21%
      • Multi-line: 38.52%
      • Random-span: 29.70%

    These results reflect Mellum’s specialization for structured code understanding, especially in scenarios involving partial or interrupted code, which are common in real-world development workflows.
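    To make the infilling setting concrete, the sketch below assembles a fill-in-the-middle (FIM) prompt from a code prefix and suffix. The sentinel tokens shown follow the StarCoder-style convention and are assumptions for illustration; the tokens Mellum actually expects are defined by its tokenizer and model card.

    # Illustrative fill-in-the-middle (FIM) prompt construction.
    # The sentinel tokens are assumed (StarCoder-style); consult the model card
    # for the special tokens Mellum's tokenizer actually defines.
    FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

    prefix = "def mean(values):\n    total = sum(values)\n"
    suffix = "\n    return result\n"

    # The model is given the code before and after the gap and generates the missing middle.
    fim_prompt = f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"
    print(fim_prompt)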

    Rationale for Open Sourcing

    JetBrains’ decision to release Mellum as open-source is grounded in several practical motivations:

    • Transparency: Enables scrutiny of both training data and architectural decisions.
    • Reusability: Supports integration in custom development environments and research experiments.
    • Community Collaboration: Facilitates contribution from external developers to refine model behavior.
    • Pedagogical Value: Provides educators and students with a hands-on artifact for understanding how domain-specific LLMs are constructed and applied.

    The release includes both the base model (Mellum-4b-base) and a fine-tuned variant for Python (Mellum-4b-sft-python).
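    As a rough sketch of how either checkpoint might be loaded with Hugging Face transformers, the snippet below pulls the Python fine-tune and completes a short prompt. The repository id, dtype, and generation settings are assumptions for illustration rather than documented defaults.

    # Sketch: load the Python fine-tuned variant with Hugging Face transformers.
    # Repo id, dtype, and generation settings are assumptions, not documented values.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "JetBrains/Mellum-4b-sft-python"
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")

    prompt = "def quicksort(arr):\n    "
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    generated = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    print(tokenizer.decode(generated[0], skip_special_tokens=True))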

    Implications for Developer Tooling

    The availability of a compact, performant model optimized for source code opens new opportunities in the IDE space and beyond. JetBrains envisions Mellum as part of a broader strategy involving multiple focal models, each optimized for specific programming tasks such as diff generation or code review assistance. This approach aligns with the growing need for deployable, cost-effective, and context-aware AI tooling that can augment developer productivity without introducing opaque or oversized general-purpose models.

    Conclusion

    Mellum represents a deliberate shift toward smaller, specialized language models that prioritize utility, transparency, and efficiency. By making the model openly available, JetBrains offers a high-quality foundation for building the next generation of AI-assisted developer tools. Its architecture, training methodology, and benchmark performance signal a practical step forward in the evolving space of LLMs tailored for software engineering.

