
    MBZUAI Researchers Release Atlas-Chat (2B, 9B, and 27B): A Family of Open Models Instruction-Tuned for Darija (Moroccan Arabic)

    November 7, 2024

    Natural language processing (NLP) has made remarkable strides in recent years, driven largely by large language models (LLMs). However, these models have concentrated on data-rich languages such as English, leaving many underrepresented languages and dialects behind. Moroccan Arabic, also known as Darija, is one such dialect: it is the main form of daily communication for over 40 million people, yet it has received very little attention. Because it lacks extensive datasets, codified grammatical standards, and suitable benchmarks, Darija is classified as a low-resource language and has often been neglected by LLM developers. The challenge is compounded by Darija's distinctive mix of Modern Standard Arabic (MSA), Amazigh, French, and Spanish influences, and by a written form that is still emerging and not yet standardized. The result is an asymmetry in which dialectal Arabic like Darija is marginalized despite its widespread use, limiting how effectively AI models can serve its speakers.

    Meet Atlas-Chat

    MBZUAI (Mohamed bin Zayed University of Artificial Intelligence) has released Atlas-Chat, a family of open, instruction-tuned models specifically designed for Darija, the colloquial Arabic of Morocco. The release marks a significant step toward addressing the challenges posed by low-resource languages. Atlas-Chat comprises three models with 2 billion, 9 billion, and 27 billion parameters, offering a range of capabilities depending on users' needs. Because the models are instruction-tuned, they perform well across tasks such as conversational interaction, translation, summarization, and content creation in Darija. They are also intended to support cultural research by improving how models capture Morocco's linguistic heritage. The initiative is particularly noteworthy because it aligns with the mission of making advanced AI accessible to communities that have been underrepresented in the AI landscape, helping bridge the gap between resource-rich and low-resource languages.
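
    Because the models are released openly, they can be tried with the standard Hugging Face transformers workflow. The snippet below is a minimal sketch, not an official quickstart: the repository id MBZUAI-Paris/Atlas-Chat-9B and the chat-template usage are assumptions to verify against the model cards on the Hub.

    # Minimal sketch: chatting with an Atlas-Chat checkpoint via Hugging Face transformers.
    # The repository id below is an assumption; confirm the exact names of the 2B, 9B,
    # and 27B checkpoints on the Hugging Face Hub before running.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "MBZUAI-Paris/Atlas-Chat-9B"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # half precision so the 9B model fits on a single GPU
        device_map="auto",
    )

    # Atlas-Chat is instruction-tuned, so prompts go through the chat template.
    messages = [{"role": "user", "content": "سلام، كيداير؟"}]  # Darija for "Hello, how are you?"
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))

    The same pattern would apply to the 2B and 27B checkpoints; only the repository id and the hardware requirements change.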

    Technical Details and Benefits of Atlas-Chat

    The Atlas-Chat models were developed by consolidating existing Darija language resources and creating new datasets through both manual and synthetic means. Notably, the Darija-SFT-Mixture dataset consists of 458,000 instruction samples, gathered from existing resources and generated synthetically from platforms such as Wikipedia and YouTube; high-quality English instruction datasets were also translated into Darija under rigorous quality control. The models were fine-tuned on this dataset starting from different base models, including the Gemma 2 family. This careful construction allows Atlas-Chat to outperform other Arabic-specialized LLMs, such as Jais and AceGPT, by significant margins. For instance, on the newly introduced DarijaMMLU benchmark, a comprehensive evaluation suite for Darija covering discriminative and generative tasks, Atlas-Chat achieved a 13% performance boost over a larger 13-billion-parameter model. This demonstrates its superior ability to follow instructions, generate culturally relevant responses, and perform standard NLP tasks in Darija.
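
    To make the evaluation setup concrete, here is an illustrative sketch of how a discriminative, DarijaMMLU-style multiple-choice item can be scored: the model's log-likelihood is computed for each answer option and the highest-scoring option is taken as the prediction. This is not the official harness; the repository id is assumed and the question and choices are placeholders to be filled from the benchmark.

    # Illustrative multiple-choice scoring, as used in discriminative benchmarks like DarijaMMLU.
    # Simplified sketch: rank answer options by the log-probability the model assigns to them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "MBZUAI-Paris/Atlas-Chat-2B"  # assumed repo id; the smallest model keeps this cheap
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    model.eval()

    def choice_log_prob(question: str, choice: str) -> float:
        """Sum of log-probabilities assigned to the answer tokens, given the question."""
        prompt_ids = tokenizer(question, return_tensors="pt").input_ids.to(model.device)
        full_ids = tokenizer(question + " " + choice, return_tensors="pt").input_ids.to(model.device)
        with torch.no_grad():
            logits = model(full_ids).logits
        log_probs = torch.log_softmax(logits[0, :-1], dim=-1)   # position i predicts token i+1
        # Simplification: assumes the question tokenizes identically inside the concatenation.
        answer_ids = full_ids[0, prompt_ids.shape[-1]:]         # tokens belonging to the choice
        picked = log_probs[prompt_ids.shape[-1] - 1:].gather(1, answer_ids.unsqueeze(-1))
        return picked.sum().item()

    question = "..."                        # a Darija question taken from the benchmark
    choices = ["...", "...", "...", "..."]  # its answer options
    prediction = max(range(len(choices)), key=lambda i: choice_log_prob(question, choices[i]))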

    Why Atlas-Chat Matters

    The introduction of Atlas-Chat matters for several reasons. First, it addresses a long-standing gap in AI development by focusing on an underrepresented language. Moroccan Arabic, with its complex cultural and linguistic makeup, is often neglected in favor of MSA or other, more data-rich dialects. With Atlas-Chat, MBZUAI provides a powerful tool for communication and content creation in Darija, supporting applications such as conversational agents, automated summarization, and more nuanced cultural research. Second, by offering models at several parameter sizes, Atlas-Chat remains flexible and accessible, serving everything from lightweight applications with limited computational budgets to more sophisticated tasks. The evaluation results underline its effectiveness: Atlas-Chat-9B scored 58.23% on the DarijaMMLU benchmark, significantly outperforming state-of-the-art models such as AceGPT-13B. Such results indicate Atlas-Chat's potential to deliver high-quality language understanding for Moroccan Arabic speakers.
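
    For the lightweight end of that range, one option is to load a smaller checkpoint with 4-bit quantization so it runs on modest hardware. This is a hedged sketch, assuming the bitsandbytes integration in transformers and the same assumed repository id as above.

    # Sketch: loading the assumed 2B checkpoint in 4-bit to reduce memory requirements.
    # Requires the bitsandbytes package alongside transformers; the repo id is an assumption.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # store weights in 4 bits
        bnb_4bit_quant_type="nf4",              # NF4 quantization
        bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
    )

    model_id = "MBZUAI-Paris/Atlas-Chat-2B"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )
    # The quantized model exposes the same generate() API as the full-precision version,
    # so the chat example shown earlier works unchanged.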

    Conclusion

    Atlas-Chat represents a transformative advancement for Moroccan Arabic and other low-resource dialects. By creating a robust and open-source solution for Darija, MBZUAI is taking a major step in making advanced AI accessible to a broader audience, empowering users to interact with technology in their own language and cultural context. This work not only addresses the asymmetries seen in AI support for low-resource languages but also sets a precedent for future development in underrepresented linguistic domains. As AI continues to evolve, initiatives like Atlas-Chat are crucial in ensuring that the benefits of technology are available to all, regardless of the language they speak. With further improvements and refinements, Atlas-Chat is poised to bridge the communication gap and enhance the digital experience for millions of Darija speakers.


    Check out the Paper and Models on Hugging Face. All credit for this research goes to the researchers of this project.


    The post MBZUAI Researchers Release Atlas-Chat (2B, 9B, and 27B): A Family of Open Models Instruction-Tuned for Darija (Moroccan Arabic) appeared first on MarkTechPost.
