
    Enhancing Reasoning Capabilities in Low-Resource Language Models through Efficient Model Merging

    February 17, 2025

    Large Language Models (LLMs) have shown exceptional capabilities in complex reasoning tasks through recent advances in scaling and specialized training approaches. While models like OpenAI o1 and DeepSeek R1 have set new benchmarks on reasoning problems, a significant disparity persists in their performance across languages. The dominance of English and Chinese in the training data of foundation models like Llama and Qwen has left a substantial capability gap for low-resource languages: applied to such languages, these models exhibit problems such as incorrect character usage and code-switching, and the issues become more pronounced during reasoning-focused fine-tuning and reinforcement learning.

    Regional LLM initiatives have emerged to address low-resource language limitations through specialized pretraining and post-training. Projects such as Typhoon, Sailor, EuroLLM, Aya, Sea-lion, and SeaLLM have adapted models to specific target languages. However, the data-centric route to adapting reasoning capabilities is hampered by the lack of transparency around reasoning models' data recipes, and it is computationally expensive to scale: DeepSeek R1 70B required 800K examples for distillation and general SFT, far more than academic efforts like Sky-T1 and Bespoke-Stratos. Model merging has emerged as an alternative, showing promise in combining the weights of multiple specialized models to improve performance across tasks without additional training.
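
    To make the idea concrete, here is a minimal sketch of weight-space merging in its simplest form: linear interpolation of two checkpoints that share an architecture. The function and file names are illustrative, not from the paper, and real merges of 70B models require shard-by-shard processing and more refined methods such as TIES or DARE.

```python
import torch

def linear_merge(state_dict_a, state_dict_b, alpha=0.5):
    """Interpolate two checkpoints with identical architectures.

    `alpha` weights model A and (1 - alpha) weights model B. This is
    the simplest weight-space merge; methods like TIES or DARE refine
    which parameters are combined and how.
    """
    merged = {}
    for name, param_a in state_dict_a.items():
        param_b = state_dict_b[name]
        merged[name] = alpha * param_a + (1.0 - alpha) * param_b
    return merged

# Hypothetical usage; both checkpoints must be fine-tunes of the same
# base architecture:
# sd_a = torch.load("language_specialist.pt")
# sd_b = torch.load("reasoning_specialist.pt")
# merged_sd = linear_merge(sd_a, sd_b, alpha=0.5)
```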

    Researchers from SCB 10X R&D and SCBX Group, Bangkok, Thailand, have proposed an approach to enhance reasoning capabilities in language-specific LLMs, focusing on Thai language models. The research combines data selection and model merging to incorporate advanced reasoning capabilities similar to DeepSeek R1's while maintaining target-language proficiency. It addresses the challenge of improving reasoning in low-resource language models using only publicly available datasets and a modest computational budget of $1,201, matching DeepSeek R1's reasoning capabilities without compromising performance on target-language tasks.

    The methodology uses Typhoon2 70B Instruct and DeepSeek R1 70B Distill as base models: Supervised Fine-Tuning (SFT) is applied to Typhoon2 70B, and the result is merged with DeepSeek R1 70B. The training configuration employs LoRA with rank 32 and α = 16, and uses sequence packing with a maximum length of 16,384 alongside Liger kernels, FlashAttention-2, and DeepSpeed ZeRO-3 to optimize computational efficiency. Training runs on 4×H100 GPUs for up to 15 hours using axolotl, with model merging performed via Mergekit. Evaluation covers two aspects, reasoning capability and language-task performance, using benchmarks such as AIME 2024, MATH-500, and LiveCodeBench, with Thai translations for assessment.
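
    For illustration, the reported LoRA hyperparameters map onto a configuration like the following, written here with Hugging Face PEFT rather than axolotl's YAML (the paper trains with axolotl). The target modules are a common choice for Llama-family models and are an assumption, not taken from the paper.

```python
from peft import LoraConfig

# Sketch of the reported LoRA setup: rank 32, alpha 16.
# target_modules below are illustrative; the paper does not list them.
lora_config = LoraConfig(
    r=32,                 # LoRA rank reported in the paper
    lora_alpha=16,        # scaling factor alpha reported in the paper
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)
```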

    Experimental results show that DeepSeek R1 70B Distill excels at reasoning tasks like AIME and MATH-500 but is markedly less effective on Thai-specific tasks such as MTBench-TH and language-accuracy evaluations. Typhoon2 70B Instruct performs strongly on language-specific tasks but struggles with reasoning, achieving only 10% accuracy on AIME and trailing DeepSeek R1 by over 20% on MATH-500. The final model, Typhoon2-R1-70B, combines DeepSeek R1's reasoning capabilities with Typhoon2's Thai language proficiency, scoring within 4% of Typhoon2 on language tasks while maintaining comparable reasoning abilities, for overall improvements of 41.6% over Typhoon2 and 12.8% over DeepSeek R1.

    In conclusion, the researchers present an approach to enhancing reasoning capabilities in language-specific models by combining specialized models. While the study shows that SFT and model merging can transfer reasoning capabilities effectively with limited resources, the current methodology has several limitations. The scope was confined to merging with DARE in a two-model setup within a single model family, without optimizing instruction tuning despite the availability of high-quality datasets like Tulu3. Significant challenges persist in multilingual reasoning and model merging, including the lack of culturally aware reasoning traces. Despite these limitations, the research marks a step toward advancing LLM capabilities in underrepresented languages.
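
    As background on the merging method named above, DARE (Drop And REscale) sparsifies the delta between a fine-tuned model and its base before merging. The following is a simplified single-tensor sketch of that idea, not the authors' implementation (they perform merging via Mergekit):

```python
import torch

def dare_delta(base, finetuned, drop_rate=0.9, generator=None):
    """Apply DARE to one weight tensor.

    The delta between the fine-tuned weights and the base is sparsified
    by randomly dropping a fraction `drop_rate` of its entries; the
    survivors are rescaled by 1 / (1 - drop_rate) so the expected delta
    is preserved. Sparsified deltas from several specialists can then
    be added onto a shared base with less interference.
    """
    delta = finetuned - base
    keep = torch.rand(delta.shape, generator=generator) >= drop_rate
    return base + keep * delta / (1.0 - drop_rate)
```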

