
    Kimi k1.5: A Next Generation Multi-Modal LLM Trained with Reinforcement Learning on Advancing AI with Scalable Multimodal Reasoning and Benchmark Excellence

    January 23, 2025

Reinforcement learning (RL) has fundamentally transformed AI by allowing models to improve iteratively through interaction and feedback. When applied to large language models (LLMs), RL opens new avenues for handling tasks that require complex reasoning, such as mathematical problem-solving, coding, and multimodal data interpretation. Traditional methods rely heavily on pretraining with large static datasets, but their limitations have become evident as models face problems that require dynamic exploration and adaptive decision-making.

A central challenge in advancing LLMs lies in scaling their capabilities while ensuring computational efficiency. Conventional pretraining approaches, built on static datasets, struggle to meet the demands of complex tasks involving intricate reasoning. Moreover, existing RL implementations for LLMs have failed to deliver state-of-the-art results due to inefficiencies in prompt design, policy optimization, and data handling. These shortcomings have left a gap in developing models that perform well across diverse benchmarks, especially those requiring simultaneous reasoning over text and visual inputs. Solving this problem requires a comprehensive framework that aligns model optimization with task-specific requirements while maintaining token efficiency.

Prior approaches to improving LLMs include supervised fine-tuning and advanced reasoning techniques such as chain-of-thought (CoT) prompting. CoT reasoning lets models break problems down into intermediate steps, enhancing their ability to tackle complex questions. However, it is computationally expensive and often constrained by an LLM's limited context window. Similarly, Monte Carlo tree search (MCTS), a popular technique for reasoning enhancement, introduces additional computational overhead and complexity. The absence of scalable RL frameworks for LLMs has further restricted progress, underscoring the need for an approach that balances performance improvements with efficiency.
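To make the CoT idea concrete, here is a minimal prompting sketch in Python. The template wording and the `Answer:` convention are illustrative assumptions, not the format used by Kimi k1.5 or any particular model:

```python
# Minimal chain-of-thought (CoT) prompting sketch: ask the model to emit
# intermediate reasoning steps before a clearly marked final answer.
# The template and the "Answer:" marker are illustrative conventions.

COT_TEMPLATE = (
    "Q: {question}\n"
    "Let's think step by step, then give the final answer on its own "
    "line prefixed with 'Answer:'.\n"
)

def build_cot_prompt(question: str) -> str:
    """Wrap a raw question in a step-by-step reasoning instruction."""
    return COT_TEMPLATE.format(question=question)

def extract_answer(completion: str) -> str:
    """Pull the final answer out of a CoT completion."""
    for line in reversed(completion.splitlines()):
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return completion.strip()  # no marker found: return the whole text

prompt = build_cot_prompt("What is 17 * 24?")
```

However the completion is produced, it would then be parsed with `extract_answer`; the point is only that CoT trades extra generated tokens for intermediate reasoning, which is exactly the cost described above.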

Researchers from the Kimi Team have introduced Kimi k1.5, a next-generation multimodal LLM designed to overcome these challenges by integrating RL with extended context capabilities. The model employs long-context scaling, expanding the context window to 128,000 tokens so that it can process larger problem contexts effectively. Unlike prior approaches, Kimi k1.5 avoids complex methods such as Monte Carlo tree search and learned value functions, opting instead for a streamlined RL framework. The team also curated a diverse RL prompt set spanning STEM, coding, and general reasoning tasks to enhance the model's adaptability.

Kimi k1.5 was developed in two versions:

1. The long-CoT model excels at extended reasoning tasks, leveraging the 128k-token context window to achieve groundbreaking results across benchmarks. For instance, it scored 96.2% on MATH500 and reached the 94th percentile on Codeforces, demonstrating its ability to handle complex, multi-step problems.
2. The short-CoT model was optimized for efficiency using long-to-short context training, which transfers reasoning priors from the long-CoT model. It maintains high performance, 60.8% on AIME and 94.6% on MATH500, while significantly reducing token usage.

The training process combined supervised fine-tuning, long-chain reasoning, and RL to create a robust problem-solving framework. Key innovations included partial rollouts, a technique that reuses previously computed trajectories to improve computational efficiency during long-context processing. Multimodal data sources, including real-world and synthetic visual reasoning datasets, further strengthened the model's ability to interpret and reason across text and images. Advanced sampling strategies, including curriculum and prioritized sampling, kept training focused on the areas where the model performed weakest.
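As a hedged sketch of what prioritized sampling could look like, the snippet below weights prompts by failure rate; this is a common general recipe assumed for illustration, not the Kimi team's exact mechanism:

```python
import random

# Prioritized sampling sketch: prompt families the model currently fails
# on more often are drawn with higher probability during RL training.

def prioritized_sample(prompts, success_rates, k, rng=random):
    """Draw k prompts with replacement, weighting each by its failure rate."""
    weights = [max(1.0 - s, 1e-3) for s in success_rates]  # floor avoids zero weight
    return rng.choices(prompts, weights=weights, k=k)

# Hypothetical prompt families with measured pass rates.
prompts = ["easy_arith", "hard_geometry", "codeforces_style"]
success = [0.95, 0.60, 0.10]
batch = prioritized_sample(prompts, success, k=200, rng=random.Random(0))
# The batch is dominated by the family with the lowest pass rate.
```

As the model improves on a family, its weight shrinks automatically, so training effort keeps flowing to whatever remains hard.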

Kimi k1.5 demonstrated significant gains in token efficiency through its long-to-short context training methodology, which transfers reasoning priors from long-context models to shorter ones while maintaining high performance and reducing token consumption. The model achieved exceptional results across multiple benchmarks: 96.2% exact-match accuracy on MATH500, the 94th percentile on Codeforces, and a 77.5% pass rate on AIME, surpassing state-of-the-art models such as GPT-4o and Claude Sonnet 3.5 by substantial margins. Its short-CoT performance outperformed GPT-4o and Claude Sonnet 3.5 on benchmarks such as AIME and LiveCodeBench by up to 550%, while its long-CoT performance matched o1 across multiple modalities, including MathVista and Codeforces. Key features include long-context scaling with RL using context windows of up to 128k tokens, efficient training through partial rollouts, improved policy optimization via online mirror descent, advanced sampling strategies, and length penalties. Kimi k1.5 also excels at joint reasoning over text and vision, highlighting its multimodal capabilities.
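The exact length-penalty formulation is not given above; as an illustrative assumption, a simple linear discount on correct-but-long responses could look like this:

```python
# Length-penalized reward sketch: correct answers earn full reward when
# short, scaling down linearly for longer responses; incorrect answers
# earn nothing. The linear form and the 0.5 floor are illustrative
# choices, not the paper's actual formula.

def length_penalized_reward(correct: bool, n_tokens: int,
                            min_len: int, max_len: int) -> float:
    if not correct:
        return 0.0
    if max_len <= min_len:
        return 1.0
    frac = (n_tokens - min_len) / (max_len - min_len)
    frac = min(max(frac, 0.0), 1.0)  # clamp to [0, 1]
    return 1.0 - 0.5 * frac  # 1.0 at min_len, 0.5 at max_len
```

A reward shaped this way nudges the policy toward short correct answers without ever rewarding wrong ones, the kind of pressure the long-to-short token-efficiency results rely on.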

    The research presented several key takeaways:

    1. By enabling models to explore dynamically with rewards, RL removes the constraints of static datasets, expanding the scope of reasoning and problem-solving.
    2. Using a 128,000-token context window allowed the model to effectively perform long-chain reasoning, a critical factor in its state-of-the-art results.
    3. Partial rollouts and prioritized sampling strategies optimized the training process, ensuring resources were allocated to the most impactful areas.
    4. Incorporating diverse visual and textual data enabled the model to excel across benchmarks requiring simultaneous reasoning over multiple input types.
    5. The streamlined RL framework used in Kimi k1.5 avoided the pitfalls of more computationally demanding techniques, achieving high performance without excessive resource consumption.
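The first takeaway, learning from rewards rather than static labels, can be illustrated with a toy REINFORCE-style loop in pure Python. This is a didactic sketch of policy-gradient RL in general, not the online mirror-descent objective Kimi k1.5 actually optimizes:

```python
import math
import random

# Toy REINFORCE loop over a 2-action policy: sample an action, score it
# with a reward, and nudge the logits toward rewarded choices.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_step(logits, reward_fn, lr=0.5, rng=random):
    """Sample an action from the softmax policy and apply one REINFORCE update."""
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    r = reward_fn(action)
    # d/d_logit_i of log pi(action) = 1[i == action] - probs[i]
    new_logits = [l + lr * r * ((1.0 if i == action else 0.0) - probs[i])
                  for i, l in enumerate(logits)]
    return new_logits, action, r

logits = [0.0, 0.0]
reward = lambda a: 1.0 if a == 1 else 0.0  # action 1 is the "correct" one
rng = random.Random(0)
for _ in range(200):
    logits, _, _ = reinforce_step(logits, reward, rng=rng)
probs = softmax(logits)
```

After 200 steps the policy concentrates nearly all its probability on the rewarded action, with no labeled dataset involved: the reward signal alone steers learning.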

In conclusion, Kimi k1.5 addresses the limitations of traditional pretraining methods by implementing innovative techniques for context scaling and token efficiency, setting a new benchmark for performance across reasoning and multimodal tasks. The long-CoT and short-CoT models collectively showcase the adaptability of Kimi k1.5, from handling complex, extended reasoning tasks to achieving token-efficient solutions for shorter contexts.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

