Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      The Ultimate Guide to Node.js Development Pricing for Enterprises

      July 29, 2025

      Stack Overflow: Developers’ trust in AI outputs is worsening year over year

      July 29, 2025

      Web Components: Working With Shadow DOM

      July 28, 2025

      Google’s new Opal tool allows users to create mini AI apps with no coding required

      July 28, 2025

      I replaced my Samsung OLED TV with this Sony Mini LED model for a week – and didn’t regret it

      July 29, 2025

      I tested the most popular robot mower on the market – and it was a $5,000 crash out

      July 29, 2025

      5 gadgets and accessories that leveled up my gaming setup (including a surprise console)

      July 29, 2025

      Why I’m patiently waiting for the Samsung Z Fold 8 next year (even though the foldable is already great)

      July 29, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      Performance Analysis with Laravel’s Measurement Tools

      July 29, 2025
      Recent

      Performance Analysis with Laravel’s Measurement Tools

      July 29, 2025

      Memoization and Function Caching with this PHP Package

      July 29, 2025

      Laracon US 2025 Livestream

      July 29, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      Microsoft mysteriously offered a Windows 11 upgrade to this unsupported Windows 10 PC — despite it failing to meet the “non-negotiable” TPM 2.0 requirement

      July 29, 2025
      Recent

      Microsoft mysteriously offered a Windows 11 upgrade to this unsupported Windows 10 PC — despite it failing to meet the “non-negotiable” TPM 2.0 requirement

      July 29, 2025

      With Windows 10’s fast-approaching demise, this Linux migration tool could let you ditch Microsoft’s ecosystem with your data and apps intact — but it’s limited to one distro

      July 29, 2025

      Windows 10 is 10 years old today — let’s look back at 10 controversial and defining moments in its history

      July 29, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»Tiny Models, Big Reasoning Gains: USC Researchers Introduce Tina for Cost-Effective Reinforcement Learning with LoRA

    Tiny Models, Big Reasoning Gains: USC Researchers Introduce Tina for Cost-Effective Reinforcement Learning with LoRA

    April 28, 2025

    Achieving strong, multi-step reasoning in LMs remains a major challenge, despite notable progress in general task performance. Such reasoning is crucial for complex problem-solving domains, such as scientific research and strategic planning. Traditionally, enhancing reasoning skills involves supervised fine-tuning (SFT), where models learn by imitating step-by-step reasoning demonstrations from more advanced models, such as o1. While effective, this method heavily depends on the availability of high-quality reasoning traces, which are costly and risk promoting shallow mimicry over genuine logical exploration. RL offers an alternative by enabling models to learn directly from reward signals, encouraging broader reasoning exploration. However, RL approaches are often resource-heavy and complex, raising the question of how to build reasoning-capable models cost-effectively.

    Following the release of strong models like o1-preview, several open-source efforts such as STILL, Sky-T1, SimpleRL, PRIME, and DeepScaleR have explored efficient strategies to replicate or surpass o1’s reasoning capabilities. Techniques include lightweight imitation learning, scalable instruction tuning, and simplified RL methods. Meanwhile, newer innovations, such as Group Relative Policy Optimization (GRPO), enhance RL training efficiency by eliminating the need for separate value networks, as seen in models like DeepSeek-R1. To further lower training costs, researchers are also investigating Low-Rank Adaptation (LoRA) methods, which update only a small subset of model parameters, maintaining modularity while preserving reasoning ability. This approach enables efficient fine-tuning without the computational demands of full-parameter updates.

    Researchers from the University of Southern California introduce Tina, a family of compact reasoning models that achieve strong performance with minimal cost. Using RL enhanced by LoRA on a 1.5B parameter base model, Tina models outperform or match state-of-the-art models at a fraction of the computational expense. Their best model improves reasoning performance by over 20% and achieves 43.33% Pass@1 on AIME24, with a post-training cost of just $9. By leveraging LoRA’s efficiency to adapt reasoning formats while preserving base knowledge, Tina highlights a highly accessible, cost-effective approach, with all resources fully open-sourced.

    Tina is a family of tiny reasoning models built by post-training the DeepSeek-R1-Distill-Qwen-1.5B model using LoRA during reinforcement learning with a GRPO-style approach. The framework emphasizes minimalism—tiny models, small parameter updates, and a low hardware and budget footprint. Tina models were trained using public datasets and replicated setups from models like STILL-3, DeepScaleR, and Open-RS. Training leveraged the OpenR1 codebase, minimal hyperparameter tuning, and just two NVIDIA L40S GPUs, occasionally RTX 6000 Ada GPUs. Training and evaluation costs were low, averaging well under a $100 budget per experiment, making Tina a highly accessible platform for reasoning research.

    To ensure fair comparisons, the authors reevaluated baseline reasoning models using a consistent setup with the LightEval framework and vLLM engine, thereby eliminating variations introduced by previous studies. Six reasoning benchmarks, including AIME 24/25, AMC 23, MATH 500, GPQA, and Minerva, were utilized. They then evaluated Tina models—small, LoRA-trained versions of baseline models—showing that Tina models often outperformed their full-parameter counterparts despite using minimal training (19–57% of an epoch). Further ablation studies revealed that smaller, high-quality datasets, appropriate learning rates, moderate LoRA ranks, and careful choice of RL algorithm significantly impacted performance, confirming the efficiency and robustness of their LoRA-based reasoning approach.

    In conclusion, Tina, a series of lightweight reasoning models that achieve strong performance using minimal computational resources. By applying LoRA during RL on a 1.5 B-parameter base model, they achieve reasoning abilities competitive with larger state-of-the-art models at a post-training cost of just $9. Tina models show over a 20% improvement in reasoning and 43.33% Pass@1 accuracy on AIME24. While showcasing impressive cost-performance efficiency, limitations remain, including the smaller model scale, limited diversity in reasoning tasks, and minimal hyperparameter tuning. All code, logs, and model checkpoints are open-sourced to promote accessible research and further exploration.


    Check out the Paper and GitHub Page. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 90k+ ML SubReddit.

    🔥 [Register Now] miniCON Virtual Conference on AGENTIC AI: FREE REGISTRATION + Certificate of Attendance + 4 Hour Short Event (May 21, 9 am- 1 pm PST) + Hands on Workshop

    The post Tiny Models, Big Reasoning Gains: USC Researchers Introduce Tina for Cost-Effective Reinforcement Learning with LoRA appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleDevin AI Introduces DeepWiki: A New AI-Powered Interface to Understand GitHub Repositories
    Next Article What Are the Different Font Styles?

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    July 29, 2025
    Machine Learning

    Amazon Develops an AI Architecture that Cuts Inference Time 30% by Activating Only Relevant Neurons

    July 29, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    You Can Now Auto-Generate Google Forms Using Gemini Using Prompts or Files – Here’s How

    Operating Systems

    Windows 11 is planning a huge redesign of its Start Menu

    Operating Systems

    Skywings Marketing: Premier Digital Marketing Company in Laxmi Nagar

    Web Development

    Monitor not paper-y enough? A 25-inch color E Ink monitor just dropped for *checks notes* $1,900

    News & Updates

    Highlights

    Orthodontist Orange County

    July 22, 2025

    Post Content Source: Read More 

    The best Minecraft game is free on Amazon Prime

    April 5, 2025

    PlayStation kicks off Summer Sale with massive discounts on your favorite titles

    July 16, 2025

    CVE-2025-4055 – WordPress Multiple Post Type Order Stored Cross-Site Scripting

    May 6, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.