    Tiny Models, Big Reasoning Gains: USC Researchers Introduce Tina for Cost-Effective Reinforcement Learning with LoRA

    April 28, 2025

    Achieving strong, multi-step reasoning in language models (LMs) remains a major challenge, despite notable progress in general task performance. Such reasoning is crucial for complex problem-solving domains, such as scientific research and strategic planning. Traditionally, enhancing reasoning skills involves supervised fine-tuning (SFT), where models learn by imitating step-by-step reasoning demonstrations from more advanced models, such as o1. While effective, this method depends heavily on high-quality reasoning traces, which are costly to obtain, and it risks promoting shallow mimicry over genuine logical exploration. Reinforcement learning (RL) offers an alternative by enabling models to learn directly from reward signals, encouraging broader reasoning exploration. However, RL approaches are often resource-heavy and complex, raising the question of how to build reasoning-capable models cost-effectively.

    Following the release of strong models like o1-preview, several open-source efforts such as STILL, Sky-T1, SimpleRL, PRIME, and DeepScaleR have explored efficient strategies to replicate or surpass o1’s reasoning capabilities. Techniques include lightweight imitation learning, scalable instruction tuning, and simplified RL methods. Meanwhile, newer innovations, such as Group Relative Policy Optimization (GRPO), enhance RL training efficiency by eliminating the need for separate value networks, as seen in models like DeepSeek-R1. To further lower training costs, researchers are also investigating Low-Rank Adaptation (LoRA) methods, which freeze the base weights and train only small low-rank adapter matrices, maintaining modularity while preserving reasoning ability. This approach enables efficient fine-tuning without the computational demands of full-parameter updates.
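    To make the "no separate value network" point concrete, here is a minimal sketch of the group-relative advantage idea behind GRPO: several completions are sampled for the same prompt, and each reward is normalized against the group's own mean and standard deviation. The function name, the `eps` constant, and the use of the population standard deviation are illustrative assumptions, not details taken from the article or from any particular codebase.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    rewards: rewards for several completions of the SAME prompt.
    eps: small constant (an assumption) that avoids division by zero
         when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt: two correct (1.0), two wrong (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # correct answers get positive values, wrong ones negative
```

    Because the baseline is the group mean rather than a learned critic, the only extra cost is sampling several completions per prompt.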

    Researchers from the University of Southern California introduce Tina, a family of compact reasoning models that achieve strong performance with minimal cost. Using RL enhanced by LoRA on a 1.5B parameter base model, Tina models outperform or match state-of-the-art models at a fraction of the computational expense. Their best model improves reasoning performance by over 20% and achieves 43.33% Pass@1 on AIME24, with a post-training cost of just $9. By leveraging LoRA’s efficiency to adapt reasoning formats while preserving base knowledge, Tina highlights a highly accessible, cost-effective approach, with all resources fully open-sourced.

    Tina is a family of tiny reasoning models built by post-training the DeepSeek-R1-Distill-Qwen-1.5B model using LoRA during reinforcement learning with a GRPO-style approach. The framework emphasizes minimalism: tiny models, small parameter updates, and a low hardware and budget footprint. Tina models were trained using public datasets and replicated setups from models like STILL-3, DeepScaleR, and Open-RS. Training relied on the OpenR1 codebase, minimal hyperparameter tuning, and just two NVIDIA L40S GPUs (or, occasionally, RTX 6000 Ada GPUs). Training and evaluation costs were low, averaging well under a $100 budget per experiment, making Tina a highly accessible platform for reasoning research.
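    The "small parameter updates" that LoRA provides can be sketched in a few lines. In the standard LoRA formulation, a frozen weight matrix W is augmented with a trainable low-rank product B·A scaled by alpha/rank, and B starts at zero so the adapted layer initially matches the base layer. The layer sizes, rank, and helper names below are hypothetical values chosen for illustration; they are not Tina's actual configuration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / rank) * B (A x).

    W: frozen base weights, shape d_out x d_in.
    A: trainable down-projection, shape rank x d_in.
    B: trainable up-projection, shape d_out x rank.
    """
    rank = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def trainable_fraction(d_in, d_out, rank):
    """Share of a layer's parameters that LoRA trains: r(d_in + d_out) / (d_in d_out)."""
    return rank * (d_in + d_out) / (d_in * d_out)


W = [[1.0, 2.0], [3.0, 4.0]]
A = [[1.0, 1.0]]           # rank 1
B_zero = [[0.0], [0.0]]    # zero init: adapter contributes nothing yet
print(lora_forward(W, A, B_zero, [1.0, 1.0], alpha=1.0))  # [3.0, 7.0]

# Hypothetical 1024 x 1024 layer with rank 16.
print(trainable_fraction(1024, 1024, 16))  # 0.03125
```

    In this hypothetical layer only about 3% of the weights receive gradients, which is the kind of saving that makes low-budget RL runs plausible.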

    To ensure fair comparisons, the authors reevaluated baseline reasoning models using a consistent setup with the LightEval framework and vLLM engine, removing discrepancies caused by the differing evaluation setups of previous studies. Six reasoning benchmarks were used: AIME 24, AIME 25, AMC 23, MATH 500, GPQA, and Minerva. They then evaluated Tina models (small, LoRA-trained versions of the baseline models) and showed that Tina models often outperformed their full-parameter counterparts despite using minimal training (19–57% of an epoch). Further ablation studies revealed that smaller, high-quality datasets, appropriate learning rates, moderate LoRA ranks, and careful choice of RL algorithm significantly impacted performance, confirming the efficiency and robustness of their LoRA-based reasoning approach.
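    The Pass@1 figures reported in this article can be read through the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them are correct. The sketch below is a generic illustration, not the LightEval implementation; the per-problem results are made up, and the 30-problem size assumed for AIME24 is our assumption rather than something stated in the article.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased estimate that at least one of k samples is correct,
    given n samples of which c were correct."""
    if n - c < k:
        return 1.0  # too few wrong samples to fill k draws
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results):
    """Average pass@1 over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)


# Hypothetical run: one sample per problem, 13 of 30 problems solved.
results = [(1, 1)] * 13 + [(1, 0)] * 17
print(round(100 * benchmark_pass_at_1(results), 2))  # 43.33
```

    Under that 30-problem assumption, a 43.33% Pass@1 corresponds to 13 problems solved on average.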

    In conclusion, Tina is a series of lightweight reasoning models that achieve strong performance using minimal computational resources. By applying LoRA during RL on a 1.5B-parameter base model, the authors achieve reasoning abilities competitive with larger state-of-the-art models at a post-training cost of just $9. Tina models show over a 20% improvement in reasoning and 43.33% Pass@1 accuracy on AIME24. Despite this impressive cost-performance efficiency, limitations remain, including the smaller model scale, limited diversity in reasoning tasks, and minimal hyperparameter tuning. All code, logs, and model checkpoints are open-sourced to promote accessible research and further exploration.



    The post Tiny Models, Big Reasoning Gains: USC Researchers Introduce Tina for Cost-Effective Reinforcement Learning with LoRA appeared first on MarkTechPost.
