
    Optimizing Training Data Allocation Between Supervised and Preference Finetuning in Large Language Models

    February 24, 2025

    Large Language Models (LLMs) face significant challenges in optimizing their post-training methods, particularly in balancing Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) approaches. While SFT uses direct instruction-response pairs and RL methods like RLHF use preference-based learning, the optimal allocation of limited training resources between these approaches remains unclear. Recent studies have shown that models can achieve task alignment and improved reasoning capabilities without extensive SFT, challenging traditional sequential post-training pipelines. Moreover, the substantial cost of collecting and annotating human data compared to compute costs creates a need to understand the effectiveness of different training methods under fixed data-annotation budgets.
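The two objectives being traded off can be sketched in a few lines. This is a toy illustration under stated assumptions, not any paper's implementation: the log-probabilities and the `beta` value are made-up numbers, and DPO stands in for the broader family of preference-based methods.

```python
import math

def sft_loss(token_logprobs):
    # Supervised fine-tuning: minimize the mean negative log-likelihood
    # of the annotated reference response's tokens.
    return -sum(token_logprobs) / len(token_logprobs)

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Direct Preference Optimization: grow the policy's log-prob margin
    # between chosen and rejected responses beyond the frozen reference
    # model's margin; the loss is -log sigmoid(beta * margin).
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return math.log(1.0 + math.exp(-beta * margin))

# Toy token log-probs (SFT) and summed response log-probs (DPO).
print(sft_loss([-1.2, -0.8, -2.0]))
print(dpo_loss(-10.0, -14.0, -11.0, -13.5))
```

Note that SFT consumes one annotated response per prompt, while DPO consumes a (chosen, rejected) pair plus a reference model, which is exactly why a fixed annotation budget forces a choice between the two.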

Existing research has explored various trade-offs in language model training under fixed budgets, including pretraining versus finetuning and finetuning versus model distillation. Studies have examined the data and compute costs of SFT and RL methods in isolation, along with cost-efficiency considerations in generating human versus synthetic data. While some research shows the effects of high-quality preference data on RL methods like Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO), other studies focus on the relationship between SFT and RL methods regarding model forgetfulness, generalization, and alignment. However, these studies fail to address optimal resource allocation between SFT and RL-based approaches under strict data-annotation constraints.

Researchers from the Georgia Institute of Technology have conducted a comprehensive study examining the optimal allocation of training data budgets between SFT and Preference Finetuning (PFT) in LLMs. The study investigates this relationship across four diverse tasks, multiple model sizes, and various data annotation costs. It addresses the “cold start problem” in mathematical tasks, where eliminating SFT leads to suboptimal performance due to distribution shifts when applying DPO directly to the base model. Their findings suggest that while larger data budgets benefit from combining both methods, allocating even a small portion of the budget to SFT can significantly improve performance on analytical tasks.
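The cold-start mitigation amounts to a simple two-stage schedule. A hypothetical sketch, assuming (as the article later notes) that even a roughly 10% SFT slice suffices; the function name and stage labels are illustrative, not the study's code:

```python
def two_stage_schedule(budget, sft_fraction=0.10):
    # Spend a small slice of the annotation budget on SFT first, so the
    # reference model's response style matches the task before DPO, then
    # spend the remainder on preference pairs.
    n_sft = round(budget * sft_fraction)
    stages = []
    if n_sft:
        stages.append(("sft", n_sft))            # instruction-response pairs
    if budget - n_sft:
        stages.append(("dpo", budget - n_sft))   # (chosen, rejected) pairs
    return stages

print(two_stage_schedule(5_000))  # -> [('sft', 500), ('dpo', 4500)]
```

Setting `sft_fraction=0.0` reproduces the failure mode the study observes on mathematical tasks: DPO is applied directly to the base model with no SFT warm-up.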

The study evaluates the cost-effectiveness and optimal resource allocation between SFT and PFT in post-training LLMs under 10 billion parameters. The research methodology measures data budgets through training examples or monetary annotation costs, assuming equal labor costs for both methods and the availability of training prompts. The experimental setup begins with no task-specific labeled data, using open-source or synthetically curated datasets for each target task. To maintain focus on task-specific improvements, general-purpose conversational datasets commonly used in PFT, such as UltraFeedback and Chatbot Arena preferences, are excluded. This controlled approach allows for precise measurement of performance improvements resulting from targeted data annotation.

    The results reveal that optimal allocation of the training budget between SFT and PFT methods proves crucial, with properly balanced datasets outperforming suboptimally allocated datasets 2-5 times larger in size. Using 5K examples with 25% SFT allocation for tasks like Summarization, Helpfulness, and Grade School Math matches the performance of 20K examples with 75% SFT allocation. The study identifies that pure SFT excels in low-data scenarios, while larger data budgets benefit from higher proportions of preference data. Moreover, direct preference finetuning on base models shows limited success in mathematical tasks, and allocating even a small portion to SFT significantly improves performance by better aligning the reference model’s response style.
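As a concrete sanity check on those numbers: the arithmetic below only reproduces the size of each split under the study's equal-cost assumption; the performance parity between the two configurations is the study's empirical finding, not something the arithmetic proves.

```python
def allocate(total_examples, sft_fraction):
    # Split a fixed annotation budget between SFT examples and preference
    # pairs, assuming equal per-example annotation cost for both data types.
    n_sft = round(total_examples * sft_fraction)
    return n_sft, total_examples - n_sft

small = allocate(5_000, 0.25)    # the well-balanced 5K-example budget
large = allocate(20_000, 0.75)   # the 4x larger, SFT-heavy budget it matches
print(small, large)  # -> (1250, 3750) (15000, 5000)
```

So 1,250 SFT examples plus 3,750 preference pairs match a configuration using twelve times as much SFT data, underscoring how much the allocation ratio, not just the raw budget, drives performance.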


In conclusion, this paper provides crucial insights into optimizing LLM post-training under resource constraints, particularly regarding the interplay between SFT and PFT. The study identifies a significant “cold-start problem” when applying PFT directly to base models, which can be mitigated effectively by allocating even 10% of the budget to initial SFT. However, the research acknowledges limitations, including the use of offline methods like DPO and KTO for the RL implementation, and potential biases from using GPT-4 for synthetic data generation and evaluation. Moreover, the model size is limited to 10 billion parameters, since running thousands of finetuning runs with larger models, such as those with 70B parameters, would be extremely compute-intensive.


Check out the Paper. All credit for this research goes to the researchers of this project.


    The post Optimizing Training Data Allocation Between Supervised and Preference Finetuning in Large Language Models appeared first on MarkTechPost.
