Large language models (LLMs) are trained to reflect a broad range of voices, which leads to outputs that match no single perspective particularly well. To steer them away from generic responses, one can adapt them with supervised fine-tuning (SFT) or reinforcement learning from human feedback (RLHF). However, these methods require large datasets, making them impractical for new and specific tasks. Moreover, there is often a mismatch between the universal style instilled through instruction and preference tuning and the style a specific application actually needs. This mismatch results in LLM outputs that feel generic and lack a distinctive voice.
Several lines of work attempt to address these challenges. The first is LLM prompting and preference fine-tuning: models trained on massive datasets can perform well with careful prompting, but prompt design is difficult and sensitive to small variations, so practitioners often fall back on fine-tuning with large datasets and RLHF. Another strategy is self-improvement, in which iterative sampling is used to enhance the model; methods like STaR, for example, supervise themselves by verifying the correctness of their own outputs. Finally, online imitation learning can improve a policy beyond the demonstrator's performance, but these approaches typically require learning a reward function and are not directly applicable to LLMs.
Researchers from Stanford University have introduced Demonstration ITerated Task Optimization (DITTO), a method that aligns language model outputs directly with a user's demonstrated behaviors. It draws on ideas from online imitation learning and can generate online comparison data at low cost. To generate these data, DITTO ranks the user's demonstrations above outputs from the LLM and its intermediate checkpoints. In evaluations, DITTO's win rates exceed those of few-shot prompting, supervised fine-tuning, and other self-play methods by an average of 19 percentage points, offering a practical way to customize LLMs using direct feedback from demonstrations.
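As a rough illustration of the comparison data this ranking implies (not the authors' code), the sketch below treats a user's demonstration as preferred over any model sample, and samples from later checkpoints as preferred over those from earlier ones. The `Completion` class and `build_comparisons` helper are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from itertools import combinations
import random

@dataclass
class Completion:
    prompt: str
    text: str
    source: int  # -1 marks the user's demonstration; otherwise the checkpoint index that sampled it

def rank(c: Completion) -> float:
    # Demonstrations outrank every model sample; later checkpoints outrank earlier ones.
    return float("inf") if c.source == -1 else float(c.source)

def build_comparisons(completions: list[Completion], num_pairs: int = 64) -> list[tuple[Completion, Completion]]:
    """Turn a demonstration plus checkpoint samples for one prompt into (chosen, rejected) pairs."""
    pairs = []
    for a, b in combinations(completions, 2):
        if rank(a) == rank(b):
            continue  # samples from the same policy carry no preference signal
        chosen, rejected = (a, b) if rank(a) > rank(b) else (b, a)
        pairs.append((chosen, rejected))
    random.shuffle(pairs)
    return pairs[:num_pairs]
```

For instance, given one demonstration and one sample each from checkpoints 0 and 1, this would yield pairs such as (demonstration, checkpoint-1 sample) and (checkpoint-1 sample, checkpoint-0 sample).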
DITTO learns fine-grained style and task alignment across domains such as news articles, emails, and blog posts. It is an iterative process with three components: (a) supervised fine-tuning is run on the set of expert demonstrations for a limited number of gradient steps; (b) a new comparison dataset is constructed during training by sampling completions for each demonstration and adding them to the ranking over policies; and (c) the policy is updated with an RLHF-style preference-optimization step on batches sampled through the process above.
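The paper frames step (c) as an RLHF-style preference update; a DPO-style objective on the (chosen, rejected) pairs from step (b) is one common way to implement such an update. Below is a minimal PyTorch sketch of that kind of loss, assuming sequence-level log-probabilities have already been computed under the current policy and a frozen reference model; the function name and `beta` value are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def preference_loss(policy_chosen_logps: torch.Tensor,
                    policy_rejected_logps: torch.Tensor,
                    ref_chosen_logps: torch.Tensor,
                    ref_rejected_logps: torch.Tensor,
                    beta: float = 0.1) -> torch.Tensor:
    """DPO-style loss on a batch of comparisons: 'chosen' completions are demonstrations
    (or samples from a later checkpoint) and 'rejected' ones are earlier policy samples."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the policy to assign relatively higher likelihood to the chosen completions.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with random log-probabilities standing in for real model scores.
batch = 8
loss = preference_loss(torch.randn(batch), torch.randn(batch),
                       torch.randn(batch), torch.randn(batch))
print(loss.item())
```

In an outer loop, one would alternate this update with fresh sampling from the current policy, growing the pool of checkpoints whose outputs the demonstrations are ranked above.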
DITTO's results are evaluated with GPT-4 as the judge and averaged across all authors, where it outperforms all baselines with an average win rate of 77.09% across CMCC (71.67%) and CCAT50 (82.50%). This is an average improvement of 11.7 percentage points in win rate over SFT, which serves as a strong baseline (56.78% on CMCC, 73.89% on CCAT50). Further, in the user study, DITTO outperforms the baseline methods: DITTO (72.1% win rate) > SFT (60.1%) > few-shot (48.1%) > self-prompt (44.2%) > zero-shot (25.0%). Notably, self-prompting performs slightly worse than providing examples in a few-shot prompt and clearly underperforms DITTO.
In conclusion, researchers from Stanford University have introduced Demonstration ITerated Task Optimization (DITTO), a method that aligns language model outputs directly with a user's demonstrated behaviors and generates online comparison data from those demonstrations. The researchers highlight the value of using demonstrations as feedback and show that even a small number of demonstrated behaviors can provide a strong signal of an individual's specific preferences. However, other model sizes were not tested because of computational cost, and additional analysis is needed on the types of preference data required, leaving room for future work in this domain.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.