    This AI Paper Introduces Diversified DPO and ORPO: Post-Training Methods to Boost Output Diversity in Creative Writing with LLMs

    March 31, 2025

Creative writing is a domain that thrives on diversity and imagination. Unlike fact-based or task-specific writing, where a single correct output may exist, creative writing admits many valid responses to a prompt. Stories, poems, and narratives can branch in countless directions, each with its own stylistic flavor and meaning. This inherent open-endedness makes creative writing a prime challenge for AI systems, which must maintain narrative coherence while producing novel and distinct outputs.

    The core issue lies in how large language models are refined after their initial training. Post-training methods often emphasize quality improvements by aligning responses with user preferences or maximizing reward scores. However, these adjustments inadvertently cause the models to produce responses that are too similar across prompts. In creative settings, this leads to a noticeable drop in output diversity. A lack of variation limits the expressive power of the model, resulting in uniform storylines or similar sentence constructions even when prompts are vastly different.

Earlier solutions attempted to address this by tweaking decoding methods or prompt strategies. Researchers used sampling-temperature adjustment, top-k or top-p filtering, and iterative prompting to introduce randomness, and some explored beam-search modifications or self-critiquing to encourage alternative responses. While these helped diversify outputs, they often came at a cost: sacrificing overall response quality, increasing generation time, or introducing inconsistencies in tone and grammar. More crucially, they did not adapt the model's core training process to learn from diverse samples.
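To make those decoding-level knobs concrete, here is a minimal sketch using Hugging Face transformers; the model name is a placeholder, and these settings trade quality for variety in exactly the way described above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM works for this illustration.
name = "mistralai/Mistral-7B-v0.1"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tok("Write the opening line of a ghost story.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,          # stochastic decoding instead of greedy search
    temperature=1.2,         # >1 flattens the distribution for more variety
    top_p=0.95,              # nucleus (top-p) filtering
    num_return_sequences=4,  # several alternative continuations per prompt
    max_new_tokens=40,
)
for seq in outputs:
    print(tok.decode(seq, skip_special_tokens=True))
```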

Researchers from Midjourney and New York University proposed a novel adjustment to the post-training phase. They introduced "Diversified DPO" and "Diversified ORPO," enhanced versions of two popular preference-based optimization techniques. Their key innovation is a deviation score that quantifies how much a training example differs from other responses to the same prompt; using this score to weight the training loss gives rare, diverse responses more influence during learning. The researchers implemented these strategies on models such as Meta's Llama-3.1-8B and Mistral-7B, using parameter-efficient fine-tuning via LoRA.
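The paper's exact objective is not reproduced here, but a minimal PyTorch sketch of the core idea, a DPO loss in which each preference pair is weighted by a deviation score, might look like the following; the function name and the simple multiplicative weighting are illustrative assumptions, not the authors' verbatim formulation.

```python
import torch.nn.functional as F

def diversified_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                         ref_chosen_logps, ref_rejected_logps,
                         deviation, beta=0.1):
    """DPO loss with each pair weighted by the deviation of its chosen
    response (illustrative; the paper's weighting may differ in detail)."""
    # Implicit reward margins relative to the frozen reference model
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    margin = beta * (chosen_logratio - rejected_logratio)
    per_pair_loss = -F.logsigmoid(margin)      # standard DPO term
    return (deviation * per_pair_loss).mean()  # rare responses weigh more
```

In a training loop, `deviation` would hold precomputed per-pair scores of the kind described in the next paragraph.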

    In this approach, deviation acts as a learning signal. For every training pair of a better and worse response to a prompt, the deviation of the better response is computed using both semantic and stylistic embeddings. These embeddings measure not only content differences but also stylistic uniqueness between responses. The resulting score then influences how much that training pair contributes to the model’s weight updates. This method increases the likelihood that the model generates distinct yet high-quality outputs. The training used over 400,000 prompt-response pairs with Reddit upvotes as quality signals and introduced mixing methods to effectively balance semantic and style deviations.
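As a rough illustration of how a deviation score can be computed, the sketch below scores each response by its mean cosine distance to the other responses for the same prompt. The embedder is a stand-in: the paper mixes separate semantic and style embeddings, whereas this example uses a single sentence-transformers model for both roles.

```python
from sentence_transformers import SentenceTransformer

# Stand-in embedder; the paper uses distinct semantic and style models.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def deviation_scores(responses):
    """Mean cosine distance of each response to the other responses
    for the same prompt; higher means more distinct."""
    emb = encoder.encode(responses, normalize_embeddings=True)
    sims = emb @ emb.T                             # pairwise cosine similarity
    n = len(responses)
    mean_sim = (sims.sum(axis=1) - 1.0) / (n - 1)  # exclude self-similarity
    return 1.0 - mean_sim

scores = deviation_scores([
    "Once upon a time, a dragon guarded a library.",
    "Once upon a time, a dragon guarded a castle.",
    "The rain wrote its own story on the window.",
])
print(scores)  # the third, most distinct response scores highest
```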

Quantitative results demonstrated the success of the proposed method. The best-performing model, Llama-3.1-8B with Diversified DPO using both semantic and style deviation (DDPO-both), achieved nearly the same reward score as GPT-4o while significantly outperforming it in diversity. Specifically, the model's semantic diversity approached that of the human-written reference dataset, with style diversity slightly below it. In head-to-head human evaluations, 68% of reviewers preferred DDPO-both's outputs over GPT-4o's for quality, and 100% chose them as more diverse. Against the baseline DPO, DDPO-both held its own on quality, chosen 50% of the time, and was preferred for diversity 62% of the time. When fewer responses per prompt were available during training, slight drops in reward scores were mitigated by applying a minimum deviation threshold or by sampling higher-quality responses.

    This research highlighted a compelling solution to the diversity-quality trade-off in AI-generated creative writing. By emphasizing deviation in training, the researchers enabled models to value uniqueness without compromising coherence. The outcome is a model that delivers richer and more varied storytelling, marking a meaningful step forward in creative AI development.


Check out the Paper. All credit for this research goes to the researchers of this project.

Source: MarkTechPost
