
    This AI Paper Introduces Diversified DPO and ORPO: Post-Training Methods to Boost Output Diversity in Creative Writing with LLMs

    March 31, 2025

    Creative writing is a domain that thrives on diversity and imagination. Unlike fact-based or task-specific writing, where a single correct output may exist, creative writing admits numerous valid responses to a prompt. Stories, poems, and narratives can branch in countless directions, each with its own stylistic flavor and meaning. This inherent open-endedness makes creative writing a prime challenge for AI systems, which need to maintain narrative coherence while producing novel and distinct outputs.

    The core issue lies in how large language models are refined after their initial training. Post-training methods often emphasize quality improvements by aligning responses with user preferences or maximizing reward scores. However, these adjustments inadvertently cause the models to produce responses that are too similar across prompts. In creative settings, this leads to a noticeable drop in output diversity. A lack of variation limits the expressive power of the model, resulting in uniform storylines or similar sentence constructions even when prompts are vastly different.

    Earlier solutions attempted to address this by tweaking decoding methods or prompt strategies. Researchers used sampling temperature adjustment, top-k or top-p filtering, or iterative prompting to introduce randomness. Some explored methods such as beam search modifications or self-critiquing to encourage alternative responses. While these helped diversify outputs, they often came at a cost: sacrificing overall response quality, increasing generation time, or introducing inconsistencies in tone and grammar. More crucially, they did not adapt the model’s core training process to learn from diverse samples.
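    For context, these decoding-time controls are exposed directly by most generation APIs. Below is a minimal sketch using the Hugging Face Transformers generate interface; the model name, prompt, and parameter values are illustrative and are not taken from the paper.

    # Decoding-time diversity knobs: sampling temperature, top-k, and top-p (nucleus) filtering.
    # The model name and hyperparameter values below are illustrative only.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "mistralai/Mistral-7B-v0.1"  # any causal LM; chosen here only as an example
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    prompt = "Write the opening paragraph of a story about a lighthouse keeper."
    inputs = tokenizer(prompt, return_tensors="pt")

    # Higher temperature flattens the next-token distribution; top_k / top_p trim its tail.
    outputs = model.generate(
        **inputs,
        do_sample=True,
        temperature=1.2,
        top_k=50,
        top_p=0.9,
        num_return_sequences=4,
        max_new_tokens=80,
    )
    for seq in outputs:
        print(tokenizer.decode(seq, skip_special_tokens=True))

    As the paragraph above notes, these knobs only reshape the sampling distribution at inference time; they do not change what the model has learned to prefer.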

    Researchers from Midjourney and New York University proposed a novel adjustment during the post-training phase. They introduced “Diversified DPO” and “Diversified ORPO”, enhanced versions of two popular preference-based optimization techniques. Their innovation was to incorporate a deviation score that quantifies how much a training example differs from the other responses to the same prompt. By using this score to weight the training loss, rare and distinctive responses are given more importance during learning. The researchers implemented these strategies on models such as Meta’s Llama-3.1-8B and Mistral-7B, using parameter-efficient fine-tuning via LoRA.
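    To make the idea concrete, the weighting can be sketched as a per-pair multiplier on the standard DPO objective. The function below is a minimal illustration assuming per-sequence log-probabilities and a precomputed deviation score are already available; it is not the authors’ implementation, and the exact weighting scheme in the paper may differ.

    import torch.nn.functional as F

    def diversified_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                             ref_chosen_logps, ref_rejected_logps,
                             deviation, beta=0.1):
        # Sketch of a deviation-weighted DPO loss, not the authors' code.
        # All arguments are 1-D tensors of per-example (summed) log-probabilities,
        # except `deviation`, a per-example score for the chosen response.
        chosen_logratio = policy_chosen_logps - ref_chosen_logps
        rejected_logratio = policy_rejected_logps - ref_rejected_logps
        # Standard per-pair DPO term
        dpo_term = -F.logsigmoid(beta * (chosen_logratio - rejected_logratio))
        # Up-weight pairs whose preferred response deviates more from its peers
        return (deviation * dpo_term).mean()

    In plain DPO every pair contributes equally; here, pairs whose preferred response is unusual for its prompt pull harder on the weight update.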

    In this approach, deviation acts as a learning signal. For every training pair consisting of a better and a worse response to the same prompt, the deviation of the better response is computed using both semantic and stylistic embeddings. These embeddings capture not only content differences but also stylistic uniqueness between responses. The resulting score then determines how much that training pair contributes to the model’s weight updates, increasing the likelihood that the model generates distinct yet high-quality outputs. Training used over 400,000 prompt-response pairs, with Reddit upvotes as quality signals, and introduced mixing methods to balance semantic and style deviations effectively.
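    One way such a deviation score could be computed is sketched below, assuming cosine dissimilarity in embedding space as the distance measure. The sentence-transformers model is an arbitrary choice for the semantic side, style_embed_fn is a placeholder for whatever style-embedding model is used, and the mixing weight alpha is an assumed hyperparameter rather than the paper’s value.

    import numpy as np
    from sentence_transformers import SentenceTransformer

    semantic_model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative semantic encoder

    def deviation_scores(responses, style_embed_fn, alpha=0.5):
        # Deviation of each response from the other responses to the same prompt,
        # mixing a semantic and a style view. A sketch, not the paper's exact recipe.
        sem = semantic_model.encode(responses, normalize_embeddings=True)
        sty = np.asarray(style_embed_fn(responses))  # placeholder; rows assumed L2-normalized

        def mean_dissimilarity(emb):
            sims = emb @ emb.T                        # pairwise cosine similarities
            n = emb.shape[0]
            mean_sim_to_others = (sims.sum(axis=1) - 1.0) / (n - 1)
            return 1.0 - mean_sim_to_others           # higher = more distinct from peers

        return alpha * mean_dissimilarity(sem) + (1.0 - alpha) * mean_dissimilarity(sty)

    The resulting per-response scores can then be passed as the deviation weights in the loss sketched earlier.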

    Quantitative results demonstrated the success of the proposed method. The best-performing model, Llama-3.1-8B with Diversified DPO using semantic and style deviation (DDPO-both), achieved nearly the same reward score as GPT-4o while significantly outperforming it in diversity. Specifically, the model had semantic diversity approaching that of the human-crafted reference dataset and style diversity slightly below it. In head-to-head human evaluations, 68% of reviewers preferred DDPO-both’s outputs over GPT-4o’s for quality, and 100% chose them as more diverse. Compared to the baseline DPO, DDPO-both still came out ahead, selected 50% of the time for quality and 62% for diversity. When fewer responses per prompt were available during training, slight drops in reward scores were mitigated using a minimum deviation threshold or sampling higher-quality responses.

    This research highlighted a compelling solution to the diversity-quality trade-off in AI-generated creative writing. By emphasizing deviation in training, the researchers enabled models to value uniqueness without compromising coherence. The outcome is a model that delivers richer and more varied storytelling, marking a meaningful step forward in creative AI development.


    Check out the Paper. All credit for this research goes to the researchers of this project.
