
    GenAI-Arena: An Open Platform for Community-Based Evaluation of Generative AI Models

    June 13, 2024

Generative AI has made remarkable progress in fields like image and video generation, driven by innovative algorithms, architectures, and data. However, the rapid proliferation of generative models has exposed a critical gap: the absence of trustworthy evaluation metrics. Current automatic assessments such as FID, CLIPScore, and FVD often fail to capture the nuanced quality and user satisfaction associated with generative outputs. While image generation and manipulation technologies have advanced rapidly, enabling applications across domains like art, visual enhancement, and medical imaging, navigating the multitude of available models and assessing their performance remains challenging. Traditional metrics like PSNR, SSIM, LPIPS, and FID each capture one narrow aspect of visual content generation, and they fall short of a comprehensive evaluation of overall model performance, especially for subjective qualities like aesthetics and user satisfaction.
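To make the narrowness of such metrics concrete, here is a minimal sketch of PSNR, one of the simplest of them. It reduces image quality to pixel-wise mean squared error, which is precisely why it can say nothing about aesthetics or user preference. The function name and toy images below are illustrative, not from the paper.

```python
import numpy as np

def psnr(reference, test, max_val=255.0):
    """Peak signal-to-noise ratio between two images (higher is better)."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.zeros((8, 8), dtype=np.uint8)
noisy = ref.copy()
noisy[0, 0] = 16  # a single corrupted pixel
print(round(psnr(ref, noisy), 2))  # prints 42.11
```

A pair of images with identical pixel statistics but very different perceptual quality would score the same here, which is the gap metrics like LPIPS, and ultimately human voting, try to close.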

Numerous methods have been proposed to evaluate multimodal generative models across various aspects. For image generation, CLIPScore measures text alignment, while IS, FID, PSNR, SSIM, and LPIPS assess image fidelity and perceptual similarity. Recent works use multimodal large language models (MLLMs) as judges: T2I-CompBench uses miniGPT4, TIFA adapts visual question answering, and VIEScore reports on the potential of MLLMs to replace human judges. For video generation, FVD measures frame coherence and quality, while CLIPSIM leverages image-text similarity models. However, these automatic metrics still lag behind human preferences, and their low correlation with human judgments raises doubts about their reliability. Evaluation platforms aim to rank models systematically, with benchmark suites like T2I-CompBench, HEIM, and ImagenHub for images, and VBench and EvalCrafter for videos. Despite their functionality, these benchmarks rely on model-based metrics that are less reliable than human evaluation. Model arenas have emerged to collect direct human preferences for ranking, but no existing arena focuses specifically on generative AI models.
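As an illustration of how a model-based metric like FID works under the hood, the sketch below computes the Fréchet distance between two Gaussians. In the actual metric, the means and covariances are estimated from Inception-network features of real and generated image sets; this is an assumed minimal implementation, not code from the paper.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two Gaussians, the core of FID."""
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerical noise
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Identical distributions score 0; shifting one mean raises the distance.
mu, sigma = np.zeros(2), np.eye(2)
print(frechet_distance(mu, sigma, mu, sigma))
print(frechet_distance(mu, sigma, np.array([1.0, 0.0]), sigma))
```

The distance depends only on the feature distributions, not on any individual sample, which is one reason FID can disagree with per-image human preferences.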

The researchers from the University of Waterloo have introduced GenAI-Arena, a robust platform designed to enable fair evaluation of generative AI models. Inspired by successful arena implementations in other domains, GenAI-Arena offers a dynamic, interactive platform where users can generate images, compare them side by side, and vote for their preferred models. This simplifies the comparison of different models and provides a ranking system that reflects human preferences, offering a more holistic evaluation of model capabilities. GenAI-Arena is the first evaluation platform with comprehensive evaluation capabilities across multiple properties, supporting text-to-image generation, text-guided image editing, and text-to-video generation, along with a public voting process that ensures labeling transparency; the votes are also used to assess the judging ability of MLLM evaluators. The platform has collected over 6,000 votes across the three multimodal generative tasks and has constructed a leaderboard for each task, identifying the state-of-the-art models.

GenAI-Arena supports text-to-image generation, image editing, and text-to-video generation, with features including anonymous side-by-side voting, a battle playground, a direct generation tab, and leaderboards. The platform standardizes model inference with fixed hyperparameters and prompts for fair comparison, and it enforces unbiased voting through anonymity: users vote on their preferred output between two anonymously generated results, and the votes are used to compute Elo rankings. This architecture allows a democratic, accurate assessment of model performance across multiple tasks.
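A minimal sketch of the kind of Elo update such a voting system might apply after each anonymous A-vs-B vote. The K-factor and tie handling here are common defaults and are assumptions; the paper does not specify its exact Elo variant.

```python
def elo_update(rating_a, rating_b, winner, k=32.0):
    """Update two model ratings after one pairwise vote.

    winner: 'a', 'b', or 'tie'. Returns the new (rating_a, rating_b).
    """
    # Expected score of A under the logistic Elo model.
    expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
    score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Two equally rated models: a win for A moves 16 points at k=32.
print(elo_update(1000.0, 1000.0, "a"))  # prints (1016.0, 984.0)
```

Because each vote compares anonymous outputs, the resulting ratings reflect output quality rather than brand recognition of the underlying model.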

The researchers report the leaderboard rankings at the time of writing. For image generation (4,443 votes), the Playground V2.5 and Playground V2 models from Playground.ai top the ranks; they share the SDXL architecture but were trained on a private dataset, and they significantly outperform the 7th-ranked SDXL, which highlights the importance of training data. StableCascade, which uses an efficient cascade architecture, ranks next and beats SDXL despite incurring only 10% of SD-2.1's training cost, underscoring the importance of diffusion architecture. For image editing (1,083 votes), MagicBrush, InFEdit, CosXLEdit, and InstructPix2Pix, which enable localized edits, rank higher, while older methods like Prompt-to-Prompt rank lower because they produce completely different images despite high-quality outputs. In text-to-video (1,568 votes), T2VTurbo leads with the highest Elo score, followed closely by StableVideoDiffusion, VideoCrafter2, and AnimateDiff, with LaVie, OpenSora, and ModelScope trailing in decreasing order of performance.

In summary, the study introduces GenAI-Arena, an open platform driven by community voting, to rank generative models across text-to-image, image editing, and text-to-video tasks according to user preferences. More than 6,000 votes collected from February to June 2024 were used to compile Elo leaderboards and identify state-of-the-art models, while analysis of the voting data revealed potential biases. The high-quality human preference data has been released as GenAI-Bench, which exposes the poor correlation of existing multimodal language models with human judgments on generated content quality and other aspects.

Check out the Paper and HF Page. All credit for this research goes to the researchers of this project.


    The post GenAI-Arena: An Open Platform for Community-Based Evaluation of Generative AI Models appeared first on MarkTechPost.

