    Qwen Releases the Qwen2.5-VL-32B-Instruct: A 32B Parameter VLM that Surpasses Qwen2.5-VL-72B and Other Models like GPT-4o Mini

    March 25, 2025

    Vision-language models (VLMs) let machines interpret and reason over visual and textual data together, and they have become core tools in applied AI. A persistent challenge is balancing model performance against computational cost, especially when deploying large models in resource-limited settings.

    Qwen has released Qwen2.5-VL-32B-Instruct, a 32-billion-parameter VLM that is reported to surpass the larger Qwen2.5-VL-72B and other models such as GPT-4o Mini in the evaluations summarized below. The model is released under the Apache 2.0 license, which reflects a commitment to open-source collaboration and addresses the need for high-performing yet computationally manageable models.
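    To make "computationally manageable" concrete, the following is a rough sketch of weight storage alone for the two model sizes. It assumes nominal parameter counts of 32 billion and 72 billion and standard bytes-per-parameter figures for each precision; the function and table names are illustrative, and activations, KV cache, and the vision encoder's runtime overhead are deliberately ignored.

```python
# Back-of-the-envelope weight memory. Assumptions (not from the article):
# nominal parameter counts, weights only, 1 GB = 1e9 bytes.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

MODELS = {
    "Qwen2.5-VL-32B": 32e9,  # nominal parameter count
    "Qwen2.5-VL-72B": 72e9,  # nominal parameter count
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate storage needed for the weights alone, in GB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def memory_table() -> dict:
    """Map each model name to its weight footprint per precision."""
    return {
        name: {dtype: weight_memory_gb(n, dtype) for dtype in BYTES_PER_PARAM}
        for name, n in MODELS.items()
    }


if __name__ == "__main__":
    for name, row in memory_table().items():
        cells = ", ".join(f"{dtype}: {gb:.0f} GB" for dtype, gb in row.items())
        print(f"{name}: {cells}")
```

    Under these assumptions the 32B weights need less than half the storage of the 72B weights at any given precision, which is the practical appeal of a smaller model with comparable scores.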

    The Qwen2.5-VL-32B-Instruct model offers several capabilities:

    • Visual Understanding: The model recognizes objects and analyzes text, charts, icons, graphics, and layouts within images.
    • Agent Capabilities: It functions as a visual agent that can reason and direct tools for computer and phone interactions.
    • Video Comprehension: The model can understand videos over an hour long and pinpoint relevant segments, demonstrating temporal localization.
    • Object Localization: It identifies objects in images by generating bounding boxes or points, and provides stable JSON outputs for coordinates and attributes.
    • Structured Output Generation: The model supports structured outputs for data such as invoices, forms, and tables, which benefits applications in finance and commerce.

    Together, these features make the model applicable to domains that require multimodal understanding, such as document processing, interface automation, and video analysis.
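    As an illustration of consuming the JSON localization output described above, here is a minimal sketch that parses a model reply into bounding boxes. The reply format is an assumption: a JSON list of objects with `bbox_2d` (pixel corners `x1, y1, x2, y2`) and `label` keys, optionally wrapped in a markdown code fence. The `Detection` class and `parse_detections` helper are hypothetical names, not part of any Qwen API.

```python
import json
import re
from dataclasses import dataclass

FENCE = "`" * 3  # markdown code-fence marker, built programmatically


@dataclass(frozen=True)
class Detection:
    label: str
    box: tuple  # (left, top, right, bottom) in pixels


def parse_detections(reply: str, width: int, height: int) -> list:
    """Parse an assumed JSON reply into boxes clamped to the image bounds."""
    # Unwrap an optional fenced block; otherwise treat the reply as raw JSON.
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply, flags=re.DOTALL)
    payload = match.group(1) if match else reply

    detections = []
    for item in json.loads(payload):
        x1, y1, x2, y2 = (float(v) for v in item["bbox_2d"])
        # Normalise corner order, then clamp to the image rectangle.
        left, right = sorted((x1, x2))
        top, bottom = sorted((y1, y2))
        left, right = max(0.0, left), min(float(width), right)
        top, bottom = max(0.0, top), min(float(height), bottom)
        # Drop boxes that end up empty after clamping.
        if right > left and bottom > top:
            label = item.get("label", "")
            detections.append(Detection(label, (left, top, right, bottom)))
    return detections


# Hypothetical reply for a 640x480 image; not real model output.
SAMPLE_REPLY = (
    FENCE + "json\n"
    '[{"bbox_2d": [10, 20, 110, 220], "label": "cat"},'
    ' {"bbox_2d": [300, 50, 700, 400], "label": "dog"},'
    ' {"bbox_2d": [700, 10, 800, 20], "label": "off-image"}]\n'
    + FENCE
)

if __name__ == "__main__":
    for det in parse_detections(SAMPLE_REPLY, width=640, height=480):
        print(det)
```

    Clamping and dropping empty boxes is a defensive choice: even when a model's JSON is stable, coordinates can fall slightly outside the image, and downstream cropping code usually expects valid rectangles.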

    Reported evaluations highlight the model’s strengths:

    • Vision Tasks: On the Massive Multi-discipline Multimodal Understanding (MMMU) benchmark, the model scored 70.0, surpassing Qwen2-VL-72B’s 64.5. On MathVista, it achieved 74.7 compared with the earlier model’s 70.5. On OCRBenchV2, it scored 57.2/59.1, well above the prior 47.8/46.1. On Android Control tasks, it achieved 69.6/93.3, exceeding the previous 66.4/84.4.
    • Text Tasks: The model scored 78.4 on MMLU, 82.2 on MATH, and 91.5 on HumanEval, outperforming models such as GPT-4o Mini in certain areas.

    These results indicate balanced proficiency across both vision and text tasks.
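    The size of each reported improvement can be checked with a short calculation. The sketch below uses only the vision scores quoted above; because the article does not name the two sub-scores in the paired results, they are labeled simply "first" and "second" here, and the helper names are illustrative.

```python
# (new model score, earlier baseline score) pairs quoted in the article.
# Paired results are split into "first"/"second" because the article
# does not name the sub-metrics.
VISION_SCORES = {
    "MMMU": (70.0, 64.5),
    "MathVista": (74.7, 70.5),
    "OCRBenchV2 (first)": (57.2, 47.8),
    "OCRBenchV2 (second)": (59.1, 46.1),
    "Android Control (first)": (69.6, 66.4),
    "Android Control (second)": (93.3, 84.4),
}


def absolute_gains(scores: dict) -> dict:
    """Point difference between the new score and the baseline, per benchmark."""
    return {name: round(new - old, 1) for name, (new, old) in scores.items()}


def largest_gain(scores: dict) -> str:
    """Name of the benchmark with the biggest point improvement."""
    gains = absolute_gains(scores)
    return max(gains, key=gains.get)


if __name__ == "__main__":
    for name, gain in absolute_gains(VISION_SCORES).items():
        print(f"{name}: +{gain} points")
    print("Largest gain:", largest_gain(VISION_SCORES))
```

    Point differences across different benchmarks are not directly comparable, so this is only a quick way to see where the quoted numbers moved the most.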

    In conclusion, Qwen2.5-VL-32B-Instruct is a notable advance in vision-language modeling that pairs strong benchmark performance with a smaller parameter count. Its availability under the Apache 2.0 license lets the AI community explore, adapt, and build on the model, which could accelerate its adoption across sectors.


    All credit for this research goes to the researchers of this project.

    The post Qwen Releases the Qwen2.5-VL-32B-Instruct: A 32B Parameter VLM that Surpasses Qwen2.5-VL-72B and Other Models like GPT-4o Mini appeared first on MarkTechPost.
