
    Microsoft Releases Florence-2: A Novel Vision Foundation Model with a Unified, Prompt-based Representation for a Variety of Computer Vision and Vision-Language Tasks

    June 21, 2024

    Research on AGI systems has shifted markedly toward pretrained, adaptable representations valued for their task-agnostic benefits across applications. Natural language processing (NLP) is a clear example of this tendency: more sophisticated models adapt to new tasks and domains given only simple instructions. The success of NLP motivates a similar strategy in computer vision.

    One of the main obstacles to a universal representation for vision tasks is the need for broad perceptual ability. In contrast to NLP, computer vision works with complex visual data such as object locations, masked contours, and attributes, so a universal representation must handle many challenging tasks at once. Two hurdles stand out. First, comprehensive visual annotations are scarce, which makes it hard to build a foundation model that captures the subtleties of spatial hierarchy and semantic granularity. Second, computer vision currently lacks a unified pretraining framework that uses a single network architecture to integrate semantic granularity and spatial hierarchy seamlessly.

    A team of Microsoft researchers introduces Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks. It addresses both problems, the lack of a consistent architecture and the scarcity of comprehensive data, by using a single prompt-based representation for all vision tasks. Multitask learning requires high-quality annotated data at broad scale. To supply it, the team's data engine produces FLD-5B, a visual dataset with a total of 5.4B annotations for 126M images, a significant improvement over labor-intensive manual annotation. Instead of relying on a single person to annotate each image, as was done in the past, the first of the engine's two processing modules uses specialized models to annotate images automatically and collaboratively. When multiple models reach a consensus, the resulting interpretation of the image is more reliable and less biased, an idea reminiscent of the wisdom of crowds.
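
The consensus idea can be illustrated with a minimal sketch. This is not the FLD-5B data engine itself: the function name `consensus_label`, the `min_agreement` threshold, and the sample votes are all hypothetical, and a simple majority vote stands in for whatever merging strategy the researchers actually use.

```python
from collections import Counter
from typing import Optional


def consensus_label(votes: list[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the label most annotators agree on, or None if agreement is too weak."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Accept the label only when strictly more than `min_agreement` of the votes back it.
    return label if count / len(votes) > min_agreement else None


# Hypothetical labels proposed by three specialist models for two image regions.
region_votes = {
    "region_0": ["dog", "dog", "wolf"],
    "region_1": ["cat", "fox", "dog"],
}

merged = {region: consensus_label(votes) for region, votes in region_votes.items()}
print(merged)  # {'region_0': 'dog', 'region_1': None}
```

Regions without a clear majority come back as `None`, so in this sketch they could be dropped or passed to a later filtering step rather than kept with an unreliable label.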

    Florence-2 integrates an image encoder and a multi-modality encoder-decoder into a sequence-to-sequence (seq2seq) architecture, following the NLP community's goal of developing flexible models with a consistent framework. This architecture handles a variety of vision tasks without task-specific architectural changes. Because all annotations in the FLD-5B dataset are converted into textual outputs, the model can be trained with a unified multitask learning approach that optimizes the same loss function as the objective for every task. The result is a multi-purpose vision foundation model that can ground, caption, and detect objects using one model and one set of parameters, with the task selected by a text prompt.
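
To make the idea of turning every annotation into text concrete, here is a minimal sketch of serializing a bounding box as a string of location tokens and parsing it back. The 1000-bin quantization, the `<loc_N>` token format, the helper names `box_to_text` and `text_to_boxes`, and the prompt strings in `sample` are assumptions for illustration; the model's real tokenization may differ.

```python
import re

NUM_BINS = 1000  # assumed quantization resolution for image coordinates


def box_to_text(label: str, box: tuple[int, int, int, int], width: int, height: int) -> str:
    """Serialize a labelled box (x0, y0, x1, y1 in integer pixels) as plain text."""
    x0, y0, x1, y1 = box
    # Integer arithmetic maps each coordinate to a bin in [0, NUM_BINS - 1].
    bins = [
        min(NUM_BINS - 1, x0 * NUM_BINS // width),
        min(NUM_BINS - 1, y0 * NUM_BINS // height),
        min(NUM_BINS - 1, x1 * NUM_BINS // width),
        min(NUM_BINS - 1, y1 * NUM_BINS // height),
    ]
    return label + "".join(f"<loc_{b}>" for b in bins)


def text_to_boxes(text: str) -> list[tuple[str, list[int]]]:
    """Parse text produced by box_to_text back into (label, bin coordinates) pairs."""
    pattern = re.compile(r"([^<]+)((?:<loc_\d+>){4})")
    results = []
    for label, locs in pattern.findall(text):
        coords = [int(n) for n in re.findall(r"<loc_(\d+)>", locs)]
        results.append((label, coords))
    return results


# One image, two tasks: both targets are plain text, so a single seq2seq loss
# (for example token-level cross-entropy) can cover captioning and detection alike.
sample = {
    "<CAPTION>": "a dog on grass",                           # illustrative prompt
    "<OD>": box_to_text("dog", (64, 48, 320, 240), 640, 480),  # illustrative prompt
}

print(sample["<OD>"])                 # dog<loc_100><loc_100><loc_500><loc_500>
print(text_to_boxes(sample["<OD>"]))  # [('dog', [100, 100, 500, 500])]
```

Because the detection target is just another string, swapping the prompt is the only thing that changes between tasks in this sketch; no task-specific output head is needed.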

    Despite its compact size, Florence-2 competes with larger specialized models. After fine-tuning on publicly available human-annotated data, it achieves new state-of-the-art results on the RefCOCO/+/g benchmarks. As a pretrained backbone, it outperforms supervised and self-supervised models on downstream tasks, including COCO object detection and instance segmentation and ADE20K semantic segmentation. Compared with models pretrained on ImageNet, it improves results by 6.9, 5.5, and 5.9 points on the COCO and ADE20K datasets using the Mask-RCNN, DINO, and UperNet frameworks respectively, and it trains 4 times more efficiently.

    Overall, the experimental results show that Florence-2's pretrained universal representation improves a wide range of downstream tasks.

    Check out the Paper and Model Card. All credit for this research goes to the researchers of this project.

    The post Microsoft Releases Florence-2: A Novel Vision Foundation Model with a Unified, Prompt-based Representation for a Variety of Computer Vision and Vision-Language Tasks appeared first on MarkTechPost.
