    Microsoft Releases Florence-2: A Novel Vision Foundation Model with a Unified, Prompt-based Representation for a Variety of Computer Vision and Vision-Language Tasks

    June 21, 2024

    Research on artificial general intelligence (AGI) systems has moved markedly toward pretrained, adaptable representations valued for their task-agnostic benefits across applications. Natural language processing (NLP) is a clear example of this trend: more sophisticated models adapt to new tasks and domains given only simple instructions. That success has inspired a similar strategy in computer vision.

    One of the main obstacles to a universal representation for vision tasks is the need for broad perceptual ability. Unlike NLP, computer vision works with complex visual data such as object locations, masked contours, and attributes, so a universal representation must master a range of challenging tasks. Two hurdles stand out. First, thorough visual annotations are scarce, which prevents building a foundation model that captures the subtleties of spatial hierarchy and semantic granularity. Second, computer vision currently lacks a unified pretraining framework that uses a single network architecture to integrate semantic granularity and spatial hierarchy seamlessly.

    A team of Microsoft researchers introduces Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks. It addresses both the lack of a consistent architecture and the shortage of comprehensive data by using a single, prompt-based representation for all vision tasks. Multitask learning requires annotated data of high quality and broad scale. To supply it, the team's data engine generates FLD-5B, a comprehensive visual dataset with a total of 5.4B annotations for 126M images, which is a significant improvement over labor-intensive manual annotation. The engine has two processing modules. Instead of relying on a single person to annotate each image, as was done in the past, the first module uses specialized models that annotate images automatically and in collaboration. When several models work toward a consensus, the resulting image interpretation is more trustworthy and objective, reminiscent of the wisdom-of-crowds idea.
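The consensus idea can be illustrated with a small sketch. The article does not say how the data engine combines the outputs of its specialized models, so the majority vote below, the `consensus_label` helper, the `min_agreement` threshold, and the sample votes are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(votes: Sequence[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the most common label, or None when agreement is too low.

    Hypothetical helper: a plain majority vote stands in for whatever
    aggregation the real data engine uses.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Keep the label only if strictly more than `min_agreement` of the votes agree.
    return label if count / len(votes) > min_agreement else None


# Made-up outputs of three specialist models for two image regions.
region_votes = {
    "region_0": ["dog", "dog", "wolf"],
    "region_1": ["cat", "fox", "dog"],
}
annotations = {region: consensus_label(v) for region, v in region_votes.items()}
print(annotations)  # {'region_0': 'dog', 'region_1': None}
```

Regions where the models disagree get no label in this sketch; a real pipeline could instead route them to further filtering or refinement.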

    Florence-2 integrates an image encoder and a multi-modality encoder-decoder into a sequence-to-sequence (seq2seq) architecture, following the NLP community's goal of developing flexible models with a consistent framework. This architecture handles a variety of vision tasks without task-specific architectural changes. Because all annotations in the FLD-5B dataset are converted into a uniform textual output format, the model can be trained with a unified multitask learning approach that uses the same loss function as the objective for every task. The result is a multi-purpose vision foundation model that can perform grounding, captioning, and object detection with one model and one set of parameters, with the task selected by a text prompt.
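To make "uniform textual outputs" concrete, here is a minimal sketch of how a bounding box and a caption could both become plain text targets for one seq2seq model. The bin count of 1000, the `<loc_N>` token spelling, the prompt strings, and the helper names are assumptions chosen for illustration; the article does not specify the actual serialization.

```python
from typing import Dict, Tuple

NUM_BINS = 1000  # assumed number of coordinate bins per axis


def quantize(value: float, size: int, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, size] to a bin index in [0, num_bins - 1]."""
    bin_index = int(value * num_bins / size)
    return min(max(bin_index, 0), num_bins - 1)


def box_to_text(label: str, box: Tuple[float, float, float, float],
                width: int, height: int) -> str:
    """Serialize a labeled (x0, y0, x1, y1) box as text with location tokens."""
    x0, y0, x1, y1 = box
    bins = [quantize(x0, width), quantize(y0, height),
            quantize(x1, width), quantize(y1, height)]
    return label + "".join(f"<loc_{b}>" for b in bins)


def build_example(task_prompt: str, target_text: str) -> Dict[str, str]:
    """Every task becomes the same (prompt text -> target text) pair."""
    return {"input": task_prompt, "target": target_text}


# Two different tasks share one format, so one text loss can cover both.
detection = build_example(
    "Locate the objects in the image.",  # hypothetical prompt wording
    box_to_text("dog", (64, 120, 320, 480), width=640, height=480),
)
caption = build_example("Describe the image.", "A dog running on grass.")
print(detection["target"])  # dog<loc_100><loc_250><loc_500><loc_999>
```

Since both examples are ordinary text pairs, a single token-level loss such as cross-entropy over the target text can be applied to every task, which is the point of the unified formulation.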

    Despite its compact size, Florence-2 competes with larger specialized models. After fine-tuning on publicly available human-annotated data, it achieves new state-of-the-art results on the RefCOCO/+/g benchmarks. The pre-trained model also outperforms supervised and self-supervised models on downstream tasks, including ADE20K semantic segmentation and COCO object detection and instance segmentation. Compared with models pre-trained on ImageNet, it improves results by 6.9, 5.5, and 5.9 points on the COCO and ADE20K datasets using the Mask-RCNN, DINO, and UperNet frameworks, respectively, and it trains 4 times more efficiently.

    Taken together, the experimental results indicate that Florence-2's pre-trained universal representation is effective at improving a wide range of downstream tasks.

    Check out the Paper and Model Card. All credit for this research goes to the researchers of this project.

    The post Microsoft Releases Florence-2: A Novel Vision Foundation Model with a Unified, Prompt-based Representation for a Variety of Computer Vision and Vision-Language Tasks appeared first on MarkTechPost.
