
    This AI Paper from Apple Introduces a Weakly-Supervised Pre-Training Method for Vision Models Using Publicly Available Web-Scale Image-Text Data

    April 29, 2024

    Contrastive learning has become a powerful strategy for training models to learn effective visual representations by aligning image and text embeddings. One difficulty with it, however, is the cost of computing pairwise similarities between images and texts, especially on large-scale datasets.
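To make the pairwise cost concrete, here is a minimal sketch of a CLIP-style symmetric contrastive loss in plain Python. It is an illustration, not the implementation used in the paper: the function names, the temperature value of 0.07, and the toy embeddings are all assumptions. Note the similarity matrix, which has one entry for every image-caption combination in the batch.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of matched image-text pairs.

    Pair k is (image_embs[k], text_embs[k]). The temperature is an assumed value.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]

    # n x n similarity matrix: every image is compared with every caption.
    logits = [[dot(i, t) / temperature for t in txts] for i in imgs]

    def cross_entropy(rows):
        # For row k, the correct "class" is column k (the matched pair).
        total = 0.0
        for k, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[k]
        return total / len(rows)

    # Image-to-text uses the rows, text-to-image uses the columns.
    columns = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


def num_pairwise_similarities(batch_size):
    """Number of similarity entries computed for one batch."""
    return batch_size * batch_size
```

The number of similarity entries grows quadratically with the batch size, which is the cost the paragraph above refers to.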

    A team of researchers from Apple has presented a new method for pre-training vision models on web-scale image-text data in a weakly supervised manner. Called CatLIP (Categorical Loss for Image-text Pre-training), the approach addresses the trade-off between efficiency and scalability on weakly labeled, web-scale image-text datasets.

    By extracting labels from text captions, CatLIP treats image-text pre-training as a classification problem. According to the team, this method maintains performance on downstream tasks such as ImageNet-1k classification while being much more efficient to train than CLIP.

    The team assessed CatLIP on a range of vision tasks, including object detection and image segmentation. They showed that the approach preserves high-quality representations that perform well across these tasks despite the change in training paradigm.
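The idea of turning captions into classification targets can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's pipeline: the vocabulary `VOCAB`, the simple word matching in `caption_to_targets`, and the choice of a multi-label binary cross-entropy loss are all hypothetical stand-ins, since the article does not describe the extraction procedure or the loss in detail.

```python
import math
import re

# Hypothetical label vocabulary; a real one would be derived from the captions.
VOCAB = ["dog", "cat", "beach", "car"]


def caption_to_targets(caption, vocab=VOCAB):
    """Map a caption to a multi-hot target vector by simple word matching."""
    words = set(re.findall(r"[a-z]+", caption.lower()))
    return [1.0 if label in words else 0.0 for label in vocab]


def binary_cross_entropy(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# One training example: the image model would produce one logit per label.
targets = caption_to_targets("A dog running on the beach")
example_logits = [4.0, -4.0, 4.0, -4.0]  # made-up model output
loss = binary_cross_entropy(example_logits, targets)
```

Unlike a contrastive objective, the loss for one image here depends only on that image's logits and its own caption-derived targets, so no comparison against the other captions in the batch is needed.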

    The team has summarized their primary contributions as follows.

    By recasting pre-training on image-text data as a classification task, this study presents a novel way to speed up the pre-training of vision models on such data.

    CatLIP's accuracy improves with data and model scaling. The results on small-scale image-text data are especially notable: there, a model trained with CatLIP keeps improving with longer training, in contrast to conventional contrastive learning techniques such as CLIP.

    The team proposes a technique that lets the pre-trained model transfer information to target tasks efficiently, using the embeddings in the classification layer that are associated with target labels. With this method, embeddings learned during pre-training can be used to initialize the classification layer in downstream tasks, enabling data-efficient transfer learning.

    Through extensive tests covering multiple downstream tasks, including object detection and semantic segmentation, the team has demonstrated the effectiveness of the representations that CatLIP learns. CatLIP achieves performance similar to CLIP with a much shorter pre-training time: pre-training is 2.7× faster on the DataComp-1.3B dataset.
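The transfer technique described above, where embeddings from the pre-training classification layer initialize the classifier of a downstream task, might look roughly like the sketch below. The function `init_classifier`, the exact label-name matching, and the small random fallback for unseen labels are illustrative assumptions, not details taken from the paper.

```python
import random


def init_classifier(pretrain_labels, pretrain_weights, target_labels, dim, seed=0):
    """Build a target-task classifier, reusing pre-training rows where labels match.

    pretrain_weights[i] is the classification-layer embedding for pretrain_labels[i].
    Returns (weights, reused), where reused[j] says whether row j was copied.
    """
    rng = random.Random(seed)
    row_of = {label: i for i, label in enumerate(pretrain_labels)}

    weights, reused = [], []
    for label in target_labels:
        if label in row_of:
            # Copy the embedding learned for this label during pre-training.
            weights.append(list(pretrain_weights[row_of[label]]))
            reused.append(True)
        else:
            # Assumed fallback: small random initialization for unseen labels.
            weights.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
            reused.append(False)
    return weights, reused


# Toy example with made-up labels and 2-dimensional embeddings.
pretrain_labels = ["dog", "cat", "car"]
pretrain_weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights, reused = init_classifier(
    pretrain_labels, pretrain_weights, ["cat", "zebra", "dog"], dim=2
)
```

Rows copied this way already encode what the model learned about those labels, which is the intuition behind the data-efficient transfer the team describes.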

    In conclusion, by reframing pre-training as a classification problem, this research proposes a new approach to pre-training vision models on large-scale image-text data. The strategy retains good representation quality across varied visual tasks while significantly reducing training time.

    Check out the Paper. All credit for this research goes to the researchers of this project.

    The post This AI Paper from Apple Introduces a Weakly-Supervised Pre-Training Method for Vision Models Using Publicly Available Web-Scale Image-Text Data appeared first on MarkTechPost.
