
    Researchers at Apple Propose MobileCLIP: A New Family of Image-Text Models Optimized for Runtime Performance through Multi-Modal Reinforced Training

    April 11, 2024

    In multi-modal learning, large image-text foundation models have demonstrated strong zero-shot performance and improved robustness across a wide range of downstream tasks. Models such as Contrastive Language-Image Pretraining (CLIP) mark a significant advance in multi-modal AI because they embed images and text in a shared space and can compare the two directly. Recently, a range of architectures have shown strong performance on vision tasks on resource-constrained devices; for example, pruning ViT architectures yields smaller and faster CLIP models.
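To make the zero-shot idea concrete, here is a minimal sketch of how a CLIP-style model can classify an image without task-specific training: the image embedding is compared against the text embeddings of one prompt per class, and the closest prompt wins. The embeddings below are hand-made toy vectors, and `zero_shot_classify` and `logit_scale` are illustrative names and values, not part of MobileCLIP or any CLIP library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_emb, class_text_embs, logit_scale=100.0):
    """Pick the class prompt whose text embedding is closest to the image.

    logit_scale is an assumed temperature that sharpens the softmax;
    real models learn this value during training.
    """
    names = list(class_text_embs)
    logits = [logit_scale * cosine(image_emb, class_text_embs[n]) for n in names]
    probs = softmax(logits)
    best = max(range(len(names)), key=lambda i: probs[i])
    return names[best], dict(zip(names, probs))


# Toy embeddings standing in for real encoder outputs (assumption).
image_emb = [0.9, 0.1, 0.0]
class_text_embs = {
    "a photo of a dog": [1.0, 0.0, 0.0],
    "a photo of a cat": [0.0, 1.0, 0.0],
}

label, probs = zero_shot_classify(image_emb, class_text_embs)
print(label)
```

In a real system the two embeddings would come from the image and text encoders, which is where the memory and latency cost discussed next arises.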

    However, models like CLIP use large transformer-based encoders with significant memory and latency overhead, which makes deployment on mobile devices difficult. The paper addresses two problems. The first is the trade-off between runtime performance and accuracy across architectures: large-scale training of CLIP models is expensive, which slows the analysis of architectural designs and hinders rapid exploration. The reinforced datasets introduced in the paper, DataCompDR-12M and DataCompDR-1B, are meant to reduce this cost. The second problem is the reduced capacity of smaller architectures, which leads to subpar accuracy.

    Researchers from Apple introduced MobileCLIP, a new family of image-text models optimized for runtime performance, together with an efficient training approach called multi-modal reinforced training. MobileCLIP sets a new state-of-the-art trade-off between speed and accuracy on zero-shot classification and retrieval tasks across multiple datasets. The training approach transfers knowledge from an image captioning model and a collection of robust CLIP encoders to improve the accuracy of efficient models. This additional knowledge is stored in a reinforced dataset, which avoids the train-time compute overhead of running the teacher models during training.
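The idea of storing teacher knowledge in the dataset can be sketched as a one-time preprocessing pass. The following is a rough illustration under stated assumptions, not the paper's actual data format: `ReinforcedSample`, `reinforce`, `fake_captioner`, and `fake_embed` are hypothetical names, and the stand-in functions replace a real captioning model and real CLIP teachers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Vector = List[float]


@dataclass
class ReinforcedSample:
    """One dataset entry enriched with precomputed teacher knowledge."""
    image_id: str
    caption: str  # original web caption
    synthetic_captions: List[str] = field(default_factory=list)
    # teacher name -> embedding of the image
    teacher_image_embs: Dict[str, Vector] = field(default_factory=dict)
    # teacher name -> embeddings of [original caption] + synthetic captions
    teacher_text_embs: Dict[str, List[Vector]] = field(default_factory=dict)


def reinforce(
    image_id: str,
    caption: str,
    captioner: Callable[[str, int], str],
    teachers: Dict[str, Tuple[Callable[[str], Vector], Callable[[str], Vector]]],
    num_synthetic: int = 2,
) -> ReinforcedSample:
    """Run the captioner and every teacher once, and store the results."""
    synthetic = [captioner(image_id, k) for k in range(num_synthetic)]
    sample = ReinforcedSample(image_id, caption, synthetic)
    for name, (embed_image, embed_text) in teachers.items():
        sample.teacher_image_embs[name] = embed_image(image_id)
        sample.teacher_text_embs[name] = [
            embed_text(c) for c in [caption] + synthetic
        ]
    return sample


# Stand-ins for a real captioning model and CLIP teacher (assumptions).
def fake_captioner(image_id: str, k: int) -> str:
    return f"synthetic caption {k} for {image_id}"


def fake_embed(text: str) -> Vector:
    # Deterministic toy embedding, only here to make the sketch runnable.
    return [float(len(text)), float(sum(map(ord, text)) % 7)]


teachers = {"teacher_a": (fake_embed, fake_embed)}
sample = reinforce("img_001", "a dog on a beach", fake_captioner, teachers)
print(sample.synthetic_captions)
```

Because the expensive models run only during this pass, a student trained on the stored samples reads the captions and embeddings back instead of recomputing them at every training step.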

    Multi-modal reinforced training is paired with the reinforced dataset DataCompDR to address both problems. For a given compute budget, training on DataCompDR yields higher accuracy than training on the original dataset. This is achieved through a dataset reinforcement strategy: synthetic captions and teacher embeddings are computed once and stored in the dataset, which avoids extra training time. The approach has two main components: (a) leveraging the knowledge of an image captioning model via synthetic captions, and (b) knowledge distillation of image-text alignments from a collection of robust pre-trained CLIP models.
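Component (b) can be illustrated with a generic distillation objective: the student is pushed to reproduce the teacher's image-to-text similarity distribution within a batch. This is a simplified sketch, assuming unit-normalized embeddings and a single teacher; the exact loss, temperature, and weighting used in the paper may differ, and all function names here are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_matrix(image_embs, text_embs, temperature):
    """Scaled dot products between every image and every text in the batch."""
    return [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in text_embs]
        for img in image_embs
    ]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_img, student_txt, teacher_img, teacher_txt,
                      temperature=0.1):
    """Mean KL between teacher and student image-to-text distributions.

    temperature is an assumed illustrative value, not taken from the paper.
    """
    s = similarity_matrix(student_img, student_txt, temperature)
    t = similarity_matrix(teacher_img, teacher_txt, temperature)
    losses = [kl_divergence(softmax(tr), softmax(sr)) for tr, sr in zip(t, s)]
    return sum(losses) / len(losses)


# Toy batch of two image-text pairs; teacher embeddings would be read
# from the reinforced dataset rather than computed during training.
teacher_img = [[1.0, 0.0], [0.0, 1.0]]
teacher_txt = [[1.0, 0.0], [0.0, 1.0]]
student_img = [[0.8, 0.6], [0.6, 0.8]]
student_txt = [[1.0, 0.0], [0.0, 1.0]]

loss_mismatch = distillation_loss(student_img, student_txt, teacher_img, teacher_txt)
loss_match = distillation_loss(teacher_img, teacher_txt, teacher_img, teacher_txt)
print(loss_mismatch > loss_match)
```

The loss is zero when the student's similarities match the teacher's and grows as they diverge, which is what lets a small model inherit alignment knowledge from larger ones.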

    Three small variants of MobileCLIP are built on a 12-layer transformer base. The fastest variant, MobileCLIP-S0, is five times faster and three times smaller than the standard ViT-B/16 CLIP model. Further, multi-modal reinforced training yields a +2.9% gain in average performance on 38 evaluation benchmarks when training a CLIP model with a ViT-B/16 image backbone. To limit noise, DataComp and data filtering networks are used to improve the quality of web-sourced datasets, and the CoCa model is used to generate multiple synthetic captions for each image, which makes the captions more visually descriptive.

    In conclusion, MobileCLIP is a new family of efficient image-text models optimized for runtime performance and trained with multi-modal reinforced training. The researchers also introduced DataCompDR, a reinforced training dataset that stores knowledge from a pre-trained image captioning model and a collection of robust CLIP models. MobileCLIP models trained on DataCompDR set a new state-of-the-art trade-off between speed and accuracy on zero-shot classification and retrieval tasks across multiple datasets.

    Check out the Paper and Github. All credit for this research goes to the researchers of this project.

    The post Researchers at Apple Propose MobileCLIP: A New Family of Image-Text Models Optimized for Runtime Performance through Multi-Modal Reinforced Training appeared first on MarkTechPost.
