
    NYU Researchers Propose Inter- & Intra-Modality Modeling (I2M2) for Multi-Modal Learning, Capturing both Inter-Modality and Intra-Modality Dependencies

    June 18, 2024

In supervised multi-modal learning, data from several modalities is mapped to a target label, exploiting information shared across those modalities. The problem arises in many fields: autonomous vehicles, healthcare, robotics, and more. Although multi-modal learning is a fundamental paradigm in machine learning, its effectiveness varies with the task. In some settings a multi-modal learner outperforms any uni-modal learner; in others it fails to beat a single uni-modal model or a simple combination of uni-modal models. These conflicting findings highlight the need for a guiding framework that explains the performance gaps between multi-modal models and lays out a standard procedure for building models that make better use of multi-modal data.

Researchers from New York University, Genentech, and CIFAR set out to resolve these inconsistencies by identifying the underlying factors that cause them and by introducing a more principled approach to multi-modal learning. Taking a probabilistic perspective, they propose a data-generating mechanism and use it to analyze the supervised multi-modal learning problem.

The mechanism includes a selection variable that induces the interdependence between the modalities and the label; observed data correspond to this variable being set to one. The strength of this selection effect differs across datasets. When the selection effect is strong, inter-modality dependencies, the dependencies that link the label to interactions across modalities, are amplified. When the selection effect is modest, intra-modality dependencies, the dependencies between each individual modality and the label, become increasingly important.
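
Read probabilistically, this amounts to a selection-conditioned generative model. A minimal way to write it for two modalities, using illustrative notation (label y, modalities x_1 and x_2, selection variable s) rather than the paper's exact symbols:

p(x_1, x_2, y \mid s = 1) \;\propto\; p(y)\, p(x_1 \mid y)\, p(x_2 \mid y)\, p(s = 1 \mid x_1, x_2, y)

When p(s = 1 \mid x_1, x_2, y) depends strongly on the pair (x_1, x_2), conditioning on s = 1 couples the modalities beyond what the label alone explains, so inter-modality dependencies dominate; when it is nearly constant, the modalities remain close to conditionally independent given the label and intra-modality dependencies carry most of the signal.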

The proposed framework assumes that the label is the primary source of the modality-specific data and specifies how the label, the selection process, and the individual modalities are connected. How much the output depends on each modality, and on the relationships between modalities, varies from one use case to the next. Because the strength of these dependencies with respect to the prediction target is rarely known in advance, a multi-modal system should model both inter- and intra-modality dependencies. The team does this by building a classifier for each modality, capturing the dependencies within that modality, plus a classifier that captures the dependencies between the output label and the interactions across modalities, and then merging their predictions.
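
As a concrete illustration of that design, here is a minimal PyTorch-style sketch of combining per-modality classifiers with a cross-modality classifier. The class name, the simple linear encoders, and the merge-by-summing-logits step are assumptions made for this example, not the authors' exact implementation (their code is in the GitHub repository referenced at the end of the post).

```python
import torch
import torch.nn as nn


class I2M2Sketch(nn.Module):
    """Toy sketch of inter- & intra-modality modeling for two modalities.

    Intra-modality: one classifier head per modality (x1 -> y, x2 -> y).
    Inter-modality: one classifier over the fused representation (x1, x2) -> y.
    The final prediction merges all three sets of logits.
    """

    def __init__(self, dim1: int, dim2: int, hidden: int, num_classes: int):
        super().__init__()
        # Per-modality encoders (placeholders for real backbones).
        self.enc1 = nn.Sequential(nn.Linear(dim1, hidden), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Linear(dim2, hidden), nn.ReLU())
        # Intra-modality heads: dependencies between each modality and the label.
        self.head1 = nn.Linear(hidden, num_classes)
        self.head2 = nn.Linear(hidden, num_classes)
        # Inter-modality head: dependencies involving cross-modality interactions.
        self.head12 = nn.Linear(2 * hidden, num_classes)

    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
        h1, h2 = self.enc1(x1), self.enc2(x2)
        logits_intra1 = self.head1(h1)
        logits_intra2 = self.head2(h2)
        logits_inter = self.head12(torch.cat([h1, h2], dim=-1))
        # Merge by summing logits (one simple late-fusion choice).
        return logits_intra1 + logits_intra2 + logits_inter


# Example usage with random data.
model = I2M2Sketch(dim1=32, dim2=64, hidden=128, num_classes=5)
x1, x2 = torch.randn(8, 32), torch.randn(8, 64)
print(model(x1, x2).shape)  # torch.Size([8, 5])
```

Training such a model end to end with a standard cross-entropy loss on the merged logits is one straightforward way to fit all three components jointly.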

The I2M2 method is derived from the multi-modal generative model, a widely used formulation in multi-modal learning, and the proposed framework also lets prior work be divided into two groups. The first group, inter-modality modeling methods, relies heavily on capturing relationships across modalities to predict the target. Although these methods can in principle capture dependencies both between and within modalities, they often fall short in practice because their assumptions about the data-generating process do not hold. The second group, intra-modality modeling methods, treats the label as the only link between modalities and ignores cross-modality interactions, which limits their effectiveness.

Contrary to the goal of multi-modal learning, such methods fail to exploit the interdependence of the modalities for prediction. In practice, inter-modality methods do well when the modalities share substantial information relevant to the label, while intra-modality methods do well when cross-modality information is scarce or absent.
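
To make the contrast concrete, here is a toy sketch under the same assumptions as the code above: the two baseline families keep only one kind of term, while I2M2 keeps both. The logit-sum combination is again an illustrative choice, not the exact formulation from the paper.

```python
import torch


def intra_modality_prediction(logits_m1: torch.Tensor, logits_m2: torch.Tensor) -> torch.Tensor:
    # Intra-modality modeling: each modality predicts the label on its own,
    # and the per-modality predictions are combined with no term for
    # cross-modality interactions.
    return logits_m1 + logits_m2


def inter_modality_prediction(logits_joint: torch.Tensor) -> torch.Tensor:
    # Inter-modality modeling: a single classifier over all modalities jointly,
    # relying on interactions across modalities to predict the label.
    return logits_joint


def i2m2_prediction(logits_m1: torch.Tensor, logits_m2: torch.Tensor,
                    logits_joint: torch.Tensor) -> torch.Tensor:
    # I2M2 keeps both kinds of terms, so it stays effective whether a dataset
    # is dominated by intra- or inter-modality dependencies.
    return logits_m1 + logits_m2 + logits_joint
```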

The proposed I2M2 architecture overcomes this drawback because it does not need to know in advance how strong these dependencies are: by explicitly modeling dependencies both across and within modalities, it adapts to different settings and remains effective. Experiments on a range of datasets support the researchers' claims, with I2M2 outperforming both intra- and inter-modality approaches. The method is applied to healthcare tasks such as automatic diagnosis from knee MRI scans and mortality and ICD-9 code prediction on the MIMIC-III dataset, and results on vision-and-language tasks such as NLVR2 and VQA further demonstrate its effectiveness.

The evaluation shows that the strength of each kind of dependency differs across datasets: fastMRI benefits more from intra-modality dependencies, NLVR2 relies more on inter-modality dependencies, and AV-MNIST, MIMIC-III, and VQA draw on both. I2M2 performs well in all of these settings, delivering solid results regardless of which dependencies dominate.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.


Why does multi-modal modeling struggle compared to using a single modality or naive combinations of multiple modalities? @taromakino, @suchop, @kchonyc, and I reveal factors behind these challenges and proposes a modality-agnostic framework to overcome them.

— Divyam Madaan (@dmadaan_) June 12, 2024

    The post NYU Researchers Propose Inter- & Intra-Modality Modeling (I2M2) for Multi-Modal Learning, Capturing both Inter-Modality and Intra-Modality Dependencies appeared first on MarkTechPost.

