NYU Researchers Propose Inter- & Intra-Modality Modeling (I2M2) for Multi-Modal Learning, Capturing both Inter-Modality and Intra-Modality Dependencies

In supervised multi-modal learning, data from several modalities is mapped to a target label, drawing on the relationships among the modalities. The problem arises in many fields, including autonomous vehicles, healthcare, and robotics. Although multi-modal learning is a fundamental paradigm in machine learning, its efficacy varies by task. In some settings, a multi-modal learner outperforms a uni-modal learner; in others, it does no better than a single uni-modal learner or a combination of only two. These conflicting findings point to the need for a guiding framework that explains the performance gaps between multi-modal models and lays out a standard procedure for developing models that make better use of multi-modal data.

Researchers from New York University, Genentech, and CIFAR set out to resolve these inconsistencies by identifying the underlying factors that cause them and by introducing a more principled approach to multi-modal learning. Taking a probabilistic perspective, they propose a data-generating mechanism and use it to examine the supervised multi-modal learning problem.

The data-generating mechanism includes a selection variable that produces the interdependence between the modalities and the label; this variable is always set to one. The strength of this selection mechanism varies across datasets. When selection effects are strong, inter-modality dependencies, that is, dependencies between the label and the interactions across modalities, are amplified. In contrast, when the selection effect is weak, intra-modality dependencies (dependencies between individual modalities and the label) become increasingly important.
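One way to write the mechanism down is sketched below. The notation is an assumption for illustration and does not come from the article: y is the label, x_1 and x_2 are two modalities, and v is the selection variable. The sketch also assumes that the label generates each modality independently, with the selection variable as the only direct link across modalities.

```latex
% Assumed notation: y is the label, x_1 and x_2 are two modalities,
% and v is the selection variable, which is always observed as v = 1.
\begin{align}
p(y, x_1, x_2, v = 1) &= p(y)\, p(x_1 \mid y)\, p(x_2 \mid y)\, p(v = 1 \mid x_1, x_2, y) \\
p(y \mid x_1, x_2, v = 1) &\propto p(y)\, p(x_1 \mid y)\, p(x_2 \mid y)\, p(v = 1 \mid x_1, x_2, y)
\end{align}
```

Under this reading, if the last factor barely changes with its arguments, selection is weak and only the per-modality terms tie the inputs to the label. If it varies strongly with the combination of x_1 and x_2, the inter-modality term dominates.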

The proposed framework assumes that the label is the primary source of modality-specific data. It further specifies the relationship between the label, the selection mechanism, and the modalities. How much the output relies on each modality, and on the relationships between modalities, varies from one use case to the next. Because the strength of these dependencies matters for the prediction target but differs by task, a multi-modal system has to model both inter- and intra-modality dependencies. The team does this by building a classifier for each modality to capture the dependencies within that modality, building a further classifier to capture the dependencies between the output label and the interactions across modalities, and merging them.
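The merging step can be illustrated with a minimal sketch. It assumes the classifiers are combined in a product-of-experts style, by adding their log-probabilities and renormalizing; the article does not spell out the combination rule, so treat this as one plausible reading. The function names and the logits are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def combine_experts(unimodal_logits, joint_logits):
    """Assumed merge rule: add log-probabilities, then renormalize.

    unimodal_logits: list of (batch, num_classes) arrays, one per modality.
    joint_logits: (batch, num_classes) array from a classifier that sees
        all modalities at once.
    Returns class probabilities of shape (batch, num_classes).
    """
    total = log_softmax(joint_logits)
    for logits in unimodal_logits:
        total = total + log_softmax(logits)
    return np.exp(log_softmax(total))

# Hypothetical logits for a batch of 2 examples and 3 classes.
image_logits = np.array([[2.0, 0.1, -1.0], [0.0, 0.0, 0.0]])
text_logits = np.array([[1.5, 0.3, -0.5], [0.0, 0.0, 0.0]])
joint_logits = np.array([[0.5, 0.5, 0.0], [0.0, 3.0, 0.0]])

probs = combine_experts([image_logits, text_logits], joint_logits)
print(probs.argmax(axis=-1))
```

In the first example the per-modality classifiers decide the prediction because the joint classifier is undecided; in the second the per-modality classifiers are uninformative and the joint classifier decides. No weighting between the two kinds of dependency has to be chosen by hand.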

The I2M2 method is derived from the multi-modal generative model, a widely used approach in multi-modal learning. Under the proposed framework, prior research on multi-modal learning falls into two groups. Methods in the first group, inter-modality modeling, rely heavily on detecting inter-modal relationships to predict the target. Although they can in theory capture dependencies both between and within modalities, they often fail in practice because their assumptions about the multi-modal data-generating model are not met. Methods in the second group, intra-modality modeling, let the modalities interact only through the label, which limits their effectiveness.

Contrary to the goal of multi-modal learning, intra-modality methods fail to capture the interdependence of the modalities for prediction. Inter-modality methods predict the label well when the modalities share substantial information, whereas intra-modality methods work well when cross-modality information is scarce or nonexistent.
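A toy example, not taken from the paper, makes the contrast concrete. It assumes two binary modalities and two hypothetical labeling rules: one where the label is the XOR of the modalities (purely inter-modality) and one where the label copies the first modality (purely intra-modality). The helper `best_accuracy` is illustrative only.

```python
import itertools

def best_accuracy(features, labels):
    """Accuracy of the best rule that maps each distinct feature value
    to its majority label (binary labels assumed)."""
    correct = 0
    for value in set(features):
        matching = [y for f, y in zip(features, labels) if f == value]
        correct += max(matching.count(0), matching.count(1))
    return correct / len(labels)

# All four combinations of two binary modalities.
pairs = list(itertools.product([0, 1], repeat=2))
first = [a for a, _ in pairs]
second = [b for _, b in pairs]

# Purely inter-modality: the label depends only on the interaction.
xor_labels = [a ^ b for a, b in pairs]
# Purely intra-modality: the label depends only on the first modality.
copy_labels = [a for a, _ in pairs]

xor_unimodal = max(best_accuracy(first, xor_labels),
                   best_accuracy(second, xor_labels))
xor_joint = best_accuracy(pairs, xor_labels)
copy_unimodal = best_accuracy(first, copy_labels)
copy_joint = best_accuracy(pairs, copy_labels)

print(xor_unimodal, xor_joint, copy_unimodal, copy_joint)
```

In the XOR case no single modality beats chance while the joint view is perfect; in the copy case the first modality alone is already perfect, so a joint model has nothing extra to learn from the interaction.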

The proposed I2M2 framework overcomes this drawback because it does not need to know in advance how strong these dependencies are. Because it explicitly models dependencies both across and within modalities, it can adapt to different settings and remain effective. The researchers validate their claims on a range of datasets, where the results show I2M2 outperforming both intra- and inter-modality approaches. Healthcare applications include automatic diagnosis from knee MRI scans and mortality and ICD-9 code prediction on the MIMIC-III dataset. Results on vision-and-language tasks such as NLVR2 and VQA provide further support for I2M2.

The evaluation indicates that dependencies differ in strength between datasets: the fastMRI dataset benefits more from intra-modality dependencies, whereas inter-modality dependencies are more relevant for NLVR2. The AV-MNIST, MIMIC-III, and VQA datasets are affected by both kinds of dependency. Across these datasets, I2M2 delivers solid performance regardless of the relative importance of the two kinds of dependency.

Why does multi-modal modeling struggle compared to using a single modality or naive combinations of multiple modalities? @taromakino, @suchop, @kchonyc, and I reveal factors behind these challenges and proposes a modality-agnostic framework to overcome them.

1/7 pic.twitter.com/qahQKEGqnY

- Divyam Madaan (@dmadaan_), June 12, 2024

The post NYU Researchers Propose Inter- & Intra-Modality Modeling (I2M2) for Multi-Modal Learning, Capturing both Inter-Modality and Intra-Modality Dependencies appeared first on MarkTechPost.
