FakeShield: An Explainable AI Framework for Universal Image Forgery Detection and Localization Using Multimodal Large Language Models

The rapid advancement of generative AI has made image manipulation easier and tampered content harder to detect. Although current Image Forgery Detection and Localization (IFDL) methods are effective, they face two key challenges: the black-box nature of their detection principles and limited generalization across tampering methods such as Photoshop, DeepFake, and AIGC-Editing. The rise of powerful image editing models has further blurred the line between real and fake content, posing risks such as misinformation and legal disputes. To address these challenges, researchers are exploring Multimodal Large Language Models (M-LLMs) for more explainable IFDL, enabling clearer identification and localization of manipulated regions.

Current IFDL methods often focus on specific tampering types, while universal techniques aim to detect a wider range of manipulations by identifying image artifacts and irregularities. Models such as MVSS-Net and HiFi-Net employ multi-scale feature learning and multi-branch modules to improve detection accuracy. Although these methods achieve satisfactory performance, they lack explainability and struggle to generalize across datasets. Meanwhile, LLMs and their multimodal extensions have demonstrated exceptional text-generation and visual-understanding abilities. Recent studies have integrated LLMs with image encoders, but their use for universal tamper detection and localization remains largely unexplored.

Researchers from Peking University and the South China University of Technology introduced FakeShield, an explainable Image Forgery Detection and Localization (e-IFDL) framework. FakeShield evaluates image authenticity, generates masks of tampered regions, and provides explanations based on pixel-level and image-level tampering clues. For training, the researchers used GPT-4o to enhance existing datasets, creating the Multi-Modal Tamper Description Dataset (MMTD-Set). They also developed the Domain Tag-guided Explainable Forgery Detection Module (DTE-FDM), which handles different tampering types, and the Multi-modal Forgery Localization Module (MFLM), which aligns visual and language features. Extensive experiments show that FakeShield outperforms traditional IFDL techniques in detecting and localizing a variety of tampering methods.
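
To make the division of labor between the two modules more concrete, here is a minimal, hypothetical sketch of a domain-tag-guided, two-stage pipeline. The article does not describe FakeShield's interfaces or prompt format, so the DOMAINS ordering, the wording produced by build_domain_prompt, and the detector and localizer callables are all illustrative assumptions; in the real framework both stages are learned M-LLM components, whereas here they are plain stand-in functions.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# The three tampering domains named in the article (ordering is an assumption).
DOMAINS = ("Photoshop", "DeepFake", "AIGC-Editing")

Mask = List[List[int]]  # binary mask: 1 marks a tampered pixel


def pick_domain(domain_scores: Sequence[float]) -> str:
    """Return the domain with the highest score (argmax over DOMAINS)."""
    if len(domain_scores) != len(DOMAINS):
        raise ValueError("expected one score per domain")
    best = max(range(len(DOMAINS)), key=lambda i: domain_scores[i])
    return DOMAINS[best]


def build_domain_prompt(domain: str, question: str) -> str:
    """Prefix the question with a textual domain tag (hypothetical wording)."""
    return f"This image may have been manipulated with {domain}. {question}"


def run_pipeline(
    image: object,
    domain_scores: Sequence[float],
    detector: Callable[[object, str], Tuple[bool, str]],  # stand-in for DTE-FDM
    localizer: Callable[[object, str], Mask],             # stand-in for MFLM
    question: str = "Is this image tampered? Explain the clues.",
) -> Tuple[bool, str, Optional[Mask]]:
    """Stage 1 detects and explains; stage 2 turns the explanation into a
    mask, and only runs when stage 1 reports tampering."""
    prompt = build_domain_prompt(pick_domain(domain_scores), question)
    is_tampered, explanation = detector(image, prompt)
    mask = localizer(image, explanation) if is_tampered else None
    return is_tampered, explanation, mask


if __name__ == "__main__":
    fake_detector = lambda img, prompt: (True, "Blurred boundary around the face.")
    fake_localizer = lambda img, text: [[0, 1], [1, 1]]
    print(run_pipeline("photo.png", [0.1, 0.7, 0.2], fake_detector, fake_localizer))
```

Passing the explanation text into the localizer reflects the article's point that MFLM works on aligned visual and language features; in this sketch that alignment is reduced to a string argument.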

The proposed MMTD-Set enhances traditional IFDL datasets by integrating text descriptions with visual tampering information. Using GPT-4o, tampered images and their corresponding masks are paired with detailed descriptions that focus on tampering artifacts. The FakeShield framework comprises two key modules: DTE-FDM, which detects tampering and explains its decision, and MFLM, which generates precise masks. These modules work together to improve both detection accuracy and interpretability. Experiments show that FakeShield outperforms previous methods in detecting and localizing image forgeries across Photoshop, DeepFake, and AIGC-Editing datasets.
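
The pairing step behind MMTD-Set can be illustrated with a small, hedged sketch. The article only says that GPT-4o pairs tampered images and masks with artifact-focused descriptions; the TamperRecord layout, the bounding-box and area summary, and the prompt wording below are assumptions made for illustration, and the actual call to GPT-4o (or any other vision-language model) is deliberately left out rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Mask = List[List[int]]  # 1 marks a tampered pixel


@dataclass
class TamperRecord:
    """One hypothetical MMTD-Set-style entry: image, mask, and text description."""
    image_path: str
    mask_path: Optional[str]  # None for authentic images
    description: str


def mask_summary(mask: Mask) -> Tuple[Optional[Tuple[int, int, int, int]], float]:
    """Return the tampered region's inclusive bounding box (top, left, bottom,
    right) and the fraction of pixels marked as tampered."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None, 0.0
    total = sum(len(row) for row in mask)
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols)), len(coords) / total


def build_annotation_prompt(mask: Mask) -> str:
    """Compose a hypothetical instruction asking a vision-language model to
    describe the artifacts inside the masked region."""
    bbox, ratio = mask_summary(mask)
    if bbox is None:
        return "This image is authentic. Describe why it looks natural."
    top, left, bottom, right = bbox
    return (
        f"The tampered region spans rows {top}-{bottom} and columns {left}-{right} "
        f"({ratio:.1%} of the image). Describe the visible tampering artifacts, "
        "such as edges, lighting, texture, and semantic inconsistencies."
    )
```

In this sketch, the prompt and the image would be sent to the description model, and its reply stored in TamperRecord.description alongside the image and mask paths.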

MMTD-Set uses Photoshop, DeepFake, and self-constructed AIGC-Editing tampered images for training and testing. FakeShield, incorporating DTE-FDM and MFLM, is compared against state-of-the-art methods such as SPAN, MantraNet, and HiFi-Net, and the results show superior detection and localization performance across multiple datasets. The GPT-4o-generated descriptions in MMTD-Set, combined with domain tags, strengthen FakeShield's ability to handle diverse tampering types, making it more robust and accurate than competing image forgery detection and localization methods.
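
The comparison above covers both detection and localization, but the article does not say how each is scored. As a hedged illustration, the sketch below computes metrics commonly used in IFDL work: image-level accuracy for detection, and pixel-level IoU and F1 for localization. The choice of metrics and the function names are assumptions, not the paper's exact evaluation protocol.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # 1 marks a tampered pixel


def detection_accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Image-level accuracy: share of images whose real (0) or fake (1) label is correct."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("need equal-length, non-empty label lists")
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def pixel_iou_f1(pred: Mask, truth: Mask) -> Tuple[float, float]:
    """Pixel-level IoU and F1 between predicted and ground-truth binary masks."""
    if len(pred) != len(truth) or any(len(a) != len(b) for a, b in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1  # tampered pixel correctly flagged
            elif p:
                fp += 1  # authentic pixel wrongly flagged
            elif t:
                fn += 1  # tampered pixel missed
    if tp + fp + fn == 0:
        return 1.0, 1.0  # both masks empty: treated as perfect agreement (a convention)
    return tp / (tp + fp + fn), 2 * tp / (2 * tp + fp + fn)
```

For example, if a prediction flags the top row of a 2×2 image while the true tampered pixels form the left column, there is one true positive, one false positive, and one false negative, giving IoU = 1/3 and F1 = 0.5.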

In conclusion, the study introduces FakeShield, a pioneering application of M-LLMs for explainable IFDL. FakeShield can detect manipulations, generate tampered region masks, and provide explanations by analyzing pixel-level and semantic clues. It leverages the MMTD-Set built using GPT-4o to enhance tampering analysis. By incorporating the DTE-FDM and the MFLM, FakeShield achieves robust detection and localization across diverse tampering types like Photoshop edits, DeepFake, and AIGC-based modifications, outperforming existing methods in explainability and accuracy.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post FakeShield: An Explainable AI Framework for Universal Image Forgery Detection and Localization Using Multimodal Large Language Models appeared first on MarkTechPost.