Artificial Intelligence (AI) systems are rigorously tested before release to determine whether they could be used for dangerous activities such as bioterrorism, manipulation, or automated cybercrime. This is especially crucial for powerful frontier systems, which are trained to refuse requests that could cause harm. Less powerful open-source models, by contrast, often have weaker refusal mechanisms that can be stripped away with additional fine-tuning.
In recent research, a team from UC Berkeley has shown that, even with these safety measures, guaranteeing the safety of individual AI models is insufficient. Even when each model appears safe on its own, adversaries can misuse combinations of models. They accomplish this through a tactic known as task decomposition, which splits a complex malicious task into smaller subtasks. The subtasks are then distributed across different models: capable frontier models handle the benign but difficult subtasks, while weaker models with laxer safety precautions handle the malicious but easy ones.
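To make the tactic concrete, here is a minimal sketch of how that division of labor might look in code. The query_model helper, the model names, and the prompt structure are hypothetical placeholders, not anything from the paper; the point is only the split between benign subtasks sent to a strong model and the final malicious assembly step sent to a weak one.

```python
# Illustrative sketch only; query_model and the model names are hypothetical
# placeholders for whatever chat-completion client an adversary might use.

def query_model(model: str, prompt: str) -> str:
    """Send `prompt` to `model` and return its text reply (stub)."""
    raise NotImplementedError("Wire up a real client here.")

def decomposed_attack(benign_subtasks: list[str], assembly_prompt: str) -> str:
    # Hard but benign subtasks: handled by a capable frontier model,
    # which sees nothing it would refuse.
    solutions = [query_model("strong-frontier-model", task) for task in benign_subtasks]

    # Easy but malicious assembly step: handled by a weaker model with
    # laxer safety training, using the frontier model's solutions as context.
    context = "\n\n".join(solutions)
    return query_model("weak-open-model", f"{context}\n\n{assembly_prompt}")
```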
To demonstrate this, the team formalized a threat model in which an adversary uses a set of AI models to produce a harmful output, such as a malicious Python script. The adversary iteratively chooses models and prompts to reach the intended result, and the attack counts as a success when the joint efforts of several models yield the harmful output.
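Read as code, this threat model amounts to a simple search loop. The sketch below is our reading of that setup rather than the authors' implementation; choose_next and is_harmful stand in for whatever adversary policy and success criterion are assumed.

```python
# Illustrative loop for the threat model; all helper names are assumed.

def run_adversary(models: dict, choose_next, is_harmful, max_steps: int = 10):
    """Iteratively pick (model, prompt) pairs until a harmful output appears."""
    transcript = []  # history of (model_name, prompt, response) tuples
    for _ in range(max_steps):
        # The adversary's policy picks the next model and prompt,
        # possibly conditioning on everything produced so far.
        model_name, prompt = choose_next(models, transcript)
        response = models[model_name](prompt)
        transcript.append((model_name, prompt, response))
        if is_harmful(response):      # success: the combined output is harmful
            return response, transcript
    return None, transcript           # attack failed within the query budget
```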
The team studied both manual and automated task decomposition. In manual decomposition, a human works out how to split a task into manageable parts. For tasks too complicated to decompose by hand, the team used automated decomposition, which proceeds in three steps: a weak model proposes related benign subtasks, a strong model solves them, and the weak model then uses those solutions to carry out the original malicious task.
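A rough sketch of that automated pipeline, again under the assumption of generic callable models and made-up prompts rather than the authors' actual code, could look like this:

```python
# Illustrative sketch of automated task decomposition; prompts and helper
# names are assumptions, not the paper's implementation.

def automated_decomposition(weak_model, strong_model, malicious_task: str) -> str:
    # Step 1: the weak model proposes benign tasks related to the malicious one.
    proposal = weak_model(
        f"List a few benign programming tasks related to: {malicious_task}"
    )
    benign_tasks = [line.strip() for line in proposal.splitlines() if line.strip()]

    # Step 2: the strong frontier model solves each benign task; none of them
    # individually looks harmful, so no refusal is triggered.
    solutions = [strong_model(task) for task in benign_tasks]

    # Step 3: the weak model combines those solutions to attempt the original task.
    context = "\n\n".join(solutions)
    return weak_model(f"Using the material below:\n{context}\n\nNow complete: {malicious_task}")
```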
The results show that combining models can greatly boost the success rate of producing harmful outputs compared with using individual models alone. For example, on the task of generating vulnerable code, combining Llama 2 70B and Claude 3 Opus achieved a 43% success rate, while neither model exceeded 3% on its own.
The team also found that the quality of both the weaker and the stronger model correlates with the likelihood of misuse, which implies that the risk of multi-model misuse will rise as AI models improve. This potential could be increased further by other decomposition strategies, such as training the weak model to exploit the strong model through reinforcement learning, or using the weak model as a general agent that repeatedly calls the strong model.
In conclusion, this study highlights the need for ongoing red-teaming that includes testing different configurations of AI models to uncover potential misuse hazards. Developers should continue this process throughout an AI model's deployment lifecycle, because updates can introduce new vulnerabilities.