
    Tsinghua University Researchers Propose ADELIE: Enhancing Information Extraction with Aligned Large Language Models Around Human-Centric Tasks

    May 12, 2024

    Information extraction (IE) is a pivotal area of artificial intelligence that transforms unstructured text into structured, actionable data. Despite their expansive capabilities, general-purpose large language models (LLMs) often fail to comprehend and execute the nuanced directives required for precise IE. These challenges primarily manifest in closed IE tasks, where a model must adhere to stringent, pre-defined schemas.
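    To make the closed-IE setting concrete, here is a minimal, invented Python illustration (not taken from the paper): the label sets and the sentence are hypothetical, and the only point is that every predicted label must come from a fixed, pre-defined inventory.

```python
# Minimal illustration (invented, not from the paper) of what a closed IE task
# expects: the model's output must conform to a fixed, pre-defined schema.

ALLOWED_ENTITY_TYPES = {"PERSON", "ORG", "LOC"}   # hypothetical closed label set
ALLOWED_RELATIONS = {"works_for", "located_in"}   # hypothetical relation inventory

sentence = "Marie Curie worked at the University of Paris."

# A schema-conformant extraction: typed entities plus (head, relation, tail) triplets.
prediction = {
    "entities": [
        {"text": "Marie Curie", "type": "PERSON"},
        {"text": "University of Paris", "type": "ORG"},
    ],
    "relations": [
        ("Marie Curie", "works_for", "University of Paris"),
    ],
}

def conforms_to_schema(pred: dict) -> bool:
    """Return True only if every label comes from the closed schema."""
    entities_ok = all(e["type"] in ALLOWED_ENTITY_TYPES for e in pred["entities"])
    relations_ok = all(rel in ALLOWED_RELATIONS for _, rel, _ in pred["relations"])
    return entities_ok and relations_ok

print(conforms_to_schema(prediction))  # True
```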

    IE tasks such as named entity recognition and relation classification compel models to discern and categorize text in formats that align with predefined structures. However, existing LLMs typically falter when asked for the nuanced understanding and alignment necessary for effective IE. Researchers have traditionally relied on strategies such as prompt engineering, which supplies detailed annotations and guidelines to steer LLMs without altering the underlying model parameters.
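    A sketch of what that prompt-engineering strategy might look like in practice; the guideline text and helper function below are hypothetical and are not drawn from the ADELIE paper.

```python
# Hypothetical prompt-engineering setup: annotation guidelines are packed into
# the prompt so a frozen LLM can attempt closed IE without any parameter updates.

GUIDELINES = """You are an information extraction system.
Label entities using ONLY these types:
- PERSON: named people
- ORG: companies, universities, agencies
- LOC: geographic locations
Return a JSON object with an "entities" list of {"text", "type"} items."""

def build_ie_prompt(text: str) -> str:
    """Assemble an instruction prompt; the model's weights stay untouched."""
    return f"{GUIDELINES}\n\nText: {text}\nJSON:"

print(build_ie_prompt("Satya Nadella leads Microsoft in Redmond."))
```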

    The research community has observed a critical need for a methodology that enhances LLMs’ understanding of structured tasks and improves execution accuracy. In response, researchers from Tsinghua University have introduced a new approach called ADELIE (Aligning large language moDELs on Information Extraction). This approach leverages a specialized dataset, IEInstruct, comprising over 83,000 instances across various IE formats, including triplets, natural language responses, and JSON outputs. 
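    To illustrate the three answer formats mentioned above, the records below show the general shape such instruction-tuning instances might take; they are invented examples, not actual IEInstruct entries.

```python
# Invented examples of the three answer formats the article attributes to
# IEInstruct (triplets, natural-language responses, JSON); not real dataset records.

instances = [
    {   # triplet-style answer
        "instruction": "Extract (head, relation, tail) triplets from the text.",
        "input": "Tsinghua University is located in Beijing.",
        "output": "(Tsinghua University, located_in, Beijing)",
    },
    {   # natural-language answer
        "instruction": "List the organizations mentioned in the text.",
        "input": "Researchers from Tsinghua University released ADELIE.",
        "output": "The text mentions one organization: Tsinghua University.",
    },
    {   # JSON answer
        "instruction": "Return the named entities as JSON.",
        "input": "ADELIE was trained on LLaMA 2.",
        "output": '{"entities": [{"text": "ADELIE", "type": "MODEL"}]}',
    },
]

print(instances[0]["output"])
```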

    ADELIE diverges from conventional methods by integrating supervised fine-tuning with a Direct Preference Optimization (DPO) strategy. This blend enables the model to align more closely with the intricacies of human-like IE processing. Initial training involves a mix of IE-specific and generic data, using the LLaMA 2 model over 6,306 gradient steps, which preserves broad linguistic capabilities alongside specialized IE performance.
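    The DPO stage optimizes the standard direct-preference objective: the policy is pushed to prefer a chosen (better) extraction over a rejected one, relative to a frozen reference model. Below is a minimal PyTorch sketch of that loss, not the authors' training code, assuming per-response log-probabilities have already been summed over tokens.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO objective: prefer the chosen response over the rejected one,
    measured relative to the frozen reference (here, the SFT) model."""
    chosen_margin = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_margin = beta * (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()

# Toy tensors standing in for summed token log-probabilities of full responses.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.5]))
print(round(loss.item(), 3))
```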

    Performance metrics reveal that the ADELIE models, ADELIE-SFT and ADELIE-DPO, achieve benchmark-setting results. In evaluations on held-out datasets, ADELIE-SFT shows an average F1 improvement of 5% over standard LLM outputs on closed IE tasks. The gains are even more pronounced for open IE, with the ADELIE models outperforming state-of-the-art alternatives by 3-4% in robustness and extraction accuracy. For on-demand IE, the models demonstrate a nuanced understanding of user instructions, translating into highly accurate data structuring.
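    For readers unfamiliar with how extraction F1 scores of this kind are computed, here is a small illustrative example using exact matching over invented triplets; the paper's evaluation may differ in its matching details.

```python
# Exact-match F1 over predicted vs. gold triplets (illustrative data only).

def extraction_f1(predicted: set, gold: set) -> float:
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)            # true positives: exact triplet matches
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = {("Marie Curie", "works_for", "University of Paris")}
pred = {("Marie Curie", "works_for", "University of Paris"),
        ("Marie Curie", "located_in", "Paris")}           # one spurious triplet
print(round(extraction_f1(pred, gold), 3))                # 0.667
```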

    In conclusion, ADELIE’s methodical training and optimization translate into a potent alignment of LLMs with IE tasks, demonstrating that a focused approach to data diversity and instruction specificity can bridge the gap between human expectations and machine performance. This alignment does not compromise the models’ general capabilities, which is often a concern with task-specific tuning. The impressive results across various metrics and task types underscore the potential of ADELIE to set new standards in information extraction, making it a valuable tool for multiple applications, from academic research to real-world data processing.

    Check out the Paper. All credit for this research goes to the researchers of this project.

    The post Tsinghua University Researchers Propose ADELIE: Enhancing Information Extraction with Aligned Large Language Models Around Human-Centric Tasks appeared first on MarkTechPost.
