Tsinghua University Researchers Propose ADELIE: Enhancing Information Extraction with Aligned Large Language Models Around Human-Centric Tasks

Information extraction (IE) is a pivotal area of artificial intelligence that transforms unstructured text into structured, actionable data. Despite their expansive capacities, traditional large language models (LLMs) often fail to comprehend and execute the nuanced directives required for precise IE. These challenges primarily manifest in closed IE tasks, where a model must adhere to stringent, pre-defined schemas.

IE tasks compel models to discern and categorize text in formats that align with predefined structures, such as named entity recognition and relation classification. However, existing LLMs typically falter when tasked with the nuanced understanding and alignment necessary for effective IE. Researchers have traditionally employed strategies such as prompt engineering, which involves providing detailed annotations and guidelines to assist LLMs without altering underlying model parameters.

The research community has observed a critical need for a methodology that enhances LLMs' understanding of structured tasks and improves execution accuracy. In response, researchers from Tsinghua University have introduced a new approach called ADELIE (Aligning large language moDELs on Information Extraction). This approach leverages a specialized dataset, IEInstruct, comprising over 83,000 instances across various IE formats, including triplets, natural language responses, and JSON outputs.
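To make the three output formats concrete, here is a minimal Python sketch that renders one extracted fact as a triplet, a natural language response, and a JSON object. The `Relation` class, its field names, and the exact string layouts are assumptions made for illustration; they are not taken from IEInstruct itself.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One extracted fact. Field names are hypothetical, not IEInstruct's."""
    head: str
    relation: str
    tail: str


def as_triplet(r: Relation) -> str:
    # Compact tuple-style rendering.
    return f"({r.head}; {r.relation}; {r.tail})"


def as_sentence(r: Relation) -> str:
    # Free-text rendering, as a model might answer in prose.
    return f'The relation between "{r.head}" and "{r.tail}" is "{r.relation}".'


def as_json(r: Relation) -> str:
    # Machine-readable rendering with explicit keys.
    return json.dumps({"head": r.head, "relation": r.relation, "tail": r.tail})


example = Relation("ADELIE", "developed by", "Tsinghua University")
for render in (as_triplet, as_sentence, as_json):
    print(render(example))
```

Training on several renderings of the same fact is one plausible way to keep a model from overfitting to a single answer template.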

ADELIE diverges from conventional methods by combining supervised fine-tuning (SFT) with a Direct Preference Optimization (DPO) stage. This blend enables the model to align more closely with the intricacies of human-like IE processing. Initial training involves a mix of IE-specific and generic data, using the LLaMA 2 model over 6,306 gradient steps, which is intended to retain broad linguistic capabilities alongside specialized IE performance.
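For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. It illustrates the generic objective, not ADELIE's specific recipe: the function name, the argument names, and the default `beta=0.1` are assumptions, and how the preferred and rejected extractions are chosen is not covered here.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under a frozen reference model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) computed in a numerically stable way.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# The loss drops below log(2) once the policy prefers the chosen response
# more strongly than the reference model does.
print(dpo_loss(-1.0, -5.0, -2.0, -2.0) < math.log(2))
```

When the policy and the reference agree exactly, the loss equals log(2), which is a convenient sanity check at the start of training.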

Performance metrics reveal that ADELIE models, ADELIE_SFT and ADELIE_DPO, achieve benchmark-setting results. In evaluations against held-out datasets, ADELIE_SFT shows an average F1 score improvement of 5% over standard LLM outputs in closed IE tasks. The improvements are even more pronounced for open IE, with ADELIE models outperforming state-of-the-art alternatives by 3-4% margins in robustness and extraction accuracy. In the realm of on-demand IE, the models demonstrate a nuanced understanding of user instructions, translating into highly accurate data structuring.
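As a reminder of what an F1 score measures in an extraction setting, the following sketch computes exact-match F1 between a set of predicted items and a set of gold items. The function name and the sample entities are hypothetical, and the evaluation protocol behind the reported numbers (matching rules, averaging across datasets) may differ from this simplified version.

```python
def extraction_f1(predicted, gold) -> float:
    """Exact-match F1 between predicted and gold extractions.

    Each extraction is any hashable item, e.g. an (entity, type) tuple.
    Returns 0.0 when nothing matches; that includes the case where both
    sets are empty, which is a convention chosen for this sketch.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = {("Tsinghua University", "ORG"), ("ADELIE", "MODEL")}
predicted = gold | {("LLaMA 2", "ORG")}  # one spurious extraction
print(round(extraction_f1(predicted, gold), 3))  # precision 2/3, recall 1
```

A single spurious extraction lowers precision while leaving recall untouched, which is why F1 is preferred over raw accuracy for IE benchmarks.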

In conclusion, ADELIE's methodical training and optimization translate into a potent alignment of LLMs with IE tasks, demonstrating that a focused approach to data diversity and instruction specificity can bridge the gap between human expectations and machine performance. This alignment does not compromise the models' general capabilities, which is often a concern with task-specific tuning. The impressive results across various metrics and task types underscore the potential of ADELIE to set new standards in information extraction, making it a valuable tool for multiple applications, from academic research to real-world data processing.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post Tsinghua University Researchers Propose ADELIE: Enhancing Information Extraction with Aligned Large Language Models Around Human-Centric Tasks appeared first on MarkTechPost.
