KGGen: Advancing Knowledge Graph Extraction with Language Models and Clustering Techniques

Knowledge graphs (KGs) underpin many artificial intelligence applications, but they are often incomplete and sparse, which limits their effectiveness. Well-established KGs such as DBpedia and Wikidata are missing essential entity relationships, diminishing their utility in retrieval-augmented generation (RAG) and other machine-learning tasks. Traditional extraction methods tend to produce either sparse graphs that miss important connections or noisy, redundant representations. This makes it difficult to obtain high-quality structured knowledge from unstructured text. Overcoming these challenges is critical for improving AI-assisted knowledge retrieval, reasoning, and insight.

The state-of-the-art methods for extracting KGs from raw text are Open Information Extraction (OpenIE) and GraphRAG. OpenIE, a dependency-parsing technique, produces structured (subject, relation, object) triples but generates overly complex and redundant nodes, which reduces coherence. GraphRAG, which combines graph-based retrieval with language models, improves entity linking but does not produce densely connected graphs, limiting downstream reasoning. Both techniques suffer from inconsistent entity resolution, sparse connectivity, and poor generalizability, making them poorly suited to high-quality KG extraction.

Researchers from Stanford University, the University of Toronto, and FAR AI introduce KGGen, a text-to-KG generator that uses language models and clustering algorithms to extract structured knowledge from plain text. Unlike earlier methods, KGGen applies an iterative LM-based clustering step that refines the extracted graph by merging synonymous entities and grouping related relations. This reduces sparsity and redundancy, yielding a more coherent and better-connected KG. The researchers also introduce MINE (Measure of Information in Nodes and Edges), the first benchmark for text-to-KG extraction performance, enabling standardized comparison of extraction methods.
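The entity-merging step can be illustrated with a minimal sketch. In KGGen a language model judges which entities are synonymous; here that judgment is replaced by a hypothetical `same_entity` helper that compares lightly normalized strings, and clusters are merged pairwise until no further merge is possible. The names `normalize`, `same_entity`, and `cluster_entities`, the normalization rule, and the choice of the shortest name as each cluster's representative are all assumptions for illustration, not KGGen's actual API or algorithm.

```python
from itertools import combinations


def normalize(name: str) -> str:
    """Hypothetical stand-in for an LM synonym check: lowercase,
    drop punctuation, and strip a leading 'the'."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    words = cleaned.split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def same_entity(a: str, b: str) -> bool:
    # Assumption: KGGen asks an LM here; this sketch compares normalized strings.
    return normalize(a) == normalize(b)


def cluster_entities(entities: list[str]) -> dict[str, str]:
    """Merge clusters pairwise until a full pass makes no merge.

    Returns a map from every original entity to its cluster representative."""
    clusters = [[e] for e in sorted(set(entities))]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if any(same_entity(a, b) for a in clusters[i] for b in clusters[j]):
                clusters[i].extend(clusters[j])
                del clusters[j]
                merged = True
                break  # cluster indices changed, so start a fresh pass
    mapping = {}
    for cluster in clusters:
        # Assumption: the shortest surface form (ties broken alphabetically) represents the cluster.
        rep = min(cluster, key=lambda s: (len(s), s))
        for member in cluster:
            mapping[member] = rep
    return mapping


entities = ["Marie Curie", "marie curie", "The Sorbonne", "Sorbonne", "Pierre Curie"]
mapping = cluster_entities(entities)
print(sorted(set(mapping.values())))  # ['Marie Curie', 'Pierre Curie', 'Sorbonne']
```

Five surface forms collapse into three nodes. Because an LM's synonym judgments need not be transitive, repeating the merge pass until nothing changes (rather than comparing each pair once) lets a chain in which A matches B and B matches C pull all three into one cluster.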

KGGen is implemented as a modular Python package with modules for entity and relation extraction, aggregation, and entity and edge clustering. The extraction module uses GPT-4o to obtain structured (subject, predicate, object) triples from unstructured text. The aggregation module merges triples extracted from different sources into a single unified knowledge graph, so that entities are represented consistently. The clustering module applies an iterative clustering algorithm to merge synonymous entities, group similar edges, and improve graph connectivity. By using DSPy to impose strict constraints on the language model's outputs, KGGen obtains structured, high-fidelity extractions. The resulting knowledge graph is densely connected, semantically coherent, and well suited to AI applications.
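The flow from extracted triples to a unified graph can be sketched in plain Python. The GPT-4o extraction call is not reproduced: the sketch starts from raw tuples an LM might return, and a hypothetical `validate` function that drops malformed records stands in for the output constraints KGGen enforces with DSPy. `aggregate` unions triples from several sources, and `canonicalize` rewrites entities and predicates using cluster maps such as one a clustering step might produce. All names here are illustrative assumptions rather than KGGen's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes triples hashable, so they can live in a set
class Triple:
    subject: str
    predicate: str
    obj: str  # named 'obj' to avoid shadowing the built-in 'object'


def validate(raw: list[tuple]) -> list[Triple]:
    """Keep only well-formed (subject, predicate, object) records.
    A crude stand-in for constraining LM output; KGGen itself uses DSPy."""
    triples = []
    for item in raw:
        if len(item) == 3 and all(isinstance(x, str) and x.strip() for x in item):
            triples.append(Triple(*(x.strip() for x in item)))
    return triples


def aggregate(sources: list[list[Triple]]) -> set[Triple]:
    """Union the triples from every source; exact duplicates collapse."""
    return {t for source in sources for t in source}


def canonicalize(triples: set[Triple], entity_map: dict[str, str],
                 relation_map: dict[str, str]) -> set[Triple]:
    """Rewrite entities and predicates to their cluster representatives."""
    return {
        Triple(entity_map.get(t.subject, t.subject),
               relation_map.get(t.predicate, t.predicate),
               entity_map.get(t.obj, t.obj))
        for t in triples
    }


def adjacency(triples: set[Triple]) -> dict[str, set[str]]:
    """Undirected neighbour sets, useful for inspecting connectivity."""
    graph = defaultdict(set)
    for t in triples:
        graph[t.subject].add(t.obj)
        graph[t.obj].add(t.subject)
    return dict(graph)


doc_a = validate([("Marie Curie", "won", "Nobel Prize"),
                  ("Marie Curie", "worked at", "Sorbonne"),
                  ("broken record",)])  # malformed record is dropped
doc_b = validate([("marie curie", "received", "Nobel Prize"),
                  ("Pierre Curie", "worked at", "The Sorbonne")])

kg = aggregate([doc_a, doc_b])
kg = canonicalize(kg,
                  entity_map={"marie curie": "Marie Curie", "The Sorbonne": "Sorbonne"},
                  relation_map={"received": "won"})
print(len(kg))                              # 3
print(sorted(adjacency(kg)["Sorbonne"]))    # ['Marie Curie', 'Pierre Curie']
```

Storing triples as frozen dataclasses in a set means duplicates collapse automatically: once the cluster maps are applied, "marie curie received Nobel Prize" and "Marie Curie won Nobel Prize" become a single edge, and the two people now share a "Sorbonne" neighbour instead of pointing at two separate nodes.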

The benchmark results show that the method extracts structured knowledge from text effectively. KGGen achieves an accuracy of 66.07%, well above GraphRAG at 47.80% and OpenIE at 29.84%. The system extracts and structures knowledge with less redundancy while improving connectivity and coherence. Its lead over GraphRAG, the strongest existing method, is about 18 percentage points, highlighting its ability to generate well-structured knowledge graphs. Tests also show that the resulting graphs are denser and more informative, which makes them particularly suitable for knowledge retrieval and AI-based reasoning.
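When comparing these scores it helps to be precise about units: the difference between two accuracy percentages is measured in percentage points, which is not the same as a relative improvement. A short Python check using the three scores reported above (the `gaps` helper is purely illustrative):

```python
# Accuracy scores as reported in the article.
scores = {"KGGen": 66.07, "GraphRAG": 47.80, "OpenIE": 29.84}


def gaps(table: dict[str, float], system: str) -> dict[str, tuple[float, float]]:
    """For each baseline, return the lead of `system` as
    (absolute gap in percentage points, relative gap in percent)."""
    ours = table[system]
    return {
        name: (round(ours - s, 2), round((ours - s) / s * 100, 1))
        for name, s in table.items()
        if name != system
    }


for name, (pp, rel) in gaps(scores, "KGGen").items():
    print(f"vs {name}: +{pp} pp ({rel}% relative)")
# vs GraphRAG: +18.27 pp (38.2% relative)
# vs OpenIE: +36.23 pp (121.4% relative)
```

The roughly 18-point lead over GraphRAG is therefore an absolute gap; measured relative to GraphRAG's own score, it corresponds to an improvement of about 38%.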

KGGen advances knowledge graph extraction by pairing language-model-based entity recognition with iterative clustering to generate higher-quality structured data. Its substantially higher accuracy on the MINE benchmark raises the bar for transforming unstructured text into useful structured representations. The work has far-reaching implications for AI-driven knowledge retrieval, reasoning, and embedding-based learning, paving the way for larger and more comprehensive knowledge graphs. Future work will focus on refining the clustering techniques and extending the benchmark to larger datasets.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post KGGen: Advancing Knowledge Graph Extraction with Language Models and Clustering Techniques appeared first on MarkTechPost.
