GraphRAG – SD Times Open Source Project of the Week

GraphRAG is an open source research project out of Microsoft for creating knowledge graphs from datasets that can be used in retrieval-augmented generation (RAG).

RAG is an approach in which relevant data is retrieved and supplied to an LLM alongside the prompt so that its responses are more accurate. For instance, a company might use RAG to bring its own private data into a generative AI app so that employees get responses specific to that data, such as HR policies or sales data.
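As a rough illustration of the baseline RAG idea described above, the following sketch retrieves the most relevant snippets from a small private corpus and places them in a prompt. Everything in it is an assumption made for illustration: the sample documents, the function names, and the word-overlap scoring, which stands in for the embedding-based search a real system would more likely use. The final call to an LLM is left out.

```python
import re

def tokenize(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=2):
    """Rank documents by words shared with the query; return up to k matches."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the query at all.
    return [doc for score, doc in scored[:k] if score > 0]

def build_prompt(query, documents, k=2):
    """Place the retrieved snippets ahead of the question in one prompt string."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical private data, invented for this example.
documents = [
    "HR policy: employees accrue 20 vacation days per year.",
    "Sales data: Q3 revenue grew in the northern region.",
    "IT policy: laptops are refreshed every three years.",
]

question = "How many vacation days do employees get?"
prompt = build_prompt(question, documents, k=1)
print(prompt)
```

Only the HR snippet shares words with the question, so it is the only text that reaches the prompt. This also hints at the weakness discussed later in the article: if no single snippet mentions what the question is about, the retriever has nothing useful to hand to the model.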

GraphRAG works by having the LLM build the knowledge graph: it processes the private dataset and creates references to the entities and relationships in the source data. The knowledge graph is then used to create a bottom-up clustering that organizes the data into semantic clusters. At query time, both the knowledge graph and the clusters are provided to the LLM context window.
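The pipeline described above can be sketched in a few lines. This is a toy under stated assumptions, not GraphRAG's actual code or API: the `TRIPLES` list is a hypothetical stand-in for what an LLM extraction pass might return, connected components stand in for the project's semantic clustering, and entity matching is a naive substring check. All function names are invented for this example.

```python
from collections import defaultdict

# Hypothetical output of an LLM extraction pass: (entity, relation, entity).
TRIPLES = [
    ("Alice", "manages", "Bob"),
    ("Bob", "works on", "Project Atlas"),
    ("Carol", "wrote", "Travel Policy"),
]

def build_graph(triples):
    """Store each relationship under both of its endpoint entities."""
    graph = defaultdict(list)
    for source, relation, target in triples:
        graph[source].append((source, relation, target))
        graph[target].append((source, relation, target))
    return graph

def cluster_entities(graph):
    """Group entities into connected components (a stand-in for real clustering)."""
    clusters, seen = [], set()
    for start in list(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            # Follow every relationship that touches this entity.
            for source, _, target in graph[node]:
                stack.extend((source, target))
        seen |= component
        clusters.append(component)
    return clusters

def build_context(query, graph, clusters):
    """Collect the facts and the cluster for each known entity named in the query."""
    lines = []
    for entity in sorted(graph):
        if entity.lower() not in query.lower():
            continue
        for source, relation, target in graph[entity]:
            lines.append(f"{source} {relation} {target}")
        cluster = next(c for c in clusters if entity in c)
        lines.append(f"Cluster of {entity}: {', '.join(sorted(cluster))}")
    return "\n".join(lines)

graph = build_graph(TRIPLES)
clusters = cluster_entities(graph)
context = build_context("What does Bob do?", graph, clusters)
print(context)
```

The resulting `context` string is what would be placed in the LLM context window alongside the question. Because the lookup starts from an entity in the query rather than from text similarity, it can surface related facts even when no single passage matches the question's wording.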

According to Microsoft researchers, GraphRAG performs well in two areas where baseline RAG typically struggles: connecting the dots between pieces of information and summarizing large data collections.

As a test of GraphRAG’s effectiveness, the researchers used the Violent Incident Information from News Articles (VIINA) dataset, which compiles information from news reports on the war in Ukraine. This was chosen because of its complexity, presence of differing opinions and partial information, and its recency, meaning it wouldn’t be included in the LLM’s training dataset.

Both the baseline RAG and GraphRAG were able to answer the question “What is Novorossiya?” Only GraphRAG was able to answer the follow-up question “What has Novorossiya done?”

“Baseline RAG fails to answer this question. Looking at the source documents inserted into the context window, none of the text segments discuss Novorossiya, resulting in this failure. In comparison, the GraphRAG approach discovered an entity in the query, Novorossiya. This allows the LLM to ground itself in the graph and results in a superior answer that contains provenance through links to the original supporting text,” the researchers wrote in a blog post.

The second area in which GraphRAG succeeds is summarizing large datasets. Using the same VIINA dataset, the researchers asked the question “What are the top 5 themes in the data?” Baseline RAG returned five items about Russia in general with no relation to the conflict, while GraphRAG returned much more detailed answers that more closely reflected the themes of the dataset.

“By combining LLM-generated knowledge graphs and graph machine learning, GraphRAG enables us to answer important classes of questions that we cannot attempt with baseline RAG alone. We have seen promising results after applying this technology to a variety of scenarios, including social media, news articles, workplace productivity, and chemistry. Looking forward, we plan to work closely with customers on a variety of new domains as we continue to apply this technology while working on metrics and robust evaluation. We look forward to sharing more as our research continues,” the researchers wrote.

Read about other recent Open-Source Projects of the Week:

Theia IDE
LibreChat
Unity Catalog

The post GraphRAG – SD Times Open Source Project of the Week appeared first on SD Times.
