This is Part 3 of a three-part series (links at the bottom).
In Part Two, we moved from concept to execution by building the foundation of a Retrieval‑Augmented Generation (RAG) system. We set up a Postgres database with pgvector, defined a schema, wrote a script to embed and chunk text, and validated vector search with cosine similarity.
In this final installment, we’ll build a Next.js chatbot interface that streams GPT‑4 responses powered by your indexed content and demonstrates how to use GPT‑4 function‑calling (“tool calling”) for type‑safe, server‑side operations. Along the way, we’ll integrate polished components from shadcn UI to level‑up the front‑end experience.
Overview
Prerequisite: you should already have the `rag-chatbot-demo` repo from Parts 1 & 2, with Dockerised PostgreSQL 17 + pgvector, the `content_chunks` schema, and embeddings ingested. The repo is linked in the references at the bottom.
By the end of this guide you will:
- Install new dependencies: `zod` for runtime schema validation, `@ai-sdk/react` for the chat hook, and shadcn/ui for polished components.
- Define a `vectorSearch` tool with the AI SDK's `tool` helper and a Zod schema, which embeds a user query, runs a pgvector search, and returns the top results.
- Extend the RAG API route so GPT‑4 can decide when to call `vectorSearch`, merging the tool’s output into its streamed answer.
- Build a streaming chat UI that swaps vanilla elements for shadcn UI inputs, buttons, and cards.
- Deploy to Vercel with a single click.
If you already have the project folder from earlier parts, skip straight to Install Dependencies.
Tool Calling Explained
Before diving into implementation, it’s helpful to understand what tool calling is and why it matters for a robust RAG-based chatbot.
Tool calling lets your LLM not only generate free-form text but also invoke predefined functions, or “tools”, with strictly validated arguments. By exposing only a controlled set of server-side capabilities (for example, looking up the current time, querying an external API, or managing user sessions), you:
- Keep responses grounded in live data or protected operations, reducing hallucinations.
- Enforce type safety at runtime via Zod schemas, so GPT-4 can’t supply malformed parameters.
- Enable multi-step workflows, where the model reasons about which tool to call, what arguments to pass, and how to incorporate the tool’s output back into its natural-language answer.
In our setup, we register each tool with a name, description, and a Zod schema that describes the allowed parameters. When GPT-4 decides to call a tool, the AI SDK intercepts that intent, validates the arguments against the Zod schema, runs the tool’s execute
function on the server, and then feeds the result back into the model’s next generation step. This orchestration happens entirely within the streaming response, so the user sees a seamless, conversational experience even when live data or actions are involved.
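Stripped of the SDK, that orchestration loop looks roughly like the sketch below. The names here (`ToolDef`, `runToolCall`) are illustrative only, not part of any library; the real validation is done by Zod and the dispatch by the AI SDK:

```typescript
// Conceptual sketch of the tool-calling loop the AI SDK runs for you.
type ToolDef = {
  description: string;
  validate: (args: unknown) => { query: string }; // stands in for a Zod schema
  execute: (args: { query: string }) => Promise<string>;
};

const tools: Record<string, ToolDef> = {
  vectorSearch: {
    description: 'Search the knowledge base',
    validate: (args) => {
      // Reject anything that doesn't match { query: string }
      if (typeof args !== 'object' || args === null || typeof (args as any).query !== 'string') {
        throw new Error('Invalid arguments for vectorSearch');
      }
      return args as { query: string };
    },
    execute: async ({ query }) => `results for: ${query}`,
  },
};

// When the model emits a tool call, the SDK validates the arguments,
// runs the tool server-side, and feeds the result into the next step.
async function runToolCall(name: string, rawArgs: unknown): Promise<string> {
  const tool = tools[name];
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  const args = tool.validate(rawArgs); // malformed parameters never reach execute
  return tool.execute(args);
}
```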
Tool-Augmented RAG Flow
- User question is sent to the chat endpoint.
- GPT-4 analyzes the prompt and, if it requires external knowledge, emits a call to the `vectorSearch` tool with a `query` argument.
- The `vectorSearch` tool embeds that query, performs a pgvector cosine search in `content_chunks`, and returns the most relevant snippets.
- GPT-4 receives those snippets, constructs a final prompt that includes the retrieved context, and generates a grounded answer.
- The response is streamed back to the client UI, giving users a real-time chat experience enriched by your custom knowledge base.
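The cosine search in step 3 is ordered by pgvector's `<=>` operator, which computes cosine distance (1 minus cosine similarity, so smaller means more similar). A minimal sketch of the quantity it orders by:

```typescript
// Cosine distance, the quantity pgvector's `<=>` operator orders by:
// distance = 1 - (a . b) / (|a| * |b|). Smaller means more similar.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```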
Installing Dependencies
```bash
npm install ai @ai-sdk/openai @ai-sdk/react zod pg
```

| Package | Purpose |
|---|---|
| `ai` & `@ai-sdk/openai` | LLM calls, streaming & the `tool` helper |
| `@ai-sdk/react` | `useChat` hook for the chat UI |
| `zod` | Runtime schema validation |
| `pg` | PostgreSQL client |

shadcn UI components are added via its CLI below, not as an npm dependency.
Initialise shadcn UI and select a few components:
```bash
npx shadcn@latest init
npx shadcn@latest add button input card scroll-area
```
Defining the `vectorSearch` Tool
Use the AI SDK's `tool` helper to create a `vectorSearch` tool that embeds user queries, searches your Postgres vector store, and returns results:
```ts
// tools/vectorSearch.ts
import { embed, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
import { Pool } from 'pg';

const db = new Pool({ connectionString: process.env.DATABASE_URL });

// Define the vector search tool
export const vectorSearchTool = tool({
  description: 'Search for relevant information in the knowledge base',
  parameters: z.object({
    query: z.string().describe('The search query to find relevant information'),
  }),
  execute: async ({ query }) => {
    console.log('Searching for:', query);

    // Embed the search query
    const { embedding: qVec } = await embed({
      model: openai.embedding('text-embedding-3-small'),
      value: query,
    });
    const qVecString = `[${qVec.join(',')}]`;

    // Retrieve top-5 most similar chunks
    const { rows } = await db.query<{ content: string; source: string }>(
      `SELECT content, source
       FROM content_chunks
       ORDER BY embedding <=> $1
       LIMIT 5`,
      [qVecString]
    );

    const results = rows.map((r, i) => ({
      content: r.content,
      source: r.source,
      rank: i + 1,
    }));

    return { results };
  },
});
```
- We declare a Zod schema `{ query: string }` to validate incoming parameters.
- The tool embeds text and runs an indexed cosine search in pgvector.
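One optional refinement, not part of the tool above: also select the distance (`embedding <=> $1 AS distance` in the query) and drop weak matches before handing them to the model, so irrelevant chunks never pollute the context. The `0.5` threshold below is an assumption you would tune against your own data:

```typescript
// Post-filter retrieved rows by cosine distance before returning them.
// Assumes the SQL query also selected `embedding <=> $1 AS distance`.
type Match = { content: string; source: string; distance: number };

function filterMatches(rows: Match[], maxDistance = 0.5, limit = 5) {
  return rows
    .filter((r) => r.distance <= maxDistance) // keep only close matches
    .slice(0, limit)
    .map((r, i) => ({ content: r.content, source: r.source, rank: i + 1 }));
}
```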
Extending the RAG API Route with Function Calling
Modify `app/api/chat/route.ts` to register the tool and let the model decide when to call it:
```ts
// app/api/chat/route.ts
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { NextRequest } from 'next/server';
import { vectorSearchTool } from '@/tools/vectorSearch';

export const POST = async (req: NextRequest) => {
  const { messages } = await req.json();

  const systemMsg = {
    role: 'system',
    content: `You are a helpful support assistant.
When users ask questions, use the vector search tool to find relevant information from the knowledge base.
Base your answers on the search results.
Always provide a response after using the tool.
If the user asks a question that is not related to the knowledge base, say that you are not sure about the answer.`,
  };

  try {
    // Stream GPT-4's response with tool calling
    const result = streamText({
      model: openai('gpt-4.1'),
      messages: [systemMsg, ...messages],
      tools: {
        vectorSearch: vectorSearchTool,
      },
      maxSteps: 5, // Allow multiple tool calls and responses
    });
    return result.toDataStreamResponse();
  } catch (error) {
    console.error('Error in chat API:', error);
    return new Response('Internal Server Error', { status: 500 });
  }
};
```
- We import `vectorSearchTool` and register it under the `vectorSearch` key in the `tools` object.
- The SDK validates tool arguments against the Zod schema at runtime, so malformed parameters never reach `execute`.
- GPT-4 can now emit a tool-call payload such as `{"tool": "vectorSearch", "arguments": {"query": "…"}}` to trigger your function.
Building the Streaming Chat UI with shadcn UI
Create `app/chat/page.tsx`, selectively importing shadcn components and wiring up `useChat`:
```tsx
// app/chat/page.tsx
'use client';

import { useChat } from '@ai-sdk/react';
import { Card, CardContent, CardHeader, CardTitle } from '@/components/ui/card';
import { ScrollArea } from '@/components/ui/scroll-area';
import { Input } from '@/components/ui/input';
import { Button } from '@/components/ui/button';
import { useState } from 'react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: '/api/chat',
  });

  const customHandleSubmit = async (e: React.FormEvent) => {
    e.preventDefault();
    await handleSubmit(e); // Call the handleSubmit from useChat
  };

  const renderMessage = (message: any, index: number) => {
    const isUser = message.role === 'user';
    const hasToolInvocations = message.toolInvocations && message.toolInvocations.length > 0;
    return (
      <div className={`mb-4 ${isUser ? 'text-right' : 'text-left'}`}>
        <div className={`inline-block p-2 rounded-lg ${isUser ? 'bg-primary text-primary-foreground' : 'bg-muted'}`}>{message.content}</div>
        {/* Debug section for tool calls */}
        {!isUser && hasToolInvocations && <ToolCallDebugSection toolInvocations={message.toolInvocations} />}
      </div>
    );
  };

  return (
    <Card className="w-full max-w-2xl mx-auto">
      <CardHeader>
        <CardTitle>Chat with AI</CardTitle>
      </CardHeader>
      <CardContent>
        <ScrollArea className="h-[60vh] mb-4 p-4 border rounded">
          {messages.map((message, index) => (
            <div key={index}>{renderMessage(message, index)}</div>
          ))}
        </ScrollArea>
        <form onSubmit={customHandleSubmit} className="flex space-x-2">
          <Input type="text" value={input} onChange={handleInputChange} placeholder="Type your message here..." className="flex-1" />
          <Button type="submit">Send</Button>
        </form>
      </CardContent>
    </Card>
  );
}

function ToolCallDebugSection({ toolInvocations }: { toolInvocations: any[] }) {
  const [isExpanded, setIsExpanded] = useState(false);
  return (
    <div className="mt-2 text-left">
      <button onClick={() => setIsExpanded(!isExpanded)} className="text-xs text-gray-500 hover:text-gray-700 flex items-center gap-1">
        <span>{isExpanded ? '▼' : '▶'}</span>
        <span>Debug: Tool calls ({toolInvocations.length})</span>
      </button>
      {isExpanded && (
        <div className="mt-2 space-y-2 text-xs bg-gray-50 dark:bg-gray-900 p-2 rounded border">
          {toolInvocations.map((tool: any, index: number) => (
            <div key={index} className="bg-white dark:bg-gray-800 p-2 rounded border">
              <div className="font-semibold text-blue-600 dark:text-blue-400 mb-1">🔧 {tool.toolName}</div>
              <div className="text-gray-600 dark:text-gray-300 mb-2">
                <strong>Query:</strong> {tool.args?.query}
              </div>
              {tool.result && (
                <div>
                  <div className="font-semibold text-green-600 dark:text-green-400 mb-1">Results:</div>
                  <div className="space-y-1 max-h-32 overflow-y-auto">
                    {tool.result.results?.map((result: any, idx: number) => (
                      <div key={idx} className="bg-gray-100 dark:bg-gray-700 p-1 rounded">
                        <div className="text-gray-800 dark:text-gray-200 text-xs">{result.content}</div>
                        <div className="text-gray-500 text-xs mt-1">
                          Source: {result.source} | Rank: {result.rank}
                        </div>
                      </div>
                    ))}
                  </div>
                </div>
              )}
            </div>
          ))}
        </div>
      )}
    </div>
  );
}
```
- We import `Card`, `ScrollArea`, `Input`, and `Button` from shadcn UI to style the chat surface.
- `useChat()` auto-posts to our API route, handling tool calls under the hood.
One‑Click Deploy to Vercel
- Push your repo to GitHub.
- In Vercel → Add Project, import the repo and set environment variables:
- `DATABASE_URL`
- `OPENAI_API_KEY`
- Click Deploy. Vercel auto‑detects Next.js and streams responses out‑of‑the‑box.
Your chatbot now features type‑safe tool calling, a vector‑powered knowledge base, and a refined shadcn UI front‑end, ready for users.
References:
- Part 1: Vector Search Embeddings and RAG
- Part 2: Postgres RAG Stack: Embedding, Chunking & Vector Search
- Repo: https://github.com/aberhamm/rag-chatbot-demo