Your chatbot just told a customer you offer 24/7 phone support. You don't. Now you're dealing with an angry email and wondering why AI keeps inventing "facts."
This is the hallucination problem—and it's why 73% of businesses don't trust AI chatbots for customer-facing interactions, according to a 2024 Zendesk study.
Retrieval-Augmented Generation (RAG) solves this by grounding every answer in your actual documentation. Here's how to build one.
Why Standard Chatbots Hallucinate
Standard AI chatbots generate answers from their training data. When they don't know something, they don't say "I don't know." They confidently make something up.
The core problem:
- The AI has no access to your specific policies, pricing, or procedures
- It fills knowledge gaps with plausible-sounding guesses
- Users can't verify if answers are accurate
- Trust erodes with every wrong answer
RAG flips this model. Instead of generating from memory, the chatbot first retrieves relevant passages from your documentation, then generates an answer strictly from those passages—with citations.
How RAG Actually Works
RAG combines two capabilities: semantic search and text generation. Here's the process for every user question:
Step 1: Retrieve
The system converts the user's question into a vector (a mathematical representation of meaning) and searches your document index for the most relevant passages.
Step 2: Augment
Retrieved passages get injected into the AI's context window along with the original question and strict instructions to answer only from the provided content.
Step 3: Generate
The AI generates a response based exclusively on the retrieved passages, including citations back to the source documents.
The key constraint: The system prompt explicitly forbids answering questions when no relevant documentation exists. Instead, the bot says "I don't have information about that" and offers to connect the user with a human.
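The three steps above can be sketched in a few lines. This is a toy illustration, not a production pipeline: the retrieval step scores passages by keyword overlap instead of real vector similarity, and `generate` is a stand-in for an actual LLM call. The function names (`retrieve`, `augment`, `generate`) and the sample documents are illustrative.

```python
# Toy sketch of the retrieve -> augment -> generate loop.
# Swap the overlap scorer for real embeddings and `generate` for an LLM call.

DOCS = [
    {"id": "support.md", "text": "Email support is available Monday to Friday, 9am to 5pm."},
    {"id": "pricing.md", "text": "The Pro plan costs $49 per month, billed annually."},
]

def retrieve(question, docs, top_k=2):
    """Step 1: score each passage against the question (toy word-overlap score)."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(d["text"].lower().split())), d) for d in docs]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [d for score, d in scored[:top_k] if score > 0]

def augment(question, passages):
    """Step 2: inject retrieved passages plus strict grounding instructions."""
    context = "\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    return (
        "Answer ONLY from the context below. If the context does not contain "
        "the answer, say \"I don't have information about that.\"\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

def generate(prompt, passages):
    """Step 3: stand-in for the LLM call; refuses when nothing was retrieved."""
    if not passages:
        return "I don't have information about that."
    return f"(answer grounded in {', '.join(p['id'] for p in passages)})"

question = "What hours is email support available?"
passages = retrieve(question, DOCS)
answer = generate(augment(question, passages), passages)
```

The important structural point is that the refusal path is part of the code, not left to the model's discretion: when retrieval returns nothing, the system never reaches a free-form generation step.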
Building Your RAG System: The Stack
You don't need to build RAG from scratch. Modern tools make implementation straightforward.
Our recommended stack:
- Frontend: Next.js widget (embeds on any website)
- Vector database: Convex, Pinecone, or Supabase pgvector
- Embeddings: OpenAI text-embedding-3-small or Cohere
- Generation: Claude, GPT-4, or Llama 3
- Ingestion: Firecrawl for websites, LangChain for documents
The pipeline looks like this:
- Crawl/upload your docs
- Chunk into 300-800 token segments
- Generate embeddings for each chunk
- Store in vector database
- At query time: embed question → search → retrieve top 5-8 chunks → generate answer
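The chunking step in the pipeline above can be approximated with a simple sliding window. This sketch counts words rather than tokens (a real pipeline would use the embedding model's tokenizer, e.g. tiktoken for OpenAI models), and the window sizes are just the article's suggested range:

```python
# Greedy word-window chunker with overlap, a stand-in for a real
# tokenizer-based splitter. Overlap keeps boundary sentences in two chunks
# so retrieval doesn't miss ideas that straddle a chunk edge.

def chunk(text, max_tokens=400, overlap=50):
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        end = min(start + max_tokens, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # back up so consecutive chunks share context
    return chunks

doc = ("word " * 1000).strip()  # stand-in for a crawled page
pieces = chunk(doc)
```

Each chunk is then embedded and stored alongside metadata (source URL, document type, last-updated date) so the retrieval step can filter on it later.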
Total setup time with modern tools: 2-3 days for a basic implementation, 2-3 weeks for a production-ready one.
Configuration That Prevents Hallucinations
The difference between a RAG chatbot that hallucinates and one that doesn't comes down to configuration details.
System prompt requirements:
- "Answer only using the provided context passages"
- "If the context doesn't contain the answer, say so"
- "Always cite which document your answer comes from"
- "Never speculate or provide information not in the context"
Retrieval settings:
- Top-k: Start with 5-8 retrieved chunks
- Similarity threshold: Reject chunks below 0.7 similarity
- Metadata filters: Allow filtering by document type, date, or category
Confidence thresholds:
- High confidence (>0.85 similarity): Answer directly
- Medium confidence (0.7-0.85): Answer with caveat
- Low confidence (<0.7): Refuse and offer human handoff
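The three tiers above reduce to a small routing function. The thresholds are the article's suggested starting points, not universal constants; tune them against a sample of real queries (boundary cases here treat an exact 0.85 or 0.70 as the higher tier).

```python
# Map the best retrieval similarity score to an answering strategy.
HIGH, LOW = 0.85, 0.70

def route(top_similarity):
    if top_similarity >= HIGH:
        return "answer"               # answer directly, with citations
    if top_similarity >= LOW:
        return "answer_with_caveat"   # answer, but flag lower confidence
    return "handoff"                  # refuse and offer a human
```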
What Results Look Like
Businesses implementing RAG chatbots see measurable improvements within weeks.
Support deflection: 30-50% of support tickets get resolved by the bot without human intervention. Users get instant answers from your FAQ, docs, and knowledge base.
Response accuracy: Properly configured RAG systems achieve 95%+ accuracy on questions within scope. On the remaining 5%, the bot correctly responds that it doesn't know rather than guessing.
User trust: Citation links let users verify answers. Trust scores improve 40-60% compared to non-RAG chatbots, based on our client data.
Team efficiency: Support staff handle only complex issues that require human judgment. Average handle time drops as bot handles routine questions.
Common Pitfalls and Fixes
Even well-built RAG systems fail if you miss these details.
Problem: Bot cites a document but answers from memory
The AI sometimes ignores retrieved context and falls back to training data. Fix this with explicit extract-then-answer prompting: "First, identify relevant quotes from the context. Then, answer using only those quotes."
Problem: Answers are too long or too vague
Without constraints, the AI generates verbose responses. Fix with output requirements: "Maximum 3 sentences. Start with a one-line summary. Use bullet points for multiple items."
Problem: Stale information
Your docs change but embeddings don't. Fix with scheduled re-crawling: daily for pricing/availability, weekly for general content, on-change for policies.
Problem: Cost creep
Every query costs money (embeddings + generation). Fix with caching: store common question-answer pairs and serve cached responses for exact or near-exact matches.
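A minimal version of that cache keys on the normalized question, so trivially different phrasings ("What's your refund policy?" vs "whats your refund policy") hit the same entry. Near-match lookup, e.g. embedding similarity over cached questions, can be layered on top. The `run_rag` callback is a placeholder for your full pipeline:

```python
# Serve cached answers for exact-after-normalization matches,
# skipping the embedding and generation API costs entirely.
import re

cache = {}

def normalize(q):
    """Lowercase, drop punctuation, collapse whitespace."""
    return re.sub(r"[^a-z0-9 ]", "", q.lower()).strip()

def answer(question, run_rag):
    key = normalize(question)
    if key in cache:
        return cache[key]           # free: no API calls
    result = run_rag(question)      # full RAG pipeline (paid API calls)
    cache[key] = result
    return result

calls = []
def fake_rag(q):                    # stand-in that counts pipeline invocations
    calls.append(q)
    return "cached answer"

answer("What's your refund policy?", fake_rag)
answer("whats your refund policy", fake_rag)  # second call hits the cache
```

Remember to invalidate cached entries whenever the underlying documents are re-crawled, or the cache becomes its own source of stale answers.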
The Minimum Viable RAG Chatbot
You don't need a perfect system on day one. Start with the minimum that delivers value.
MVP scope:
- Your FAQ page (50-100 questions)
- Core product/service documentation
- Pricing and policy pages
- Basic handoff to human support
MVP timeline: 1-2 weeks with existing tools
MVP cost: $500-2,000 for setup + $50-200/month for API costs at low volume
Once the basic system works, expand the knowledge base and add features like conversation history, multi-turn clarification, and proactive suggestions.
Key Takeaways
- Standard chatbots hallucinate because they generate from training data, not your docs
- RAG grounds every answer in retrieved documentation with citations
- Proper configuration (system prompts, thresholds, handoffs) prevents most hallucinations
- Start with FAQ and core docs—expand after proving the concept works
- Expect 30-50% support deflection and 95%+ accuracy within scope
Get Started With RAG
The fastest path to a working RAG chatbot:
- Gather your FAQ, docs, and policy pages in one place
- Choose your stack (or use a managed solution like Andy)
- Configure strict grounding rules and citation requirements
- Test with real questions before going live
- Monitor accuracy and expand coverage over time
Need help building a RAG chatbot for your business? We specialize in custom AI solutions that actually work. Book a free consultation and let's discuss your knowledge base.