RAG for Business Websites: Build a No-Hallucination Knowledge-Base Chatbot
Most website chatbots still "guess." Retrieval-Augmented Generation (RAG) flips the script: your bot retrieves relevant passages from your knowledge (docs, FAQs, contracts) and then generates an answer grounded in those sources—with citations.
The Limitation of Vanilla Chatbots
Non-RAG chatbots (or poorly configured ones) tend to hallucinate when they lack context or stray off-topic. They ignore source truth because they don't retrieve documents before answering. They go stale without a recrawl/re-embed schedule. And they break trust by hiding where answers came from.
The RAG Advantage
RAG, done right, gives you:
1. Grounded Answers with Citations
Answers include source links/snippets so users (and your team) can verify the facts.
2. Scope Control
You can hard-limit the bot to approved corpora (policies, pricing, SLAs) and block out-of-scope topics.
3. Continuous Freshness
A recrawl + re-embed cadence keeps knowledge updated without manual rework.
4. Lower Risk + Better UX
Clear fallbacks ("I don't have that info yet") beat confident nonsense. Trust rises, tickets drop.
Real-World Examples
We've used RAG patterns to ship outcomes like these:
Support deflection of 30–50%: The bot answers from FAQs, past tickets, and guides, with a handoff when it is unsure.
Sales enablement: Pricing and feature comparisons are grounded in your product sheet, and lead quality improves.
Ops clarity: Internal playbooks are searchable in chat, so there are fewer "where is the doc?" interruptions.
The Andy Approach
Minimal, fast, measurable:
Stack: Next.js widget (web), Andy as the agent layer, Convex for vector store and real-time database, Firecrawl for website ingestion, and OpenAI for embeddings and generation.
Pipeline: Upload/Crawl → Chunk → Embed → Store in Convex RAG → Retrieve → Answer (+ cite) → Log → Evaluate.
Guardrails: Scope filters, cost caps, confidence thresholds with human handoff, and plan-based limits for usage control.
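To make the cost-cap and plan-limit guardrails concrete, here is a minimal sketch of a per-tenant usage check. The class name `UsageGuard`, its fields, and the numbers in the example are all hypothetical and are not part of Andy or Convex; a real deployment would persist these counters in the database rather than in memory.

```python
from dataclasses import dataclass


@dataclass
class UsageGuard:
    """Tracks usage for one tenant and one billing period (hypothetical shape)."""
    monthly_message_limit: int
    monthly_cost_cap_usd: float
    messages_used: int = 0
    cost_used_usd: float = 0.0

    def allow(self, estimated_cost_usd):
        """Check a request before it reaches the model. Returns (allowed, reason)."""
        if self.messages_used >= self.monthly_message_limit:
            return False, "plan message limit reached"
        if self.cost_used_usd + estimated_cost_usd > self.monthly_cost_cap_usd:
            return False, "cost cap reached"
        return True, "ok"

    def record(self, actual_cost_usd):
        """Record a completed request."""
        self.messages_used += 1
        self.cost_used_usd += actual_cost_usd


# Illustrative numbers only.
guard = UsageGuard(monthly_message_limit=2, monthly_cost_cap_usd=0.05)
first = guard.allow(0.02)
guard.record(0.02)
too_expensive = guard.allow(0.04)   # would push spend past the cap
guard.record(0.01)
over_limit = guard.allow(0.001)     # two messages already used
```

Checking before the model call and recording after it keeps the cap conservative: a request that would exceed the budget is refused rather than billed.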
Getting Started
Before you build, decide:
Sources of truth: Which spaces are canonical (FAQ, Docs, PDFs, Websites)?
Scope limits: What must it not answer (legal advice, roadmap, custom quotes)?
Refresh cadence: Weekly for marketing docs, daily for pricing/FAQs, on-change for policies.
KPIs: Deflection rate, first-contact resolution, citation click-through, "I don't know" rate, CSAT.
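The refresh cadence above can be written down as a small config plus a due-check. This is a sketch under assumptions: the doc-type keys and the `is_due` helper are illustrative names, and "on-change" is modeled as a flag set by whatever change detection you already run.

```python
from datetime import datetime, timedelta

# Cadences taken from the list above. None means "refresh on change, not on a timer".
REFRESH_CADENCE = {
    "marketing": timedelta(weeks=1),
    "pricing": timedelta(days=1),
    "faq": timedelta(days=1),
    "policy": None,
}


def is_due(doc_type, last_refreshed, now, changed=False):
    """Return True when a document of this type should be re-ingested."""
    cadence = REFRESH_CADENCE[doc_type]
    if cadence is None:
        return changed
    return now - last_refreshed >= cadence


now = datetime(2025, 1, 10, 12, 0)
pricing_due = is_due("pricing", datetime(2025, 1, 9, 11, 0), now)
marketing_due = is_due("marketing", datetime(2025, 1, 6, 12, 0), now)
policy_due = is_due("policy", datetime(2024, 6, 1), now, changed=True)
```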
Minimal Implementation Blueprint
Data prep
Normalize sources to PDF/Markdown where possible. Chunking: 300–800 tokens, with overlap between adjacent chunks so context is not cut off at a boundary. Attach metadata to every chunk (title, section ID, language, version, doc type, product area).
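As a rough illustration of the chunking guidance above, the sketch below splits a token list into overlapping windows and attaches metadata to each chunk. The names `Chunk` and `chunk_tokens`, the default sizes, and the metadata values are assumptions for the example; in practice the tokens would come from your embedding model's tokenizer rather than a made-up list.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_tokens(tokens, size=500, overlap=50, metadata=None):
    """Split tokens into windows of `size` tokens; neighbours share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for index, start in enumerate(range(0, len(tokens), step)):
        window = tokens[start:start + size]
        chunks.append(Chunk(
            text=" ".join(window),
            metadata={**(metadata or {}), "chunk_index": index, "token_start": start},
        ))
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# A fake 1,200-token document stands in for real tokenizer output.
doc_tokens = [f"w{i}" for i in range(1200)]
chunks = chunk_tokens(
    doc_tokens,
    size=500,
    overlap=50,
    metadata={"doc_type": "faq", "language": "en", "version": "v3", "section_id": "billing"},
)
```

Carrying the document-level metadata onto every chunk is what makes the retrieval filters and version penalties possible later on.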
Retrieval
Vector search with semantic similarity. Tune top-k (start 5–8) and add filters (doc_type:faq, product:pro). Penalize stale versions via metadata.
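Here is a small in-memory sketch of the retrieval step: filter by metadata, score by cosine similarity, subtract a penalty for stale versions, and keep the top-k. It is not how Convex performs vector search; the `retrieve` function, the `is_latest` flag, and the penalty value are assumptions chosen to show the idea.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, top_k=5, filters=None, stale_penalty=0.2):
    """Return up to top_k (score, item) pairs, best first."""
    filters = filters or {}
    scored = []
    for item in index:
        meta = item["metadata"]
        # Drop chunks that fail any exact-match metadata filter.
        if any(meta.get(key) != value for key, value in filters.items()):
            continue
        score = cosine(query_vec, item["embedding"])
        if not meta.get("is_latest", True):
            score -= stale_penalty  # older versions rank lower but stay retrievable
        scored.append((score, item))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy two-dimensional embeddings, for illustration only.
index = [
    {"id": "faq-1@v2", "embedding": [1.0, 0.0], "metadata": {"doc_type": "faq", "is_latest": True}},
    {"id": "faq-1@v1", "embedding": [1.0, 0.0], "metadata": {"doc_type": "faq", "is_latest": False}},
    {"id": "blog-7", "embedding": [1.0, 0.0], "metadata": {"doc_type": "blog", "is_latest": True}},
    {"id": "faq-2@v1", "embedding": [0.0, 1.0], "metadata": {"doc_type": "faq", "is_latest": True}},
]
results = retrieve([1.0, 0.0], index, top_k=2, filters={"doc_type": "faq"})
```

In this toy index the current FAQ version ranks first, the stale version second, and the blog post is filtered out entirely.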
Answering
System prompt enforces: cite sources, refuse out-of-scope, use brand tone. Format: short answer → bullet details → Sources with anchors. Confidence threshold: below X → handoff or ask a clarifying question.
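One way to wire up the answering rules above is a prompt builder that numbers the retrieved passages, plus a routing check on retrieval confidence. Everything here is a sketch: the prompt wording, the `[S1]` citation format, the passage fields, and the `route` helper are assumptions, and the threshold value in the example is a placeholder you would set from your own evaluation data.

```python
SYSTEM_PROMPT = (
    "Answer only from the numbered sources below. "
    "Cite every claim with its source ID in square brackets, e.g. [S1]. "
    "If the sources do not cover the question, say you don't have that information. "
    "Give a short answer first, then bullet details, then a Sources list."
)


def build_prompt(question, passages):
    """Assemble the system rules, numbered sources, and the user question."""
    sources = "\n".join(
        f"[S{i}] {p['title']}#{p['anchor']}: {p['text']}"
        for i, p in enumerate(passages, start=1)
    )
    return f"{SYSTEM_PROMPT}\n\nSources:\n{sources}\n\nQuestion: {question}"


def route(passages, threshold):
    """Return 'answer' if the best retrieval score clears the threshold, else 'handoff'."""
    best = max((p["score"] for p in passages), default=0.0)
    return "answer" if best >= threshold else "handoff"


# Made-up passages and scores, for illustration only.
passages = [
    {"title": "Refund policy", "anchor": "eligibility", "text": "Refunds within 30 days.", "score": 0.82},
    {"title": "Pricing", "anchor": "pro", "text": "Pro costs $20 per month.", "score": 0.41},
]
THRESHOLD = 0.75  # placeholder, not a recommendation
decision = route(passages, THRESHOLD)
prompt = build_prompt("Can I get a refund?", passages)
```

Using the top retrieval score as the confidence signal is the simplest option; an empty result set always routes to handoff.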
Freshness & Ops
Convex schedulers: crawl on schedule, diff detection → selective re-embed. Alerts if ingestion fails or deflection drops week-over-week. Versioning: keep embeddings keyed by doc hash to avoid drift.
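Diff detection with selective re-embedding can be as simple as comparing content hashes between crawls. The sketch below assumes you keep a URL-to-hash map from the previous run; `plan_reembed` and the sample pages are hypothetical, and a real pipeline would likely normalize whitespace and strip navigation markup before hashing.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of a page's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reembed(crawled, stored_hashes):
    """Compare a fresh crawl with stored hashes.

    Returns (to_embed, to_delete, new_hashes): pages that are new or changed,
    pages that disappeared, and the hash map to store for the next run.
    """
    new_hashes = {url: content_hash(text) for url, text in crawled.items()}
    to_embed = sorted(url for url, h in new_hashes.items() if stored_hashes.get(url) != h)
    to_delete = sorted(url for url in stored_hashes if url not in new_hashes)
    return to_embed, to_delete, new_hashes


# Made-up pages for illustration.
stored = {
    "/pricing": content_hash("Pro: $20"),
    "/faq": content_hash("Q: refunds?"),
    "/old": content_hash("legacy page"),
}
crawled = {"/pricing": "Pro: $25", "/faq": "Q: refunds?", "/new": "hello"}
to_embed, to_delete, new_hashes = plan_reembed(crawled, stored)
```

Only the changed pricing page and the new page get embedded; the unchanged FAQ costs nothing, and the removed page is flagged so its vectors can be dropped.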
Evaluation (non-negotiable)
Golden-set Q&A: 30–100 questions your team cares about; re-run them on each deployment. Track a precision/recall proxy (did the cited doc actually contain the answer?). Regression gate: block releases that degrade accuracy.
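A golden-set run with a regression gate might look like the sketch below. It uses "the expected document was among the cited ones" as the accuracy proxy, which is a simplification of checking that the cited doc contains the answer. The function names, the tolerance, and the canned answers are assumptions; in a real run `answer_fn` would call the bot.

```python
def evaluate(golden_set, answer_fn):
    """Fraction of golden questions whose cited documents include the expected one."""
    hits = 0
    for case in golden_set:
        cited = answer_fn(case["question"])  # list of cited doc IDs
        if case["expected_doc"] in cited:
            hits += 1
    return hits / len(golden_set) if golden_set else 0.0


def regression_gate(current, baseline, tolerance=0.02):
    """Allow a release only if accuracy has not dropped by more than `tolerance`."""
    return current >= baseline - tolerance


# Made-up golden set and canned bot output, for illustration only.
golden_set = [
    {"question": "What is the refund window?", "expected_doc": "refund-policy"},
    {"question": "How much is Pro?", "expected_doc": "pricing"},
    {"question": "Do you offer SSO?", "expected_doc": "security-faq"},
    {"question": "What is the uptime SLA?", "expected_doc": "sla"},
]
canned = {
    "What is the refund window?": ["refund-policy"],
    "How much is Pro?": ["pricing", "faq"],
    "Do you offer SSO?": ["pricing"],  # wrong document cited
    "What is the uptime SLA?": ["sla"],
}
accuracy = evaluate(golden_set, lambda question: canned[question])
release_ok = regression_gate(accuracy, baseline=0.80)
```

Running this in CI and failing the build when `release_ok` is false is one way to enforce the gate on every deployment.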
Common Pitfalls (and Fixes)
Problem: Bot cites a doc but answers from memory. Fix: Force extract-then-answer and require inline citations with IDs.
Problem: Answers are too long or too vague. Fix: Cap tokens, prefer bullets, and include a one-sentence summary first.
Problem: Index bloat slows retrieval. Fix: Archive old versions, add filters, and split indices by domain.
Problem: Costs creep up. Fix: Cache frequent Q&As, batch embeddings, and compress long sources before chunking.
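For the first pitfall, the "require inline citations with IDs" half of the fix can be checked mechanically after generation. The sketch below assumes citations look like `[S1]` and sit before the sentence's closing punctuation; both the format and the `check_citations` helper are illustrative. It does not verify that the cited passage supports the claim, only that each sentence points at a real source.

```python
import re

CITATION = re.compile(r"\[S(\d+)\]")


def check_citations(answer, num_sources):
    """Return a list of problems: uncited sentences and citations to unknown sources."""
    problems = []
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    for sentence in sentences:
        ids = [int(n) for n in CITATION.findall(sentence)]
        if not ids:
            problems.append(f"uncited: {sentence}")
        for n in ids:
            if not 1 <= n <= num_sources:
                problems.append(f"unknown source S{n}: {sentence}")
    return problems


good = "Refunds are available within 30 days [S1]. Pro plans include priority support [S2]."
bad = "Refunds are available within 30 days [S1]. We also ship to Mars. See the terms [S5]."
good_problems = check_citations(good, num_sources=2)
bad_problems = check_citations(bad, num_sources=2)
```

An answer with any problems can be regenerated or routed to handoff instead of being shown to the user.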
What "Good" Looks Like
Every answer includes sources (title + anchor). "I don't know" path is explicit (handoff, ticket, or contact form). Weekly eval report with trendlines (deflection, accuracy, CSAT). Freshness SLA defined per doc type. Security: PII masked in logs; private indices per tenant if multi-client.
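As a starting point for masking PII in logs, a pair of regular expressions can catch the most common cases before a message is written. This is a deliberately narrow sketch: the patterns only cover email addresses and phone-like digit runs, the `mask_pii` name is an assumption, and real deployments usually need broader coverage (names, addresses, account numbers).

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# A digit, then at least seven digits/separators, then a digit: skips short numbers.
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[email]", text)
    return PHONE.sub("[phone]", text)


masked = mask_pii("Contact jane.doe@example.com or +1 (415) 555-0100 about order 1234")
```

Short numbers such as order IDs are left alone, which keeps the logs useful for debugging.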
Ready to launch a RAG chatbot that doesn't hallucinate?
---
Install Andy on your site—then plug in your Knowledge Manager. (WhatsApp and Slack integrations coming in Q2 2025)