
Ask AI in Docs

Ask AI is the lightweight assistant built into the documentation site header. It lets someone ask a question in plain English, searches the published docs, and streams back an answer with source links.

It looks like chat, but it is not a full conversational workspace. Each question stands on its own. The UI keeps a browser session ID so feedback and analytics can be tied back to the same visitor, but the backend does not reuse earlier questions as conversation memory.

From the docs site, a visitor can:

  • Open the modal from the Ask AI button beside search
  • Use Ctrl + . on Windows/Linux or Cmd + . on macOS to open it from the keyboard
  • Start from suggested prompts or type a custom question
  • Watch the answer stream in as it is generated
  • Open linked source pages at the end of the answer
  • Send thumbs up or thumbs down feedback on completed answers

The modal also sets expectations right in the UI:

  • Answers may be wrong
  • Each question is processed independently
  • The assistant only knows what it can find in the docs index

The docs app wires the feature in through two main files:

  • apps/docs/src/components/Header.astro
  • apps/docs/src/components/AskAiChat.astro

Header.astro injects the Ask AI trigger beside the Starlight search box. Clicking that button opens a custom web component named <ask-ai-chat>.

AskAiChat.astro handles the rest:

  • Renders the modal UI
  • Creates or reuses an anonymous browser session ID in localStorage
  • Sends the question to the docs AI API
  • Reads Server-Sent Events from the response stream
  • Renders markdown with marked
  • Sanitizes the rendered HTML with DOMPurify
  • Builds source links with safe DOM APIs instead of string interpolation
  • Shows feedback controls only after a successful answer finishes streaming

There are a few small implementation details worth knowing:

  • Questions must be between 3 and 500 characters
  • The markdown render is throttled during streaming so the modal does not feel jumpy
  • The modal appends itself to document.body so it can sit above the sticky header cleanly
  • Retrieved source pages are deduplicated by slug before they are shown
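The length limit and the slug deduplication above are simple enough to sketch. The real component is Astro/TypeScript; this is the same logic in Python for illustration. The function names are hypothetical, and trimming whitespace before measuring length is an assumption.

```python
from typing import Iterable

MIN_QUESTION_CHARS = 3
MAX_QUESTION_CHARS = 500


def is_valid_question(text: str) -> bool:
    """Return True when the trimmed question is 3 to 500 characters long.

    Trimming before measuring is an assumption, not confirmed behavior.
    """
    length = len(text.strip())
    return MIN_QUESTION_CHARS <= length <= MAX_QUESTION_CHARS


def dedupe_by_slug(sources: Iterable[dict]) -> list:
    """Keep the first source seen for each slug, preserving order."""
    seen = set()
    unique = []
    for source in sources:
        slug = source["slug"]
        if slug not in seen:
            seen.add(slug)
            unique.append(source)
    return unique
```

Keeping the first occurrence preserves the retrieval order, so the best-scoring chunk for a page decides where its link appears.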

This is the request path from a user’s question to the final answer:

  1. The visitor opens Ask AI and submits a question.
  2. The docs app sends POST /api/docs-ai/ask with the question and the anonymous session ID.
  3. The backend rate-limits the request by client IP.
  4. The backend embeds the question with the configured embedding model.
  5. It searches the docs_embeddings table for the closest matching chunks.
  6. It stores an initial row in docs_ai_interactions with retrieval metrics and request metadata.
  7. It calls the chat model with the retrieved excerpts and starts streaming tokens back over SSE.
  8. The frontend renders tokens as they arrive, then adds source links when the stream ends.
  9. The backend stores the finished answer and marks the request as completed or error.
  10. If self-rating is enabled, the backend runs a second pass that scores the answer for quality.

Ask AI does not scrape the live DOM of the docs site. It indexes a static export generated by the docs app:

  • apps/docs/src/pages/docs-export.json.ts

That route is prerendered at build time and exports every non-draft document in the docs collection. Each exported record includes:

  • slug
  • title
  • description
  • body
  • lastUpdated

Draft pages are filtered out before export, so unpublished content does not end up in the search index.

This setup keeps indexing predictable. The backend reads one clean JSON file instead of trying to reconstruct content from HTML after the fact.
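As a rough sketch of what the backend sees, the export can be parsed into typed records. The five field names come from the list above. The `ExportedDoc` class, the `parse_export` helper, and the assumption that the file is a top-level JSON array are illustrative only.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExportedDoc:
    """One record from docs-export.json (hypothetical Python mirror)."""
    slug: str
    title: str
    description: str
    body: str
    last_updated: Optional[str]


def parse_export(raw_json: str) -> list:
    """Parse the export, assuming the top level is a JSON array of records."""
    records = json.loads(raw_json)
    return [
        ExportedDoc(
            slug=record["slug"],
            title=record["title"],
            description=record.get("description", ""),
            body=record["body"],
            last_updated=record.get("lastUpdated"),
        )
        for record in records
    ]
```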

The shared indexing pipeline lives here:

  • apps/backend/src/scripts/services/docs_ai/indexer.py
  • apps/backend/src/scripts/services/docs_ai/chunker.py
  • apps/backend/src/scripts/services/docs_ai/embedder.py
  • apps/backend/cli_tools/index_docs.py

The pipeline does the following:

  1. Load docs-export.json from a URL or local file.
  2. Split each page into chunks.
  3. Compare chunk checksums with what is already in the database.
  4. Skip chunks that have not changed.
  5. Delete orphaned chunks for pages or sections that no longer exist.
  6. Generate embeddings only for new or changed chunks.
  7. Upsert the final rows into docs_embeddings.

That checksum step matters. It keeps reindexing cheap because unchanged content does not get re-embedded.
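The checksum comparison in steps 3 to 6 can be sketched as a planning function. This is a minimal illustration, not the indexer's actual code: it assumes chunks are keyed by `(slug, chunk_index)` and that the checksum is a SHA-256 hex digest of the chunk text. `ReindexPlan` and `plan_reindex` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field


def chunk_checksum(text: str) -> str:
    """Checksum of a chunk's text (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class ReindexPlan:
    to_embed: list = field(default_factory=list)   # new or changed chunks
    unchanged: list = field(default_factory=list)  # skipped, no embedding call
    orphaned: list = field(default_factory=list)   # stored rows to delete


def plan_reindex(new_chunks: dict, stored_checksums: dict) -> ReindexPlan:
    """Compare freshly chunked text against checksums already in the database.

    new_chunks maps (slug, chunk_index) -> chunk text.
    stored_checksums maps (slug, chunk_index) -> checksum from the table.
    """
    plan = ReindexPlan()
    for key, text in new_chunks.items():
        if stored_checksums.get(key) == chunk_checksum(text):
            plan.unchanged.append(key)
        else:
            plan.to_embed.append(key)
    plan.orphaned = [key for key in stored_checksums if key not in new_chunks]
    return plan
```

Only the keys in `to_embed` would reach the embedding model, which is where the cost savings come from.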

Chunking is simple on purpose:

  • Split on ## and ### headings
  • Prefix each chunk with the page title
  • Prefix section chunks with the section heading too
  • Aim for about 400 tokens per chunk
  • Carry about 50 tokens of overlap from the previous chunk
  • Fall back from paragraph splitting to sentence splitting when a section is too long

The overlap helps when an important idea straddles two chunks, because each chunk then carries enough of its neighbor to match a question on its own. It is a small detail, but it improves answer quality more than people usually expect.
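The chunking rules above can be sketched in a few functions. This is a simplified illustration, not the contents of chunker.py: it counts whitespace-separated words as a stand-in for tokens, and every function name is hypothetical.

```python
import re
from typing import Optional

TARGET_TOKENS = 400   # approximate chunk size
OVERLAP_TOKENS = 50   # words carried over from the previous chunk

HEADING_RE = re.compile(r"^(#{2,3})\s+(.*)$")


def count_tokens(text: str) -> int:
    """Rough token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def split_sections(body: str) -> list:
    """Split a page body into (heading, text) pairs on ## and ### headings."""
    sections = []
    heading: Optional[str] = None
    lines = []
    for line in body.splitlines():
        match = HEADING_RE.match(line)
        if match:
            if any(existing.strip() for existing in lines):
                sections.append((heading, "\n".join(lines).strip()))
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    if any(existing.strip() for existing in lines):
        sections.append((heading, "\n".join(lines).strip()))
    return sections


def split_units(text: str) -> list:
    """Split into paragraphs, falling back to sentences for oversized ones."""
    units = []
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if count_tokens(paragraph) <= TARGET_TOKENS:
            units.append(paragraph)
        else:
            units.extend(s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s)
    return units


def chunk_section(text: str) -> list:
    """Pack units into chunks near TARGET_TOKENS, with overlap between them."""
    chunks = []
    current = []
    for unit in split_units(text):
        if current and count_tokens(" ".join(current + [unit])) > TARGET_TOKENS:
            finished = "\n\n".join(current)
            chunks.append(finished)
            # Start the next chunk with the tail of the one just finished.
            current = [" ".join(finished.split()[-OVERLAP_TOKENS:])]
        current.append(unit)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def chunk_page(title: str, body: str) -> list:
    """Chunk a page, prefixing each chunk with the title and section heading."""
    chunks = []
    for heading, text in split_sections(body):
        prefix = title if heading is None else f"{title}\n{heading}"
        chunks.extend(f"{prefix}\n\n{piece}" for piece in chunk_section(text))
    return chunks
```

Because the title and heading are prepended after packing, a real implementation would probably budget for them in the token count. The sketch leaves that out for brevity.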

Vector retrieval happens in:

  • apps/backend/src/scripts/services/docs_ai/retriever.py

Current retrieval settings in code:

  • Similarity metric: cosine similarity through pgvector
  • Minimum similarity threshold: 0.3
  • Maximum chunks returned per question: 5
  • Customer scope: fixed to the public docs scope

The embeddings are stored in PostgreSQL with pgvector in the docs_embeddings table. Each row keeps:

  • The doc slug and title
  • The chunk text
  • The chunk index
  • The embedding vector
  • Token count
  • Checksum

The current embedding column is vector(4096). pgvector's HNSW index supports at most 2,000 dimensions for the vector type, so the implementation uses an exact scan instead. That is a practical tradeoff for the current corpus size, and the code comments call it out directly.
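To make the exact scan concrete, here is an in-memory equivalent of the retrieval step. The real query runs in PostgreSQL, where pgvector's `<=>` operator returns cosine distance (similarity is 1 minus that distance). The `retrieve` function and its row shape are illustrative assumptions.

```python
import math

MIN_SIMILARITY = 0.3  # eligibility threshold from the settings above
MAX_CHUNKS = 5        # maximum chunks returned per question


def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vector: list, rows: list) -> list:
    """Score every row (an exact scan) and return the best eligible chunks.

    rows is a list of (chunk_id, embedding) pairs.
    Returns (chunk_id, similarity) pairs, best first.
    """
    scored = [(cosine_similarity(query_vector, vec), chunk_id) for chunk_id, vec in rows]
    eligible = [pair for pair in scored if pair[0] >= MIN_SIMILARITY]
    eligible.sort(key=lambda pair: pair[0], reverse=True)
    return [(chunk_id, score) for score, chunk_id in eligible[:MAX_CHUNKS]]
```

An exact scan touches every row, so its cost grows linearly with the number of chunks. That is acceptable while the docs corpus stays small.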

Retrieval uses two different thresholds:

  • 0.3 decides whether a chunk is eligible to be returned at all
  • DOCS_AI_LOW_MATCH_THRESHOLD, which defaults to 0.45, marks whether the best match looked weak

That second threshold does not block the answer. It feeds telemetry so the team can find questions the docs do not support well yet.
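The telemetry flag can be sketched as a single check on the best retrieval score. Treating an empty result as a low match and using a strict `<` comparison are both assumptions, and `is_low_match` is a hypothetical name.

```python
DEFAULT_LOW_MATCH_THRESHOLD = 0.45  # default of DOCS_AI_LOW_MATCH_THRESHOLD


def is_low_match(scores: list, threshold: float = DEFAULT_LOW_MATCH_THRESHOLD) -> bool:
    """True when nothing was retrieved or the best score is under the threshold.

    The flag only feeds telemetry; it does not stop the answer from being generated.
    """
    return not scores or max(scores) < threshold
```

A question whose best chunk scores 0.40 would still be answered, because 0.40 clears the 0.3 eligibility threshold, but the interaction would be flagged as a weak match.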

Answer generation lives in:

  • apps/backend/src/scripts/services/docs_ai/generator.py
  • apps/backend/src/scripts/services/docs_ai/client.py

The backend sends the retrieved chunks to the chat model with a system prompt that tells Eric to:

  • Answer only from the provided documentation excerpts
  • Say so clearly when the docs do not contain the answer
  • Cite pages in [Page Title](/slug/) format
  • Keep the tone warm but restrained

The default model settings are currently:

  • Embeddings: qwen/qwen3-embedding-8b
  • Chat: minimax/minimax-m2.5

Both go through OpenRouter using the OpenAI-compatible SDK.

The response is streamed as Server-Sent Events. The frontend handles these event types:

| Event type | What it contains |
| --- | --- |
| meta | A JSON payload with the response_id |
| token | One chunk of answer text |
| sources | A JSON-encoded list of retrieved source docs |
| done | End-of-stream marker |
| error | A user-facing error message |

The frontend appends token text as it arrives, then does one final markdown render at the end so the last chunk is not left half-parsed.
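The event handling can be illustrated with a small SSE reader. The real client is the TypeScript component; this Python sketch assumes the standard `event:` / `data:` framing with a blank line between events, which the source does not spell out. `parse_sse` and `collect_answer` are hypothetical names.

```python
import json
from typing import Iterable, Iterator

KNOWN_EVENTS = {"meta", "token", "sources", "done", "error"}


def parse_sse(lines: Iterable[str]) -> Iterator[tuple]:
    """Yield (event, data) pairs, skipping comments and malformed lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data and event in KNOWN_EVENTS:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # SSE comment or keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, one leading space after the colon is dropped.
            data.append(value[1:] if value.startswith(" ") else value)
        # Any other line is treated as malformed and ignored.


def collect_answer(lines: Iterable[str]) -> dict:
    """Fold a stream into the final answer, sources, and status."""
    result = {"response_id": None, "answer": "", "sources": [], "error": None, "done": False}
    for event, data in parse_sse(lines):
        try:
            if event == "token":
                result["answer"] += data
            elif event == "meta":
                result["response_id"] = json.loads(data).get("response_id")
            elif event == "sources":
                result["sources"] = json.loads(data)
            elif event == "error":
                result["error"] = data
            elif event == "done":
                result["done"] = True
        except json.JSONDecodeError:
            continue  # a bad payload should not crash the reader
    return result
```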

Ask AI records more than the final answer. It also keeps enough context to tell whether the feature is helping or drifting.

The telemetry table is:

  • docs_ai_interactions

That row stores:

  • The question and final answer
  • The anonymous session_id
  • A hash of the client IP, not the raw IP
  • The user agent
  • The chat and embedding model names
  • Retrieved source metadata
  • Retrieval scores
  • Feedback state
  • Self-rating state
  • Failure signals
  • Request status: pending, completed, or error

Once a streamed answer finishes successfully, the frontend shows:

  • Thumbs up
  • Thumbs down

If the user picks thumbs down, they can choose one reason:

  • incorrect
  • missing_docs
  • outdated
  • unclear
  • other

They can also add an optional note up to 500 characters.

Feedback is posted to POST /api/docs-ai/feedback. The backend only accepts it if the session_id matches the original ask request. That keeps one browser session from rating another session’s answer.
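The feedback rules above translate into a small validator. This is a sketch under assumptions: the `"up"` / `"down"` rating values, the error codes, and the function name are hypothetical. The reason list, the 500-character note limit, and the session check come from the text.

```python
from typing import Optional

DOWNVOTE_REASONS = {"incorrect", "missing_docs", "outdated", "unclear", "other"}
MAX_NOTE_CHARS = 500


def validate_feedback(
    *,
    stored_session_id: str,
    session_id: str,
    rating: str,
    reason: Optional[str] = None,
    note: Optional[str] = None,
) -> list:
    """Return a list of error codes; an empty list means the feedback is accepted."""
    errors = []
    if session_id != stored_session_id:
        # Only the session that asked the question may rate its answer.
        errors.append("session_mismatch")
    if rating not in ("up", "down"):
        errors.append("invalid_rating")
    if reason is not None and (rating != "down" or reason not in DOWNVOTE_REASONS):
        errors.append("invalid_reason")
    if note is not None and len(note) > MAX_NOTE_CHARS:
        errors.append("note_too_long")
    return errors
```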

If DOCS_AI_SELF_RATING_ENABLED is on, the backend runs a follow-up scoring pass after the answer is stored.

That pass:

  • Grades the answer from 1 to 5
  • Stores a short reason
  • Falls back to heuristics if the model returns empty or malformed output

The self-rating logic is opinionated in one important way: if the answer says the docs do not cover the question, the score is forced low. That keeps a polite non-answer from being counted as a success.
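The order of those rules can be sketched as follows. Everything specific here is an assumption for illustration: the phrases used to detect a non-answer, the forced score of 1, the way a score is extracted from model output, and the fallback heuristic.

```python
import re
from typing import Optional

# Hypothetical markers of a "the docs do not cover this" answer.
NOT_COVERED_PHRASES = (
    "docs do not cover",
    "documentation does not cover",
    "not covered in the docs",
)
FORCED_LOW_SCORE = 1  # assumed value for a polite non-answer


def parse_model_score(raw: Optional[str]) -> Optional[int]:
    """Extract a standalone 1-5 digit from the model output, if there is one."""
    if not raw:
        return None
    match = re.search(r"\b([1-5])\b", raw)
    return int(match.group(1)) if match else None


def heuristic_score(answer: str, source_count: int) -> int:
    """Fallback when the model output is empty or malformed (illustrative only)."""
    if not answer.strip() or source_count == 0:
        return 1
    return 3


def self_rate(answer: str, raw_model_output: Optional[str], source_count: int) -> int:
    """Score an answer from 1 to 5, forcing non-answers low."""
    if any(phrase in answer.lower() for phrase in NOT_COVERED_PHRASES):
        return FORCED_LOW_SCORE
    score = parse_model_score(raw_model_output)
    return score if score is not None else heuristic_score(answer, source_count)
```

Checking for a non-answer first means a generous model score cannot override it.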

The backend rolls several signals into a single failure classification:

  • low_match
  • thumbs_down
  • self_low_score

An interaction is marked as failed when one or more of those signals are present.
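The classification is an any-of check over the three signals. In this sketch the `"down"` feedback value and the `<=` comparison against the self-rating threshold are assumptions. The threshold is passed in because the default of DOCS_AI_SELF_RATING_FAILURE_THRESHOLD is not stated here.

```python
from typing import Optional


def failure_signals(
    *,
    low_match: bool,
    feedback: Optional[str],
    self_score: Optional[int],
    self_score_threshold: int,
) -> list:
    """Collect the failure signals present on one interaction."""
    signals = []
    if low_match:
        signals.append("low_match")
    if feedback == "down":
        signals.append("thumbs_down")
    if self_score is not None and self_score <= self_score_threshold:
        signals.append("self_low_score")
    return signals


def is_failed(signals: list) -> bool:
    """An interaction fails when at least one signal is present."""
    return bool(signals)
```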

This feature is public, so the guardrails are straightforward and mostly practical.

  • POST /api/docs-ai/ask is public, but rate-limited per IP
  • POST /api/docs-ai/feedback is public, and separately rate-limited
  • POST /api/docs-ai/reindex requires X-Reindex-Secret
  • Swagger and OpenAPI routes are only available from localhost
  • CORS is limited to the docs site origin, plus local dev origins in local environments
  • Answer markdown is sanitized before insertion into the page
  • Source links are built from encoded path segments
  • Malformed SSE lines are logged and ignored instead of crashing the modal
  • Missing response bodies are treated as errors instead of assumed to be streamable
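Per-IP rate limiting, the first guardrail in the list, is often done with a sliding window. The source does not say which algorithm the backend uses, so the class below is a generic sketch. The clock is injectable so the behavior can be tested without waiting.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_requests per key within window_seconds."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str) -> bool:
        """Record the request and return True if the key is under its limit."""
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max:
            return False
        hits.append(now)
        return True
```

A limiter like this would be keyed by client IP, with `max_requests` taken from a setting such as DOCS_AI_RATE_LIMIT_RPM. It keeps state in process memory, so it only works as written for a single worker.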

The feature stores a small amount of request metadata for debugging and quality tracking:

  • Anonymous browser session_id
  • Hashed client IP
  • User agent
  • Question text
  • Final answer
  • Feedback and scoring metadata

The raw client IP is not written to the interaction table. It is hashed first with EXTERNAL_API_SALT.
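One common way to hash an IP with a secret salt is a keyed HMAC. The source only says the IP is hashed with EXTERNAL_API_SALT, so HMAC-SHA-256 is a swapped-in assumption and `hash_client_ip` is a hypothetical name.

```python
import hashlib
import hmac


def hash_client_ip(client_ip: str, salt: str) -> str:
    """Return a stable, non-reversible identifier for an IP address.

    HMAC-SHA-256 keyed with the salt is an assumption; the backend's
    exact scheme may differ.
    """
    return hmac.new(
        salt.encode("utf-8"),
        client_ip.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
```

The same IP always maps to the same digest, so repeat visitors can still be grouped without storing the address itself.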

The docs AI backend lives in apps/backend/src/docs_api.py.

Public endpoints:

| Endpoint | Purpose |
| --- | --- |
| GET /api/docs-ai/health | Health check |
| POST /api/docs-ai/ask | Ask a question and receive a streamed answer |
| POST /api/docs-ai/feedback | Submit thumbs up or thumbs down feedback |

Protected endpoint:

| Endpoint | Purpose |
| --- | --- |
| POST /api/docs-ai/reindex | Rebuild the docs embedding index |

The FastAPI docs and OpenAPI schema are also exposed, but only to localhost:

  • /api/docs-ai/docs
  • /api/docs-ai/openapi.json

The main backend settings live in apps/backend/settings.py.

The most important ones are:

  • OPENROUTER_API_KEY
  • DOCS_AI_EMBEDDING_MODEL
  • DOCS_AI_CHAT_MODEL
  • DOCS_AI_RATE_LIMIT_RPM
  • DOCS_AI_FEEDBACK_RATE_LIMIT_RPM
  • DOCS_AI_LOW_MATCH_THRESHOLD
  • DOCS_AI_SELF_RATING_ENABLED
  • DOCS_AI_SELF_RATING_MODEL
  • DOCS_AI_SELF_RATING_FAILURE_THRESHOLD
  • DOCS_AI_TRUSTED_PROXY_IPS
  • DOCS_SITE_URL
  • DOCS_AI_REINDEX_SECRET

The frontend uses:

  • PUBLIC_DOCS_AI_API_URL

If that frontend value is not set, the component falls back to http://localhost:8042.

If you want to run the whole feature locally, the usual flow is:

  1. Start the docs API from apps/backend/
  2. Start the docs site from apps/docs/
  3. Build or serve the docs site so docs-export.json exists
  4. Run the indexer against that export

Example commands:

```bash
# Start the docs AI API
cd apps/backend
PYTHONPATH=.:src POSTGRESQL_HOST=localhost uv run fastapi dev src/docs_api.py --host 0.0.0.0 --port 8042
```

```bash
# Start the docs frontend
npx nx serve docs
```

```bash
# Index the docs export into pgvector
cd apps/backend
POSTGRESQL_HOST=localhost uv run python -m cli_tools.index_docs --source ../docs/dist/docs-export.json --clear
```

You can also trigger reindexing through the protected API endpoint if DOCS_AI_REINDEX_SECRET is configured.
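A sketch of the API route for reindexing follows, using only the standard library. The endpoint path and the X-Reindex-Secret header come from this page. The empty request body and the `build_reindex_request` helper are assumptions, since the expected payload is not documented here.

```python
import urllib.request


def build_reindex_request(base_url: str, secret: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the protected reindex endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/docs-ai/reindex",
        data=b"",  # assumed: no request body is required
        method="POST",
        headers={"X-Reindex-Secret": secret},
    )
```

To send it against a local API, pass the result to `urllib.request.urlopen(build_reindex_request("http://localhost:8042", secret))`, where `secret` matches DOCS_AI_REINDEX_SECRET.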

If you need to debug or extend the feature, these are the main files:

| Area | Files |
| --- | --- |
| Docs UI | apps/docs/src/components/Header.astro, apps/docs/src/components/AskAiChat.astro |
| Exported corpus | apps/docs/src/pages/docs-export.json.ts |
| API server | apps/backend/src/docs_api.py |
| OpenRouter client | apps/backend/src/scripts/services/docs_ai/client.py |
| Embeddings | apps/backend/src/scripts/services/docs_ai/embedder.py |
| Retrieval | apps/backend/src/scripts/services/docs_ai/retriever.py |
| Answer generation | apps/backend/src/scripts/services/docs_ai/generator.py |
| Chunking | apps/backend/src/scripts/services/docs_ai/chunker.py |
| Indexing | apps/backend/src/scripts/services/docs_ai/indexer.py, apps/backend/cli_tools/index_docs.py |
| Feedback and scoring | apps/backend/src/scripts/services/docs_ai/feedback.py |
| Data models | apps/backend/src/scripts/models/docs_ai_models.py, apps/backend/src/scripts/models/models.py |
| Database migrations | apps/backend/migrations/versions/cc2d0adef6af_add_docs_embeddings_table.py, apps/backend/migrations/versions/d4e2f6a19c41_add_docs_ai_interactions.py |

Ask AI is intentionally narrow. It answers questions about the published docs. It does not have the full session memory, permissions model, tool access, or long-running workflow support that the in-app Eric assistant has.

That narrower shape is a good thing here. The docs-site assistant is fast, simple, public, and easy to reason about. It only works if the documentation is actually there. When it fails, the team can usually trace the problem to one of four places: the page was never documented, the index is stale, retrieval picked weak chunks, or the answer model could not turn good context into a clean response.