Ask AI in Docs
Ask AI is the lightweight assistant built into the documentation site header. It lets someone ask a question in plain English, searches the published docs, and streams back an answer with source links.
It looks like chat, but it is not a full conversational workspace. Each question stands on its own. The UI keeps a browser session ID so feedback and analytics can be tied back to the same visitor, but the backend does not reuse earlier questions as conversation memory.
What users actually get
Section titled “What users actually get”From the docs site, a visitor can:
- Open the modal from the Ask AI button beside search
- Use
Ctrl + .on Windows/Linux orCmd + .on macOS to open it from the keyboard - Start from suggested prompts or type a custom question
- Watch the answer stream in as it is generated
- Open linked source pages at the end of the answer
- Send thumbs up or thumbs down feedback on completed answers
The modal also sets expectations right in the UI:
- Answers may be wrong
- Each question is processed independently
- The assistant only knows what it can find in the docs index
What the frontend does
Section titled “What the frontend does”The docs app wires the feature in through two main files:
apps/docs/src/components/Header.astroapps/docs/src/components/AskAiChat.astro
Header.astro injects the Ask AI trigger beside the Starlight search box. Clicking that button opens a custom web component named <ask-ai-chat>.
AskAiChat.astro handles the rest:
- Renders the modal UI
- Creates or reuses an anonymous browser session ID in
localStorage - Sends the question to the docs AI API
- Reads Server-Sent Events from the response stream
- Renders markdown with
marked - Sanitizes the rendered HTML with
DOMPurify - Builds source links with safe DOM APIs instead of string interpolation
- Shows feedback controls only after a successful answer finishes streaming
There are a few small implementation details worth knowing:
- Questions must be between 3 and 500 characters
- The markdown render is throttled during streaming so the modal does not feel jumpy
- The modal appends itself to
document.bodyso it can sit above the sticky header cleanly - Retrieved source pages are deduplicated by slug before they are shown
End-to-end flow
Section titled “End-to-end flow”This is the request path from a user’s question to the final answer:
- The visitor opens Ask AI and submits a question.
- The docs app sends
POST /api/docs-ai/askwith the question and the anonymous session ID. - The backend rate-limits the request by client IP.
- The backend embeds the question with the configured embedding model.
- It searches the
docs_embeddingstable for the closest matching chunks. - It stores an initial row in
docs_ai_interactionswith retrieval metrics and request metadata. - It calls the chat model with the retrieved excerpts and starts streaming tokens back over SSE.
- The frontend renders tokens as they arrive, then adds source links when the stream ends.
- The backend stores the finished answer and marks the request as
completedorerror. - If self-rating is enabled, the backend runs a second pass that scores the answer for quality.
Where the content comes from
Section titled “Where the content comes from”Ask AI does not scrape the live DOM of the docs site. It indexes a static export generated by the docs app:
apps/docs/src/pages/docs-export.json.ts
That route is prerendered at build time and exports every non-draft document in the docs collection. Each exported record includes:
slugtitledescriptionbodylastUpdated
Draft pages are filtered out before export, so unpublished content does not end up in the search index.
This setup keeps indexing predictable. The backend reads one clean JSON file instead of trying to reconstruct content from HTML after the fact.
How indexing works
Section titled “How indexing works”The shared indexing pipeline lives here:
apps/backend/src/scripts/services/docs_ai/indexer.pyapps/backend/src/scripts/services/docs_ai/chunker.pyapps/backend/src/scripts/services/docs_ai/embedder.pyapps/backend/cli_tools/index_docs.py
The pipeline does the following:
- Load
docs-export.jsonfrom a URL or local file. - Split each page into chunks.
- Compare chunk checksums with what is already in the database.
- Skip chunks that have not changed.
- Delete orphaned chunks for pages or sections that no longer exist.
- Generate embeddings only for new or changed chunks.
- Upsert the final rows into
docs_embeddings.
That checksum step matters. It keeps reindexing cheap because unchanged content does not get re-embedded.
Chunking rules
Section titled “Chunking rules”Chunking is simple on purpose:
- Split on
##and###headings - Prefix each chunk with the page title
- Prefix section chunks with the section heading too
- Aim for about 400 tokens per chunk
- Carry about 50 tokens of overlap from the previous chunk
- Fall back from paragraph splitting to sentence splitting when a section is too long
The overlap helps when an important idea straddles two chunks. It is a small detail, but it improves answer quality more than people usually expect.
How retrieval works
Section titled “How retrieval works”Vector retrieval happens in:
apps/backend/src/scripts/services/docs_ai/retriever.py
Current retrieval settings in code:
- Similarity metric: cosine similarity through pgvector
- Minimum similarity threshold:
0.3 - Maximum chunks returned per question:
5 - Customer scope: fixed to the public docs scope
The embeddings are stored in PostgreSQL with pgvector in the docs_embeddings table. Each row keeps:
- The doc slug and title
- The chunk text
- The chunk index
- The embedding vector
- Token count
- Checksum
The current embedding column is vector(4096). Because of that size, the implementation uses an exact scan instead of an HNSW index. That is a practical tradeoff for the current corpus size. The code comments call this out directly.
Low-match detection
Section titled “Low-match detection”Retrieval uses two different thresholds:
0.3decides whether a chunk is eligible to be returned at allDOCS_AI_LOW_MATCH_THRESHOLD, which defaults to0.45, marks whether the best match looked weak
That second threshold does not block the answer. It feeds telemetry so the team can find questions the docs do not support well yet.
How answer generation works
Section titled “How answer generation works”Answer generation lives in:
apps/backend/src/scripts/services/docs_ai/generator.pyapps/backend/src/scripts/services/docs_ai/client.py
The backend sends the retrieved chunks to the chat model with a system prompt that tells Eric to:
- Answer only from the provided documentation excerpts
- Say so clearly when the docs do not contain the answer
- Cite pages in
[Page Title](/slug/)format - Keep the tone warm but restrained
The default model settings are currently:
- Embeddings:
qwen/qwen3-embedding-8b - Chat:
minimax/minimax-m2.5
Both go through OpenRouter using the OpenAI-compatible SDK.
Streaming behavior
Section titled “Streaming behavior”The response is streamed as Server-Sent Events. The frontend handles these event types:
| Event type | What it contains |
|---|---|
meta | A JSON payload with the response_id |
token | One chunk of answer text |
sources | A JSON-encoded list of retrieved source docs |
done | End-of-stream marker |
error | A user-facing error message |
The frontend appends token text as it arrives, then does one final markdown render at the end so the last chunk is not left half-parsed.
Feedback, telemetry, and quality signals
Section titled “Feedback, telemetry, and quality signals”Ask AI records more than the final answer. It also keeps enough context to tell whether the feature is helping or drifting.
The telemetry table is:
docs_ai_interactions
That row stores:
- The question and final answer
- The anonymous
session_id - A hash of the client IP, not the raw IP
- The user agent
- The chat and embedding model names
- Retrieved source metadata
- Retrieval scores
- Feedback state
- Self-rating state
- Failure signals
- Request status:
pending,completed, orerror
Feedback flow
Section titled “Feedback flow”Once a streamed answer finishes successfully, the frontend shows:
- Thumbs up
- Thumbs down
If the user picks thumbs down, they can choose one reason:
incorrectmissing_docsoutdatedunclearother
They can also add an optional note up to 500 characters.
Feedback is posted to POST /api/docs-ai/feedback. The backend only accepts it if the session_id matches the original ask request. That keeps one browser session from rating another session’s answer.
Self-rating
Section titled “Self-rating”If DOCS_AI_SELF_RATING_ENABLED is on, the backend runs a follow-up scoring pass after the answer is stored.
That pass:
- Grades the answer from 1 to 5
- Stores a short reason
- Falls back to heuristics if the model returns empty or malformed output
The self-rating logic is opinionated in one important way: if the answer says the docs do not cover the question, the score is forced low. That keeps a polite non-answer from being counted as a success.
Failure signals
Section titled “Failure signals”The backend rolls several signals into a single failure classification:
low_matchthumbs_downself_low_score
An interaction is marked as failed when one or more of those signals are present.
Safety and guardrails
Section titled “Safety and guardrails”This feature is public, so the guardrails are straightforward and mostly practical.
API-level guardrails
Section titled “API-level guardrails”POST /api/docs-ai/askis public, but rate-limited per IPPOST /api/docs-ai/feedbackis public, and separately rate-limitedPOST /api/docs-ai/reindexrequiresX-Reindex-Secret- Swagger and OpenAPI routes are only available from localhost
- CORS is limited to the docs site origin, plus local dev origins in local environments
Frontend guardrails
Section titled “Frontend guardrails”- Answer markdown is sanitized before insertion into the page
- Source links are built from encoded path segments
- Malformed SSE lines are logged and ignored instead of crashing the modal
- Missing response bodies are treated as errors instead of assumed to be streamable
Data handling
Section titled “Data handling”The feature stores a small amount of request metadata for debugging and quality tracking:
- Anonymous browser
session_id - Hashed client IP
- User agent
- Question text
- Final answer
- Feedback and scoring metadata
The raw client IP is not written to the interaction table. It is hashed first with EXTERNAL_API_SALT.
API surface
Section titled “API surface”The docs AI backend lives in apps/backend/src/docs_api.py.
Public endpoints:
| Endpoint | Purpose |
|---|---|
GET /api/docs-ai/health | Health check |
POST /api/docs-ai/ask | Ask a question and receive a streamed answer |
POST /api/docs-ai/feedback | Submit thumbs up or thumbs down feedback |
Protected endpoint:
| Endpoint | Purpose |
|---|---|
POST /api/docs-ai/reindex | Rebuild the docs embedding index |
The FastAPI docs and OpenAPI schema are also exposed, but only to localhost:
/api/docs-ai/docs/api/docs-ai/openapi.json
Configuration
Section titled “Configuration”The main backend settings live in apps/backend/settings.py.
The most important ones are:
OPENROUTER_API_KEYDOCS_AI_EMBEDDING_MODELDOCS_AI_CHAT_MODELDOCS_AI_RATE_LIMIT_RPMDOCS_AI_FEEDBACK_RATE_LIMIT_RPMDOCS_AI_LOW_MATCH_THRESHOLDDOCS_AI_SELF_RATING_ENABLEDDOCS_AI_SELF_RATING_MODELDOCS_AI_SELF_RATING_FAILURE_THRESHOLDDOCS_AI_TRUSTED_PROXY_IPSDOCS_SITE_URLDOCS_AI_REINDEX_SECRET
The frontend uses:
PUBLIC_DOCS_AI_API_URL
If that frontend value is not set, the component falls back to http://localhost:8042.
Local development
Section titled “Local development”If you want to run the whole feature locally, the usual flow is:
- Start the docs API from
apps/backend/ - Start the docs site from
apps/docs/ - Build or serve the docs site so
docs-export.jsonexists - Run the indexer against that export
Example commands:
# Start the docs AI APIcd apps/backendPYTHONPATH=.:src POSTGRESQL_HOST=localhost uv run fastapi dev src/docs_api.py --host 0.0.0.0 --port 8042# Start the docs frontendnpx nx serve docs# Index the docs export into pgvectorcd apps/backendPOSTGRESQL_HOST=localhost uv run python -m cli_tools.index_docs --source ../docs/dist/docs-export.json --clearYou can also trigger reindexing through the protected API endpoint if DOCS_AI_REINDEX_SECRET is configured.
Implementation map
Section titled “Implementation map”If you need to debug or extend the feature, these are the main files:
| Area | Files |
|---|---|
| Docs UI | apps/docs/src/components/Header.astro, apps/docs/src/components/AskAiChat.astro |
| Exported corpus | apps/docs/src/pages/docs-export.json.ts |
| API server | apps/backend/src/docs_api.py |
| OpenRouter client | apps/backend/src/scripts/services/docs_ai/client.py |
| Embeddings | apps/backend/src/scripts/services/docs_ai/embedder.py |
| Retrieval | apps/backend/src/scripts/services/docs_ai/retriever.py |
| Answer generation | apps/backend/src/scripts/services/docs_ai/generator.py |
| Chunking | apps/backend/src/scripts/services/docs_ai/chunker.py |
| Indexing | apps/backend/src/scripts/services/docs_ai/indexer.py, apps/backend/cli_tools/index_docs.py |
| Feedback and scoring | apps/backend/src/scripts/services/docs_ai/feedback.py |
| Data models | apps/backend/src/scripts/models/docs_ai_models.py, apps/backend/src/scripts/models/models.py |
| Database migrations | apps/backend/migrations/versions/cc2d0adef6af_add_docs_embeddings_table.py, apps/backend/migrations/versions/d4e2f6a19c41_add_docs_ai_interactions.py |
One last clarification
Section titled “One last clarification”Ask AI is intentionally narrow. It answers questions about the published docs. It does not have the full session memory, permissions model, tool access, or long-running workflow support that the in-app Eric assistant has.
That narrower shape is a good thing here. The docs-site assistant is fast, simple, public, and easy to reason about. It only works if the documentation is actually there. When it fails, the team can usually trace the problem to one of four places: the page was never documented, the index is stale, retrieval picked weak chunks, or the answer model could not turn good context into a clean response.