RAG pipelines: retrieval that earns the answer, citations users can verify.
RAG fails quietly. The model returns a confident answer; the answer cites the wrong section; the user doesn't notice until they're talking to their boss. We build retrieval pipelines where the citation is the proof, evaluation is continuous, and the answer falls back to 'I don't know' instead of hallucinating.
What we build
Chunking that respects document structure
Markdown gets chunked on heading boundaries, PDFs on page breaks, code on function boundaries, transcripts on speaker turns. No 512-token sliding window for every input type. Each chunk carries a reference back to its parent document, its position within that document, and a structural breadcrumb.
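A minimal sketch of the Markdown case, assuming ATX-style `#` headings and ignoring fenced code blocks for brevity. The `Chunk` shape and the `chunkMarkdown` name are illustrative, not a fixed schema:

```typescript
// Hypothetical chunk shape: field names are illustrative assumptions.
interface Chunk {
  docId: string;        // parent document
  position: number;     // order of the chunk within the document
  breadcrumb: string[]; // heading path, e.g. ["Guide", "Install"]
  text: string;
}

function chunkMarkdown(docId: string, markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  const path: string[] = []; // heading path for the text currently being buffered
  let buffer: string[] = [];

  // Emit the buffered text as one chunk under the current heading path.
  const flush = (): void => {
    const text = buffer.join("\n").trim();
    if (text.length > 0) {
      chunks.push({ docId, position: chunks.length, breadcrumb: [...path], text });
    }
    buffer = [];
  };

  for (const line of markdown.split("\n")) {
    const heading = /^(#{1,6})\s+(.*)$/.exec(line);
    if (heading) {
      flush(); // a heading closes the chunk that came before it
      const level = heading[1].length;
      path.splice(level - 1); // drop sibling and deeper headings
      path.push(heading[2].trim());
    } else {
      buffer.push(line);
    }
  }
  flush();
  return chunks;
}
```

With this sketch, text under `## Install` inside `# Guide` comes back with the breadcrumb `["Guide", "Install"]`, so a citation can say where in the document the chunk lives.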
Hybrid retrieval, not just embeddings
Vector similarity for semantic match plus BM25 for exact terms (model names, API methods, error codes). Reranked with a small cross-encoder before the LLM call. Pure embedding retrieval misses too many keyword-shaped queries to ship to production.
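One common way to merge a vector ranking with a BM25 ranking is reciprocal rank fusion. The sketch below assumes that technique for illustration (the text above does not say which fusion method is used). The function name and the constant `k = 60` are assumptions, and the cross-encoder rerank would run on the fused list afterwards:

```typescript
// Reciprocal rank fusion: every list contributes 1 / (k + rank) for each id it contains.
// Ids that rank well in both lists rise to the top of the fused result.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .map(([id]) => id);
}

// Hypothetical chunk ids from the two retrievers.
const vectorHits = ["a", "b", "c"];
const bm25Hits = ["c", "a", "d"];
const fused = reciprocalRankFusion([vectorHits, bm25Hits]);
```

Here `a` and `c` appear in both lists and end up ahead of `b` and `d`, which each appear in only one.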
Eval harness for retrieval, not just generation
Recall@k and MRR measured against a labelled question/answer set that ships with the product. Every retrieval-layer change runs the eval suite. We don't ship 'the embeddings feel better' without numbers backing it.
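Both metrics are small enough to compute inline in a test suite. A sketch, assuming each labelled case stores the ids of the chunks that answer the question (the `EvalCase` shape is hypothetical):

```typescript
// Hypothetical eval case: ids refer to chunks in the index.
interface EvalCase {
  question: string;
  relevantIds: string[];  // labelled chunks that answer the question
  retrievedIds: string[]; // ranked ids returned by the retriever
}

// Fraction of the labelled chunks that appear in the top k results.
function recallAtK(c: EvalCase, k: number): number {
  if (c.relevantIds.length === 0) return 0;
  const topK = c.retrievedIds.slice(0, k);
  const hits = c.relevantIds.filter((id) => topK.indexOf(id) !== -1).length;
  return hits / c.relevantIds.length;
}

// 1 / rank of the first relevant result, or 0 if none was retrieved.
function reciprocalRank(c: EvalCase): number {
  for (let i = 0; i < c.retrievedIds.length; i++) {
    if (c.relevantIds.indexOf(c.retrievedIds[i]) !== -1) return 1 / (i + 1);
  }
  return 0;
}

// Mean of both metrics over the whole labelled set.
function evaluate(cases: EvalCase[], k: number): { recallAtK: number; mrr: number } {
  const mean = (xs: number[]): number =>
    xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length;
  return {
    recallAtK: mean(cases.map((c) => recallAtK(c, k))),
    mrr: mean(cases.map((c) => reciprocalRank(c))),
  };
}
```

A retrieval-layer change can then be gated on these numbers, for example by failing the build when either metric drops below the last recorded baseline.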
Citations that link to the source span
Every claim in the LLM output is bound to a retrieval chunk via citation markers the UI renders as inline footnotes. Click the footnote, see the source span highlighted in the original document. Hallucinations become reportable, not invisible.
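A sketch of how the binding can be checked before the answer is rendered, assuming the model emits numeric markers such as `[1]` and that each retrieved chunk records character offsets into its source document. The marker format, the `SourceChunk` fields, and the naive sentence splitter are all assumptions for illustration:

```typescript
// Hypothetical chunk record: start/end are character offsets in the source document.
interface SourceChunk {
  id: number;
  docId: string;
  start: number;
  end: number;
}

interface CitationCheck {
  cited: SourceChunk[];       // chunks referenced by valid markers
  unknownMarkers: number[];   // markers that match no retrieved chunk
  uncitedSentences: string[]; // sentences that carry no marker at all
}

function checkCitations(answer: string, chunks: SourceChunk[]): CitationCheck {
  const byId = new Map<number, SourceChunk>();
  for (const c of chunks) byId.set(c.id, c);

  const cited = new Map<number, SourceChunk>();
  const unknown = new Set<number>();
  const uncitedSentences: string[] = [];

  // Naive sentence split on ., ! and ?; good enough for a sketch.
  const sentences = answer.match(/[^.!?]+[.!?]*/g) || [];
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (sentence.length === 0) continue;

    const marker = /\[(\d+)\]/g;
    let found = false;
    let m: RegExpExecArray | null;
    while ((m = marker.exec(sentence)) !== null) {
      found = true;
      const id = Number(m[1]);
      const chunk = byId.get(id);
      if (chunk) cited.set(id, chunk);
      else unknown.add(id);
    }
    if (!found) uncitedSentences.push(sentence);
  }

  return {
    cited: Array.from(cited.values()),
    unknownMarkers: Array.from(unknown),
    uncitedSentences,
  };
}
```

The UI can turn each entry in `cited` into a footnote that highlights `start..end` in the source, while `unknownMarkers` and `uncitedSentences` are the cases worth flagging or logging.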
Refusal over hallucination
If retrieval returns nothing above the relevance floor, the model is instructed to refuse instead of synthesizing. 'I don't have a source for that' beats a confident wrong answer when the user is basing a decision on the response.
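The floor can also be enforced in code before the LLM is called at all. The sketch below is that stricter variant rather than a prompt instruction. The `ScoredChunk` shape, the `planResponse` name, and the threshold value are assumptions, and the score is assumed to be "higher is more relevant":

```typescript
// Hypothetical retrieval result; score is assumed to increase with relevance.
interface ScoredChunk {
  id: string;
  text: string;
  score: number;
}

type Plan =
  | { kind: "answer"; context: ScoredChunk[] } // pass these chunks to the LLM
  | { kind: "refuse"; message: string };       // skip the LLM call entirely

const REFUSAL = "I don't have a source for that.";

function planResponse(results: ScoredChunk[], relevanceFloor: number): Plan {
  // Keep only chunks at or above the floor.
  const usable = results.filter((r) => r.score >= relevanceFloor);
  if (usable.length === 0) {
    return { kind: "refuse", message: REFUSAL };
  }
  return { kind: "answer", context: usable };
}
```

Where the floor sits depends on the scoring function, so it is best tuned against the labelled eval set rather than picked by hand.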
Re-indexing without downtime
Embeddings change whenever the embedding model or the chunker changes. The re-index writes to a shadow table, is validated against the eval suite, and is swapped in atomically. The product never serves a half-indexed corpus to a customer.
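In Postgres, table renames are transactional, so the swap can be two `ALTER TABLE ... RENAME` statements inside one transaction. A sketch under that assumption; the table names, the `Exec` interface, and the `validate` callback are hypothetical, and `exec` must be bound to a single connection so all statements share the transaction:

```typescript
// Minimal executor interface; a single Postgres client connection could be adapted to it.
type Exec = (sql: string) => Promise<void>;

async function swapIndex(
  exec: Exec,
  validate: () => Promise<boolean>, // e.g. runs the eval suite against the shadow table
  live = "chunks",                  // trusted constants, never user input
  shadow = "chunks_shadow",
): Promise<boolean> {
  // If the shadow index fails validation, keep serving the current one.
  if (!(await validate())) return false;

  await exec("BEGIN");
  try {
    // Both renames commit together, so readers see the old or the new table, never neither.
    await exec(`ALTER TABLE ${live} RENAME TO ${live}_old`);
    await exec(`ALTER TABLE ${shadow} RENAME TO ${live}`);
    await exec("COMMIT");
    return true;
  } catch (err) {
    await exec("ROLLBACK");
    throw err;
  }
}
```

Dropping the `_old` table and rebuilding any indexes or views that reference it by name are left out of the sketch.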
Where this fits
You shipped a RAG demo that works on the seed corpus and fails the moment the customer's real documents arrive.
Your AI feature is generating answers but the support team can't verify them because there are no citations.
You're embedding documents with one model version, querying with another, and the relevance has been drifting for months.
Tech stack
- TypeScript
- pgvector
- OpenAI Embeddings
- Postgres
- BullMQ
Want this for your team?
30 minutes with a founder or senior engineer. We'll scope what you need and tell you straight whether Stacklane fits.
Book a Free Call

