Production
AI orchestration layer for a materials engineering assistant
Intent resolution, hybrid retrieval, and response assembly for a domain-specific AI assistant.
Context
The system helped materials engineers search a large technical corpus through natural language without sacrificing precision. The challenge was keeping the system useful for real work, not just conversationally fluent.
Architecture
I split the pipeline into explicit stages: resolver, query builder, hybrid retrieval, and response assembly. That made intent handling testable and kept retrieval logic separate from generation.
Query flow
User Query
-> Resolver
-> Query Builder
-> Hybrid Retrieval
-> Response Assembly
Key components
- Resolver for intent, scope, and entity extraction
- Query builder for deterministic structured retrieval
- Hybrid retrieval across lexical, vector, and relational sources
- Response assembly layer for context construction and answer generation
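Merging results across lexical, vector, and relational sources requires combining rankings whose scores live on incompatible scales. One common approach, shown here as a hedged sketch rather than a claim about this system's actual merge logic, is Reciprocal Rank Fusion, which uses only each document's rank in each list:

```python
from collections import defaultdict

def rrf_merge(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists with Reciprocal Rank Fusion (RRF).

    RRF sidesteps score normalization across heterogeneous backends
    (e.g. BM25 scores vs. cosine similarities) by scoring each
    document as the sum of 1 / (k + rank) over the lists it appears in.
    """
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

Documents ranked well by multiple sources rise to the top, and the constant `k` damps the influence of any single list's top hit.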
Tradeoffs
- Precision versus recall when translating ambiguous language into structured search
- Deterministic routing versus flexible model-driven behavior
- Latency budgets across retrieval, ranking, and generation
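One way to make the latency tradeoff explicit is to run stages under a shared deadline and degrade gracefully, skipping optional stages such as reranking when the budget is nearly spent. The stage names and budgets below are illustrative assumptions, not the system's real configuration:

```python
import time

# Hypothetical per-stage budgets in milliseconds.
STAGE_BUDGET_MS = {"retrieval": 300, "ranking": 150, "generation": 1500}

def run_with_budget(stages, total_budget_ms: float = 2000.0) -> dict:
    """Run (name, fn, optional) stages against a shared deadline.

    Optional stages are skipped when the remaining budget is less
    than their allotted share, so the pipeline degrades rather
    than blowing the overall latency target.
    """
    deadline = time.monotonic() + total_budget_ms / 1000.0
    outputs = {}
    for name, fn, optional in stages:
        remaining_ms = (deadline - time.monotonic()) * 1000.0
        if optional and remaining_ms < STAGE_BUDGET_MS.get(name, 0):
            outputs[name] = None  # degrade gracefully: skip optional stage
            continue
        outputs[name] = fn()
    return outputs
```

Treating the budget as data rather than hard-coded timeouts also makes it easy to tune per deployment.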
Lessons
- Separating retrieval preparation from retrieval execution improved maintainability
- Clear interfaces between stages made relevance tuning much easier
- The architecture became more reliable once generation stopped owning search behavior
Related writing
Hybrid search in practice
Hybrid search becomes a systems design problem once score normalization, query routing, and operational tuning enter the picture.

Why a RAG pipeline needs a resolver
Most RAG stacks move too quickly from prompt to retrieval. A resolver layer improves precision by decomposing intent, extracting entities, and converting ambiguous language into structured search actions.