An end-to-end Computer Vision and RAG pipeline that transforms static documents into actionable intelligence using YOLOv8 and Llama 3.
The pipeline utilizes a decoupled architecture: **FastAPI** handles high-concurrency requests, while the **CV Pipeline** processes documents asynchronously to ensure zero-latency user experience.
YOLOv8 detects bounding boxes for headers, paragraphs, and tables to preserve document hierarchy.
PaddleOCR extracts raw text while maintaining spatial coordinates for accurate data mapping.
Custom algorithms convert visual table grids into clean, structured JSON formats for LLM ingestion.
*The system uses a Cross-Encoder re-ranking step to ensure 95%+ retrieval relevance for complex user queries.