
Lightweight PDF-based RAG System with LLaMA 3.1 8B

The Challenge

Large language models typically depend on internet connectivity and expensive cloud APIs. Many users instead need an offline, efficient document Q&A system.

The Solution

Built a fully Dockerized RAG system around a quantized LLaMA 3.1 8B (INT4) model that runs entirely offline, answering questions directly from PDF documents.
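
At a high level, the pipeline extracts text from a PDF, embeds the chunks into a local vector index, retrieves the passages most relevant to a question, and hands them to the quantized model as context. The sketch below is a minimal illustration of that flow, not the project's exact code: the file names, the all-MiniLM-L6-v2 embedder, FAISS as the vector store, and the llama-3.1-8b-int4-ov model directory are all assumptions.

```python
# Minimal offline RAG sketch (illustrative; paths and model names are assumptions).
# Requires: pypdf, sentence-transformers, faiss-cpu, optimum-intel with OpenVINO.
import faiss
import numpy as np
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

# 1. Extract and chunk the PDF text.
reader = PdfReader("document.pdf")  # hypothetical input file
text = "\n".join(page.extract_text() or "" for page in reader.pages)
chunks = [text[i:i + 500] for i in range(0, len(text), 500)]

# 2. Embed the chunks locally and index them for similarity search.
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small model, cached locally
embeddings = embedder.encode(chunks).astype(np.float32)
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# 3. Retrieve the chunks nearest to the question.
question = "What does the report conclude?"
q_emb = embedder.encode([question]).astype(np.float32)
_, ids = index.search(q_emb, 3)
context = "\n".join(chunks[i] for i in ids[0])

# 4. Generate an answer with the INT4 LLaMA via OpenVINO.
model_dir = "llama-3.1-8b-int4-ov"  # hypothetical local INT4 export
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = OVModelForCausalLM.from_pretrained(model_dir)
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Once the embedder and model files are on disk, nothing in this loop touches the network, which is what makes containerized offline deployment straightforward.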

Key Achievement

Achieved a 70% reduction in model size while maintaining accuracy, enabling deployment on modest hardware with no internet dependency.
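
The size reduction comes from 4-bit weight compression. The sketch below shows one way to apply it with NNCF to an OpenVINO model; the paths, mode, group_size, and ratio are illustrative assumptions, since the project's exact settings aren't documented here.

```python
# Illustrative INT4 weight compression with NNCF + OpenVINO
# (parameters are assumptions, not the project's recorded configuration).
import nncf
import openvino as ov

core = ov.Core()
fp_model = core.read_model("llama-3.1-8b-fp16/openvino_model.xml")  # hypothetical export

# Compress linear-layer weights to 4 bits; group-wise scales limit accuracy loss,
# and the ratio keeps a fraction of layers at higher precision.
int4_model = nncf.compress_weights(
    fp_model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    group_size=128,  # quantization scales computed per group of 128 weights
    ratio=0.8,       # ~20% of weights stay at INT8 to protect sensitive layers
)
ov.save_model(int4_model, "llama-3.1-8b-int4/openvino_model.xml")
```

Going from 16-bit to 4-bit weights is a 4x reduction on the compressed layers, so with some layers kept at higher precision an overall figure around 70% is consistent with this kind of setup.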

Technologies Used

Python, OpenVINO, NNCF, Docker, RAG, Vector Databases, LLMs