ViD-LLM Powered Visual Question Answering System

The Challenge

Querying images in natural language typically relies on paid cloud APIs, and deploying vision-language models locally is complex.

The Solution

Deployed the Salesforce/blip-vqa-base model on CPU behind a Gradio interface, enabling real-time visual question answering without a GPU.
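A minimal sketch of how such a setup might be wired together, assuming the standard Hugging Face `BlipProcessor` / `BlipForQuestionAnswering` classes and a basic `gr.Interface`. The helper names (`load_blip`, `answer_question`, `build_demo`), the validation message, and the UI labels are illustrative assumptions, not taken from the project's actual code.

```python
MODEL_ID = "Salesforce/blip-vqa-base"  # model named in the project description


def load_blip(model_id=MODEL_ID):
    """Load the BLIP VQA processor and model, pinned to CPU (hypothetical helper)."""
    # Imported lazily so the rest of the module can be used without these packages.
    from transformers import BlipForQuestionAnswering, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForQuestionAnswering.from_pretrained(model_id)
    model.to("cpu").eval()  # inference only, no GPU required
    return processor, model


def answer_question(image, question, processor, model):
    """Return the model's short text answer for one (image, question) pair."""
    if image is None or not question or not question.strip():
        return "Please provide both an image and a question."
    # The processor resizes/normalizes the image and tokenizes the question.
    inputs = processor(image.convert("RGB"), question.strip(), return_tensors="pt")
    # BLIP's generate() decodes answer tokens; it already runs without gradients.
    output_ids = model.generate(**inputs)
    return processor.decode(output_ids[0], skip_special_tokens=True)


def build_demo(processor, model):
    """Wrap answer_question in a simple Gradio UI (assumed layout)."""
    import gradio as gr

    return gr.Interface(
        fn=lambda image, question: answer_question(image, question, processor, model),
        inputs=[gr.Image(type="pil", label="Image"), gr.Textbox(label="Question")],
        outputs=gr.Textbox(label="Answer"),
        title="Visual Question Answering (BLIP on CPU)",
    )
```

With `transformers`, `torch`, and `gradio` installed, calling `build_demo(*load_blip()).launch()` should start a local web UI. The first run downloads the model weights, and response time on CPU will depend on the machine and image size.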

Key Achievement

Enabled multimodal AI capabilities on standard hardware, opening possibilities for AR/VR applications in EdTech and MedTech.

Technologies Used

Transformers, PyTorch, Gradio, Computer Vision, NLP