RAG Workshop: Build Your Own AI-Powered Question-Answering System
Build a complete Retrieval-Augmented Generation system using ChromaDB, Ollama, and Python
Welcome to the RAG (Retrieval-Augmented Generation) Workshop! In this hands-on tutorial, you'll build a complete AI-powered question-answering system that runs entirely on your local machine. By the end of this workshop, you'll have a working application that can intelligently answer questions about food by combining the power of vector databases, embeddings, and large language models.
You'll create a Food RAG application - a smart question-answering system that understands and responds to natural language questions about various foods, including Indian dishes, fruits, and their nutritional information. This application demonstrates the core principles of RAG technology that powers modern AI assistants and chatbots.
What You'll Build:
Local AI infrastructure running on your computer - no cloud services required
Vector database powered by ChromaDB for efficient similarity search
Text embedding model (mxbai-embed-large) that converts text into meaningful numerical vectors
Large language model (llama3.2) for generating human-like responses
Interactive command-line interface for asking questions and receiving answers
Complete dataset of food information stored and searchable via semantic search
Why It Matters:
RAG is the foundation of modern AI assistants like ChatGPT's custom GPTs, Microsoft Copilot, and enterprise AI solutions
Understanding RAG enables you to build AI applications that are grounded in your own data and documents
Local deployment means privacy, control, and no ongoing API costs
The skills you learn apply to any domain - customer support, documentation search, knowledge management, and more
Starting with a simple food example makes the concepts clear before scaling to complex enterprise applications
What You'll Learn:
Learn how Retrieval-Augmented Generation combines information retrieval with language generation to produce accurate, context-aware responses
Understand how text is converted into numerical vectors that capture semantic meaning, enabling AI to find similar content
Master the setup and use of Ollama to run large language models on your own computer without relying on external APIs
Discover how ChromaDB stores and retrieves embeddings efficiently for fast similarity search
Build the complete data flow: question → embedding → retrieval → context augmentation → response generation
How the RAG System Works:
1. Data Preparation: The food dataset (containing information about dishes, ingredients, and nutrition) is loaded into the system.
2. Embedding Generation: Each food item's text description is converted into a vector embedding using the mxbai-embed-large model.
3. Vector Storage: All embeddings are stored in ChromaDB, creating a searchable vector database of food knowledge.
4. Question Processing: When you ask a question, it's also converted into a vector embedding using the same model.
5. Similarity Search: ChromaDB finds the most similar food items by comparing vector embeddings mathematically.
6. Context Retrieval: The relevant food information is retrieved and compiled as context for the language model.
7. Response Generation: The llama3.2 model generates a natural language answer using both your question and the retrieved context.
8. Answer Delivery: You receive a comprehensive, contextually accurate answer based on the actual food data in the system.
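To make steps 1-6 concrete, here is a minimal Python sketch of the indexing and retrieval half of the pipeline. It assumes Ollama is running locally on its default port (11434); the two food entries, the collection name, and the embed helper are illustrative stand-ins, not the repository's actual rag_run.py code. The generation half (steps 7-8) is sketched at the end of the workshop.

import chromadb
import requests

# Illustrative stand-in for the real food dataset (hypothetical entries).
foods = [
    {"id": "dosa", "text": "Dosa is a crispy South Indian crepe made from fermented rice and lentil batter."},
    {"id": "mango", "text": "Mango is a sweet tropical fruit rich in vitamin C."},
]

def embed(text):
    # Steps 2 and 4: ask the local Ollama server for an embedding vector.
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "mxbai-embed-large", "prompt": text},
    )
    return resp.json()["embedding"]

# Step 3: store one embedding per food item in a ChromaDB collection.
client = chromadb.Client()
collection = client.get_or_create_collection("foods")
for food in foods:
    collection.add(ids=[food["id"]], embeddings=[embed(food["text"])], documents=[food["text"]])

# Steps 4-6: embed the question and retrieve the closest food descriptions.
question = "Which food is a fruit?"
results = collection.query(query_embeddings=[embed(question)], n_results=1)
context = "\n".join(results["documents"][0])
print(context)  # the retrieved text that will be handed to llama3.2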
By the End of This Workshop:
You'll have a functioning RAG application running on your local machine
You'll understand how to convert any text dataset into a searchable knowledge base
You'll be able to ask natural language questions and receive contextually relevant answers
You'll grasp the fundamental architecture that powers modern AI question-answering systems
You'll have hands-on experience with industry-standard tools: Ollama, ChromaDB, and Python
You'll be ready to adapt this system for your own use cases - whether that's company documentation, research papers, or any other text-based knowledge domain
Required:
A computer with at least 8GB RAM (16GB recommended for better performance)
At least 5GB of free disk space for models and software
Basic familiarity with using a terminal or command prompt
Willingness to learn - no coding experience required!
Optional:
Python programming knowledge (helpful but not required)
Understanding of basic AI/ML concepts (we'll explain everything)
Previous experience with command-line tools (we provide detailed instructions)
Ready to Get Started?
Follow the steps below to build your RAG application from scratch!
AI's Secret Language: Vector Embeddings
This video explains how AI systems convert text into numerical vectors that capture semantic meaning. Understanding embeddings is crucial for RAG systems as they enable similarity search across documents. Watch this before proceeding to understand the foundation of how our RAG system will work.
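As a quick taste of the idea: similarity between embeddings is typically measured with cosine similarity, which compares the directions of two vectors. The tiny three-dimensional vectors below are made up for illustration; real models such as mxbai-embed-large produce vectors with around a thousand dimensions.

import math

def cosine_similarity(a, b):
    # Dot product of the vectors divided by the product of their lengths.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only).
mango = [0.9, 0.1, 0.2]   # pretend this encodes "sweet tropical fruit"
papaya = [0.8, 0.2, 0.3]  # similar meaning -> similar direction
car = [0.1, 0.9, 0.7]     # unrelated meaning -> different direction

print(cosine_similarity(mango, papaya))  # close to 1.0 (similar)
print(cosine_similarity(mango, car))     # much lower (dissimilar)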
Chapter 19: Install LLM on your laptop
Learn how to install and set up Ollama for running large language models locally
Chapter 20: Using LLM locally using Python
Understand how to interact with local LLMs through Python code
Chapter 21: Using LLM with Python (Code samples)
Practical code examples for integrating LLMs into your applications
You need to log in to the LMS via Google authentication and have access via the coupon provided during enrollment.
Ollama is a tool that allows you to run large language models locally on your computer. This means you don't need to send your data to external services, providing better privacy and control.
Installation Steps:
Visit the Ollama website and download the installer for your operating system
Run the installer and follow the setup wizard
Open a terminal/command prompt to verify installation
Test that Ollama is working by running the version command
Check if Ollama is installed correctly:
ollama --version

We need two types of models: an embedding model (mxbai-embed-large) to convert text into vectors, and a language model (llama3.2) to generate responses. Follow the steps below to download these models to your computer.
How to Download Models:
Open a terminal window on your computer
On Windows: Press the Windows key, type 'PowerShell' or 'Command Prompt', and press Enter
On Mac: Press Command + Space, type 'Terminal', and press Enter
Copy the command for each model by clicking the copy button
Paste the command into your terminal window (Right-click and select Paste, or press Ctrl+V on Windows / Command+V on Mac)
Press Enter to start downloading the model
Wait for the download to complete before proceeding to the next model
The models will download in the background - you'll see progress indicators
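Once both downloads finish, you can confirm the models are available locally - both should appear in the output of Ollama's list command:
ollama list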
Text Embedding Model
Purpose: Converts text into numerical vectors for similarity search
ollama pull mxbai-embed-large

Language Model
Purpose: Generates human-like text responses based on context
ollama pull llama3.2

Python is the programming language we'll use to build our RAG application. If you already have Python installed, you can skip this step.
Installation Steps:
Download Python from the official website
Run the installer (make sure to check 'Add Python to PATH')
Verify installation by checking the version
Install pip (Python package manager) if not included
Check Python version (should be 3.8 or higher):
python --version

Check Python version on macOS/Linux (should be 3.8 or higher):
python3 --version

Check pip (package manager) version:
pip --version

Check pip (package manager) version on macOS/Linux:
pip3 --version

VS Code Insiders is the preview version of Visual Studio Code with the latest features, including enhanced AI capabilities and GitHub Copilot integration.
Installation Steps:
Download VS Code Insiders from the official website
Install the application following the setup wizard
Launch VS Code Insiders
Familiarize yourself with the interface
Git is a version control system that tracks changes in your code and allows you to collaborate with others. We'll use it to download the RAG project code.
Installation Steps:
Download Git from the official website
Run the installer with default settings
Open a new terminal/command prompt
Verify Git installation
Check if Git is installed correctly:
git --version

GitHub Copilot is an AI coding assistant that helps you write code faster and more efficiently. It provides intelligent code suggestions, completions, and can help you understand and debug code throughout this workshop.
Complete GitHub Copilot Setup
Follow our comprehensive guide to set up GitHub Copilot, including account creation, installation, and configuration
Learn GitHub Copilot Features
Discover how to use Copilot's chat, inline suggestions, and other productivity features
Get Subscription Recommendations
Find out which GitHub Copilot plan is right for you and how to sign up
Setting up GitHub Copilot is highly recommended for this workshop as it will significantly speed up your coding and help you understand the RAG implementation better. Visit the Developer Productivity page for detailed setup instructions.
The RAG-Food repository contains a complete working example of a Retrieval-Augmented Generation system using local models. Clone this repository to get all the code and data needed for the workshop.
Access the RAG-Food repository that contains all the code and documentation for this workshop
git clone https://github.com/gocallum/ragfood

The project includes:
Local LLM via Ollama
Local embeddings via mxbai-embed-large
ChromaDB as the vector database
Simple food dataset in JSON (Indian foods, fruits, etc.)
Next, we'll clone the repository and explore its contents:
Clone the repository to your local machine
Navigate into the project directory
Explore the project structure
Verify all files are present
Commands:
git clone https://github.com/gocallum/ragfood
cd ragfood

Important Notes
Make sure you have Git installed before cloning
The repository contains everything needed to run the RAG system
Check the README.md for additional setup instructions
Python packages are pre-written code libraries that provide specific functionality. We need ChromaDB for vector storage and requests for HTTP communication.
pip install chromadb requests

Alternative (manual install):
pip install --user chromadb requests

chromadb: Handles the storage and retrieval of text embeddings
requests: Communicates with the Ollama API to get embeddings and responses
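After installing, you can quickly confirm that both packages import and that the Ollama server is reachable. This is a minimal sketch assuming Ollama is running on its default port (11434):

import chromadb
import requests

# requests: confirm the local Ollama server responds (lists installed models).
resp = requests.get("http://localhost:11434/api/tags")
print([m["name"] for m in resp.json()["models"]])

# chromadb: create an in-memory client to confirm the package works.
client = chromadb.Client()
print(client.heartbeat())  # returns a timestamp if ChromaDB is working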
Troubleshooting
If installation fails, try updating pip: python -m pip install --upgrade pip
If installation fails on macOS/Linux, try updating pip3: python3 -m pip install --upgrade pip
For permission errors on macOS/Linux, use: pip3 install --user chromadb requests
If you get SSL errors, try: pip install --trusted-host pypi.org --trusted-host pypi.python.org chromadb requests
If you get SSL errors on macOS/Linux, try: pip3 install --trusted-host pypi.org --trusted-host pypi.python.org chromadb requests
Now we'll run the RAG application and test it with questions about food. The system will search through the food database and provide relevant answers.
Start the RAG application (Windows):
python rag_run.py

Start the RAG application (macOS/Linux):
python3 rag_run.py

Expected: The system initializes ChromaDB, loads the food data, and starts accepting questions.
Navigate to the ragfood directory
Ensure all dependencies are installed
Run the application with python rag_run.py (Windows) or python3 rag_run.py (macOS/Linux)
Wait for the system to initialize
Enter your questions about food when prompted
Tips
Start with simple questions to test the system
Try asking about specific ingredients or cuisines
Experiment with questions that require combining information from multiple foods
What happens when you ask a question:
Your question is converted into a vector embedding using mxbai-embed-large
ChromaDB searches for similar food items in the vector database
Relevant food information is retrieved and sent to llama3.2
The language model generates a comprehensive answer based on the context
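Here is a minimal sketch of that final step, assuming Ollama is running locally on its default port; the question and context values are illustrative stand-ins for what the retrieval step returns, and the prompt wording is an assumption, not the repository's exact template.

import requests

question = "Is dosa vegetarian?"  # illustrative question
context = "Dosa is a crispy South Indian crepe made from fermented rice and lentil batter."  # retrieved text

# Combine the retrieved context with the question into a single prompt.
prompt = f"Answer the question using only this context:\n{context}\n\nQuestion: {question}"

# Ask llama3.2 for a non-streaming completion via Ollama's generate endpoint.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.2", "prompt": prompt, "stream": False},
)
print(resp.json()["response"])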
Congratulations! 🎉
You've successfully completed the basic RAG workshop! You now understand the fundamentals of Retrieval-Augmented Generation systems.
What you've built:
- Local vector database with ChromaDB
- Local LLM hosting with Ollama
- Understanding of embedding generation
- RAG query processing pipeline
- Interactive question-answering system