RAG Workshop: Build Your Own AI-Powered Question-Answering System
Build a complete Retrieval-Augmented Generation system using ChromaDB, Ollama, and Python
Welcome to the RAG (Retrieval-Augmented Generation) Workshop! In this hands-on tutorial, you'll build a complete AI-powered question-answering system that runs entirely on your local machine. By the end of this workshop, you'll have a working application that can intelligently answer questions about food by combining the power of vector databases, embeddings, and large language models.
You'll create a Food RAG application - a smart question-answering system that understands and responds to natural language questions about various foods, including Indian dishes, fruits, and their nutritional information. This application demonstrates the core principles of RAG technology that powers modern AI assistants and chatbots.
What You'll Build:
Local AI infrastructure running on your computer - no cloud services required
Vector database powered by ChromaDB for efficient similarity search
Text embedding model (mxbai-embed-large) that converts text into meaningful numerical vectors
Large language model (llama3.2) for generating human-like responses
Interactive command-line interface for asking questions and receiving answers
Complete dataset of food information stored and searchable via semantic search
Why It Matters:
RAG is the foundation of modern AI assistants like ChatGPT's custom GPTs, Microsoft Copilot, and enterprise AI solutions
Understanding RAG enables you to build AI applications that are grounded in your own data and documents
Local deployment means privacy, control, and no ongoing API costs
The skills you learn apply to any domain - customer support, documentation search, knowledge management, and more
Starting with a simple food example makes the concepts clear before scaling to complex enterprise applications
What You'll Learn:
Learn how Retrieval-Augmented Generation combines information retrieval with language generation to produce accurate, context-aware responses
Understand how text is converted into numerical vectors that capture semantic meaning, enabling AI to find similar content
Master the setup and use of Ollama to run large language models on your own computer without relying on external APIs
Discover how ChromaDB stores and retrieves embeddings efficiently for fast similarity search
Build the complete data flow: question → embedding → retrieval → context augmentation → response generation
How the RAG System Works:
1. Data Preparation: The food dataset (containing information about dishes, ingredients, and nutrition) is loaded into the system.
2. Embedding Generation: Each food item's text description is converted into a vector embedding using the mxbai-embed-large model.
3. Vector Storage: All embeddings are stored in ChromaDB, creating a searchable vector database of food knowledge.
4. Question Processing: When you ask a question, it's also converted into a vector embedding using the same model.
5. Similarity Search: ChromaDB finds the most similar food items by comparing vector embeddings mathematically.
6. Context Retrieval: The relevant food information is retrieved and compiled as context for the language model.
7. Response Generation: The llama3.2 model generates a natural language answer using both your question and the retrieved context.
8. Answer Delivery: You receive a comprehensive, contextually accurate answer based on the actual food data in the system.
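To make steps 1-6 concrete, here is a minimal Python sketch of the indexing and retrieval half of the pipeline. It assumes Ollama is running locally on its default port (11434); the two food entries, the collection name, and the embed helper are illustrative stand-ins, not the repository's actual rag_run.py code. The generation half (steps 7-8) is sketched at the end of the workshop.

import chromadb
import requests

# Illustrative stand-in for the real food dataset (hypothetical entries).
foods = [
    {"id": "dosa", "text": "Dosa is a crispy South Indian crepe made from fermented rice and lentil batter."},
    {"id": "mango", "text": "Mango is a sweet tropical fruit rich in vitamin C."},
]

def embed(text):
    # Steps 2 and 4: ask the local Ollama server for an embedding vector.
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "mxbai-embed-large", "prompt": text},
    )
    return resp.json()["embedding"]

# Step 3: store one embedding per food item in a ChromaDB collection.
client = chromadb.Client()
collection = client.get_or_create_collection("foods")
for food in foods:
    collection.add(ids=[food["id"]], embeddings=[embed(food["text"])], documents=[food["text"]])

# Steps 4-6: embed the question and retrieve the closest food descriptions.
question = "Which food is a fruit?"
results = collection.query(query_embeddings=[embed(question)], n_results=1)
context = "\n".join(results["documents"][0])
print(context)  # the retrieved text that will be handed to llama3.2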
By the End of This Workshop:
You'll have a functioning RAG application running on your local machine
You'll understand how to convert any text dataset into a searchable knowledge base
You'll be able to ask natural language questions and receive contextually relevant answers
You'll grasp the fundamental architecture that powers modern AI question-answering systems
You'll have hands-on experience with industry-standard tools: Ollama, ChromaDB, and Python
You'll be ready to adapt this system for your own use cases - whether that's company documentation, research papers, or any other text-based knowledge domain
Required:
A computer with at least 8GB RAM (16GB recommended for better performance)
At least 5GB of free disk space for models and software
Basic familiarity with using a terminal or command prompt
Willingness to learn - no coding experience required!
Optional:
Python programming knowledge (helpful but not required)
Understanding of basic AI/ML concepts (we'll explain everything)
Previous experience with command-line tools (we provide detailed instructions)
Ready to Get Started?
Follow the steps below to build your RAG application from scratch!
AI's Secret Language: Vector Embeddings
This video explains how AI systems convert text into numerical vectors that capture semantic meaning. Understanding embeddings is crucial for RAG systems as they enable similarity search across documents. Watch this before proceeding to understand the foundation of how our RAG system will work.
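As a quick taste of the idea: similarity between embeddings is typically measured with cosine similarity, which compares the directions of two vectors. The tiny three-dimensional vectors below are made up for illustration; real models such as mxbai-embed-large produce vectors with around a thousand dimensions.

import math

def cosine_similarity(a, b):
    # Dot product of the vectors divided by the product of their lengths.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only).
mango = [0.9, 0.1, 0.2]   # pretend this encodes "sweet tropical fruit"
papaya = [0.8, 0.2, 0.3]  # similar meaning -> similar direction
car = [0.1, 0.9, 0.7]     # unrelated meaning -> different direction

print(cosine_similarity(mango, papaya))  # close to 1.0 (similar)
print(cosine_similarity(mango, car))     # much lower (dissimilar)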
Chapter 19: Install LLM on your laptop
Learn how to install and set up Ollama for running large language models locally
Chapter 20: Using LLM locally using Python
Understand how to interact with local LLMs through Python code
Chapter 21: Using LLM with Python (Code samples)
Practical code examples for integrating LLMs into your applications
You need to log in to the LMS via Google authentication and have access via the coupon provided during enrollment.
Ollama is a tool that allows you to run large language models locally on your computer. This means you don't need to send your data to external services, providing better privacy and control.
Installation Steps:
Visit the Ollama website and download the installer for your operating system
Run the installer and follow the setup wizard
Open a terminal/command prompt to verify installation
Test that Ollama is working by running the version command
Check if Ollama is installed correctly:
ollama --version

We need two types of models: an embedding model (mxbai-embed-large) to convert text into vectors, and a language model (llama3.2) to generate responses. Follow the steps below to download these models to your computer.
How to Download Models:
Open a terminal window on your computer
On Windows: Press the Windows key, type 'PowerShell' or 'Command Prompt', and press Enter
On Mac: Press Command + Space, type 'Terminal', and press Enter
Copy the command for each model by clicking the copy button
Paste the command into your terminal window (Right-click and select Paste, or press Ctrl+V on Windows / Command+V on Mac)
Press Enter to start downloading the model
Wait for the download to complete before proceeding to the next model
The models will download in the background - you'll see progress indicators
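Once both downloads finish, you can confirm the models are available locally - both should appear in the output of Ollama's list command:
ollama list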
Text Embedding Model
Purpose: Converts text into numerical vectors for similarity search
ollama pull mxbai-embed-large

Language Model
Purpose: Generates human-like text responses based on context
ollama pull llama3.2

Python is the programming language we'll use to build our RAG application. If you already have Python installed, you can skip this step.
Installation Steps:
Download Python from the official website
Run the installer (make sure to check 'Add Python to PATH')
Verify installation by checking the version
Install pip (Python package manager) if not included
Check Python version (should be 3.8 or higher):
python --version

Check Python version on macOS/Linux (should be 3.8 or higher):
python3 --version

Check pip (package manager) version:
pip --version

Check pip (package manager) version on macOS/Linux:
pip3 --version

VS Code Insiders is the preview version of Visual Studio Code with the latest features, including enhanced AI capabilities and GitHub Copilot integration.
Installation Steps:
Download VS Code Insiders from the official website
Install the application following the setup wizard
Launch VS Code Insiders
Familiarize yourself with the interface
Git is a version control system that tracks changes in your code and allows you to collaborate with others. We'll use it to download the RAG project code.
Installation Steps:
Download Git from the official website
Run the installer with default settings
Open a new terminal/command prompt
Verify Git installation
Check if Git is installed correctly:
git --version

GitHub Copilot is an AI coding assistant that helps you write code faster and more efficiently. It provides intelligent code suggestions, completions, and can help you understand and debug code throughout this workshop.
Complete GitHub Copilot Setup
Follow our comprehensive guide to set up GitHub Copilot, including account creation, installation, and configuration
Learn GitHub Copilot Features
Discover how to use Copilot's chat, inline suggestions, and other productivity features
Get Subscription Recommendations
Find out which GitHub Copilot plan is right for you and how to sign up
Setting up GitHub Copilot is highly recommended for this workshop as it will significantly speed up your coding and help you understand the RAG implementation better. Visit the Developer Productivity page for detailed setup instructions.
The RAG-Food repository contains a complete working example of a Retrieval-Augmented Generation system using local models. Clone this repository to get all the code and data needed for the workshop.
Access the RAG-Food repository that contains all the code and documentation for this workshop
git clone https://github.com/gocallum/ragfood

The project includes:
Local LLM via Ollama
Local embeddings via mxbai-embed-large
ChromaDB as the vector database
Simple food dataset in JSON (Indian foods, fruits, etc.)
Next, we'll clone the repository and explore its contents:
Clone the repository to your local machine
Navigate into the project directory
Explore the project structure
Verify all files are present
Commands:
git clone https://github.com/gocallum/ragfood
cd ragfood

Important Notes
Make sure you have Git installed before cloning
The repository contains everything needed to run the RAG system
Check the README.md for additional setup instructions
Python packages are pre-written code libraries that provide specific functionality. We need ChromaDB for vector storage and requests for HTTP communication.
pip install chromadb requests

Alternative (manual install):
pip install --user chromadb requests

chromadb: Handles the storage and retrieval of text embeddings
requests: Communicates with the Ollama API to get embeddings and responses
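After installing, you can quickly confirm that both packages import and that the Ollama server is reachable. This is a minimal sketch assuming Ollama is running on its default port (11434):

import chromadb
import requests

# requests: confirm the local Ollama server responds (lists installed models).
resp = requests.get("http://localhost:11434/api/tags")
print([m["name"] for m in resp.json()["models"]])

# chromadb: create an in-memory client to confirm the package works.
client = chromadb.Client()
print(client.heartbeat())  # returns a timestamp if ChromaDB is working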
Troubleshooting
If installation fails, try updating pip: python -m pip install --upgrade pip
If installation fails on macOS/Linux, try updating pip3: python3 -m pip install --upgrade pip
For permission errors on macOS/Linux, use: pip3 install --user chromadb requests
If you get SSL errors, try: pip install --trusted-host pypi.org --trusted-host pypi.python.org chromadb requests
If you get SSL errors on macOS/Linux, try: pip3 install --trusted-host pypi.org --trusted-host pypi.python.org chromadb requests
Now we'll run the RAG application and test it with questions about food. The system will search through the food database and provide relevant answers.
Start the RAG application (Windows):
python rag_run.py

Start the RAG application (macOS/Linux):
python3 rag_run.py

Expected: The system initializes ChromaDB, loads the food data, and starts accepting questions.
Navigate to the ragfood directory
Ensure all dependencies are installed
Run the application with python rag_run.py (Windows) or python3 rag_run.py (macOS/Linux)
Wait for the system to initialize
Enter your questions about food when prompted
Tips
Start with simple questions to test the system
Try asking about specific ingredients or cuisines
Experiment with questions that require combining information from multiple foods
What happens when you ask a question:
Your question is converted into a vector embedding using mxbai-embed-large
ChromaDB searches for similar food items in the vector database
Relevant food information is retrieved and sent to llama3.2
The language model generates a comprehensive answer based on the context
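Here is a minimal sketch of that final step, assuming Ollama is running locally on its default port; the question and context values are illustrative stand-ins for what the retrieval step returns, and the prompt wording is an assumption, not the repository's exact template.

import requests

question = "Is dosa vegetarian?"  # illustrative question
context = "Dosa is a crispy South Indian crepe made from fermented rice and lentil batter."  # retrieved text

# Combine the retrieved context with the question into a single prompt.
prompt = f"Answer the question using only this context:\n{context}\n\nQuestion: {question}"

# Ask llama3.2 for a non-streaming completion via Ollama's generate endpoint.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.2", "prompt": prompt, "stream": False},
)
print(resp.json()["response"])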
Congratulations! 🎉
You've successfully completed the basic RAG workshop! You now understand the fundamentals of Retrieval-Augmented Generation systems.
What you've built:
- Local vector database with ChromaDB
- Local LLM hosting with Ollama
- Understanding of embedding generation
- RAG query processing pipeline
- Interactive question-answering system