Advanced RAG Workshop

Cloud-Ready RAG System

Migrate from Local to Production-Ready Cloud Infrastructure

Learn to deploy RAG systems using Upstash Vector Database and cloud-hosted LLMs. Replace ChromaDB with serverless vector storage and Ollama with Groq Cloud for scalable, production-ready applications.

Prerequisite: Complete the Basic Workshop first.

Migration Overview

Transform your local RAG system into a cloud-ready solution

From Local Setup:
  • ChromaDB (Local Vector Storage)
  • Ollama (Local LLM Hosting)
  • Manual Embedding Generation
  • Local Dependencies & Setup

To Cloud Infrastructure:
  • Upstash Vector (Serverless)
  • Groq Cloud (Fast Inference)
  • Built-in Embeddings
  • Production-Ready Scaling

Advanced Workshop Steps

Follow these steps to migrate your RAG system to the cloud

Step 1: Understanding Upstash Vector Database
Learn about Upstash Vector and built-in embedding models

This video explains how Upstash embeddings work, how its built-in embedding models compare, and which model best fits your use case. Understanding this is crucial: we'll use Upstash's built-in embedding model (mixedbread-ai/mxbai-embed-large-v1), which eliminates the need for an external embedding service. A short upsert sketch follows the key points below.

Upstash Embedding Models - Video Guide

Key Points:
  • Built-in embedding models eliminate external dependencies
  • mixedbread-ai/mxbai-embed-large-v1 with 1024 dimensions and 512 sequence length
  • MTEB score of 64.68 for high-quality embeddings
  • Automatic vectorization of text data
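
To make the automatic vectorization point concrete, here is a minimal sketch of upserting raw text with the upstash-vector Python SDK, assuming the UPSTASH_VECTOR_REST_* credentials are set as environment variables and the index was created with a built-in embedding model (the ID, text, and metadata are illustrative):

from upstash_vector import Index

# Reads UPSTASH_VECTOR_REST_URL and UPSTASH_VECTOR_REST_TOKEN from the environment
index = Index.from_env()

# With a built-in model you pass raw text instead of a pre-computed vector;
# Upstash embeds it server-side with mxbai-embed-large-v1.
index.upsert(
    vectors=[
        ("food-001", "Pizza Margherita: tomato, mozzarella, and basil.", {"cuisine": "italian"}),
    ]
)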

Step 2: Set Up a Vercel Account
Create your Vercel account and access the Storage dashboard

Vercel provides seamless integration with Upstash services. We'll use GitHub authentication for easy setup and access to storage integrations.

Setup Steps:

  1. Visit vercel.com and click 'Sign Up'
  2. Choose 'Continue with GitHub' for seamless integration
  3. Complete the account setup process
  4. Navigate to your Vercel dashboard
  5. Click on 'Storage' in the main navigation

Step 3: Create Upstash Vector Database
Set up your cloud-based vector database with built-in embeddings

Upstash Vector Database provides serverless vector storage with built-in embedding models. This eliminates the need for local ChromaDB and external embedding services.

1. Access Upstash Vector

In your Vercel Storage dashboard, locate and click on 'Upstash'

Look for the Upstash logo and select 'Upstash Vector Database'

2. Choose Free Tier

Select the free version to get started without any cost

Click on 'Free' plan option

3. Configure Database

Set up your vector database with the optimal settings

  • Database name: Choose any name (e.g., 'rag-food-advanced')
  • Region: Select the closest region to your location
  • Embedding Model: Select 'mixedbread-ai/mxbai-embed-large-v1'
  • Similarity Function: Choose 'Cosine' for best semantic search results

4. Create Database

Finalize the creation of your vector database

Click 'Create Database' and wait for provisioning

Key Benefits:

  • Built-in embedding model (no external API needed)
  • Serverless and scalable
  • 1024-dimensional vectors with 512 sequence length
  • Automatic text vectorization
  • Cosine similarity for semantic search (illustrated below)
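
To see why cosine similarity works well for semantic search, here is a small, self-contained illustration in plain Python (the toy 3-dimensional vectors are hypothetical; real embeddings from mxbai-embed-large-v1 have 1024 dimensions):

import math

def cosine_similarity(a, b):
    # cos(theta) = (a · b) / (|a| · |b|); close to 1.0 means "points the same way"
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

pizza = [0.9, 0.1, 0.3]    # toy "embedding" of a pizza description
pasta = [0.8, 0.2, 0.4]    # semantically close to pizza
weather = [0.1, 0.9, 0.2]  # unrelated topic

print(cosine_similarity(pizza, pasta))    # high score: related meanings
print(cosine_similarity(pizza, weather))  # low score: unrelated meanings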

Step 4: Configure Environment Variables
Set up your Upstash Vector credentials in .env file

After creating your Upstash Vector database, you'll receive three important credentials. These will replace your local ChromaDB configuration.

UPSTASH_VECTOR_REST_TOKEN

Authentication token for API access

Usage: Used for all database operations

UPSTASH_VECTOR_REST_URL

REST API endpoint for your database

Usage: Base URL for all API calls

UPSTASH_VECTOR_REST_READONLY_TOKEN

Read-only token for query operations

Usage: Optional, for read-only access patterns

.env File Example

UPSTASH_VECTOR_REST_TOKEN="*************************"
UPSTASH_VECTOR_REST_URL="***********************"
UPSTASH_VECTOR_REST_READONLY_TOKEN="**********************************"

Setup Instructions:

  1. Copy the credentials from your Upstash dashboard
  2. Create or update your .env file in the project root
  3. Paste the credentials exactly as provided
  4. Save the file and ensure it's listed in your .gitignore (a quick verification sketch follows)
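
To verify the credentials are wired up correctly, a minimal sketch (assuming the python-dotenv and upstash-vector packages are installed) could look like this:

import os

from dotenv import load_dotenv
from upstash_vector import Index

load_dotenv()  # loads the UPSTASH_VECTOR_* variables from .env

index = Index(
    url=os.environ["UPSTASH_VECTOR_REST_URL"],
    token=os.environ["UPSTASH_VECTOR_REST_TOKEN"],
)

# If the credentials are valid, this prints index stats (vector count, dimension, ...)
print(index.info())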

Step 5: Create a Design Document with AI
Use GitHub Copilot and Claude to plan the migration

Before making code changes, we'll create a comprehensive design document. This approach helps us understand the implications of switching from ChromaDB to Upstash Vector and ensures we don't miss important details.

Why Create a Design Document First?

  • Understand the scope of changes required
  • Identify potential issues before coding
  • Plan the migration strategy systematically
  • Document the differences between local and cloud solutions
  • Create a reference for future development

GitHub Copilot Prompt for Design Document

Create a detailed design document to replace ChromaDB with Upstash Vector Database. I have added the Upstash Vector credentials in the .env file.

Key Information about Upstash Vector:
- Built-in embedding model: mixedbread-ai/mxbai-embed-large-v1
- 1024 dimensions, 512 sequence length, MTEB score 64.68
- Automatic text vectorization (no need for external embedding API)
- Serverless and cloud-hosted
- REST API based with authentication tokens
- Cosine similarity for semantic search

Current Implementation Details:
- Using ChromaDB for local vector storage
- Using Ollama's mxbai-embed-large for embeddings
- Python-based RAG system with food data
- Manual embedding generation and upsert process

Requirements for Migration:
1. Replace ChromaDB client with Upstash Vector client
2. Remove manual embedding generation (Upstash handles this automatically)
3. Update upsert process to use raw text data instead of pre-computed embeddings
4. Modify query process to work with Upstash Vector API
5. Handle authentication and error management
6. Maintain the same RAG functionality and user experience

Please provide:
- Architecture comparison (before vs after)
- Detailed implementation plan
- Code structure changes required
- API differences and implications
- Error handling strategies
- Performance considerations
- Cost implications of cloud vs local
- Security considerations for API keys

Usage: Copy this prompt into GitHub Copilot Chat or Claude 3.5 Sonnet

Step 6: Set Up a Groq Cloud Account
Create account and get API key for cloud-hosted LLMs

Groq Cloud provides fast inference for large language models. We'll replace local Ollama with Groq's cloud-hosted llama-3.1-8b-instant model for better performance and reliability.

Setup Steps:

  1. Visit console.groq.com and create an account
  2. Navigate to the API Keys section
  3. Click 'Create API Key' and give it a descriptive name
  4. Copy the API key immediately (you won't see it again)
  5. Add the key to your .env file as GROQ_API_KEY

llama-3.1-8b-instant

Fast, efficient language model optimized for quick responses (a smoke-test sketch follows the list below)

  • 8 billion parameters for good performance
  • Instant inference with low latency
  • Cost-effective for development and production
  • Compatible with OpenAI API format
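
Once the key is in your .env, a quick non-streaming call confirms everything is wired up. This is a minimal sketch assuming the groq and python-dotenv packages; the prompt is illustrative:

import os

from dotenv import load_dotenv
from groq import Groq

load_dotenv()  # loads GROQ_API_KEY from .env

client = Groq(api_key=os.environ["GROQ_API_KEY"])

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Reply with the single word: pong"}],
    max_completion_tokens=10,
)
print(response.choices[0].message.content)  # expect something like "pong"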

Step 7: Plan LLM Migration with AI
Create migration strategy from Ollama to Groq Cloud

Similar to the database migration, we'll use AI to plan the transition from local Ollama to cloud-hosted Groq. This ensures we understand all the changes needed.

GitHub Copilot Prompt for LLM Migration

Create a detailed plan to migrate from local Ollama LLM to Groq Cloud API. I have added the GROQ_API_KEY to the .env file.

Current Implementation:
- Using Ollama locally with llama3.2 model
- Direct HTTP requests to localhost:11434/api/generate
- Streaming disabled for simplicity
- Local inference with no API costs

Target Implementation:
- Groq Cloud API with llama-3.1-8b-instant model
- HTTP requests to Groq's API endpoints
- API key authentication required
- Cloud-based inference with usage-based pricing

Groq API Details:
- Model: "llama-3.1-8b-instant"
- Endpoint: Groq's REST API
- Authentication: Bearer token with API key
- Response format: Similar to OpenAI API
- Rate limits and usage tracking

Code Example Reference:
```python
from groq import Groq
client = Groq()
completion = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": ""}],
    temperature=1,
    max_completion_tokens=1024,
    top_p=1,
    stream=True,
    stop=None
)
```

Please provide:
- Detailed migration steps
- Code changes required
- Error handling for API failures
- Rate limiting considerations
- Cost implications and usage monitoring
- Fallback strategies
- Testing approach
- Performance comparison expectations

Usage: Use this prompt with GitHub Copilot or Claude to get detailed migration guidance

Groq API Integration Example

from groq import Groq

# The client reads GROQ_API_KEY from the environment automatically.
client = Groq()

completion = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[
        {
            "role": "user",
            # Example prompt; replace with your RAG prompt (context + question).
            "content": "Explain retrieval-augmented generation in one sentence.",
        }
    ],
    temperature=1,
    max_completion_tokens=1024,
    top_p=1,
    stream=True,  # tokens arrive incrementally instead of in one final response
    stop=None,
)

# Print streamed tokens as they arrive.
for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")

This shows the basic structure for Groq API calls. Note the different model name and API structure compared to Ollama.

Step 8: Implement the Migration
Apply the planned changes with AI assistance

Now we'll implement the changes planned in our design documents. Use GitHub Copilot in 'agent mode' to assist with the actual code modifications.

1. Database Migration

Replace ChromaDB with Upstash Vector

  • Install upstash-vector Python package
  • Replace ChromaDB client initialization
  • Update upsert process to use raw text
  • Modify query process for the Upstash API (see the sketch after this list)
  • Remove manual embedding generation
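
As a reference point for the query change, here is a sketch of a raw-text query against Upstash Vector, replacing ChromaDB's collection.query call (assumes the index uses a built-in embedding model and credentials are in the environment; the query text is illustrative):

from upstash_vector import Index

index = Index.from_env()

results = index.query(
    data="What dishes use mozzarella?",  # raw text; Upstash embeds the query itself
    top_k=3,
    include_metadata=True,
)
for match in results:
    print(match.id, match.score, match.metadata)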

2. LLM Migration

Replace Ollama with Groq Cloud

  • Install groq Python package
  • Replace Ollama HTTP calls with Groq client
  • Update model name to llama-3.1-8b-instant
  • Add proper error handling for API calls
  • Implement rate limiting if needed

3. Testing & Validation

Ensure everything works correctly; an end-to-end smoke-test sketch follows this checklist

  • Test database connectivity
  • Verify embedding generation
  • Test query functionality
  • Validate LLM responses
  • Performance testing
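
Putting both migrations together, an end-to-end smoke test might look like the sketch below (it assumes documents were upserted with their text stored in a "text" metadata field; names and prompts are illustrative, not from the workshop code):

import os

from dotenv import load_dotenv
from groq import Groq
from upstash_vector import Index

load_dotenv()

index = Index.from_env()
llm = Groq(api_key=os.environ["GROQ_API_KEY"])

question = "Which dish would you recommend for a vegetarian?"

# 1. Retrieve: Upstash embeds the question and returns the closest documents.
matches = index.query(data=question, top_k=3, include_metadata=True)
context = "\n".join((m.metadata or {}).get("text", "") for m in matches)

# 2. Generate: pass the retrieved context and the question to Groq.
response = llm.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{
        "role": "user",
        "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
    }],
)
print(response.choices[0].message.content)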

Using GitHub Copilot Agent Mode

  • Open GitHub Copilot Chat in your IDE
  • Reference your design documents
  • Ask Copilot to help implement specific changes
  • Use '@workspace' to give Copilot context about your project
  • Request code reviews and suggestions for improvements

Step 9: Understanding Cloud Pricing
Learn about the cost implications of cloud-hosted AI services

Moving from local to cloud services introduces costs. Understanding pricing helps you make informed decisions and optimize usage.

Vercel AI Gateway (Llama 3.1 8B)

Context: 131K tokens
Input: $0.05/M
Output: $0.08/M

Vercel provides a $5 credit when you sign up, giving you substantial usage for development and testing.

Anthropic Claude Sonnet 4

Context: 200K tokens
Input: $3.00/M
Output: $15.00/M
Cache Read: $0.30/M

Premium model with state-of-the-art performance, especially strong in coding tasks with 72.7% on SWE-bench.

OpenAI GPT-5

Context: 400K tokens
Input: $1.25/M
Output: $10.00/M
Cache Read: $0.13/M

OpenAI's flagship model excelling at complex reasoning and multi-step agentic tasks.
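
To make these rates concrete, here is a back-of-the-envelope estimate in Python using the Llama 3.1 8B gateway prices above (the workload numbers are hypothetical):

INPUT_PRICE_PER_M = 0.05   # $ per million input tokens (Llama 3.1 8B via gateway)
OUTPUT_PRICE_PER_M = 0.08  # $ per million output tokens

# Hypothetical workload: 10,000 RAG queries, ~800 prompt tokens and
# ~200 completion tokens each.
queries = 10_000
input_tokens = queries * 800    # 8M input tokens
output_tokens = queries * 200   # 2M output tokens

cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
     + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
print(f"Estimated cost: ${cost:.2f}")  # -> Estimated cost: $0.56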

Cost Optimization Tips:

  • Start with free tiers and credits
  • Monitor usage through provider dashboards
  • Use caching to reduce repeated API calls
  • Optimize prompt length to reduce token usage
  • Consider model selection based on task complexity
  • Implement usage limits in your applications

Step 10 (Bonus): Vercel AI Gateway Integration
Advanced integration with Vercel's AI Gateway for production use

For production applications, Vercel AI Gateway provides additional benefits like caching, analytics, and unified API access to multiple AI providers.

Benefits:

  • Unified API for multiple AI providers
  • Built-in caching for cost optimization
  • Usage analytics and monitoring
  • Rate limiting and quota management
  • Seamless integration with Vercel deployments

Setup Steps:

  1. Enable AI Gateway in your Vercel project settings
  2. Configure your preferred AI providers
  3. Update your code to use Vercel AI SDK
  4. Implement caching strategies
  5. Set up monitoring and alerts

Vercel AI SDK Integration

import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'

// The same generateText call works across providers; only the model changes.
const { text } = await generateText({
  model: openai('gpt-4o'),
  prompt: 'What is love?'
})

The Vercel AI SDK provides a standardized interface for multiple AI providers with built-in optimizations.

Congratulations! 🚀

You've successfully migrated your RAG system to a production-ready cloud infrastructure! Your system now features:

  • Serverless vector database with Upstash
  • Built-in embedding models (no external APIs)
  • Fast cloud-hosted LLM inference with Groq
  • Production-ready scalability
  • Cost-effective cloud deployment