Advanced RAG Workshop

Cloud-Ready RAG System

Migrate from Local to Production-Ready Cloud Infrastructure

Learn to deploy RAG systems using Upstash Vector Database and cloud-hosted LLMs. Replace ChromaDB with serverless vector storage and Ollama with Groq Cloud for scalable, production-ready applications.

Prerequisite: Complete the Basic Workshop first.

Migration Overview

Transform your local RAG system into a cloud-ready solution

From Local Setup:
  • ChromaDB (Local Vector Storage)
  • Ollama (Local LLM Hosting)
  • Manual Embedding Generation
  • Local Dependencies & Setup

To Cloud Infrastructure:
  • Upstash Vector (Serverless)
  • Groq Cloud (Fast Inference)
  • Built-in Embeddings
  • Production-Ready Scaling

Advanced Workshop Steps

Follow these steps to migrate your RAG system to the cloud

Step 1: Understanding Upstash Vector Database
Learn about Upstash Vector and built-in embedding models

This video explains how Upstash embeddings work, how its built-in embedding models compare, and which model best fits your use case. Understanding this is crucial: we'll use Upstash's built-in embedding model (mixedbread-ai/mxbai-embed-large-v1), which eliminates the need for an external embedding service. A short upsert sketch follows the key points below.

Upstash Embedding Models - Video Guide

Key Points:
  • Built-in embedding models eliminate external dependencies
  • mixedbread-ai/mxbai-embed-large-v1 with 1024 dimensions and 512 sequence length
  • MTEB score of 64.68 for high-quality embeddings
  • Automatic vectorization of text data
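
To make the automatic vectorization point concrete, here is a minimal sketch of upserting raw text with the upstash-vector Python SDK, assuming the UPSTASH_VECTOR_REST_* credentials are set as environment variables and the index was created with a built-in embedding model (the ID, text, and metadata are illustrative):

from upstash_vector import Index

# Reads UPSTASH_VECTOR_REST_URL and UPSTASH_VECTOR_REST_TOKEN from the environment
index = Index.from_env()

# With a built-in model you pass raw text instead of a pre-computed vector;
# Upstash embeds it server-side with mxbai-embed-large-v1.
index.upsert(
    vectors=[
        ("food-001", "Pizza Margherita: tomato, mozzarella, and basil.", {"cuisine": "italian"}),
    ]
)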

Step 2: Set Up a Vercel Account
Create your Vercel account and access the Storage dashboard

Vercel provides seamless integration with Upstash services. We'll use GitHub authentication for easy setup and access to storage integrations.

Setup Steps:

  1. Visit vercel.com and click 'Sign Up'
  2. Choose 'Continue with GitHub' for seamless integration
  3. Complete the account setup process
  4. Navigate to your Vercel dashboard
  5. Click on 'Storage' in the main navigation

Step 3: Create Upstash Vector Database
Set up your cloud-based vector database with built-in embeddings

Upstash Vector Database provides serverless vector storage with built-in embedding models. This eliminates the need for local ChromaDB and external embedding services.

1. Access Upstash Vector

In your Vercel Storage dashboard, locate and click on 'Upstash'

Look for the Upstash logo and select 'Upstash Vector Database'

2. Choose Free Tier

Select the free version to get started without any cost

Click on 'Free' plan option

3. Configure Database

Set up your vector database with the optimal settings

  • Database name: Choose any name (e.g., 'rag-food-advanced')
  • Region: Select the closest region to your location
  • Embedding Model: Select 'mixedbread-ai/mxbai-embed-large-v1'
  • Similarity Function: Choose 'Cosine' for best semantic search results

4. Create Database

Finalize the creation of your vector database

Click 'Create Database' and wait for provisioning

Key Benefits:

  • Built-in embedding model (no external API needed)
  • Serverless and scalable
  • 1024-dimensional vectors with 512 sequence length
  • Automatic text vectorization
  • Cosine similarity for semantic search (illustrated below)
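
To see why cosine similarity works well for semantic search, here is a small, self-contained illustration in plain Python (the toy 3-dimensional vectors are hypothetical; real embeddings from mxbai-embed-large-v1 have 1024 dimensions):

import math

def cosine_similarity(a, b):
    # cos(theta) = (a · b) / (|a| · |b|); close to 1.0 means "points the same way"
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

pizza = [0.9, 0.1, 0.3]    # toy "embedding" of a pizza description
pasta = [0.8, 0.2, 0.4]    # semantically close to pizza
weather = [0.1, 0.9, 0.2]  # unrelated topic

print(cosine_similarity(pizza, pasta))    # high score: related meanings
print(cosine_similarity(pizza, weather))  # low score: unrelated meanings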

Step 4: Configure Environment Variables
Set up your Upstash Vector credentials in .env file

After creating your Upstash Vector database, you'll receive three important credentials. These will replace your local ChromaDB configuration.

UPSTASH_VECTOR_REST_TOKEN

Authentication token for API access

Usage: Used for all database operations

UPSTASH_VECTOR_REST_URL

REST API endpoint for your database

Usage: Base URL for all API calls

UPSTASH_VECTOR_REST_READONLY_TOKEN

Read-only token for query operations

Usage: Optional, for read-only access patterns

.env File Example

UPSTASH_VECTOR_REST_TOKEN="*************************"
UPSTASH_VECTOR_REST_URL="***********************"
UPSTASH_VECTOR_REST_READONLY_TOKEN="**********************************"

Setup Instructions:

  1. Copy the credentials from your Upstash dashboard
  2. Create or update your .env file in the project root
  3. Paste the credentials exactly as provided
  4. Save the file and ensure it's listed in your .gitignore (a quick verification sketch follows)
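
To verify the credentials are wired up correctly, a minimal sketch (assuming the python-dotenv and upstash-vector packages are installed) could look like this:

import os

from dotenv import load_dotenv
from upstash_vector import Index

load_dotenv()  # loads the UPSTASH_VECTOR_* variables from .env

index = Index(
    url=os.environ["UPSTASH_VECTOR_REST_URL"],
    token=os.environ["UPSTASH_VECTOR_REST_TOKEN"],
)

# If the credentials are valid, this prints index stats (vector count, dimension, ...)
print(index.info())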

Step 5: Create a Design Document with AI
Use GitHub Copilot and Claude to plan the migration

Before making code changes, we'll create a comprehensive design document. This approach helps us understand the implications of switching from ChromaDB to Upstash Vector and ensures we don't miss important details.

Why Create a Design Document First?

  • Understand the scope of changes required
  • Identify potential issues before coding
  • Plan the migration strategy systematically
  • Document the differences between local and cloud solutions
  • Create a reference for future development

GitHub Copilot Prompt for Design Document

Create a detailed design document to replace ChromaDB with Upstash Vector Database. I have added the Upstash Vector credentials in the .env file.

Key Information about Upstash Vector:
- Built-in embedding model: mixedbread-ai/mxbai-embed-large-v1
- 1024 dimensions, 512 sequence length, MTEB score 64.68
- Automatic text vectorization (no need for external embedding API)
- Serverless and cloud-hosted
- REST API based with authentication tokens
- Cosine similarity for semantic search

Current Implementation Details:
- Using ChromaDB for local vector storage
- Using Ollama's mxbai-embed-large for embeddings
- Python-based RAG system with food data
- Manual embedding generation and upsert process

Requirements for Migration:
1. Replace ChromaDB client with Upstash Vector client
2. Remove manual embedding generation (Upstash handles this automatically)
3. Update upsert process to use raw text data instead of pre-computed embeddings
4. Modify query process to work with Upstash Vector API
5. Handle authentication and error management
6. Maintain the same RAG functionality and user experience

Please provide:
- Architecture comparison (before vs after)
- Detailed implementation plan
- Code structure changes required
- API differences and implications
- Error handling strategies
- Performance considerations
- Cost implications of cloud vs local
- Security considerations for API keys

Usage: Copy this prompt into GitHub Copilot Chat or Claude 3.5 Sonnet

Step 6: Set Up a Groq Cloud Account
Create account and get API key for cloud-hosted LLMs

Groq Cloud provides fast inference for large language models. We'll replace local Ollama with Groq's cloud-hosted llama-3.1-8b-instant model for better performance and reliability.

Setup Steps:

  1. Visit console.groq.com and create an account
  2. Navigate to the API Keys section
  3. Click 'Create API Key' and give it a descriptive name
  4. Copy the API key immediately (you won't see it again)
  5. Add the key to your .env file as GROQ_API_KEY

llama-3.1-8b-instant

Fast, efficient language model optimized for quick responses (a smoke-test sketch follows the list below)

  • 8 billion parameters for good performance
  • Instant inference with low latency
  • Cost-effective for development and production
  • Compatible with OpenAI API format
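
Once the key is in your .env, a quick non-streaming call confirms everything is wired up. This is a minimal sketch assuming the groq and python-dotenv packages; the prompt is illustrative:

import os

from dotenv import load_dotenv
from groq import Groq

load_dotenv()  # loads GROQ_API_KEY from .env

client = Groq(api_key=os.environ["GROQ_API_KEY"])

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Reply with the single word: pong"}],
    max_completion_tokens=10,
)
print(response.choices[0].message.content)  # expect something like "pong"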

Step 7: Plan LLM Migration with AI
Create migration strategy from Ollama to Groq Cloud

Similar to the database migration, we'll use AI to plan the transition from local Ollama to cloud-hosted Groq. This ensures we understand all the changes needed.

GitHub Copilot Prompt for LLM Migration

Create a detailed plan to migrate from local Ollama LLM to Groq Cloud API. I have added the GROQ_API_KEY to the .env file.

Current Implementation:
- Using Ollama locally with llama3.2 model
- Direct HTTP requests to localhost:11434/api/generate
- Streaming disabled for simplicity
- Local inference with no API costs

Target Implementation:
- Groq Cloud API with llama-3.1-8b-instant model
- HTTP requests to Groq's API endpoints
- API key authentication required
- Cloud-based inference with usage-based pricing

Groq API Details:
- Model: "llama-3.1-8b-instant"
- Endpoint: Groq's REST API
- Authentication: Bearer token with API key
- Response format: Similar to OpenAI API
- Rate limits and usage tracking

Code Example Reference:
```python
from groq import Groq
client = Groq()
completion = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": ""}],
    temperature=1,
    max_completion_tokens=1024,
    top_p=1,
    stream=True,
    stop=None
)
```

Please provide:
- Detailed migration steps
- Code changes required
- Error handling for API failures
- Rate limiting considerations
- Cost implications and usage monitoring
- Fallback strategies
- Testing approach
- Performance comparison expectations

Usage: Use this prompt with GitHub Copilot or Claude to get detailed migration guidance

Groq API Integration Example

from groq import Groq

# The client reads GROQ_API_KEY from the environment automatically.
client = Groq()

completion = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[
        {
            "role": "user",
            # Example prompt; replace with your RAG prompt (context + question).
            "content": "Explain retrieval-augmented generation in one sentence.",
        }
    ],
    temperature=1,
    max_completion_tokens=1024,
    top_p=1,
    stream=True,  # tokens arrive incrementally instead of in one final response
    stop=None,
)

# Print streamed tokens as they arrive.
for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")

This shows the basic structure for Groq API calls. Note the different model name and API structure compared to Ollama.

Step 8: Implement the Migration
Apply the planned changes with AI assistance

Now we'll implement the changes planned in our design documents. Use GitHub Copilot in 'agent mode' to assist with the actual code modifications.

1. Database Migration

Replace ChromaDB with Upstash Vector

  • Install upstash-vector Python package
  • Replace ChromaDB client initialization
  • Update upsert process to use raw text
  • Modify query process for the Upstash API (see the sketch after this list)
  • Remove manual embedding generation
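
As a reference point for the query change, here is a sketch of a raw-text query against Upstash Vector, replacing ChromaDB's collection.query call (assumes the index uses a built-in embedding model and credentials are in the environment; the query text is illustrative):

from upstash_vector import Index

index = Index.from_env()

results = index.query(
    data="What dishes use mozzarella?",  # raw text; Upstash embeds the query itself
    top_k=3,
    include_metadata=True,
)
for match in results:
    print(match.id, match.score, match.metadata)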

2. LLM Migration

Replace Ollama with Groq Cloud

  • Install groq Python package
  • Replace Ollama HTTP calls with Groq client
  • Update model name to llama-3.1-8b-instant
  • Add proper error handling for API calls
  • Implement rate limiting if needed

3. Testing & Validation

Ensure everything works correctly; an end-to-end smoke-test sketch follows this checklist

  • Test database connectivity
  • Verify embedding generation
  • Test query functionality
  • Validate LLM responses
  • Performance testing
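
Putting both migrations together, an end-to-end smoke test might look like the sketch below (it assumes documents were upserted with their text stored in a "text" metadata field; names and prompts are illustrative, not from the workshop code):

import os

from dotenv import load_dotenv
from groq import Groq
from upstash_vector import Index

load_dotenv()

index = Index.from_env()
llm = Groq(api_key=os.environ["GROQ_API_KEY"])

question = "Which dish would you recommend for a vegetarian?"

# 1. Retrieve: Upstash embeds the question and returns the closest documents.
matches = index.query(data=question, top_k=3, include_metadata=True)
context = "\n".join((m.metadata or {}).get("text", "") for m in matches)

# 2. Generate: pass the retrieved context and the question to Groq.
response = llm.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{
        "role": "user",
        "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
    }],
)
print(response.choices[0].message.content)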

Using GitHub Copilot Agent Mode

  • Open GitHub Copilot Chat in your IDE
  • Reference your design documents
  • Ask Copilot to help implement specific changes
  • Use '@workspace' to give Copilot context about your project
  • Request code reviews and suggestions for improvements

Step 9: Understanding Cloud Pricing
Learn about the cost implications of cloud-hosted AI services

Moving from local to cloud services introduces costs. Understanding pricing helps you make informed decisions and optimize usage.

Vercel AI Gateway (Llama 3.1 8B)

Context: 131K tokens
Input: $0.05/M
Output: $0.08/M

Vercel provides a $5 credit when you sign up, giving you substantial usage for development and testing.

Anthropic Claude Sonnet 4

Context: 200K tokens
Input: $3.00/M
Output: $15.00/M
Cache Read: $0.30/M

Premium model with state-of-the-art performance, especially strong in coding tasks with 72.7% on SWE-bench.

OpenAI GPT-5

Context: 400K tokens
Input: $1.25/M
Output: $10.00/M
Cache Read: $0.13/M

OpenAI's flagship model excelling at complex reasoning and multi-step agentic tasks.
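
To make these rates concrete, here is a back-of-the-envelope estimate in Python using the Llama 3.1 8B gateway prices above (the workload numbers are hypothetical):

INPUT_PRICE_PER_M = 0.05   # $ per million input tokens (Llama 3.1 8B via gateway)
OUTPUT_PRICE_PER_M = 0.08  # $ per million output tokens

# Hypothetical workload: 10,000 RAG queries, ~800 prompt tokens and
# ~200 completion tokens each.
queries = 10_000
input_tokens = queries * 800    # 8M input tokens
output_tokens = queries * 200   # 2M output tokens

cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
     + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
print(f"Estimated cost: ${cost:.2f}")  # -> Estimated cost: $0.56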

Cost Optimization Tips:

  • Start with free tiers and credits
  • Monitor usage through provider dashboards
  • Use caching to reduce repeated API calls
  • Optimize prompt length to reduce token usage
  • Consider model selection based on task complexity
  • Implement usage limits in your applications

Step 10 (Bonus): Vercel AI Gateway Integration
Advanced integration with Vercel's AI Gateway for production use

For production applications, Vercel AI Gateway provides additional benefits like caching, analytics, and unified API access to multiple AI providers.

Benefits:

  • Unified API for multiple AI providers
  • Built-in caching for cost optimization
  • Usage analytics and monitoring
  • Rate limiting and quota management
  • Seamless integration with Vercel deployments

Setup Steps:

  1. Enable AI Gateway in your Vercel project settings
  2. Configure your preferred AI providers
  3. Update your code to use Vercel AI SDK
  4. Implement caching strategies
  5. Set up monitoring and alerts

Vercel AI SDK Integration

import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'

// The same generateText call works across providers; only the model changes.
const { text } = await generateText({
  model: openai('gpt-4o'),
  prompt: 'What is love?'
})

The Vercel AI SDK provides a standardized interface for multiple AI providers with built-in optimizations.

Congratulations! 🚀

You've successfully migrated your RAG system to a production-ready cloud infrastructure! Your system now features:

  • Serverless vector database with Upstash
  • Built-in embedding models (no external APIs)
  • Fast cloud-hosted LLM inference with Groq
  • Production-ready scalability
  • Cost-effective cloud deployment