Cloud-Ready RAG System
Migrate from Local to Production-Ready Cloud Infrastructure
Learn to deploy RAG systems using Upstash Vector Database and cloud-hosted LLMs. Replace ChromaDB with serverless vector storage and Ollama with Groq Cloud for scalable, production-ready applications.
Migration Overview
Transform your local RAG system into a cloud-ready solution
Advanced Workshop Steps
Follow these steps to migrate your RAG system to the cloud
This video explains how Upstash embeddings work, how the available models compare, and which model best fits your use case. Understanding this is important because we'll be using Upstash's built-in embedding model (mixedbread-ai/mxbai-embed-large-v1), which eliminates the need for an external embedding service.
Upstash Embedding Models - Video Guide
Key Points:
- Built-in embedding models eliminate external dependencies
- mixedbread-ai/mxbai-embed-large-v1 with 1024 dimensions and 512 sequence length
- MTEB score of 64.68 for high-quality embeddings
- Automatic vectorization of text data
Vercel provides seamless integration with Upstash services. We'll use GitHub authentication for easy setup and access to storage integrations.
Setup Steps:
- Visit vercel.com and click 'Sign Up'
- Choose 'Continue with GitHub' for seamless integration
- Complete the account setup process
- Navigate to your Vercel dashboard
- Click on 'Storage' in the main navigation
Upstash Vector Database provides serverless vector storage with built-in embedding models. This eliminates the need for local ChromaDB and external embedding services.
1. Access Upstash Vector
In your Vercel Storage dashboard, locate and click on 'Upstash'
→ Look for the Upstash logo and select 'Upstash Vector Database'
2. Choose Free Tier
Select the free version to get started without any cost
→ Click on 'Free' plan option
3. Configure Database
Set up your vector database with the optimal settings
- Database name: Choose any name (e.g., 'rag-food-advanced')
- Region: Select the closest region to your location
- Embedding Model: Select 'mixedbread-ai/mxbai-embed-large-v1'
- Similarity Function: Choose 'Cosine' for best semantic search results
4. Create Database
Finalize the creation of your vector database
→ Click 'Create Database' and wait for provisioning
Key Benefits:
- Built-in embedding model (no external API needed)
- Serverless and scalable
- 1024-dimensional vectors with 512 sequence length
- Automatic text vectorization
- Cosine similarity for semantic search
After creating your Upstash Vector database, you'll receive three important credentials. These will replace your local ChromaDB configuration.
UPSTASH_VECTOR_REST_TOKEN
Authentication token for API access
Usage: Used for all database operations
UPSTASH_VECTOR_REST_URL
REST API endpoint for your database
Usage: Base URL for all API calls
UPSTASH_VECTOR_REST_READONLY_TOKEN
Read-only token for query operations
Usage: Optional; for read-only access patterns
.env File Example
```
UPSTASH_VECTOR_REST_TOKEN="*************************"
UPSTASH_VECTOR_REST_URL="***********************"
UPSTASH_VECTOR_REST_READONLY_TOKEN="**********************************"
```
Setup Instructions:
- Copy the credentials from your Upstash dashboard
- Create or update your .env file in the project root
- Paste the credentials exactly as provided
- Save the file and ensure it's in your .gitignore
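With the credentials saved, you can verify the connection from Python before touching any application code. This is a minimal sketch assuming the `upstash-vector` and `python-dotenv` packages; `Index.from_env()` reads the `UPSTASH_VECTOR_REST_URL` and `UPSTASH_VECTOR_REST_TOKEN` variables set above.

```python
# Minimal connectivity check -- a sketch assuming the upstash-vector
# and python-dotenv packages (pip install upstash-vector python-dotenv).
from dotenv import load_dotenv
from upstash_vector import Index

load_dotenv()  # pulls the UPSTASH_VECTOR_REST_* variables from .env

# Index.from_env() reads UPSTASH_VECTOR_REST_URL and
# UPSTASH_VECTOR_REST_TOKEN from the environment.
index = Index.from_env()

print(index.info())  # index stats: vector count, dimension, similarity function
```

If this prints your index info without raising an authentication error, the credentials are wired up correctly.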
Before making code changes, we'll create a comprehensive design document. This approach helps us understand the implications of switching from ChromaDB to Upstash Vector and ensures we don't miss important details.
Why Create a Design Document First?
- Understand the scope of changes required
- Identify potential issues before coding
- Plan the migration strategy systematically
- Document the differences between local and cloud solutions
- Create a reference for future development
GitHub Copilot Prompt for Design Document
Create a detailed design document to replace ChromaDB with Upstash Vector Database. I have added the Upstash Vector credentials in the .env file.

Key Information about Upstash Vector:
- Built-in embedding model: mixedbread-ai/mxbai-embed-large-v1
- 1024 dimensions, 512 sequence length, MTEB score 64.68
- Automatic text vectorization (no need for external embedding API)
- Serverless and cloud-hosted
- REST API based with authentication tokens
- Cosine similarity for semantic search

Current Implementation Details:
- Using ChromaDB for local vector storage
- Using Ollama's mxbai-embed-large for embeddings
- Python-based RAG system with food data
- Manual embedding generation and upsert process

Requirements for Migration:
1. Replace ChromaDB client with Upstash Vector client
2. Remove manual embedding generation (Upstash handles this automatically)
3. Update upsert process to use raw text data instead of pre-computed embeddings
4. Modify query process to work with Upstash Vector API
5. Handle authentication and error management
6. Maintain the same RAG functionality and user experience

Please provide:
- Architecture comparison (before vs. after)
- Detailed implementation plan
- Code structure changes required
- API differences and implications
- Error handling strategies
- Performance considerations
- Cost implications of cloud vs. local
- Security considerations for API keys
Usage: Copy this prompt into GitHub Copilot Chat or Claude 3.5 Sonnet
Groq Cloud provides fast inference for large language models. We'll replace local Ollama with Groq's cloud-hosted llama-3.1-8b-instant model for better performance and reliability.
Setup Steps:
- Visit console.groq.com and create an account
- Navigate to the API Keys section
- Click 'Create API Key' and give it a descriptive name
- Copy the API key immediately (you won't see it again)
- Add the key to your .env file as GROQ_API_KEY
llama-3.1-8b-instant
Fast, efficient language model optimized for quick responses
- 8 billion parameters for good performance
- Instant inference with low latency
- Cost-effective for development and production
- Compatible with the OpenAI API format (see the sketch below)
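Because Groq's API is OpenAI-compatible, you can even reuse the official OpenAI SDK by pointing it at Groq's documented OpenAI-compatible base URL. A minimal sketch; the prompt text is just an illustration:

```python
import os
from openai import OpenAI

# Sketch: the OpenAI SDK talking to Groq's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

This compatibility is useful if you later swap providers, since only the base URL and model name change.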
Similar to the database migration, we'll use AI to plan the transition from local Ollama to cloud-hosted Groq. This ensures we understand all the changes needed.
GitHub Copilot Prompt for LLM Migration
Create a detailed plan to migrate from local Ollama LLM to Groq Cloud API. I have added the GROQ_API_KEY to the .env file.

Current Implementation:
- Using Ollama locally with the llama3.2 model
- Direct HTTP requests to localhost:11434/api/generate
- Streaming disabled for simplicity
- Local inference with no API costs

Target Implementation:
- Groq Cloud API with the llama-3.1-8b-instant model
- HTTP requests to Groq's API endpoints
- API key authentication required
- Cloud-based inference with usage-based pricing

Groq API Details:
- Model: "llama-3.1-8b-instant"
- Endpoint: Groq's REST API
- Authentication: Bearer token with API key
- Response format: similar to the OpenAI API
- Rate limits and usage tracking

Code Example Reference:

```python
from groq import Groq

client = Groq()
completion = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": ""}],
    temperature=1,
    max_completion_tokens=1024,
    top_p=1,
    stream=True,
    stop=None,
)
```

Please provide:
- Detailed migration steps
- Code changes required
- Error handling for API failures
- Rate limiting considerations
- Cost implications and usage monitoring
- Fallback strategies
- Testing approach
- Performance comparison expectations
Usage: Use this prompt with GitHub Copilot or Claude to get detailed migration guidance
Groq API Integration Example
```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

completion = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": ""}],  # your prompt goes here
    temperature=1,
    max_completion_tokens=1024,
    top_p=1,
    stream=True,  # tokens arrive incrementally
    stop=None,
)

# Print the streamed tokens as they arrive
for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")
```
This shows the basic structure for Groq API calls. Note the different model name and API structure compared to Ollama.
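In a RAG pipeline you usually want the complete answer at once rather than a token stream. A non-streaming variant, sketched under the assumption that `retrieved_context` holds the text returned by your vector search:

```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

def generate_answer(question: str, retrieved_context: str) -> str:
    """Build a grounded prompt from retrieved context and ask Groq."""
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{retrieved_context}\n\nQuestion: {question}"
    )
    completion = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,  # lower temperature keeps answers grounded
        max_completion_tokens=512,
        stream=False,  # return the whole answer at once
    )
    return completion.choices[0].message.content
```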
Now we'll implement the changes planned in our design documents. Use GitHub Copilot in 'agent mode' to assist with the actual code modifications.
1. Database Migration
Replace ChromaDB with Upstash Vector (the migrated upsert and query paths are sketched after this list)
- Install upstash-vector Python package
- Replace ChromaDB client initialization
- Update upsert process to use raw text
- Modify query process for Upstash API
- Remove manual embedding generation
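As a rough target for these changes, here is a sketch of what the migrated upsert and query paths could look like. The `Data` class for raw-text upserts and the `data=` query argument assume an index created with a built-in embedding model, as configured earlier; the `foods` records are illustrative stand-ins for the workshop's food dataset.

```python
from upstash_vector import Index, Data

index = Index.from_env()

# Illustrative records -- in the workshop these come from the food dataset.
foods = [
    Data(
        id="food-1",
        data="Masala dosa: a crispy rice crepe with a spiced potato filling.",
        metadata={"cuisine": "south-indian"},
    ),
    Data(
        id="food-2",
        data="Ramen: wheat noodles served in a rich miso or pork broth.",
        metadata={"cuisine": "japanese"},
    ),
]

# Upsert raw text; with an embedding-enabled index, Upstash vectorizes
# the text server-side, so no manual embedding step is needed.
index.upsert(vectors=foods)

# Query with raw text as well; Upstash embeds the query automatically.
results = index.query(data="noodle soup dishes", top_k=3, include_metadata=True)
for r in results:
    print(r.id, r.score, r.metadata)
```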
2. LLM Migration
Replace Ollama with Groq Cloud (retry and error handling are sketched after this list)
- Install groq Python package
- Replace Ollama HTTP calls with Groq client
- Update model name to llama-3.1-8b-instant
- Add proper error handling for API calls
- Implement rate limiting if needed
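For the error-handling and rate-limiting items above, a simple retry with exponential backoff is often enough to start with. A sketch, assuming the `groq` SDK exposes `RateLimitError` and `APIConnectionError` exception classes at the top level (it follows the OpenAI SDK's conventions):

```python
import time

from groq import APIConnectionError, Groq, RateLimitError

client = Groq()  # reads GROQ_API_KEY from the environment

def ask_with_retries(prompt: str, max_retries: int = 3) -> str:
    """Call Groq, backing off exponentially on rate limits or network errors."""
    for attempt in range(max_retries):
        try:
            completion = client.chat.completions.create(
                model="llama-3.1-8b-instant",
                messages=[{"role": "user", "content": prompt}],
            )
            return completion.choices[0].message.content
        except (RateLimitError, APIConnectionError):
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    raise RuntimeError("unreachable")
```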
3Testing & Validation
Ensure everything works correctly (a minimal smoke test follows this list)
- Test database connectivity
- Verify embedding generation
- Test query functionality
- Validate LLM responses
- Performance testing
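A small end-to-end smoke test can cover most of this checklist in a few lines. A sketch reusing the illustrative helpers from earlier steps (`index` from the database migration, `ask_with_retries` from the LLM migration); attribute names such as `vector_count` follow the upstash-vector SDK but treat them as assumptions:

```python
def smoke_test() -> None:
    # 1. Database connectivity (embedding happens server-side on upsert)
    info = index.info()
    print("vectors stored:", info.vector_count)

    # 2. Query functionality: raw-text query, embedded by Upstash
    hits = index.query(data="spicy food", top_k=1, include_metadata=True)
    assert hits, "expected at least one result"

    # 3. LLM response: ask Groq a question grounded in the top hit
    context = str(hits[0].metadata)
    answer = ask_with_retries(f"Context: {context}\nWhat dish is this?")
    assert answer.strip(), "expected a non-empty answer"
    print("smoke test passed")

smoke_test()
```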
Using GitHub Copilot Agent Mode
- Open GitHub Copilot Chat in your IDE
- Reference your design documents
- Ask Copilot to help implement specific changes
- Use '@workspace' to give Copilot context about your project
- Request code reviews and suggestions for improvements
Moving from local to cloud services introduces costs. Understanding pricing helps you make informed decisions and optimize usage.
Vercel AI Gateway (Llama 3.1 8B)
Vercel provides $5 credit when you sign up, giving you substantial usage for development and testing.
Anthropic Claude Sonnet 4
Premium model with state-of-the-art performance, especially strong in coding tasks with 72.7% on SWE-bench.
OpenAI GPT-5
OpenAI's flagship model excelling at complex reasoning and multi-step agentic tasks.
Cost Optimization Tips:
- Start with free tiers and credits
- Monitor usage through provider dashboards
- Use caching to reduce repeated API calls (see the sketch after this list)
- Optimize prompt length to reduce token usage
- Consider model selection based on task complexity
- Implement usage limits in your applications
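The caching tip is the cheapest win: identical prompts should never hit the API twice. A minimal in-process sketch using only Python's standard library (for production, a shared cache such as Redis would be the usual choice); `ask_with_retries` is the illustrative helper from the LLM migration step:

```python
from functools import lru_cache

@lru_cache(maxsize=256)
def cached_answer(prompt: str) -> str:
    # Identical prompts are served from memory instead of
    # triggering a new (billed) Groq API call.
    return ask_with_retries(prompt)
```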
For production applications, Vercel AI Gateway provides additional benefits like caching, analytics, and unified API access to multiple AI providers.
Benefits:
- Unified API for multiple AI providers
- Built-in caching for cost optimization
- Usage analytics and monitoring
- Rate limiting and quota management
- Seamless integration with Vercel deployments
Setup Steps:
- Enable AI Gateway in your Vercel project settings
- Configure your preferred AI providers
- Update your code to use Vercel AI SDK
- Implement caching strategies
- Set up monitoring and alerts
Vercel AI SDK Integration
```typescript
import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'

const { text } = await generateText({
  model: openai('gpt-4o'),
  prompt: 'What is love?',
})
```
The Vercel AI SDK provides a standardized interface for multiple AI providers with built-in optimizations.
Congratulations! 🚀
You've successfully migrated your RAG system to a production-ready cloud infrastructure! Your system now features:
- Serverless vector database with Upstash
- Built-in embedding models (no external APIs)
- Fast cloud-hosted LLM inference with Groq
- Production-ready scalability
- Cost-effective cloud deployment