Advanced to Expert

Digital Twin Workshop - Advanced Voice & Omni-Channel

Advanced Voice AI with OpenAI Realtime API

Transform your deployed MCP server into an advanced multi-channel AI agent with voice interaction and telephony integration

Prerequisites

Required:

  • Completed Digital Twin Workshop (Simple version)
  • MCP server deployed on Vercel from the simple workshop
  • OpenAI API key with Realtime API access
  • Git and GitHub account for forking repositories
  • Twilio account (optional for telephony features)
  • VAPI.ai account (optional alternative)
  • Advanced understanding of AI integration patterns

Helpful Resources:

Digital Twin Workshop (Simple)

Complete the basic digital twin workshop first

Developer Productivity Workshop

Learn AI-powered development workflows


Workshop Steps

Follow these steps to build your advanced voice AI

1
15 minutes

OpenAI API Key Setup & Realtime Access

Obtain your OpenAI API key and ensure you have access to the Realtime API beta for voice AI functionality

📚 Understanding This Step

Before building voice AI applications, you need an OpenAI API key with access to the Realtime API, which is currently in beta. The Realtime API enables low-latency voice-to-voice conversations and is essential for professional voice AI applications. This step ensures you have the necessary credentials and access levels.

Tasks to Complete

Sign up for OpenAI account or sign in to existing account
Navigate to the API Keys section in your OpenAI dashboard
Create a new API key specifically for your voice AI project
Check if you have Realtime API beta access in your account
Request Realtime API beta access if not already available
Verify your account has sufficient usage limits and billing setup

API Key Setup Checklist

text

Complete checklist for obtaining and configuring your OpenAI API key

# OpenAI API Key Setup Guide

## Step 1: Account Setup
1. Visit: https://platform.openai.com/login
2. Sign in with existing account OR create new account
3. Complete email verification if creating new account
4. Set up billing information (required for API usage)

## Step 2: API Key Creation
1. Navigate to: https://platform.openai.com/api-keys
2. Click "Create new secret key"
3. Name your key: "Voice AI Workshop" (or similar)
4. Set permissions: "All" (for development) or "Custom" with required scopes
5. Copy the API key IMMEDIATELY (you won't see it again)
6. Store securely - never share or commit to version control

## Step 3: Realtime API Beta Access
1. Check your account dashboard for beta program access
2. Visit: https://platform.openai.com/docs/guides/realtime
3. If you don't see Realtime API access:
   - Contact OpenAI support for beta access request
   - Join the waitlist if available
   - Check back periodically as access expands

## Step 4: Verify API Key Format
Your API key should look like:
✅ sk-proj-abcd1234efgh5678ijkl... (starts with 'sk-proj-' or 'sk-')
❌ Never share: sk-1234567890abcdef...

## Step 5: Test API Access (Optional)
You can test your API key with a simple curl command:

curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json"

## Step 6: Usage Limits & Billing
1. Check your usage limits: https://platform.openai.com/usage
2. Set up usage alerts to monitor spending
3. Understand Realtime API pricing (typically higher than text APIs)
4. Consider starting with low usage limits for testing

## Important Security Notes:
- Store API keys in environment variables, never in code
- Use different API keys for development vs production
- Rotate keys regularly for security
- Monitor usage dashboard for unexpected activity

## Troubleshooting:
- If API key doesn't work: Regenerate and try again
- If Realtime API access denied: Contact OpenAI support
- If billing issues: Check payment method and account status

✅ You're ready when you have:
• Valid OpenAI API key (starts with sk-)
• Confirmed Realtime API beta access
• Billing configured and tested
• API key stored securely

Next: Environment setup and development tools
2
10 minutes

Environment Setup & Prerequisites Check

Verify your development environment is ready for voice AI development with all required tools and accounts

📚 Understanding This Step

Before diving into voice AI development, we need to ensure your development environment has all the necessary tools and access. This includes modern Node.js for the OpenAI Agents SDK, Git for repository management, and, most importantly, access to OpenAI's Realtime API, which is currently in beta.

Tasks to Complete

Verify Node.js 22+ is installed (required for OpenAI Agents SDK)
Check Git is available for repository operations
Confirm you have a GitHub account for forking repositories
Validate OpenAI API key has Realtime API beta access
Test internet connection for downloading dependencies

Environment Verification Commands

bash

Run these commands to verify your development environment is ready

# Check Node.js version (should be 22+ for the OpenAI Agents SDK)
node --version

# Check pnpm is available
pnpm --version

# Verify Git installation
git --version

# Check if you're logged into GitHub CLI (optional but helpful)
gh auth status

# Create a test directory to verify write permissions
mkdir -p ~/test-voice-ai
cd ~/test-voice-ai
echo "Environment test successful" > test.txt
cat test.txt
cd ..
rm -rf ~/test-voice-ai

echo "✅ Environment verification complete!"
echo "If all commands succeeded, you're ready to proceed."
3
5 minutes

Fork OpenAI Realtime Agents Repository

Create your own copy of the OpenAI Realtime Agents repository that you can modify and customize

📚 Understanding This Step

Forking creates your own copy of the OpenAI repository under your GitHub account. This allows you to make changes without affecting the original repository and gives you full control over your voice AI implementation. The fork will serve as the foundation for your professional voice assistant.

Tasks to Complete

Navigate to the OpenAI Realtime Agents repository on GitHub
Click the 'Fork' button to create your own copy
Verify the fork was created successfully in your GitHub account
Note your fork's URL for the next step

Fork Verification Steps

text

Steps to verify your fork was created successfully

1. Navigate to: https://github.com/openai/openai-realtime-agents

2. Click the "Fork" button in the top right corner
   - GitHub will prompt you to select where to create the fork
   - Choose your personal account (not an organization)
   - Optionally customize the repository name (recommended: keep original name)

3. After forking, you should be redirected to:
   https://github.com/YOUR_USERNAME/openai-realtime-agents

4. Verify the fork by checking:
   ✅ The repository shows "forked from openai/openai-realtime-agents"
   ✅ You can see the code files (src/, public/, package.json, etc.)
   ✅ The repository is under your GitHub username

5. Copy your fork's clone URL for the next step:
   https://github.com/YOUR_USERNAME/openai-realtime-agents.git

Note: Replace YOUR_USERNAME with your actual GitHub username
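
If you already use the GitHub CLI (checked with gh auth status in the previous step), the fork can also be created from the terminal. A minimal example — the --clone=false flag skips cloning here, since the next step handles that:

gh repo fork openai/openai-realtime-agents --clone=false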
4
10 minutes

Clone Repository and Setup Local Development

Download your forked repository to your local machine and set up the development environment

📚 Understanding This Step

Cloning downloads the repository code to your local machine where you can run and modify it. We'll also set up the upstream remote so you can pull updates from the original OpenAI repository, and install all the necessary dependencies for the voice AI application.

Tasks to Complete

Clone your forked repository to your local machine
Add the original OpenAI repository as upstream remote
Install Node.js dependencies using pnpm
Verify the project structure is correct
Check that all dependencies installed successfully

Repository Setup Commands

bash

Complete setup commands for local development environment

# Create a directory for your voice AI projects
mkdir -p ~/voice-ai-projects
cd ~/voice-ai-projects

# Clone your forked repository (replace YOUR_USERNAME with your GitHub username)
git clone https://github.com/YOUR_USERNAME/openai-realtime-agents.git
cd openai-realtime-agents

# Add the original OpenAI repository as upstream for future updates
git remote add upstream https://github.com/openai/openai-realtime-agents.git

# Verify remotes are configured correctly
git remote -v
# Should show:
# origin    https://github.com/YOUR_USERNAME/openai-realtime-agents.git (fetch)
# origin    https://github.com/YOUR_USERNAME/openai-realtime-agents.git (push) 
# upstream  https://github.com/openai/openai-realtime-agents.git (fetch)
# upstream  https://github.com/openai/openai-realtime-agents.git (push)

# Install all Node.js dependencies
pnpm install

# Verify installation was successful
ls -la node_modules/ | head -10
pnpm list --depth=0

echo "✅ Repository cloned and dependencies installed successfully!"
echo "Next: Configure environment variables"
5
10 minutes

Configure Environment Variables and API Access

Set up your OpenAI API key, MCP server connection, and configure the application for development

📚 Understanding This Step

The voice AI application needs your OpenAI API key to function, plus connection details to your existing MCP server from the simple workshop. We'll create an environment file to securely store your credentials and verify that your API key has access to the Realtime API beta. The MCP server connection enables context-aware conversations using your professional profile data.

Tasks to Complete

Create environment configuration file from template
Add your OpenAI API key to the environment file
Configure MCP server URL from your simple workshop deployment
Set up MCP API authentication if required
Verify your API key has Realtime API access
Test the configuration by running a basic check
Secure your environment file from version control

Environment Configuration Setup

bash

Configure your API credentials, MCP server connection, and verify access

# Ensure you're in the project directory
cd openai-realtime-agents

# Copy the sample environment file
cp .env.sample .env

# Display the template to see what needs to be configured
cat .env.sample

# Edit the .env file with your API key and MCP server details
# You can use nano, vim, or your preferred text editor
nano .env

# ===== SECURE SERVER-SIDE CONFIGURATION =====
# Following OpenAI Agents best practices for credential security

# Server-side only variables (NOT accessible to browser)
OPENAI_API_KEY=sk-your-actual-api-key-here
VERCEL_MCP_SERVER_URL=https://your-mcp-server.vercel.app
MCP_API_KEY=your-mcp-server-api-key-if-required

# Client-side Realtime API connection (minimal exposure)
NEXT_PUBLIC_OPENAI_API_KEY=sk-your-api-key-here

# ===== SECURITY-ENHANCED ARCHITECTURE =====

# OPENAI_API_KEY (Server-side only):
# - Used for server actions and API routes
# - Never exposed to browser/client-side code
# - Handles sensitive MCP server communication via delegation
# - Used for server-side agent reasoning and complex operations

# VERCEL_MCP_SERVER_URL (Server-side only):
# - MCP server calls handled via Next.js API routes
# - Prevents direct client-side exposure of internal URLs
# - Enables server-side data validation and filtering
# - Supports advanced authentication and authorization

# MCP_API_KEY (Server-side only - Optional):
# - Authentication handled entirely server-side
# - Never transmitted to browser environment
# - Used in server actions for secure MCP communication

# NEXT_PUBLIC_OPENAI_API_KEY (Client-side - Limited scope):
# - ONLY for direct Realtime API WebRTC/WebSocket connections
# - Same key as OPENAI_API_KEY but with controlled exposure
# - All sensitive operations delegated to server-side tools
# - Consider using session-based tokens in production

# ===== VERIFICATION STEPS =====

# Verify OpenAI API key is configured
echo "Checking OpenAI API configuration..."
if grep -q "OPENAI_API_KEY=sk-" .env; then
    echo "✅ OpenAI API key is configured"
else
    echo "❌ OpenAI API key not found. Please add OPENAI_API_KEY=sk-your-key-here"
fi

# Verify MCP server URL is configured
echo "Checking MCP server configuration..."
if grep -q "VERCEL_MCP_SERVER_URL=https://" .env; then
    echo "✅ MCP server URL is configured"
else
    echo "❌ MCP server URL not found. Please add VERCEL_MCP_SERVER_URL=https://your-server.vercel.app"
fi

# Test MCP server connectivity (optional)
echo "Testing MCP server connectivity..."
MCP_URL=$(grep "VERCEL_MCP_SERVER_URL" .env | cut -d '=' -f2)
if [ ! -z "$MCP_URL" ]; then
    curl -s -o /dev/null -w "%{http_code}" "$MCP_URL/health" | 
    awk '{if($1==200) print "✅ MCP server is accessible"; else print "⚠️  MCP server returned status:" $1}'
else
    echo "⚠️  MCP URL not configured for testing"
fi

# Verify .env is in .gitignore (should already be there)
if grep -q ".env" .gitignore; then
    echo "✅ .env file is protected from git commits"
else
    echo "⚠️  Consider adding .env to .gitignore for security"
fi

# ===== TROUBLESHOOTING =====
echo ""
echo "🔧 Troubleshooting Tips:"
echo ""
echo "If MCP server URL is unknown:"
echo "1. Check your Vercel dashboard for deployed projects"
echo "2. Look for the project from your simple digital twin workshop"
echo "3. Copy the deployment URL (e.g., https://my-digital-twin.vercel.app)"
echo ""
echo "If MCP server is not responding:"
echo "1. Verify the simple workshop MCP server is still deployed"
echo "2. Check Vercel function logs for any deployment issues"
echo "3. Redeploy the simple workshop if necessary"
echo ""
echo "✅ Environment configuration complete!"
echo "Important: Ensure your OpenAI API key has Realtime API beta access"
echo "Next: Run and test the voice AI demo application"
6
15 minutes

Run and Test the Voice AI Demo Application

Start the development server and test the voice AI functionality with different agent scenarios

📚 Understanding This Step

Now that everything is configured, we'll run the OpenAI Realtime Agents demo to see voice AI in action. This gives you hands-on experience with the Chat-Supervisor and Sequential Handoff patterns before customizing them for your professional use case. Testing different scenarios helps you understand the capabilities and limitations.

Tasks to Complete

Start the Next.js development server
Open the application in your web browser
Test the default Chat-Supervisor agent scenario
Try the Sequential Handoff pattern with different agents
Explore the conversation transcript and event logs
Test voice interactions with various conversation types

Development Server and Testing

bash

Commands to run and test the voice AI application

# Start the development server (ensure you're in the project directory)
cd openai-realtime-agents
pnpm run dev

# The server will start and show output like:
# ▲ Next.js 14.x.x
# - Local:        http://localhost:3000
# - Environments: .env

# Open your browser to the application
echo "🚀 Application starting at http://localhost:3000"
echo "Opening in your default browser..."

# On macOS:
open http://localhost:3000

# On Linux:
# xdg-open http://localhost:3000

# On Windows:
# start http://localhost:3000

echo "✅ Development server is running!"
echo ""
echo "Testing Checklist:"
echo "1. ✅ Application loads without errors"
echo "2. ✅ You can see the Realtime API Agents Demo interface" 
echo "3. ✅ Click microphone button to test voice input (browser will ask for permissions)"
echo "4. ✅ Try saying 'Hello, tell me about yourself' to test basic conversation"
echo "5. ✅ Use 'Scenario' dropdown to switch between different agent types"
echo "6. ✅ Test 'Customer Service Retail' for the complete flow example"
echo "7. ✅ Check conversation transcript on the left shows your interactions"
echo "8. ✅ Event log on the right shows technical details"
echo ""
echo "🎯 Goal: Familiarize yourself with voice agent patterns before customization"
7
20 minutes

Explore Agent Configurations and Architecture

Understand the codebase structure and examine how different voice agent patterns are implemented

📚 Understanding This Step

Before customizing the agents for your professional use case, it's important to understand how the existing patterns work. We'll explore the codebase structure, examine the Chat-Supervisor and Sequential Handoff implementations, and identify the key files you'll need to modify for your professional voice assistant.

Tasks to Complete

Explore the project directory structure and key files
Examine the Chat-Supervisor agent configuration
Study the Sequential Handoff pattern implementation
Understand how agents, tools, and conversation flows are defined
Identify configuration points for professional customization
Review the agent instruction patterns and tool integration

Codebase Exploration Guide

bash

Commands and paths to understand the voice agent architecture

# Explore the project structure
find . -type f \( -name '*.ts' -o -name '*.tsx' \) | grep -E '(agent|config)' | head -10

# Key directories to examine:
echo "📁 Key directories and files to explore:"
echo ""
echo "🔧 Agent Configurations:"
ls -la src/app/agentConfigs/
echo ""
echo "   📄 chatSupervisor/ - Chat-Supervisor pattern implementation"
echo "   📄 customerServiceRetail/ - Complete customer service flow"
echo "   📄 simpleExample.ts - Basic handoff example"
echo "   📄 index.ts - Agent configuration registry"
echo ""
echo "🎯 Chat-Supervisor Pattern:"
cat src/app/agentConfigs/chatSupervisor/index.ts | head -20
echo ""
echo "🔄 Sequential Handoff Pattern:"
cat src/app/agentConfigs/simpleExample.ts | head -15
echo ""
echo "⚙️  Main Application Logic:"
ls -la src/app/
echo "   📄 App.tsx - Main application component"
echo "   📄 layout.tsx - Application layout and setup"
echo "   📄 page.tsx - Landing page component"
echo ""
echo "🛠️  Key concepts to understand:"
echo "   • RealtimeAgent: High-level voice agent configuration"
echo "   • Agent instructions: How agents behave and respond"
echo "   • Tools: Functions agents can call for dynamic responses"
echo "   • Handoffs: How agents transfer users between specialists"
echo "   • Session management: Conversation state and history"
echo ""
echo "🎯 Next step: Examine specific agent configurations to understand patterns"
8
30 minutes

OpenAI Agents SDK & Realtime API Research

Study OpenAI's modern Agents SDK for voice integration and understand the Realtime API capabilities for professional voice AI

📚 Understanding This Step

OpenAI's new Agents SDK provides a higher-level abstraction for building voice agents compared to the raw Realtime API. This step focuses on understanding both approaches and choosing the right implementation path for professional voice AI integration.

Tasks to Complete

Study OpenAI Agents SDK documentation and voice agent capabilities
Review Realtime API transport mechanisms (WebRTC vs WebSocket)
Understand RealtimeAgent and RealtimeSession classes
Analyze conversation management and history handling
Research tool integration patterns for MCP server connection
Plan Next.js integration approach and project architecture

OpenAI Agents SDK Research Framework

javascript

Analysis template for modern voice AI implementation using OpenAI Agents SDK

// OpenAI Agents SDK Research & Analysis
// Copy this to: agents-sdk-research.js

/**
 * PHASE 1: Agents SDK Capabilities Assessment
 * Use this prompt with ChatGPT or Claude for detailed analysis
 */

const agentsSDKResearchPrompt = `
Analyze OpenAI Agents SDK for professional voice AI integration:

## Modern Architecture Context:
- Building on OpenAI's new Agents SDK (@openai/agents)
- Using RealtimeAgent and RealtimeSession classes
- Next.js integration for professional interview preparation
- Target: Create PoC first, then integrate with existing MCP server

## SDK Analysis Requirements:

1. **Agents SDK Capabilities**
   - RealtimeAgent configuration and professional persona setup
   - RealtimeSession management and conversation handling
   - Built-in audio handling vs manual WebSocket management
   - Tool integration patterns for external API connections

2. **Transport Layer Options**
   - OpenAIRealtimeWebRTC (automatic audio handling)
   - OpenAIRealtimeWebSocket (manual audio management)
   - Browser compatibility and user experience implications
   - Professional use case suitability and audio quality

3. **Implementation Approach**
   - Next.js project setup and SDK integration
   - Environment variable configuration and API key management
   - Professional conversation flow design with agents
   - Voice activity detection and interruption handling

4. **MCP Integration Strategy**
   - Tool-based delegation to existing MCP server
   - Conversation history management and context passing
   - Real-time data retrieval during voice conversations
   - Error handling and fallback strategies

Provide implementation roadmap with Next.js PoC first, then MCP integration.
`;

// Export research framework for implementation
module.exports = {
  agentsSDKResearchPrompt
};
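
As a concrete reference for the transport comparison above, here is a minimal sketch of creating a session with either transport. It assumes a browser context; the option names follow the Agents SDK, and the ephemeral key placeholder stands in for a token minted by your own backend:

import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

const agent = new RealtimeAgent({ name: 'demo', instructions: 'Be concise.' });

// WebRTC transport: the SDK manages microphone capture and audio playback for you
const webrtcSession = new RealtimeSession(agent, { transport: 'webrtc' });

// WebSocket transport: you stream and play audio yourself (more control, more work)
const websocketSession = new RealtimeSession(agent, { transport: 'websocket' });

// Connect with an ephemeral client key (ek_...) requested from your server
await webrtcSession.connect({ apiKey: 'ek_replace-with-ephemeral-key' });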
9
25 minutes

Professional Voice Persona Design

Design and define your AI agent's professional voice personality, communication style, and conversation patterns

📚 Understanding This Step

Your voice AI needs a consistent, professional persona that represents you authentically in various business contexts. This step focuses on defining the tone, style, and conversation patterns that will make your AI agent effective in professional interactions.

Tasks to Complete

Define professional voice characteristics (tone, pace, vocabulary level)
Create conversation templates for different interview types
Design greeting scripts and introduction patterns
Plan response structures using STAR methodology for behavioral questions
Develop technical explanation templates with appropriate complexity levels
Create escalation and fallback conversation flows

Voice AI Integration Architecture

javascript

Complete implementation plan for integrating OpenAI Realtime API with your existing MCP server

// Voice AI Integration Planning Template
// Copy this to a new file: voice-ai-integration-plan.js

/**
 * PHASE 1: Research & Analysis Prompt for AI Assistant
 * Copy this prompt to ChatGPT, Claude, or GitHub Copilot
 */

const researchPrompt = `
Analyze and design a voice AI integration strategy for my professional digital twin:

## Current System Context:
- I have a deployed MCP server on Vercel (from the simple digital twin workshop)
- MCP server contains my professional profile data with RAG capabilities
- Need to add voice interaction capabilities for interview preparation

## Voice AI Requirements:
1. **OpenAI Realtime API Integration**
   - Real-time voice-to-voice conversation capability
   - Low latency for natural conversation flow
   - Professional voice persona development
   - Integration with existing MCP server data

2. **Professional Use Cases**
   - HR screening call simulations
   - Technical interview practice sessions
   - Career coaching conversations
   - Salary negotiation practice

3. **Technical Architecture**
   - WebRTC for real-time audio streaming
   - Connection to existing Vercel-deployed MCP server
   - Conversation state management
   - Context switching between topics

## Analysis Required:
- Technical feasibility and implementation complexity
- Cost analysis for OpenAI Realtime API usage
- Voice persona design for professional scenarios
- Integration patterns with existing MCP infrastructure
- Testing and quality assurance strategies

Provide a comprehensive technical design document with implementation roadmap.
`;

/**
 * PHASE 2: Architecture Design Template
 */

const voiceArchitecture = {
  // Voice AI System Components
  components: {
    realtimeAPI: {
      provider: 'OpenAI Realtime API',
      models: ['gpt-4o-realtime-preview'],
      features: ['voice-to-voice', 'low-latency', 'streaming-audio']
    },
    
    audioProcessing: {
      input: 'WebRTC microphone capture',
      output: 'Real-time audio playback', 
      format: 'PCM 24kHz',
      protocols: ['WebSocket', 'WebRTC']
    },
    
    mcpIntegration: {
      // VERCEL_MCP_SERVER_URL: Your deployed MCP server from simple workshop
      // This enables context-aware voice conversations using your professional data
      endpoint: process.env.VERCEL_MCP_SERVER_URL,
      dataSource: 'Existing professional profile RAG system',
      contextRetrieval: 'Semantic search with conversation history'
    }
  },
  
  // Professional Voice Persona Configuration
  voicePersona: {
    tone: 'Professional, confident, approachable',
    style: 'Conversational but authoritative about experience',
    pace: 'Measured, clear articulation for interview contexts',
    vocabulary: 'Technical accuracy with accessible explanations'
  },
  
  // Conversation Flow Management
  conversationFlows: {
    hrScreening: {
      greeting: 'Professional introduction with elevator pitch',
      topics: ['experience overview', 'salary expectations', 'location preferences'],
      responses: 'Concise, metric-driven answers'
    },
    
    technicalInterview: {
      greeting: 'Technical competency confirmation',
      topics: ['project deep-dives', 'problem-solving approach', 'system design'],
      responses: 'Detailed examples with STAR methodology'
    }
  }
};

/**
 * PHASE 3: Implementation Steps
 */

const implementationPlan = [
  {
    phase: 'Setup & Configuration',
    duration: '30 minutes',
    tasks: [
      'Obtain OpenAI Realtime API access and configure credentials',
      'Set up WebRTC audio capture/playback infrastructure',
      'Create voice AI service connection to existing MCP server',
      'Configure Vercel environment variables for voice integration'
    ]
  },
  
  {
    phase: 'Voice Persona Development', 
    duration: '30 minutes',
    tasks: [
      'Define professional voice characteristics and communication style',
      'Create conversation templates for different interview scenarios',
      'Implement context-aware response generation',
      'Test voice clarity and professional presentation'
    ]
  },
  
  {
    phase: 'Integration & Testing',
    duration: '30 minutes', 
    tasks: [
      'Connect voice AI to MCP server RAG capabilities',
      'Implement conversation memory and context management',
      'Test with realistic interview scenarios',
      'Optimize response quality and conversation flow'
    ]
  }
];

// Export configuration for implementation
module.exports = {
  researchPrompt,
  voiceArchitecture,
  implementationPlan
};
10
20 minutes

WebRTC Infrastructure Setup

Configure browser-based real-time audio capture and playback infrastructure for voice AI integration

📚 Understanding This Step

WebRTC (Web Real-Time Communication) is the foundation for browser-based voice AI. This step sets up the audio infrastructure needed for OpenAI Realtime API integration, including microphone access, audio processing, and real-time streaming capabilities.

Tasks to Complete

Set up browser microphone permissions and audio capture
Configure WebRTC audio processing for optimal voice quality
Implement audio playback systems for AI-generated speech
Test audio input/output quality and latency
Create fallback handling for unsupported browsers
Set up audio visualization for user feedback

WebRTC Audio Infrastructure Setup

javascript

Complete WebRTC implementation for voice AI integration with OpenAI Realtime API

// WebRTC Audio Infrastructure Implementation
// Copy this to: webrtc-voice-setup.js

/**
 * PHASE 1: Audio Infrastructure Planning Prompt
 * Use this prompt with your AI assistant for implementation guidance
 */

const webrtcImplementationPrompt = `
Implement WebRTC audio infrastructure for OpenAI Realtime API integration:

## Project Context:
- Building voice AI integration for professional digital twin
- Need real-time audio capture and playback in web browser
- Target: Low latency voice-to-voice conversation
- Integration: OpenAI Realtime API with existing MCP server

## Technical Requirements:

1. **Audio Capture Setup**
   - High-quality microphone access and permissions
   - Noise suppression and echo cancellation
   - Audio format optimization for OpenAI API (24kHz PCM)
   - Real-time audio streaming capabilities

2. **Audio Playback System**
   - Low-latency audio output for AI responses
   - Queue management for streaming audio chunks
   - Volume control and audio visualization
   - Browser compatibility across major browsers

3. **WebRTC Integration Patterns**
   - WebSocket connection for real-time communication
   - Audio encoding/decoding for API compatibility
   - Error handling and connection recovery
   - Performance optimization for conversation flow

4. **User Experience Features**
   - Visual feedback for audio input levels
   - Connection status indicators
   - Graceful fallback for unsupported browsers
   - Professional UI for business use cases

Provide complete implementation with modern JavaScript/TypeScript, error handling, and production-ready code patterns.
`;

/**
 * PHASE 2: WebRTC Audio Implementation
 */

class VoiceAIAudioManager {
  constructor(config = {}) {
    this.config = {
      sampleRate: 24000,
      bufferSize: 4096,
      channels: 1,
      echoCancellation: true,
      noiseSuppression: true,
      autoGainControl: true,
      ...config
    };
    
    this.audioContext = null;
    this.mediaStream = null;
    this.audioProcessor = null;
    this.isRecording = false;
    this.audioChunks = [];
  }
  
  /**
   * Initialize audio infrastructure with user permissions
   */
  async initialize() {
    try {
      // Request microphone permissions
      console.log('Requesting microphone access...');
      
      this.mediaStream = await navigator.mediaDevices.getUserMedia({
        audio: {
          sampleRate: this.config.sampleRate,
          channelCount: this.config.channels,
          echoCancellation: this.config.echoCancellation,
          noiseSuppression: this.config.noiseSuppression,
          autoGainControl: this.config.autoGainControl
        }
      });
      
      // Create audio context for processing
      this.audioContext = new (window.AudioContext || window.webkitAudioContext)({
        sampleRate: this.config.sampleRate
      });
      
      // Set up audio processing pipeline
      await this.setupAudioProcessing();
      
      console.log('WebRTC audio infrastructure initialized successfully');
      return true;
      
    } catch (error) {
      console.error('Failed to initialize audio:', error);
      this.handleAudioError(error);
      return false;
    }
  }
  
  /**
   * Set up real-time audio processing pipeline
   */
  async setupAudioProcessing() {
    const source = this.audioContext.createMediaStreamSource(this.mediaStream);
    
    // Create audio processor for real-time streaming
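    // Note: ScriptProcessorNode is deprecated in modern browsers; AudioWorklet is the
    // recommended replacement, but ScriptProcessor is kept here for simplicity.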
    this.audioProcessor = this.audioContext.createScriptProcessor(
      this.config.bufferSize,
      this.config.channels,
      this.config.channels
    );
    
    // Process audio data for OpenAI Realtime API
    this.audioProcessor.onaudioprocess = (event) => {
      if (!this.isRecording) return;
      
      const inputBuffer = event.inputBuffer.getChannelData(0);
      
      // Convert to 16-bit PCM for API compatibility
      const pcmData = this.convertToPCM16(inputBuffer);
      
      // Send to OpenAI Realtime API (implement in next step)
      this.onAudioData?.(pcmData);
    };
    
    // Connect audio processing pipeline
    source.connect(this.audioProcessor);
    this.audioProcessor.connect(this.audioContext.destination);
  }
  
  /**
   * Convert Float32 audio data to 16-bit PCM
   */
  convertToPCM16(float32Array) {
    const buffer = new ArrayBuffer(float32Array.length * 2);
    const view = new DataView(buffer);
    
    for (let i = 0; i < float32Array.length; i++) {
      const sample = Math.max(-1, Math.min(1, float32Array[i]));
      view.setInt16(i * 2, sample < 0 ? sample * 0x8000 : sample * 0x7FFF, true);
    }
    
    return buffer;
  }
  
  /**
   * Start audio recording and streaming
   */
  startRecording(onAudioData) {
    if (!this.audioContext || !this.mediaStream) {
      throw new Error('Audio infrastructure not initialized');
    }
    
    this.onAudioData = onAudioData;
    this.isRecording = true;
    
    if (this.audioContext.state === 'suspended') {
      this.audioContext.resume();
    }
    
    console.log('Started audio recording for voice AI');
  }
  
  /**
   * Stop audio recording
   */
  stopRecording() {
    this.isRecording = false;
    this.onAudioData = null;
    console.log('Stopped audio recording');
  }
  
  /**
   * Play audio response from OpenAI
   */
  async playAudioResponse(audioData) {
    try {
      const audioBuffer = await this.audioContext.decodeAudioData(audioData);
      const source = this.audioContext.createBufferSource();
      
      source.buffer = audioBuffer;
      source.connect(this.audioContext.destination);
      source.start();
      
      return new Promise(resolve => {
        source.onended = resolve;
      });
      
    } catch (error) {
      console.error('Failed to play audio response:', error);
    }
  }
  
  /**
   * Handle audio errors and provide user feedback
   */
  handleAudioError(error) {
    if (error.name === 'NotAllowedError') {
      console.error('Microphone access denied. Please grant permissions.');
    } else if (error.name === 'NotFoundError') {
      console.error('No microphone found. Please connect audio input device.');
    } else {
      console.error('Audio setup failed:', error.message);
    }
  }
  
  /**
   * Clean up audio resources
   */
  cleanup() {
    this.stopRecording();
    
    if (this.mediaStream) {
      this.mediaStream.getTracks().forEach(track => track.stop());
    }
    
    if (this.audioContext) {
      this.audioContext.close();
    }
    
    console.log('Audio infrastructure cleaned up');
  }
}

/**
 * PHASE 3: Usage Example and Testing
 */

// Initialize and test WebRTC audio infrastructure
async function initializeVoiceAI() {
  const audioManager = new VoiceAIAudioManager();
  
  // Initialize audio infrastructure
  const initialized = await audioManager.initialize();
  
  if (!initialized) {
    console.error('Failed to initialize voice AI audio infrastructure');
    return;
  }
  
  // Start recording with audio data callback
  audioManager.startRecording((audioData) => {
    console.log('Received audio data:', audioData.byteLength, 'bytes');
    
    // TODO: Send to OpenAI Realtime API (implement in Step 4)
    // sendToOpenAI(audioData);
  });
  
  // Test audio playback (simulate AI response)
  setTimeout(() => {
    console.log('Voice AI infrastructure test completed');
    audioManager.cleanup();
  }, 5000);
}

// Export for use in voice AI integration
module.exports = {
  webrtcImplementationPrompt,
  VoiceAIAudioManager,
  initializeVoiceAI
};
11
25 minutes

OpenAI Realtime API Integration

Implement WebSocket connection to OpenAI Realtime API and establish voice-to-voice communication pipeline

📚 Understanding This Step

This step creates the actual connection between your WebRTC audio infrastructure and OpenAI's Realtime API, enabling true voice-to-voice AI conversation. You'll implement the WebSocket communication protocol and handle real-time audio streaming.

Tasks to Complete

Configure OpenAI Realtime API credentials and connection settings
Implement WebSocket connection with proper authentication
Set up bidirectional audio streaming between browser and API
Configure AI model parameters for professional conversations
Implement connection error handling and reconnection logic
Test basic voice-to-voice communication flow

Next.js Voice AI PoC Implementation

bash

Complete Next.js project setup with OpenAI Agents SDK for professional voice AI

# Next.js Voice AI PoC Setup Guide
# Copy these commands to create your voice AI project

# Step 1: Create Next.js 15 Project
npx create-next-app@latest voice-ai-poc \
  --typescript \
  --tailwind \
  --eslint \
  --app \
  --src-dir \
  --import-alias "@/*"

cd voice-ai-poc

# Step 2: Install OpenAI Agents SDK and Dependencies
pnpm add @openai/agents zod@3
pnpm add -D @types/node

# Step 3: Set up Environment Variables (Server-Side Security Pattern)
echo "OPENAI_API_KEY=sk-your-api-key-here" >> .env.local
echo "VERCEL_MCP_SERVER_URL=https://your-mcp-server.vercel.app" >> .env.local
echo "NEXT_PUBLIC_OPENAI_API_KEY=sk-your-api-key-here" >> .env.local

# Step 4: Create Secure Project Structure
mkdir -p src/lib/agents
mkdir -p src/components/voice
mkdir -p src/hooks
mkdir -p src/app/api/voice
mkdir -p src/app/api/professional
mkdir -p src/lib/server

# Step 5: Create Server-Side Professional Data Handler
cat > src/lib/server/professional-data.ts << 'EOF'
// Server-side only - credentials never exposed to client
import 'server-only';
import { z } from 'zod';

// MCP Server integration (server-side only)
export async function fetchProfessionalData(topic: string) {
  const mcpUrl = process.env.VERCEL_MCP_SERVER_URL;
  const mcpApiKey = process.env.MCP_API_KEY;
  
  if (!mcpUrl) {
    throw new Error('MCP server URL not configured');
  }

  try {
    const headers: Record<string, string> = {
      'Content-Type': 'application/json',
    };
    
    // Add authentication if MCP server requires it
    if (mcpApiKey) {
      headers['Authorization'] = `Bearer ${mcpApiKey}`;
    }

    const response = await fetch(`${mcpUrl}/api/professional`, {
      method: 'POST',
      headers,
      body: JSON.stringify({ query: topic }),
    });

    if (!response.ok) {
      throw new Error(`MCP server error: ${response.status}`);
    }

    const data = await response.json();
    return data;
  } catch (error) {
    console.error('Error fetching professional data:', error);
    // Fallback to mock data in case of MCP server issues
    return getMockProfessionalData(topic);
  }
}

// Fallback mock data
function getMockProfessionalData(topic: string) {
  const mockData: Record<string, string> = {
    experience: 'Senior Software Engineer with 5+ years building scalable web applications',
    skills: 'TypeScript, React, Node.js, Python, AWS, Docker',
    achievements: 'Led team of 4 developers, increased system performance by 40%',
    projects: 'E-commerce platform serving 100K+ users, Real-time analytics dashboard',
    goals: 'Seeking senior technical leadership role in innovative company',
    education: 'Computer Science degree with focus on distributed systems'
  };
  
  return {
    topic,
    data: mockData[topic] || `Professional information about ${topic} available upon request`,
    source: 'mock_fallback'
  };
}
EOF

# Step 6: Create Server Action for Professional Data
cat > src/app/api/professional/route.ts << 'EOF'
import { NextRequest, NextResponse } from 'next/server';
import { fetchProfessionalData } from '@/lib/server/professional-data';
import { z } from 'zod';

// Input validation schema
const RequestSchema = z.object({
  topic: z.string().min(1).max(100),
  context: z.string().optional(),
});

export async function POST(request: NextRequest) {
  try {
    const body = await request.json();
    const { topic, context } = RequestSchema.parse(body);
    
    // Fetch data from MCP server (server-side only)
    const professionalData = await fetchProfessionalData(topic);
    
    // Return processed data to client
    return NextResponse.json({
      success: true,
      data: professionalData,
      timestamp: new Date().toISOString()
    });
    
  } catch (error) {
    console.error('Professional data API error:', error);
    return NextResponse.json(
      { success: false, error: 'Failed to fetch professional data' },
      { status: 500 }
    );
  }
}
EOF

# Step 7: Create Voice Agent with Server-Side Delegation
cat > src/lib/agents/professional-agent.ts << 'EOF'
import { RealtimeAgent, tool, RealtimeContextData } from '@openai/agents/realtime';
import { z } from 'zod';

// Professional Voice Agent with Server-Side Security
export const createProfessionalAgent = () => {
  // Shared parameter schema for the professional info tool
  const professionalInfoParameters = z.object({
    topic: z.string().describe('What professional information to retrieve (experience, skills, achievements, etc.)'),
    context: z.string().optional().describe('Additional context for the query')
  });

  // Server-side delegation tool (following OpenAI Agents best practices)
  const getProfessionalInfo = tool<
    typeof professionalInfoParameters,
    RealtimeContextData
  >({
    name: 'get_professional_info',
    description: 'Retrieve professional background information via secure server-side call',
    parameters: professionalInfoParameters,
    async execute({ topic, context }, details) {
      try {
        // Delegate to server-side API route (credentials stay server-side)
        const response = await fetch('/api/professional', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({ 
            topic, 
            context,
            // Include conversation history for context-aware responses
            history: details?.context?.history?.slice(-5) // Last 5 messages for context
          }),
        });

        if (!response.ok) {
          throw new Error(`Server error: ${response.status}`);
        }

        const result = await response.json();
        
        if (result.success) {
          return `Based on professional background: ${result.data.data}`;
        } else {
          return 'I\'m having trouble accessing that information right now. Let me share what I know from memory.';
        }
      } catch (error) {
        console.error('Professional info tool error:', error);
        // Graceful fallback
        return 'I\'m experiencing a connection issue. Let me continue with what I can share from my general knowledge.';
      }
    }
  });

  return new RealtimeAgent({
    name: 'Professional AI Assistant',
    instructions: `You are a professional AI assistant representing a skilled software engineer in voice conversations.

🎯 CORE IDENTITY:
- Role: Senior Software Engineer with leadership experience
- Communication Style: Confident, articulate, and metrics-driven
- Personality: Professional but personable, technically precise but accessible

🗣️ VOICE CHARACTERISTICS:
- Tone: Conversational but authoritative, appropriate for business contexts
- Pace: Measured and clear, allowing for technical concepts to be understood
- Style: Use specific examples, metrics, and concrete achievements
- Energy: Engaged and enthusiastic about technical challenges

📋 CONVERSATION GUIDELINES:
- ALWAYS use the get_professional_info tool for specific background questions
- Respond as if you are the professional during interviews or networking
- Keep responses conversational but substantive (30-90 seconds typically)
- Ask follow-up questions to understand the interviewer's specific interests
- Maintain professional boundaries while being personable
- Quantify achievements with specific metrics when possible

🎙️ VOICE AI OPTIMIZATIONS:
- Speak in natural, conversational flow with appropriate pauses
- Use vocal emphasis for key points and achievements
- Vary intonation to maintain engagement
- Signal transitions clearly ("Let me tell you about...", "What's interesting is...")
- End responses with engagement hooks or questions when appropriate

🔧 PROFESSIONAL TOPICS TO LEVERAGE:
- Technical expertise and problem-solving approaches
- Leadership and team management experiences  
- Specific project outcomes and business impact
- Learning and growth mindset examples
- Industry insights and technical trends

Remember: You're not just answering questions - you're having a professional conversation that showcases expertise while building rapport.`,
    tools: [getProfessionalInfo],
    
    // Voice-specific optimizations
    voice: 'alloy', // Professional, clear voice
  });
};
EOF

# Step 8: Create Voice Component with Enhanced Security
cat > src/components/voice/VoiceChat.tsx << 'EOF'
'use client';

import { useState, useEffect } from 'react';
import { RealtimeSession } from '@openai/agents/realtime';
import { createProfessionalAgent } from '@/lib/agents/professional-agent';

export function VoiceChat() {
  const [isConnected, setIsConnected] = useState(false);
  const [isListening, setIsListening] = useState(false);
  const [status, setStatus] = useState('Disconnected');
  const [session, setSession] = useState<RealtimeSession | null>(null);

  useEffect(() => {
    const initializeVoiceAgent = async () => {
      try {
        const agent = createProfessionalAgent();
        const realtimeSession = new RealtimeSession(agent, {
          model: 'gpt-4o-realtime-preview',
          config: {
            inputAudioFormat: 'pcm16',
            outputAudioFormat: 'pcm16',
            inputAudioTranscription: {
              model: 'whisper-1'
            },
            turnDetection: {
              type: 'server_vad',
              threshold: 0.5,
              prefix_padding_ms: 300,
              silence_duration_ms: 200
            }
          }
        });

        // Set up event listeners
        realtimeSession.on('connected', () => {
          setIsConnected(true);
          setStatus('Connected - Ready to talk');
        });

        realtimeSession.on('disconnected', () => {
          setIsConnected(false);
          setStatus('Disconnected');
        });

        realtimeSession.on('error', (error) => {
          console.error('Voice AI error:', error);
          setStatus(`Error: ${error.message}`);
        });

        setSession(realtimeSession);
      } catch (error) {
        console.error('Failed to initialize voice agent:', error);
        setStatus('Initialization failed');
      }
    };

    initializeVoiceAgent();
  }, []);

  const connectToVoiceAI = async () => {
    if (!session) return;
    
    try {
      await session.connect({
        apiKey: process.env.NEXT_PUBLIC_OPENAI_API_KEY
      });
    } catch (error) {
      console.error('Connection failed:', error);
      setStatus('Connection failed');
    }
  };

  const startListening = () => {
    if (session && isConnected) {
      setIsListening(true);
      setStatus('Listening... Speak now');
      // Note: Actual audio handling depends on transport layer
    }
  };

  const stopListening = () => {
    setIsListening(false);
    setStatus('Connected - Ready to talk');
  };

  return (
    <div className="max-w-md mx-auto p-6 bg-white rounded-lg shadow-lg">
      <h2 className="text-2xl font-bold mb-4 text-center">
        Professional Voice AI
      </h2>
      
      <div className="mb-4">
        <div className="text-sm text-gray-600 mb-2">Status:</div>
        <div className={`p-2 rounded text-center ${
          isConnected ? 'bg-green-100 text-green-800' : 'bg-gray-100 text-gray-800'
        }`}>
          {status}
        </div>
      </div>

      <div className="space-y-3">
        {!isConnected ? (
          <button
            onClick={connectToVoiceAI}
            className="w-full py-2 px-4 bg-blue-600 text-white rounded-lg hover:bg-blue-700"
          >
            Connect to Voice AI
          </button>
        ) : (
          <div className="space-y-2">
            <button
              onClick={isListening ? stopListening : startListening}
              className={`w-full py-2 px-4 rounded-lg ${
                isListening 
                  ? 'bg-red-600 hover:bg-red-700 text-white'
                  : 'bg-green-600 hover:bg-green-700 text-white'
              }`}
            >
              {isListening ? 'Stop Listening' : 'Start Conversation'}
            </button>
          </div>
        )}
      </div>

      <div className="mt-4 text-xs text-gray-500 text-center">
        PoC Implementation - Basic voice agent setup
      </div>
    </div>
  );
}
EOF

# Step 9: Create Main Page
cat > src/app/page.tsx << 'EOF'
import { VoiceChat } from '@/components/voice/VoiceChat';

export default function Home() {
  return (
    <main className="min-h-screen bg-gray-50 flex items-center justify-center p-4">
      <div className="max-w-4xl mx-auto">
        <div className="text-center mb-8">
          <h1 className="text-4xl font-bold text-gray-900 mb-2">
            Professional Voice AI PoC
          </h1>
          <p className="text-lg text-gray-600">
            OpenAI Agents SDK integration for interview preparation
          </p>
        </div>
        
        <VoiceChat />
        
        <div className="mt-8 text-center text-sm text-gray-500">
          <p>Next steps: Integrate with MCP server for dynamic professional data</p>
        </div>
      </div>
    </main>
  );
}
EOF

# Step 10: Update Package.json Scripts
pnpm pkg set scripts.dev="next dev"
pnpm pkg set scripts.build="next build"
pnpm pkg set scripts.start="next start"
pnpm pkg set scripts.lint="next lint"

echo "✅ Secure Next.js Voice AI PoC project created successfully!"
echo ""
echo "🔐 Security Features Implemented:"
echo "- Server-side credential handling (credentials never exposed to client)"
echo "- MCP server integration via secure API routes"
echo "- Server-side data validation and filtering"
echo "- Graceful fallbacks for service unavailability"
echo ""
echo "Next steps:"
echo "1. Add your API keys to .env.local (server-side only)"
echo "2. Update VERCEL_MCP_SERVER_URL with your deployed MCP server"
echo "3. Run 'pnpm run dev' to start development server"
echo "4. Test secure voice agent with server-side data delegation"
echo "5. Deploy to production with environment variables configured"
12
25 minutes

Professional Voice Persona Implementation

Configure your voice agent with professional instructions, conversation patterns, and tools for accessing professional information

📚 Understanding This Step

Now that you have the basic PoC working, it's time to enhance it with a professional persona that can handle different types of business conversations effectively. This step focuses on refining the agent's instructions and adding tools for professional scenarios.

Tasks to Complete

Enhance voice agent instructions for professional contexts
Implement conversation patterns for different interview types
Add tools for retrieving specific professional information
Configure voice settings for professional communication
Test conversation flows with realistic scenarios
Add guardrails for professional boundaries

Enhanced Professional Voice Agent

typescript

Advanced voice agent configuration with professional tools and conversation patterns

// Enhanced Professional Voice Agent Implementation
// Update your src/lib/agents/professional-agent.ts with this code

import { RealtimeAgent, tool, RealtimeOutputGuardrail } from '@openai/agents/realtime';
import { z } from 'zod';

// Professional information tools
const getProfessionalExperience = tool({
  name: 'get_professional_experience',
  description: 'Retrieve detailed work experience and achievements',
  parameters: z.object({
    role: z.string().optional().describe('Specific role or company to focus on'),
    detail_level: z.enum(['summary', 'detailed']).default('summary')
  }),
  async execute({ role, detail_level }) {
    // Mock professional experience data - replace with MCP server calls later
    const experiences = {
      current: {
        title: 'Senior Software Engineer',
        company: 'TechCorp Inc.',
        duration: '2022-Present',
        achievements: [
          'Led development of microservices architecture serving 1M+ users',
          'Reduced system latency by 40% through optimization initiatives', 
          'Mentored 4 junior developers and established code review processes'
        ],
        technologies: ['TypeScript', 'React', 'Node.js', 'AWS', 'Docker']
      },
      previous: {
        title: 'Full Stack Developer',
        company: 'StartupXYZ',
        duration: '2020-2022',
        achievements: [
          'Built MVP that attracted $2M in Series A funding',
          'Implemented CI/CD pipeline reducing deployment time by 60%',
          'Developed real-time features using WebSocket technology'
        ]
      }
    };
    
    if (role && role.toLowerCase().includes('current')) {
      return detail_level === 'detailed' ? 
        JSON.stringify(experiences.current, null, 2) :
        `Currently ${experiences.current.title} at ${experiences.current.company}, ${experiences.current.duration}. Key achievements include ${experiences.current.achievements[0]}.`;
    }
    
    return detail_level === 'detailed' ?
      JSON.stringify(experiences, null, 2) :
      `Senior Software Engineer with 5+ years experience. Led teams, built scalable systems, and delivered measurable business impact.`;
  }
});

const getTechnicalSkills = tool({
  name: 'get_technical_skills',
  description: 'Retrieve technical skills and expertise areas',
  parameters: z.object({
    category: z.enum(['languages', 'frameworks', 'cloud', 'tools', 'all']).default('all')
  }),
  async execute({ category }) {
    const skills = {
      languages: ['TypeScript', 'JavaScript', 'Python', 'Java', 'Go'],
      frameworks: ['React', 'Next.js', 'Node.js', 'Express', 'FastAPI'],
      cloud: ['AWS', 'Docker', 'Kubernetes', 'Terraform'],
      tools: ['Git', 'Jest', 'Webpack', 'VS Code', 'Postman']
    };
    
    if (category === 'all') {
      return `Full-stack expertise: ${skills.languages.slice(0,3).join(', ')} for development; ${skills.frameworks.slice(0,3).join(', ')} for frameworks; ${skills.cloud.slice(0,3).join(', ')} for cloud infrastructure.`;
    }
    
    return skills[category].join(', ');
  }
});

const getCareerGoals = tool({
  name: 'get_career_goals',
  description: 'Retrieve career objectives and preferences',
  parameters: z.object({
    aspect: z.enum(['role', 'company', 'compensation', 'location']).optional()
  }),
  async execute({ aspect }) {
    const goals = {
      role: 'Seeking Senior/Staff Engineer or Technical Lead roles with architecture responsibilities',
      company: 'Interested in innovative companies solving complex problems with strong engineering culture',
      compensation: 'Looking for competitive package in $120-180K range plus equity',
      location: 'Open to remote or hybrid work, willing to relocate for right opportunity'
    };
    
    return aspect ? goals[aspect] : 'Seeking senior technical leadership role at innovative company with growth opportunities and strong team culture.';
  }
});

// Professional guardrails
const professionalGuardrails: RealtimeOutputGuardrail[] = [
  {
    name: 'No personal details',
    async execute({ agentOutput }) {
      const personalKeywords = ['ssn', 'social security', 'password', 'private'];
      const hasPersonalInfo = personalKeywords.some(keyword => 
        agentOutput.toLowerCase().includes(keyword)
      );
      return {
        tripwireTriggered: hasPersonalInfo,
        outputInfo: { hasPersonalInfo }
      };
    }
  },
  {
    name: 'Maintain professional tone',
    async execute({ agentOutput }) {
      const unprofessionalWords = ['hate', 'sucks', 'stupid', 'dumb'];
      const isUnprofessional = unprofessionalWords.some(word => 
        agentOutput.toLowerCase().includes(word)
      );
      return {
        tripwireTriggered: isUnprofessional,
        outputInfo: { isUnprofessional }
      };
    }
  }
];

export const createEnhancedProfessionalAgent = () => {
  return new RealtimeAgent({
    name: 'Professional AI Assistant',
    instructions: `You are a professional AI assistant representing a skilled software engineer in voice conversations.
    
## Professional Identity:
- Senior Software Engineer with 5+ years of experience
- Full-stack developer with leadership experience
- Passionate about building scalable, maintainable systems
- Strong mentor and collaborator

## Communication Style:
- **Tone**: Confident but humble, conversational yet professional
- **Pace**: Measured and clear, allowing time for complex topics
- **Detail Level**: Provide specific examples and metrics when discussing achievements
- **Personality**: Enthusiastic about technology, thoughtful about career decisions

## Conversation Handling:

### For Experience Questions:
- Use the get_professional_experience tool to provide specific details
- Follow STAR method (Situation, Task, Action, Result) for behavioral questions
- Include measurable outcomes (percentages, numbers, timelines)

### For Technical Questions:
- Use get_technical_skills tool for accurate skill information
- Explain technical concepts clearly for non-technical interviewers
- Show depth of knowledge while remaining accessible

### For Career Goals:
- Use get_career_goals tool for consistent messaging
- Show alignment between past experience and future aspirations
- Demonstrate thoughtful career planning

### Professional Boundaries:
- Focus on professional achievements and goals
- Maintain appropriate level of personal disclosure
- Redirect overly personal questions to professional context

## Sample Response Patterns:
- "That's a great question. In my current role at [company]..."
- "I'm particularly proud of a project where..."
- "What I found most interesting about that challenge was..."
- "Looking ahead, I'm excited about opportunities to..."`,
    tools: [getProfessionalExperience, getTechnicalSkills, getCareerGoals]
  });
};

// Update your main component to use the enhanced agent
export const createProfessionalRealtimeSession = () => {
  const agent = createEnhancedProfessionalAgent();
  
  return {
    agent,
    sessionConfig: {
      model: 'gpt-4o-realtime-preview',
      config: {
        voice: 'alloy', // Professional, clear voice
        inputAudioFormat: 'pcm16',
        outputAudioFormat: 'pcm16',
        inputAudioTranscription: {
          model: 'whisper-1'
        },
        turnDetection: {
          type: 'server_vad',
          threshold: 0.5,
          prefix_padding_ms: 300,
          silence_duration_ms: 200
        },
        temperature: 0.7 // Balanced creativity and consistency
      },
      outputGuardrails: professionalGuardrails
    }
  };
};
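
To connect this configuration, a minimal sketch might look like the following (it assumes the @openai/agents realtime SDK and an ephemeral client key minted by your own token endpoint; the function and variable names here are illustrative):

import { RealtimeSession } from '@openai/agents/realtime';

export async function connectProfessionalSession(ephemeralClientKey: string) {
  const { agent, sessionConfig } = createProfessionalRealtimeSession();

  // Create a session with the enhanced agent, voice settings, and guardrails
  const session = new RealtimeSession(agent, {
    model: sessionConfig.model,
    config: sessionConfig.config,
    outputGuardrails: sessionConfig.outputGuardrails
  });

  // Connect with a short-lived client key, never the raw OpenAI API key in the browser
  await session.connect({ apiKey: ephemeralClientKey });

  return session;
}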
13
30 minutes

MCP Server Integration & Context Management

Connect your voice AI to the existing MCP server to access professional profile data and enable context-aware conversations

📚 Understanding This Step

Your voice AI needs access to your professional information from the MCP server built in the simple workshop. This step creates the integration that allows your AI to respond with specific details about your experience, skills, and career goals during voice conversations. The VERCEL_MCP_SERVER_URL environment variable points to your deployed MCP server, while the optional MCP_API_KEY provides authentication if your server requires it.
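
Before wiring anything up, it helps to confirm the environment variables locally. A minimal sketch (the values below are placeholders for your own deployment URL and key):

# .env.local
VERCEL_MCP_SERVER_URL=https://your-project-name.vercel.app
MCP_API_KEY=your-optional-mcp-api-key

# Quick connectivity check (assumes your MCP server exposes /api/profile):
curl -H "Authorization: Bearer $MCP_API_KEY" "$VERCEL_MCP_SERVER_URL/api/profile"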

Tasks to Complete

Verify VERCEL_MCP_SERVER_URL and MCP_API_KEY are properly configured
Configure API endpoints to connect voice AI with MCP server
Implement context retrieval from professional profile RAG system
Design conversation memory management for multi-turn interactions
Create semantic search integration for relevant information retrieval
Test context accuracy in voice responses
Implement conversation state persistence across sessions

MCP Server Voice AI Integration

javascript

Complete integration between OpenAI Realtime API and existing MCP server for context-aware conversations

// MCP Server Voice AI Integration
// Copy this to: mcp-voice-integration.js

/**
 * PHASE 1: MCP Integration Planning Prompt
 * Use this with your AI assistant for implementation guidance
 */

const mcpIntegrationPrompt = `
Integrate voice AI with existing MCP server for context-aware professional conversations:

## Integration Architecture:

1. **Context Retrieval System**
   - Connect OpenAI Realtime API responses to MCP server data
   - Implement semantic search for relevant professional information
   - Design conversation memory management for multi-turn dialogs
   - Create context scoring and relevance ranking

2. **Professional Profile Access**
   - Retrieve specific work experience details during conversations
   - Access technical skills and project examples for detailed responses
   - Integrate salary expectations and career preferences
   - Connect achievement metrics and performance data

3. **Real-time Data Integration**
   - Low-latency API calls that don't interrupt conversation flow
   - Intelligent caching of frequently accessed profile data
   - Context prediction for proactive information loading
   - Error handling and graceful degradation when MCP server unavailable

4. **Conversation Intelligence**
   - Topic detection to trigger relevant context retrieval
   - Multi-turn conversation memory and state management
   - Context switching between different professional domains
   - Personalized response generation based on conversation history

Provide production-ready implementation with error handling, caching, and optimization for voice interaction latency requirements.
`;

/**
 * PHASE 2: MCP Context Manager Implementation
 */

class MCPVoiceContextManager {
  constructor(config) {
    this.config = {
      // VERCEL_MCP_SERVER_URL: URL of your deployed MCP server from simple workshop
      // Format: https://your-project-name.vercel.app
      // This connects voice AI to your professional profile RAG system
      mcpServerUrl: config.mcpServerUrl || process.env.VERCEL_MCP_SERVER_URL,
      cacheTimeout: config.cacheTimeout || 300000, // 5 minutes
      maxContextLength: config.maxContextLength || 4000,
      ...config
    };
    
    this.contextCache = new Map();
    this.conversationMemory = [];
    this.currentTopics = new Set();
    this.profileData = null;
  }
  
  /**
   * Initialize MCP server connection and load base profile
   */
  async initialize() {
    try {
      console.log('Initializing MCP server integration...');
      
      // Load core professional profile data
      this.profileData = await this.fetchProfileData();
      
      // Warm up context cache with frequently accessed data
      await this.warmUpCache();
      
      console.log('MCP integration initialized successfully');
      return true;
      
    } catch (error) {
      console.error('Failed to initialize MCP integration:', error);
      return false;
    }
  }
  
  /**
   * Fetch complete professional profile from MCP server
   */
  async fetchProfileData() {
    const headers = { 'Content-Type': 'application/json' };

    // MCP_API_KEY: Optional API key for secure MCP server access
    // Only attach the Authorization header if your MCP server implements authentication
    if (process.env.MCP_API_KEY) {
      headers['Authorization'] = `Bearer ${process.env.MCP_API_KEY}`;
    }

    const response = await fetch(`${this.config.mcpServerUrl}/api/profile`, { headers });
    
    if (!response.ok) {
      throw new Error(`MCP server error: ${response.status}`);
    }
    
    return await response.json();
  }
  
  /**
   * Warm up cache with frequently accessed professional data
   */
  async warmUpCache() {
    const commonQueries = [
      'work experience and achievements',
      'technical skills and expertise',
      'recent projects and accomplishments',
      'career goals and preferences',
      'education and certifications'
    ];
    
    for (const query of commonQueries) {
      await this.retrieveContext(query);
    }
  }
  
  /**
   * Process voice conversation and extract context needs
   */
  async processVoiceInput(transcript, conversationHistory = []) {
    try {
      // Add to conversation memory
      this.conversationMemory.push({
        timestamp: Date.now(),
        type: 'user',
        content: transcript
      });
      
      // Detect topics and context requirements
      const contextNeeds = await this.detectContextNeeds(transcript, conversationHistory);
      
      // Retrieve relevant professional information
      const contextData = await this.retrieveRelevantContext(contextNeeds);
      
      // Build enhanced system prompt with context
      const enhancedPrompt = this.buildContextualPrompt(contextData, transcript);
      
      return {
        enhancedPrompt,
        contextData,
        conversationState: this.getConversationState()
      };
      
    } catch (error) {
      console.error('Context processing error:', error);
      return this.getFallbackContext(transcript);
    }
  }
  
  /**
   * Detect what professional context is needed based on conversation
   */
  async detectContextNeeds(transcript, history) {
    // Use AI to analyze conversation and determine context needs
    const analysisPrompt = `
    Analyze this professional conversation to determine what specific information should be retrieved:
    
    Recent conversation: ${transcript}
    History: ${history.slice(-3).map(h => h.content).join('. ')}
    
    Available professional data categories:
    - Work experience and roles
    - Technical skills and projects
    - Achievements and metrics
    - Education and certifications
    - Career preferences and goals
    - Salary expectations
    - Location preferences
    
    Return JSON array of specific context categories needed for a relevant response.
    `;
    
    // This would typically use a separate AI call for context analysis
    // For now, implement rule-based detection
    return this.ruleBasedContextDetection(transcript);
  }
  
  /**
   * Rule-based context detection for common conversation patterns
   */
  ruleBasedContextDetection(transcript) {
    const contextNeeds = [];
    const lowerText = transcript.toLowerCase();
    
    // Experience and background questions
    if (lowerText.includes('experience') || lowerText.includes('background') || lowerText.includes('worked')) {
      contextNeeds.push('work_experience');
    }
    
    // Technical skills questions
    if (lowerText.includes('skills') || lowerText.includes('technology') || lowerText.includes('technical')) {
      contextNeeds.push('technical_skills');
    }
    
    // Project and achievement questions
    if (lowerText.includes('project') || lowerText.includes('built') || lowerText.includes('achievement')) {
      contextNeeds.push('projects_achievements');
    }
    
    // Career goals and preferences
    if (lowerText.includes('goals') || lowerText.includes('looking for') || lowerText.includes('interested')) {
      contextNeeds.push('career_goals');
    }
    
    // Compensation discussions
    if (lowerText.includes('salary') || lowerText.includes('compensation') || lowerText.includes('pay')) {
      contextNeeds.push('salary_expectations');
    }
    
    return contextNeeds.length > 0 ? contextNeeds : ['general_profile'];
  }
  
  /**
   * Retrieve relevant context from MCP server based on detected needs
   */
  async retrieveRelevantContext(contextNeeds) {
    const contextData = {};
    
    for (const need of contextNeeds) {
      // Check cache first
      const cacheKey = `context_${need}`;
      const cached = this.contextCache.get(cacheKey);
      
      if (cached && (Date.now() - cached.timestamp) < this.config.cacheTimeout) {
        contextData[need] = cached.data;
        continue;
      }
      
      // Fetch from MCP server
      try {
        const data = await this.fetchContextData(need);
        
        // Cache the result
        this.contextCache.set(cacheKey, {
          data,
          timestamp: Date.now()
        });
        
        contextData[need] = data;
        
      } catch (error) {
        console.warn(`Failed to fetch context for ${need}:`, error);
        contextData[need] = this.getFallbackData(need);
      }
    }
    
    return contextData;
  }
  
  /**
   * Fetch specific context data from MCP server
   */
  async fetchContextData(contextType) {
    const endpoint = this.getContextEndpoint(contextType);
    
    const headers = { 'Content-Type': 'application/json' };

    // MCP_API_KEY: Optional authentication for MCP server API calls
    // Only attach the Authorization header if your MCP server requires it
    if (process.env.MCP_API_KEY) {
      headers['Authorization'] = `Bearer ${process.env.MCP_API_KEY}`;
    }

    const response = await fetch(`${this.config.mcpServerUrl}${endpoint}`, { headers });
    
    if (!response.ok) {
      throw new Error(`Context fetch error: ${response.status}`);
    }
    
    return await response.json();
  }
  
  /**
   * Map context types to MCP server endpoints
   */
  getContextEndpoint(contextType) {
    const endpointMap = {
      work_experience: '/api/experience',
      technical_skills: '/api/skills',
      projects_achievements: '/api/projects',
      career_goals: '/api/goals',
      salary_expectations: '/api/compensation',
      general_profile: '/api/profile/summary'
    };
    
    return endpointMap[contextType] || '/api/profile/summary';
  }
  
  /**
   * Build contextual system prompt with retrieved professional data
   */
  buildContextualPrompt(contextData, currentQuestion) {
    let contextInfo = '';
    
    Object.entries(contextData).forEach(([type, data]) => {
      if (data && data.content) {
        contextInfo += `

${type.toUpperCase()} CONTEXT:
${data.content}`;
      }
    });
    
    return `You are responding to: "${currentQuestion}"
    
Current conversation context: Use the following professional information to provide specific, detailed responses:
    ${contextInfo}
    
    Instructions:
    - Respond with specific examples and metrics from the context data
    - Keep responses conversational but substantive for professional discussions
    - Reference actual projects, achievements, and experience details
    - Maintain consistency with previous conversation topics`;
  }
  
  /**
   * Get current conversation state for context continuity
   */
  getConversationState() {
    return {
      topics: Array.from(this.currentTopics),
      memoryLength: this.conversationMemory.length,
      recentContext: this.conversationMemory.slice(-5),
      timestamp: Date.now()
    };
  }
  
  /**
   * Handle conversation completion and update memory
   */
  processVoiceResponse(response) {
    this.conversationMemory.push({
      timestamp: Date.now(),
      type: 'assistant',
      content: response
    });
    
    // Trim memory if too long
    if (this.conversationMemory.length > 20) {
      this.conversationMemory = this.conversationMemory.slice(-15);
    }
  }
  
  /**
   * Fallback context for when MCP server is unavailable
   */
  getFallbackContext(transcript) {
    return {
      enhancedPrompt: `Respond professionally to: "${transcript}" using general professional knowledge.`,
      contextData: { general: 'MCP server temporarily unavailable' },
      conversationState: { fallback: true }
    };
  }
  
  /**
   * Fallback data for specific context types
   */
  getFallbackData(contextType) {
    const fallbackData = {
      work_experience: { content: 'Experienced professional with diverse background' },
      technical_skills: { content: 'Proficient in modern technologies and best practices' },
      projects_achievements: { content: 'Delivered successful projects with measurable impact' },
      career_goals: { content: 'Seeking opportunities for professional growth and impact' },
      salary_expectations: { content: 'Competitive compensation based on market standards' }
    };
    
    return fallbackData[contextType] || { content: 'Professional information available upon request' };
  }
}
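
/**
 * OPTIONAL: Conversation state persistence sketch for the "persist across sessions" task.
 * The /api/conversation-state endpoint below is hypothetical — point it at whatever
 * storage your MCP server or a KV store actually exposes.
 */
class PersistentMCPVoiceContextManager extends MCPVoiceContextManager {
  async saveConversationState(sessionId) {
    await fetch(`${this.config.mcpServerUrl}/api/conversation-state`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ sessionId, memory: this.conversationMemory })
    });
  }

  async loadConversationState(sessionId) {
    const response = await fetch(
      `${this.config.mcpServerUrl}/api/conversation-state?sessionId=${encodeURIComponent(sessionId)}`
    );

    if (response.ok) {
      this.conversationMemory = await response.json();
    }
  }
}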

/**
 * PHASE 3: Enhanced Voice AI Controller with MCP Integration
 */

class EnhancedVoiceAIController {
  constructor(config) {
    this.audioManager = new VoiceAIAudioManager();
    this.realtimeClient = new OpenAIRealtimeClient(config.openai);
    this.contextManager = new MCPVoiceContextManager(config.mcp);
    
    this.setupIntegratedEventHandlers();
  }
  
  setupIntegratedEventHandlers() {
    // Handle user speech with context processing
    this.realtimeClient.on('userTranscript', async (transcript) => {
      console.log('Processing user input with professional context...');
      
      const contextResult = await this.contextManager.processVoiceInput(transcript);
      
      // Update OpenAI with enhanced context
      this.realtimeClient.updateSystemPrompt(contextResult.enhancedPrompt);
    });
    
    // Handle AI responses and update conversation memory
    this.realtimeClient.on('responseComplete', (response) => {
      this.contextManager.processVoiceResponse(response);
    });
  }
  
  async initialize() {
    // Initialize all components
    await this.audioManager.initialize();
    await this.realtimeClient.connect();
    await this.contextManager.initialize();
    
    console.log('Enhanced Voice AI with MCP integration ready');
  }
}

// Export enhanced controller
module.exports = {
  mcpIntegrationPrompt,
  MCPVoiceContextManager,
  EnhancedVoiceAIController
};
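
A minimal bootstrap sketch for the exported controller (it assumes the VoiceAIAudioManager and OpenAIRealtimeClient classes from the earlier voice AI step are available in the same project, and that the openai config object matches what that client expects):

// voice-ai-bootstrap.js — usage sketch
const { EnhancedVoiceAIController } = require('./mcp-voice-integration');

async function startVoiceAI() {
  const controller = new EnhancedVoiceAIController({
    openai: { apiKey: process.env.OPENAI_API_KEY },
    mcp: { mcpServerUrl: process.env.VERCEL_MCP_SERVER_URL }
  });

  // Initializes audio capture, the Realtime connection, and MCP context loading
  await controller.initialize();
}

startVoiceAI().catch(console.error);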
14
25 minutes

Professional Conversation Flow Design

Create structured conversation templates and response patterns for different professional scenarios and interview types

📚 Understanding This Step

Professional conversations follow predictable patterns. This step creates conversation templates that ensure your voice AI responds appropriately to different types of professional interactions, from HR screenings to technical deep-dives.

Tasks to Complete

Design conversation templates for HR screening calls
Create technical interview response patterns using STAR methodology
Develop networking conversation flows for relationship building
Implement salary negotiation conversation scripts
Create follow-up question handling and topic transitions
Test conversation flows with realistic professional scenarios

Professional Conversation Flow Templates

javascript

Structured conversation patterns and response templates for professional voice AI interactions

// Professional Conversation Flow Templates
// Copy this to: conversation-flow-templates.js

/**
 * PHASE 1: Conversation Design Framework
 */

const conversationFlowPrompt = `
Design professional conversation flows for voice AI in business contexts:

## Conversation Scenarios:

1. **HR Screening Calls**
   - Professional greeting and rapport building
   - Experience overview with key achievements
   - Salary and location preference discussions
   - Company culture fit assessment
   - Next steps and follow-up coordination

2. **Technical Interviews**
   - Technical competency demonstration
   - Project deep-dives with STAR methodology
   - Problem-solving approach explanation
   - System design and architecture discussions
   - Code review and technical challenge responses

3. **Networking Conversations**
   - Professional introduction and elevator pitch
   - Industry insights and expertise sharing
   - Mutual value creation opportunities
   - Relationship building and follow-up planning
   - Referral and recommendation discussions

4. **Career Coaching Sessions**
   - Career goal assessment and planning
   - Skill gap analysis and development plans
   - Industry trend discussions and positioning
   - Professional brand development
   - Job search strategy and optimization

For each scenario, provide:
- Conversation flow structure and key phases
- Response templates with personalization variables
- Question handling strategies and follow-up patterns
- Professional tone and communication style guidelines
- Metrics and examples integration approaches
`;

/**
 * PHASE 2: Conversation Flow Manager
 */

class ConversationFlowManager {
  constructor(config = {}) {
    this.config = config;
    this.currentFlow = null;
    this.conversationPhase = 'initial';
    this.conversationHistory = [];
    this.detectedIntent = null;
    
    // Load conversation templates
    this.templates = this.initializeTemplates();
  }
  
  /**
   * Initialize conversation flow templates
   * (HRScreeningFlow and TechnicalInterviewFlow are implemented below;
   * the remaining flows are provided as minimal stubs near the end of this file)
   */
  initializeTemplates() {
    return {
      hr_screening: new HRScreeningFlow(),
      technical_interview: new TechnicalInterviewFlow(),
      networking: new NetworkingConversationFlow(),
      career_coaching: new CareerCoachingFlow(),
      general_professional: new GeneralProfessionalFlow()
    };
  }
  
  /**
   * Detect conversation type and initialize appropriate flow
   */
  detectConversationType(transcript, context = {}) {
    const lowerText = transcript.toLowerCase();
    
    // HR/Recruiting indicators
    if (this.containsKeywords(lowerText, [
      'recruiter', 'hr', 'hiring', 'position', 'role', 'company',
      'salary', 'benefits', 'start date', 'background check'
    ])) {
      return 'hr_screening';
    }
    
    // Technical interview indicators
    if (this.containsKeywords(lowerText, [
      'technical', 'code', 'system design', 'algorithm', 'architecture',
      'programming', 'development', 'engineering', 'project details'
    ])) {
      return 'technical_interview';
    }
    
    // Networking indicators
    if (this.containsKeywords(lowerText, [
      'networking', 'introduction', 'connect', 'industry', 'insights',
      'collaboration', 'opportunity', 'referral', 'recommendation'
    ])) {
      return 'networking';
    }
    
    // Career coaching indicators
    if (this.containsKeywords(lowerText, [
      'career', 'goals', 'development', 'growth', 'skills', 'advice',
      'guidance', 'planning', 'transition', 'coaching'
    ])) {
      return 'career_coaching';
    }
    
    return 'general_professional';
  }
  
  containsKeywords(text, keywords) {
    return keywords.some(keyword => text.includes(keyword));
  }
  
  /**
   * Process conversation input and generate appropriate response
   */
  processConversation(transcript, contextData) {
    // Detect or maintain conversation type
    if (!this.currentFlow) {
      const conversationType = this.detectConversationType(transcript, contextData);
      this.currentFlow = this.templates[conversationType];
      this.detectedIntent = conversationType;
    }
    
    // Process input through current conversation flow
    const response = this.currentFlow.processInput(
      transcript, 
      contextData, 
      this.conversationPhase,
      this.conversationHistory
    );
    
    // Update conversation state
    this.updateConversationState(transcript, response);
    
    return {
      response: response.content,
      systemPrompt: response.systemPrompt,
      conversationType: this.detectedIntent,
      phase: this.conversationPhase,
      suggestedFollowUp: response.suggestedFollowUp
    };
  }
  
  updateConversationState(input, response) {
    this.conversationHistory.push({
      timestamp: Date.now(),
      input,
      response: response.content,
      phase: this.conversationPhase
    });
    
    // Update conversation phase based on flow
    this.conversationPhase = response.nextPhase || this.conversationPhase;
  }
}

/**
 * PHASE 3: HR Screening Conversation Flow
 */

class HRScreeningFlow {
  constructor() {
    this.phases = ['greeting', 'background', 'role_discussion', 'logistics', 'closing'];
    this.currentPhase = 'greeting';
  }
  
  processInput(transcript, contextData, phase, history) {
    switch (phase) {
      case 'greeting':
        return this.handleGreeting(transcript, contextData);
      case 'background':
        return this.handleBackground(transcript, contextData);
      case 'role_discussion':
        return this.handleRoleDiscussion(transcript, contextData);
      case 'logistics':
        return this.handleLogistics(transcript, contextData);
      default:
        return this.handleGeneral(transcript, contextData);
    }
  }
  
  handleGreeting(transcript, contextData) {
    return {
      content: `Thank you for taking the time to speak with me today. I'm excited to learn more about this opportunity and discuss how my background aligns with what you're looking for. I understand you'd like to learn more about my experience and qualifications?`,
      systemPrompt: 'Respond professionally and enthusiastically. Show genuine interest in the role and company.',
      nextPhase: 'background',
      suggestedFollowUp: ['Tell me about yourself', 'Walk me through your background']
    };
  }
  
  handleBackground(transcript, contextData) {
    const experience = contextData.work_experience?.content || 'diverse professional experience';
    
    return {
      content: `I'd be happy to walk you through my background. ${experience} I'm particularly excited about this role because it aligns perfectly with my experience and career goals. What specific aspects of my background would you like me to elaborate on?`,
      systemPrompt: 'Provide specific examples from work experience. Reference actual achievements and metrics from context data.',
      nextPhase: 'role_discussion',
      suggestedFollowUp: ['Tell me about the role', 'What attracted you to this position?']
    };
  }
  
  handleRoleDiscussion(transcript, contextData) {
    const goals = contextData.career_goals?.content || 'professional growth and impact';
    
    return {
      content: `Based on what I understand about the role, it seems like an excellent fit for my skills and interests. ${goals} I'm particularly drawn to the opportunity to contribute to your team's success. Could you tell me more about the day-to-day responsibilities and team dynamics?`,
      systemPrompt: 'Show enthusiasm and ask thoughtful questions about the role and company culture.',
      nextPhase: 'logistics',
      suggestedFollowUp: ['What are your salary expectations?', 'When can you start?']
    };
  }
  
  handleLogistics(transcript, contextData) {
    const salary = contextData.salary_expectations?.content || 'competitive market rates';
    
    return {
      content: `I'm flexible on logistics and committed to making this work if it's the right fit. Regarding compensation, I'm looking for ${salary} and I'm open to discussing the complete package. I could potentially start within two to three weeks, after giving notice at my current position. What are the next steps in your process?`,
      systemPrompt: 'Be professional but confident about compensation and logistics. Show flexibility while maintaining your worth.',
      nextPhase: 'closing',
      suggestedFollowUp: ['Do you have any other questions for me?']
    };
  }
  
  handleGeneral(transcript, contextData) {
    return {
      content: 'I appreciate your question. Let me provide you with a comprehensive answer based on my experience.',
      systemPrompt: 'Provide detailed, professional responses with specific examples from contextData.',
      nextPhase: this.currentPhase,
      suggestedFollowUp: []
    };
  }
}

/**
 * PHASE 4: Technical Interview Flow
 */

class TechnicalInterviewFlow {
  constructor() {
    this.phases = ['technical_intro', 'project_deep_dive', 'problem_solving', 'system_design', 'questions'];
  }
  
  processInput(transcript, contextData, phase, history) {
    switch (phase) {
      case 'technical_intro':
        return this.handleTechnicalIntro(transcript, contextData);
      case 'project_deep_dive':
        return this.handleProjectDeepDive(transcript, contextData);
      case 'problem_solving':
        return this.handleProblemSolving(transcript, contextData);
      default:
        return this.handleTechnicalGeneral(transcript, contextData);
    }
  }
  
  handleTechnicalIntro(transcript, contextData) {
    const skills = contextData.technical_skills?.content || 'comprehensive technical expertise';
    
    return {
      content: `I'm excited to dive into the technical discussion. I have ${skills} and I'm passionate about building scalable, maintainable solutions. I'd love to walk you through some of the projects I've worked on and discuss the technical decisions behind them. What would you like to explore first?`,
      systemPrompt: 'Demonstrate technical confidence while being approachable. Reference specific technologies and methodologies.',
      nextPhase: 'project_deep_dive'
    };
  }
  
  handleProjectDeepDive(transcript, contextData) {
    const projects = contextData.projects_achievements?.content || 'impactful technical projects';
    
    return {
      content: `Let me walk you through one of my recent projects using the STAR method. **Situation**: ${this.extractSTARComponent(projects, 'situation')}. **Task**: ${this.extractSTARComponent(projects, 'task')}. **Action**: ${this.extractSTARComponent(projects, 'action')}. **Result**: ${this.extractSTARComponent(projects, 'result')}. The technical challenges were particularly interesting - would you like me to elaborate on any specific aspect?`,
      systemPrompt: 'Use STAR methodology to structure technical examples. Include specific metrics and technical details.',
      nextPhase: 'problem_solving'
    };
  }
  
  extractSTARComponent(projectData, component) {
    // This would extract specific STAR components from project data
    // For now, return generic professional responses
    const starMap = {
      situation: 'We needed to solve a complex technical challenge',
      task: 'I was responsible for designing and implementing the solution',
      action: 'I researched best practices, designed the architecture, and led the implementation',
      result: 'We delivered on time with measurable performance improvements'
    };
    
    return starMap[component] || projectData;
  }
  
  handleProblemSolving(transcript, contextData) {
    return {
      content: `That's a great technical question. Let me think through this systematically. First, I'd clarify the requirements and constraints. Then I'd consider different approaches, weighing trade-offs like performance, scalability, and maintainability. Based on my experience with similar challenges, I'd recommend... Would you like me to elaborate on any particular aspect of this solution?`,
      systemPrompt: 'Demonstrate systematic problem-solving approach. Show technical depth while explaining clearly.',
      nextPhase: 'questions'
    };
  }
  
  handleTechnicalGeneral(transcript, contextData) {
    return {
      content: 'That\'s an excellent technical question. Let me break down my approach...',
      systemPrompt: 'Provide detailed technical explanations with practical examples from your experience.',
      nextPhase: 'questions'
    };
  }
}
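
/**
 * Minimal placeholder flows so initializeTemplates() resolves — these are stubs;
 * flesh them out following the same pattern as HRScreeningFlow and TechnicalInterviewFlow.
 */
class NetworkingConversationFlow {
  processInput(transcript, contextData, phase, history) {
    return {
      content: 'Great to connect. I focus on building scalable systems, and I\'m always happy to share insights or explore ways we might collaborate.',
      systemPrompt: 'Keep the tone warm and professional. Look for mutual value and suggest a concrete follow-up.',
      nextPhase: phase,
      suggestedFollowUp: []
    };
  }
}

class CareerCoachingFlow {
  processInput(transcript, contextData, phase, history) {
    return {
      content: 'Let me share how I think about that from a career planning perspective...',
      systemPrompt: 'Discuss goals, skill development, and positioning using details from contextData.',
      nextPhase: phase,
      suggestedFollowUp: []
    };
  }
}

class GeneralProfessionalFlow {
  processInput(transcript, contextData, phase, history) {
    return {
      content: 'Happy to discuss that. Here is my perspective based on my experience...',
      systemPrompt: 'Respond professionally with specific examples from contextData.',
      nextPhase: phase,
      suggestedFollowUp: []
    };
  }
}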

// Export conversation flow system
module.exports = {
  conversationFlowPrompt,
  ConversationFlowManager,
  HRScreeningFlow,
  TechnicalInterviewFlow
};
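
A quick usage sketch (the transcript and context values below are illustrative):

// Example: route an incoming transcript through the flow manager
const { ConversationFlowManager } = require('./conversation-flow-templates');

const flowManager = new ConversationFlowManager();

const result = flowManager.processConversation(
  "Hi, I'm a recruiter calling about a senior engineer position",
  { work_experience: { content: '5+ years building full-stack web systems' } }
);

console.log(result.conversationType); // 'hr_screening' (keyword-based detection)
console.log(result.response);         // response text generated by HRScreeningFlow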
15
90 minutes

Telephony Integration & Omni-Channel Architecture

Design comprehensive telephony system using Twilio Voice API for phone-based interactions and create unified omni-channel experience

📚 Understanding This Step

Extend your voice-enabled digital twin to handle phone calls, creating a complete omni-channel AI agent that can interact via chat (MCP), voice (Realtime API), and phone (Twilio). This creates a professional-grade system that can handle actual recruiter calls, networking conversations, and career discussions through any communication channel.

Tasks to Complete

Research telephony integration options with Twilio Voice API and alternatives
Design phone interaction workflows and call routing architecture
Plan professional phone greeting and conversation management
Create unified omni-channel architecture connecting chat, voice, and phone
Implement call recording, transcription, and conversation logging
Design escalation flows for complex conversations
Test complete omni-channel experience across all communication methods

Omni-Channel Telephony Architecture

javascript

Complete telephony integration design and implementation planning

// Telephony Integration Architecture Plan
// Copy this to: telephony-integration-plan.js

/**
 * PHASE 1: Telephony Requirements Analysis Prompt
 * Use this with your AI assistant for comprehensive analysis
 */

const telephonyResearchPrompt = `
Design a comprehensive telephony integration system for my professional digital twin:

## Current System Status:
- Deployed MCP server on Vercel with professional profile RAG
- Voice AI integration with OpenAI Realtime API (from Step 11)
- Need to add phone-based interaction capabilities

## Telephony Requirements Analysis:

1. **Integration Platform Options**
   - Twilio Voice API capabilities and pricing
   - VAPI.ai telephony features and comparison
   - Vercel Functions compatibility for call handling
   - WebRTC vs traditional telephony approaches

2. **Professional Phone Interaction Features**
   - Inbound call handling with professional greeting
   - AI-powered conversation with context from MCP server
   - Call recording and transcription for follow-up
   - Voicemail and callback management
   - Integration with calendar scheduling systems

3. **Omni-Channel Experience Design**
   - Unified conversation context across chat, voice, and phone
   - Seamless handoff between communication channels
   - Consistent professional persona across all touchpoints
   - Conversation history and relationship management

4. **Technical Architecture Requirements**
   - Real-time audio processing and streaming
   - Call routing and queue management
   - Integration with existing Vercel infrastructure
   - Scalability and cost optimization

5. **Business Use Cases**
   - Recruiter screening calls with AI pre-qualification
   - Networking conversations and relationship building
   - Career coaching and interview preparation sessions
   - Professional consultation and expertise sharing

Provide detailed technical specifications, cost analysis, implementation complexity assessment, and recommended architecture for each option.
`;

/**
 * PHASE 2: Omni-Channel Architecture Design
 */

const omniChannelArchitecture = {
  // Communication Channel Integration
  channels: {
    chat: {
      platform: 'MCP Server (Claude Desktop integration)',
      features: ['text-based interaction', 'file sharing', 'code examples'],
      useCase: 'Detailed technical discussions and documentation'
    },
    
    voice: {
      platform: 'OpenAI Realtime API (from Step 11)',
      features: ['voice-to-voice conversation', 'real-time interaction'],
      useCase: 'Interview practice and conversational coaching'
    },
    
    phone: {
      platform: 'Twilio Voice API',
      features: ['inbound/outbound calls', 'recording', 'transcription'],
      useCase: 'Professional calls and recruiter interactions'
    }
  },
  
  // Unified Data Layer
  dataIntegration: {
    sharedContext: {
      source: 'Existing MCP server RAG system',
      sync: 'Real-time conversation context across all channels',
      persistence: 'Conversation history and relationship tracking'
    },
    
    professionalProfile: {
      data: 'Comprehensive professional information from Step 1',
      access: 'Consistent across chat, voice, and phone interactions',
      updates: 'Real-time learning from conversations'
    }
  },
  
  // Professional Call Management
  callFlows: {
    inboundGreeting: {
      script: 'Professional introduction with context awareness',
      routing: 'Intelligent call classification and handling',
      escalation: 'Human handoff for complex situations'
    },
    
    recruiterScreening: {
      qualification: 'AI-powered initial screening questions',
      responses: 'Data-driven answers from professional profile',
      followUp: 'Automated scheduling and next steps'
    },
    
    networkingCalls: {
      relationship: 'Context from previous interactions',
      valueCreation: 'Professional insights and expertise sharing',
      continuity: 'Follow-up planning and relationship nurturing'
    }
  }
};

/**
 * PHASE 3: Twilio Integration Implementation
 */

const twilioIntegration = {
  // Twilio Configuration
  setup: {
    account: {
      requirement: 'Twilio account with Voice API access',
      credentials: ['Account SID', 'Auth Token', 'Phone Number'],
      verification: 'Phone number verification and setup'
    },
    
    webhooks: {
      endpoint: 'Vercel function for call handling',
      events: ['incoming-call', 'call-status', 'recording-complete'],
      security: 'Request validation and authentication'
    }
  },
  
  // Call Handling Logic
  callHandling: {
    incomingCall: {
      greeting: 'Professional AI assistant introduction',
      contextRetrieval: 'Caller identification and history lookup',
      conversation: 'Integration with OpenAI Realtime API',
      recording: 'Automatic call recording and transcription'
    },
    
    callRouting: {
      screening: 'AI-powered call classification',
      priority: 'Important call identification and handling',
      voicemail: 'Professional voicemail with callback scheduling',
      escalation: 'Human contact when needed'
    }
  },
  
  // Cost Optimization
  costManagement: {
    usage: 'Monitor call volume and duration',
    optimization: 'Efficient call routing and AI processing',
    budgeting: 'Cost caps and usage alerts',
    analytics: 'ROI tracking for professional interactions'
  }
};

/**
 * PHASE 4: Implementation Roadmap
 */

const implementationRoadmap = [
  {
    phase: 'Telephony Platform Setup',
    duration: '30 minutes',
    tasks: [
      'Create Twilio account and obtain phone number',
      'Configure Vercel environment variables for Twilio integration',
      'Set up webhook endpoints for call handling',
      'Test basic call routing and connection'
    ]
  },
  
  {
    phase: 'Voice AI Integration',
    duration: '30 minutes',
    tasks: [
      'Connect Twilio calls to OpenAI Realtime API',
      'Implement call audio streaming and processing',
      'Configure professional greeting and conversation flows',
      'Test voice quality and conversation coherence'
    ]
  },
  
  {
    phase: 'Omni-Channel Unification',
    duration: '30 minutes',
    tasks: [
      'Unify conversation context across chat, voice, and phone',
      'Implement conversation history and relationship tracking',
      'Create seamless handoff between communication channels',
      'Test complete omni-channel user experience'
    ]
  }
];

// Alternative: VAPI.ai Integration (Simpler Option)
const vapiAlternative = {
  advantages: [
    'Built-in telephony integration',
    'Simplified voice AI setup',
    'Pre-configured call handling',
    'Integrated analytics and monitoring'
  ],
  
  implementation: {
    setup: 'VAPI.ai account and API configuration',
    integration: 'Connect to existing MCP server via API',
    customization: 'Professional voice persona configuration',
    testing: 'End-to-end call testing and optimization'
  },
  
  consideration: 'Evaluate VAPI.ai vs Twilio based on cost, features, and control requirements'
};

// Export complete telephony architecture
module.exports = {
  telephonyResearchPrompt,
  omniChannelArchitecture,
  twilioIntegration,
  implementationRoadmap,
  vapiAlternative
};
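
To make the roadmap concrete, here is a minimal sketch of an inbound-call webhook as a Vercel serverless function. It assumes the twilio npm package and a TWILIO_AUTH_TOKEN environment variable; the route path, greeting text, and recording settings are illustrative:

// api/incoming-call.js — Vercel function sketch for Twilio inbound calls
const twilio = require('twilio');

module.exports = (req, res) => {
  // Validate that the request really came from Twilio
  const signature = req.headers['x-twilio-signature'];
  const url = `https://${req.headers.host}/api/incoming-call`;
  const isValid = twilio.validateRequest(
    process.env.TWILIO_AUTH_TOKEN,
    signature,
    url,
    req.body
  );

  if (!isValid) {
    return res.status(403).send('Invalid Twilio signature');
  }

  // Respond with TwiML: professional greeting, then record and transcribe the caller
  const response = new twilio.twiml.VoiceResponse();
  response.say(
    { voice: 'Polly.Matthew' },
    'Hello, you have reached the professional assistant. How can I help you today?'
  );
  response.record({ transcribe: true, maxLength: 120 });

  res.setHeader('Content-Type', 'text/xml');
  res.status(200).send(response.toString());
};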

Learning Outcomes

Advanced skills and knowledge you'll master

Advanced AI agent architecture with multi-channel support
Voice AI integration patterns and best practices
Telephony system design and implementation planning
Real-time audio processing and conversation management
Professional AI persona development and conversation design
Production deployment of complex AI systems

Your Advanced Voice AI is Complete! 🎙️

You've built a sophisticated voice AI agent with OpenAI's Realtime API, server-side security, and professional conversation capabilities, plus a concrete plan for extending it into a full omni-channel system with telephony support.