AI Voice Agent / ExecutiveAI Assistant

Real-time voice agent system

Built a real-time voice agent system on the Pipecat framework and Attendee meeting bot infrastructure. It processes 16kHz audio streams over WebSocket connections with optimized VAD (Voice Activity Detection) and supports concurrent voice bots with conversation state management.

Project Overview

Key Features

Built real-time voice agent system with Pipecat framework and Attendee meeting bot infrastructure, processing 16kHz audio streams over WebSocket connections with optimized Silero-based VAD (Voice Activity Detection). Supports concurrent voice bots with conversation state management across 3 phases (Initial Questions, Follow-up, Ongoing Facilitation), smart turn detection, and participant-aware audio routing.
Developed meeting transcription and notetaker system using Groq LLM (Llama-4-Scout-17B) that processes meeting transcripts to generate 120+ word summaries, extracts 5+ action items per meeting, analyzes speaking time distribution, calculates sentiment scores, and provides effectiveness metrics (0-100 scale) with communication pattern analysis.
Architected 102+ REST API endpoints across Next.js frontend and Express backend, integrating Groq LLM for email prioritization, task generation, transcript analysis, and chat functionality, handling batch processing of multiple emails and meetings with JSON repair mechanisms for robust error handling.
Implemented encrypted voice bot transcription storage system in MongoDB with recurring meeting isolation using composite keys (meeting_id + date + time), tracking word counts, duration metrics, participant identification, and 95% confidence scores, with automatic transcription persistence and retrieval.
Integrated Pipecat framework with Attendee API for bidirectional voice communication and real-time audio processing (16-bit PCM at 16kHz). Implemented a custom AttendeeFrameSerializer for per-participant audio routing, WebSocket connection management via a FastAPI transport layer, Groq STT for speech recognition, and Sarvam TTS for natural voice synthesis. The resulting participant-aware meeting bots can speak, transcribe, and analyze meetings in real time.
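
The recurring-meeting isolation mentioned above (composite key of meeting_id + date + time) can be illustrated with a minimal sketch. The class name `MeetingOccurrenceKey`, the helper `build_transcript_document`, the `|` separator, and the document field names are all hypothetical choices made for this example, not the project's actual schema. Encryption is left out, and the resulting dictionary is only assumed to be the kind of document that would be written to MongoDB.

```python
from dataclasses import dataclass
from datetime import date, time


@dataclass(frozen=True)
class MeetingOccurrenceKey:
    """One occurrence of a (possibly recurring) meeting. Hypothetical type."""

    meeting_id: str
    meeting_date: date
    start_time: time

    def as_string(self) -> str:
        # ISO date and HH:MM time keep the key unambiguous and sortable.
        return "|".join(
            [
                self.meeting_id,
                self.meeting_date.isoformat(),
                self.start_time.isoformat(timespec="minutes"),
            ]
        )


def build_transcript_document(key: MeetingOccurrenceKey, text: str,
                              participants: list[str]) -> dict:
    """Assemble a storage document keyed by the composite key (assumed layout)."""
    return {
        "_id": key.as_string(),          # same meeting_id on another date -> new document
        "meeting_id": key.meeting_id,
        "participants": participants,
        "transcript": text,              # would be encrypted before storage
        "word_count": len(text.split()),
    }


# Two occurrences of the same weekly meeting get distinct keys.
monday = MeetingOccurrenceKey("standup", date(2024, 5, 6), time(9, 30))
next_monday = MeetingOccurrenceKey("standup", date(2024, 5, 13), time(9, 30))
doc = build_transcript_document(monday, "hello team let us begin", ["Ana"])
```

Because the date and time are part of the key, a transcript from one occurrence of a recurring meeting cannot overwrite or be confused with another occurrence that shares the same meeting_id.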

Technologies Used

Python, Pipecat, FastAPI, Next.js, Express.js, MongoDB, WebSockets, Groq STT, Silero VAD, Sarvam TTS, Attendee API, Real-time Audio Processing, Voice Activity Detection, REST APIs, Large Language Models
