Set of 📝 with 🔗 to help those building Voice AI agents 🎙️🤖
Updated May 20, 2026
🎤💬 Full example of implementing ChatGPT's realtime voice from scratch with a VAD + STT + LLM + TTS technology stack in almost a single file!
Real-time voice agents with parallel async background sub-agents — conversations continue naturally while tasks run • Join the builders → https://discord.gg/mqxKaN3UKC
Open-source realtime voice agent server in Go with WebRTC (WHIP), barge-in, streaming STT/LLM/TTS pipelines, plugin system, multi-language SDKs, SIP telephony, ESP32 support & fully local mode.
An AI-powered object detection system using YOLOv8 to identify and locate graffiti across various contexts including walls, buildings, over-bridges, vehicles, and other surfaces.
LiveKit voice app validation skill. Use when building, debugging, or declaring as working any LiveKit voice agent, Agents UI app, or React/Next.js LiveKit project. Enforces evidence-based validation before reporting a session, token endpoint, worker, transcript, or end-to-end voice interaction as complete.
Real-time voice interface for OpenClaw. Stream speech-to-text, LLM reasoning, and text-to-speech into a low-latency conversational agent you can talk to—locally or in the cloud.
Bounded-latency browser edge inference pipeline for real-time voice interview summarization using ONNX Runtime Web + WASM. Features Web Worker isolation, semantic ring buffers, latest-only concurrency control, observability dashboard, offline-first architecture and production-ready whisper.cpp upgrade path.
Voice agent prototype for structured clinical interviewing, with VAD-based interruption handling, modular ASR/LLM/TTS backends, and dialogue workflow control.
LiveKit Agents UI demo showing a voice AI assistant that schedules roof inspections using real-time voice interaction, visualizers, and booking workflow.
Real-time hand sign recognition using LSTM-based models for sequence detection from video frames.
howeverpipecat: engineering-focused Pipecat distribution
A real-time (<500ms) voice AI concierge built with Next.js, FastAPI, and Gemini 2.5 Flash Lite. Features local RAG (ChromaDB) for policy retrieval, Tool Calling for live booking, and event-driven CRM logging to Google Sheets.
Realtime multimodal AI agent with voice streaming, RAG memory, and autonomous workflows
Traffyx-AI: Traffic Forecasting & Urban Mobility Intelligence System. Applied machine learning system for traffic prediction, congestion analysis, and real-world spatiotemporal data modeling.
Real-time face verification system using MediaPipe Face Mesh and landmark-based geometric feature extraction for improved accuracy and robustness.
Example apps showcasing what can be built with the Livepeer BYOC workflow.
Production-ready real-time voice AI pipeline integrating Twilio Media Streams, streaming ASR (Deepgram), LLM reasoning, and live analytics dashboard. Designed for ultra-low latency conversational intelligence in call center and healthcare environments.
Realtime voice AI gateway with turn state, interruption handling, provider fallback, degraded state, audit events, runtime evals, Bun, and TypeScript.
High-performance async Python backend for real-time AI conversations with Quart, Supabase, and OpenAI.