
This guide shows how to build an AI voice agent device with realtime AI speech, powered by the OpenAI Realtime API, ESP32, Secure WebSockets, and Deno Edge Functions, for >10-minute uninterrupted global conversations.
An active version of this README is available at ElatoAI.
Demo Video
https://github.com/user-attachments/assets/aa60e54c-5847-4a68-80b5-5d6b1a5b9328
Hardware Design
The reference implementation uses an ESP32-S3 microcontroller with minimal additional components:

Required Components:
- ESP32-S3 development board
- I2S microphone (e.g., INMP441)
- I2S amplifier and speaker (e.g., MAX98357A)
- Push button to start/stop the conversation
- RGB LED for visual feedback
- Optional: touch sensor for alternative control
Optional hardware: A fully assembled PCB and device are available in the ElatoAI store.
🚀 Quick Start Guide
- Clone the repository
Clone the ElatoAI GitHub repository and change into its directory.
git clone https://github.com/akdeb/ElatoAI.git
cd ElatoAI
- Set your environment variables (OPENAI_API_KEY, SUPABASE_ANON_KEY)
In the frontend-nextjs directory, create a .env.local file and set your environment variables.
cd frontend-nextjs
cp .env.example .env.local
# In .env.local, set your environment variables
# NEXT_PUBLIC_SUPABASE_ANON_KEY=<your-supabase-anon-key>
# OPENAI_API_KEY=<your-openai-api-key>
Back in the repository root, create a .env file in the server-deno directory and set your environment variables.
cd server-deno
cp .env.example .env
# In .env, set your environment variables
# SUPABASE_KEY=<your-supabase-anon-key>
# OPENAI_API_KEY=<your-openai-api-key>
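A quick way to catch a forgotten or still-templated value before starting the services is to check the loaded variables against the names listed above. The following is a minimal sketch, not part of the repository: `missingEnvKeys` and the two key lists are hypothetical helpers, and only the variable names come from the `.env` examples in this step.

```typescript
// Variable names come from the .env examples above; the helper is hypothetical.
const FRONTEND_KEYS: string[] = ["NEXT_PUBLIC_SUPABASE_ANON_KEY", "OPENAI_API_KEY"];
const SERVER_KEYS: string[] = ["SUPABASE_KEY", "OPENAI_API_KEY"];

type Env = Record<string, string | undefined>;

/** Returns the required keys that are unset, blank, or still a <placeholder>. */
function missingEnvKeys(env: Env, required: string[]): string[] {
  return required.filter((key) => {
    const value = (env[key] ?? "").trim();
    return value === "" || /^<.*>$/.test(value);
  });
}

// Example input: one value was never replaced after copying .env.example.
const exampleEnv: Env = {
  SUPABASE_KEY: "<your-supabase-anon-key>",
  OPENAI_API_KEY: "sk-test",
};
// Reports SUPABASE_KEY, because it still holds the template placeholder.
console.log(missingEnvKeys(exampleEnv, SERVER_KEYS));
```

In a real script the `env` argument could be `Deno.env.toObject()` on the server or `process.env` in the Next.js app.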
- Start Supabase
Install the Supabase CLI and set up your local Supabase backend. From the root directory, run:
brew install supabase/tap/supabase
supabase start # Starts your local Supabase server with the default migrations and seed data.
- Set up your NextJS Frontend
From the root directory, run the following commands to install and start the frontend in frontend-nextjs. (Login creds: Email: admin@elatoai.com, Password: admin)
cd frontend-nextjs
npm install
# Run the development server
npm run dev
- Start the Deno server
# Navigate to the server directory
cd server-deno
# Run the server at port 8000
deno run -A --env-file=.env main.ts
- Set up the ESP32 Device firmware
In Config.cpp, set ws_server and backend_server to your local IP address. To find it, run ifconfig in your console and look for the inet address under en0, for example 192.168.1.100 (yours may be different for your Wifi network). This tells the ESP32 device to connect to the NextJS frontend and Deno server running on your local machine. All services should be on the same Wifi network.
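Before flashing, it can help to confirm that the address you found is a well-formed IPv4 address and to see which local URLs it implies. The sketch below is illustrative only: `localServiceUrls` is a hypothetical helper, port 8000 is the Deno server port from the step above, port 3000 is the Next.js development default, and plain `ws://` and `http://` for local testing are assumptions.

```typescript
interface LocalServiceUrls {
  wsServer: string;      // Deno WebSocket server
  backendServer: string; // Next.js frontend
}

/** True when the string is four dot-separated decimal octets in 0..255. */
function isIPv4(ip: string): boolean {
  const parts = ip.split(".");
  return parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
}

/** Builds the local URLs implied by a LAN IP (ports are assumptions). */
function localServiceUrls(ip: string, denoPort = 8000, nextPort = 3000): LocalServiceUrls {
  if (!isIPv4(ip)) {
    throw new Error(`Not an IPv4 address: ${ip}`);
  }
  return {
    wsServer: `ws://${ip}:${denoPort}`,
    backendServer: `http://${ip}:${nextPort}`,
  };
}

const urls = localServiceUrls("192.168.1.100");
console.log(urls.wsServer, urls.backendServer);
```

Opening the `backendServer` URL from another machine on the same Wifi network is a simple check that the address is reachable before the ESP32 tries it.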
- Set up the ESP32 Device Wifi
Build and upload the firmware to your ESP32 device. The ESP32 should open an ELATO-DEVICE captive portal to connect to Wifi. Connect to it and go to http://192.168.4.1 to configure the device Wifi.
- Once your Wifi credentials are configured, turn the device OFF and ON again and it should connect to your Wifi and your server.
- Now you can talk to your AI Character!
🚀 Ready to Launch?
- Register your device by adding your ESP32 Device's MAC Address and a unique user code to the devices table in Supabase.
Pro Tip: To find your ESP32-S3 Device's MAC Address, build and upload test/print_mac_address_test.cpp using PlatformIO and view the serial monitor.
- On your frontend client in the Settings page, add the unique user code so that the device is linked to your account in Supabase.
- If you're testing locally, you can keep the DEV_MODE macro in firmware-arduino/Config.h and the Deno server env variable enabled to use your local IP addresses for testing.
- Now you can register multiple devices to your account by repeating the process above.
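MAC addresses copied from a serial monitor come in several spellings (lowercase, dashes, no separators), so it may be worth normalizing them before adding a row. This is a minimal sketch under stated assumptions: the column names `mac_address` and `user_code` and the uppercase colon-separated format are hypothetical, so check the actual `devices` table schema before relying on them.

```typescript
// Hypothetical row shape; the real devices table may use other column names.
interface DeviceRow {
  mac_address: string;
  user_code: string;
}

/** Normalizes a MAC address to uppercase, colon-separated pairs. */
function normalizeMac(raw: string): string {
  const hex = raw.replace(/[:\-.\s]/g, "").toUpperCase();
  if (!/^[0-9A-F]{12}$/.test(hex)) {
    throw new Error(`Not a MAC address: ${raw}`);
  }
  const pairs: string[] = [];
  for (let i = 0; i < 12; i += 2) {
    pairs.push(hex.slice(i, i + 2));
  }
  return pairs.join(":");
}

/** Builds the row to insert for one device. */
function deviceRow(mac: string, userCode: string): DeviceRow {
  const code = userCode.trim();
  if (code === "") {
    throw new Error("user code must not be empty");
  }
  return { mac_address: normalizeMac(mac), user_code: code };
}

console.log(deviceRow("a4-cf-12-0b-9e-01", "kitchen-01"));
```

Registering several devices then amounts to building one such row per device, each with its own user code.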
Project Architecture
ElatoAI consists of three main components:
- Frontend Client (Next.js hosted on Vercel) - to create and talk to your AI agents and 'send' them to your ESP32 device
- Edge Server Functions (Deno running on Deno/Supabase Edge) - to handle the websocket connections from the ESP32 device and the OpenAI API calls
- ESP32 IoT Client (PlatformIO/Arduino) - to open the websocket connection to the Edge Server Functions and exchange audio with the OpenAI API via the Deno edge server.
🌟 Key Features
- Realtime Speech-to-Speech: Instant speech conversion powered by OpenAI's Realtime APIs.
- Create Custom AI Agents: Create custom agents with different personalities and voices.
- Customizable Voices: Choose from a variety of voices and personalities.
- Secure WebSockets: Reliable, encrypted WebSocket communication.
- Server VAD Turn Detection: Intelligent conversation flow handling for smooth interactions.
- Opus Audio Compression: High-quality audio streaming with minimal bandwidth.
- Global Edge Performance: Low latency Deno Edge Functions ensuring seamless global conversations.
- ESP32 Arduino Framework: Optimized and easy-to-use hardware integration.
- Conversation History: View your conversation history.
- Device Management and Authentication: Register and manage your devices.
- User Authentication: Secure user authentication and authorization.
- Conversations with WebRTC and Websockets: Talk to your AI with WebRTC on the NextJS webapp and with websockets on the ESP32.
- Volume Control: Control the volume of the ESP32 speaker from the NextJS webapp.
- Realtime Transcripts: The realtime transcripts of your conversations are stored in the Supabase DB.
- OTA Updates: Over the Air Updates for the ESP32 firmware.
- Wifi Management with captive portal: Connect to your Wifi network from the ESP32 device.
- Factory Reset: Factory reset the ESP32 device from the NextJS webapp.
- Button and Touch Support: Use the button OR touch sensor to control the ESP32 device.
- No PSRAM Required: The ESP32 device does not require PSRAM to run the speech to speech AI.
- OAuth for Web client: OAuth for your users to manage their AI characters and devices.
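Custom agents and server VAD turn detection both come down to the session configuration sent to the Realtime API. The sketch below shows one plausible shape for that message. `buildSessionUpdate` and `AgentConfig` are hypothetical names, the field names follow the Realtime API's `session.update` event as documented for its beta, and the threshold and timing values are illustrative rather than the project's actual settings.

```typescript
// Hypothetical description of one agent created in the web app.
interface AgentConfig {
  instructions: string; // the agent's personality prompt
  voice: string;        // a Realtime API voice name, e.g. "alloy"
}

/** Builds a session.update event that enables server-side VAD. */
function buildSessionUpdate(agent: AgentConfig, silenceMs = 500) {
  return {
    type: "session.update",
    session: {
      instructions: agent.instructions,
      voice: agent.voice,
      turn_detection: {
        type: "server_vad",
        threshold: 0.5,                 // illustrative sensitivity
        prefix_padding_ms: 300,         // audio kept before detected speech
        silence_duration_ms: silenceMs, // silence that ends the user's turn
      },
    },
  };
}

const sessionEvent = buildSessionUpdate({
  instructions: "You are a cheerful storyteller for children.",
  voice: "alloy",
});
console.log(JSON.stringify(sessionEvent, null, 2));
```

A longer `silence_duration_ms` makes the agent wait longer before replying, which tends to suit slower speakers at the cost of snappier turn-taking.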
🛠 Tech Stack
Component | Technology Used |
---|---|
Frontend | Next.js, Vercel |
Backend | Supabase DB |
Edge Functions | Edge Functions on Deno / Supabase Edge Runtime |
IoT Client | PlatformIO, Arduino Framework, ESP32-S3 |
Audio Codec | Opus |
Communication | Secure WebSockets |
Libraries | ArduinoJson, WebSockets, AsyncWebServer, ESP32_Button, Arduino Audio Tools, ArduinoLibOpus |
📈 Core Use Cases
We have a Usecases.md file that outlines the core use cases for the Elato AI device or any other custom conversational AI device.
🗺️ High-Level Flow
flowchart TD
User[User Speech] --> ESP32
ESP32[ESP32 Device] -->|WebSocket| Edge[Deno Edge Function]
Edge -->|OpenAI API| OpenAI[OpenAI Realtime API]
OpenAI --> Edge
Edge -->|WebSocket| ESP32
ESP32 --> User[AI Generated Speech]
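The upstream half of this flow can be sketched as a pure function that decides what the edge function does with each WebSocket frame from the device. This is an illustration, not the repository's code: `routeDeviceFrame` and `toBase64` are hypothetical, it assumes binary frames carry audio already in the format the Realtime session expects (any Opus decoding is out of scope here), and it assumes text frames are JSON control messages. The `input_audio_buffer.append` event with a base64 `audio` field is the Realtime API's documented way to stream input audio.

```typescript
type DeviceFrame = string | Uint8Array;

type UpstreamAction =
  | { kind: "forward"; event: { type: "input_audio_buffer.append"; audio: string } }
  | { kind: "control"; message: unknown }
  | { kind: "ignore" };

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/** Standard base64 encoding, written out to avoid runtime-specific helpers. */
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += B64[(triple >> 18) & 63] + B64[(triple >> 12) & 63];
    out += i + 1 < bytes.length ? B64[(triple >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? B64[triple & 63] : "=";
  }
  return out;
}

/** Maps one frame from the ESP32 to the action the edge function takes. */
function routeDeviceFrame(frame: DeviceFrame): UpstreamAction {
  if (typeof frame === "string") {
    // Assumption: text frames are JSON control messages from the device.
    try {
      return { kind: "control", message: JSON.parse(frame) };
    } catch (_err) {
      return { kind: "ignore" };
    }
  }
  if (frame.length === 0) {
    return { kind: "ignore" };
  }
  // Binary frames are audio: wrap them in a Realtime API input event.
  return {
    kind: "forward",
    event: { type: "input_audio_buffer.append", audio: toBase64(frame) },
  };
}

console.log(routeDeviceFrame(new Uint8Array([1, 2, 3])));
```

Keeping the routing decision free of socket calls makes it easy to test without a device or an OpenAI connection; the edge function would then send `event` upstream whenever `kind` is `"forward"`.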
Project Structure
graph TD
repo[ElatoAI]
repo --> frontend[Frontend Vercel NextJS]
repo --> deno[Deno Edge Function]
repo --> esp32[ESP32 Arduino Client]
deno --> supabase[Supabase DB]
frontend --> supabase
esp32 --> websockets[Secure WebSockets]
esp32 --> opus[Opus Codec]
esp32 --> audio_tools[arduino-audio-tools]
esp32 --> libopus[arduino-libopus]
esp32 --> ESPAsyncWebServer[ESPAsyncWebServer]
⚙️ PlatformIO Configuration
[env:esp32-s3-devkitc-1]
platform = espressif32 @ 6.10.0
board = esp32-s3-devkitc-1
framework = arduino
monitor_speed = 115200
lib_deps =
    bblanchon/ArduinoJson@^7.1.0
    links2004/WebSockets@^2.4.1
    ESP32Async/ESPAsyncWebServer@^3.7.6
    https://github.com/esp-arduino-libs/ESP32_Button.git#v0.0.1
    https://github.com/pschatzmann/arduino-audio-tools.git#v1.0.1
    https://github.com/pschatzmann/arduino-libopus.git#a1.1.0
📊 Important Stats
- ⚡️ Latency: <2s round-trip globally
- 🎧 Audio Quality: Opus codec at a 12 kbps bitrate (high clarity)
- ⏳ Uninterrupted Conversations: Up to 10 minutes of continuous conversation
- 🌎 Global Availability: Optimized with edge computing with Deno
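To put the 12 kbps figure in perspective, the arithmetic below estimates packet size and total data for a full-length conversation. It is a back-of-the-envelope sketch: the 20 ms frame duration and the 24 kHz, 16-bit mono PCM baseline are assumptions for comparison, and real Opus streams vary around the target bitrate.

```typescript
const OPUS_BITRATE_BPS = 12000; // from the stats above
const FRAME_MS = 20;            // assumption: a common Opus frame duration
const SESSION_SECONDS = 10 * 60;

/** Average payload bytes in one audio frame at a constant bitrate. */
function bytesPerFrame(bitrateBps: number, frameMs: number): number {
  return (bitrateBps * frameMs) / 1000 / 8;
}

/** Total payload bytes streamed in one direction over a session. */
function sessionBytes(bitrateBps: number, seconds: number): number {
  return (bitrateBps * seconds) / 8;
}

// Assumed uncompressed baseline: 24000 samples/s * 16 bits, mono.
const PCM_BITRATE_BPS = 24000 * 16;

console.log(bytesPerFrame(OPUS_BITRATE_BPS, FRAME_MS));      // 30 bytes per frame
console.log(sessionBytes(OPUS_BITRATE_BPS, SESSION_SECONDS)); // 900000 bytes in 10 minutes
console.log(PCM_BITRATE_BPS / OPUS_BITRATE_BPS);              // 32x smaller than the PCM baseline
```

Under these assumptions a ten-minute conversation costs roughly 0.9 MB of audio per direction, which is why a compressed stream is practical on a microcontroller's Wifi link.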
🛡 Security
- Secure WebSockets (WSS) for encrypted data transfers
- Optional: API Key encryption with 256-bit AES
- Supabase DB for secure authentication
- Supabase RLS for all tables
🚫 Limitations
- 3-4 s cold start time while connecting to the edge server
- Uninterrupted conversations are limited to 10 minutes
- The edge server stops when its wall-clock time limit is exceeded
- No speech interruption detection on ESP32
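Because the edge server stops at a hard wall-clock limit, a server or client can track how much of the session budget remains and wind down gracefully instead of being cut off mid-sentence. The following is a minimal sketch of that bookkeeping: `sessionState` is a hypothetical helper, the 10-minute limit is taken from the list above, and the 30-second warning window is an arbitrary illustrative value.

```typescript
type SessionState = "active" | "closing-soon" | "expired";

const SESSION_LIMIT_MS = 10 * 60 * 1000; // limit stated above
const WARN_WINDOW_MS = 30 * 1000;        // assumption: warn 30 s before the end

/** Classifies a session by how much wall-clock budget is left. */
function sessionState(
  startedAtMs: number,
  nowMs: number,
  limitMs: number = SESSION_LIMIT_MS,
  warnMs: number = WARN_WINDOW_MS,
): SessionState {
  const remaining = limitMs - (nowMs - startedAtMs);
  if (remaining <= 0) {
    return "expired";
  }
  return remaining <= warnMs ? "closing-soon" : "active";
}

const startedAt = Date.now();
console.log(sessionState(startedAt, startedAt + 60 * 1000)); // one minute in
```

On `"closing-soon"` the server could, for example, let the current response finish and then ask the device to reconnect, which starts a fresh session at the cost of another cold start.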
License
This project is licensed under the MIT License - see the LICENSE file for details.
If you find this project interesting or useful, drop a GitHub ⭐️ at ElatoAI. It helps a lot!