June Realtime Agent
This document explains how to set up and run a basic June Realtime Agent for low-latency, voice-first interactions using the June Realtime API and Agents SDK.
Prerequisites
- June account with an API key.
- Python 3.9+ or Node.js (depending on which SDK you use).
- Basic familiarity with WebSocket/WebRTC or a framework that wraps them.
Export your API key & June endpoint:
Set these environment variables in your terminal before running the agent.
terminal
export REALTIME_API_KEY="your-api-key"
export REALTIME_WS_URL="wss://koe-labs--june-backend-server-run-server.modal.run/api/v1/real-time/connect"
Installation
You can use either the Python or JavaScript Agents SDK.
Python (Agents SDK)
Installation
pip install openai-agents
This provides RealtimeAgent and RealtimeRunner for building voice agents over the Realtime API.
JavaScript/TypeScript (Agents SDK)
Installation
npm install @openai/agents
This provides RealtimeAgent and RealtimeSession for browser and server use with the Realtime API.
Creating a basic realtime agent (Python)
The minimal structure to create a Realtime Agent and run a session:
realtime_agent.py
import asyncio
import os

from agents.realtime import RealtimeAgent, RealtimeRunner


async def main():
    # 1. Define the agent
    agent = RealtimeAgent(
        name="Assistant",
        instructions=(
            "You are a helpful voice assistant. "
            "Keep responses brief and conversational."
        ),
    )

    # 2. Configure the realtime runner
    runner = RealtimeRunner(starting_agent=agent)

    # Point the runner at the June endpoint and authenticate with the API key
    config = {
        "url": os.environ.get("REALTIME_WS_URL"),
        "headers": {"custom-auth": os.environ.get("REALTIME_API_KEY")},
        # NOTE: a playback tracker (self.playback_tracker) is typically part
        # of a class; None is fine for a standalone script.
        "playback_tracker": None,
        "initial_model_settings": {
            "model_name": "june-realtime",
            "voice": "sofia",
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "modalities": ["audio"],
            "turn_detection": {
                "type": "semantic_vad",
                "interrupt_response": True,
                "create_response": True,
            },
        },
    }

    # 3. Start the realtime session
    session = await runner.run(model_config=config)

    async with session:
        print("Session started, streaming audio in realtime.")
        async for event in session:
            # Handle key lifecycle events
            if event.type == "agent_start":
                print(f"Agent started: {event.agent.name}")
            elif event.type == "agent_end":
                print(f"Agent ended: {event.agent.name}")
            elif event.type == "audio":
                # Enqueue audio for playback in your audio pipeline
                pass
            elif event.type == "audio_end":
                print("Audio ended")
            elif event.type == "audio_interrupted":
                print("Audio interrupted")
            elif event.type == "tool_start":
                print(f"Tool started: {event.tool.name}")
            elif event.type == "tool_end":
                print(f"Tool ended: {event.tool.name}")
            elif event.type == "error":
                print(f"Error: {event.error}")


if __name__ == "__main__":
    asyncio.run(main())
You are responsible for wiring microphone capture and audio playback to the session’s audio events.
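As a starting point, here is a minimal microphone-capture sketch. It assumes the session exposes a send_audio(bytes) method for pushing input audio, 24 kHz mono pcm16, and the third-party sounddevice package (pip install sounddevice); adapt it to your SDK version and audio stack.
import asyncio

import sounddevice as sd  # third-party: pip install sounddevice

SAMPLE_RATE = 24000  # assumed pcm16 mono rate; match your model settings


async def stream_microphone(session) -> None:
    """Capture raw pcm16 chunks from the default mic and forward them to the session."""
    loop = asyncio.get_running_loop()
    chunks: asyncio.Queue[bytes] = asyncio.Queue()

    def on_audio(indata, frames, time_info, status):
        # Runs on sounddevice's audio thread; hand the bytes to the event loop.
        loop.call_soon_threadsafe(chunks.put_nowait, bytes(indata))

    with sd.RawInputStream(
        samplerate=SAMPLE_RATE, channels=1, dtype="int16", callback=on_audio
    ):
        while True:
            chunk = await chunks.get()
            await session.send_audio(chunk)  # assumed SDK method for input audio
You would typically start this with asyncio.create_task(stream_microphone(session)) inside the async with session: block, alongside the event loop shown above.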
Core configuration options
Key config fields commonly used in an openai-realtime-agent-style setup (a telephony-oriented variation is sketched after this list):
- Model:
  - model_name: e.g. june-realtime for the GA realtime model.
  - voice: a named voice such as miles, maya, etc.
  - modalities: ["audio"] or ["text"] depending on your use case.
- Audio:
  - input_audio_format: typically pcm16 or a telephony codec such as g711_ulaw.
  - output_audio_format: usually matches your playback pipeline (for example pcm16).
  - input_audio_transcription.model: the model used for ASR, such as june-realtime.
- Turn detection:
  - type: server_vad or semantic_vad for automatic turn-taking.
  - interrupt_response: whether new user speech can interrupt model speech.
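For instance, a telephony-oriented setup might swap the audio codec and VAD mode. This is a hypothetical variation that assumes the same settings shape as the Python example above:
telephony_model_settings = {
    "model_name": "june-realtime",
    "voice": "sofia",
    "input_audio_format": "g711_ulaw",   # common 8 kHz telephony codec
    "output_audio_format": "g711_ulaw",
    "modalities": ["audio"],
    "input_audio_transcription": {"model": "june-realtime"},
    "turn_detection": {
        "type": "server_vad",            # energy-based VAD instead of semantic_vad
        "interrupt_response": True,
        "create_response": True,
    },
}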
Running locally
Typical workflow for local development:
- Start your application (Python or JS) that creates the Realtime Agent and session (see the command after this list).
- Open your browser or client app, grant microphone permissions, and connect to the session via WebRTC or WebSocket.
- Speak into the mic; the agent streams responses as audio with low latency.
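For the Python example above, the first step is simply running the script (with the environment variables from the Prerequisites section exported):
terminal
python realtime_agent.py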
More in-depth examples
Additional, more specific examples can be found in the examples repo.
Extending the agent
Once the basic agent works, you can layer on:
- Tools (functions) for calling external APIs or services (see the sketch after this list).
- Handoffs between multiple agents (e.g., routing to a “supervisor” agent).
- Guardrails such as content filters and safety checks.
- Telephony/SIP or RTC integration for phone calls or multi-party sessions.
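For example, a function tool might be attached like this. This is a minimal sketch assuming the SDK's function_tool decorator and a tools parameter on RealtimeAgent; get_weather is a hypothetical helper, not a real API:
from agents import function_tool
from agents.realtime import RealtimeAgent


@function_tool
def get_weather(city: str) -> str:
    """Return a short weather summary for the given city."""
    # Hypothetical stub; call a real weather API or service here.
    return f"The weather in {city} is sunny."


agent = RealtimeAgent(
    name="Assistant",
    instructions="You are a helpful voice assistant.",
    tools=[get_weather],
)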
Refer to the official Realtime API and Agents SDK docs for up-to-date details and advanced patterns.