June Realtime Agent

This document explains how to set up and run a basic June Realtime Agent for low-latency, voice-first interactions using the June Realtime API and Agents SDK.

Prerequisites

  • June account with an API key.
  • Python 3.9+ or Node.js (depending on which SDK you use).
  • Basic familiarity with WebSocket/WebRTC or a framework that wraps them.

Export your API key & June endpoint:

Set these environment variables in your terminal before running the agent.

terminal


export REALTIME_API_KEY="your-api-key"
export REALTIME_WS_URL="wss://koe-labs--june-backend-server-run-server.modal.run/api/v1/real-time/connect"
  

Installation

You can use either the Python or JavaScript Agents SDK.

Python (Agents SDK)

Installation


pip install openai-agents
  

This provides RealtimeAgent and RealtimeRunner for building voice agents over the Realtime API.
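
To confirm the install, you can import the two classes directly from a shell; this uses the same agents.realtime module as the example below:

terminal


python -c "from agents.realtime import RealtimeAgent, RealtimeRunner; print('agents SDK OK')"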

JavaScript/TypeScript (Agents SDK)

Installation


npm install @openai/agents
  

This provides RealtimeAgent and RealtimeSession for browser and server use with the Realtime API.

Creating a basic realtime agent (Python)

The minimal structure to create a Realtime Agent and run a session:

realtime_agent.py


import asyncio
import os

from agents.realtime import RealtimeAgent, RealtimeRunner

async def main():
  # 1. Define the agent
  agent = RealtimeAgent(
      name="Assistant",
      instructions=(
          "You are a helpful voice assistant. "
          "Keep responses brief and conversational."
      ),
  )

  # 2. Configure the realtime runner
  runner = RealtimeRunner(starting_agent=agent)

  # Point the session at the June endpoint and authenticate with your API key
  config = {
      "url": os.environ.get("REALTIME_WS_URL"),
      "headers": {"custom-auth": os.environ.get("REALTIME_API_KEY")},
      # A playback tracker is optional; None is fine for a standalone script
      "playback_tracker": None,
      "initial_model_settings": {
          "model_name": "june-realtime",
          "voice": "sofia",
          "input_audio_format": "pcm16",
          "output_audio_format": "pcm16",
          "modalities": ["audio"],
          "turn_detection": {
              "type": "semantic_vad",
              "interrupt_response": True,
              "create_response": True,
          },
      },
  }

  # 3. Start the realtime session
  session = await runner.run(model_config=config)
  async with session:
      print("Session started, streaming audio in realtime.")
      async for event in session:
          # Handle key lifecycle events
          if event.type == "agent_start":
              print(f"Agent started: {event.agent.name}")
          elif event.type == "agent_end":
              print(f"Agent ended: {event.agent.name}")
          elif event.type == "audio":
              # Enqueue audio for playback in your audio pipeline
              pass
          elif event.type == "audio_end":
              print("Audio ended")
          elif event.type == "audio_interrupted":
              print("Audio interrupted")
          elif event.type == "tool_start":
              print(f"Tool started: {event.tool.name}")
          elif event.type == "tool_end":
              print(f"Tool ended: {event.tool.name}")
          elif event.type == "error":
              print(f"Error: {event.error}")

if __name__ == "__main__":
  asyncio.run(main())
  

You are responsible for wiring microphone capture and audio playback to the session’s audio events.
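
As a minimal sketch of the playback side of that wiring, the loop below writes the agent's audio to the default output device with the sounddevice library. It assumes the "audio" event exposes raw PCM bytes as event.audio.data and that the model streams 24 kHz mono pcm16; check the event payload and sample rate of the SDK and model version you run, and substitute your own audio pipeline as needed.

playback_sketch.py


import sounddevice as sd  # pip install sounddevice

async def play_agent_audio(session) -> None:
    # Assumed output format: 24 kHz, mono, 16-bit PCM (pcm16); adjust to your config
    stream = sd.RawOutputStream(samplerate=24000, channels=1, dtype="int16")
    stream.start()
    try:
        async for event in session:
            if event.type == "audio":
                # Assumption: raw PCM bytes are exposed as event.audio.data.
                # Blocking write of one chunk; fine for a simple sketch.
                stream.write(event.audio.data)
            elif event.type == "audio_interrupted":
                # User barged in: discard queued audio, then resume playback
                stream.abort()
                stream.start()
    finally:
        stream.stop()
        stream.close()

# Inside main() from realtime_agent.py, you would replace the event loop with:
#   async with session:
#       await play_agent_audio(session)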

Core configuration options

Key config fields commonly used in a realtime agent setup built on the Agents SDK:

  • Model:
    • model_name: e.g. june-realtime for the GA realtime model.
    • voice: a named voice such as sofia, miles, or maya.
    • modalities: ["audio"] or ["text"] depending on your use case.
  • Audio:
    • input_audio_format: typically pcm16 or a telephony codec such as g711_ulaw.
    • output_audio_format: usually matches your playback pipeline (for example pcm16).
    • input_audio_transcription.model: model used for ASR, such as june-realtime.
  • Turn detection:
    • type: server_vad or semantic_vad for automatic turn-taking (a server_vad variant is sketched after this list).
    • interrupt_response: whether new user speech can interrupt model speech.
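
For example, a telephony-oriented variant of the initial_model_settings shown earlier might pair g711_ulaw audio with server_vad turn detection. The dict below is a hypothetical sketch that reuses the field names from realtime_agent.py, not an exhaustive reference.

telephony_settings.py


# Hypothetical variant of the settings used in realtime_agent.py
telephony_model_settings = {
    "model_name": "june-realtime",
    "voice": "sofia",
    # Telephony audio is commonly 8 kHz mu-law rather than pcm16
    "input_audio_format": "g711_ulaw",
    "output_audio_format": "g711_ulaw",
    "modalities": ["audio"],
    "turn_detection": {
        # server_vad relies on silence detection rather than semantic end-of-turn cues
        "type": "server_vad",
        "interrupt_response": True,
        "create_response": True,
    },
}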

Running locally

Typical workflow for local development:

  1. Start your application (Python or JS) that creates the Realtime Agent and session.
  2. Open your browser or client app, grant microphone permissions, and connect to the session via WebRTC or WebSocket (a quick connectivity check is sketched after this list).
  3. Speak into the mic; the agent streams responses as audio with low latency.
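
Before wiring up a full client, it can help to confirm the realtime endpoint is reachable from your machine. The sketch below opens and closes a WebSocket connection with the third-party websockets package; the custom-auth header mirrors the one used in realtime_agent.py and is an assumption about what the endpoint expects, not a documented contract.

ws_check.py


import asyncio
import os

import websockets  # pip install websockets

async def check_connection() -> None:
    url = os.environ["REALTIME_WS_URL"]
    headers = {"custom-auth": os.environ["REALTIME_API_KEY"]}
    # On websockets < 14 the keyword argument is extra_headers instead
    async with websockets.connect(url, additional_headers=headers):
        print(f"Connected to {url}")

if __name__ == "__main__":
    asyncio.run(check_connection())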

More in-depth examples

More complete, end-to-end examples can be found in the examples repo.

Extending the agent

Once the basic agent works, you can layer on:

  • Tools (functions) for calling external APIs or services (a minimal example follows this list).
  • Handoffs between multiple agents (e.g., routing to a “supervisor” agent).
  • Guardrails such as content filters and safety checks.
  • Telephony/SIP or RTC integration for phone calls or multi-party sessions.
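
As a small illustration of the first point, the sketch below attaches a function tool to the agent with the Agents SDK's function_tool decorator; the get_weather function and its canned response are placeholders rather than a real integration. Once registered, tool calls surface as the tool_start/tool_end events handled in the session loop above.

tools_sketch.py


from agents import function_tool
from agents.realtime import RealtimeAgent

@function_tool
def get_weather(city: str) -> str:
    """Return a short weather summary for a city (placeholder implementation)."""
    return f"It is sunny in {city}."

agent = RealtimeAgent(
    name="Assistant",
    instructions=(
        "You are a helpful voice assistant. "
        "Use the get_weather tool when the user asks about the weather."
    ),
    tools=[get_weather],
)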

Refer to the official Realtime API and Agents SDK docs for up-to-date details and advanced patterns.