
Voice SDK overview

Learn how to build voice-enabled applications with the Speechmatics Voice SDK

The Voice SDK is a Python SDK that builds on our Realtime API, adding features optimized for conversational AI:

  • Intelligent segmentation: groups words into meaningful speech segments per speaker.
  • Turn detection: automatically detects when speakers finish talking.
  • Speaker management: focus on or ignore specific speakers in multi-speaker scenarios.
  • Preset configurations: offers ready-to-use settings for conversations, note-taking, and captions.
  • Simplified event handling: delivers clean, structured segments instead of raw word-level events.
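To make the difference between raw word-level events and segments concrete, here is a toy sketch that merges consecutive words from the same speaker into one segment. It is not the SDK's implementation, which also has to deal with timing, punctuation and turn boundaries. The `Word` shape and the `group_words_by_speaker` helper are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word-level result: who spoke, what was said, and when (seconds)
    speaker_id: str
    text: str
    start: float
    end: float


def group_words_by_speaker(words: list[Word]) -> list[dict]:
    """Merge consecutive words from the same speaker into one segment (toy example)."""
    segments: list[dict] = []
    for word in words:
        if segments and segments[-1]["speaker_id"] == word.speaker_id:
            # Same speaker as the previous word: extend the current segment
            segments[-1]["text"] += " " + word.text
            segments[-1]["end"] = word.end
        else:
            # First word, or the speaker changed: start a new segment
            segments.append(
                {"speaker_id": word.speaker_id, "text": word.text,
                 "start": word.start, "end": word.end}
            )
    return segments


if __name__ == "__main__":
    words = [
        Word("S1", "Hello", 0.0, 0.4),
        Word("S1", "there", 0.4, 0.8),
        Word("S2", "Hi", 1.2, 1.4),
    ]
    for seg in group_words_by_speaker(words):
        print(f"{seg['speaker_id']}: {seg['text']}")
```

With the Voice SDK you receive finished segments instead of writing this kind of bookkeeping yourself.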

Voice SDK vs Realtime SDK

Use the Voice SDK when:

  • Building conversational AI or voice agents
  • You need automatic turn detection
  • You want speaker-focused transcription
  • You need ready-to-use presets for common scenarios

Use the Realtime SDK when:

  • You need the raw stream of word-by-word transcription data
  • Building custom segmentation logic
  • You want fine-grained control over every event
  • Processing audio files or custom workflows
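If you take the Realtime SDK route, turn detection is logic you write yourself. The simplest possible rule is a fixed silence timeout: declare the turn over once no new word has arrived for some threshold. The sketch below shows only that naive rule. The `SilenceTurnDetector` class and its 0.8 s default are hypothetical, and the Voice SDK's adaptive and SMART_TURN (ML-based) modes are designed to do more than this.

```python
from typing import Optional


class SilenceTurnDetector:
    """Naive end-of-turn detection: the turn ends after `silence_s` seconds
    without a new word. All times are seconds on the audio timeline."""

    def __init__(self, silence_s: float = 0.8) -> None:
        self.silence_s = silence_s  # hypothetical default threshold
        self.last_word_end: Optional[float] = None

    def on_word(self, word_end: float) -> None:
        # Record when the most recent word finished
        self.last_word_end = word_end

    def is_end_of_turn(self, now: float) -> bool:
        # No speech yet means there is no turn to end
        if self.last_word_end is None:
            return False
        return now - self.last_word_end >= self.silence_s

    def reset(self) -> None:
        # Call after handling a completed turn
        self.last_word_end = None
```

A fixed timeout is easy to reason about but tends to cut off speakers who pause mid-sentence, which is the problem smarter turn detection aims to address.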

Getting started

1. Create an API key

Create a Speechmatics API key in the portal to access the Voice SDK. Store your key securely as a managed secret.
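The code examples on this page read the key from an environment variable rather than hard-coding it. A small helper that fails fast with a clear message when the variable is missing can save you from a confusing authentication error later. `require_api_key` is a hypothetical helper, and `SPEECHMATICS_API_KEY` is simply the variable name the quickstart uses.

```python
import os


def require_api_key(env_var: str = "SPEECHMATICS_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Create a key in the Speechmatics portal "
            f"and export it, e.g. export {env_var}=..."
        )
    return key
```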

2. Install dependencies

# Standard installation
pip install speechmatics-voice

# With SMART_TURN (ML-based turn detection); quote the argument so shells
# such as zsh don't treat the brackets as a glob pattern
pip install "speechmatics-voice[smart]"
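To confirm which version ended up in your environment, you can query the installed distribution with the standard-library `importlib.metadata` module. This sketch assumes the distribution name matches the `pip install` name, `speechmatics-voice`; it does not check whether the `smart` extra was installed. `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version("speechmatics-voice")
    print(f"speechmatics-voice: {v or 'not installed'}")
```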

3. Quickstart

Here's how to stream microphone audio to the Voice Agent and transcribe finalised segments of speech, with speaker ID:

import asyncio
import os
from speechmatics.rt import Microphone
from speechmatics.voice import VoiceAgentClient, AgentServerMessageType

async def main():
    """Stream microphone audio to Speechmatics Voice Agent using 'scribe' preset"""

    # Audio configuration
    SAMPLE_RATE = 16000  # Hz
    CHUNK_SIZE = 160  # Samples per read
    PRESET = "scribe"  # Configuration preset

    # Create client with preset
    client = VoiceAgentClient(
        api_key=os.getenv("SPEECHMATICS_API_KEY"),
        preset=PRESET
    )

    # Print finalised segments of speech with speaker ID
    @client.on(AgentServerMessageType.ADD_SEGMENT)
    def on_segment(message):
        for segment in message["segments"]:
            speaker = segment["speaker_id"]
            text = segment["text"]
            print(f"{speaker}: {text}")

    # Set up microphone
    mic = Microphone(SAMPLE_RATE, CHUNK_SIZE)
    if not mic.start():
        print("Error: Microphone not available")
        return

    # Connect to the Voice Agent
    await client.connect()

    # Stream microphone audio (interruptible with Ctrl+C)
    try:
        while True:
            audio_chunk = await mic.read(CHUNK_SIZE)
            if not audio_chunk:
                break  # Microphone stopped producing data
            await client.send_audio(audio_chunk)
    except KeyboardInterrupt:
        pass
    finally:
        await client.disconnect()

if __name__ == "__main__":
    asyncio.run(main())
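To try the same pipeline without a microphone, you could read audio from a WAV file in the same 160-sample chunks (at 16,000 Hz, 160 samples is 10 ms of audio). This sketch assumes a mono, 16-bit PCM WAV file at 16 kHz, and assumes `client.send_audio` accepts raw PCM bytes the same way it accepts microphone chunks; check both against the SDK reference. The `wav_chunks` helper is hypothetical and uses only the standard-library `wave` module.

```python
import wave
from typing import Iterator


def wav_chunks(path: str, chunk_size: int = 160, sample_rate: int = 16000) -> Iterator[bytes]:
    """Yield raw PCM frames from a WAV file, `chunk_size` samples at a time.

    Assumes mono 16-bit PCM at `sample_rate`; raises ValueError otherwise.
    """
    with wave.open(path, "rb") as wav:
        fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        if fmt != (1, 2, sample_rate):
            raise ValueError(f"expected mono 16-bit PCM at {sample_rate} Hz, got {fmt}")
        while True:
            frames = wav.readframes(chunk_size)
            if not frames:
                break  # End of file
            yield frames
```

Inside the quickstart's `try` block, a loop such as `for audio_chunk in wav_chunks("sample.wav"): await client.send_audio(audio_chunk)` (with a hypothetical file name) would take the place of the microphone reads. You may also want to pace the sends to roughly real time.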

Presets: the simplest way to get started

These are purpose-built, optimized configurations, ready for use without further modification:

  • fast: low latency, fast responses
  • adaptive: general conversation
  • smart_turn: complex conversation
  • external: user handles end of turn
  • scribe: note-taking
  • captions: live captioning

To view all available presets:

from speechmatics.voice import VoiceAgentConfigPreset

presets = VoiceAgentConfigPreset.list_presets()
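Because presets are selected by name, a typo such as "scibe" is easy to make. One option is to validate the name before creating the client and suggest the closest match. This is a hypothetical helper: the hard-coded list below copies the presets named on this page and may go stale, so `list_presets()` remains the authoritative source. You could pass its result as `known`, assuming it returns preset names as strings.

```python
import difflib

# Preset names as listed on this page; list_presets() is the authoritative source
KNOWN_PRESETS = ("fast", "adaptive", "smart_turn", "external", "scribe", "captions")


def check_preset(name: str, known: tuple[str, ...] = KNOWN_PRESETS) -> str:
    """Return `name` if it is a known preset, else raise with a suggestion."""
    if name in known:
        return name
    close = difflib.get_close_matches(name, known, n=1)
    hint = f" Did you mean '{close[0]}'?" if close else ""
    raise ValueError(f"Unknown preset '{name}'.{hint}")
```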

4. Custom configurations

For more control, you can specify a custom configuration, or start from a preset and customise it with overlays.

Specify configurations in a VoiceAgentConfig object:

import os
from speechmatics.voice import VoiceAgentClient, VoiceAgentConfig, EndOfUtteranceMode

config = VoiceAgentConfig(
    language="en",
    enable_diarization=True,
    max_delay=0.7,
    end_of_utterance_mode=EndOfUtteranceMode.ADAPTIVE,
)

client = VoiceAgentClient(api_key=os.getenv("SPEECHMATICS_API_KEY"), config=config)
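Starting from a preset and customising it with overlays comes down to layering a partial set of settings over a base, with the overlay's values winning. The SDK's own overlay mechanism is not shown on this page, so the following is only a conceptual, SDK-independent sketch of that idea. `apply_overlay` and the base values are hypothetical and do not reflect the contents of any real preset; the keys simply mirror the `VoiceAgentConfig` arguments above.

```python
import copy
from typing import Any


def apply_overlay(base: dict[str, Any], overlay: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: `base` with `overlay` values taking precedence.

    Nested dicts are merged recursively; neither input is modified.
    """
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overlay(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base settings, for illustration only
base_settings = {"language": "en", "enable_diarization": False, "max_delay": 1.0}
overlay_settings = {"enable_diarization": True, "max_delay": 0.7}
print(apply_overlay(base_settings, overlay_settings))
```

Copying rather than mutating keeps the base reusable, so several overlays can be applied to the same starting point.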

Note: If no configuration or preset is provided, the client will default to the external preset.
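With the external preset, the end of a turn is up to your application. One way to picture this is a buffer that collects segment text as it arrives and only releases a complete turn when your code decides the speaker is done, for example when a push-to-talk button is released. `ExternalTurnBuffer` is a hypothetical, SDK-independent sketch; how the application signals end of turn to the SDK itself is not covered here.

```python
class ExternalTurnBuffer:
    """Collect segment text until the application declares the turn over."""

    def __init__(self) -> None:
        self._parts: list[str] = []

    def add_segment(self, text: str) -> None:
        # Called for each finalised segment, e.g. from an ADD_SEGMENT handler;
        # whitespace-only text is ignored
        if text.strip():
            self._parts.append(text.strip())

    def end_turn(self) -> str:
        # Called by the application (e.g. push-to-talk released);
        # returns the whole turn and empties the buffer
        turn = " ".join(self._parts)
        self._parts.clear()
        return turn
```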

FAQ

Support

Where can I provide feedback or get help?

You can submit feedback, bug reports, or feature requests through the Speechmatics GitHub discussions.

Next steps