Voice SDK overview
The Voice SDK is a Python library that builds on our Realtime API, adding features optimized for conversational AI:
- Intelligent segmentation: groups words into meaningful speech segments per speaker.
- Turn detection: automatically detects when speakers finish talking.
- Speaker management: focus on or ignore specific speakers in multi-speaker scenarios.
- Preset configurations: offers ready-to-use settings for conversations, note-taking, and captions.
- Simplified event handling: delivers clean, structured segments instead of raw word-level events.
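To make the "simplified event handling" point concrete, here is a minimal, self-contained sketch of how raw word-level events can be grouped into per-speaker segments. This is an illustration only, not the SDK's implementation; the event and segment field names are assumptions:

```python
def group_into_segments(word_events):
    """Collapse word-level events into per-speaker segments.

    Consecutive words from the same speaker are merged into one segment;
    a speaker change starts a new segment.
    """
    segments = []
    for event in word_events:
        if segments and segments[-1]["speaker_id"] == event["speaker_id"]:
            segments[-1]["text"] += " " + event["word"]
        else:
            segments.append({"speaker_id": event["speaker_id"], "text": event["word"]})
    return segments


words = [
    {"speaker_id": "S1", "word": "Hello"},
    {"speaker_id": "S1", "word": "there"},
    {"speaker_id": "S2", "word": "Hi"},
]
segments = group_into_segments(words)
# segments → [{"speaker_id": "S1", "text": "Hello there"},
#             {"speaker_id": "S2", "text": "Hi"}]
```

The Voice SDK delivers segments like these directly, so you never have to write this bookkeeping yourself.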
Voice SDK vs Realtime SDK
Use the Voice SDK when:
- Building conversational AI or voice agents
- You need automatic turn detection
- You want speaker-focused transcription
- You need ready-to-use presets for common scenarios
Use the Realtime SDK when:
- You need the raw stream of word-by-word transcription data
- Building custom segmentation logic
- You want fine-grained control over every event
- Processing audio files or custom workflows
Getting started
1. Create an API key
Create a Speechmatics API key in the portal to access the Voice SDK. Store your key securely as a managed secret.
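For example, a small helper (an illustrative pattern, not part of the SDK) that reads the key from an environment variable and fails fast if it is missing:

```python
import os


def get_api_key(env_var: str = "SPEECHMATICS_API_KEY") -> str:
    """Return the API key from the environment, failing fast if it is missing."""
    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} before running the Voice SDK examples.")
    return key
```

Reading the key from the environment keeps it out of source control; in production, inject it from your secret manager at runtime.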
2. Install dependencies
```bash
# Standard installation
pip install speechmatics-voice

# With SMART_TURN (ML-based turn detection)
pip install speechmatics-voice[smart]
```
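After installing, you can sanity-check that the package is importable using only the standard library (the top-level package name `speechmatics` matches the imports used in the quickstart below):

```python
import importlib.util


def sdk_installed(package: str) -> bool:
    """Return True if `package` can be imported in this environment."""
    return importlib.util.find_spec(package) is not None


if not sdk_installed("speechmatics"):
    print("speechmatics-voice is not installed; run: pip install speechmatics-voice")
```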
3. Quickstart
Here's how to stream microphone audio to the Voice Agent and print finalised segments of speech with speaker IDs:
```python
import asyncio
import os

from speechmatics.rt import Microphone
from speechmatics.voice import VoiceAgentClient, AgentServerMessageType


async def main():
    """Stream microphone audio to the Speechmatics Voice Agent using the 'scribe' preset."""
    # Audio configuration
    SAMPLE_RATE = 16000  # Hz
    CHUNK_SIZE = 160  # Samples per read
    PRESET = "scribe"  # Configuration preset

    # Create client with preset
    client = VoiceAgentClient(
        api_key=os.getenv("SPEECHMATICS_API_KEY"),
        preset=PRESET,
    )

    # Print finalised segments of speech with speaker ID
    @client.on(AgentServerMessageType.ADD_SEGMENT)
    def on_segment(message):
        for segment in message["segments"]:
            speaker = segment["speaker_id"]
            text = segment["text"]
            print(f"{speaker}: {text}")

    # Set up microphone
    mic = Microphone(SAMPLE_RATE, CHUNK_SIZE)
    if not mic.start():
        print("Error: Microphone not available")
        return

    # Connect to the Voice Agent
    await client.connect()

    # Stream microphone audio (interrupt with Ctrl+C)
    try:
        while True:
            audio_chunk = await mic.read(CHUNK_SIZE)
            if not audio_chunk:
                break  # Microphone stopped producing data
            await client.send_audio(audio_chunk)
    except KeyboardInterrupt:
        pass
    finally:
        await client.disconnect()


if __name__ == "__main__":
    asyncio.run(main())
```
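The `on_segment` handler above expects each message to carry a list of segments. An illustrative payload (the field names mirror the handler above; the real message may include additional fields) can be processed like this:

```python
# Illustrative ADD_SEGMENT payload; field names follow the quickstart handler.
message = {
    "segments": [
        {"speaker_id": "S1", "text": "Hello there."},
        {"speaker_id": "S2", "text": "Hi, how are you?"},
    ]
}

# Format each segment as "<speaker>: <text>", as the handler prints it.
lines = [f'{seg["speaker_id"]}: {seg["text"]}' for seg in message["segments"]]
# lines → ["S1: Hello there.", "S2: Hi, how are you?"]
```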
Presets - the simplest way to get started
These are purpose-built, optimized configurations, ready for use without further modification:
- `fast` - low latency, fast responses
- `adaptive` - general conversation
- `smart_turn` - complex conversation
- `external` - user handles end of turn
- `scribe` - note-taking
- `captions` - live captioning
To view all available presets:
```python
from speechmatics.voice import VoiceAgentConfigPreset

presets = VoiceAgentConfigPreset.list_presets()
```
4. Custom configurations
For more control, you can specify a custom configuration directly, or use a preset as a starting point and customise it with overlays.
Specify configurations in a VoiceAgentConfig object:
```python
import os

from speechmatics.voice import VoiceAgentClient, VoiceAgentConfig, EndOfUtteranceMode

config = VoiceAgentConfig(
    language="en",
    enable_diarization=True,
    max_delay=0.7,
    end_of_utterance_mode=EndOfUtteranceMode.ADAPTIVE,
)

client = VoiceAgentClient(api_key=os.getenv("SPEECHMATICS_API_KEY"), config=config)
```
Use presets as a starting point and customise with overlays:
```python
from speechmatics.voice import VoiceAgentConfigPreset, VoiceAgentConfig

# Use preset with custom overrides
config = VoiceAgentConfigPreset.SCRIBE(
    VoiceAgentConfig(
        language="es",
        max_delay=0.8,
    )
)
```
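Conceptually, an overlay is defaults plus selective overrides: the preset supplies every setting, and your config replaces only the fields you set. A minimal sketch of that merge semantics (an analogy using a hypothetical two-field config, not the SDK's internals):

```python
from dataclasses import dataclass, replace


@dataclass
class SketchConfig:
    """Hypothetical config; the real VoiceAgentConfig has many more fields."""
    language: str = "en"
    max_delay: float = 1.0


def apply_overlay(preset: SketchConfig, **overrides) -> SketchConfig:
    """Return a copy of the preset with only the given fields overridden."""
    return replace(preset, **overrides)


merged = apply_overlay(SketchConfig(), language="es", max_delay=0.8)
# merged.language → "es"; merged.max_delay → 0.8; all other fields keep preset values
```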
Note: If no configuration or preset is provided, the client defaults to the `external` preset.
FAQ
Support
Where can I provide feedback or get help?
You can submit feedback, bug reports, or feature requests through the Speechmatics GitHub discussions.
Next steps
- For more information, see the Voice SDK on GitHub.
- For working examples, integrations and templates, check out the Speechmatics Academy.
- Share and discuss your project with our team or join our developer community on Reddit to connect with other builders in voice AI.