Customize your Voice: configuration via UI available now!

DATE: May 20, 2026

AUTHOR: Zowie Team

You can now configure how your AI agent speaks and listens directly from the Zowie platform. This gives CX and operations teams precise control over the voice experience without involving engineering.

What's new

Voice selection

Choose the voice your AI agent uses on every call. Voices are available from two providers: Google and ElevenLabs. Filter by provider or gender, or search by name. Preview any voice before saving: playback uses a live call to the text-to-speech service, so what you hear is what your customers get.

Speech output controls

Tune how the agent sounds in conversation.

Speed (0.25x to 2x for Google, 0.7x to 1.2x for ElevenLabs) — slow the agent down for complex topics, speed it up for simple confirmations.
Volume (0.5x to 1.5x) — scale output amplitude on top of the base voice. Note: very high values may introduce audible crackling; adjust carefully.
Stability (ElevenLabs only) — controls how consistent the voice sounds across utterances. Higher values keep the voice close to its baseline tone; lower values introduce more expressive variation.
Similarity boost (ElevenLabs only) — controls how closely the generated speech resembles the source voice model. Higher values are more faithful to the model; lower values give it more freedom.
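
As an illustration of how the provider-specific limits above interact, here is a minimal Python sketch that checks a set of speech output values against the ranges listed in this note. The function name, parameter names, and provider keys are hypothetical and are not part of any Zowie API; only the numeric ranges come from the list above.

```python
# Ranges copied from the release note; every name here is illustrative only.
SPEED_RANGES = {"google": (0.25, 2.0), "elevenlabs": (0.7, 1.2)}
VOLUME_RANGE = (0.5, 1.5)


def validate_speech_output(provider, speed, volume,
                           stability=None, similarity_boost=None):
    """Return a list of human-readable problems; an empty list means valid."""
    if provider not in SPEED_RANGES:
        return [f"unknown provider: {provider}"]

    errors = []

    # Speed limits depend on the provider.
    low, high = SPEED_RANGES[provider]
    if not low <= speed <= high:
        errors.append(f"speed {speed} outside {low}-{high} for {provider}")

    # Volume limits are the same for both providers.
    low, high = VOLUME_RANGE
    if not low <= volume <= high:
        errors.append(f"volume {volume} outside {low}-{high}")

    # Stability and similarity boost apply to ElevenLabs voices only.
    if provider != "elevenlabs":
        for name, value in (("stability", stability),
                            ("similarity_boost", similarity_boost)):
            if value is not None:
                errors.append(f"{name} is only supported for ElevenLabs")

    return errors


print(validate_speech_output("elevenlabs", speed=2.0, volume=1.0))
```

A speed of 2x is valid for a Google voice but is rejected here for ElevenLabs, which is worth keeping in mind when switching a configured agent between providers.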

Custom vocabulary

Define how your AI agent pronounces specific words — brand names, product names, acronyms, technical terms. Add a phrase and a pronunciation spelling; the agent uses your version instead of the default. Supported for Google voices. ElevenLabs does not currently support custom vocabulary.

Speech recognition controls

Tune how the agent understands what callers say.

Priority phrases — list words and phrases the recognition model should weight more heavily. Useful for company names, product names, or proper nouns your callers are likely to say.
Post-process replacements — define find-and-replace rules applied to transcribed speech before it reaches the agent's reasoning layer. Use this to fix persistent misrecognitions, normalize formatting, or redact specific terms.
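
To make the idea of post-process replacements concrete, the sketch below applies an ordered list of find-and-replace rules to a transcript. It is an assumption-laden illustration, not the platform's implementation: the function name, the rule format, and the choice of case-insensitive whole-word matching are all hypothetical.

```python
import re


def apply_replacements(transcript, rules):
    """Apply (find, replace) rules in order to a transcribed utterance.

    Assumptions for this sketch: matching is case-insensitive and limited
    to whole words, and each "find" term starts and ends with a letter or
    digit so that the word-boundary anchors behave as expected.
    """
    for find, replace in rules:
        pattern = re.compile(r"\b" + re.escape(find) + r"\b", re.IGNORECASE)
        # A function replacement inserts the text literally, so backslashes
        # in the replacement are not treated as group references.
        transcript = pattern.sub(lambda _match, text=replace: text, transcript)
    return transcript


rules = [
    ("zoey", "Zowie"),        # fix a persistent misrecognition
    ("pro plan", "Pro Plan"),  # normalize formatting
]
print(apply_replacements("I want the pro plan from Zoey", rules))
```

Because rules run in order, a later rule sees the output of earlier ones, so more specific rules are usually best placed first.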

Session settings

Additional controls for call behavior.

Background noise — add ambient audio (five call center options, one office setting) to the agent's outgoing audio. Makes the agent sound more natural to callers who expect call center ambience. Volume is configurable. Does not affect what the agent hears from the caller.
Beep — plays a short tone after the caller finishes speaking, signaling that input was received and the agent is processing. Similar to a voicemail prompt beep.
DTMF (touchtone input) — allows callers to navigate by pressing phone keys instead of speaking. Active only during data collection steps; keypad input is ignored outside those steps. Runs in parallel with speech recognition — enabling it does not disable voice input.
Call recording — controls whether calls are recorded and stored. Recordings are retained for 30 days. Requires additional permissions to enable. Enabling triggers a confirmation acknowledging that local regulations typically require informing callers before recording begins.
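
The DTMF rule in the list above (keypad input counts only during data collection steps, while speech stays active throughout) can be modeled with a small state check. The class and method names below are hypothetical and exist only to illustrate the described behavior.

```python
from dataclasses import dataclass, field


@dataclass
class CallSession:
    """Toy model of a call; not the platform's actual session object."""
    dtmf_enabled: bool = True
    in_data_collection: bool = False
    inputs: list = field(default_factory=list)

    def on_keypress(self, digit: str) -> bool:
        # Keypad input is ignored unless DTMF is enabled AND the call is
        # currently in a data collection step.
        if not (self.dtmf_enabled and self.in_data_collection):
            return False
        self.inputs.append(("dtmf", digit))
        return True

    def on_speech(self, text: str) -> bool:
        # Speech recognition runs in parallel, so enabling DTMF never
        # blocks voice input.
        self.inputs.append(("speech", text))
        return True


session = CallSession()
session.on_keypress("5")            # ignored: not in a data collection step
session.in_data_collection = True
session.on_keypress("5")            # accepted
session.on_speech("five")           # also accepted
print(session.inputs)
```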

Detailed documentation is available at the link below.

Links

Voice Configuration docs