Voice and audio

Voice replies

Voice is optional and can be enabled per verb. Start with text only, then add voice once the personality feels right. If you want voice on every eligible reply, set voice response frequency to 100. Lower values make voice replies intermittent on purpose.
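As a rough illustration of how a frequency setting can make voice replies intermittent, here is a minimal sketch. It assumes the value is a percentage and that each eligible reply is an independent random draw; the function name `should_send_voice` is hypothetical, and Verba's actual selection logic is not documented here.

```python
import random


def should_send_voice(frequency: int, rng: random.Random) -> bool:
    """Decide whether one eligible reply is sent as voice.

    `frequency` stands in for the voice response frequency setting (0 to 100).
    Assumption: every eligible reply is an independent draw.
    """
    if not 0 <= frequency <= 100:
        raise ValueError("frequency must be between 0 and 100")
    # rng.random() is in [0.0, 1.0), so 100 always passes and 0 never does.
    return rng.random() < frequency / 100


rng = random.Random(7)  # seeded only so repeated runs match
for setting in (100, 50, 10):
    voiced = sum(should_send_voice(setting, rng) for _ in range(1000))
    print(f"frequency={setting}: {voiced} of 1000 eligible replies used voice")
```

Under this assumption, 100 sends voice on every eligible reply, while lower values leave some replies as text.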

Voice attachment visual

The Voice Engine settings also include a Voice Attachment Visual section. You can upload a custom still image for voice message videos instead of using the default Verba image.

The live preview uses the same 20:7 crop as the final voice attachment
The generated video output is framed to 400x140
You can replace or remove the image at any time
If no custom image is uploaded, Verba falls back to the default voice banner

For best results, use a wide banner-style image with the subject centered, because tall portraits will be cropped heavily in the final attachment.
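To see how much of an image survives the 20:7 crop before you upload it, you can work out the crop box yourself. This sketch assumes the crop is centered, which the page does not state explicitly; `center_crop_box` is an illustrative helper, not part of Verba.

```python
def center_crop_box(width: int, height: int, ratio_w: int = 20, ratio_h: int = 7):
    """Return (left, top, right, bottom) of the largest centered 20:7 crop."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * ratio_h > height * ratio_w:
        # Image is wider than 20:7: keep the full height and trim the sides.
        crop_h = height
        crop_w = height * ratio_w // ratio_h
    else:
        # Image is 20:7 or taller: keep the full width and trim top and bottom.
        crop_w = width
        crop_h = width * ratio_h // ratio_w
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


# A 1080x1920 portrait keeps only a 1080x378 strip from the middle.
print(center_crop_box(1080, 1920))  # (0, 771, 1080, 1149)
# An image that is already 400x140 is kept whole.
print(center_crop_box(400, 140))    # (0, 0, 400, 140)
```

The portrait example keeps roughly a fifth of the image height, which is why banner-style images work better.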

Voice cloning

Upload a short, clean sample to create a custom voice. The best results come from 6 to 12 seconds of clear speech with minimal background noise. Reference text is optional, but if you can, paste the exact transcript of the sample. A matching transcript improves similarity and keeps the voice more stable across replies.
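If you want to check a sample's length before uploading, a small script can do it. This sketch assumes the sample is an uncompressed WAV file and uses Python's standard `wave` module. The 6 to 12 second range is the recommendation above, not a hard limit, and the helper names are illustrative.

```python
import io
import wave

MIN_SECONDS = 6.0   # recommended lower bound from this page
MAX_SECONDS = 12.0  # recommended upper bound from this page


def wav_duration_seconds(source) -> float:
    """Return the length of a WAV file (path or binary file object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_sample_length(seconds: float) -> str:
    """Classify a duration against the recommended range."""
    if seconds < MIN_SECONDS:
        return "too short"
    if seconds > MAX_SECONDS:
        return "too long"
    return "ok"


def make_silent_wav(seconds: float, rate: int = 8000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, used only for the demo."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer


duration = wav_duration_seconds(make_silent_wav(8))
print(duration, check_sample_length(duration))
```

For a real sample, pass the file path to `wav_duration_seconds` instead of the in-memory demo file.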

Supported languages

The voice engine supports a focused language set:

Auto (recommended default)
English
Chinese
Japanese
Korean
German
French
Russian
Portuguese
Spanish
Italian

If you pick a language outside this list, the engine falls back to Auto.
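The fallback rule can be pictured as a simple lookup. This is a sketch only: the case-insensitive matching and the `resolve_language` name are assumptions, and the engine's real matching rules may differ.

```python
SUPPORTED_LANGUAGES = (
    "Auto", "English", "Chinese", "Japanese", "Korean", "German",
    "French", "Russian", "Portuguese", "Spanish", "Italian",
)


def resolve_language(requested: str) -> str:
    """Return the supported language matching `requested`, or "Auto" otherwise."""
    normalized = requested.strip().casefold()
    for language in SUPPORTED_LANGUAGES:
        if language.casefold() == normalized:
            return language
    # Anything outside the supported list falls back to Auto.
    return "Auto"


print(resolve_language("japanese"))  # Japanese
print(resolve_language("Dutch"))     # Auto
```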

Premium voice model access

Voice model availability is plan-based in the same way as the AI and Image engines. When a selected premium voice model is outside your current tier, Verba shows an upgrade prompt with the number of additional premium voice models available on a higher plan. That count is dynamic and can change as the voice catalog changes.
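One way to picture why the count in the upgrade prompt changes is to compute it from the catalog each time. Everything in this sketch is hypothetical: the numeric tier ranks, the `VoiceModel` type, and the model names are placeholders, not Verba's actual plans or catalog.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class VoiceModel:
    name: str
    min_tier: int  # hypothetical rank: 0 is the lowest plan, larger is higher


def additional_models_on_upgrade(
    catalog: Iterable[VoiceModel], current_tier: int, target_tier: int
) -> int:
    """Count models locked on `current_tier` that `target_tier` would unlock."""
    return sum(
        1 for model in catalog if current_tier < model.min_tier <= target_tier
    )


# Placeholder catalog, for illustration only.
catalog = [
    VoiceModel("voice-a", 0),
    VoiceModel("voice-b", 1),
    VoiceModel("voice-c", 2),
    VoiceModel("voice-d", 2),
]
print(additional_models_on_upgrade(catalog, current_tier=0, target_tier=2))  # 3
```

Because the count is derived from the catalog, adding or removing a model changes the number shown without any other change.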

Discord voice chat

On Discord, Ultra verbs can join voice chat with /vc-join, or from a normal mention request, such as asking the bot in server chat to join a VC or call. Lower-tier verbs can keep those commands enabled, but they respond with an in-character upgrade message instead of joining live VC. Normal generated voice messages stay free. When the live voice path is healthy, the bot can:

Listen in the connected voice channel
Transcribe incoming speech
Generate a reply
Speak the reply back into VC
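The four steps above can be sketched as one turn of a loop. This is a conceptual outline, not Verba's implementation: the `transcribe`, `generate_reply`, and `speak` callables are hypothetical stand-ins for whichever speech and AI providers are active.

```python
from typing import Callable, Optional


def handle_utterance(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    speak: Callable[[str], None],
) -> Optional[str]:
    """Run one listen -> transcribe -> reply -> speak turn for captured audio."""
    text = transcribe(audio).strip()
    if not text:
        # Nothing intelligible was heard, so stay quiet.
        return None
    reply = generate_reply(text)
    speak(reply)
    return reply


# Stub providers so the sketch runs on its own.
spoken = []
result = handle_utterance(
    b"fake-audio",
    transcribe=lambda audio: "hello there",
    generate_reply=lambda text: f"You said: {text}",
    speak=spoken.append,
)
print(result)
```

If any stage fails or returns nothing, the later stages never run, which is why a single unhealthy provider can leave the channel silent.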

Voice-channel replies now depend on the live voice pipeline, so if VC is silent you should verify:

Voice Engine is enabled for the verb
The selected voice model is available to that plan
The bot can access and speak in the target voice channel
The active speech provider is healthy and has available quota

If the selected live speech provider is unavailable or out of quota, Discord VC can join successfully but still fail to transcribe or speak until that provider becomes available again.
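The checklist above can be walked in order when VC is silent. The sketch below assumes you have already gathered each answer yourself; the `VoiceStatus` fields are hypothetical labels for the four checks, not values exposed by Verba or Discord.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoiceStatus:
    voice_engine_enabled: bool
    model_available_on_plan: bool
    bot_can_connect_and_speak: bool
    provider_healthy_with_quota: bool


def diagnose_silent_vc(status: VoiceStatus) -> List[str]:
    """Return the failed checks, in the same order as the checklist."""
    checks = [
        (status.voice_engine_enabled, "Enable Voice Engine for the verb"),
        (status.model_available_on_plan, "Pick a voice model available on the plan"),
        (status.bot_can_connect_and_speak, "Let the bot access and speak in the channel"),
        (status.provider_healthy_with_quota, "Wait for the speech provider or its quota"),
    ]
    return [message for passed, message in checks if not passed]


# The bot joined, but the speech provider is out of quota.
status = VoiceStatus(True, True, True, False)
print(diagnose_silent_vc(status))
```

An empty list means all four checks passed and the cause lies elsewhere.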

Keep it natural

Short replies sound better.
Avoid long paragraphs in voice mode.
Set a frequency that feels human.

If the voice feels off, shorten replies and reduce randomness, for example by lowering the temperature in the AI engine.

Safety and permissions

Only upload audio you own or have permission to use.

AI engine

Lower temperature for clearer voice output.
