Voice replies

Voice is optional and can be enabled per verb. Start with text only, then add voice once the personality feels right. If you want voice on every eligible reply, set the voice response frequency to 100. Lower values deliberately make voice replies intermittent.
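This article does not document how the frequency setting decides between voice and text. One plausible reading, which is only an assumption, is that the value is a percentage chance applied to each eligible reply. The hypothetical `should_send_voice` helper below sketches that model. It shows why 100 voices every eligible reply while lower values produce intermittent voice replies.

```python
import random
from typing import Optional


def should_send_voice(frequency: int, rng: Optional[random.Random] = None) -> bool:
    """Decide whether one eligible reply is sent as voice.

    Assumption (not documented by Verba): `frequency` is a 0-100
    percentage chance applied independently to each eligible reply.
    """
    if not 0 <= frequency <= 100:
        raise ValueError("frequency must be between 0 and 100")
    if rng is None:
        rng = random.Random()
    # rng.random() is in [0.0, 1.0), so 100 always passes and 0 never does.
    return rng.random() * 100 < frequency


# At frequency 30, roughly 30% of eligible replies come out as voice.
rng = random.Random(42)
voiced = sum(should_send_voice(30, rng) for _ in range(10_000))
print(f"{voiced / 10_000:.0%} of eligible replies voiced")
```

Under this model, the same setting can produce runs of text-only or voice-only replies purely by chance. That is one way "intermittent on purpose" could feel in practice.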

Voice attachment visual

The Voice Engine settings also include a Voice Attachment Visual section. You can upload a custom still image for voice message videos instead of using the default Verba image.
  • The live preview uses the same 20:7 crop as the final voice attachment
  • The generated video output is framed to 400x140 pixels, which matches the 20:7 ratio
  • You can replace or remove the image at any time
  • If no custom image is uploaded, Verba falls back to the default voice banner
For best results, use a wide banner-style image with the subject centered, because tall portraits will be cropped heavily in the final attachment.
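The crop arithmetic can be sketched as follows. This assumes Verba takes the largest centered 20:7 region of the upload before scaling it to 400x140. Because 400/140 reduces to 20/7, the scaling step needs no further cropping. The centered crop and the `center_crop_box` name are assumptions for illustration, not Verba's documented implementation. The numbers show why a tall portrait loses most of its height.

```python
from fractions import Fraction
from typing import Tuple

TARGET_RATIO = Fraction(20, 7)  # crop ratio shared by the preview and the final attachment
OUTPUT_SIZE = (400, 140)        # final framing in pixels
assert Fraction(*OUTPUT_SIZE) == TARGET_RATIO  # 400/140 reduces to 20/7


def center_crop_box(width: int, height: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered 20:7 region."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if Fraction(width, height) > TARGET_RATIO:
        # Wider than 20:7: keep the full height and trim the sides.
        crop_w, crop_h = int(height * TARGET_RATIO), height
    else:
        # 20:7 or taller: keep the full width and trim the top and bottom.
        crop_w, crop_h = width, int(width / TARGET_RATIO)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


# A 1080x1920 phone portrait keeps only a 1080x378 strip, about 20% of its area.
print(center_crop_box(1080, 1920))  # (0, 771, 1080, 1149)
# A 3000x700 panorama keeps its full height and loses 500 px from each side.
print(center_crop_box(3000, 700))   # (500, 0, 2500, 700)
```

The box is returned in (left, top, right, bottom) order, the same order Pillow's `Image.crop` expects. You could use it to preview a crop locally before uploading.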

Voice cloning

Upload a short, clean sample to create a custom voice. The best results come from 6 to 12 seconds of clear speech with minimal background noise. Reference text is optional, but if you can, paste the exact transcript of the sample: it improves similarity and keeps the voice more stable across replies.
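You could check a sample's length locally before uploading it. The sketch below uses Python's standard `wave` module, so it only reads uncompressed PCM WAV files. The helper names are hypothetical, and the check covers duration only. It cannot tell you whether the recording is clear or free of background noise.

```python
import os
import tempfile
import wave
from typing import List

MIN_SECONDS = 6.0   # lower end of the recommended sample length
MAX_SECONDS = 12.0  # upper end of the recommended sample length


def sample_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_voice_sample(path: str) -> List[str]:
    """Return warnings about sample length; an empty list means it is in range."""
    duration = sample_duration_seconds(path)
    if duration < MIN_SECONDS:
        return [f"Sample is {duration:.1f}s; aim for at least {MIN_SECONDS:.0f}s."]
    if duration > MAX_SECONDS:
        return [f"Sample is {duration:.1f}s; trim it to {MAX_SECONDS:.0f}s or less."]
    return []


def write_silence(path: str, seconds: float, rate: int = 16_000) -> None:
    """Write a mono 16-bit silent WAV, used here only to demo the check."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))


with tempfile.TemporaryDirectory() as tmp:
    short_clip = os.path.join(tmp, "short.wav")
    write_silence(short_clip, 3)
    print(check_voice_sample(short_clip))  # ['Sample is 3.0s; aim for at least 6s.']
```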

Supported languages

The Voice Engine supports a focused set of languages:
  • Auto (recommended default)
  • English
  • Chinese
  • Japanese
  • Korean
  • German
  • French
  • Russian
  • Portuguese
  • Spanish
  • Italian
If you pick a language outside this list, the engine falls back to Auto.
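The fallback rule amounts to a simple lookup. This sketch assumes languages are matched by name, ignoring case. `resolve_language` is a hypothetical helper, not part of Verba.

```python
from typing import Optional

SUPPORTED_LANGUAGES = (
    "Auto", "English", "Chinese", "Japanese", "Korean", "German",
    "French", "Russian", "Portuguese", "Spanish", "Italian",
)
# Case-insensitive lookup from a typed name to its canonical spelling.
_BY_LOWERCASE = {name.lower(): name for name in SUPPORTED_LANGUAGES}


def resolve_language(requested: Optional[str]) -> str:
    """Return the supported language to use, falling back to Auto."""
    if not requested:
        return "Auto"
    return _BY_LOWERCASE.get(requested.strip().lower(), "Auto")


print(resolve_language("spanish"))  # Spanish
print(resolve_language("Dutch"))    # Auto, because Dutch is not in the list
```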

Premium voice model access

Voice model availability depends on your plan, in the same way as for the AI and Image engines. When a selected premium voice model is outside your current tier, Verba shows an upgrade prompt with the number of additional premium voice models available on a higher plan. That count is dynamic and can change as the voice catalog changes.

Discord voice chat

On Discord, Ultra verbs can join voice chat with /vc-join. They can also join from a normal mention request in server chat, such as asking the bot to join the VC or call. Lower-tier verbs can keep those commands enabled, but they reply with an in-character upgrade message instead of joining live VC. Regular generated voice messages remain free. When the live voice path is healthy, the bot can:
  • Listen in the connected voice channel
  • Transcribe incoming speech
  • Generate a reply
  • Speak the reply back into VC
Voice-channel replies depend on the live voice pipeline, so if the bot is silent in VC, verify that:
  • Voice Engine is enabled for the verb
  • The selected voice model is available to that plan
  • The bot can access and speak in the target voice channel
  • The active speech provider is healthy and has available quota
If the selected live speech provider is unavailable or out of quota, the bot can still join Discord VC successfully but will fail to transcribe or speak until that provider is available again.
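The troubleshooting list above can be treated as an ordered checklist. The sketch below models it with a hypothetical `VoiceStatus` record and returns a suggested fix for each failed check. The field names are assumptions, not Verba settings. The usage example covers the case described above, where the bot joins VC but the speech provider is out of quota. "Connect" and "Speak" are Discord's standard voice channel permissions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoiceStatus:
    """Hypothetical snapshot of the four checks for a silent VC."""
    voice_engine_enabled: bool
    model_available_on_plan: bool
    can_access_and_speak: bool
    provider_healthy_with_quota: bool


# Checklist order matches the list above; each entry pairs a field with a suggested fix.
CHECKS = (
    ("voice_engine_enabled", "Enable the Voice Engine for this verb."),
    ("model_available_on_plan", "Select a voice model included in the current plan."),
    ("can_access_and_speak", "Give the bot Connect and Speak permissions in the voice channel."),
    ("provider_healthy_with_quota", "Wait until the speech provider is healthy and has quota again."),
)


def diagnose_silent_vc(status: VoiceStatus) -> List[str]:
    """Return the suggested fix for every failed check, in checklist order."""
    return [fix for field, fix in CHECKS if not getattr(status, field)]


# The bot joined VC but stays silent: only the provider check fails.
status = VoiceStatus(True, True, True, False)
for fix in diagnose_silent_vc(status):
    print("-", fix)
```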

Keep it natural

  • Short replies sound better.
  • Avoid long paragraphs in voice mode.
  • Set a frequency that feels human.
If the voice feels off, shorten the reply length and reduce creativity (randomness).

Safety and permissions

Only upload audio you own or have permission to use.

AI engine

Use a lower temperature for clearer voice output.