How to Enable Voice Chat with TTS for Your Assistant
Superinterface allows you to enable voice chat using OpenAI’s Whisper and TTS APIs. This setup transcribes speech into text, processes it through your AI model, and converts the response back into natural-sounding speech.
Unlike realtime WebRTC connections, this method works with any AI model, making it a flexible solution for voice interactions.
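The round trip described above can be sketched as three stages wired together. This is an illustrative sketch only, not Superinterface's actual internals: the function names are assumptions, and the real transcription and speech calls are asynchronous HTTP requests (synchronous signatures keep the sketch short).

```typescript
// One voice-chat turn: speech -> text -> model reply -> speech.
// The three stage functions are injected, which is why any AI model
// can sit in the middle. All names here are illustrative.

type Transcribe = (audio: Uint8Array) => string; // e.g. OpenAI Whisper
type Respond = (userText: string) => string;     // e.g. Claude 3.5 Sonnet
type Speak = (replyText: string) => Uint8Array;  // e.g. OpenAI TTS

function voiceTurn(
  audio: Uint8Array,
  transcribe: Transcribe,
  respond: Respond,
  speak: Speak,
): Uint8Array {
  const userText = transcribe(audio);  // speech-to-text
  const replyText = respond(userText); // processed by your chosen model
  return speak(replyText);             // text-to-speech, played back to the user
}
```

Because the model only ever sees plain text, swapping Claude 3.5 Sonnet for any other supported provider changes nothing else in the pipeline.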
Let’s set it up step by step.
Step 1: Setting Up Your Assistant
Create and Configure
Create a New Assistant
Log in to your Superinterface dashboard, go to Assistants, and click New Assistant.
Choose an AI Provider
TTS works with any provider, so select Anthropic, OpenAI, or another supported provider.
Set Thread Storage and Execution
Choose Superinterface Cloud to handle transcription and speech generation.
Pick a Model
Select any model you prefer. For this example, we’ll use Claude 3.5 Sonnet.
Setting up a voice assistant with TTS in Superinterface.
Add Personalization
Customize Your Assistant
Set a name and define its behavior.
Set an Initial Message
Add a welcome message like “Hi! How can I assist you today?” This will be spoken aloud when the voice chat starts.
Save Your Assistant
Click Save to store your settings.
Step 2: Publish Your TTS Voice Interface
Creating an interface for your TTS-based voice assistant.
Once your assistant is set up, it’s time to enable TTS-based voice chat and publish the interface.
Enable Voice Chat
Create a New Interface
Navigate to the Publish tab and click Choose Interface, then Create New Interface.
Select Interaction Mode: Voice Chat
This tells Superinterface to use OpenAI’s Whisper API to transcribe user speech and its TTS API to generate spoken replies.
Enabling Voice Mode in Superinterface.
Publish Your Assistant
Finalize Publishing
Choose your publishing method: subdomain, custom domain, script tag, or React component.
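For the script tag option, the dashboard generates the exact snippet for you after publishing. Purely to illustrate the general shape of such an embed, here is a hypothetical helper; the script URL, attribute name, and `assistantId` parameter are all assumptions, not Superinterface’s actual embed code.

```typescript
// Hypothetical sketch of a script-tag embed snippet.
// The host (example.com) and data attribute are placeholders;
// copy the real snippet from the Superinterface dashboard instead.

function embedSnippet(assistantId: string): string {
  return (
    `<script src="https://example.com/widget.js" ` +
    `data-assistant-id="${assistantId}" async></script>`
  );
}
```

Whichever method you choose, the published interface behaves the same way at runtime.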
When users speak, their voice is transcribed using Whisper, processed by the assistant, and converted into speech using OpenAI’s TTS models.
Voice assistant using OpenAI's TTS and Whisper APIs via Superinterface.
You’re all set!
Now go ahead and let your users experience natural, high-quality voice interactions with your AI assistant—no realtime model required.