How to Enable Voice Chat with TTS for Your Assistant

Superinterface allows you to enable voice chat using OpenAI’s Whisper and TTS APIs. This setup transcribes speech into text, processes it through your AI model, and converts the response back into natural-sounding speech.
Unlike realtime WebRTC connections, which require a model with native speech support, this method works with any AI model, making it a flexible choice for voice interactions.
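
Under the hood, each voice turn is a three-stage pipeline: transcribe, generate, speak. Here is a minimal sketch of that flow using the OpenAI Node SDK; Superinterface Cloud runs the equivalent for you, and `generateAssistantReply` below is a hypothetical stand-in for the call it routes to whichever model provider you pick.

```typescript
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Hypothetical stand-in for the step Superinterface Cloud handles:
// forwarding the transcript to your chosen model (e.g. Claude 3.5 Sonnet).
async function generateAssistantReply(transcript: string): Promise<string> {
  return `You said: ${transcript}`;
}

async function handleVoiceTurn(audioPath: string): Promise<void> {
  // 1. Speech-to-text: transcribe the user's audio with Whisper.
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream(audioPath),
    model: "whisper-1",
  });

  // 2. The transcript is processed by the assistant's model.
  const reply = await generateAssistantReply(transcription.text);

  // 3. Text-to-speech: convert the reply into natural-sounding audio.
  const speech = await openai.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: reply,
  });
  fs.writeFileSync("reply.mp3", Buffer.from(await speech.arrayBuffer()));
}

handleVoiceTurn("user-question.webm").catch(console.error);
```
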
Let’s get started with setting it up step by step.

Step 1: Setting Up Your Assistant

Create and Configure

    Create a New Assistant
    Log in to your Superinterface dashboard, go to Assistants, and click New Assistant.
    Need help? Follow this guide.
    Choose an AI Provider
    TTS works with any provider, so select Anthropic, OpenAI, or another supported provider.
    Set Thread Storage and Execution
    Choose Superinterface Cloud to handle transcription and speech generation.
    Pick a Model
    Select any model you prefer. For this example, we’ll use Claude 3.5 Sonnet.
Setting up a voice assistant with TTS in Superinterface.

Add Personalization

    Customize Your Assistant
    Set a name and define its behavior.
    Set an Initial Message
    Add a welcome message like “Hi! How can I assist you today?” This will be spoken aloud when the voice chat starts.
    Save Your Assistant
    Click Save to store your settings.

Step 2: Publish Your TTS Voice Interface

Creating an interface for your TTS-based voice assistant.
Once your assistant is set up, it’s time to enable TTS-based voice chat and publish the interface.

Enable Voice Chat

    Create a New Interface
    Navigate to the Publish tab and click Choose Interface, then Create New Interface.
    Select Interaction Mode: Voice Chat
    This enables OpenAI’s Whisper and TTS APIs to process speech.
Enabling Voice Mode in Superinterface.

Publish Your Assistant

    Finalize Publishing
    Choose your publishing method: subdomain, custom domain, script tag, or React component (an illustrative embed sketch follows this list).
    Need help? Follow this guide.
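
The dashboard generates the exact snippet for whichever method you choose. Purely as an illustration, loading a script-tag embed could look like the sketch below; the script URL and data attribute are placeholders, so copy the real snippet from your Publish tab.

```typescript
// Placeholder embed loader: the real script URL and attributes come from
// the Publish tab in your dashboard. The values below are hypothetical.
const embed = document.createElement("script");
embed.src = "https://cdn.example.com/superinterface-embed.js"; // placeholder URL
embed.async = true;
embed.dataset.assistantId = "YOUR_ASSISTANT_ID"; // placeholder attribute
document.body.appendChild(embed);
```
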
When users speak, their voice is transcribed using Whisper, processed by the assistant, and converted into speech using OpenAI’s TTS models.
Voice assistant using OpenAI's TTS and Whisper APIs via Superinterface.
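
On the user's side, the published interface handles microphone capture before any of that happens. As a rough browser-side sketch of just the capture step (using the standard MediaRecorder API, not Superinterface's actual implementation):

```typescript
// Rough sketch: capture one voice turn in the browser. The published
// Superinterface widget does this for you; this only illustrates the idea.
async function recordVoiceTurn(durationMs: number): Promise<Blob> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];

  recorder.ondataavailable = (event) => chunks.push(event.data);
  const stopped = new Promise<void>((resolve) => {
    recorder.onstop = () => resolve();
  });

  recorder.start();
  // A real UI would stop on button release or silence detection;
  // here we simply record for a fixed duration.
  await new Promise((resolve) => setTimeout(resolve, durationMs));
  recorder.stop();
  await stopped;

  stream.getTracks().forEach((track) => track.stop());
  // This Blob is what gets transcribed by Whisper.
  return new Blob(chunks, { type: "audio/webm" });
}
```
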

You’re all set!

Now go ahead and let your users experience natural, high-quality voice interactions with your AI assistant, no realtime model required.