Text-to-Speech API

The Prosa Text-to-Speech (TTS) API lets you create synthetic speech. It converts your articles, scripts, and dialogues into audio files that you can embed directly in your applications, websites, or multimedia content. The API supports common audio formats and offers two synthesis methods: synchronous and asynchronous.

Use Cases

The Prosa TTS API supports two groups of use cases:

  • Instantaneous Synthesis

    If you need to create a short audio file from input text immediately, you can send the text to the API and wait for the audio to be returned. Our engine synthesizes the audio and sends it back to you synchronously. Use cases that may involve instantaneous synthesis include voice responses in telephony systems and question answering by virtual assistants.

  • Batch Synthesis

    If you have collections of texts (e.g. articles, book chapters, or news) that you want synthesized for later use, you can submit them to the API asynchronously. Our system schedules them for synthesis, and you can retrieve the audio files later. Use cases that may involve batch synthesis include creating audiobooks and audio versions of news and articles.

Synthesis Methods

To support those use cases, Prosa TTS API provides 2 synthesis methods:

  • Synchronous Synthesis

    Clients send the text through the REST API with the wait field in the request body set to true. Clients then wait for the synthesis process to finish and receive the audio data or audio URL in the response. The text in each request must not exceed 280 characters.

  • Asynchronous Synthesis

    Clients send the text through the REST API, with the wait field in the request body set to false. After submitting the request, clients receive the TTS job details, including the job ID. Using the job ID, clients can check the synthesis progress and result. Clients can submit up to 5000 characters for each synthesis request.
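
As a rough illustration of how a client might choose between the two methods, the sketch below builds a request body based on text length. Only the wait field and the 280 and 5000 character limits come from the description above. The "text" key and the build_tts_request helper are hypothetical names, and counting characters with len() assumes the API counts characters the same way Python does. Check the API reference for the actual request schema before relying on it.

```python
import json

SYNC_MAX_CHARS = 280    # synchronous limit stated above
ASYNC_MAX_CHARS = 5000  # asynchronous limit stated above


def build_tts_request(text: str, prefer_sync: bool = True) -> dict:
    """Build a request body, choosing the synthesis method by text length.

    Only the `wait` field is taken from the documentation; the `text` key
    is an assumed name used for illustration.
    """
    length = len(text)
    if length == 0:
        raise ValueError("text must not be empty")
    if length > ASYNC_MAX_CHARS:
        raise ValueError(
            f"text has {length} characters; "
            f"the asynchronous limit is {ASYNC_MAX_CHARS}"
        )
    # Texts within the synchronous limit may use wait=true;
    # anything longer has to be submitted asynchronously.
    wait = prefer_sync and length <= SYNC_MAX_CHARS
    return {"text": text, "wait": wait}


if __name__ == "__main__":
    # A short prompt fits the synchronous limit.
    print(json.dumps(build_tts_request("Hello, world."), indent=2))
    # A 1000-character text falls back to asynchronous synthesis.
    print(json.dumps(build_tts_request("x" * 1000)["wait"]))
```

Texts longer than 5000 characters are rejected here rather than split automatically, since where to split (sentence, paragraph, chapter) depends on the content being synthesized.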