STT Streaming

Speech-to-text streaming service exposed over a WebSocket endpoint.

Security Requirements

| Type | In | Name | Scheme | Format | Description |
| --- | --- | --- | --- | --- | --- |
| httpApiKey | header | x-api-key | - | - | API Key received from Prosa API Console |

Endpoint

wss://s-api.prosa.ai/v2/speech/stt

Subscribe Operation

Client may send one of the following messages to the server

ApiKey

If the API key cannot be passed in the x-api-key HTTP header, send it as the first message instead.

Payload

| Name | Type | Optional | Description |
| --- | --- | --- | --- |
| token | string | true | API Key received from Prosa API Console |

Example

{
  "token": "string"
}
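
As a minimal sketch of the two authentication paths described above, the helpers below build either the handshake header or the first message. The function names are hypothetical, and sending the message as a JSON text frame is an assumption; only the x-api-key header name and the token field come from this page.

```python
import json
from typing import Dict

API_KEY_HEADER = "x-api-key"  # header name from the Security Requirements table


def auth_headers(api_key: str) -> Dict[str, str]:
    """Headers to attach to the WebSocket handshake when the client allows it."""
    return {API_KEY_HEADER: api_key}


def auth_message(api_key: str) -> str:
    """First message to send when the header cannot be set.

    Assumption: the payload is serialized as a JSON text message.
    """
    return json.dumps({"token": api_key})
```

Use one path or the other: pass `auth_headers(...)` to the WebSocket client at connect time, or send `auth_message(...)` before the Configuration message.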

Configuration

The configuration for the transcription session. Send this message right after authentication, before any audio data, to configure the transcription process.

Payload

| Name | Type | Optional | Description |
| --- | --- | --- | --- |
| label | string | true | The label to give to this transcript. |
| model | string | false | The model to use. |
| audio | object | true | Describes the incoming audio. This is optional as the format of the audio is generally detected automatically. |
| audio.format | string | false | The audio format. |
| audio.channels | integer | true | The number of audio channels. |
| audio.sample_rate | integer | true | The sample rate of the audio. |
| include_filler | boolean | true | Include filler in the transcription result. |
| include_partial | boolean | true | Whether to receive partial transcripts in addition to final transcripts. |

Example

{
  "label": null,
  "model": "stt-general-online",
  "audio": {
    "format": "wav",
    "channels": 1,
    "sample_rate": 16000
  },
  "include_filler": false,
  "include_partial": true
}
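
The payload rules above (model is required, audio is optional, but audio.format is required once audio is present) can be sketched as a small builder. The function name and its keyword arguments are hypothetical; fields left as None are omitted so the server applies its own defaults, which this page does not state.

```python
import json
from typing import Any, Dict, Optional


def build_config(
    model: str,
    label: Optional[str] = None,
    audio_format: Optional[str] = None,
    channels: Optional[int] = None,
    sample_rate: Optional[int] = None,
    include_filler: Optional[bool] = None,
    include_partial: Optional[bool] = None,
) -> str:
    """Serialize a Configuration message, leaving out unset optional fields."""
    if not model:
        raise ValueError("model is required")
    config: Dict[str, Any] = {"model": model}
    if label is not None:
        config["label"] = label
    if audio_format is not None:
        audio: Dict[str, Any] = {"format": audio_format}
        if channels is not None:
            audio["channels"] = channels
        if sample_rate is not None:
            audio["sample_rate"] = sample_rate
        config["audio"] = audio
    elif channels is not None or sample_rate is not None:
        # audio.format is marked non-optional inside the audio object.
        raise ValueError("audio_format is required when describing the audio")
    if include_filler is not None:
        config["include_filler"] = include_filler
    if include_partial is not None:
        config["include_partial"] = include_partial
    return json.dumps(config)
```

For instance, `build_config("stt-general-online", audio_format="wav", channels=1, sample_rate=16000, include_partial=True)` yields the same fields as the example above, minus the null label.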

AudioData

The audio data to transcribe, sent as bytes. The audio header is expected only in the first chunk. An empty byte message is expected at the end of the audio stream.
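
A minimal sketch of the chunking described above, under two assumptions that this page does not confirm: the end-of-stream marker is a zero-length binary message, and 4096 bytes is an acceptable chunk size. The function name is hypothetical. Because the file is sliced sequentially, its header bytes appear only once, at the start of the stream.

```python
from typing import Iterator


def audio_messages(audio: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield the binary messages for one audio stream, in send order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Sequential slices: the container header stays at the front of the stream.
    for offset in range(0, len(audio), chunk_size):
        yield audio[offset:offset + chunk_size]
    # Assumption: an empty bytes payload marks the end of the audio stream.
    yield b""
```

Each yielded value would be sent as one binary WebSocket message after TranscriptionStart has been received.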

Publish Operation

Server may return one of the following messages

TranscriptionStart

Signifies that the transcription is ready to accept Audio Data.

Payload

| Name | Type | Optional | Description |
| --- | --- | --- | --- |
| type | string | false | created |
| id | string | false | ID of the transcription. Use this ID to refer to this transcription in other operations. |

Example

{
  "type": "created",
  "id": "string"
}

TranscriptionStatus

Status of the ongoing transcription process.

Payload

| Name | Type | Optional | Description |
| --- | --- | --- | --- |
| type | string | false | status |
| status | string | false | Status of the transcription progress. |

Example

{
  "type": "status",
  "status": "created"
}

PartialTranscript

Partial transcript of the ongoing speech.

Payload

| Name | Type | Optional | Description |
| --- | --- | --- | --- |
| type | string | false | partial |
| transcript | string | false | The partial transcription. |

Example

{
  "type": "partial",
  "transcript": "string"
}

FinalTranscript

Final transcript of a speech segment.

Payload

| Name | Type | Optional | Description |
| --- | --- | --- | --- |
| type | string | false | result |
| transcript | string | false | The final transcription of a specific segment. |
| time_start | number | false | Start of the segment, as a timestamp relative to the start of the audio. |
| time_end | number | false | End of the segment, as a timestamp relative to the start of the audio. |

Example

{
  "type": "result",
  "transcript": "string",
  "time_start": 0,
  "time_end": 0
}
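
Since each FinalTranscript covers one segment, a client usually has to stitch segments together itself. The sketch below shows one way to do that; the Segment class and both function names are hypothetical, and joining segments with a single space is an assumption about how the text should be combined.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass
class Segment:
    transcript: str
    time_start: float
    time_end: float


def segment_from_message(message: Dict[str, Any]) -> Segment:
    """Convert a decoded FinalTranscript payload into a Segment."""
    if message.get("type") != "result":
        raise ValueError("not a FinalTranscript message")
    return Segment(
        transcript=message["transcript"],
        time_start=float(message["time_start"]),
        time_end=float(message["time_end"]),
    )


def full_transcript(segments: Iterable[Segment]) -> str:
    """Join non-empty segments in order of their start time."""
    ordered = sorted(segments, key=lambda s: s.time_start)
    return " ".join(s.transcript for s in ordered if s.transcript)
```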

Metadata

Metadata of the elapsed transcription process.

Payload

| Name | Type | Optional | Description |
| --- | --- | --- | --- |
| type | string | false | metadata |
| duration | number | false | The total duration of the audio. |
| quota_used | integer | false | The total quota used for this transcription session. |
| max_reached | boolean | false | Whether the process was stopped abruptly because the maximum duration was reached. |
| max_duration | number | false | The maximum allowed duration of a streaming session. |

Example

{
  "type": "metadata",
  "duration": 0,
  "quota_used": 0,
  "max_reached": true,
  "max_duration": 0
}

QuotaAlert

An alert sent when you run out of quota in the middle of the transcription process. The transcription process is stopped and any additional audio sent is not processed.

Payload

| Name | Type | Optional | Description |
| --- | --- | --- | --- |
| type | string | false | quota |
| active | boolean | false | Whether or not the quota is still active. |
| timestamp | number | false | The timestamp, relative to the start of the audio, at which the quota ran out. |
| quota_used | integer | false | The total quota used for this transcription session. |

Example

{
  "type": "quota",
  "active": false,
  "timestamp": 0,
  "quota_used": 0
}

Error

An error occurred.

Payload

| Name | Type | Optional | Description |
| --- | --- | --- | --- |
| type | string | false | error |
| message | string | false | The message of the error. |

Example

{
  "type": "error",
  "message": "Invalid audio configuration."
}
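
All server messages listed above carry a type field, so a client can route on it. The following is a rough sketch rather than an official client: the names ServerError, parse_server_message and should_stop_sending are hypothetical, and it assumes every server message arrives as a JSON object in a text frame.

```python
import json
from typing import Any, Dict

# The "type" values documented for server messages on this page.
KNOWN_TYPES = {"created", "status", "partial", "result", "metadata", "quota", "error"}


class ServerError(Exception):
    """Raised when the server sends an Error message."""


def parse_server_message(raw: str) -> Dict[str, Any]:
    """Decode one server message and validate its type."""
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("expected a JSON object")
    kind = message.get("type")
    if kind not in KNOWN_TYPES:
        raise ValueError(f"unexpected message type: {kind!r}")
    if kind == "error":
        raise ServerError(message.get("message", ""))
    return message


def should_stop_sending(message: Dict[str, Any]) -> bool:
    """True when further audio would not be processed.

    Covers QuotaAlert (transcription stopped) and Metadata with max_reached set.
    """
    if message.get("type") == "quota":
        return True
    return message.get("type") == "metadata" and bool(message.get("max_reached"))
```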

Websocket Close Codes

The WebSocket close code indicates why the connection was closed.

| Close Code | Description |
| --- | --- |
| 1000 | Success |
| 1006 | Uncaught Internal Error |
| 4000 | Invalid Auth |
| 4001 | Invalid Session Config |
| 4002 | Invalid Model |
| 4005 | Insufficient Quota |
| 4029 | Rate Limited |
| 4500 | Internal Error |
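
A client can turn these codes into readable log messages with a simple lookup, sketched below. The descriptions are copied from the table; the function names and the set of codes treated as retry candidates are assumptions about client-side policy, not something this page specifies.

```python
CLOSE_CODES = {
    1000: "Success",
    1006: "Uncaught Internal Error",
    4000: "Invalid Auth",
    4001: "Invalid Session Config",
    4002: "Invalid Model",
    4005: "Insufficient Quota",
    4029: "Rate Limited",
    4500: "Internal Error",
}

# Assumption: server-side failures and rate limiting may succeed on a later
# attempt, while auth, config, model and quota problems need a client-side fix.
RETRY_CANDIDATES = frozenset({1006, 4029, 4500})


def describe_close(code: int) -> str:
    """Return the documented description for a close code."""
    return CLOSE_CODES.get(code, f"Unknown close code {code}")


def may_retry(code: int) -> bool:
    """Whether reconnecting later is plausible under the assumed policy."""
    return code in RETRY_CANDIDATES
```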