Before using the REST API, obtain an API key from Prosa Console.
Synchronous Request
A synchronous Speech-to-Text API request consists of a speech recognition configuration and audio data.
The audio in each synchronous request is limited to 60 seconds. The wait parameter is set to true
to indicate that the request is synchronous.
Python

    import base64
    import time
    from typing import Optional

    import requests

    url = "https://api.prosa.ai/v2/speech/stt"
    api_key = "..."


    def main():
        filename = "audio_file.mp3"

        result = stt(filename)
        print(result)


    def stt(filename: str) -> dict:
        job = submit_stt_request(filename)

        if job["status"] == "complete":
            return job["result"]["data"]

        # Job was not completed within the timeframe


    def submit_stt_request(filename: str) -> dict:
        with open(filename, "rb") as f:
            b64audio_data = base64.b64encode(f.read()).decode("utf-8")

        payload = {
            "config": {
                "model": "stt-general",
                "wait": True  # Blocks the request until the execution is finished
            },
            "request": {
                "data": b64audio_data
            }
        }

        response = requests.post(url, json=payload, headers={
            "x-api-key": api_key
        })

        return response.json()


    if __name__ == '__main__':
        main()

Node.js

    const https = require('https');
    const fs = require('fs');

    // Setup
    const url = 'https://api.prosa.ai/v2/speech/stt';
    const apiKey = '...';

    (async () => {
        const filename = 'audio_file.wav';

        let res = await stt(filename);
        console.log(res);
    })();

    async function stt(filename) {
        let job = await submitSttRequest(filename);

        if (job["status"] === "complete") {
            return job["result"]["data"];
        }

        // Job was not completed within the timeframe
    }

    async function submitSttRequest(filename) {
        const audioData = fs.readFileSync(filename);
        const b64audioData = audioData.toString('base64');

        const payload = {
            "config": {
                "model": "stt-general",
                "wait": true  // Blocks the request until the execution is finished
            },
            "request": {
                "data": b64audioData
            }
        };

        return await request(url, "POST", {
            json: payload,
            headers: {
                "x-api-key": apiKey
            }
        });
    }

    function request(url, method, {headers = null, json = null}) {
        // Simple promise wrapper for built-in https module
        return new Promise((resolve, reject) => {
            let req = https.request(url, {
                method: method,
                headers: {
                    "Accept": "application/json",
                    "Content-Type": "application/json; charset=UTF-8",
                    ...headers
                }
            }, (res) => {
                if (res.statusCode === 200) {
                    let data = "";
                    res.on('data', (chunk) => {
                        data += chunk;
                    });
                    res.on('end', () => {
                        const response = JSON.parse(data);
                        resolve(response);
                    });
                } else {
                    reject(res.statusCode);
                }
            });

            req.on('error', reject);

            if (json != null) {
                req.write(JSON.stringify(json));
            }

            req.end();
        });
    }
Note
The Node.js example contains a simple promise wrapper for the built-in https module.
Warning
If the job could not be completed within a specified timeframe, it is treated as an Asynchronous Request instead. See Retrieving Result on how to retrieve the result of asynchronous requests.
Important
Each request is limited to 10 MB. If you need to transcribe larger audio, consider using external storage. See Alternative Audio Source.
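Because the audio is sent as base64 inside a JSON body, the request is roughly a third larger than the raw audio. The sketch below estimates the request size before sending. It assumes that "10 MB" means 10 × 1024 × 1024 bytes and that the limit applies to the whole JSON body; neither is confirmed here, and the names `MAX_REQUEST_BYTES`, `request_size`, and `fits_in_request` are hypothetical helpers, not part of the API.

```python
import base64
import json

# Assumption: the 10 MB limit is 10 MiB and applies to the full JSON body.
MAX_REQUEST_BYTES = 10 * 1024 * 1024


def request_size(audio: bytes, model: str = "stt-general", wait: bool = True) -> int:
    """Return the size in bytes of the JSON body that would be sent."""
    payload = {
        "config": {"model": model, "wait": wait},
        "request": {"data": base64.b64encode(audio).decode("utf-8")},
    }
    return len(json.dumps(payload).encode("utf-8"))


def fits_in_request(audio: bytes) -> bool:
    """Check the encoded request against the assumed size limit."""
    return request_size(audio) <= MAX_REQUEST_BYTES
```

If `fits_in_request` returns False, the audio is a candidate for the external storage route described in Alternative Audio Source.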
Configure Request
Configure the model to use. In this example, the model being used is stt-general.
Python

    def submit_stt_request(filename: str) -> dict:
        payload = {
            "config": {
                "model": "stt-general",
                "wait": True  # Blocks the request until the execution is finished
            }
        }

Node.js

    async function submitSttRequest(filename) {
        const payload = {
            "config": {
                "model": "stt-general",
                "wait": true  // Blocks the request until the execution is finished
            }
        }
    }
Sending audio data
Read audio data from any source. In this example, the audio is read from the filesystem. The audio is then encoded as
a base64 string and included in the request payload.
Python

    def submit_stt_request(filename: str) -> dict:
        with open(filename, "rb") as f:
            b64audio_data = base64.b64encode(f.read()).decode("utf-8")

        payload = {
            "config": {
                "model": "stt-general",
                "wait": True  # Blocks the request until the execution is finished
            },
            "request": {
                "data": b64audio_data
            }
        }

Node.js

    const url = 'https://api.prosa.ai/v2/speech/stt';

    async function submitSttRequest(filename) {
        const audioData = fs.readFileSync(filename);
        const b64audioData = audioData.toString('base64');

        const payload = {
            "config": {
                "model": "stt-general",
                "wait": true  // Blocks the request until the execution is finished
            },
            "request": {
                "data": b64audioData
            }
        }
    }
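Since the audio can come from any source, it may help to build the payload from a binary stream rather than a filename. The following is a minimal sketch under that assumption; `build_payload` is a hypothetical helper, and the sample bytes are placeholders rather than real audio.

```python
import base64
import io
from typing import BinaryIO


def build_payload(stream: BinaryIO, model: str = "stt-general", wait: bool = True) -> dict:
    """Read all bytes from a binary stream and wrap them in a request payload."""
    audio = stream.read()
    if not audio:
        raise ValueError("audio source is empty")
    return {
        "config": {"model": model, "wait": wait},
        # The audio travels as a base64 string inside the JSON body.
        "request": {"data": base64.b64encode(audio).decode("utf-8")},
    }


# Works with any file-like object: an open file, an upload, or in-memory bytes.
payload = build_payload(io.BytesIO(b"placeholder-audio-bytes"))
```

The same function accepts `open(filename, "rb")`, so the filesystem case shown above is still covered.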
Sending the request
Authenticate the request by including the API key in the x-api-key HTTP request header.
Python

    url = "https://api.prosa.ai/v2/speech/stt"


    def submit_stt_request(filename: str) -> dict:
        with open(filename, "rb") as f:
            b64audio_data = base64.b64encode(f.read()).decode("utf-8")

        payload = {
            "config": {
                "model": "stt-general",
                "wait": True  # Blocks the request until the execution is finished
            },
            "request": {
                "data": b64audio_data
            }
        }

        response = requests.post(url, json=payload, headers={
            "x-api-key": api_key
        })

        return response.json()

Node.js

    const url = 'https://api.prosa.ai/v2/speech/stt';

    async function submitSttRequest(filename) {
        const audioData = fs.readFileSync(filename);
        const b64audioData = audioData.toString('base64');

        const payload = {
            "config": {
                "model": "stt-general",
                "wait": true  // Blocks the request until the execution is finished
            },
            "request": {
                "data": b64audioData
            }
        };

        return await request(url, "POST", {
            json: payload,
            headers: {
                "x-api-key": apiKey
            }
        });
    }
For synchronous requests, the transcribed text is returned directly under the object result->data as base64-encoded data.
If the job could not be completed within a specified timeframe, it is treated as an Asynchronous Request instead.
In that case, you need to poll and retrieve the result using job_id. See Retrieving Result on
how to retrieve the result of asynchronous requests.
Python

    def stt(filename: str) -> dict:
        job = submit_stt_request(filename)

        if job["status"] == "complete":
            return job["result"]["data"]

        # Job was not completed within the timeframe
        job_id = job["job_id"]
        # Retrieve with job_id instead

Node.js

    async function stt(filename) {
        let job = await submitSttRequest(filename);

        if (job["status"] === "complete") {
            return job["result"]["data"];
        }

        // Job was not completed within the timeframe
        let jobId = job["job_id"];
        // Retrieve with job_id instead
    }
Info
See AsrResponse for more information regarding the response.
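The two outcomes of a synchronous request (a completed result, or a fallback to asynchronous handling) can be separated in one small function. This is a sketch only: `interpret_response` is a hypothetical helper, it relies solely on the `status`, `result`, and `job_id` fields used in the examples above, and the sample dictionaries are made up for illustration.

```python
from typing import Any, Optional, Tuple


def interpret_response(job: dict) -> Tuple[Optional[Any], Optional[str]]:
    """Return (data, None) for a completed job, otherwise (None, job_id)."""
    if job.get("status") == "complete":
        return job["result"]["data"], None
    # Not complete: the caller should retrieve the result later with job_id.
    return None, job["job_id"]


# Made-up sample responses, shaped like the fields used in this guide.
done = {"status": "complete", "job_id": "abc", "result": {"data": "hello"}}
pending = {"status": "pending-example", "job_id": "abc"}

data, job_id = interpret_response(done)
```

Returning the `job_id` explicitly makes the fallback path hard to ignore, unlike a function that silently returns nothing when the job is not yet complete.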
Asynchronous Request
An asynchronous Speech-to-Text API request is fairly similar to
a synchronous one. However, instead of immediately returning the result,
the request initiates a Long Running Operation and returns a response without a result.
Each asynchronous request can process up to 4 hours of audio data.
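Given the 60-second limit for synchronous requests and the 4-hour limit for asynchronous ones, the value of the wait parameter can be derived from the audio duration. The sketch below assumes both limits are inclusive and that the duration is already known; `choose_wait` and the two constants are hypothetical names.

```python
SYNC_LIMIT_SECONDS = 60            # Synchronous limit stated in this guide
ASYNC_LIMIT_SECONDS = 4 * 60 * 60  # Asynchronous limit stated in this guide


def choose_wait(duration_seconds: float) -> bool:
    """Return the value for the wait parameter given the audio duration.

    Assumption: both limits are inclusive.
    """
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if duration_seconds > ASYNC_LIMIT_SECONDS:
        raise ValueError("audio exceeds the 4-hour asynchronous limit")
    return duration_seconds <= SYNC_LIMIT_SECONDS
```

For example, `choose_wait(30)` selects a synchronous request, while `choose_wait(600)` selects an asynchronous one.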
Here is some example code to help you get started quickly.
Python

    import base64
    import time
    from typing import Optional

    import requests

    url = "https://api.prosa.ai/v2/speech/stt"
    api_key = "..."


    def main():
        filename = "audio_file.mp3"

        job = submit_stt_request(filename)
        job_id = job["job_id"]

        poll_interval = 5.0
        while True:
            result = query_stt_result(job_id)
            if result is not None:
                print(result)
                break
            time.sleep(poll_interval)


    def submit_stt_request(filename: str) -> dict:
        with open(filename, "rb") as f:
            b64audio_data = base64.b64encode(f.read()).decode("utf-8")

        payload = {
            "config": {
                "model": "stt-general",
                "wait": False  # Do not wait for the request to complete
            },
            "request": {
                "data": b64audio_data
            }
        }

        response = requests.post(url, json=payload, headers={
            "x-api-key": api_key
        })

        return response.json()


    def query_stt_result(job_id: str) -> Optional[dict]:
        response = requests.get(url + "/" + job_id, headers={
            "x-api-key": api_key
        })

        if response.status_code == 200:
            job = response.json()
            status = job["status"]

            if status == "complete":
                result = job["result"]["data"]
                return result

        return None


    if __name__ == '__main__':
        main()

Node.js

    const https = require('https');
    const fs = require('fs');

    // Setup
    const url = 'https://api.prosa.ai/v2/speech/stt';
    const apiKey = '...';

    (async () => {
        const filename = 'audio_file.wav';

        let job = await submitSttRequest(filename);
        const jobId = job["job_id"];

        const pollInterval = 5.0 * 1000;
        while (true) {
            let result = await querySttResult(jobId);
            if (result != null) {
                console.log(result);
                break;
            }
            await new Promise((resolve) => {
                setTimeout(resolve, pollInterval);
            });
        }
    })();

    async function submitSttRequest(filename) {
        const audioData = fs.readFileSync(filename);
        const b64audioData = audioData.toString('base64');

        const payload = {
            "config": {
                "model": "stt-general",
                "wait": false  // Do not wait for the request to complete
            },
            "request": {
                "data": b64audioData
            }
        };

        return await request(url, "POST", {
            json: payload,
            headers: {
                "x-api-key": apiKey
            }
        });
    }

    async function querySttResult(jobId) {
        let res = await request(url + "/" + jobId, "GET", {
            headers: {
                "x-api-key": apiKey
            }
        });

        if (res["status"] === "complete") {
            return res["result"]["data"];
        }

        return null;
    }

    function request(url, method, {headers = null, json = null}) {
        return new Promise((resolve, reject) => {
            let req = https.request(url, {
                method: method,
                headers: {
                    "Accept": "application/json",
                    "Content-Type": "application/json; charset=UTF-8",
                    ...headers
                }
            }, (res) => {
                if (res.statusCode === 200) {
                    let data = "";
                    res.on('data', (chunk) => {
                        data += chunk;
                    });
                    res.on('end', () => {
                        const response = JSON.parse(data);
                        resolve(response);
                    });
                } else {
                    reject(res.statusCode);
                }
            });

            req.on('error', reject);

            if (json != null) {
                req.write(JSON.stringify(json));
            }

            req.end();
        });
    }
Note
The Node.js example contains a simple promise wrapper for the built-in https module.
Each request is limited to 10 MB. If you need to transcribe larger audio, consider using external storage. See Alternative Audio Source.
Submitting request
The request is similar to a synchronous request, except that the wait parameter is set to false to
indicate that this is an asynchronous request.
Python

    url = "https://api.prosa.ai/v2/speech/stt"


    def submit_stt_request(filename: str) -> dict:
        with open(filename, "rb") as f:
            b64audio_data = base64.b64encode(f.read()).decode("utf-8")

        payload = {
            "config": {
                "model": "stt-general",
                "wait": False  # Do not wait for the request to complete
            },
            "request": {
                "data": b64audio_data
            }
        }

        response = requests.post(url, json=payload, headers={
            "x-api-key": api_key
        })

        return response.json()
Note
The returned value is an AsrResponse object whose job_id property is used to retrieve the result.
Node.js

    const url = 'https://api.prosa.ai/v2/speech/stt';

    async function submitSttRequest(filename) {
        const audioData = fs.readFileSync(filename);
        const b64audioData = audioData.toString('base64');

        const payload = {
            "config": {
                "model": "stt-general",
                "wait": false  // Do not wait for the request to complete
            },
            "request": {
                "data": b64audioData
            }
        };

        return await request(url, "POST", {
            json: payload,
            headers: {
                "x-api-key": apiKey
            }
        });
    }
Retrieving result
Using the job_id from the AsrResponse object received when submitting the request,
we can retrieve the status and the result of the request. The status describes the progress of the STT request.
We check that the status is complete before returning the result.
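A polling loop without an upper bound can run forever if a job never completes. The sketch below adds a timeout to the status check described above. It is an illustration under stated assumptions: `poll_until_complete` is a hypothetical helper, the default interval and timeout are arbitrary, and `fetch_job` stands for any function that returns the job as a dictionary for a given job_id (for instance, a GET request to the job URL).

```python
import time
from typing import Any, Callable, Optional


def poll_until_complete(
    fetch_job: Callable[[str], dict],
    job_id: str,
    poll_interval: float = 5.0,   # Arbitrary default, in seconds
    timeout: float = 600.0,       # Arbitrary default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Any]:
    """Poll a job until its status is "complete" or the timeout elapses.

    Returns the result data, or None if the job did not complete in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job.get("status") == "complete":
            return job["result"]["data"]
        if time.monotonic() >= deadline:
            return None
        sleep(poll_interval)
```

Passing `fetch_job` and `sleep` as parameters keeps the loop independent of any HTTP library and makes it easy to exercise with canned responses.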