Speechmatics ASR REST API
The Speechmatics Automatic Speech Recognition REST API is used to submit ASR jobs and receive the results. The supported job type is transcription of audio files.
Version: 2.6.0
Contact information:
support@speechmatics.com
/jobs
POST
Summary:
Create a new job.
Parameters
Name | Located in | Description | Required | Schema |
Authorization | header | Customer API token | Yes | string |
config | formData | JSON containing a JobConfig model indicating the type and parameters for the recognition job. | Yes | string |
data_file | formData | The data file to be processed. Alternatively, the data file can be fetched from a URL specified in JobConfig. | No | file |
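The parameters above can be assembled into a request description as in the following minimal sketch. It only builds the pieces of the request and does not send anything. The base URL, the `Bearer` prefix on the token, and the function name `build_create_job_request` are assumptions for illustration; the table only states that a customer API token goes in the Authorization header.

```python
import json

# Hypothetical base URL; substitute the host of your own deployment.
BASE_URL = "https://asr.example.com/v2"


def build_create_job_request(api_token, config, audio_path=None):
    """Describe a POST /jobs request as plain Python data (nothing is sent)."""
    # `config` is a formData field holding the JobConfig serialized as a JSON string.
    form = {"config": json.dumps(config)}
    # `data_file` is optional: omit it when JobConfig names a URL to fetch instead.
    files = {} if audio_path is None else {"data_file": audio_path}
    return {
        "method": "POST",
        "url": f"{BASE_URL}/jobs",
        # Assumed "Bearer" scheme; check how your token must be presented.
        "headers": {"Authorization": f"Bearer {api_token}"},
        "form": form,
        "files": files,
    }
```

An HTTP client that supports multipart form uploads could then send `form` as ordinary fields and open the path in `files` as the `data_file` upload.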
Responses
GET
Summary:
List all jobs.
Parameters
Name | Located in | Description | Required | Schema |
Authorization | header | Customer API token | Yes | string |
Responses
/jobs/{jobid}
GET
Summary:
Get job details for a specific job, including progress and any error reports.
Parameters
Name | Located in | Description | Required | Schema |
Authorization | header | Customer API token | Yes | string |
jobid | path | ID of the job. | Yes | string |
Responses
DELETE
Summary:
Delete a job and remove all associated resources.
Parameters
Name | Located in | Description | Required | Schema |
Authorization | header | Customer API token | Yes | string |
jobid | path | ID of the job to delete. | Yes | string |
Responses
/jobs/{jobid}/data
GET
Summary:
Get the data file used as input to a job.
Parameters
Name | Located in | Description | Required | Schema |
Authorization | header | Customer API token | Yes | string |
jobid | path | ID of the job. | Yes | string |
Responses
/jobs/{jobid}/transcript
GET
Summary:
Get the transcript for a transcription job.
Parameters
Name | Located in | Description | Required | Schema |
Authorization | header | Customer API token | Yes | string |
jobid | path | ID of the job. | Yes | string |
format | query | The transcription format (by default the json-v2 format is returned). | No | string |
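As a small illustration of how the `jobid` path parameter and the optional `format` query parameter combine, the sketch below builds the transcript URL. The base URL and the helper name `transcript_url` are hypothetical; `srt` is used as a sample format value because this reference mentions srt conversion, but the full set of accepted values is not listed here.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL; substitute the host of your own deployment.
BASE_URL = "https://asr.example.com/v2"


def transcript_url(job_id, fmt=None):
    """Build the GET /jobs/{jobid}/transcript URL, optionally with ?format=..."""
    # Percent-encode the job ID so it is always a single path segment.
    path = f"{BASE_URL}/jobs/{quote(job_id, safe='')}/transcript"
    if fmt is None:
        # No query string: the service returns its default format (json-v2).
        return path
    return f"{path}?{urlencode({'format': fmt})}"
```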
Responses
Models
ErrorResponse
Name | Type | Description | Required |
code | integer | The HTTP status code. | Yes |
error | string | The error message. | Yes |
detail | string | The details of the error. | No |
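A client might decode an ErrorResponse body along the lines of the sketch below. The dataclass and the `parse_error` helper are illustrative names, not part of the API, and the sample messages in the usage note are made up.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorResponse:
    code: int                     # the HTTP status code (required)
    error: str                    # the error message (required)
    detail: Optional[str] = None  # further details (optional)


def parse_error(body: str) -> ErrorResponse:
    """Decode a JSON error body; raises KeyError if a required field is missing."""
    data = json.loads(body)
    return ErrorResponse(
        code=int(data["code"]),
        error=str(data["error"]),
        detail=data.get("detail"),
    )
```

For example, `parse_error('{"code": 404, "error": "Job not found"}')` yields an object whose `detail` is `None`.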
TrackingData
Name | Type | Description | Required |
title | string | The title of the job. | No |
reference | string | External system reference. | No |
tags | [ string ] | | No |
details | object | Customer-defined JSON structure. | No |
DataFetchConfig
Name | Type | Description | Required |
url | string | | Yes |
auth_headers | [ string ] | A list of additional headers to be added to the input fetch request when using http or https. This is intended to support authentication or authorization, for example by supplying an OAuth2 bearer token. | No |
TranscriptionConfig
Name | Type | Description | Required |
language | string | Language model used to process the audio input, normally specified as an ISO language code. | Yes |
output_locale | string | Language locale to be used when generating the transcription output, normally specified as an ISO language code. | No |
additional_vocab | [ object ] | List of custom words or phrases that should be recognized. Alternative pronunciations can be specified to aid recognition. | No |
punctuation_overrides | | Control punctuation settings. | No |
diarization | string | Specify whether speaker or channel labels are added to the transcript. The default is none. Options: none: no speaker or channel labels are added; speaker: speaker attribution is performed based on acoustic matching, and all input channels are mixed into a single stream for processing; channel: multiple input channels are processed individually and collated into a single transcript; speaker_change: the output indicates when the speaker in the audio changes, with no speaker attribution performed (this is faster than speaker, and the reported speaker changes may not agree with speaker); channel_and_speaker_change: both channel and speaker_change are switched on, and a speaker change is indicated if more than one speaker is recorded in one channel. | No |
speaker_change_sensitivity | float | Ranges between zero and one. Controls how responsive the system is to potential speaker changes. A higher value indicates higher sensitivity. Defaults to 0.4. | No |
channel_diarization_labels | [ string ] | Transcript labels to use when collating separate input channels. | No |
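The constraints in the table above can be checked client-side before a job is submitted. The following is a minimal sketch, not the service's own validation: the function name is hypothetical, and treating the zero-to-one range as inclusive is an assumption.

```python
# Diarization modes listed in the TranscriptionConfig table.
DIARIZATION_MODES = {
    "none",
    "speaker",
    "channel",
    "speaker_change",
    "channel_and_speaker_change",
}


def validate_transcription_config(cfg):
    """Return a list of problems found in a TranscriptionConfig dict."""
    errors = []
    if not cfg.get("language"):
        errors.append("language is required")
    # An absent diarization field means the default, "none".
    mode = cfg.get("diarization", "none")
    if mode not in DIARIZATION_MODES:
        errors.append(f"unknown diarization mode: {mode!r}")
    sensitivity = cfg.get("speaker_change_sensitivity")
    # Assumption: both endpoints of the zero-to-one range are allowed.
    if sensitivity is not None and not 0.0 <= sensitivity <= 1.0:
        errors.append("speaker_change_sensitivity must be between 0 and 1")
    return errors
```

An empty list means no problems were found by these checks; the service may still reject a config for reasons this sketch does not cover.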
NotificationConfig
Name | Type | Description | Required |
url | string | The URL to which a notification message will be sent upon completion of the job. The job id and status are added as query parameters, and any combination of the job inputs and outputs can be included by listing them in contents. If contents is empty, the body of the request will be empty. If only one item is listed, it will be sent as the body of the request with Content-Type set to an appropriate value such as application/octet-stream or application/json. If multiple items are listed, they will be sent as named file attachments using the multipart content type. If contents is not specified, the transcript item will be sent as a file attachment named data_file, for backwards compatibility. If the job was rejected or failed during processing, that will be indicated by the status, and any output items that are not available as a result will be omitted. The body formatting rules will still be followed as if all items were available. The user-agent header is set to Speechmatics-API/2.0, or Speechmatics API V2 in older API versions. | Yes |
contents | [ string ] | Specifies a list of items to be attached to the notification message. When multiple items are requested, they are included as named file attachments. | No |
method | string | The method to be used with http and https URLs. The default is post. | No |
auth_headers | [ string ] | A list of additional headers to be added to the notification request when using http or https. This is intended to support authentication or authorization, for example by supplying an OAuth2 bearer token. | No |
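The body formatting rules described for `url` depend only on how `contents` is specified, which the sketch below restates as a small decision function. The function name and the returned labels ("empty", "single_body", "attachments") are illustrative and not part of the API; a receiver could use the same logic to decide how to parse an incoming callback.

```python
def notification_body_shape(contents=None):
    """Predict how a notification request body is laid out.

    Returns a (kind, names) pair, where kind is one of
    "empty", "single_body", or "attachments".
    """
    if contents is None:
        # contents not specified: the transcript is sent as a file
        # attachment named data_file, for backwards compatibility.
        return ("attachments", ["data_file"])
    if len(contents) == 0:
        # Empty list: the request has no body.
        return ("empty", [])
    if len(contents) == 1:
        # One item: it becomes the whole body, with a matching Content-Type.
        return ("single_body", list(contents))
    # Several items: named file attachments in a multipart body.
    return ("attachments", list(contents))
```

Remember that items which are unavailable because the job was rejected or failed are omitted, while the layout still follows the rule for the full list.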
OutputConfig
Name | Type | Description | Required |
srt_overrides | object | Parameters that override the default values of srt conversion. max_line_length: sets the maximum number of characters per subtitle line, including white space. max_lines: sets the maximum number of lines in a subtitle section. | No |
JobConfig
JSON object that contains various groups of job configuration parameters. Based on the value of type, a type-specific object such as transcription_config is required to be present to specify all configuration settings or parameters needed to process the job inputs as expected.
If the results of the job are to be forwarded on completion, notification_config can be provided with a list of callbacks to be made; no assumptions should be made about the order in which they will occur.
Customer-specific job details or metadata can be supplied in tracking, and this information will be available where possible in the job results and in callbacks.
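To make the grouping concrete, here is a minimal sketch that assembles a JobConfig as a Python dict ready for JSON serialization. The helper name is hypothetical, and the type value "transcription" is an assumption inferred from the transcription_config field name; the callback URL in the usage note is a placeholder.

```python
import json


def make_job_config(language, callback_urls=(), tracking=None):
    """Assemble a JobConfig dict for a transcription job."""
    config = {
        # Assumed type value; it selects which type-specific object is required.
        "type": "transcription",
        "transcription_config": {"language": language},
    }
    if callback_urls:
        # notification_config is a list of callbacks; their order is not guaranteed.
        config["notification_config"] = [{"url": url} for url in callback_urls]
    if tracking:
        # Customer-specific metadata, shaped like TrackingData.
        config["tracking"] = dict(tracking)
    return config
```

For instance, `json.dumps(make_job_config("en", ["https://example.com/hook"], {"title": "Demo"}))` produces a string that could be supplied as the `config` form field when creating a job.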
CreateJobResponse
Name | Type | Description | Required |
id | string | The unique ID assigned to the job. Keep a record of this for later retrieval of your completed job. | Yes |
JobDetails
Document describing a job. JobConfig will be present in JobDetails returned for GET jobs/ request in SaaS and in Batch Appliance, but it will not be present in JobDetails returned as item in RetrieveJobsResponse for the Batch Appliance.
Name | Type | Description | Required |
created_at | dateTime | The UTC date time the job was created. | Yes |
data_name | string | Name of the data file submitted for the job. | Yes |
duration | integer | The file duration (in seconds). May be missing for fetch URL jobs. | No |
id | string | The unique ID assigned to the job. | Yes |
status | string | The status of the job. running: the job is actively running. done: the job completed successfully. rejected: the job was accepted at first, but later could not be processed by the transcriber. deleted: the user deleted the job. expired: the system deleted the job, usually because it was in the done state for a very long time. | Yes |
config | JobConfig | | No |
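Since `running` is the only status listed that is not final, a client can poll the job details until the status changes. The sketch below shows one way to do that under stated assumptions: `fetch_details` is a caller-supplied function (hypothetical, not part of the API) that returns a JobDetails dict for a job ID, and the poll interval and limit are arbitrary defaults.

```python
import time

# Every status in the JobDetails table other than "running".
TERMINAL_STATUSES = {"done", "rejected", "deleted", "expired"}


def wait_for_job(fetch_details, job_id, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll until the job leaves the running state, then return its details."""
    for _ in range(max_polls):
        details = fetch_details(job_id)
        if details["status"] in TERMINAL_STATUSES:
            return details
        # Still running: wait before asking again.
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the function easy to test without real delays. Where callbacks are available, notification_config avoids polling altogether.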
RetrieveJobsResponse
Name | Type | Description | Required |
jobs | [ JobDetails ] | | Yes |
RetrieveJobResponse
DeleteJobResponse
JobInfo
Summary information about an ASR job, to support identification and tracking.
Name | Type | Description | Required |
created_at | dateTime | The UTC date time the job was created. | Yes |
data_name | string | Name of the data file submitted for the job. | Yes |
duration | integer | The data file audio duration (in seconds). | Yes |
id | string | The unique ID assigned to the job. | Yes |
tracking | TrackingData | | No |
RecognitionMetadata
Summary information about the output from an ASR job, comprising the job type and configuration parameters used when generating the output.
Name | Type | Description | Required |
created_at | dateTime | The UTC date time the transcription output was created. | Yes |
type | string | | Yes |
transcription_config | TranscriptionConfig | | No |
output_config | OutputConfig | | No |
RecognitionDisplay
Name | Type | Description | Required |
direction | string | | Yes |
RecognitionAlternative
List of possible job output item values, ordered by likelihood.
Name | Type | Description | Required |
content | string | | Yes |
confidence | float | | Yes |
language | string | | Yes |
display | RecognitionDisplay | | No |
speaker | string | | No |
tags | [ string ] | | No |
RecognitionResult
An ASR job output item. The primary item types are word and punctuation. Other item types may be present, for example to provide semantic information of different forms.
Name | Type | Description | Required |
channel | string | | No |
start_time | float | | Yes |
end_time | float | | Yes |
is_eos | boolean | Whether the punctuation mark is an end of sentence character. Only applies to punctuation marks. | No |
type | string | New types of items may appear without being requested; unrecognized item types can be ignored. | Yes |
alternatives | [ RecognitionAlternative ] | | No |
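As an illustration of consuming these items, the sketch below flattens a list of RecognitionResult dicts into plain text. It takes the first alternative of each item, on the basis that alternatives are ordered by likelihood, and skips item types it does not recognize. The function name and the spacing rule (punctuation attaches to the preceding word) are choices made for this example.

```python
def results_to_text(results):
    """Join word and punctuation items into a single string."""
    parts = []
    for item in results:
        alternatives = item.get("alternatives") or []
        if not alternatives:
            continue  # nothing to render for this item
        # Alternatives are ordered by likelihood, so take the first.
        content = alternatives[0]["content"]
        if item["type"] == "punctuation" and parts:
            # Attach punctuation to the previous token without a space.
            parts[-1] += content
        elif item["type"] in ("word", "punctuation"):
            parts.append(content)
        # Any other item type is ignored.
    return " ".join(parts)
```

The same loop could read `is_eos` on punctuation items to split the text into sentences.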
RetrieveTranscriptResponse