Overview

What's New

The following features are now fully supported using the V2 API:

  • SubRip (srt) subtitle format
  • Custom Dictionary
  • Channel Diarization

What's Changed

For the English language pack only, a new tag, [profanity], is now applied to words that appear on a list of offensive words. The tag appears in the JSON-v2 output only. Customers may use this tag for post-processing, including filtering, obfuscation, and sentiment analysis.
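As an illustration of the obfuscation use case, the sketch below masks tagged words. The transcript structure is an assumption made for this example only (a list of result items, each with an `alternatives` list whose entries have `content` and an optional `tags` list); check the JSON-v2 output you actually receive before relying on these key names. The function name `mask_profanity` is hypothetical.

```python
def mask_profanity(results, mask="****"):
    """Join recognised words into a string, masking any tagged as profanity.

    Assumed (not confirmed) item shape:
    {"alternatives": [{"content": "word", "tags": ["profanity"]}]}
    """
    words = []
    for item in results:
        best = item["alternatives"][0]  # take the first alternative
        if "profanity" in best.get("tags", []):
            words.append(mask)
        else:
            words.append(best["content"])
    return " ".join(words)
```

The same loop could drop the tagged words instead of masking them if filtering is preferred.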

API Endpoint

Use one of these hostnames, as per your contract:

  • EU region: asr.api.speechmatics.com
  • US region: us.asr.api.speechmatics.com
  • Trial: trial.asr.api.speechmatics.com

If you want to use a different region, please contact sales@speechmatics.com.

Authorization headers

Access to the API requires use of an authorization token ('auth token'). In the V1 API your auth token is passed as a query string parameter on the URI. In the V2 API it is passed in an Authorization header, which is the recommended OAuth2 approach.

[info] Auth tokens

Currently there is no way to generate an auth token; if you require a new auth token please contact support@speechmatics.com. In the future we will provide the ability to generate new tokens for the V2 API.
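A minimal sketch of building the Authorization header for a V2 request. It assumes the OAuth2-style `Bearer` scheme, which the text above implies but does not spell out; the helper name `auth_headers` is hypothetical.

```python
def auth_headers(auth_token: str) -> dict:
    """Return HTTP headers that carry the auth token for a V2 API request."""
    if not auth_token:
        raise ValueError("auth_token must be a non-empty string")
    # Assumption: the token is sent using the OAuth2 "Bearer" scheme.
    return {"Authorization": f"Bearer {auth_token}"}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use, so the token no longer appears in the URL.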

User endpoint

There is no longer a /user endpoint. Instead, jobs are referenced using a simpler /jobs endpoint, and user access is controlled using the new Authorization header. V2 API calls use a /v2 path in the URL. For example, requests to submit or refer to jobs now look like this:

https://asr.api.speechmatics.com/v2/jobs/

In future, a separate authentication service will be added to provide capabilities equivalent to /user.
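The following sketch builds /v2/jobs URLs from the hostnames listed under API Endpoint. Appending a job ID directly to the path to refer to a single job is an assumption for illustration, as are the region keys and the helper name `jobs_url`.

```python
# Hostnames copied from the API Endpoint section; the keys are illustrative.
HOSTS = {
    "eu": "asr.api.speechmatics.com",
    "us": "us.asr.api.speechmatics.com",
    "trial": "trial.asr.api.speechmatics.com",
}


def jobs_url(region: str, job_id: str = "") -> str:
    """Build the V2 jobs URL for a region, optionally for one job.

    Assumption: a single job is addressed by appending its ID to the path.
    """
    if region not in HOSTS:
        raise ValueError(f"unknown region: {region!r}")
    return f"https://{HOSTS[region]}/v2/jobs/{job_id}"
```

For example, `jobs_url("eu")` yields the URL shown above.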

Status endpoint

The /status endpoint has been removed.

Speaker diarization

Speaker diarization is now off by default.

Form fields for configuration

A JSON configuration object ('config JSON') replaces the form fields that were previously used for configuration of a job. The audio file name is still specified as a form field, but all other configuration is passed in the config JSON.
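To illustrate the change, the sketch below serialises a config JSON object into a single form field, in place of one form field per setting. The form field name `config` and the keys `type`, `transcription_config` and `language` are assumptions for this example; consult the API reference for the exact schema. The audio file would still be sent as its own form field alongside it.

```python
import json


def build_job_form(language: str) -> dict:
    """Return the non-file form fields for a job submission.

    Assumed (not confirmed by this page): the config JSON is sent in a
    form field called "config" and has the shape shown below.
    """
    config = {
        "type": "transcription",
        "transcription_config": {"language": language},
    }
    return {"config": json.dumps(config)}
```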

Job IDs are strings

We now use a random string job ID value to refer to jobs, rather than an incrementing integer. The Job IDs that you see will look like this: yjbmf9kqub.

Legacy JSON output format dropped

A richer JSON format (json-v2) is now used which provides support for new features. Plain text output (txt) is still available. In the JSON transcript output you will see the following:

"format": "2.4"

ISO 8601 timestamps

Timestamps are now represented in ISO 8601 format, for example: 2018-10-02T13:10:25Z. Coordinated Universal Time (UTC) is used (indicated by the Z suffix).
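A small sketch of parsing such a timestamp into a timezone-aware value with the Python standard library. It assumes whole-second precision and a literal Z suffix, as in the example above; the function name `parse_timestamp` is illustrative.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as 2018-10-02T13:10:25Z into an aware datetime."""
    # The trailing Z means UTC, so attach the UTC timezone explicitly.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)
```

Keeping the value timezone-aware avoids accidental comparison with local naive times.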

Metadata

The meta form parameter is replaced with the tracking element in config JSON. This supports a title, list of tags and a customer-defined JSON object. You can use this information to track jobs through your workflow.
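As a sketch, a tracking element might be assembled like this before being merged into the config JSON. The element name `tracking` comes from the text above; the key names `title`, `tags` and `details` are assumptions chosen to match the three pieces of information described, and the helper name is hypothetical.

```python
import json


def tracking_element(title: str, tags: list, details: dict) -> dict:
    """Build a tracking element for inclusion in the config JSON.

    Assumed key names: "title", "tags", and "details" (the
    customer-defined JSON object).
    """
    return {
        "tracking": {
            "title": title,
            "tags": list(tags),
            "details": dict(details),
        }
    }


# Example: serialise the element to check that it is valid JSON.
example = json.dumps(tracking_element("Call 42", ["sales"], {"client_id": 7}))
```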

Egress IP addresses (for whitelisting)

You may want to whitelist Speechmatics SaaS for the notification callback service to prevent misuse of your endpoints.

Currently, callbacks can come from one of the following addresses, depending on which API endpoint was used.

EU region

40.74.41.91
52.236.157.154
40.74.37.0
52.142.116.223
52.155.88.26
52.142.90.149

US region

52.149.21.32
52.149.21.10
52.137.102.83
40.64.107.92
40.64.107.99

Trial

52.236.149.196
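If you enforce the whitelist in application code rather than at a firewall, a check along these lines may help. The addresses are copied from the lists above; the region keys and the function name `is_speechmatics_callback` are illustrative, and the remote address is assumed to be the real peer address rather than a client-supplied header.

```python
from ipaddress import ip_address

# Callback source addresses per region, as listed above.
CALLBACK_IPS = {
    "eu": {"40.74.41.91", "52.236.157.154", "40.74.37.0",
           "52.142.116.223", "52.155.88.26", "52.142.90.149"},
    "us": {"52.149.21.32", "52.149.21.10", "52.137.102.83",
           "40.64.107.92", "40.64.107.99"},
    "trial": {"52.236.149.196"},
}
ALLOWED = {region: {ip_address(ip) for ip in ips}
           for region, ips in CALLBACK_IPS.items()}


def is_speechmatics_callback(remote_addr: str, region: str) -> bool:
    """Return True if remote_addr is a listed callback address for region."""
    try:
        addr = ip_address(remote_addr)
    except ValueError:
        return False  # not a valid IP address at all
    return addr in ALLOWED.get(region, set())
```

Because the list is described as current, it is worth keeping it in configuration so it can be updated without a code change.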

Supported Languages

The Speechmatics Cloud Offering supports the following languages:

  • Arabic (ar)
  • Bulgarian (bg)
  • Catalan (ca)
  • Croatian (hr)
  • Czech (cs)
  • Danish (da)
  • Dutch (nl)
  • English (en)
  • Finnish (fi)
  • French (fr)
  • German (de)
  • Greek (el)
  • Hindi (hi)
  • Hungarian (hu)
  • Italian (it)
  • Japanese (ja)
  • Korean (ko)
  • Latvian (lv)
  • Lithuanian (lt)
  • Malay (ms)
  • Mandarin (cmn)
  • Norwegian (no)
  • Polish (pl)
  • Portuguese (pt)
  • Romanian (ro)
  • Russian (ru)
  • Slovakian (sk)
  • Slovenian (sl)
  • Spanish (es)
  • Swedish (sv)
  • Turkish (tr)

Languages outside this list are not explicitly supported. Only one language can be processed in each request. Each language above is shown with its language code (a two-letter ISO 639-1 code in most cases; Mandarin uses cmn), which must be provided with every transcription request.

Supported file types

The Speechmatics Cloud Offering supports the following file types for transcription:

  • aac
  • amr
  • avi
  • caf
  • flac
  • flv
  • m4a
  • m4v
  • mkv
  • mov
  • mp3
  • mp4
  • mpeg
  • mpg
  • ogg
  • wav
  • wma
  • wmv

The list above is exhaustive: any file format not listed is not supported.
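A client can reject unsupported files before uploading them. The sketch below checks the filename extension against the list above; it assumes the extension reflects the actual file type, and the function name `is_supported_file` is illustrative.

```python
from pathlib import Path

# File types copied from the list above.
SUPPORTED_TYPES = frozenset({
    "aac", "amr", "avi", "caf", "flac", "flv", "m4a", "m4v", "mkv",
    "mov", "mp3", "mp4", "mpeg", "mpg", "ogg", "wav", "wma", "wmv",
})


def is_supported_file(filename: str) -> bool:
    """Return True if the filename's extension is a listed file type."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_TYPES
```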

Current Limitations

Email Notifications

There is currently no support for email notification of job completion using the V2 API. However, there is full support for notifications using webhooks.

Alignment

Alignment jobs are not supported by the V2 API. If you want to submit alignment jobs then you should continue to do so using the V1 API.

Rate Limiting

Unless agreed otherwise with Speechmatics, the following behaviour will be considered acceptable use of the Cloud Services ASR:

  • The Customer shall limit the rate of submission of files to a maximum of 2 jobs per second with a maximum of 100 jobs in progress at any one time.
  • The Customer shall limit the rate of polling for the status of submitted jobs to a maximum of 20 queries per second (across all jobs).

Speechmatics reserve the right to change the rate limits at any time in order to ensure continuity of service for all customers of the Cloud.

If you believe your use case needs increased limits, please contact support@speechmatics.com.
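One client-side way to stay within the per-second limits above is to space calls evenly. The sketch below is a simple single-threaded throttle, not an official client feature; the class name `Throttle` is hypothetical, and it does not enforce the cap of 100 jobs in progress, which would need separate bookkeeping.

```python
import time


class Throttle:
    """Space successive calls at least 1 / max_per_second seconds apart."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")  # the first call never waits

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds waited."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        self._next_allowed = now + delay + self._interval
        return delay


# Limits taken from the text above.
submit_throttle = Throttle(max_per_second=2)   # job submissions
poll_throttle = Throttle(max_per_second=20)    # status queries, all jobs
```

Call `submit_throttle.wait()` before each submission and `poll_throttle.wait()` before each status query. Even spacing is stricter than the stated limits, which keeps short bursts from exceeding them.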
