Overview

Developers can use the ELSA Metered API to build applications that evaluate spoken English in voice recordings. The ELSA API provides an easy-to-use development interface for integrating advanced machine learning technology into applications. It can analyze unscripted English speech recordings without a reference transcript, which makes it well suited to spontaneous speech. The ELSA AI technology evaluates spoken English in five major dimensions of communication (pronunciation, prosody, fluency, grammar, and vocabulary) and then provides feedback to help users improve their communication skills.

API Plans

Plan	Standard	Premium	STANDARD Unscripted	PREMIUM Unscripted
Speech type	Scripted speech	Scripted speech	Spontaneous speech	Spontaneous speech
Speaking activities	Word pronunciation, sentence pronunciation, multiple-choice pronunciation	All Standard activities, plus fluency assessment	Spontaneous speech assessment	Spontaneous speech assessment, fluency assessment, IELTS assessment
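If your application needs to decide which plan covers a given activity, the table can be encoded as a small lookup. This is a minimal sketch: `PLANS` and `plans_supporting` are hypothetical client-side names, and the lowercase activity labels are transcribed from the table rather than identifiers used by the ELSA API.

```python
# Plan capabilities transcribed from the API Plans table.
# PLANS and plans_supporting are hypothetical helpers, not part of the ELSA API.
STANDARD_ACTIVITIES = frozenset({
    "word pronunciation",
    "sentence pronunciation",
    "multiple choice pronunciation",
})

# plan name -> (speech type, set of speaking activities)
PLANS = {
    "Standard": ("scripted", STANDARD_ACTIVITIES),
    "Premium": ("scripted", STANDARD_ACTIVITIES | {"fluency assessment"}),
    "STANDARD Unscripted": ("spontaneous", frozenset({"spontaneous speech assessment"})),
    "PREMIUM Unscripted": (
        "spontaneous",
        frozenset({
            "spontaneous speech assessment",
            "fluency assessment",
            "IELTS assessment",
        }),
    ),
}


def plans_supporting(activity: str) -> list[str]:
    """Return the names of the plans whose table row lists the given activity."""
    return [name for name, (_, activities) in PLANS.items() if activity in activities]
```

For example, `plans_supporting("fluency assessment")` lists Premium and PREMIUM Unscripted, the two plans that include fluency assessment.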

API Regions

The ELSA API is deployed in several data centers so that processing happens close to your servers and your users. This keeps latency low when sending speech for processing and improves your users' experience. The API is currently deployed in three regions on the Amazon Web Services (AWS) network. A single API endpoint serves all regions, and each call is routed to the closest data center:

Region	Data center
South-east Asia	Singapore
Europe	Ireland
Americas	East-Coast USA

Code Samples

We will soon provide client code samples in Python, PHP and JavaScript.

Audio recording requirements

ELSA supports audio in most formats used on the web (e.g. .mp3, .wav, .flac, .mp4, .m4a), but we strongly recommend FLAC (lossless compression that uses much less bandwidth than uncompressed audio) or WAV.
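A client can flag files that are not in one of the recommended formats before uploading them. This is a minimal sketch: `format_advice` is a hypothetical helper, and because the API accepts "most formats used on the web", an extension missing from the sample list is reported as "unlisted" rather than unsupported.

```python
from pathlib import Path

# Extensions named in this section; the supported list is only a sample.
RECOMMENDED = {".flac", ".wav"}
MENTIONED = {".mp3", ".wav", ".flac", ".mp4", ".m4a"}


def format_advice(filename: str) -> str:
    """Classify a file by extension: 'recommended', 'supported' or 'unlisted'."""
    ext = Path(filename).suffix.lower()  # ".FLAC" -> ".flac"
    if ext in RECOMMENDED:
        return "recommended"
    if ext in MENTIONED:
        return "supported"
    # Not named here; it may still be accepted by the API.
    return "unlisted"
```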

We internally convert all files to a single channel (mono), a 16 kHz sampling rate and 16-bit resolution. Any deviation from these specifications may result in processing delays.
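Since files that already match the internal format need no conversion, it can help to check WAV files before uploading them. The sketch below uses only Python's standard `wave` module, which reads uncompressed PCM WAV files; `wav_matches_target` is a hypothetical helper name.

```python
import wave

# Internal target format described in this section.
TARGET_CHANNELS = 1                # mono
TARGET_RATE_HZ = 16_000            # 16 kHz
TARGET_SAMPLE_WIDTH_BYTES = 2      # 16 bits


def wav_matches_target(path: str) -> bool:
    """Return True if a PCM WAV file is already mono, 16 kHz and 16-bit."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == TARGET_CHANNELS
            and wav.getframerate() == TARGET_RATE_HZ
            and wav.getsampwidth() == TARGET_SAMPLE_WIDTH_BYTES
        )
```

The `wave` module cannot read compressed formats such as FLAC or MP3. For those, or for WAV files that do not match, a tool such as ffmpeg can produce the target format, for example `ffmpeg -i input.mp3 -ac 1 -ar 16000 -c:a pcm_s16le output.wav`.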

Scripted or unscripted API

We split the ELSA API into two types of calls, depending on the data available for the API to process.

On the one hand, the scripted API treats the provided text as the speech expected in the provided audio. It is most useful for evaluating read speech.
On the other hand, the unscripted API assumes that only the audio is available. It is most useful for analyzing spontaneous speech recordings, but it can also analyze read speech when you do not have the text. Be aware that if the speaker has a strong beginner-level accent, the ASR (automatic speech recognition) module may make some transcription errors, which should be taken into account when interpreting the metrics.

API limitations

Some metrics are only returned when the recording contains a minimum number of words:

Metric	Minimum words
Grammar score	50
Vocabulary score	75

Warning: To obtain these scores from the unscripted API even when the recording is below the minimum word count, include the following flag:

-F force_grammar_vocab=True

Please note that scores computed below the minimum word count may be less accurate than scores for recordings that meet it.
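The word-count rule and the flag can be modelled on the client, for instance to explain to users why a grammar or vocabulary score is missing. This is a sketch under assumptions: `expected_scores` and its status labels are hypothetical, "minimum" is read as "at least this many words", and `word_count` is whatever count you have for the recording (for example from its transcript), which may differ from the API's own count. The flag applies to the unscripted API.

```python
# Minimum word counts from the API limitations table.
MIN_WORDS = {"grammar": 50, "vocabulary": 75}


def expected_scores(word_count: int, force_grammar_vocab: bool = False) -> dict[str, str]:
    """Predict, per metric, whether the unscripted API should return a score.

    "meets_minimum": the word count reaches the table's minimum.
    "forced":        below the minimum, but force_grammar_vocab=True was sent,
                     so a (possibly less accurate) score is expected.
    "omitted":       below the minimum and not forced.
    """
    result = {}
    for metric, minimum in MIN_WORDS.items():
        if word_count >= minimum:
            result[metric] = "meets_minimum"
        elif force_grammar_vocab:
            result[metric] = "forced"
        else:
            result[metric] = "omitted"
    return result
```

With 60 words, for example, grammar meets its minimum while vocabulary is omitted unless the flag is sent. From Python with the `requests` library, the flag would travel as an ordinary form field next to the audio file (`data={"force_grammar_vocab": "True"}` together with `files=...`), mirroring the curl `-F` option; the endpoint URL and the audio field name are not given in this section.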

Maximum File Upload Size

The maximum file size allowed for upload is 100MB. If you need to upload files larger than this limit, please contact our support team for assistance.
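A client can check the file size before uploading instead of waiting for the request to be rejected. This is a minimal sketch: `check_upload_size` is a hypothetical helper, and since this section does not say whether 100MB means 100 * 10**6 or 100 * 2**20 bytes, it uses the smaller, decimal value. If a WAV file is too large, re-encoding it as FLAC (recommended in the audio requirements) is one way to shrink it without losing quality.

```python
import os

# Upload limit from this section, read conservatively as decimal megabytes.
MAX_UPLOAD_BYTES = 100 * 10**6


def check_upload_size(path: str) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(
            f"{path} is {size} bytes; the upload limit is {MAX_UPLOAD_BYTES} bytes"
        )
    return size
```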
