Developers can use the ELSA Metered API to create applications that evaluate the accuracy of spoken English in a voice recording. The ELSA API offers an easy-to-use development interface for integrating advanced machine learning technology into applications. The API can analyze unscripted English speech recordings without a prior transcript, making it ideal for analyzing spontaneous speech. The ELSA AI technology evaluates spoken English along five major dimensions of communication: pronunciation, prosody, fluency, grammar, and vocabulary. It then provides feedback to help users enhance their communication skills.
|Plan|Standard|Premium|STANDARD Unscripted|PREMIUM Unscripted|
|---|---|---|---|---|
|Speech type|Scripted speech|Scripted speech|Spontaneous speech|Spontaneous speech|
The ELSA API is deployed in several data centers to ensure sufficient proximity to your servers and your users. This results in low latency when sending speech for processing and improves your users' experience. We are currently deployed in 3 different regions on the Amazon AWS network. A single API endpoint is used for all regions, and each call is routed to the closest data center:
Recording audio requirements
ELSA supports audio in most formats used on the web (e.g. .mp3, .wav, .flac, .mp4, .m4a), but we strongly recommend using .flac (lossless compression, much lighter on bandwidth) or .wav.
We internally convert all files to single channel (mono), 16 kHz sampling rate, and 16-bit resolution. Sending anything in a different format will only delay processing. Files with a lower resolution or sampling rate will likely underperform (e.g. 8 kHz files are known to lose important information about fricative sounds).
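To avoid the server-side conversion step, you can produce audio that already matches the internal target format. The sketch below (using only the Python standard library; the tone generation is purely illustrative) writes a mono, 16 kHz, 16-bit WAV file and verifies its parameters before upload:

```python
import math
import struct
import wave

# ELSA's internal target format: mono, 16 kHz, 16-bit.
SAMPLE_RATE = 16000   # 16 kHz
SAMPLE_WIDTH = 2      # 16 bits = 2 bytes
CHANNELS = 1          # mono

# One second of a 440 Hz sine tone as signed 16-bit PCM frames
# (a stand-in for real recorded speech).
frames = b"".join(
    struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)))
    for n in range(SAMPLE_RATE)
)

with wave.open("sample_16k_mono.wav", "wb") as out:
    out.setnchannels(CHANNELS)
    out.setsampwidth(SAMPLE_WIDTH)
    out.setframerate(SAMPLE_RATE)
    out.writeframes(frames)

# Check the file matches the recommended parameters before uploading.
with wave.open("sample_16k_mono.wav", "rb") as check:
    print(check.getnchannels(), check.getsampwidth(), check.getframerate())
```

In a real pipeline you would resample existing recordings (e.g. with a tool such as ffmpeg or an audio library) rather than synthesize audio, but the output parameters to aim for are the same.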
Scripted or unscripted API
We split the ELSA API into two types of calls, depending on the data available for the API to process.
On the one hand, the scripted API assumes that the provided audio is a recording of the provided text being spoken. This API is usually most useful for evaluating read speech.
On the other hand, the unscripted API assumes that only audio is available. This is most useful when analyzing spontaneous speech recordings, but it can also analyze read speech for which you do not have the text available. Note that if the English in the recording is spoken at a very beginner level, the ASR (automatic speech recognition) module will probably make some transcription mistakes, which should be taken into account when interpreting the metrics.
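The practical difference between the two call types is simply whether a transcript accompanies the audio. The sketch below assembles the two request payloads; the field names (`audio_file`, `sentence`) and file names are placeholders of our own choosing, not the real ELSA API parameters, so consult the API reference for the actual endpoint and field names:

```python
from typing import Optional


def build_request(audio_path: str, transcript: Optional[str] = None) -> dict:
    """Assemble a hypothetical ELSA request payload.

    Scripted calls pair the audio with the text being read;
    unscripted calls send the audio alone and rely on ASR.
    """
    files = {"audio_file": audio_path}          # placeholder field name
    data = {}
    if transcript is not None:
        data["sentence"] = transcript           # scripted: text spoken in the audio
    return {"files": files, "data": data}


# Scripted: the speaker reads a known sentence aloud.
scripted = build_request("reading.wav", transcript="The quick brown fox.")

# Unscripted: spontaneous speech, no transcript available.
unscripted = build_request("interview.wav")
```

The same audio-handling code can therefore serve both call types; only the presence of the transcript field changes.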
Certain minimum conditions must be met to obtain results for specific metrics. These metrics take the audio content into consideration:
Warning: If you want to receive these results from the unscripted API even when this minimum threshold is not met, you just have to add the flag:

Keep in mind that these results won't be as accurate as those that pass the minimum threshold.