
Unscripted API info

Introduction

Executing the unscripted API via cURL (batch) or WebSocket (streaming) returns the same result. Here we show the result returned for the example sentence, together with an explanation of what each field means.
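
As a minimal sketch of the batch mode: the endpoint URL, header names, and payload layout below are illustrative placeholders, not the documented API contract; consult your credentials and the API reference for the real values. The snippet only builds the request; sending it is a single `urllib.request.urlopen(req)` call.

```python
import urllib.request

# Hypothetical endpoint and key -- placeholders, not the real API contract.
API_URL = "https://api.example.com/v1/unscripted"
API_KEY = "<your-api-key>"

def build_batch_request(audio_bytes: bytes, api_plan: str = "premium") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the raw audio."""
    return urllib.request.Request(
        API_URL + "?api_plan=" + api_plan,
        data=audio_bytes,
        headers={
            "Authorization": "Bearer " + API_KEY,
            "Content-Type": "audio/wav",  # assumed content type
        },
        method="POST",
    )

req = build_batch_request(b"\x00" * 16)  # stand-in for real WAV bytes
print(req.get_method(), req.full_url)
```

The streaming WebSocket mode returns the same JSON structure; only the transport differs.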

IMPORTANT: The output varies depending on the api_plan you select. Below we show the properties that appear in the “premium” plan.

Feedback overview

Overall, the feedback is a hierarchical, nested structure with “utterance” -> “word” -> “phoneme” levels. Each level contains parameters computed at that level plus a list of child items (e.g. an utterance contains a list of words, and a word contains a list of phonemes). A special case is word stress, which is added inside each word, alongside the phonemes structure.
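
To make the nesting concrete, the sketch below walks a hand-made stub of that hierarchy from speaker down to phoneme. The stub is not real API output, and the inner "words"/"phonemes"/"stress" field names are assumptions based on the description above, not the documented schema:

```python
# Hand-made stub of the hierarchy (speaker -> utterance -> word -> phoneme).
# Field names inside the word level are illustrative assumptions.
stub = {
    "speakers": [{
        "speaker_id": "0",
        "utterances": [{
            "utterance_id": 0,
            "text": "hello",
            "words": [{
                "text": "hello",
                "stress": {},  # word stress sits alongside the phonemes
                "phonemes": [{"phoneme": "HH"}, {"phoneme": "AH"},
                             {"phoneme": "L"}, {"phoneme": "OW"}],
            }],
        }],
    }],
}

rows = []
for speaker in stub["speakers"]:
    for utt in speaker.get("utterances", []):
        for word in utt.get("words", []):
            phones = [p["phoneme"] for p in word.get("phonemes", [])]
            rows.append((speaker["speaker_id"], word["text"], phones))

print(rows)
```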

A description of each field in the JSON response can be found below:

{
"speakers":"[{}] list of speakers found in the audio. By default the API will only consider a single speaker."
[
{
"speaker_id": "<string> Unique identifier of the current speaker, defaults to 0.",
"total_time": "<float> Total time in seconds the current speaker has spoken",
"metrics":"{} global metrics for the current speaker"
{
"general_scores":
{
"elsa":
{
"eps_score": "<float> Global ELSA Proficiency Score, in percentage, which breaks down into the scores below:",
"eps_decision": "<string> Evaluation about the EPS score",
"pronunciation_score": "<float> ELSA pronunciation score, in percentage",
"pronunciation_decision": "<string> Evaluation about the pronunciation score",
"intonation_score": "<float> ELSA intonation score, in percentage",
"intonation_decision": "<string> Evaluation about the intonation score",
"fluency_score": "<float> ELSA fluency score, in percentage",
"fluency_decision": "<string> Evaluation about the fluency score",
"grammar_score": "<float> ELSA grammar score, in percentage",
"grammar_decision": "<string> Evaluation about the grammar score",
"vocabulary_score": "<float> ELSA vocabulary score, in percentage",
"vocabulary_decision": "<string> Evaluation about the vocabulary score",
},
"cefr":"{} CEFR estimates"
{
"overall_cefr": "<string> Overall CEFR level (one of the 6 categorical values: A1, A2, B1, B2, C1, C2)",
"pronunciation_cefr": "<string> CEFR estimate in pronunciation",
"intonation_cefr": "<string> CEFR estimate in intonation",
"fluency_cefr": "<string> CEFR estimate in fluency",
"grammar_cefr": "<string> CEFR estimate in grammar",
"vocabulary_cefr": "<string> CEFR estimate in vocabulary"
},
"other_scores":"Other score estimates that map the EPS score with major scores in the English learning industry"
{
"ielts_score": "<int> Estimate for IELTS score, speaking part",
"toefl_score": "<int> Estimate for the TOEFL iBT score from ETS",
"pte_score": "<int> Estimate for the PTE score From Pearson"
}
},
"other_metrics":"{} Global Metrics obtained for some of the dimensions measured by the API"
{
"pronunciation":
{
"advanced_pronunciation_score":"{} Analysis of how much the user uses advanced pronunciation sounds"
},
"fluency":"{} Fluency-related scores"
{
"words_per_minute": "<float> Average words per minute",
"words_per_minute_min": "<float> Lowest words-per-minute across segments (computed per sentence for longer sentences, and per group of 2-3 sentences for shorter ones)",
"words_per_minute_max": "<float> Highest words-per-minute across segments, computed as above",
"pausing_score": "<float> ELSA pausing score, in percentage",
},
"vocabulary":
{
"total_words_count": "<int> total number of spoken words",
"unique_words_count": "<int> number of unique words",
"uncommon_words_count": "<int> number of uncommon words after subtracting the top 3K most common words in English",
"cefr_distribution": "[{}] Percentage of words in each CEFR level (unique counts, i.e. each word counts only once)."
[
{
"cefr_level": "<string>",
"percentage": "<float> Percentage of that cefr level"
},
{
"cefr_level": "<string>",
"percentage": "<float> Percentage of that cefr level"
},
{
"cefr_level": "<string>",
"percentage": "<float> Percentage of that cefr level"
},
{
"cefr_level": "<string>",
"percentage": "<float> Percentage of that cefr level"
},
{
"cefr_level": "<string>",
"percentage": "<float> Percentage of that cefr level"
},
{
"cefr_level": "<string>",
"percentage": "<float> Percentage of that cefr level"
}
]
}
}
},
"feedbacks":"{} global feedback for the current speaker"
{
"show_transcript": "<bool> Flag computed by the ELSA API suggesting whether the transcript is of good enough quality to be shown. When the system is not sure the transcript is correct it will set this flag to false, otherwise it will be true.",
"pronunciation":"{} Pronunciation-related feedback"
{
"top_errors":" [{}] Top 5 pronunciation errors"
[
{
"phoneme": "<string> Phoneme we are giving feedback for",
"count": "<int>",
"errors":"[{}] List of errors found for that phoneme"
[
{
"type": "<string> “insertion”, “deletion”, “substitution”",
"error_phoneme":"<string> only in substitutions, indicates the phoneme the user actually said",
"count": "<int> How many times this error was detected",
"examples":"[{}] Examples of words containing the error"
[
{
"text": "<string>",
"start_index": "<int> global index",
"end_index": "<int> global index"
}
]
}
]
}
]
},
"intonation":"{} Intonation-related feedback"
{
"word_prominence_items": "[{}] List of the pronounced words with low and high prominence"
[
{
"start_index": "<int> Start character (global index)",
"end_index": "<int> End character (global index)",
"expected_prominence": "<string> Expected prominence of the word",
"original_prominence": "<string> Prominence originally identified in the transcript, which is considered correct",
"decision": null
}
]
},
"grammar":"{} Grammar-related feedback"
{
"items":"[{}] List of grammar errors found"
[
{
"start_index": "<int> Start character (global index)",
"end_index": "<int> End character (global index)",
"original": "<string> Original text identified in the transcript, which is considered incorrect",
"suggestion": "<string> Proposed replacement",
"type":"<string-categorical> Indicates the type of error we are showing in this entry"
},
{
"start_index": 20,
"end_index": 22,
"original": "ear",
"suggestion": "ears"
}
]
},
"vocabulary":"{} Vocabulary-related feedback"
{
"items":"[{}] List of vocabulary recommendations"
[
{
"start_index": "<int> Start character (global index)",
"end_index": "<int> End character (global index)",
"original": "<string> Original text we are proposing to replace",
"suggestion": "<string> Replacement suggestion",
"type": "<string-categorical> synonym/informal expression/offensive/..."
}
],

"top_cefr_words":"[{}] List with the top 5 words used according to their CEFR level"
[
{
"word": "<string>",
"cefr_level": "<string>"
}
]
}
},
"utterances":"[{}] In-depth local analysis per utterance"
[
{
"utterance_id": "<int> Index of the utterance",
"start_time": "<float> Utterance start time in seconds",
"end_time": "<float> Utterance end time in seconds",
"start_index": "<int> Character index where the sentence starts (global index)",
"end_index": "<int> Character index where the sentence ends (global index)",
"text": "<string> Transcribed text",
"result":"The rest of the properties are the same as those returned by the ELSA scripted API v3 in “utterance[0]”, as described in this document."
},
]
}
],
"transcript": "Full transcript of the audio. All character indexes in this API refer to the position of characters in this transcript, i.e. at global level.",
"timeline":"Helper structure with overall information of the sentences spoken by each speaker found in the audio. By default the API only processes one single speaker"
[
{
"speaker_id": "<string> unique identifier of the speaker, as used in the “speakers” structure",
"utterance_id": "<int> Position of the utterance in the speaker's utterance list",
"start_time": "<float> start time of the sentence",
"end_time": "<float> end time of the sentence",
"start_index": "<int> Starting index of the sentence (global index)",
"end_index": "<int> Ending index of the sentence (global index)",
"type": "<categorical-string> type of content, one of: “speech”, “overlapped_speech”, “music”, ..."
},
],
"api_version": "<string> Version of the API (includes the version of the scripted part, at utterance level)",
"api_plan": "<string-categorical> API tier used when processing this audio",
"recording_quality": "<string-categorical> any noise or volume problems spotted with the audio. See below for possible values.",
"assessment_quality": "<string-categorical> informs whether the amount of speech was sufficient to perform all or only some assessments. See below for possible values.",
"total_time": "<float> Total length of the audio in seconds",
"success": "<bool> Whether the call was successful",
"message": "<string> (optional) Present when the API has a message for the user, most importantly when there are problems."
}
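
As a quick parsing sketch, the snippet below pulls the global ELSA scores from a stubbed response and resolves a grammar item's character indexes against the top-level transcript. The document's own grammar example ("ear" with start_index 20 and end_index 22) suggests the end index is inclusive; that reading is assumed here, and the stubbed values are hand-made, not real API output:

```python
# Stubbed response containing only the fields needed for this sketch.
response = {
    "transcript": "He has two ear and a nose.",
    "speakers": [{
        "speaker_id": "0",
        "metrics": {"general_scores": {"elsa": {
            "eps_score": 82.5,
            "pronunciation_score": 88.0,
        }}},
        "feedbacks": {"grammar": {"items": [{
            "start_index": 11, "end_index": 13,
            "original": "ear", "suggestion": "ears",
        }]}},
    }],
    "success": True,
}

speaker = response["speakers"][0]
elsa = speaker["metrics"]["general_scores"]["elsa"]
print(f"EPS: {elsa['eps_score']:.1f}%")

# All indexes in the response are global, i.e. offsets into
# response["transcript"]; the end index is treated as inclusive.
corrections = []
for item in speaker["feedbacks"]["grammar"]["items"]:
    span = response["transcript"][item["start_index"]:item["end_index"] + 1]
    corrections.append((span, item["suggestion"]))

print(corrections)
```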

Field options

Some fields above are categorical (they take one of a fixed set of values). Here we define them in detail.

assessment_quality

Informs whether the amount of speech was sufficient to perform all or only some assessments.

Value        Description
ok           All good.
too_short    Not enough spoken audio to obtain a good result. We still return some analysis of what we received. (No fluency or intonation score.)
short        Not enough spoken audio to obtain a good result. We still return some analysis of what we received (more than in the too_short case). (No grammar or vocabulary score.)
unint        Unintelligible speech in some or all segments. We received some text from the ASR, but its confidence was low and it was filtered out.
no_speech    No speech detected; nothing returned from the ASR.

recording_quality

Any noise or volume problems spotted with the audio. See below for possible values.

Value    Description
null     Unable to calculate; this most likely means the audio is extremely short.
ok       All good.
loud     Saturated audio.
quiet    Volume too low.
noisy    Low SNR (signal-to-noise ratio).
mixed    Mixed issues.
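
Putting the two categorical fields to use, the sketch below maps them to user-facing warnings. The value lists mirror the tables above, but which scores go missing for unint/no_speech, and the gating policy itself, are illustrative assumptions rather than documented behaviour:

```python
# Scores the assessment_quality tables above say are unavailable per value.
# The "unint"/"no_speech" sets are assumptions (the table does not list
# missing scores for them), as is the overall gating policy.
MISSING_SCORES = {
    "ok": set(),
    "too_short": {"fluency", "intonation"},
    "short": {"grammar", "vocabulary"},
    "unint": {"pronunciation", "intonation", "fluency", "grammar", "vocabulary"},
    "no_speech": {"pronunciation", "intonation", "fluency", "grammar", "vocabulary"},
}

# Human-readable messages for each recording_quality value.
RECORDING_ISSUES = {
    None: "audio too short to assess recording quality",
    "ok": None,
    "loud": "audio is saturated",
    "quiet": "volume is too low",
    "noisy": "low signal-to-noise ratio",
    "mixed": "multiple recording issues",
}

def quality_warnings(response: dict) -> list:
    """Collect human-readable warnings from the two quality fields."""
    warnings = []
    missing = MISSING_SCORES.get(response.get("assessment_quality"), set())
    if missing:
        warnings.append("scores unavailable: " + ", ".join(sorted(missing)))
    issue = RECORDING_ISSUES.get(response.get("recording_quality"))
    if issue:
        warnings.append(issue)
    return warnings

print(quality_warnings({"assessment_quality": "too_short",
                        "recording_quality": "quiet"}))
```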