Unscripted API info
Introduction
Executing the unscripted API via cURL (batch) or via websocket (streaming) returns the same result. Below, we describe the structure of the result returned for the example sentence, together with an explanation of what each field means.
IMPORTANT: Depending on the api_plan you select, the output will vary. Below, we list the properties available in the 'premium' plan.
Feedback overview
Overall, the feedback has a hierarchically nested structure with multiple levels: 'utterance' -> 'word' -> 'phoneme'. At each level, we provide the computed parameters and a list of child items (e.g., an utterance contains a list of words, and a word contains a list of phonemes). Word stress is a special case: it is included within each word, alongside the phoneme structure.
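To make the nesting concrete, the following Python sketch walks speaker -> utterance -> word -> phoneme and yields every phoneme. The word and phoneme lists live inside each utterance's "result", which follows the scripted API v3 format and is not documented here, so the key names `words`, `phonemes` and `text` at those levels are hypothetical placeholders, as is the helper name `iter_phonemes`.

```python
from typing import Any, Iterator

# Hypothetical key names: the word and phoneme lists live inside each
# utterance's "result" (scripted API v3 format), whose exact keys are not
# documented here. Adjust these two constants to the real field names.
WORDS_KEY = "words"
PHONEMES_KEY = "phonemes"


def iter_phonemes(response: dict[str, Any]) -> Iterator[tuple[str, str, dict[str, Any]]]:
    """Walk speaker -> utterance -> word -> phoneme and yield each phoneme
    together with its speaker_id and the text of the word it belongs to."""
    for speaker in response.get("speakers", []):
        for utterance in speaker.get("utterances", []):
            result = utterance.get("result") or {}
            for word in result.get(WORDS_KEY, []):
                for phoneme in word.get(PHONEMES_KEY, []):
                    yield speaker.get("speaker_id", "0"), word.get("text", ""), phoneme


# Toy response, made up purely to exercise the traversal.
toy = {
    "speakers": [{
        "speaker_id": "0",
        "utterances": [{
            "utterance_id": 0,
            "text": "hi",
            "result": {"words": [{"text": "hi", "phonemes": [{"text": "h"}, {"text": "ay"}]}]},
        }],
    }],
}
print([(word, ph["text"]) for _, word, ph in iter_phonemes(toy)])
# [('hi', 'h'), ('hi', 'ay')]
```

Using a generator keeps the traversal lazy, so long recordings with many utterances do not need an intermediate flattened list.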
A description of each field in the JSON response is provided below:
{
"speakers":"[{}] List of speakers found in the audio. By default, the API considers only a single speaker."
[
{
"speaker_id": "<string> Unique identifier of the current speaker, defaults to 0.",
"total_time": "<float> Total time, in seconds, during which the current speaker spoke",
"metrics":"{} global metrics for the current speaker"
{
"general_scores":
{
"elsa":
{
"eps_score": "<float> Global ELSA Proficiency Score (EPS), in percentage, broken down into the component scores below",
"eps_decision": "<string> Evaluation about the global ELSA Proficiency Score",
"pronunciation_score": "<float> ELSA pronunciation score, in percentage",
"pronunciation_decision": "<string> Evaluation about the pronunciation score",
"intonation_score": "<float> ELSA intonation score, in percentage",
"intonation_decision": "<string> Evaluation about the intonation score",
"fluency_score": "<float> ELSA fluency score, in percentage",
"fluency_decision": "<string> Evaluation about the fluency score",
"grammar_score": "<float> ELSA grammar score, in percentage",
"grammar_decision": "<string> Evaluation about the grammar score",
"vocabulary_score": "<float> ELSA vocabulary score, in percentage",
"vocabulary_decision": "<string> Evaluation about the vocabulary score"
},
"cefr": "CEFR estimates"
{
"overall_cefr": "<string> Overall CEFR level (one of the 6 CEFR levels)",
"pronunciation_cefr": "<string> CEFR estimate in pronunciation",
"intonation_cefr": "<string> CEFR estimate in intonation",
"fluency_cefr": "<string> CEFR estimate in fluency",
"grammar_cefr": "<string> CEFR estimate in grammar",
"vocabulary_cefr": "<string> CEFR estimate in vocabulary"
},
"other_scores":"Estimates that map the EPS score onto the scales of major English proficiency tests"
{
"ielts_score": "<int> Estimated IELTS score (Speaking section)",
"toefl_score": "<int> Estimated TOEFL iBT score (ETS)",
"pte_score": "<int> Estimated PTE score (Pearson)"
}
},
"other_metrics":"{} Global metrics for some of the dimensions measured by the API"
{
"pronunciation":
{
"advanced_pronunciation_score":"{} Analysis of how much the user uses advanced pronunciation sounds"
},
"fluency":"{} Fluency-related scores"
{
"words_per_minute": "<float> Average words per minute",
"words_per_minute_min": "<float> Lowest words per minute across segments (a segment is a single sentence for longer sentences, or 2-3 sentences for shorter ones)",
"words_per_minute_max": "<float> Highest words per minute across segments, computed as above",
"pausing_score": "<float> ELSA pausing score, in percentage"
},
"vocabulary":
{
"total_words_count": "<int> total number of spoken words",
"unique_words_count": "<int> number of unique words",
"uncommon_words_count": "<int> Number of uncommon words, i.e. words not among the 3,000 most common English words",
"cefr_distribution": "[{}] Percentage of words at each CEFR level, computed over unique words (each distinct word counts once)."
[
{
"cefr_level": "<string>",
"percentage": "<float> Percentage of words at that CEFR level"
},
{
"cefr_level": "<string>",
"percentage": "<float> Percentage of words at that CEFR level"
},
{
"cefr_level": "<string>",
"percentage": "<float> Percentage of words at that CEFR level"
},
{
"cefr_level": "<string>",
"percentage": "<float> Percentage of words at that CEFR level"
},
{
"cefr_level": "<string>",
"percentage": "<float> Percentage of words at that CEFR level"
},
{
"cefr_level": "<string>",
"percentage": "<float> Percentage of words at that CEFR level"
}
]
}
}
},
"feedbacks":"{} global feedback for the current speaker"
{
"show_transcript": "<bool> Flag computed by the ELSA API suggesting whether the transcript is of good enough quality to be shown. When the system is not sure the transcript is correct it will set this flag to false, otherwise it will be true.",
"pronunciation":"{} Pronunciation-related feedback"
{
"top_errors":"[{}] Top 5 pronunciation errors"
[
{
"phoneme": "<string> Phoneme we are giving feedback for",
"count": "<int>",
"errors":"[{}] List of errors found for that phoneme"
[
{
"type": "<string> One of “insertion”, “deletion”, “substitution”",
"error_phoneme":"<string> Only present for substitutions; indicates the phoneme the user actually said",
"count": "<int> How many times this error was detected",
"examples":"[{}] Examples of words containing the error"
[
{
"text": "<string>",
"start_index": "<int> global index",
"end_index": "<int> global index"
}
]
}
]
}
]
},
"intonation":"{} Intonation-related feedback"
{
"word_prominence_items": "[{}] List of pronounced words with their prominence (low or high)"
[
{
"start_index": "<int> Start character (global index)",
"end_index": "<int> End character (global index)",
"expected_prominence": "<string> Prominence expected for this word",
"original_prominence": "<string> Original intonation identified in the transcript, which is considered correct",
"decision": null
}
]
},
"grammar":"{} Grammar-related feedback"
{
"items":"[{}] List of grammar errors found"
[
{
"start_index": "<int> Start character (global index)",
"end_index": "<int> End character (global index)",
"original": "<string> Original text identified in the transcript, which is considered incorrect",
"suggestion": "<string> Proposed replacement",
"type":"<string-categorical> Indicates the type of error we are showing in this entry"
},
{
"start_index": 20,
"end_index": 22,
"original": "ear",
"suggestion": "ears"
}
]
},
"vocabulary":"{} Vocabulary-related feedback"
{
"items":"[{}] List of vocabulary recommendations"
[
{
"start_index": "<int> Start character (global index)",
"end_index": "<int> End character (global index)",
"original": "<string> Original text we are proposing to replace",
"suggestion": "<string> Replacement suggestion",
"type": "<string-categorical> synonym/informal expression/offensive/..."
}
],
"top_cefr_words":"[{}] List of the top 5 words used, selected by their CEFR level"
[
{
"word": "<string>",
"cefr_level": "<string>"
}
]
}
},
"utterances":"[{}] In-depth local analysis per utterance"
[
{
"utterance_id": "<int> Index of the utterance",
"start_time": "<float> Utterance start time in seconds",
"end_time": "<float> Utterance end time in seconds",
"start_index": "<int> Character index where the sentence starts (global index)",
"end_index": "<int> Character index where the sentence ends (global index)",
"text": "<string> Transcribed text",
"result":"The remaining properties are the same as those returned by the ELSA scripted API v3 in “utterance[0]”, as described in this document."
}
]
}
],
"transcript": "Full transcript of the audio. All character indexes in this API refer to the position of characters in this transcript, i.e. at global level.",
"timeline":"[{}] Helper structure with an overview of the sentences spoken by each speaker found in the audio. By default, the API processes only a single speaker."
[
{
"speaker_id": "<string> Unique identifier of the speaker, as used in the “speakers” structure",
"utterance_id": "<int> Position of the utterance in that speaker's “utterances” list",
"start_time": "<float> Start time of the sentence, in seconds",
"end_time": "<float> End time of the sentence, in seconds",
"start_index": "<int> Starting character index of the sentence (global index)",
"end_index": "<int> Ending character index of the sentence (global index)",
"type": "<categorical-string> Type of content, one of: “speech”, “overlapped_speech”, “music”, ..."
}
],
"api_version": "<string> Version of the API (includes the version of the scripted API used at utterance level)",
"api_plan": "<string-categorical> API tier used when processing this audio",
"recording_quality": "<string-categorical> Any noise or volume problems detected in the audio. See below for possible values.",
"assessment_quality": "<string-categorical> Whether the amount of speech was sufficient to perform all assessments or only some. See below for possible values.",
"total_time": "<float> Total length of the audio in seconds",
"success": "<bool> Whether the call was successful",
"message": "<string> (optional) Message for the user, most important when problems occurred"
}
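Because the output varies with api_plan and some fields may be null or absent, a client reading this structure may want tolerant accessors. The Python sketch below reads a few headline fields for the first speaker and uses the global character indexes to pull text out of "transcript". The helper names (`dig`, `summarize`, `span_text`) are hypothetical, and treating end_index as inclusive is an assumption based on the grammar example above, where indexes 20 to 22 cover the three characters of "ear"; verify it against real responses.

```python
from typing import Any


def dig(obj: Any, *path: Any, default: Any = None) -> Any:
    """Follow a path of dict keys and list indexes; return default if any step is missing."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


def summarize(response: dict[str, Any]) -> dict[str, Any]:
    """Pick a few headline fields for the first speaker, tolerating plan-dependent gaps."""
    if not response.get("success"):
        raise RuntimeError(response.get("message") or "ELSA API call was not successful")
    scores = dig(response, "speakers", 0, "metrics", "general_scores", default={}) or {}
    return {
        "eps_score": dig(scores, "elsa", "eps_score"),
        "overall_cefr": dig(scores, "cefr", "overall_cefr"),
        "ielts_score": dig(scores, "other_scores", "ielts_score"),
        "assessment_quality": response.get("assessment_quality"),
        "recording_quality": response.get("recording_quality"),
    }


def span_text(response: dict[str, Any], item: dict[str, Any]) -> str:
    """Return the transcript characters covered by an item's global indexes.

    Assumes end_index is inclusive; drop the "+ 1" if it turns out to be exclusive.
    """
    return response["transcript"][item["start_index"]: item["end_index"] + 1]


# Made-up response containing only the fields this sketch reads.
resp = {
    "success": True,
    "transcript": "She covered her two ear with her hands.",
    "assessment_quality": "ok",
    "recording_quality": "ok",
    "speakers": [{"metrics": {"general_scores": {
        "elsa": {"eps_score": 71.5},
        "cefr": {"overall_cefr": "B2"},
    }}}],
}
print(summarize(resp))
print(span_text(resp, {"start_index": 20, "end_index": 22, "original": "ear", "suggestion": "ears"}))  # ear
```

Missing fields (here, the IELTS estimate) come back as None rather than raising, which suits responses from plans that omit some properties.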
Field options
Some of the fields above are categorical, meaning each one takes a value from a fixed set. These values are defined in detail below.
assessment_quality
Indicates whether the amount of speech was sufficient to perform all assessments or only some.
Values | Description |
---|---|
ok | All good |
too_short | Insufficient spoken audio to obtain an accurate result. However, we still provide some analysis based on the available audio. (No fluency or intonation score) |
short | Insufficient spoken audio to achieve a reliable result. However, we still provide partial analysis based on the received audio, more than in the too_short case. (No grammar or vocabulary score) |
unint | Unintelligible speech detected in some or all segments. (ASR returned some text, but with low confidence, so it was filtered out and shown as "(...)") |
no_speech | No speech detected, nothing returned from ASR. |
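Since short and too_short responses omit some scores, a client may prefer to check which score fields were actually returned rather than hard-coding which ones are missing. A minimal Python sketch, assuming omitted scores appear either as absent keys or as null values in the "elsa" block (the document does not say which); `present_scores` and `describe` are hypothetical helper names.

```python
from typing import Any

# Component score fields of the "elsa" block in general_scores.
SCORE_FIELDS = (
    "pronunciation_score",
    "intonation_score",
    "fluency_score",
    "grammar_score",
    "vocabulary_score",
)

# assessment_quality values for which the table says partial analysis is still returned.
PARTIAL = {"short", "too_short"}


def present_scores(elsa: dict[str, Any]) -> dict[str, Any]:
    """Keep only the component scores that were actually returned (present and non-null)."""
    return {name: elsa[name] for name in SCORE_FIELDS if elsa.get(name) is not None}


def describe(assessment_quality: str, elsa: dict[str, Any]) -> str:
    """Build a one-line status string from assessment_quality and the available scores."""
    if assessment_quality == "no_speech":
        return "no speech detected"
    names = ", ".join(sorted(present_scores(elsa))) or "none"
    if assessment_quality in PARTIAL:
        return f"partial result ({assessment_quality}): {names}"
    return f"{assessment_quality}: {names}"


print(describe("too_short", {"pronunciation_score": 80.0, "grammar_score": 62.0, "fluency_score": None}))
# partial result (too_short): grammar_score, pronunciation_score
```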
recording_quality
Any noise or volume problems spotted with the audio. See below for possible values.
Values | Description |
---|---|
null | Unable to calculate, this most likely means the audio is extremely short. |
ok | All good |
loud | Too loud; the signal is saturated |
quiet | Volume too low |
noisy | Low SNR (signal-to-noise ratio) |
mixed | Mixed issues |
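A client might turn recording_quality into a prompt asking the user to re-record. The Python sketch below assumes the JSON null value has already been decoded to None; the hint wording and the `recording_hint` name are illustrative assumptions, not part of the API.

```python
from typing import Optional

# User-facing hints keyed by recording_quality value; the wording is illustrative only.
HINTS = {
    "loud": "Your recording was too loud (saturated). Try moving away from the microphone.",
    "quiet": "Your recording was too quiet. Try speaking closer to the microphone.",
    "noisy": "There was a lot of background noise. Try recording somewhere quieter.",
    "mixed": "Your recording had several audio issues. Please check your setup.",
}


def recording_hint(recording_quality: Optional[str]) -> Optional[str]:
    """Return a hint for the user, or None when no action is needed or the value is unknown.

    None (JSON null) means quality could not be computed, usually because the
    audio was extremely short, so no recording-specific hint is given.
    """
    if recording_quality is None or recording_quality == "ok":
        return None
    return HINTS.get(recording_quality)


print(recording_hint("noisy"))
```

Unknown values fall through to None, so new categories added to the API later do not break the client.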