WebSocket call to the Scripted Server API
An introduction to the streaming audio client that uses the ELSA scripted API. The client sends an audio file and its corresponding sentence to the ELSA server. A WebSocket connection provides a direct, bidirectional channel between the client and the speech processing server. The two sides exchange messages, which are either JSON structures or binary audio packets. Choose this method instead of a single-file method if:
- You are integrating the ELSA API into an online system where assessment latency matters. In streaming mode the ELSA API responds faster, because the ELSA servers start processing your request as soon as the first audio arrives.
- You want to use the automatic endpoint feature to finish the assessment when the server detects completion.
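The split between JSON messages and binary audio packets can be illustrated with a minimal sketch. It assumes that JSON messages travel as WebSocket text frames and audio as binary frames; the helper names encode_outgoing and decode_incoming are hypothetical and do not come from the ELSA example script.

```python
import json
from typing import Union

Frame = Union[str, bytes]


def encode_outgoing(message: Union[dict, bytes, bytearray]) -> Frame:
    """Turn a client message into a WebSocket payload (hypothetical helper).

    Dicts become JSON text frames; raw audio bytes are passed through
    unchanged so they can be sent as binary frames.
    """
    if isinstance(message, (bytes, bytearray)):
        return bytes(message)
    if isinstance(message, dict):
        return json.dumps(message)
    raise TypeError(f"unsupported message type: {type(message).__name__}")


def decode_incoming(payload: Frame) -> dict:
    """Parse a server payload; server responses are assumed to be JSON."""
    if isinstance(payload, (bytes, bytearray)):
        payload = payload.decode("utf-8")
    return json.loads(payload)
```

Keeping the two frame kinds apart at the edges of the client means the rest of the code only ever handles Python dicts or raw bytes.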
API endpoint
All connections to the ELSA scripted API start as a WebSocket request to wss://api.elsanow.io/api/v2/ws/score_audio. Once the connection is authorized, the user can start sending audio data in one of the supported formats.
You can implement this integration in Python, JavaScript, or PHP. This guide uses the Python example: https://gitlab.com/elsacorp/aiteam_tools/api-examples/-/blob/main/streaming_unscripted_client.py.
Usage
To run the ELSA Scripted Audio Streaming Client, execute the following command:
python3 streaming_scripted_client.py --token <CLIENT_TOKEN> --sentence "<sentence>" --audio_path <audio_path>
Replace <CLIENT_TOKEN> with your ELSA API token, <sentence> with the sentence spoken in the recording, and <audio_path> with the path to the audio file.
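The command-line interface can be reproduced with argparse. This is a sketch, not the actual script: the flag names come from this page, but leaving --api_plan unset by default (None) and treating --return_feedback_hints as a boolean switch are assumptions.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="ELSA scripted audio streaming client (sketch)")
    # Required inputs named on this page.
    parser.add_argument("--token", required=True, help="ELSA API token")
    parser.add_argument("--sentence", required=True,
                        help="Sentence the recording is assessed against")
    parser.add_argument("--audio_path", required=True,
                        help="Path to the audio file to stream")
    # Optional inputs. The real script's default plan is not documented
    # here, so None (meaning "not specified") is an assumption.
    parser.add_argument("--api_plan", default=None, help="API plan")
    # Assumed to be a simple on/off switch.
    parser.add_argument("--return_feedback_hints", action="store_true",
                        help="Ask the server to return feedback hints")
    return parser.parse_args(argv)
```

For example, parse_args(["--token", "abc", "--sentence", "Hello world", "--audio_path", "hello.wav"]) returns a namespace with return_feedback_hints set to False and api_plan set to None.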
Functionality
- The script parses its command-line arguments: the token for authentication (--token), the sentence to assess (--sentence), the path to the audio file (--audio_path), and optional arguments such as the API plan (--api_plan) and a flag to return feedback hints (--return_feedback_hints).
- It establishes a connection to the ELSA server through a WebSocketClient instance. This client keeps a queue (results_queue) of the messages received from the server, and its opened() method confirms that the WebSocket connection has been established.
INFO:root:Using hardcoded session token
INFO:root:Socket opened! ready for action.
INFO:root:Waiting for websocket connection...
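The client structure described above (a results queue plus opened(), received_message() and closed() callbacks) can be sketched without any networking. The class below is a hypothetical stand-in: a real client would get these hooks from a WebSocket library and be driven by incoming frames, whereas here the callbacks are called directly. Representing "client_finished" as a dict with a "type" key, and treating any close code other than 1000 (normal closure in RFC 6455) as an error, are assumptions.

```python
import json
import logging
import queue
import threading
from typing import Optional


class StreamingClientSketch:
    """Callback skeleton mirroring the WebSocketClient described above."""

    def __init__(self) -> None:
        self.results_queue: "queue.Queue[dict]" = queue.Queue()
        self.connected = threading.Event()

    def opened(self) -> None:
        # Called once the WebSocket handshake has succeeded.
        logging.info("Socket opened! ready for action.")
        self.connected.set()

    def received_message(self, payload: str) -> None:
        # Parse each server message as JSON and hand it to the main thread.
        self.results_queue.put(json.loads(payload))

    def closed(self, code: int, reason: Optional[str] = None) -> None:
        # Log abnormal closures, then tell the main thread that no more
        # messages will arrive.
        if code != 1000:
            logging.error("Connection closed with code %s: %s", code, reason)
        self.results_queue.put({"type": "client_finished"})
```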
- The audio file is split into chunks, which are sent to the server over the WebSocket connection. Audio is sent from a separate thread so that streaming can be stopped or paused independently of the main thread.
INFO:root:Starting a new stream
INFO:root:Sending start_stream request
DEBUG:root:Request body:
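Sending audio from a background thread can be sketched with the standard library alone. The 4096-byte chunk size, the send callable and the stop_event flag are assumptions for illustration; the example script may size chunks differently and sends them over its WebSocket connection. This sketch also loads the whole file into memory before chunking it.

```python
import threading
from typing import Callable, Iterator

CHUNK_SIZE = 4096  # bytes per packet; an assumed value, not the script's


def iter_chunks(audio: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield fixed-size slices of the audio; the last one may be shorter."""
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


def start_audio_thread(audio: bytes, send: Callable[[bytes], None],
                       stop_event: threading.Event) -> threading.Thread:
    """Send chunks from a daemon thread until done or stop_event is set."""

    def worker() -> None:
        for chunk in iter_chunks(audio):
            # Checked before every packet, so the main thread can halt
            # streaming, e.g. after ELSA:stopped_listening arrives.
            if stop_event.is_set():
                break
            send(chunk)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

In a real client, audio could come from pathlib.Path(audio_path).read_bytes() and send would be the WebSocket client's binary send method.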
- The WebSocketClient instance handles messages received from the ELSA server in its received_message() method. Depending on the type of message received (error, warning, decoding result, etc.), it may store the message in the results queue, start or stop the audio thread, or close the WebSocket connection.
INFO:root:Received start_stream response, starting to send audio.
INFO:root:Stream id:
- When the WebSocket connection is closed (either by the server or by the client), the closed() method is called. If the connection was closed due to an error, this method logs the error. It also puts a special "client_finished" message in the results queue.
- The main thread of the script waits for results from the server by repeatedly fetching messages from the results queue until a decoding result or an error is received, or until a maximum amount of time has passed.
- Finally, the script writes the results it received from the server to a JSON file and exits.
INFO:root:Writing response to ./response.json
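The main thread's wait-and-write logic can be sketched as follows. The 30-second budget, the set of terminal message types and writing all collected messages as a JSON list are assumptions; the example script may structure response.json differently. The sketch also stops on the client_finished marker, since no further messages can arrive once the connection has closed.

```python
import json
import queue
import time
from typing import List

TERMINAL_TYPES = {"ELSA:decoding_result", "ELSA:error", "client_finished"}


def collect_results(results_queue: "queue.Queue[dict]",
                    max_wait_s: float = 30.0) -> List[dict]:
    """Read messages until a terminal one arrives or max_wait_s elapses."""
    deadline = time.monotonic() + max_wait_s
    collected: List[dict] = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # time budget exhausted without a final result
        try:
            message = results_queue.get(timeout=remaining)
        except queue.Empty:
            break
        collected.append(message)
        if message.get("type") in TERMINAL_TYPES:
            break
    return collected


def write_response(messages: List[dict], path: str = "./response.json") -> None:
    """Write the collected messages to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(messages, f, indent=2)
```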
This script demonstrates how a client application can stream audio data to the ELSA API for assessment, and receive and handle the API's results in real time. Threads let the script send audio and handle server messages concurrently, which real-time communication over WebSockets requires.
Responses
Response Type | Description |
---|---|
ELSA:error | Error sent from the server |
ELSA:warning | Warning sent from the server |
ELSA:start_stream_result | Stream is open, can start sending audio |
ELSA:decoding_result | Server finished computing and sends back the results |
ELSA:ready | Server is ready to receive new streams |
ELSA:message | Server sends a message to the client |
ELSA:audioACK | Server acknowledges receiving an audio packet |
ELSA:wsConnect | Server sends information on the specific server the client connected to |
ELSA:stopped_listening | Server stopped accepting audio, further audio will be ignored |
Unknown response | Received an unknown response, closing the connection |
No "type" in the response | Received a message without a "type" field in the body, closing the connection |
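The table can be turned into a dispatch rule for incoming messages. The sketch below maps each documented response type to an action; the action labels ("store", "start_audio", "stop_audio", "log", "ignore", "close") are hypothetical, and which types are stored, logged or ignored is an assumption based on the descriptions above. Only the handling of unknown types and messages without a "type" (close the connection) comes directly from the table.

```python
from typing import Any, Dict

# Hypothetical action label for each documented response type.
ACTIONS: Dict[str, str] = {
    "ELSA:error": "store",                    # main thread stops on errors
    "ELSA:warning": "store",
    "ELSA:start_stream_result": "start_audio",
    "ELSA:decoding_result": "store",
    "ELSA:ready": "log",
    "ELSA:message": "log",
    "ELSA:audioACK": "ignore",
    "ELSA:wsConnect": "log",
    "ELSA:stopped_listening": "stop_audio",   # further audio is ignored
}


def classify(message: Dict[str, Any]) -> str:
    """Return the action for a parsed server message."""
    if "type" not in message:
        return "close"  # no "type" in the response: close the connection
    # Unknown response types also close the connection.
    return ACTIONS.get(message["type"], "close")
```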