
Websocket call to Scripted Server API

An introduction to the streaming audio client that uses the ELSA scripted API. This implementation sends an audio file and its corresponding sentence to the ELSA server. A WebSocket connection provides a direct, bidirectional channel between the client and the speech processing server. The two sides exchange messages, which are either JSON structures or binary audio packets. Choose this method instead of the single-file method if:

  • You are integrating the ELSA API into an online system where assessment latency matters. In streaming mode the ELSA API responds faster, because the ELSA servers start processing your query as soon as the first audio arrives.
  • You want to use the automatic endpoint feature to finish the assessment when the server detects completion.
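To make the two message kinds concrete, here is a minimal Python sketch of how a client might turn outgoing messages into WebSocket payloads and parse the server's JSON replies. The helper names encode_outgoing and decode_incoming are hypothetical, and the sketch assumes only what is stated above: control messages are JSON structures and audio travels as raw binary packets.

```python
import json
from typing import Union


# Hypothetical helpers: the page only says that messages are either JSON
# structures or binary audio packets. No ELSA field names are assumed here.
def encode_outgoing(message: Union[dict, bytes, bytearray]) -> Union[str, bytes]:
    """Turn an outgoing message into a WebSocket payload.

    dict  -> JSON text (a control message, such as a stream request)
    bytes -> binary payload (a raw audio packet), passed through unchanged
    """
    if isinstance(message, (bytes, bytearray)):
        return bytes(message)
    if isinstance(message, dict):
        return json.dumps(message)
    raise TypeError(f"unsupported message type: {type(message).__name__}")


def decode_incoming(payload: Union[str, bytes, bytearray]) -> dict:
    """Parse a server message, which this page describes as JSON."""
    if isinstance(payload, (bytes, bytearray)):
        payload = payload.decode("utf-8")
    return json.loads(payload)
```

Keeping the text/binary decision in one place means the rest of the client never has to care which kind of frame it is sending.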

API endpoint

All connections to the ELSA scripted API start as a WebSocket request to wss://api.elsanow.io/api/v2/ws/score_audio. Once the connection is authorized, the client can start sending audio data in one of the supported formats. The integration can be written in Python, JavaScript, or PHP; this page uses the Python example at https://gitlab.com/elsacorp/aiteam_tools/api-examples/-/blob/main/streaming_unscripted_client.py.
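The page does not state a chunk size or list the supported formats, so the following sketch assumes raw 16 kHz, 16-bit mono PCM and an illustrative chunk size of 8000 bytes (0.25 s of that audio). The functions split_audio and read_chunks are hypothetical helpers showing how audio could be cut into packets before it is sent over the connection.

```python
from typing import Iterator

# Assumed chunk size, not taken from the ELSA documentation:
# 16000 samples/s * 2 bytes/sample * 0.25 s = 8000 bytes.
CHUNK_SIZE = 8000


def split_audio(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive slices of `data`, each at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def read_chunks(audio_path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Read a file from disk chunk by chunk instead of loading it all at once."""
    with open(audio_path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # end of file
                break
            yield chunk
```

Smaller chunks let the server start processing sooner, at the cost of more messages on the wire.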

Usage

To run the ELSA Scripted Audio Streaming Client, execute the following command:

python3 streaming_scripted_client.py --token <CLIENT_TOKEN> --sentence "<sentence>" --audio_path <audio_path>

Replace <CLIENT_TOKEN> with your ELSA API token, pass the sentence spoken in the recording to --sentence, and pass the path to the audio file to --audio_path.
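The arguments used by the client could be declared with Python's argparse module roughly as follows. This is a sketch: the flag names come from this page, but the types, defaults, and help text are assumptions, and the accepted values of --api_plan are not documented here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """CLI with the flags named on this page; defaults and help text are assumed."""
    parser = argparse.ArgumentParser(
        description="ELSA scripted audio streaming client (sketch)"
    )
    parser.add_argument("--token", required=True, help="ELSA API token")
    parser.add_argument("--sentence", required=True,
                        help="sentence spoken in the audio")
    parser.add_argument("--audio_path", required=True,
                        help="path to the audio file")
    # Optional arguments; valid --api_plan values are not listed on this page.
    parser.add_argument("--api_plan", default=None, help="API plan (optional)")
    parser.add_argument("--return_feedback_hints", action="store_true",
                        help="ask the server to return feedback hints")
    return parser
```

For example, build_parser().parse_args() reads sys.argv, so the command shown above would populate args.token, args.sentence, and args.audio_path.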

Functionality

  1. The script parses its command-line arguments: the token used for authentication (--token), the sentence that corresponds to the audio (--sentence), the path to the audio file (--audio_path), and optional arguments such as the API plan (--api_plan) and a flag that requests feedback hints (--return_feedback_hints).
  2. It connects to the ELSA server through a WebSocketClient instance. The client keeps a queue (results_queue) that stores the messages received from the server, and its opened() method confirms that the WebSocket connection has been established. Example log output:
     INFO:root:Using hardcoded session token
     INFO:root:Socket opened! ready for action.
     INFO:root:Waiting for websocket connection...
  3. The audio file is split into chunks, which are sent to the server over the WebSocket connection. Audio is sent from a separate thread, so sending can be stopped or paused independently of the main thread. Example log output:
     INFO:root:Starting a new stream
     INFO:root:Sending start_stream request
     DEBUG:root:Request body:
  4. The WebSocketClient instance handles messages from the ELSA server in its received_message() method. Depending on the message type (error, warning, decoding result, and so on), it may store the message in the results queue, start or stop the audio thread, or close the WebSocket connection. Example log output:
     INFO:root:Received start_stream response, starting to send audio.
     INFO:root:Stream id:
  5. When the WebSocket connection is closed, by either the server or the client, the closed() method is called. If the connection was closed because of an error, this method logs the error. It also puts a special "client_finished" message in the results queue.
  6. The main thread waits for results by repeatedly fetching messages from the results queue until it receives a decoding result or an error, or until a maximum amount of time has passed.
  7. Finally, the script writes the results it received from the server to a JSON file and exits. Example log output:
     INFO:root:Writing response to ./response.json
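The waiting and saving steps above can be sketched as a loop over a standard queue.Queue with an overall deadline, followed by a JSON dump. The function names, the use of a "type" key, and the shape of the "client_finished" message are assumptions; the ELSA type strings are taken from the Responses table.

```python
import json
import queue
import time

# Message types that end the wait. Storing them under a "type" key, and
# representing "client_finished" as {"type": "client_finished"}, is assumed.
FINAL_TYPES = {"ELSA:decoding_result", "ELSA:error", "client_finished"}


def wait_for_result(results: queue.Queue, max_wait: float) -> list:
    """Collect messages until a final one arrives or `max_wait` seconds pass."""
    received = []
    deadline = time.monotonic() + max_wait
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # maximum waiting time reached
        try:
            message = results.get(timeout=remaining)
        except queue.Empty:
            break  # nothing arrived before the deadline
        received.append(message)
        if message.get("type") in FINAL_TYPES:
            break
    return received


def write_response(messages: list, path: str = "./response.json") -> None:
    """Save everything that was received as a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(messages, f, indent=2)
```

Using a single deadline rather than a fixed per-message timeout keeps the total wait bounded even when the server keeps sending non-final messages.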

This script demonstrates how a client application can send audio data to the ELSA API for assessment and receive and handle the API's results in real time. Threads let the script send audio and handle server messages concurrently, which is essential for real-time communication over WebSockets.
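The concurrent sending described above can be sketched with Python's threading module. AudioSender is a hypothetical class, not the example script's code: it sends chunks through an injected send callable (for instance, whatever method your WebSocket client uses for binary frames) and uses threading.Event objects so the main thread can pause, resume, or stop it.

```python
import threading
from typing import Callable, Iterable


class AudioSender(threading.Thread):
    """Hypothetical background sender for audio chunks."""

    def __init__(self, chunks: Iterable[bytes], send: Callable[[bytes], None]):
        super().__init__(daemon=True)
        self._chunks = chunks
        self._send = send
        self._stop_event = threading.Event()
        self._resume_event = threading.Event()
        self._resume_event.set()  # start in the running (not paused) state

    def pause(self) -> None:
        self._resume_event.clear()

    def resume(self) -> None:
        self._resume_event.set()

    def stop(self) -> None:
        self._stop_event.set()
        self._resume_event.set()  # wake a paused thread so it can exit

    def run(self) -> None:
        for chunk in self._chunks:
            self._resume_event.wait()  # blocks while paused
            if self._stop_event.is_set():
                break
            self._send(chunk)
```

Injecting send keeps the thread logic independent of any particular WebSocket library, which also makes it easy to exercise with a plain list's append method.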

Responses

Response type             | Description
--------------------------|----------------------------------------------------------------
ELSA:error                | Error sent from the server
ELSA:warning              | Warning sent from the server
ELSA:start_stream_result  | Stream is open; the client can start sending audio
ELSA:decoding_result      | Server finished computing and sends back the results
ELSA:ready                | Server is ready to receive new streams
ELSA:message              | Server sends a message to the client
ELSA:audioACK             | Server acknowledges receiving an audio packet
ELSA:wsConnect            | Server sends information on the specific server the client connected to
ELSA:stopped_listening    | Server stopped accepting audio; further audio will be ignored
Unknown response          | Received an unknown response; closing the connection
No "type" in the response | Received a message without a "type" in the body; closing the connection
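One way to act on these response types is a lookup table from type to action, sketched below. The mapping and the action names are illustrative assumptions based on the descriptions above (for example, starting audio on ELSA:start_stream_result, stopping it on ELSA:stopped_listening, and closing on unknown or untyped messages); the real client's received_message() may behave differently.

```python
import json
import queue

# Illustrative mapping only; action names are hypothetical.
ACTIONS = {
    "ELSA:error": "store_and_close",
    "ELSA:warning": "log",
    "ELSA:start_stream_result": "start_audio",
    "ELSA:decoding_result": "store",
    "ELSA:ready": "log",
    "ELSA:message": "log",
    "ELSA:audioACK": "ignore",
    "ELSA:wsConnect": "log",
    "ELSA:stopped_listening": "stop_audio",
}


def handle_message(raw: str, results: queue.Queue) -> str:
    """Return the action for one server message; queue results and errors."""
    message = json.loads(raw)
    if not isinstance(message, dict) or "type" not in message:
        return "close"  # no "type" in the body: close the connection
    action = ACTIONS.get(message["type"], "close")  # unknown response: close
    if action.startswith("store"):
        results.put(message)
    return action
```

A table keeps each response type's behavior in one place, so supporting a new type is a one-line change.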