Speech Recognition

Eleveo provides a comprehensive Speech Recognition solution for both small and large organizations that includes:

Transcription services
Emotion detection
Phrase detection

This feature requires a dedicated license file. Users must be granted the appropriate license-based effective role to access its functionality. Please review the instructions on Using Eleveo Specific Roles to ensure that the appropriate license-based roles are added to Eleveo-specific roles.

Users must have the following roles assigned by an administrator: VIEW_TRANSCRIPTION (to view transcriptions) and SPEEECHREC_ADMIN (to modify Speech Tags and Phrases).

Why use Speech Recognition?

Monitor 100% of calls.
- Access transcriptions for all (100%) of your recorded calls with punctuation and number formatting
- Multilingual support (multiple dialects supported)
- Emotion detection – find dissatisfied customers, upset agents or problematic interactions quickly
- Metadata generated and saved alongside recorded media provides an additional level of analytical value
Allow human quality monitors to concentrate on key issues.
Scan all calls for potential compliance violations, improving the effectiveness of compliance departments.
Quickly and effectively address key business issues in contact centers.
Search and find words or phrases in transcripts directly from the Search bar within the Conversation Explorer.

Supported Languages

Language	Dialects supported
English	North America, Australia, United Kingdom, Europe, Philippines, International
French	Canada, France, Europe
Spanish	North America, Spain, Mexico, Argentina, Colombia, Panama
German
Italian
Portuguese	Brazil

Automated language detection: If automated language detection is enabled for your server, the system detects the language spoken in the first twenty seconds of the recording and switches the language processor to that language. The whole conversation is then transcribed in the detected language; the system does not detect or switch languages after the first twenty seconds, even if the speakers change languages later in the call. If the speech recognition engine misidentifies the language, the transcription is produced in the wrong language.

Currently, Eleveo supports automated language detection for the following language pairs:

English - Spanish
English - French

Permissions

Speech Recognition

This module is licensed separately from Quality Management. To use it, your installation must have the required license, and your administrator must assign you the VIEW_TRANSCRIPTION role.

Users without this role see one of the following messages in the Conversation Explorer Details Pane: "No acoustic & emotion available" or "No transcription available".

Speech Tags and Phrases

To modify Speech Tags and Phrases, your administrator must assign you the SPEEECHREC_ADMIN role.

Emotion Detection

Emotion detection is only available for installations that have a valid Speech Recognition license and for conversations that contain a transcript.

Emotion detection is currently available only for English. We are actively working on support for additional languages.

The emotion detection feature combines acoustic features and word sentiment scores from the transcription to determine emotion. Emotion detection is not influenced by customer-defined speech tags, because speech tags are applied to the transcription after it has been analyzed for emotion (refer to the dedicated page for information on how to configure Speech Tags and Phrases).

Emotion is displayed in the system in two different ways:

Participant emotion: displayed in the Conversation Explorer Details Pane.
Emotion of individual lines in the transcript: the emotion of each transcribed sentence is displayed alongside the transcript.

Participant emotion

The average emotion of the participant at the end of the conversation is extracted from the transcription and displayed in the Conversation Explorer Details Pane. The emotion shown in the Details Pane can be Improving, Positive, Neutral, Negative, or Worsening. It represents the overall emotion of one party during the last segment in which they participated; in other words, the speaker's emotion when they left the call.

Emotion of individual lines in the transcript

Emotion for each transcribed sentence is displayed alongside the transcript within the Details Pane.

To view Emotion:

Open the Conversation Explorer and select a conversation that contains a transcript.
Expand the Details Pane.
An emoticon is displayed alongside the transcript.
The emotion is shown for each transcribed utterance (sentence). The emotion detection feature combines acoustic features and word sentiment scores to determine the emotion of each individual sentence. The emotion displayed within the transcript can be Mostly Positive, Positive, Neutral, Negative, or Mostly Negative.

This is also described in the documentation for the Conversation Explorer.

Phrase Detection

Speech tags and phrases are defined by users (refer to the dedicated page for information on how to configure Speech Tags and Phrases). The system scans new media (transcriptions) and matches the defined speech tags and phrases against words or phrases in the transcript. The detected phrases are then saved to the database and displayed to users on the Conversation Explorer screen. The process is described in more detail in the section How Does Phrase Detection Work (in Brief) on the page dedicated to Speech Tags and Phrases.
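The matching step can be pictured with a minimal sketch. The real matching rules and database storage are described on the Speech Tags and Phrases page; this example only assumes simple case-insensitive, whole-word matching. The input shapes (a dict of tag names to phrase lists, and a transcript as a list of utterance strings) and the `detect_phrases` function are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def detect_phrases(
    speech_tags: Dict[str, List[str]],
    utterances: List[str],
) -> List[Tuple[str, str, int]]:
    """Return (tag, phrase, utterance_index) for every phrase found in the transcript."""
    hits: List[Tuple[str, str, int]] = []
    for tag, phrases in speech_tags.items():
        for phrase in phrases:
            # Whole-word, case-insensitive match, so "cancel" does not match "cancellation".
            pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
            for index, text in enumerate(utterances):
                if pattern.search(text):
                    hits.append((tag, phrase, index))
    return hits


tags = {
    "Cancellation": ["cancel my account", "close my account"],
    "Greeting": ["hello"],
}
transcript = [
    "Hello, thanks for calling.",
    "I want to cancel my account today.",
    "Cancellation fees apply.",
]
print(detect_phrases(tags, transcript))
```

Each hit records which tag and phrase matched and in which utterance, which is roughly the information a results screen needs to point a user at the right line of the transcript.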

Limitations

The recording server must be configured to output WAV files.
Integrations that generate only mono audio recordings, such as MS Teams Recording, are supported. For these recordings, two identical audio waveforms are displayed in the Interaction Player, one for each channel. This is expected behavior.
Conference participants are not identified by the speech recognition software. The current implementation can detect only two participants in a call.
Individual participants are not recognized in mono recordings. We recommend that you capture stereo audio recordings whenever possible.

Redaction of Sensitive Data

Eleveo takes personal data seriously and supports the redaction of sensitive data from transcriptions. The redaction of credit card data and other sensitive numbers is enabled by default when the solution is first configured.
Redacted information is shown as hash characters (####) in the transcription so that viewers can clearly see that something was hidden.

When redaction is enabled, the transcription engine redacts sensitive numeric values from transcription results. By default, any string of numeric digits that does not match one of the allowed patterns is treated as sensitive.

Patterns that are allowed by default are the following:

Ordinals (1st, 2nd, etc.)
Percentages
Clock times (12:57 PM, etc.)
Monetary amounts
Floating point numbers with 4 or fewer digits (12.47 GB, etc.)
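The default rule (mask every digit string unless it fits an allowed pattern) can be sketched with regular expressions. This is not Eleveo's implementation: the regexes below are hypothetical approximations of the five allowed categories listed above, and replacing each disallowed run of digits with a fixed `####` is an assumption based on how redacted text is displayed.

```python
import re

REDACTION_MASK = "####"  # assumption: each disallowed digit run becomes "####"

# Hypothetical regexes approximating the default allowed patterns listed above.
# Each entry is (compiled pattern, maximum number of digits, or None for no limit).
ALLOWED_PATTERNS = [
    (re.compile(r"\b\d+(?:st|nd|rd|th)\b", re.IGNORECASE), None),             # ordinals: 1st, 2nd
    (re.compile(r"\b\d+(?:\.\d+)?\s?(?:%|percent\b)", re.IGNORECASE), None),  # percentages: 15%
    (re.compile(r"\b\d{1,2}:\d{2}(?:\s?[AP]M)?\b", re.IGNORECASE), None),     # clock times: 12:57 PM
    (re.compile(r"[$€£]\s?\d[\d,]*(?:\.\d+)?"), None),                        # monetary amounts: $49.99
    (re.compile(r"\b\d+\.\d+\b"), 4),                                         # floats with <= 4 digits
]

DIGIT_RUN = re.compile(r"\d+")


def _allowed_spans(text):
    """Return (start, end) spans of text that match an allowed pattern."""
    spans = []
    for pattern, max_digits in ALLOWED_PATTERNS:
        for match in pattern.finditer(text):
            digits = sum(ch.isdigit() for ch in match.group())
            if max_digits is not None and digits > max_digits:
                continue  # e.g. 123.456 has 6 digits, so it is not allowed
            spans.append(match.span())
    return spans


def redact(text):
    """Mask every digit run that is not fully covered by an allowed pattern."""
    allowed = _allowed_spans(text)

    def mask(match):
        start, end = match.span()
        if any(a <= start and end <= b for a, b in allowed):
            return match.group()  # part of an allowed pattern: keep as-is
        return REDACTION_MASK

    return DIGIT_RUN.sub(mask, text)


print(redact("My card is 4111 1111 1111 1111, call me at 3:30 PM "
             "about the 2nd payment of $49.99, a 15% discount on 12.47 GB."))
```

Collecting the allowed spans first and masking digit runs second keeps the allow-list independent of the masking step, so adding or removing an allowed pattern does not change how masking works.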

Transcription Improvement/Tuning

Administrators of your installation can upload industry-specific terminology to improve transcription accuracy. Add words or phrases to the Out-of-vocabulary (OOV) dictionary to help ensure that brand names, product names, and industry-specific terms are transcribed accurately.

This feature is available in Eleveo version 9.3 and later.
