How to Use SSML Tags in SmartFlows

This guide describes how to use Speech Synthesis Markup Language (SSML) tags within SmartFlows and lists the tags available for each Text-to-Speech vendor.


 

Overview

SSML helps make your messages sound more conversational and less robotic. Control pronunciation, add emphasis, include breaks, and more by adding simple tags to your message content.

 

SSML tags can be used to:

  • Repeat back a phone number for customer confirmation in a louder volume and slower speed using the prosody tag.

  • Make sure an address is pronounced properly using the say-as tag for the building number and the phoneme tag for an unusual street name.

  • Add pauses between Menu Tree items to break up the caller’s options using the break tag.
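As a rough sketch, the three use cases above might look like this in a message textbox. The phone number, address, and menu wording are made up for illustration, the phoneme value is an IPA spelling of "pecan", and the exact attribute values each vendor accepts may differ. The comment lines are only explanatory and should be left out when pasting.

```xml
<!-- prosody: repeat a phone number louder and slower -->
You entered <prosody volume="loud" rate="slow">5 5 5, 0 1 3 4</prosody>. Is that correct?

<!-- say-as for the building number, phoneme for the street name -->
Our office is at <say-as interpret-as="cardinal">82</say-as>
<phoneme alphabet="ipa" ph="pɪˈkɑːn">Pecan</phoneme> Street.

<!-- break: pause between menu options -->
Press 1 for sales. <break time="1s"/> Press 2 for support.
```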

Note: Supported SSML tags vary based on the Text-to-Speech vendor your flow uses. See the applicable vendor section below for more details.

 

Adding SSML Tags

SSML tags can be used with any action that includes Text-to-Speech audio capabilities (e.g., Menu Tree, Play Audio, Transfer, etc.).

 

Once you’re ready to configure your Text-to-Speech action:

  1. Click on the action to view the available Inputs in the Configurations panel on the right.

  2. Click Configure Audio.

  3. In the Audio Settings pop-up, make sure your Channel ID is selected on the left and Audio Text (TTS) is selected on the right.

The Audio Settings pop-up window

  4. Select the appropriate section and attempt from the drop-down lists.

  5. From the Vendor list, select your default flow settings or a specific vendor.

    • Remember: available SSML tags vary depending on the vendor you use. See the vendor-specific sections below.

    • If you select a specific vendor, you can also pick a voice (if you are using the default flow settings, the voice has already been selected).

  6. In the message textbox, enter your message content, including any SSML tags you want to use.

Sample text-to-speech input is added to the text box on the right for the Max Attempts audio

  7. Click the blue plus sign to add your message. A green success message appears below the textbox and your message content appears on the left side of the pop-up.

Once the text-to-speech input is added in the text box on the right and the blue plus sign button is clicked, the input appears under the Max Attempts section on the left side of the window

  8. On the left side of the pop-up, click the play button to hear your message, then update the message content as needed.

  9. Once you’ve customized each of the available sections, close the Audio Settings pop-up.

 

Tips and Tricks

  • We recommend using the same vendor and voice throughout your flow for audio and SSML consistency. Supported tags and how they should be formatted vary across vendors.

  • Enclose your SSML tags in angle brackets: <tag>.

  • Make sure to include a closing tag where you want the effect to end (e.g., We are open <prosody rate="x-slow"> Monday through Friday, from 8:30 A.M. to 5 P.M.</prosody>...).

  • Remember to include the strength (e.g., “weak”, “strong”, etc.) or length of time (with “s” for seconds or “ms” for milliseconds) in a break tag. If you include only a number with no unit, your tag won’t function. For example: You’ve reached the maximum number of attempts <break time="3s"/> Goodbye.

  • Zero vs. the letter O: Depending on the vendor you select, zero might be pronounced the same as the letter O. If you are using Text-to-Speech for something like two-factor authentication or an address, where zero should be pronounced differently than the letter O, we recommend selecting Amazon as your vendor.

  • If a specific word within your message is not automatically pronounced correctly, you may need to spell it out phonetically to get the ideal user experience. Check out your favorite online dictionary for phonetic spellings.

  • Some tags are incompatible and can’t be used together. You may need to experiment with using different tags or moving them to different parts of your message to create the effect you want.

  • If you’re using a variable within an SSML tag, you may need to break up your message into multiple pop-up boxes.

  • Once you’ve added your Text-to-Speech message, we recommend using the play button to test the audio experience and update it as needed. You may need to experiment with longer or shorter break times, or change prosody based on customer feedback or flow analytics (e.g., if the break is too long, customers might drop out of your flow early, not realizing more options will follow).
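The closing-tag and break-tag tips above can be sketched as follows. The message wording is illustrative, and the set of strength keywords a given vendor accepts may vary. Note that attribute values use straight quotation marks. The comment lines are only explanatory and should be left out when pasting.

```xml
<!-- Break with a duration: the number needs a unit, "s" or "ms" -->
You've reached the maximum number of attempts. <break time="3s"/> Goodbye.
Please hold. <break time="500ms"/> Connecting you now.

<!-- Break with a strength keyword instead of a duration -->
Thank you for calling. <break strength="strong"/> Goodbye.

<!-- Won't function: a bare number with no unit -->
<!-- <break time="3"/> -->

<!-- The closing tag marks where the slow-speech effect ends -->
We are open <prosody rate="x-slow">Monday through Friday</prosody>, from 8:30 A.M. to 5 P.M.
```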

 

AWS SSML Tags

If you select AWS as your Text-to-Speech vendor, the following SSML tags are compatible with SmartFlows:

| SSML Tag | Description |
| --- | --- |
| amazon:effect | Apply a whisper effect to a specific word or phrase in your message. |
| break | Add a pause to your message. Include the duration or strength. |
| emphasis | Change the speed and volume of a specific word or phrase in your message. Most commonly used to increase the volume and slow the speed of important information. |
| lang | Pronounce a word or phrase in another language. For example, if your shop is called “La Casa de Los Espiritus” you can include a Spanish tag to get the proper pronunciation. Note: There are multiple tags for some languages to differentiate pronunciation in various countries (e.g., UK vs. American pronunciation), so make sure to choose the appropriate language and country tag. |
| phoneme | Tailor the phonetic pronunciation of a specific word in your message. |
| prosody | Customize the volume, speed, and pitch of a specific word or phrase in your message. |
| say-as | Control how a specific word or phrase should be interpreted. Most commonly used for rendering numbers, such as translating digits to a cardinal number, date, street address, etc. |
| sub | Replace a specific word or phrase in your message with an alias. For example, you could replace an acronym with the complete phrase (e.g., read UID as “unique identification”). |
| w | Specify the pronunciation of a specific word in your message based on the appropriate part of speech. For example, pronouncing “read” as a present-tense verb vs. “read” as a past participle. |

 

Note: Some AWS tags are not compatible with SmartFlows: amazon:domain, amazon:emotion, audio, and voice/polly.

For more information, including examples and incompatible tags, check out the Amazon Alexa SSML Reference page.
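For illustration, the AWS-specific tags from the table might be used as shown below. This is a sketch with made-up message wording, written as message fragments in the style of the Tips and Tricks examples. The es-ES language code and the amazon:VBD role are example values, and some effects may not be available for every voice. The comment lines are only explanatory and should be left out when pasting.

```xml
<!-- amazon:effect: whisper a phrase -->
<amazon:effect name="whispered">This offer is just for you.</amazon:effect>

<!-- lang: pronounce a Spanish shop name using Spanish pronunciation -->
Welcome to <lang xml:lang="es-ES">La Casa de Los Espiritus</lang>.

<!-- sub: speak the alias in place of the written acronym -->
Please have your <sub alias="unique identification">UID</sub> ready.

<!-- w: select the past-tense pronunciation of "read" -->
I <w role="amazon:VBD">read</w> your message yesterday.
```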

 

Google SSML Tags

If you select Google as your Text-to-Speech vendor, the following SSML tags are compatible with SmartFlows:

| SSML Tag | Description |
| --- | --- |
| break | Add a pause to your message. Include the duration or strength. |
| emphasis | Change the speed and volume of a specific word or phrase in your message. Most commonly used to increase the volume and slow the speed of important information. |
| p, s | Add the paragraph <p> or sentence <s> tag to designate different paragraphs or sentences within a message. This is an optional tool to organize your content. |
| phoneme | Tailor the phonetic pronunciation of a specific word in your message. |
| prosody | Customize the volume, speed, and pitch of a specific word or phrase in your message. |
| say-as | Control how a specific word or phrase should be interpreted. Most commonly used for rendering numbers, such as translating digits to a cardinal number, date, street address, etc. |
| sub | Replace a specific word or phrase in your message with an alias. For example, you could replace an acronym with the complete phrase (e.g., read UID as “unique identification”). |

 

Note: Some Google tags are not compatible with SmartFlows: audio, par, seq, mark, and media.

For more information, check out the Google SSML Reference page.
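As a small sketch of how the paragraph and sentence tags can organize a Google message, the fragment below groups sentences into two paragraphs and combines them with say-as and emphasis. The wording and the confirmation code are made up, and the interpret-as and level values shown are example values.

```xml
<p>
  <s>Thank you for calling.</s>
  <s>Your confirmation code is <say-as interpret-as="characters">AB12</say-as>.</s>
</p>
<p>
  <s><emphasis level="strong">Please write this code down.</emphasis></s>
</p>
```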

 

IBM SSML Tags

If you select IBM as your Text-to-Speech vendor, the following SSML tags are compatible with SmartFlows:

| SSML Tag | Description |
| --- | --- |
| break | Add a pause to your message. Include the duration or strength. |
| emphasis | Change the speed and volume of a specific word or phrase in your message. Most commonly used to increase the volume and slow the speed of important information. |
| p, s | Add the paragraph <p> or sentence <s> tag to designate different paragraphs or sentences within a message. This is an optional tool to organize your content. |
| phoneme | Tailor the phonetic pronunciation of a specific word in your message. |
| prosody | Customize the volume, speed, and pitch of a specific word or phrase in your message. |
| say-as | Control how a specific word or phrase should be interpreted. Most commonly used for rendering numbers, such as translating digits to a cardinal number, date, street address, etc. |
| sub | Replace a specific word or phrase in your message with an alias. For example, you could replace an acronym with the complete phrase (e.g., read UID as “unique identification”). |

 

Note: Some IBM tags are not compatible with SmartFlows: audio, desc, lexicon, mark, meta or metadata, and voice.

For more information, check out the IBM SSML Elements page.
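For example, an IBM message that reads back an account number might combine say-as with a break, roughly as sketched here. The account number and wording are invented, and the interpret-as and strength values are example values. The comment line is only explanatory and should be left out when pasting.

```xml
<!-- say-as reads the number digit by digit; break adds a pause before the question -->
Your account number is <say-as interpret-as="digits">40217</say-as>.
<break strength="medium"/>
Is that correct?
```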

 

Microsoft SSML Tags

When using SSML tags with Microsoft Azure voices, you must include the speak root element before any additional customization. The speak element must include:

  • The version of the SSML (version)

  • The URI defining the markup elements and attributes (xmlns)

  • The language code or location for your TTS (xml:lang)

For example:

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US"> 
    <voice name="en-US-EmmaNeural"> Today's date is: <say-as interpret-as="date" format="mdy"> 3/14/2024 </say-as>
    </voice>
</speak>

 

If you select Microsoft as your Text-to-Speech vendor, the following SSML tags are compatible with SmartFlows:

| SSML Tag | Description |
| --- | --- |
| break | Add a pause to your message. Include the strength or duration. |
| emphasis | Change the speed and volume of a specific word or phrase in your message. Most commonly used to increase the volume and slow the speed of important information. |
| p, s | Add the paragraph <p> or sentence <s> tag to designate different paragraphs or sentences within a message. This is an optional tool to organize your content. |
| phoneme | Tailor the phonetic pronunciation of a specific word in your message. |
| prosody | Customize the volume, speed, and pitch of a specific word or phrase in your message. |
| say-as | Control how a specific word or phrase should be interpreted. Most commonly used for rendering numbers, such as translating digits to a cardinal number, date, street address, etc. |
| sub | Replace a specific word or phrase in your message with an alias. For example, you could replace an acronym with the complete phrase (e.g., read UID as “unique identification”). |

For more information, check out the Azure SSML Document Structure and Events page.
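To illustrate how the tags in the table fit inside the required speak root element, here is a sketch of a short menu message. The voice name, phone number, and wording are assumptions for illustration. Substitute the voice selected for your flow.

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <voice name="en-US-EmmaNeural">
        <!-- break: pause between menu options -->
        Press 1 for sales. <break time="750ms"/> Press 2 for support.
        <!-- prosody: slow down the phone number -->
        To reach us directly, call <prosody rate="slow">5 5 5, 0 1 3 4</prosody>.
    </voice>
</speak>
```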