In a previous article, I described how to use speech recognition in Android. This consisted of capturing user speech, processing it, and acting on the result.
But what if that process is reversed, and you want to take text input and produce speech? This type of system is called text-to-speech (TTS). TTS software generates a computer voice and is widely considered an assistive technology tool.
This technology has already been widely adopted in real-world applications such as virtual assistants (Google Assistant and Siri) for responding to user commands. By adding this feature, you can greatly increase the accessibility of your application for users with visual impairments or reading difficulties. For example, it could especially help those with learning disabilities or those who struggle to read large amounts of text due to dyslexia.
In today’s article, we’ll walk through how to integrate this technology into Android applications using only the core SDK, so your applications can start speaking.
This article presumes that you have some experience in building Android apps in Kotlin.
Project Setup
To implement TTS technology, open up a new Android Studio project—no need for permissions or other library dependencies.
Text-To-Speech (TTS)
To start, we’ll work with a class called TextToSpeech. We also need the text to be spoken, which should be in string format. The first task is to create an instance of this class. Here’s how we initialize a global variable for this instance:
val textToSpeech = TextToSpeech(this, this)
The first “this” in the constructor is the context of the activity we’re in; the second “this” is an initialization listener of type TextToSpeech.OnInitListener, which will tell us the result of the TextToSpeech initialization. For that, we need to let our activity implement that interface and then override the method called onInit(). Here’s a code sample:
override fun onInit(status: Int) {
    // Check the result in the status variable.
    if (status == TextToSpeech.SUCCESS) {
        // Set the language to the default phone language.
        val ttsLang = textToSpeech.setLanguage(Locale.getDefault())
        // Check whether the language is supported.
        if (ttsLang == TextToSpeech.LANG_MISSING_DATA || ttsLang == TextToSpeech.LANG_NOT_SUPPORTED) {
            Toast.makeText(this, "We can't support your language", Toast.LENGTH_LONG).show()
        }
    } else {
        Toast.makeText(this, "TTS initialization failed!", Toast.LENGTH_SHORT).show()
    }
}
As the code snippet above shows, if the status is TextToSpeech.SUCCESS, we’re ready to move forward. Next, we need to set the language in which we want to speak. Bear in mind that this isn’t always guaranteed, so we need an extra check to see whether the language we’d like to use is supported and whether sufficient data packages (voices) are available.
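To make that check easier to reason about, here is a small plain-Kotlin sketch of the same logic, with the framework’s return codes copied in as local constants (the helper function itself is hypothetical, not part of the Android API):

```kotlin
// Local copies of TextToSpeech's setLanguage() return codes, for illustration:
// LANG_MISSING_DATA = -1, LANG_NOT_SUPPORTED = -2, LANG_AVAILABLE = 0.
const val LANG_MISSING_DATA = -1
const val LANG_NOT_SUPPORTED = -2
const val LANG_AVAILABLE = 0

// Hypothetical helper: true when the engine can actually speak the language.
fun isLanguageUsable(setLanguageResult: Int): Boolean =
    setLanguageResult != LANG_MISSING_DATA && setLanguageResult != LANG_NOT_SUPPORTED

fun main() {
    println(isLanguageUsable(LANG_AVAILABLE))      // true
    println(isLanguageUsable(LANG_MISSING_DATA))   // false
    println(isLanguageUsable(LANG_NOT_SUPPORTED))  // false
}
```

In the activity, you would pass the result of textToSpeech.setLanguage(...) into a helper like this instead of comparing the constants inline.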
At this point, we can say something with the TTS engine via the speak() method. It takes four parameters: the “something”, which is the text we want the phone to say as a string; a queuing strategy (we will talk more about this next); an optional Bundle of request parameters, which can be null; and finally an utteranceId, which identifies the request to the TTS engine so we can refer to it in callbacks later on.
The speak() method returns the result of queuing the speak operation. Note that this method is asynchronous; as the docs say, “The synthesis might not have finished (or even started!) at the time when this method returns.” Thus, we can’t rely on this return value as the speech state, but we can use callbacks (as shown in the Add Callbacks section below) to detect errors.
Here’s a code snippet that illustrates this:
private fun saySomething(something: String, queueMode: Int = TextToSpeech.QUEUE_ADD) {
    val speechStatus = textToSpeech.speak(something, queueMode, null, "ID")
    if (speechStatus == TextToSpeech.ERROR) {
        Toast.makeText(this, "Can't use text-to-speech.", Toast.LENGTH_LONG).show()
    }
}
Queuing Strategy
The queueMode parameter passed to the speak() method controls how the TTS engine handles multiple requests. It can be one of two modes: QUEUE_ADD or QUEUE_FLUSH.
In QUEUE_ADD mode, the new entry is added at the end of the playback queue, while in QUEUE_FLUSH mode, all entries in the playback queue (media to be played and text to be synthesized) are dropped and replaced by the new entry.
For example, let’s say we call the speak() method three times with the strings “Hello”, “Hi”, and “How are you”. With queueMode set to QUEUE_ADD, the TTS engine will speak the three strings in order. QUEUE_FLUSH mode, on the other hand, tells the TTS engine to speak only the last string (“How are you”).
Add Callbacks
For more flexibility with the TTS engine, you can add callbacks to track its state, for example, to find out when (or whether) it starts and finishes talking.
This kind of information can be very useful when you want to act based on that state. For instance, I built an Android application where I wanted to show an ad banner right after an expression was spoken, and these callbacks made that possible. Here’s some sample code that shows how to accomplish this by implementing an interface called UtteranceProgressListener via the setOnUtteranceProgressListener() method. The overridden methods are self-explanatory:
textToSpeech.setOnUtteranceProgressListener(object : UtteranceProgressListener() {
    override fun onDone(utteranceId: String?) {
        // Do whatever you want when TTS finishes speaking.
    }

    override fun onError(utteranceId: String?) {
        // Do whatever you want if TTS encounters an error.
    }

    override fun onStart(utteranceId: String?) {
        // Do whatever you want when TTS starts speaking.
    }
})
As you can see, each method receives the utteranceId, which lets us identify which speech request is being processed and take action accordingly.
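One common pattern is to key follow-up actions by utteranceId, so that onDone() can dispatch to the right piece of work. The sketch below is plain Kotlin (no Android imports), and the UtteranceActions registry is a hypothetical helper of my own, not part of the framework:

```kotlin
// Hypothetical registry mapping utterance IDs to one-shot completion actions.
class UtteranceActions {
    private val actions = mutableMapOf<String, () -> Unit>()

    // Register an action to run once the utterance with this ID finishes.
    fun register(utteranceId: String, action: () -> Unit) {
        actions[utteranceId] = action
    }

    // Forward UtteranceProgressListener.onDone(utteranceId) here.
    fun onDone(utteranceId: String?) {
        utteranceId?.let { actions.remove(it)?.invoke() }
    }
}

fun main() {
    val registry = UtteranceActions()
    registry.register("AD_BANNER") { println("Show the ad banner now") }
    registry.onDone("AD_BANNER") // runs the registered action once
    registry.onDone("AD_BANNER") // already consumed: does nothing
}
```

In an app, you would pass the same ID to speak() (in place of the fixed "ID" used earlier) and call registry.onDone() from the listener’s onDone() override. Note that onDone() is invoked on a background thread, so UI work must be posted back to the main thread.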
Stopping the TTS Engine
You can also interrupt the TTS engine at any point, say, when the user hits a cancel button. You can stop it using the stop() method, which discards the current utterance as well as any queued ones.
Also, don’t forget to release the resources used by the TTS engine when you no longer need them. As a rule of thumb, when the activity is destroyed or stopped, we should call the shutdown() method.
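In practice, that cleanup usually lives in the activity lifecycle. A minimal sketch, assuming textToSpeech is the instance created earlier (Android-only code, so it isn’t shown with a runnable harness):

```kotlin
override fun onDestroy() {
    // Interrupt anything being spoken and drop queued utterances.
    textToSpeech.stop()
    // Release the engine's resources; the instance can't be reused afterwards.
    textToSpeech.shutdown()
    super.onDestroy()
}
```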
Resources & References
One of the best resources for learning how to use Android APIs is the official documentation; check out the TextToSpeech class reference for more helpful methods.
Other tutorials are available on TutorialsPoint and JavaPapers, which showcase end-to-end TTS projects with Java code.
I hope you enjoyed reading this article and that it deepened your knowledge of and interest in text-to-speech. Android now talks, and so can your apps. Implementing this feature can make your app feel modern and user-friendly, since most users expect such intelligent behavior from today’s apps, and it makes a real difference for users with disabilities.