Android devices are ubiquitous and ship with many technologies that can help you build a rich experience for your users. One of these technologies is speech recognition. Many applications use the user's speech to provide features such as responding to voice commands or translating a spoken phrase from one language to another instead of having it typed. In this article, we will shed light on this technology and show how you can integrate it into your Android applications to provide a great experience for your users.
This article assumes that you have some experience building Android apps with the Kotlin programming language.
There are two main ways to implement Speech To Text (STT) in Android applications. The first goes through a Google dialog that appears each time you want to capture the user's speech, whereas the second works without this dialog but requires more code to perform the recognition.
Project Setup
The first thing to do is to add the required permissions, RECORD_AUDIO and INTERNET, to the AndroidManifest file in order to access this functionality. RECORD_AUDIO is a dangerous permission, which means that you need to request it at runtime (Android Marshmallow, API 23, and higher), but only for the no-dialog method, since the Google dialog method handles it for you.
<uses-permission android:name="android.permission.INTERNET"/>
<uses-permission android:name="android.permission.RECORD_AUDIO"/>
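Since RECORD_AUDIO has to be granted at runtime for the no-dialog method, a minimal sketch of the check-and-request flow could look like the following. The activity name, the request-code constant and the log tag are hypothetical, and the sketch assumes the AndroidX libraries are available in your project.

```kotlin
import android.Manifest
import android.content.pm.PackageManager
import android.util.Log
import androidx.appcompat.app.AppCompatActivity
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat

// Hypothetical activity name, used only for illustration.
class SpeechActivity : AppCompatActivity() {

    // Ask for RECORD_AUDIO only if it has not been granted yet.
    private fun ensureAudioPermission() {
        val granted = ContextCompat.checkSelfPermission(
            this, Manifest.permission.RECORD_AUDIO
        ) == PackageManager.PERMISSION_GRANTED
        if (!granted) {
            ActivityCompat.requestPermissions(
                this, arrayOf(Manifest.permission.RECORD_AUDIO), REQUEST_RECORD_AUDIO
            )
        }
    }

    // Called by the system once the user has answered the permission prompt.
    override fun onRequestPermissionsResult(
        requestCode: Int, permissions: Array<out String>, grantResults: IntArray
    ) {
        super.onRequestPermissionsResult(requestCode, permissions, grantResults)
        if (requestCode == REQUEST_RECORD_AUDIO) {
            val granted = grantResults.firstOrNull() == PackageManager.PERMISSION_GRANTED
            Log.d("SpeechActivity", "RECORD_AUDIO granted: $granted")
        }
    }

    companion object {
        // Arbitrary value; any request code unique within the activity works.
        private const val REQUEST_RECORD_AUDIO = 100
    }
}
```

You would typically call ensureAudioPermission() before starting the recognizer, and only start it once the permission has been granted.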
Google Dialog
To adopt this type of STT, the first thing to do is to check whether speech recognition is available on the current device. We can verify this with these lines of code:
if (!SpeechRecognizer.isRecognitionAvailable(this)) {
    Toast.makeText(
        this,
        R.string.no_recognition_available,
        Toast.LENGTH_LONG
    ).show()
}
Now we can start the speech recognition. We launch an intent with some parameters, which is received and processed by the device's speech recognizer, and its results (the user's speech) are delivered to the overridden onActivityResult method. Here is the way to launch the intent:
private fun askSpeechInput() {
    val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
    // EXTRA_LANGUAGE expects an IETF language tag such as "en-US"
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault().toLanguageTag())
    intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Hi speak something")
    // Same request code that onActivityResult checks
    startActivityForResult(intent, REQUEST_SPEECH_RECOGNIZER)
}
As I said, the results will be provided in onActivityResult:
override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
    super.onActivityResult(requestCode, resultCode, data)
    if (requestCode == REQUEST_SPEECH_RECOGNIZER && resultCode == Activity.RESULT_OK) {
        // Candidate transcriptions of what the user said
        val results = data?.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS)
        // results is nullable, so convert it to a non-null String before logging
        Log.d("TAG-R", results.toString())
    }
}
The output is a list of Strings holding the sentences the user may have said. For example, “Hello world” could be interpreted in different ways, such as “hello world” and “hello word”, so the resulting list will hold these two interpretations and maybe others as well. You can limit the size of the result with an intent parameter named EXTRA_MAX_RESULTS. It's pretty easy to use the Google dialog to get the user's speech, but sometimes we want to customize our interface and not display this dialog. To achieve this, we have to resort to the following approach.
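Before moving on, here is a minimal sketch of how a single candidate might be picked from the list of interpretations. The function name bestCandidate is hypothetical. It assumes the optional per-candidate confidence values have already been read from the result (for the dialog flow they are exposed under RecognizerIntent.EXTRA_CONFIDENCE_SCORES), and it falls back to the first candidate when they are missing, since the list is generally ordered by descending confidence.

```kotlin
// Hypothetical helper: pick the most likely transcription from the candidates.
// `scores` mirrors the optional confidence array and may be null or incomplete.
fun bestCandidate(candidates: List<String>?, scores: FloatArray?): String? {
    if (candidates.isNullOrEmpty()) return null
    // Without one score per candidate, trust the recognizer's own ordering.
    if (scores == null || scores.size != candidates.size) return candidates.first()
    // Otherwise return the candidate with the highest confidence score.
    val bestIndex = scores.indices.maxByOrNull { scores[it] } ?: 0
    return candidates[bestIndex]
}
```

The helper has no Android dependency, so the same logic applies to the results of either approach described in this article.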
Without Dialog
In this part, we will do the same thing but without the Google dialog. To achieve this, we first need to create an object called SpeechRecognizer:
val speechRecognizer = SpeechRecognizer.createSpeechRecognizer(this)
After that, we need to set up callbacks for this SpeechRecognizer object so that it can tell us about its state, for example when the recognition starts, when it ends, what the results are, and so on. I will let my activity implement the RecognitionListener interface with its callback methods, then register the activity as the SpeechRecognizer's listener:
speechRecognizer.setRecognitionListener(this)
And here are the overridden methods:
override fun onReadyForSpeech(params: Bundle?) {
    Log.d(TAG, "onReadyForSpeech")
}
override fun onRmsChanged(rmsdB: Float) {
    Log.d(TAG, "onRmsChanged")
}
override fun onBufferReceived(buffer: ByteArray?) {
    Log.d(TAG, "onBufferReceived")
}
override fun onPartialResults(partialResults: Bundle?) {
    Log.d(TAG, "onPartialResults")
}
override fun onEvent(eventType: Int, params: Bundle?) {
    Log.d(TAG, "onEvent $eventType")
}
override fun onBeginningOfSpeech() {
    Log.d(TAG, "onBeginningOfSpeech")
}
override fun onEndOfSpeech() {
    Log.d(TAG, "onEndOfSpeech")
}
override fun onError(error: Int) {
    Log.d(TAG, "onError $error")
}
override fun onResults(results: Bundle?) {
    Log.d(TAG, "onResults")
    // Candidate transcriptions of what the user said
    val result = results?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
}
And thus you can do the same thing you did with the Google dialog, but now you get your results from onResults. The names of these callbacks are self-explanatory, except perhaps onRmsChanged, which reports the current sound level (RMS, in dB) of the audio being recorded as a float.
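The onError callback only hands you an integer code, so it can help to translate it into something readable. The following is a minimal sketch that maps the error constants defined on SpeechRecognizer to short descriptions. The function name describeRecognitionError and the wording of the messages are my own and not part of the platform.

```kotlin
import android.speech.SpeechRecognizer

// Hypothetical helper: turn an onError code into a log-friendly description.
fun describeRecognitionError(error: Int): String = when (error) {
    SpeechRecognizer.ERROR_AUDIO -> "Audio recording error"
    SpeechRecognizer.ERROR_CLIENT -> "Client-side error"
    SpeechRecognizer.ERROR_INSUFFICIENT_PERMISSIONS -> "Insufficient permissions"
    SpeechRecognizer.ERROR_NETWORK -> "Network error"
    SpeechRecognizer.ERROR_NETWORK_TIMEOUT -> "Network operation timed out"
    SpeechRecognizer.ERROR_NO_MATCH -> "No recognition result matched"
    SpeechRecognizer.ERROR_RECOGNIZER_BUSY -> "Recognition service is busy"
    SpeechRecognizer.ERROR_SERVER -> "Server error"
    SpeechRecognizer.ERROR_SPEECH_TIMEOUT -> "No speech input"
    // Newer API levels may define codes that are not listed here.
    else -> "Unknown error ($error)"
}
```

Inside onError you could then log describeRecognitionError(error) instead of the raw number.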
This leads us to the final step, which is to turn on the recognizer so that it can receive and recognize the user's speech. To do that, we call the method startListening(), passing it a recognition intent like the one used for the dialog, whereas to stop it we call the method stopListening(). The recognizer normally stops on its own once the user finishes speaking, but if you want it to stop when a certain condition is met, stopListening() gives you the way to do it.
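As a rough sketch of this final step, the snippet below uses the SpeechRecognizer methods startListening(Intent), stopListening() and destroy(). The three function names are hypothetical helpers, and requesting partial results is an optional choice of mine, not something the approach requires.

```kotlin
import android.content.Intent
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer
import java.util.Locale

// Hypothetical helper: start a session on a recognizer whose listener is already set.
// SpeechRecognizer methods must be called from the main thread.
fun startRecognition(speechRecognizer: SpeechRecognizer) {
    val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
        putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
        putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault().toLanguageTag())
        // Optional: also deliver intermediate hypotheses to onPartialResults.
        putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true)
    }
    speechRecognizer.startListening(intent)
}

// Hypothetical helper: end the session early; speech captured so far is still recognized.
fun stopRecognition(speechRecognizer: SpeechRecognizer) {
    speechRecognizer.stopListening()
}

// Hypothetical helper: free the recognizer, for example from the activity's onDestroy.
fun releaseRecognizer(speechRecognizer: SpeechRecognizer) {
    speechRecognizer.destroy()
}
```

Keeping stop and release separate matters because destroying the recognizer right after stopping it would discard the result that is still being processed.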
Optional View
When the SpeechRecognizer starts recognizing the user's speech, nothing is visible to the user. For that we have onRmsChanged, which you can use to drive an animation or something similar. I found a View on GitHub called SpeechRecognitionView that does just that, in the style of the Google Assistant. You can customize the bars' colors, their lengths, their number, and much more. It is worth checking out.
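If you would rather draw your own indicator, a minimal sketch of the idea is to convert each onRmsChanged value into a level between 0 and 1 and use it as a bar height or scale factor. The function name rmsToLevel is hypothetical, and the default bounds of -2 dB and 10 dB are assumptions of mine, since the range of reported values is not fixed and can differ between devices.

```kotlin
// Hypothetical helper: map an RMS dB reading onto a 0..1 level for an animation.
// The default bounds are assumptions; measure the real range on your own devices.
fun rmsToLevel(rmsdB: Float, minDb: Float = -2f, maxDb: Float = 10f): Float {
    require(maxDb > minDb) { "maxDb must be greater than minDb" }
    // Linear interpolation between the bounds, clamped to the 0..1 range.
    return ((rmsdB - minDb) / (maxDb - minDb)).coerceIn(0f, 1f)
}
```

Inside onRmsChanged you could pass rmsToLevel(rmsdB) to whatever view property drives your animation.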
You can also check the official documentation of SpeechRecognizer for further details.
With that, we have reached the end of this article. I hope it got you interested in speech recognition, even a little. In fact, why stop here? You can integrate it into a more complex system using machine learning techniques, Natural Language Processing, and other more sophisticated methods.