Google speech to text online demo

9/22/2023

The Web Speech API adds voice recognition (speech to text) and speech synthesis (text to speech) to JavaScript. The post briefly covers the latter, as the API recently landed in Chrome 33 (mobile and desktop). If you're interested in speech recognition, Glen Shires had a great writeup a while back on the voice recognition feature, "Voice Driven Web Apps: Introduction to the Web Speech API".

We programmed our application to be available in English, German and Spanish, for instance, but the list could be extended with as many languages as you want to support. We first prefix both parts of the request (the text to translate and the target language) with “_” so that they are captured in the temporary variables “$1” and “$2” respectively. Next, as in previous examples, we store them in named variables by writing the “$” symbol before the name we gave them in the code.
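
The capture step described above can be illustrated outside of QiChat. The following is a minimal plain-Java sketch, not QiChat syntax and not the application's code: it uses a regular expression with two capture groups as a stand-in for the “_” markers, so that group 1 and group 2 play the roles of “$1” and “$2”. The class name `CaptureSketch`, the `Request` holder, and the exact wording of the question are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureSketch {
    // Assumed request shape: "how do you say <text> in <language>".
    // Group 1 stands in for $1 (the free text), group 2 for $2 (the language).
    private static final Pattern REQUEST = Pattern.compile(
            "^how do you say (.+) in (english|german|spanish)\\??$",
            Pattern.CASE_INSENSITIVE);

    /** Hypothetical holder for the two captured parts. */
    static final class Request {
        final String text;
        final String language;

        Request(String text, String language) {
            this.text = text;
            this.language = language;
        }
    }

    /** Returns the captured parts, or empty if the sentence is not a translation request. */
    static Optional<Request> parse(String heard) {
        Matcher m = REQUEST.matcher(heard.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        // Copy the temporary captures into named fields, as the topic file
        // does when it stores $1 and $2 in named variables.
        return Optional.of(new Request(m.group(1), m.group(2).toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        parse("How do you say good morning in German?")
                .ifPresent(r -> System.out.println(r.text + " -> " + r.language));
    }
}
```

Because the first group is greedy, a phrase that itself contains “in” (for example “in the morning”) is still captured whole, and only the final “in <language>” is treated as the target.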

When the user asks any of the variants of the question “how do you say in German/Spanish/…?” defined in the dialog topic file, a bookmark is reached. The bookmark informs the activity, which in turn calls the method in the fragment.

While the translation happens completely offline, recognizing free speech, i.e., any word or phrase that has not been predefined, requires Pepper’s standard remote speech recognition engine, which is based on Nuance’s technology. This cloud-based speech recognition is not needed when we only translate and respond to words and sentences that we know in advance and can hardcode in the dialog topic file, written in the QiChat language. If restrictions applied to the development of an application such that we were not allowed to send speech excerpts to this cloud, the alternatives would be either to work with the offline variant just mentioned, which limits recognition to predefined sentences, or to replace Pepper’s standard speech recognition with our own. Replacing this whole system would be possible, although arduous. It takes care of detecting speech in progress, detecting when it ends, and doing the speech-to-text conversion for us, thus providing the heard text directly to the chatbots that are running and listening for input, as explained in the introductory article of this series.

We will make use of the free speech recognition function to be able to translate any text. In the topic file, we signal that we want to recognize free speech by using the symbol “*” as a wildcard for the part of the text that needs to be translated. For the target language, we choose from the list of available languages.

With this demo, you can ask Pepper to translate a word or a sentence between any pair of the languages available on your robot. Pepper responds by uttering the translation in the target language by means of Android’s TextToSpeech library. The translation is powered by ML Kit’s on-device Translation API, which uses the same models as the Google Translate app’s offline mode. Google warns that this on-device translation is intended for casual and simple translations only, as it does not offer the same quality as the Cloud Translation API; i.e., it should not be used for translating long texts. That is not a problem for our demo, in which the translations are short and not too complicated: it is more than enough, and the results are satisfying.

Usage guidelines

Before using this Google product in your application, make sure to refer first to the Guidelines page for important guidelines and restrictions on the usage of this API, as it must comply with the Google Cloud Translation API attribution requirements. These requirements include guidelines on how the app must handle layout, Google attribution, and branding.

Here you can find the full code of the application we’re building throughout this series. The structure of this demo is very similar to that of our Object Detection Demo and our Text Recognition Demo. We have a fragment, a ViewModel, and a TextTranslator, where we interact with the API. The difference is that, after initialization, there is nothing we want updated on the screen or kept running in the background except the speech recognition engine. The robot simply waits to be asked to translate something, and all actions are triggered from our dialog, the QiChat topic file.
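
Since the spoken request names the target language as a word (“German”, “Spanish”), while Android’s TextToSpeech is configured with a `Locale` and ML Kit identifies languages by short tags such as “de” or “es”, some lookup between the two is needed. The following is a minimal sketch of such a lookup in plain Java, assuming only the three languages mentioned in this article; the class name `LanguageLookup` and the method `resolve` are hypothetical and not taken from the application’s code.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

class LanguageLookup {
    // Assumed language set: the three languages named in the article.
    // Extending the demo would mean adding entries here.
    private static final Map<String, Locale> SUPPORTED = Map.of(
            "english", Locale.ENGLISH,
            "german", Locale.GERMAN,
            "spanish", Locale.forLanguageTag("es"));

    /** Maps a spoken language name to a Locale, or empty if it is not supported. */
    static Optional<Locale> resolve(String spokenName) {
        if (spokenName == null) {
            return Optional.empty();
        }
        String key = spokenName.trim().toLowerCase(Locale.ROOT);
        return Optional.ofNullable(SUPPORTED.get(key));
    }

    public static void main(String[] args) {
        // The Locale can be handed to a text-to-speech engine, and its
        // language tag to a translator that expects a short language code.
        resolve("German").ifPresent(l -> System.out.println(l.toLanguageTag()));
    }
}
```

Returning an `Optional` rather than a default language keeps the unsupported case explicit, so the dialog can answer that the language is not available instead of silently translating into the wrong one.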
