|
Post by egghead on Aug 28, 2020 13:41:26 GMT
The requirement is to read out text written to an NFC tag. While researching, I found this: ruby-lang.co/text-to-speech/ However, I'm not sure either of the two methods would work in the context of Rho, or whether there are better ways. It's preferable not to depend on the Google TTS service, but to make it work even without internet (i.e. it should work in offline mode). Any pointers/suggestions? Thanks
|
|
|
Post by Alex Epifanov on Aug 28, 2020 14:03:15 GMT
I guess the best way is not to get tied to Ruby per se. You can choose any TTS engine, cross-platform or native to the target platform (for instance the Android TTS API), and then provide a Rhodes API over it using CommonAPI and the native extensions mechanism.
|
|
|
Post by Alex Epifanov on Aug 28, 2020 14:05:04 GMT
BTW, espeak looks OK at first glance.
|
|
|
Post by egghead on Aug 29, 2020 12:33:11 GMT
"BTW espeak looks ok at the first glance"

Looks like this will work only on desktops, as espeak and lame need to be installed. espeak is also available for Android, but lame isn't available on the Play Store. Plus it sounds too robotic 🤖 for my liking. Where can I find more information on Android TTS, and how do I use its API in conjunction with Rho? How does it sound? I hope it isn't too robotic.
|
|
|
Post by Alex Epifanov on Aug 29, 2020 12:38:12 GMT
"Where can I find more information on Android TTS, and how do I use its API in conjunction with Rho?"

You can implement different engines for each platform and wrap their functions into a unified Rho API:
docs.tau-technologies.com/en/7.2/guide/native_extensions
developer.android.com/reference/android/speech/tts/TextToSpeech
Not sure how good Android TTS is; never tried it.
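To make that reference concrete, here is a minimal sketch of the Android TextToSpeech API from that link. It assumes an Activity context and API level 21+ (for the four-argument speak()); the class and utterance-ID names are illustrative, not part of any Rhodes API.

```java
import android.app.Activity;
import android.speech.tts.TextToSpeech;
import java.util.Locale;

// Minimal sketch: initialize the engine asynchronously, then speak once ready.
public class TtsHelper implements TextToSpeech.OnInitListener {
    private final TextToSpeech tts;
    private boolean ready = false;

    public TtsHelper(Activity activity) {
        // The engine initializes in the background; onInit() fires when done.
        tts = new TextToSpeech(activity, this);
    }

    @Override
    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            tts.setLanguage(Locale.US);
            ready = true;
        }
    }

    public void speak(String text) {
        if (ready) {
            // QUEUE_FLUSH interrupts anything currently being spoken.
            tts.speak(text, TextToSpeech.QUEUE_FLUSH, null, "utteranceId");
        }
    }

    public void shutdown() { tts.shutdown(); }
}
```

This runs entirely on-device with whatever engine is configured under the phone's TTS settings, so it meets the offline requirement.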
|
|
|
Post by egghead on Aug 31, 2020 12:01:58 GMT
I tried using the JS Web Speech API [Ref. developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API] and used a test page like the code below, but it seems it's not supported in QT WebKit or on Android or Windows Phone, although the reference page above shows Android WebView as supported under Speech Synthesis. Moreover, the code below works in the device browser. Does it have anything to do with the WebView or the device speech engine?

```html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>Document</title>
</head>
<body>
    Select Voice: <select id='voiceList'></select>
    <br><br>
    <input id='txtInput' />
    <br><br>
    <button id='btnSpeak'>Speak!</button>
    <script>
        var txtInput = document.querySelector('#txtInput');
        var voiceList = document.querySelector('#voiceList');
        var btnSpeak = document.querySelector('#btnSpeak');
        var synth = window.speechSynthesis;
        var voices = [];

        PopulateVoices();
        if(speechSynthesis !== undefined){
            speechSynthesis.onvoiceschanged = PopulateVoices;
        }

        btnSpeak.addEventListener('click', ()=> {
            var toSpeak = new SpeechSynthesisUtterance(txtInput.value);
            var selectedVoiceName = voiceList.selectedOptions[0].getAttribute('data-name');
            voices.forEach((voice)=>{
                if(voice.name === selectedVoiceName){
                    toSpeak.voice = voice;
                }
            });
            synth.speak(toSpeak);
        });

        function PopulateVoices(){
            voices = synth.getVoices();
            var selectedIndex = voiceList.selectedIndex < 0 ? 0 : voiceList.selectedIndex;
            voiceList.innerHTML = '';
            voices.forEach((voice)=>{
                var listItem = document.createElement('option');
                listItem.textContent = voice.name;
                listItem.setAttribute('data-lang', voice.lang);
                listItem.setAttribute('data-name', voice.name);
                voiceList.appendChild(listItem);
            });
            voiceList.selectedIndex = selectedIndex;
        }
    </script>
</body>
</html>
```
|
|
|
Post by Alex Epifanov on Aug 31, 2020 12:16:23 GMT
I guess it is expected that different webview engines won't have full TTS support. To have full control and functionality for the feature, I'd suggest implementing the best-suited engine for each target platform and exposing a Ruby or JS API to use it in the app. We'd be happy to cooperate on that.
|
|
|
Post by egghead on Sept 1, 2020 7:39:43 GMT
Looks like the SpeechSynthesisUtterance object isn't supported in mobile webviews; that's why the Web Speech API doesn't work. I'm following this: codinginflow.com/tutorials/android/text-to-speech and it does work, but I'm not sure how to integrate it with Rho or how to wrap it into a unified Rho API; I have very little knowledge of native extensions. But since Rho NFC is supported only on the Android platform, and in my case I need this only on Android, I don't see the need for such a wrapper (although it would have been great if I could write a cross-platform extension for it).
|
|
|
Post by jontara on Sept 15, 2020 22:37:23 GMT
I think the best approach is to look for a TTS engine that is written in C or C++. There are many.
It's been a few years since I worked with TTS/STT; I integrated both into an early prototype tele-medicine project developed by Intel. I think all of the C/C++ engines are similar, in that they work strictly with memory buffers, and it is up to you to populate some hooks that fill or empty those buffers. For example, my task on that long-ago project was to integrate a TTS and an STT engine with the Linux ALSA sound system. By writing the hooks, you can integrate with whatever your real-time source or sink might be.
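The buffer-hook pattern described above can be sketched generically. All names here are illustrative, not any real engine's API: a real engine would synthesize PCM audio, and the sink would be ALSA, a file writer, or AudioTrack on Android.

```java
// Generic sketch of the hook pattern: the engine fills fixed-size buffers
// and hands each one to a sink that you provide. The "engine" below is a
// fake that emits one sample per input character, just to show the flow.
interface PcmSink {
    void write(short[] buffer, int samples); // called once per filled buffer
}

class FakeTtsEngine {
    private static final int BUFFER_SIZE = 4;

    void synthesize(String text, PcmSink sink) {
        short[] buf = new short[BUFFER_SIZE];
        int n = 0;
        for (char c : text.toCharArray()) {
            buf[n++] = (short) c;
            if (n == BUFFER_SIZE) {
                sink.write(buf.clone(), n); // hand off a full buffer
                n = 0;
            }
        }
        if (n > 0) sink.write(buf.clone(), n); // flush the partial buffer
    }
}
```

The point is that the engine never touches the sound hardware itself; swapping ALSA for AudioTrack (or a WAV file) means swapping only the sink.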
As far as Rhomobile goes, it is easy to write pure C or C++ native extensions. It's actually a bit easier than so-called "native language" extensions, where you need to e.g. write in Objective-C for iOS and Java for Android.
I recently had a project where we needed to integrate C code generated by MATLAB Coder. (It does some traditional, non-AI image processing. The code was written in MATLAB, and MATLAB Coder converts it to C; you can also choose C++.) My first thought was: oh no, I have to write some glue code in Objective-C for iOS and then some glue code in Java for Android, what a pain!
But the Tau Team pointed out that I could write my extension in plain C. So, only one codebase for the extension, with identical code for every platform. You still call it from Ruby (or JavaScript if you wish; I never call from JavaScript myself).
Rhodes uses C/C++ internally, and so already supports/requires C/C++ on every platform. For Android, this means you will always be building with the NDK (Native Development Kit), and you are free to use C/C++ code. (Or, indeed, any language that can transpile to C/C++, or even to the pure native instruction set, though I haven't worked that out; I haven't had a need.)
For the typical "native" Android app, using C/C++ code can be a barrier, since most Android developers aren't familiar with the NDK. With Rhodes, the build system already includes the NDK; there is nothing to figure out. And the fact is, the whole "write once, run anywhere" concept of Android has long since been diluted: all of the important Google-supplied apps use C/C++ or other compiled-to-machine-code languages extensively, mostly for performance reasons. I had previously read this, but only as rumor; I spent some time discussing it with a VP-level Google employee at last October's Zebra conference in the US, and confirmed that it is true. Google does this for performance, and so should you (when needed). The Tau platform can build Android images with support for the important Android hardware architectures; I always build for ARM, ARM-64 and Intel (as many tablets and Chromebooks are Intel).
Actually, now I see both iOS and Android provide built-in TTS frameworks/libraries. That is your best bet, really, as they will be optimized for the device, and will probably leverage the GPU or another specialized onboard processor for efficiency. In this case, you probably WILL have to write some "glue" code in the "native" language of each platform.
Still, there might be a reason to use a third-party TTS engine, in case it provides features or voices that you want and the native engine doesn't. Another reason might be wanting to unify the experience across platforms; if you want the same voices everywhere, you probably have to incorporate a third-party engine.
I do not know the current state of TTS/STT engines well enough to recommend one. In general, though, I will say "you get what you pay for": if you want quality and support, license a commercial engine.
|
|
|
Post by egghead on Sept 16, 2020 1:13:21 GMT
Thanks Jon for taking the time 😊 You've written quite a bit, but as I see it, some points to note:
1. C/C++: They say you're never too old to code, but is the same true for learning a new language? I will be 55+ soon.
If you have some ready-made TTS code in C/C++, I could at least try to write a native extension for it.
2. Built-in engines (best bet): Yeah, I noticed the TTS settings under the Accessibility settings on the device, but I'm not sure how to "hook" into them.
3. Third-party TTS engine: I just need the book title to be read out on a search, not the whole book's contents, so I don't think I really need one.
4. (my own idea): Since this is an enterprise app and not a public-facing one, I could use another TTS app and call it with intents. Probably the easiest way, but I still have to do some research on it.
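On point 4: as far as I know, Android has no standard "speak this text" intent, so this is a hypothetical sketch. It assumes the target TTS app accepts plain text via the generic ACTION_SEND mechanism, which is app-specific; check the chosen app's documentation first.

```java
import android.app.Activity;
import android.content.Intent;

// Hypothetical sketch: hand the text off to another installed app via a
// plain ACTION_SEND intent and let the user pick an app to read it aloud.
// Whether any given TTS app registers for ACTION_SEND is app-specific.
public class SpeakViaIntent {
    public static void send(Activity activity, String bookTitle) {
        Intent intent = new Intent(Intent.ACTION_SEND);
        intent.setType("text/plain");
        intent.putExtra(Intent.EXTRA_TEXT, bookTitle);
        activity.startActivity(Intent.createChooser(intent, "Read aloud with"));
    }
}
```

For an enterprise app where the TTS app can be preinstalled, this avoids writing any extension code at all, at the cost of a chooser hop and a dependency on the other app's behavior.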
|
|