Keeping it Simple
Technically, some time was spent, as planned, in an exploratory phase evaluating different approaches to creating the app for the iOS platform. As expected this was tricky, and after trying several technical options the developers settled on the Cordova framework (https://cordova.apache.org/) together with some new Cordova software library components that give access to the on-device iOS speech recognition system. This means the code base for both app platforms is substantially the same and is based on the web app version.
Going forward, the web version will be used to prototype and test basic functionality, which will then be tested in the app forms.
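To illustrate the shared web/app code path: in the browser the recognition itself comes from the Web Speech API (the Cordova plugin plays the same role on iOS), and the app then only needs to check the recognised alternatives against the target word. A minimal sketch of that check follows; `matchesTarget` and `normalise` are illustrative names, not the project's actual code.

```javascript
// Normalise a recognised phrase: lower-case, strip punctuation and spaces.
function normalise(text) {
  return text.toLowerCase().replace(/[^a-z]/g, "");
}

// True if any recognition alternative matches the target word.
function matchesTarget(alternatives, target) {
  const want = normalise(target);
  return alternatives.some((alt) => normalise(alt) === want);
}

// In the browser it would be wired up roughly like this (not runnable in Node):
// const Rec = window.SpeechRecognition || window.webkitSpeechRecognition;
// const rec = new Rec();
// rec.maxAlternatives = 3;
// rec.onresult = (e) => {
//   const alts = Array.from(e.results[0]).map((r) => r.transcript);
//   console.log(matchesTarget(alts, "cat") ? "Well done!" : "Try again");
// };
// rec.start();
```

Keeping the matching logic in a plain function like this means it can be shared unchanged between the web app and the Cordova builds.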
Media Sourcing and Creation
- Text-to-voice services for instruction and feedback: Amazon Polly for the tutor voice is working well
- Now using Bootstrap animations for interface icons (from various sources including Google Material Design icons and fonts) with time-synced cues to MP3s produced by Amazon Polly; working really well
- Some screen movies of the interface may be needed for user training
- Top-down video of a tutor at a table, speaking and rearranging cardboard letters. Early tests are very promising and bring a human touch into the app; this rough test shows it could work, though audio quality, image framing and lighting will need more work. https://www.youtube.com/watch?v=Gl3V_WDVmC8
- Images for a sound alphabet, using Pixabay and similar sources; they are proving surprisingly good (we must make sure to add credits!)
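On the time-synced cues: Amazon Polly can return "speech marks" alongside the MP3, one JSON object per line giving the millisecond offset of each word, which is a natural way to drive icon animations in sync with the audio. A sketch of consuming them follows; the sample data and helper names are illustrative, not the project's actual code.

```javascript
// Parse newline-delimited Polly speech marks into an array of word timings.
function parseSpeechMarks(ndjson) {
  return ndjson
    .trim()
    .split("\n")
    .map((line) => JSON.parse(line))
    .filter((m) => m.type === "word");
}

// Return the word that should be highlighted at playback time tMs.
function wordAt(marks, tMs) {
  let current = null;
  for (const m of marks) {
    if (m.time <= tMs) current = m.value;
  }
  return current;
}

// Example speech marks in the shape Polly emits when asked for
// OutputFormat "json" with SpeechMarkTypes ["word"]:
const sample =
  '{"time":0,"type":"word","start":0,"end":3,"value":"The"}\n' +
  '{"time":180,"type":"word","start":4,"end":7,"value":"cat"}\n' +
  '{"time":420,"type":"word","start":8,"end":11,"value":"sat"}';

const marks = parseSpeechMarks(sample);
console.log(wordAt(marks, 200)); // "cat"
```

In the app, `wordAt` could be called from the audio element's `timeupdate` event with `audio.currentTime * 1000` to switch the active icon animation.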
Back in 2018 we were fortunate to have the assistance of students from the Digital Skills Academy of Digital Skills Global (https://digitalskillsglobal.com/). They helped us work out some crucial ideas about interface design, logos and more, along with some supporters working pro bono (Mr. Ll, we owe you!) and our two software companies, ReachWill Ltd and Micro-phonics Ltd. Here are some screen movies of that earlier work:
An early demo of the web app version:
Video of the app working on an Android phone (early test version)
Video of the Beta 1.2 interface with the Amazon Polly voice, using Bootstrap icon animation synced to the audio
Unexpected Outcomes
It is useful to have a placeholder to record unexpected outcomes in a project. So far we have identified these:
- The web app can be a separate product in its own right, and as progressive web apps mature they might replace the traditional app forms
- The web app as a teaching tool for classroom use, with extra features, looks very promising and could be a powerful publicity tool
- The use of WebSockets also opens up interesting collaboration opportunities in the future
- Google Cloud voice services are developing and we need to keep an eye on them
- User accounts: initially we were not going to have them, relying instead on standalone installs on the device, partly because of GDPR concerns. We have now decided that we can have anonymous accounts and provide monitoring and feedback on user performance, as that will be useful to learners and could potentially provide useful analytics in the future
- Creating a 'White Paper' has been very effective at clarifying thinking between the partners and as a basis for future collaboration
- Chatbots and AI are worth exploring in other related projects
Word Recognition Limitations
Our app features a limited number of very short words (3-4 letters) that learners need to read and then speak into the app to confirm that they have read correctly. We have found that the existing on-device natural language processing (a form of AI) tends to break down when users input short words (a lack of contextual words seems to be the problem), and this is exacerbated when the user has a regional accent. Accuracy is crucial for these learners to progress. The accent issue features in numerous YouTube videos about voice recognition failures. We have become aware of the Google AI initiative Teachable Machine, which uses the open source TensorFlow software library.
It seems that recognising a single short word is tricky for these systems. When you speak a sentence, such as a command to Alexa or a Google voice search, the voice recognition system has a set of words to work on that provide a context, and performance seems much better. Even so, there are recognised accent problems for these systems; there are some hilarious examples on YouTube, including the '11' sketch linked below.
As luck would have it, we have stumbled on a possible solution: Teachable Machine from Google, which uses the open source AI software library TensorFlow. We did some basic testing over the break and so far the results are impressive. It looks like, once 'trained', the system can be personalised to individual users' accents. In our tests it could distinguish between 4 people with different accents saying the same word with a high degree of accuracy. This opens up some intriguing possibilities.
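For context on how this would plug in: a Teachable Machine audio model (built on TensorFlow.js) returns one score per trained label each time it hears something, so the app's job reduces to picking the best-scoring label above a confidence threshold. A minimal sketch, assuming a hypothetical label list and threshold (neither is taken from our actual model):

```javascript
// Hypothetical labels the model might have been trained on.
const LABELS = ["_background_noise_", "cat", "dog", "sun"];

// Pick the best-scoring label, or null if nothing is confident enough.
function recognise(scores, labels, threshold = 0.8) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return scores[best] >= threshold ? labels[best] : null;
}

console.log(recognise([0.05, 0.9, 0.03, 0.02], LABELS)); // "cat"
console.log(recognise([0.4, 0.3, 0.2, 0.1], LABELS));    // null (below threshold)

// With the real library it would be wired up roughly like this
// (browser only; needs the @tensorflow-models/speech-commands package):
// const recognizer = speechCommands.create("BROWSER_FFT");
// await recognizer.ensureModelLoaded();
// recognizer.listen((result) => {
//   const word = recognise(Array.from(result.scores), recognizer.wordLabels());
//   if (word) showFeedback(word); // showFeedback is a hypothetical app function
// }, { probabilityThreshold: 0.8 });
```

The threshold is the lever for the accent problem: a per-user trained model can keep it high (strict) while still recognising that user reliably.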
We think this could make our app much more effective: an example of using AI to improve AI. Update: we have recently been successful in getting funding to explore this option; more details to follow.
The classic '11' lift sketch about voice recognition and Scots accents