
Building a text-to-speech Android app as a hobbyist developer

Billy Wood, 11/05/2020, Development

Following his inclusion in the BIMA 100 list in the Tech for Good category, hedgehog lab Product Manager, Tom Ford, gives us the lowdown on his successful MyVoice Text to Speech app which he built in his spare time as a self-taught hobbyist developer.

Since graduating from University, I have always worked with or around developers. In one of my first jobs out of university as a project manager at HP, I oversaw large digital transformation projects for the likes of Deutsche Bank and Procter & Gamble and managed delivery teams from afar. Here, the world of development was a dark and mysterious place for me. We’d dictate to the development teams what needed to happen, and these modern-day wizards would make it happen from remote offices in India, Bulgaria or America. 

When I joined hedgehog lab in 2017, I was determined to learn the basics of this magic myself. This was in part a necessity, as being in a smaller consultancy with often co-located teams means you are often in the thick of it. Our relationships with clients are also often more personal and as such a higher level of information sharing is required, much of it of a technical nature.

Beginning to code

I started learning Java to develop Android applications towards the end of 2017, with a cheap but well-reviewed course on Udemy. One of the mantras of this course was if you have an idea - write it down, and just try it! With this encouragement in mind, after watching around 20 hours of the course video material, I started making all sorts of small and ultimately useless applications - all of which, of course, remained less than half complete.

Some of the projects I worked on in those early days included:

  • a companion app for the board game Munchkin

  • a mood monitoring app

  • a planning app for hiking.

The ‘Aha!’ Moment

A common refrain that new developers are told when learning is that you should start to build something that is useful to you or has an impact on your life in some way. This helps to keep you engaged, motivated and ultimately reduces the risk that your brand spanking new app will become yet another unfinished side project. 

For me, my idea came from a deeply personal place when my mother was diagnosed with Motor Neurone Disease (MND) in summer 2018, an aggressive terminal illness that affects every aspect of a patient's life. MND can manifest in a few different styles, which ultimately influence the way the disease can affect you. In our case, it’s Bulbar onset, which means that the first symptoms commonly experienced are a loss of speech and oral control. When my mother began to become unable to speak, we turned to technology.

There are some great apps available for someone with speaking difficulties, broadly termed Text To Speech (TTS) applications. The concept is simple: type what you want to say, hit speak, and your device will speak it aloud using the default TTS engine on your OS. Whilst the existing apps are pretty good, they are often expensive, full of adverts or lacking in features. There was also a lack of decent options available to Android users. So I had a pressing need, I had an idea, and I had some competitors to study and compare. Perhaps most importantly, I had a user tester and a massive amount of motivation.

How not to build an app

I diligently started coding with the little experience that I had and had an early prototype up and running within a few days. 

This was a simple text box with a speak button and a clear button. You could adjust some basic settings and save a phrase. With basic user needs achieved, that's an MVP, right? Well yes, but it’s also about as bare-bones as it could have been. I’d learnt how to use local storage to save phrases, and how to use the device’s TTS engine to say something out loud. Great success!

Despite being surrounded by some of the best designers around at work, my UI obviously left a lot to be desired. But the basic concept of being able to type and speak, save phrases, and adjust the sound of your voice remains today. 

My next version had a slightly better (but ultimately still bad) design and introduced the concept of categories giving users the ability to group phrases with common themes together. 

It also gave the user some more settings with the ability to change the voice language to something other than English, and I experimented with a French-language version of the app. This wasn't a data-driven decision (I had next to no users and none of the ones I did have came from France), but it was a great learning opportunity.

Out in the wild

My Voice, the TTS app, was born and released into the wild on December 23rd 2018. You can see screenshots of the first release version below (minus the placeholder advertisement you can see in them!).

[Image: screenshots of the first release version of My Voice]

Amazingly, I actually attracted around 200 users with this first version relatively quickly. I posted on Reddit to promote the app, and shared with colleagues, friends and family.

I quickly realised that my intended user base was likely to have a range of disabilities and physical impairments that would impact how they interacted with their device, so I started to think about accessibility. I tried to ensure all touch points were of adequate size, and that all objects had good content descriptions from this point onwards. 
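As a rough illustration of what "adequate size" can mean in practice: Android's accessibility guidance recommends touch targets of at least 48dp by 48dp, and a dp value converts to pixels as dp × (screen dpi / 160). The sketch below is plain Java with no Android dependencies; the class and method names (`TouchTargets`, `dpToPx`, `isLargeEnough`) are made up for this example and are not taken from My Voice.

```java
/** Illustrative helper: checks a control against the 48dp minimum touch target. */
final class TouchTargets {
    // Minimum recommended touch target edge, in density-independent pixels.
    static final int MIN_TARGET_DP = 48;

    // Converts dp to physical pixels for a screen of the given density (dpi).
    static int dpToPx(float dp, int densityDpi) {
        return Math.round(dp * densityDpi / 160f);
    }

    // True if both edges of the control meet the minimum on this screen.
    static boolean isLargeEnough(int widthPx, int heightPx, int densityDpi) {
        int minPx = dpToPx(MIN_TARGET_DP, densityDpi);
        return widthPx >= minPx && heightPx >= minPx;
    }
}
```

On a 480dpi screen the minimum works out to 144 pixels per edge, so a 120 by 120 pixel button would fail the check even though it may look perfectly tappable in a design mock-up.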

It was also at this point that I realised that the architecture of My Voice, from both a UX and a technical point of view, was frankly crap. Technically, I was using purely Activities (what’s a fragment?!) and Intents, and I was passing data around everywhere instead of storing things in SharedPreferences or locally; it was a mess. 

I expect our designers already vomited a little further up this read, but visually there were some massive flaws. The navigation of the app felt clunky and you couldn't get to where you wanted to be quickly, while the visual cues of the app were all over the place, and I needed a better flow. 

Getting somewhere with version two

I released an updated version in late January 2019. This included bottom navigation and loads more settings (pilfered from other apps), such as the ability to clear text after speaking, speak words aloud as you type them and more. 

This version was a major jump and included a whole host of new features for users. One was allowing people to change their colour scheme using a library I found - my first time working with someone else's library. 

You could now adjust text size and opacity, a feature that has since proved to often go unused, but which a small minority find critical. 

I also had a new store icon and dare I say it, a small amount of brand identity was starting to develop in the visuals and the way I communicated with users via the store listing, review feedback, etc.

This is probably the first version that resembles what the app looks like today. The most technically challenging part of this was figuring out a way to get ‘speak as you type’ working. I still have nightmares about this. In addition to the obvious features and visuals, the app had a whole host of new things going on behind the scenes.
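To give a flavour of why ‘speak as you type’ is fiddly, here is one possible approach, sketched in plain Java. It is not the app's actual implementation, which the post does not describe. The idea is to remember how much of the text has already been handled and only hand a word to the speech engine once it is followed by whitespace. The class name `SpeakAsYouType` and its method are hypothetical; in an Android app something like this would presumably be called from a text-change listener, with the result passed to the TTS engine.

```java
/** Sketch: decides which words to speak after each change to the text box. */
final class SpeakAsYouType {
    // Index just past the last character that has already been handled.
    private int spokenUpTo = 0;

    /**
     * Call with the full text after every edit. Returns any newly completed
     * words (those followed by whitespace), or an empty string if none.
     */
    String onTextChanged(String text) {
        if (text.length() < spokenUpTo) {
            // Text was deleted: fall back to the last word boundary that still exists.
            spokenUpTo = lastBoundary(text);
        }
        int end = lastBoundary(text);
        if (end <= spokenUpTo) {
            return ""; // still in the middle of a word, nothing new to say
        }
        String toSpeak = text.substring(spokenUpTo, end).trim();
        spokenUpTo = end;
        return toSpeak;
    }

    // Index just past the last whitespace character, or 0 if there is none.
    private static int lastBoundary(String text) {
        for (int i = text.length() - 1; i >= 0; i--) {
            if (Character.isWhitespace(text.charAt(i))) {
                return i + 1;
            }
        }
        return 0;
    }
}
```

Even this small version has to cope with deletions and repeated spaces, and it ignores punctuation, pasted text in the middle of a sentence and cursor movement, which is where the nightmares tend to start.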

Implementing user feedback

In direct response to user feedback, I implemented a broader range of voices and locales into the app including male/female variations wherever they were available. 

This was always something I knew I needed to do, but I had been putting it off. Initially, My Voice launched with a hard-coded list of 10 languages and it was pot luck whether you got a male or a female voice. I assumed users would adjust their pitch and speed to make the voice as suitable as possible. 

This update opened up every voice language available from the device’s TTS engine (providing it was Google’s TTS engine… more on this below). For each language it listed the available voices, often with obscure names: some were regional dialects, some were simply male or female, and others were variations on existing voices. 
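One way to tame a flat list of obscurely named voices is to group them by language and give each a readable label. The sketch below does this in plain Java under stated assumptions: `VoiceInfo` is a hypothetical stand-in for whatever the TTS engine reports (a name plus a locale), the voice names in the usage example are invented, and the labelling scheme is just one option rather than what My Voice does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a voice reported by a TTS engine. */
final class VoiceInfo {
    final String name;
    final Locale locale;

    VoiceInfo(String name, Locale locale) {
        this.name = name;
        this.locale = locale;
    }
}

final class VoiceCatalogue {
    /**
     * Groups voices by language code (e.g. "en") and replaces engine names
     * with numbered labels per locale, such as "en-GB voice 1".
     */
    static Map<String, List<String>> labelsByLanguage(List<VoiceInfo> voices) {
        // Sort by locale tag, then name, so numbering is stable between runs.
        List<VoiceInfo> sorted = new ArrayList<>(voices);
        sorted.sort(Comparator.comparing((VoiceInfo v) -> v.locale.toLanguageTag())
                .thenComparing(v -> v.name));

        Map<String, List<String>> grouped = new TreeMap<>();
        Map<String, Integer> countPerLocale = new HashMap<>();
        for (VoiceInfo v : sorted) {
            String tag = v.locale.toLanguageTag();
            int n = countPerLocale.merge(tag, 1, Integer::sum);
            grouped.computeIfAbsent(v.locale.getLanguage(), k -> new ArrayList<>())
                    .add(tag + " voice " + n);
        }
        return grouped;
    }
}
```

Given two invented en-GB voices, one en-US voice and one fr-FR voice, this yields an "en" group containing "en-GB voice 1", "en-GB voice 2" and "en-US voice 1", and an "fr" group containing "fr-FR voice 1". A real app would still need to keep the original engine name alongside each label so the chosen voice can be selected later.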

Support from the MNDA

Due to my mother's condition, I was in close contact with support workers from the Motor Neurone Disease Association (MNDA). I learnt that they were about to launch a major website update and enquired about their list of recommended communication aids. Through this conversation, I was able to get My Voice featured on their new site as a recommended aid, increasing downloads by about 10% a month.


The MNDA also put me in touch with Model Talker, a voice banking company that enables people to record their own voice and have this converted into a voice usable with TTS devices. Model Talker gave me access to 3 of their sample voices, so I began working to integrate this with My Voice.

I was also contacted by a university in New York who were conducting research into communication aids (particularly apps) and their integration with screen readers and switch devices. They shipped me over some physical switches to test with and while this is still ongoing, My Voice will soon be optimised to work with this setup. The basics are there but I still need to develop support for some alternative keyboard layouts to make My Voice easy to use with this equipment.

How MyVoice looks now

In recent months, I have been working hard on a ton of new features to improve usability and performance, including:

  • introducing Kotlin for all new development

  • bringing my crashes per 1000 users metric right down (I'm at about 8.6 now and I'm confident I can reduce this further)

  • major design improvements, along with guidance from a hedgehog lab colleague about the power of visual spacing (thanks Jack!)

  • user prompts for updates using Google's In-App Update API

  • support for Model Talker voices.

This last one is particularly important and one I’m especially proud of. When you’re forced to use a TTS device, having it speak in your own voice makes a major difference and enables users to retain a vital aspect of their personality that would otherwise be lost.


What’s next?

Honestly, I have no idea! Having just released the latest version, I think I’m going to let it settle for a few weeks and see how users react to it. I have a number of ideas in my roadmap, including changing TTS engine in-app, saving audio clips and adding empty state visuals, which are yet to be validated as ‘good’ ideas. 

My Voice currently has over 6000 downloads in approximately 7 months and over 1600 monthly active users. Of our regular users, most people continue to use the app almost daily over a 30 day period. Favourite phrases are popular but categories not so much, so this might be an area to explore more in a future release. My top locations for users are perhaps unsurprisingly the US, UK and India, with Germany a close 4th. 

How hobbyist development has helped me (and could help you too)

When I started this journey, as alluded to at the beginning of this story, developers were wizards and the code they wrote was their magic. I had no real understanding of what it took on a technical level to build an application, and a bug was something that could be fixed immediately!

I’ve since grown to understand a lot more about the technical ‘flow’ of work. Requests that seem small (“just change that image to a cat, but only when they’ve visited that section 6 times”) can have huge knock-on effects and a much wider impact than first anticipated. I’ve learnt the value of technical planning and foresight: having an idea of what your finished product might be, and building a core architecture to suit it, saves time and headaches in the long run. 

The biggest thing I've gained is probably empathy towards my dev teams. You can’t always estimate the size of something and you’re always fighting those small bugs, all whilst trying to explain to other stakeholders why it is taking so long. 

Whether you’re sitting at home or in a coffee shop, with music pumping or in silence, one of my favourite parts of this experience was discovering the ‘zen’ of a good coding session. There is nothing quite like it. An uninterrupted period of focus and progress, where you are one with the IDE, and before you know it five hours have passed and your parking has run out at the machine. This state of forgetting all the troubles around you and moving forward is a small slice of heaven for me, and I hope I continue to find it entrancing in years to come. 

I hope, if you’ve made it this far, that you’ve enjoyed the read, and that it might inspire someone to give it a go themselves. It’s been a rollercoaster with more than a side helping of frustration at times, but my colleagues tell me this is the life of an Android developer the world over!

You can download and try MyVoice Text to Speech yourself over on the Google Play Store.