This post is part of my 30 blogs in 30 days series. More details here.

After reading the previous parts (part 1 and part 2) of this trilogy, you might think I hate spoken-word audio. Nothing could be further from the truth. I love people talking in my headphones, which is why I can tell you that voice messaging will be the future of communication, once we decouple it from text messaging and treat it with serious playfulness.

Podcasts vs Audiobooks

I listen to a lot of podcasts. According to my podcast app (Pocket Casts), in the last four years or so I have listened to over 129 days of true crime, comedy, audio drama, business news, science, media, and economics podcasts. If I was driving, working, or working out in the past four years, chances are I was listening to a podcast. I am listening to one as I type this. I love this medium. It would seem obvious that I must like audiobooks just as much, right?

No. On the surface, both are the same medium: people talking about a topic. However, podcasts are usually written specifically to be spoken. Whether it is an unscripted interview or a brilliant 20-minute documentary on white bread, the main concern for the writers, producers, and editors is to communicate effectively through the human voice, sound effects, and music. Audiobooks, on the other hand, are just adaptations of the written word to audio. It is usually a person just reading. A talented, experienced performer, but still just a person reading words. Words that were written for the page, without the benefit of the tone, timbre, inflection, and cadence of the human voice. Words that are far more detailed than spoken language needs to be.

We tend to treat voice messaging more like audiobooks and less like podcasts. Voice notes are treated as mere adaptations of text messages rather than as a unique mode of communication. The first company to build a messaging product around the human voice may become bigger than Facebook.

The Product

In the beginning, it would be a simple direct-messaging app that only allows voice notes. I know there are already products on the app stores that do exactly that. However, they are like audiobooks: boring, lame, and simple. They are like WhatsApp, except none of your friends use them and you can't text anyone. To succeed, you need to differentiate your product significantly.

How will the product succeed?

Fidelity

The true gift of computers and digital technology is fidelity. A mid-tier smartphone can produce a better, clearer picture than most TVs from 15 years ago. Most phones also have a better DAC than most portable CD players from that era. This extends to recording as well. Your phone is capable of taking phenomenal pictures and videos. Where it fails is the mic. The voice recordings and the audio in your videos never sound all that great. Even if you have an expensive device with a fancy multi-microphone array for noise cancellation, WhatsApp, Messenger, and iMessage are not using it for voice notes.

However, you do not need fancy hardware to run basic noise-cancellation algorithms on an average recording. Over time, you can improve your noise reduction with machine learning. Fidelity will be a differentiator you can use to attract customers to your app.
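As a rough illustration of how little hardware a basic cleanup pass needs, here is a minimal noise gate in Python. A noise gate is a much cruder technique than the spectral or ML-based noise reduction a real app would use; this sketch simply mutes short frames whose loudness is close to the background level. It assumes mono samples stored as floats in [-1, 1] and that the first couple of frames contain only background noise. The function names and parameters are hypothetical.

```python
import math


def rms(frame):
    """Root-mean-square loudness of a list of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def noise_gate(samples, frame_size=4, noise_frames=2, factor=2.0):
    """Mute frames whose loudness is near the estimated noise floor.

    Assumption: the first `noise_frames` frames are background noise only.
    `factor` sets how far above that floor a frame must be to be kept.
    """
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    floor = max((rms(f) for f in frames[:noise_frames]), default=0.0)
    threshold = floor * factor
    out = []
    for frame in frames:
        if rms(frame) <= threshold:
            out.extend([0.0] * len(frame))  # likely just noise: silence it
        else:
            out.extend(frame)  # likely speech: keep it untouched
    return out


quiet_then_voice = [0.01, -0.01, 0.02, -0.02, 0.01, -0.01, 0.01, -0.01,
                    0.5, -0.4, 0.6, -0.5, 0.01, -0.01, 0.0, 0.01]
print(noise_gate(quiet_then_voice))
```

Only the loud middle frame survives; the hiss before and after it is muted. Frame size and threshold factor are the knobs a learned model would eventually replace.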

Filters

Why can’t I add auto-tune to my voice notes? What about other sound effects? How about replacing my background with claps or even a laugh-track?

Filters are common on camera-focused apps. Photo filters are how Instagram made a name for itself. Video and AR filters are how Snapchat remains competitive against the Facebook juggernaut. Why has nobody thought about filters for audio messages? Filters add playfulness to your communication, and they can add an extra layer of expression. That is a big part of why TikTok has taken the world by storm. Why are we making monotonous, droning speeches to each other? The technology is already there.
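To show how simple a basic audio filter can be, here is a minimal echo effect in Python: it mixes a delayed, quieter copy of the signal back into itself. This is an illustrative sketch, not how any particular app does it. It assumes mono float samples in [-1, 1], and `add_echo`, `delay` (in samples), and `decay` are hypothetical names.

```python
def add_echo(samples, delay=3, decay=0.5):
    """Return the signal with a single delayed, attenuated echo mixed in.

    `delay` is measured in samples; `decay` scales the echo's volume.
    The output is `delay` samples longer so the echo tail is not cut off.
    """
    out = list(samples) + [0.0] * delay  # room for the echo tail
    for i, s in enumerate(samples):
        out[i + delay] += s * decay  # add the quieter, delayed copy
    # Clamp to [-1, 1] so overlapping sound and echo cannot clip past full scale.
    return [max(-1.0, min(1.0, x)) for x in out]


print(add_echo([1.0, 0.0, 0.0, 0.0], delay=2, decay=0.5))
```

Auto-tune or a laugh track are far more involved, but they follow the same pattern: a function that takes the recorded samples and returns transformed ones before the note is sent.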

Video

Dubsmash, Musical.ly, and now TikTok take audio from popular media and allow users to perform over it. Why not the other way around? Let's not start with video; let's start with GIFs. This would create a further springboard for expanding beyond chat, à la Snapchat.

The Marketing

If you are an adult, I can already see your eyes roll. Just like adults rolled their eyes at me when I showed them YouTube. Just like I rolled my eyes when I first heard of Snapchat and TikTok. The success of social media and messaging apps is not predicated on old people adopting them. These technologies succeed when they attract young people. Once you have a large enough user base to establish a network effect, you can expand. This will be your Buffettian moat.

Your hypothetical app would attract 14-to-18-year-olds through the usual targeting channels and then start an influencer campaign. Then it would expand to an older audience, 19 to 24. New channels of growth will open up as you gather more usage data and telemetry from users.

Voice messaging is the future of messaging. This app will eventually be built, or its features will simply be added to existing apps. If you beat them to the punch, you will not only have a first-to-market advantage but, by creating a purely aural experience, you can also avoid the discord that comes with mixing ‘hot’ and ‘cold’ media. Now on to fundraising…