024: A Speech Coach in Your Pocket with Ummo’s Anshul Bhagi

June 15, 2016

Anshul Bhagi says: "[In] day-to-day conversation or casual speech, filler words are sometimes an intentional choice."

Entrepreneur and app developer Anshul Bhagi introduces Ummo, a powerful app for public speakers, and shares the lessons learned in the process of making it.

You’ll learn:

1) How the new speech-coaching app Ummo can enhance your speaking
2) When the use of filler words is helpful
3) What “upspeak” is and how it shapes the way you’re perceived

About Anshul
Anshul Bhagi is a Harvard MBA candidate (class of 2017) with an undergraduate degree and a master’s in Computer Science from MIT. Previously, Anshul worked in product management and development at Microsoft, Apple, and Google, spent two years at McKinsey & Company, and founded the education startup Camp K12 to teach kids in India to code. Together with his Harvard and MIT classmates Yasmin, Andrea, Sam, Damola, and Sinchan, he is building Ummo, a personalized speech coaching app (available for download on the App Store).

Anshul Bhagi Interview Transcript

Anshul Bhagi
Thank you. I’m excited to be here.

Pete Mockaitis
Well, in perusing your LinkedIn, you’ve got a lot of cool name brands there: MIT, Microsoft, Apple, Google, McKinsey & Company, Harvard Business School. But it seems like the latest thing sounds fascinating. What’s Camp K12 all about?

Anshul Bhagi
Camp K12 is a company I started in India. The goal there is to build young innovators and entrepreneurs in the school system. So as the name would suggest, Camp K12 works with K12 students, from kindergarten through 12th grade, and teaches them how to code and how to build companies while they’re still young. And what we’re doing through the organization is providing empowerment, trying to get them addicted to making stuff.

And I built this because I realized one thing lacking in the education system there is opportunities for hands-on learning, something that I had been fortunate to get in the various institutions I was able to be a part of in the U.S., and I wanted to bring that to the Indian school system. And I intentionally targeted the K12 segment there because the empowerment that I’m trying to provide, these habits I’m trying to form, they’re formed pretty early. So working with students in the college age group is going to be too late, and therefore, I wanted to work with students very young and give them the chance to make things.

So my team and I there, we teach Android app development to middle school students. We have them build Android games and publish their own Android games. We’re teaching high school students to make drones fly. We’re teaching the elementary school or primary school students to make their own animated movies, things that they want to do but they never knew that they could, and once they experience that they can, they’re hooked. And that kind of addiction is what I’m trying to provide through Camp K12.

Pete Mockaitis
That sounds really cool, and we’ll definitely link to that in the show notes. And that’s exciting to see what sort of innovations will blossom from that.

Anshul Bhagi
Yeah.

Pete Mockaitis
But the main focus for our chat today is there’s some innovation blossoming with you and your team over at HBS there with this fascinating speech coaching app, Ummo. And I want to talk in some detail about that, but could you maybe first tell us sort of the background story? How did the team get together, and what was your inspiration for saying “We should have a speech coaching app.”?

Anshul Bhagi
Sure. So my team and I are actually very excited. We are all first-year students at the Business School at Harvard, and it’s been a whirlwind for us, just a blur of excitement and fun, and we’re very excited about where Ummo is today. So I’ll tell you a bit about the app and then I can tell you more about the team.

The app itself, Ummo, is a speech coaching app. Our goal here is to bring self-awareness and intentionality to our public speaking, to our day-to-day speaking. So this is an app that listens to you as you talk and, in real time, gives you feedback on whatever metrics you want to target. And the key here is customizability and being able to personalize this technology-based speech coach that you have in the form of Ummo.

So you can track filler words, such as “um,” “uh,” “like,” the things we add in our speech to bridge our thoughts. You can track pace and volume, and get a sense of how you’re varying over time in terms of words per minute. You can track your pauses. A lot of times, as we practice our speeches, we want to pause at the right moment, for the right amount of time, and you can see where you’re actually pausing and for how long.

You can track your clarity, which right now is a measure of how your pronunciation matches or does not match the average pronunciation in American English. And in the future, we plan to support the UK dialect. We plan to support foreign languages. We’ll be rolling out those features. But the idea here is there are various metrics one might want to track as a speaker.

And there’s no gold standard. There’s no “This is the right way to speak,” because it depends on your situation, on who you are, how you’d like to speak, your personal style. What we want to build is a data-driven speech coach that can give you that self-awareness so you can yourself decide, “Am I speaking the way I wanted to speak?” and then see some feedback on top of that. So that’s what Ummo is.

As for how we built this and how we stumbled upon the idea: my team and I are all students at the Business School at Harvard, and each of our classes there starts with something called a cold call. A cold call is when the professor randomly picks one student at the start of a session and calls on that student without any warning, usually presenting them with a pretty big task, such as “You read 20 pages last night about this country, or this company. I want you to open the class by answering Question XYZ.”

And the student is put in that spot where you have at least 90 eyes looking at you, waiting to hear what you’re going to say. You have a question you have to answer. You have that nervous energy, and as a result, what you want to say doesn’t always come out the way you want to say it. So we’ve all had this experience as a team, as students at Business School being cold called, and we’ve also had it in our former lives, giving public speeches and in day-to-day conversations. We all say filler words. We all do things in our speech that we wish we didn’t. And we realized that technology had come to a point where we could listen and analyze in real time as people are speaking.

So this concept of speech recognition had evolved to the point where you could take speech in real time, convert it to text, do a lot of analysis on top of that text, and present it to users. So our experiences giving these speeches, saying our filler words, and giving less-than-perfect orations in class, combined with our understanding that the tech existed, made us want to see what we could do for ourselves and for other people who want that self-awareness in speaking, and so we built Ummo.

Pete Mockaitis
Well, that’s great. And so tell me. You all had that experience, but I imagine… So there’s five of you on the team?

Anshul Bhagi
Yep. There’s five of us, and we have an adviser from outside. Yeah.

Pete Mockaitis
And so you’ve all got some different experiences. So tell me a little bit, what did each person bring to the equation?

Anshul Bhagi
Sure. So I can start with examples from my own life. I worked at McKinsey & Company, a consulting firm, prior to doing Camp K12 in India and prior to Business School. And while I was there, in fact when I had just entered the firm as a new hire, I went through training that, Pete, I imagine is similar to what you went through back in your days as a consultant. This was new-hire training where the goal was to make sure people are ready to speak to clients: confident enough, with the right tone of voice and the right pace.

And so we went through what could be described as communications coaching. I’ve been through exercises where I have a feedback buddy and someone is counting the number of times I say different words. Someone is in charge of keeping track of how I’m saying things. And the fact of the matter is, if I want to do this and I happen not to be at McKinsey & Company, if I happen not to have a speech coach, how am I going to get that sort of feedback?

I could ask a friend. I could record myself on video. There are substitutes, but those substitutes can be inconvenient. There can be stigma. For example, if I ask a friend to sit in the room as I’m practicing a pitch, there’s suddenly stigma around that. What we wanted to do with Ummo was build a speech coach that is accessible to anybody. It’s not only reserved for the privileged few who hire executive coaches once they’re in the CXO seat at a company. It’s not only for people who happen to be at a firm that pays for learning.

So some of us on the team have had experiences at these companies that work with speech coaches, and we have personally been through that experience of what it’s like to have a speech coach. And so we knew what kinds of metrics people would be most excited about. Our team comprises both engineers and people who bring a lot of experience when it comes to working with enterprise partners and sales, and it’s been really fun for us to work together. Some of us are on the coding, some of us on relationships with the companies we want to take Ummo to. And I think it’s a good skill set on the team.

Pete Mockaitis
Well, that is cool. And it really seems to be a pretty darn polished app right now. I was playing with it, using it for my speech development earlier today. I tested it with a little prayer, and I recited it 10 times to see how accurate it was, and it was spot on. The prayer was, “For the sake of your sorrowful passion, have mercy on us and on the whole world.” And it counts how many times I used each word, and it was perfect. “On” is 20, as it should be: twice per recitation, times ten. “Mercy” is 10. So it seems like it’s even better than Siri. Or did I just get lucky?

Anshul Bhagi
Haha! Well, I’m, first of all, very excited that you had that positive experience. I think it is quite good. We’ve been testing it and we’ve been having a broad group of users test it. We have built Ummo on top of a speech recognition engine which is not our speech recognition engine. So we would hope that it is really good because we’re using something built by PhDs and experts that have spent years and years understanding what are the language models out there, what does American English look like, what does UK English look like, how can we map sounds to words to text.

And so because we’re building on top of this, we certainly expect a certain quality bar and we’re actually very delighted with the quality. So in our case, we’re using IBM Watson right now. We have been playing with different technologies out there. There is Nuance. Siri is actually wonderful but not available to people like us to use as developers. And so we were looking at other technologies.

And in the future, there will be a lot of custom technology work on our end in understanding what different sounds are and mapping them to text. Our custom work here, the things we build on top of the existing speech recognition engines I told you about, will be focused on custom disfluencies.

For example, the um’s and uh’s that we say in our speeches differ based on our locale, our geography. I have friends at the Business School who are from Latin America, and they have a slightly different sound. They’ll say “eh” instead of “uh.” Canadians say it differently. Americans say it differently. We want to be able to track these various things. The built-in speech recognition tools we’re using don’t capture all of that, so we’ll be doing our own machine learning to train language models and catch those edge cases.

So we’re excited this is good. I’m very happy it worked for you, and we hope it’s going to become better and better as we add our custom analysis and custom innovation to this. And we’re very excited to be building on top of technology that has been tested by a lot of experts and built by a lot of experts.

Pete Mockaitis
That’s fantastic. And so I guess I’m curious now a little bit here. You mentioned that there was no gold standard, and it’s kind of about your style and preferences. But maybe if there’s no gold standard, you could share with us some of the pros and cons or benchmark levels. So I imagine, in some ways, having zero um’s or uh’s, vocal pauses, can make you seem like a super polished orator rock star, but at the same time, having none of them maybe seems a little bit unnatural, like you don’t have as much of a human, person-to-person, mistake-making connection. So could you maybe speak a little bit to some of the varying levels you see on these dimensions associated with vocal pauses, and what they mean?

Anshul Bhagi
Absolutely. And I think you hit it on the head already. So there are situations where you might want to say filler words because of how it makes people perceive us as speakers. There is that connection that we have. It softens our tone. It makes us sound less robotic if we do add in some filler words. And the fact of the matter is sometimes when we are speaking casually, if I sit down with you at a Starbucks and we’re having a conversation, there will be moments when I’m thinking and I do want to bridge my thoughts. I can bridge them with silence if I have trained myself to eliminate any sound, or I can say “Uh…” as I’m thinking, and it makes me sound more human almost.

And that’s exactly what we’ve been hearing from speech coaches that we have been talking to. So as we built Ummo, we built it with speech coaches who had experience talking with clients and giving them feedback on what to say and not to say in different types of situations and what to track. And with regard to filler words, we heard that when it comes to day-to-day conversation or casual speech, filler words are sometimes an intentional choice.

Even on a big stage, even at a podium, depending on the audience you’re talking to, you might want to keep a bit of the slurring, a bit of the filler words. But if you’re giving a keynote speech that you have rehearsed multiple times and more or less memorized, the filler words may not be necessary given your audience. If that is the case, you want to practice and remove them from your speech, or at least have control over them.

What we’re trying to provide with Ummo is that self-awareness, so that a speaker practicing a pitch knows the audience they’re talking to and knows whether they want to use filler words or not. Even in a 5- or 10-minute speech, they should know how many times they said “um” or “uh.” They should know how many times they said the words they don’t even know they say. For example, I’ve given speeches where I’ve used a word like “today” 10 times. I wouldn’t have known that if I hadn’t gone back and watched the video of myself speaking, because that’s not a word I’m trained to track.

So this is all about self-awareness for the filler words that you specify. Ummo lets you choose which filler words to track, and the same goes for custom words. So you could imagine: we were talking about filler words now, but we could extend this to jargon. If you look at lawyers and legalese, if you look at consultants and their business speak… Exactly. So there are things you might want to track that go beyond just filler words; we can call it jargon more broadly. And we want you to be able to turn it on or off in the app, and on and off in your speech, as and when intended.

There are other metrics out there that speech coaches have been talking to us about, such as pace. We have heard from one speech coach, for example, that when you’re giving a public speech, speaking between 120 words per minute and 150 words per minute is a good pace. That’s something that people can understand. But there are obviously pockets of your speech that you would want to speak faster in. There are pockets where you would want to speak slower in.

So as an average, 120 to 150. It’s advisable, according to one speech coach. But as we have heard, as we hear politicians speaking these days, they go through very passionate phases where they’re speaking very quickly, their volume goes up, and then they go through other phases where the volume goes down, the pace goes down significantly. And that variation is beneficial, too, in terms of keeping the attention of your audience. So it’s not always about maintaining the same pace throughout the pitch. It’s about how it varies. And therefore, when we built Ummo, we didn’t want it to be one number ever.

When you get a report card, so when you were trying your prayer on this, you got a report card. Nowhere was there just one number saying “This was your overall score,” because that is not how we believe speech coaches work. That’s not how the speech coaches we talk to believe speaking should work. You should have data backing the different metrics you want to track, and within each of those, you can have benchmarks. For example, 120 on the lower end for the words per minute, 150 on the upper end. But Ummo shows how you vary over time so that you can see if you were above that benchmark or below it, was that intentional, was it unintentional, and how do I bring it to where I want it to be.
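To make the idea of a pace benchmark that varies over time concrete, here is a minimal Python sketch. It is not Ummo’s implementation: it assumes a speech recognizer has already produced each word with a start timestamp in seconds, and the names `Word` and `pace_by_window` are hypothetical. The 120 to 150 words-per-minute band is the single speech coach’s guideline mentioned in the interview, used here as an adjustable default.

```python
import math
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word; `start` is seconds from the start of the recording (assumed input)."""
    text: str
    start: float


def pace_by_window(words, duration_s, window_s=30.0, low=120.0, high=150.0):
    """Hypothetical helper: split a recording into fixed windows and label each window's
    words per minute as "below", "within", or "above" the [low, high] band."""
    if duration_s <= 0:
        return []
    n_windows = math.ceil(duration_s / window_s)
    counts = [0] * n_windows
    for w in words:
        # Clamp so a word starting exactly at the end still lands in the last window.
        idx = min(int(w.start // window_s), n_windows - 1)
        counts[idx] += 1

    report = []
    for i, count in enumerate(counts):
        start = i * window_s
        # The final window may be shorter than window_s; divide by its real length.
        length = min(window_s, duration_s - start)
        wpm = count * 60.0 / length
        if wpm < low:
            label = "below"
        elif wpm > high:
            label = "above"
        else:
            label = "within"
        report.append((start, wpm, label))
    return report


if __name__ == "__main__":
    # 65 words in the first 30 s (130 wpm), then 40 words in the next 30 s (80 wpm).
    words = [Word("a", i * 30 / 65) for i in range(65)]
    words += [Word("b", 30 + i * 0.5) for i in range(40)]
    for start, wpm, label in pace_by_window(words, duration_s=60):
        print(f"{start:>5.0f}s  {wpm:6.1f} wpm  {label}")
```

Reporting one label per window, rather than one average for the whole talk, mirrors the point that a deliberate slowdown or burst of speed should show up as a visible change rather than be averaged away; `low` and `high` are parameters so a speaker can set their own benchmarks.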

Pete Mockaitis
That’s lovely. And so, now, that number is fun. Again, there are benchmarks, there are guidelines, and there are certainly times where I’ll slow it down because it’s a thoughtful moment, we should reflect on it, and we’ll speed it up because we’re fired up, we’re excited. So I guess just in the rate of speech world, we’ve got that number, 120 to 150. Broadly, generally understandable. Are there some other numbers you might pin down for a filler percentage or volume differentiation or clarity?

Anshul Bhagi
So for those, we have avoided having numbers, even in our own minds, because for us those metrics are purely about self-awareness. And I don’t want to venture a guess or share what we have heard from different speech coaches, because it has varied from coach to coach. The only consensus we’ve gotten is that it depends on the speaker’s style, the audience, and the situation. And therefore, we are trying not to be an app that scores you along one dimension.

We realize there are differences between men and women. We realize there’s differences between people who come from different linguistic backgrounds, different cultural backgrounds. We don’t want to rate one higher than the other just because of that background. We want to show the differences, we want to show benchmarks, and we’re working towards an app that goes even deeper into the analytics, but we don’t want to be an app or a company in terms of what we stand for that has a version of the right speaker in mind.

What we will ultimately do is we will allow people to specify their own benchmarks, to customize them. So we can have a value that the app is initialized to, but beyond that, you should be able to set where the lower words per minute mark is, the upper words per minute mark is. You should be able to set what filler percentage you think is ideal. You should be able to set what filler words you want to track. And then beyond that, we want to tell you that “Hey, these are words you’re not tracking right now but that you’re saying a lot of.”

Again, we want to be a Fitbit for your speech fitness. That’s what we’re going for. So it’s something that right now it’s on your phone. In the future, it could be something that sits on your wrist, maybe on your Apple Watch. It could be something that sits on a Bluetooth mic that you wear, and throughout the day, gives you feedback whenever you wanted on how you’re speaking.

Pete Mockaitis
I like that a lot. And it’s interesting with the consensus pieces. And I hope… I don’t know if it’s in the works if you’re going to share any sort of blog posts or articles or summary findings from where there is consensus versus divergence from all your work with speech coaches. I think that’s so fascinating because that’s kind of my experience. I look at the app, and it’s like, “Oh, cool.” I know that number, and the thought is “Is that good?” And so I think the more that there is context, the more that’s handy. But I also totally respect and appreciate how it’s not trying to shove everybody into the same framework. So that’s a delicate balancing act you’ve got over there.

Anshul Bhagi
Yeah. Exactly. That’s what we are dealing with, and I think it’s too early to tell where we will stabilize. I’ve shared with you what we stand for as a team, and we don’t want to have a gold standard. That said, as we have been getting a lot of publicity as the Ummo team, there have been a lot of speech coaches reaching out to us and just offering to speak with us, share with us how they would like to use the product, and in those conversations, we are starting to learn about what are the different metrics people value and which of them should we build in, how should we build those in. If we are giving scores, if we are gamifying this, making it a competitive social app, what would that look like?

This is an early stage for Ummo. It’s just been a beginning, and we’re very happy with how it’s gone in just the last few months. We launched the app on April 27th, and so this is fairly recent. And for the entire team, it’s been really fun to see that this is making a difference for people. That makes us feel really good that what we’re building is being used. We are really excited about the feedback that we are getting from customers, whether it’s on the App Store page via comments, whether it’s through our website, people emailing us, a lot of people giving feedback on “Hey, here’s how you can make it better.” And then some people say, “Hey, talk to me. I would like to use this for my company, for my speech coaching academy.”

And we’re very excited about both of those things. The feedback makes it easier to build a better product, which is our focus right now as we go into this summer. And with the folks who want to use Ummo in the enterprise, we see that being a big part of this business going forward. Speech coaching for the consumer is one thing, but in the workplace, in educational institutions, and in organizations like Toastmasters that focus wholeheartedly on public speaking, we see technology serving a role that humans cannot serve.

For example, if a speech coach is listening to someone talk for 10 minutes, the coach, being human, has a limited working memory. Yes, some people have very large working memories, but in general, you can’t construct a graph in your head, as someone is speaking, of the number of times they said every single word in their 10-minute speech. That’s very difficult to do. We can’t retain that in our heads, and we usually can’t write that fast.

So technology can supplement that human being, that speech coach. Or take Toastmasters, the international organization focused on building better public speakers: members give impromptu and prepared presentations every week when they meet, and they always assign someone in the front row as a feedback buddy, listening and giving feedback. They even have an app where right now you manually tap a button each time someone says “um.” We think technology that listens and analyzes in real time, using speech recognition with analysis on top of that, can really help all of these people.
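The tally described here, a count of every word in a talk plus the filler words a speaker chooses to track, is straightforward once speech has been converted to text. The Python sketch below assumes the transcript is already available as a plain string; the function name `tally_words` and its parameters are hypothetical illustrations, not Ummo’s code. It also surfaces frequent words the speaker is not tracking, like the repeated “today” mentioned earlier.

```python
import re
from collections import Counter


def tally_words(transcript, tracked=("um", "uh", "like"), top_n=3, ignore=frozenset()):
    """Hypothetical helper: count tracked filler words, their share of all words,
    and the most frequent words that are not being tracked."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(tokens)
    tracked = [w.lower() for w in tracked]

    # Counter returns 0 for words that never appear.
    filler_counts = {w: counts[w] for w in tracked}
    filler_pct = 100.0 * sum(filler_counts.values()) / len(tokens) if tokens else 0.0

    # Most frequent untracked words; ties keep first-seen order.
    untracked = [
        (w, c) for w, c in counts.most_common()
        if w not in tracked and w not in ignore
    ][:top_n]
    return filler_counts, filler_pct, untracked


if __name__ == "__main__":
    text = "Um, today we launch. Uh, today is, like, the day. Today!"
    fillers, pct, frequent = tally_words(text)
    print(fillers)
    print(f"{pct:.1f}% fillers")
    print(frequent)
```

Plain token matching cannot tell “like” as a filler from “like” as a verb, and it only catches variants you list (a speaker could add “eh” to `tracked`). That context and accent sensitivity is the kind of gap the custom disfluency work described earlier is meant to close.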

Pete Mockaitis
That’s fun. Well, I’m curious to hear what sorts of features you think are on the horizon. I’d love to see pitch in there personally, making sure you’re not a monotone. What’s on the roadmap?

Anshul Bhagi
Yeah. Pitch is on the roadmap. And particularly around pitch, we’ve been hearing about upspeak, which is a phenomenon where people end their phrases or sentences with a rising tone. In some situations, that’s fine. It’s acceptable. In other situations, we have heard from speech coaches that it might make a speaker seem less confident in what they’re saying. It may make a statement sound like a question. So pitch matters both in how it varies over five minutes of speaking and in how it moves within each word. Pitch is something we will be adding. That is a product on the horizon.

The most popular feature right now, the most requested feature, is people want to be able to create accounts on our app and store their report cards over time so that they can see how they’re improving over time. For example, sometimes when you’re prepping for a keynote presentation you’re giving, you will practice it more than once. And when you practice it more than once, you want to see your variation as a speaker, how you have evolved, how you’ve improved over the multiple practice sessions. So this concept of sessions and storing their results from those sessions, that’s the most popular, requested feature right now, and that’s at the top of our list.

In addition to that, we have support for new languages and new dialects on the horizon, because right now the app is built with the American English language model. We want to support the UK English language model, the Australian English language model. We want to support Spanish. We want to support Japanese. We think that speech coaching is not just for English. It’s something we could provide people no matter where they are, no matter what language they speak, and the underlying technology is the same.

Pete Mockaitis
Oh, that’s lovely. I’m so glad that your team is committed to it because I already like the app and I think with those extra things, that gets even better. Well, kudos to all of you. I’m glad you’re doing what you’re doing. Would you tell me, is there anything else you want to make sure that you share before we kind of shift gears and get your take on the fast faves?

Anshul Bhagi
I would love for your audience or the folks listening to your podcast to try it out and send us feedback. We are very open to feedback. We are iterating rapidly, trying to build a product that gives people the self-awareness they need as speakers. And I would encourage folks listening to this to give us a shout and let us know how we can make something that they would want to use.

Pete Mockaitis
Perfect. Thank you. Well, I do hope you get a good chunk of actionable tidbits from them. So then, could you start us off by maybe sharing a favorite quote, something that inspires you again and again?

Anshul Bhagi
Yeah. A quote that I’ve been going back to recently is “You cannot change the direction of the wind, but you can always adjust your sails to set yourself in the right direction.” I really like that because I find it empowering. I like that it puts the control in my hands, even though sometimes it’s not realistic. Sometimes, the control is beyond you. I like, in general, being able to come back and realize that no matter how difficult the situation or what I’m going through, there’s usually something I can do to make it better for myself, for other people, and I find that to be very empowering.

Pete Mockaitis
Lovely. How about a favorite study or piece of research you find yourself thinking about or mentioning somewhat often?

Anshul Bhagi
Right now, it’s a piece of research regarding Ummo. It has to do with the fact that in our public speaking, avoiding filler words is not always a good thing. And this research actually came to us from speech coaches we’ve been talking to. So the research says that filler words, that these disfluencies can often help you connect with your audience. And these are the different pieces of data that help us build a better app at Ummo, knowing that we shouldn’t be imposing too much on what’s right and what’s wrong, and that it’s more situational.

Pete Mockaitis
Lovely. And how about a favorite book?

Anshul Bhagi
Favorite book. I really like “The Things They Carried,” about the Vietnam War.

Pete Mockaitis
And how about a favorite personal practice or habit, something that you do that’s been very helpful for your effectiveness?

Anshul Bhagi
Writing down the night before I sleep the five things I want to do the next day.

Pete Mockaitis
Excellent. And best way to find you or the Ummo folks?

Anshul Bhagi
We can be found on our Facebook page or via email. So I’ll give you both. The email address is hello@ummoapp.com, and the Facebook page is facebook.com/ummoapp.

Pete Mockaitis
And final parting word or challenge or call to action for those seeking to become more awesome at their jobs?

Anshul Bhagi
Download Ummo and be self-aware about how you speak. Even if you don’t download Ummo, I would just say self-awareness is a wonderful thing. And just by building Ummo and having conversations like this, I’ve already become more self-aware about what I’m saying, by virtue of the fact that I know people will be listening, and that if I’m building an app for speech coaching, I should work on my own public speaking. And that self-awareness has helped me tremendously. And I would say whether it’s for public speaking or for organization skills, whatever it might be, developing that intentionality is wonderful. It’s easier said than done, but it makes a big difference in your effectiveness.

Pete Mockaitis
Mm-hm. Well, Anshul, thanks so much for taking the time here. This has been a real treat. And I wish you tons of luck with Ummo and Harvard and Camp K12 and all your upcoming adventures.

Anshul Bhagi
Thanks so much, Pete.
