For the past few weeks, there have been a number of posts demonstrating the latest abilities of OpenAI's ChatGPT Advanced Voice Mode. Right now, only a select few users have the privilege of accessing and playing around with the latest update1, and some of the demos are nothing short of phenomenal. Here is a snippet from a longer clip by @CrisGiardina on X that really stood out for me:
After viewing this clip, I immediately thought about how I could use ChatGPT to enhance my own Punjabi language learning. My ability to speak Punjabi is poor - I can mostly understand what is being said to me, but I often struggle to reply with coherent sentences. This has always been the case when speaking with my Nani and other elders in the family who aren’t as familiar with the English language. I pulled out some Punjabi books from my youth2 but quickly found them cumbersome and static, so I’m turning to AI Large Language Models (LLMs) for a more dynamic and practical learning experience. When this version of Advanced Voice Mode becomes available for all subscribers, I’m going to make the most of it by practising Punjabi speaking exercises.
These LLMs can make excellent teaching companions and have the ability to create a bespoke, personalised learning experience. The wonderful thing about using LLMs for learning is that no one judges you. There’s no such thing as asking a stupid question, so you can crack on, make mistakes and learn at your own pace without any pressure. Adding a realistic, low-latency voice mode enhances the utility of these models even more. However, learning languages is just one use case. I’ve already used the current voice mode on ChatGPT as a knowledge companion when going for lengthy walks or drives3. It’s kind of like having a limited version of J.A.R.V.I.S. from Iron Man in your pocket4. Hands-free, I ask all sorts of questions ranging from topics in diagnostic medicine to gluten-free recipes, although I’m fully aware that some of the responses may contain mistakes or hallucinations5.
With such vast improvements in the latency and cadence of responses, it goes without saying that more and more folks will anthropomorphise this technology. As interactions become smoother and more intuitive, some folks may even fear this technology (although I firmly believe there is nothing to fear… yet). Others may jump at the chance of having an actually useful virtual assistant. Let’s be honest, Siri and other virtual assistants have historically been kind of shit, so it will be interesting to see how non-tech audiences make use of these newer AI assistants in their day-to-day lives. At the end of the day, it is adoption by everyday users that will really drive the advancement and refinement of this technology.
Feel free to follow me on X or Threads as I occasionally repost some of the best AI use cases I come across.
As of August 2024, although OpenAI apparently plans to enable it for all paying customers at some point this autumn.
Long story short, my bro and I used to attend Punjabi school when we were kids but were thrown out because we never took it seriously. There’s a letter in the Punjabi alphabet called ਙ “nganngaa” which kind of sounds like the Punjabi word for “naked” (“nangā”). This used to crack us up all the time (we were kids man). Eventually, it was the last straw for the Punjabi teacher.
Crossed out just in case someone has a problem with it 😉.
I say “limited” because these LLMs currently can’t perform many actions on your behalf - like summoning your Iron Man suit.
An AI hallucination is where the model generates incorrect or nonsensical information despite being confident in its output. It’s important to be aware of hallucinations when interacting with LLMs, so it’s best to fact-check certain responses, especially in niche topics you’re not so familiar with.