ChatGPT Voice Mode on CarPlay
On paper, I should be right in the center of the Venn diagram for ChatGPT Voice Mode.
I love audio. It’s a great format for both input and consumption because it doesn’t require your full attention. You can listen while working out, doing things around the house, or driving to work. I use an iPhone, spend about an hour and 15 minutes in the truck each day, and use ChatGPT constantly.
So when I saw that OpenAI added Voice Mode to CarPlay, I updated the app right away and used it for about 45 minutes.

My main takeaway is that Voice Mode is good at what it’s trying to do, but I'm far from the target market. I'm almost the opposite market.
What stood out most is that Voice Mode is heavily optimized for latency. That makes sense. OpenAI wants it to feel like a natural conversation, and in that respect, it works. Driving home felt like chatting with a friend who is very patient and positive, but also very dull.
Low-latency design comes with trade-offs. Voice Mode currently runs on 4o, which was first released in May 2024. I’m sure 4o has improved a lot since then, but it still feels far from the best models I use most often.
Most of my day-to-day use is 5.4 Thinking and 5.4 Pro. Compared to those, the difference is pretty striking. The model felt less sharp, less analytical, and more likely to use words or sentence structures I associate with older AI writing.
It sometimes repeated itself, probably because of a small context window, and struggled to move toward a better answer when the first answer missed. When I asked it to do web searches, it could find current information, but wouldn't search very deeply.
One area where it excelled was speech recognition. Most of my conversations were on the highway in a loud Jeep. It's a tough recording environment, but Voice Mode handled my voice very well.
What I’d want from Voice Mode
If I could customize Voice Mode, I’d want almost the opposite set of trade-offs.
Use the most intelligent model available, even if it takes longer. I use 5.4 Thinking or 5.4 Pro for almost everything and am perfectly happy to wait for a better answer.
Eliminate interruptions. Since I’m mostly thinking out loud, I don’t want Voice Mode jumping in while I’m still talking. I’d rather have a clear way to signal “I’m talking” and “I’m done,” similar to what Perplexity used to have.
Don't try to simulate a conversation. The way I use AI, I prefer to throw it a ton of context, wait, then read (or listen to) the reply. I’m not really looking for a conversational companion. I’m looking for a high-quality input/output tool.
The obvious workaround is to dictate into regular ChatGPT and have it read the answer back, which I do a lot. The problem is that the touch targets are small, and I don’t want to be fiddling with my phone while driving.