Why Voice Might Be the Most Underrated Way to Use AI

24 June

The Problem with Typed Prompts

People don’t type the way they think.

Typing creates friction. You have an idea in your head, then immediately start compressing it into a few neat words because writing everything out feels like effort. It is effort, and humans love taking the path of least resistance.

That turns prompts into simple sentences like:

Write an article about AI and productivity.
Help me create a prompt for a business idea.
Summarise this concept.
Make this sound more professional.

When prompted like this, AI is left to fill in the blanks. And the more blanks you leave, the more generic the response becomes.

AI gives you a very average, safe answer. Or, more aptly, AI slop soup: outputs that sound polished but are actually empty, devoid of any original thinking… your thinking!

The issue, of course, is the quality of the input. When it comes to AI, context is everything for getting useful results.

This is where using voice changes things in a very practical (and massively useful) way.

Voice Removes the Bottleneck

The simple advantage of voice is speed.

Most people type somewhere around 40 to 50 words per minute. Normal conversational speech is often closer to 120 to 150 words per minute.

So in practical terms, you can speak roughly three times faster than you can type.
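As a rough check on that figure, the ratio can be worked out directly from the two ranges quoted above. This is only a minimal sketch: the function name and the choice to pair the extremes of each range are illustrative assumptions, not measurements.

```python
def speed_ratio_range(typing_wpm, speaking_wpm):
    """Return (lowest, highest) speaking-to-typing ratio for two (low, high) wpm ranges."""
    type_low, type_high = typing_wpm
    speak_low, speak_high = speaking_wpm
    lowest = speak_low / type_high   # slower speaker compared with a faster typist
    highest = speak_high / type_low  # faster speaker compared with a slower typist
    return lowest, highest


# Ranges taken from the text: 40 to 50 wpm typing, 120 to 150 wpm speaking.
low, high = speed_ratio_range(typing_wpm=(40, 50), speaking_wpm=(120, 150))
print(f"Speaking is roughly {low:.2f}x to {high:.2f}x faster than typing")
```

With those ranges the spread runs from about 2.4 to 3.75 times, so "roughly three times" is a fair middle figure.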

Typing forces compression because it’s painful to write a 1,000-word prompt. Voice allows expansion and feels much more natural.

Instead of typing:

Help me write an article about using voice with AI.

You can say something much more useful:

I want to write an article about why voice interaction with AI is underrated. The point is not just that it’s faster, but that because you can speak roughly three times faster than you type, you naturally give the AI more detail. More context means better outputs. I also want to include the idea that voice feels awkward at first because people think they need to get the perfect prompt out in one go, but really the value is in the back-and-forth iteration.

That second version gives the AI far more to work with.

The intended argument is clearer. The angle is clearer. The likely structure is clearer. The output will almost always be better, not because the AI suddenly became smarter, but because the input became richer.

This Isn’t Just About Prompting

I’ve touched on this concept before in a different context: using voice-to-text dictation for field data capture.

In that setting, the value was obvious. If you’re trying to capture what actually happened in the field, forcing people to type long notes into a phone, iPad or laptop is a terrible way to get rich information, the kind of information you need for reporting, billing and record keeping.

People shorten things when they are tapping with one finger.

Context, gone. Juuuuuuuust enough to satisfy the form.

Voice changes that because it lowers the effort required to capture detail. A worker, Supervisor or Manager can describe what they saw, what changed, what was awkward, what nearly went wrong, and what should probably be captured for next time.

Whether you are capturing operational data, project lessons, safety observations or building prompts for AI, voice improves the input layer.

And the input layer is where most systems quietly fail.

Bad input creates bad records, which can lead to all sorts of problems, such as work that is done but never captured and potentially never billed.
Bad records create weak analysis.
Weak analysis creates poor decisions.

The same is true with AI.

Minimum effort prompts create generic outputs. Richer prompts create better ones.

Good Prompting Is Mostly Context

There is a misconception that good prompting is about finding the perfect magic sentence (or a combination of them).

It isn’t.

Good prompting is about giving useful context.

That means explaining:

What you are trying to achieve.
Who the output is for.
What tone you want.
What you do and don’t want.
What assumptions should be avoided.
What examples or constraints matter.
What a good outcome would actually look like.
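For anyone who likes to see that list as a structure, here is a minimal sketch of it as a checklist that reports what is still missing before a prompt goes out. The class name `PromptContext` and its field names are hypothetical, chosen only to mirror the items above; nothing here is tied to a particular AI tool.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptContext:
    """Hypothetical checklist mirroring the context items listed above."""
    goal: str = ""
    audience: str = ""
    tone: str = ""
    wanted_and_unwanted: str = ""
    assumptions_to_avoid: str = ""
    examples_or_constraints: str = ""
    good_outcome: str = ""

    def missing(self):
        """Return the names of items that have not been filled in yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self):
        """Join the filled-in items into one block of text for the prompt."""
        lines = []
        for f in fields(self):
            value = getattr(self, f.name).strip()
            if value:
                label = f.name.replace("_", " ").capitalize()
                lines.append(f"{label}: {value}")
        return "\n".join(lines)


draft = PromptContext(
    goal="Write an article about why voice interaction with AI is underrated",
    tone="Practical, not hyped",
)
print(draft.render())
print("Still missing:", ", ".join(draft.missing()))
```

The point of the sketch is the `missing()` list: a short typed prompt usually leaves most of these items empty.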

The problem is that writing all of that out is annoying, so people don’t do it.

Voice changes this because it lowers the cost of detail. You can ramble a bit. You can explain the background. You can correct yourself mid-thought. You can say, “Actually, no, that’s not quite what I mean.” You can add examples as they occur to you.

That messy, natural thought process is often exactly what the AI needs.

Not perfect wording.

Better context.

The First Learning Curve Is Feeling Rushed

One of the strange things about using voice with AI is that it can feel uncomfortable at first.

You start speaking and suddenly your brain treats it like a formal presentation. You feel like you need to deliver the perfect prompt in one clean run.

So you rush.

You over-explain one part.

You forget another.

Then you stop and think, “That was terrible.”

But that is the wrong way to think about it.

Voice prompting is not about delivering one perfect instruction. It is about starting a thinking loop.

The first spoken prompt only needs to be good enough to start the conversation. From there, you can refine it.

You can say:

That’s close, but make it more practical
Pull back on the hype
That sounds too corporate
Use better examples
Keep the structure but rewrite the introduction
That’s not quite the point, the real point is this…

This is where voice becomes powerful.

You are no longer trying to type the perfect prompt. You are thinking out loud with a system that can immediately reflect, reshape and organise your thinking.

That is a very different workflow.

You can iterate much faster to the result you want.

Voice Captures the Thinking Around the Thought

When you speak, you naturally include more of the “around the edges” context. Why something matters. What triggered the thought. What you are trying to avoid. Where the nuance sits. What example made the idea clearer. What you are unsure about.

AI does not “feel” your emotion like a human does. It is not sitting there interpreting tone in the way another person might.

But the way you speak often changes the words you use.

If you are frustrated, you explain the friction. If you are excited, you explain the opportunity. If you are cautious, you explain the risk.

The AI may not understand the emotion itself, but it can work with the extra context your spoken explanation produces.

And that is the useful part.

The Real Gain Is Not Just Speed

The obvious benefit of voice is that it is faster.

But the deeper benefit is quality.

Being able to speak three times faster does not just mean you finish the prompt sooner. It means you can provide far more useful input in the same amount of time than typing allows.

The AI can produce work with:

Better structure
Better alignment to your intent
Better examples
Fewer wrong assumptions
Less filler content
A tone closer to what you actually wanted

This applies to drafting articles, building business plans, creating client emails, developing procedures, reviewing safety documentation, preparing tender responses, designing workflows, brainstorming products and capturing project lessons.

In every one of those examples, the quality of the output depends heavily on the quality of the input.

Voice helps you provide that input without turning the whole exercise into a typing saga.

Iteration Is Where the Value Really Appears

This is where I personally find the most value.

You ask.
It responds.
You clarify.
It refines.
You clarify again.
It improves again.

Using voice makes that back-and-forth loop much faster because correction becomes easier. Instead of typing a long explanation, you can simply speak:

“No, the issue isn’t really that AI saves time. The issue is that voice lets you add more detail, and that improves the thinking quality of the output. Make that the central argument.”

That kind of correction sometimes gets skipped when people are typing because it feels like effort.

So they accept a mediocre output or, worse, they get frustrated and start again with another short prompt worded slightly differently.

Voice reduces the friction of correction, and reducing correction friction matters because correction is where quality comes from.

A Real Practical Use Case

A practical example of this is how I use this exact process for developing articles and content ideas.

I have built a custom Insight Content Creator GPT that is designed to slow the content creation process down first.

Sounds counterintuitive, because AI is supposed to speed things up, right?

The issue, for me, is that the actual insight behind an idea has usually not been fully developed yet.

The better workflow is done in stages.

The GPT works through my idea before it tries to create any type of asset.

The basic workflow is:

It starts with my rough thought or a previously captured echo (you can read about my own capture framework here, which also leans heavily on voice as a capture process to reduce friction),
then explores the underlying insight,
then refines what the idea is actually saying,
then translates it for the intended audience,
then works out what asset could be created.
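The ordering above can be sketched as a simple gate that refuses to skip ahead. To be clear, this is not how the GPT itself is built (that is done with written custom instructions); it is a hypothetical illustration of the same idea, and the stage names and the `StagedWorkflow` class are assumptions made for the example.

```python
# Stage names are illustrative labels for the five steps listed above.
STAGES = ["capture", "explore", "refine", "translate", "create"]


class StagedWorkflow:
    """Hypothetical tracker that only allows stages to run in order."""

    def __init__(self):
        self.completed = []

    def next_stage(self):
        """Return the next stage name, or None when all stages are done."""
        if len(self.completed) < len(STAGES):
            return STAGES[len(self.completed)]
        return None

    def complete(self, stage, notes):
        """Record a stage's notes, refusing any stage that is out of order."""
        expected = self.next_stage()
        if stage != expected:
            raise ValueError(f"Cannot run '{stage}' yet; next stage is '{expected}'")
        self.completed.append((stage, notes))


flow = StagedWorkflow()
flow.complete("capture", "Voice is underrated as a way to use AI")
try:
    flow.complete("create", "Draft the article")  # tries to skip three stages
except ValueError as error:
    print(error)
```

Asking for the asset straight after the rough idea is rejected, which is the behaviour the staged approach is meant to enforce.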

The staging process is powerful because it stops AI jumping straight into generic content production.

My GPT has a detailed set of custom instructions that force the workflow to move through stages: capture the raw idea, explore the underlying insight, refine the thinking, translate it for the audience, and only then create the final asset.

That changes the role of AI. It is no longer just a generic writing tool. It becomes more of an iterative thinking partner.

I still need to refine it, but it gives me a massive head start with getting my own ideas out into the world.

But there is an obvious problem.

Typing your way through that entire process is too slow.

You have to type the rough idea. Then answer follow-up questions. Then clarify the angle. Then explain the audience. Then correct the framing. Then review the structure. Then ask for the draft.

At every step, typing creates friction.

And whenever there is friction, people shorten the input.

And we have already covered the downside of that.

Voice changes my workflow completely because it makes that staged process much easier to use. I can keep adding context, correcting the direction and refining the idea without typing my way through every step.

Instead of trying to type a perfect starting prompt, I can talk through the idea naturally, or give the GPT a previous echo capture and talk about that.

Something like:

I want to write an article about why voice is underrated when using AI. The obvious point is speed, but I don’t think speed is actually the most important part. The real point is that voice lets you give the AI more context because you’re not compressing everything into a tiny typed prompt. It also makes iteration easier because you can correct the AI more naturally. I also want to connect this to how my content GPT works, where the process is staged rather than just jumping straight into writing.

That spoken prompt is not perfect. It’s not going to be, and it doesn’t need to be.

It just needs to get enough thinking into the system to start the process.

From there, the GPT can start doing what I designed it to do. It can pull out the core insight. It can identify the supporting concepts. It can challenge whether the article is really about voice, prompting, context, workflow design or AI-assisted thinking. We go back and forth discussing, debating, adding more context where it might help.

I answer again by voice:

Actually, the article is not really about voice as a feature. It’s about voice as an input layer. The point is that better AI outputs come from richer inputs, and voice makes richer input easier. That’s the real argument.

And so it goes until I arrive at the insight I’m crafting.

Not because the AI magically knew what I meant from the start, but because the voice workflow made it easier to keep correcting and iterating the thought.

The custom GPT provides the structure.

Voice provides the flow.

Together, they turn a messy thought into a developed insight, and then into a usable asset.

That is a very different workflow from typing a one-line prompt and hoping for magic.

Where the Time Saving of Voice Actually Comes From

The time saving of voice is not just in the first context prompt.

That part is obvious, and yes, it certainly adds up over time. If someone types at around 40 to 50 words per minute, 1,000 words of context for a prompt can take roughly 20 to 25 minutes to write out.

If they speak at around 120 to 150 words per minute, the same amount of raw context can now be captured in around 7 to 8 minutes.
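Those figures are easy to check. The snippet below is a minimal sketch of the arithmetic, assuming the same word-per-minute ranges as above and ignoring pauses, edits and thinking time; the function name is made up for the example.

```python
def minutes_to_capture(words, wpm_range):
    """Return (fastest, slowest) minutes to produce `words` at a (low, high) wpm range."""
    low_wpm, high_wpm = wpm_range
    return words / high_wpm, words / low_wpm


# 1,000 words of context at the typing and speaking ranges quoted in the text.
typing = minutes_to_capture(1000, (40, 50))
speaking = minutes_to_capture(1000, (120, 150))
print(f"Typing:   {typing[0]:.0f} to {typing[1]:.0f} minutes")
print(f"Speaking: {speaking[0]:.1f} to {speaking[1]:.1f} minutes")
```

That works out to 20 to 25 minutes of typing against roughly 7 to 8 minutes of speaking for the same 1,000 words.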

So yes, the first input is obviously faster, but the bigger saving is in the iteration loop.

The normal typed workflow looks something like this:

Type prompt.
Read output.
Type correction.
Read revision.
Type another correction.
Read revision again.
Get annoyed.
Either accept something mediocre or start again with another short prompt worded slightly differently.

This is where a lot of quality is lost.

Not because the person does not know what they want.

They often do know.

They just can’t be bothered typing the correction properly.

So instead of saying:

No, the issue isn’t really that AI saves time. The issue is that voice lets you add more detail, and that improves the thinking quality of the output. Make that the central argument.

They type:

Make it better.

Or:

Try again.

And then everyone wonders why the output is still average!

Voice reduces the friction of correction.

You can speak the missing context as it occurs to you. You can clarify the actual point. You can say what feels wrong. You can redirect the AI without turning the whole thing into another typing exercise.

You feel like you are steering the AI ship, so to speak, because you are!

The first output is rarely the final output. The value is found in the back and forth.

Voice makes the back and forth easier to sustain, which means you are more likely to keep refining until the result actually reflects what you originally meant or envisaged.

That is where the real productivity gain sits.

Not just in faster input.

In faster clarification.

Faster correction.

Faster iteration.

Better steering.

And ultimately, better outputs.

A Construction Example: Project Manager Prompting

Take a Project Manager preparing a client update.

The typed version might be:

Summarise project progress for the client.

That prompt will probably produce something polished and mostly useless.

Why?

Because the AI does not know what actually happened.

It does not know which delay matters.
It does not know what the client is sensitive about.
It does not know what should be positioned carefully.
It does not know what needs to be mentioned now to avoid a painful conversation later.

So the AI fills in the blanks. And as usual, the more blanks you leave, the more generic the output becomes.

Now compare that with a spoken prompt:

I need to prepare a project update for the client. The foundation works are running about four days behind because of the supply issue with the reinforcement delivery last week, but we’ve already sourced an alternate supplier locally and recovered about two days in the programme. I don’t want the update to sound defensive. I want it to be clear that there was an issue, we responded quickly, and the current forecast still has us completing the stage by July. Also mention that the weather risk is still being monitored because next week’s forecast could affect the external works. Keep the tone professional, calm and practical.

That is a completely different input.

The AI now has actual context.

It knows the audience.

It knows the issue.

It knows the mitigation.

It knows the tone.

It knows what not to overstate.

It knows what the Project Manager is trying to achieve.

The output will almost always be better.

Because the Project Manager gave it the information it needed.

This is where voice becomes very useful for people who are busy, mobile or working across multiple jobs.

A Project Manager walking back from site can speak a proper project brief into AI in two minutes. A Supervisor can explain what actually changed during the shift. A Contracts Administrator can talk through the commercial context behind a variation. A business owner can describe the real issue behind a tender response before asking AI to help structure the draft.

The quality improvement comes from the context.

Again.

The quality improvement comes from the context!

The time saving comes from not having to type it all out.

Business value comes from getting better information into the system earlier, before it gets compressed into one-line prompts, half-useful notes or vague updates that create more questions than they answer.

Again, the principle is the same.

Better input creates better output.

Voice just makes better input much easier to provide.

Final Thoughts

Voice is not just a convenience feature. It changes how you interact with AI because it changes the quality of what you put into it.

Typing makes people compress their thinking. Voice lets them expand it.

And when you combine voice with a structured AI process, whether that is an insight development GPT, a project update workflow, a tender response or a field observation process, the result is not just faster prompting.

It is better transfer of context.
Better correction.
Better iteration.
Better outputs.

The productivity gain is not in replacing thinking.

It is in getting more of the actual thinking into the system at the beginning.

The people who get the most out of AI will be the ones who can transfer context, judgement and intent clearly.

Voice will end up being one of the best ways to do exactly that.

Papillon Trust