GPT-Voice Vs. Siri AI: What’s the Difference?

One of the next big things in the world of artificial intelligence is voice-enabled AI systems. And while the technology in itself isn’t exactly brand new – Siri, for instance, has been around since 2010 – it is rapidly changing and progressing.

Ethan Mollick, MIT graduate and Associate Professor at the Wharton School of the University of Pennsylvania, has been making waves with his commentary on voice-powered AI.

Most significantly, Mollick has proposed a new way of distinguishing between types of platforms within the industry and explained why their differing capabilities matter.

Essentially, he’s created two categories: voice-AI platforms that act as agents, and models that act as what he refers to as co-pilots.

Co-pilot AI platforms are smaller models that have limited capabilities. They’re fairly cheap and fast, and they run on simple devices like cell phones, making them more accessible for use as virtual assistants.

Agent AI platforms, on the other hand, are more complex and take fuller advantage of the potential of AI technology. They’re able to engage in more natural conversations and are generally capable of far more.

Separating different types of voice-AI technologies is one thing, but what does this difference mean, and how do the products available to us – or soon to be available to us – compare in terms of real-world applications? Furthermore, what does the future hold for AI-voice technology?

We’re going to compare two industry leaders in the world of voice-AI technology: Siri, with its new AI model, and OpenAI’s updated, more advanced voice feature in ChatGPT.

 

Siri Vs. ChatGPT Voice

 

According to the distinction set out by Mollick, Siri is, for all intents and purposes, a co-pilot AI platform, while ChatGPT Voice is an agent AI platform.

Let’s have a closer look at each of them and what sets them apart.

 

What Does Siri Do?

 


 

As we know, Siri itself has been around for a good long while now, having initially launched in 2010.

However, early versions used more basic types of AI, while its newest version, Siri AI, uses generative AI technology to enhance its capabilities dramatically.

Here are the most important aspects of Siri AI and its functionality:

 

  • Prioritises Safety and Security: Siri AI prioritises safety and security, which can be difficult to achieve in the AI space. To do that, Siri needed to accept certain limitations, including being installed directly on phones rather than only being accessible via the Internet.

 

  • It’s a Small Model: Siri AI is small relative to other AI models, allowing it to run on “weaker” hardware like cellphones. Because it’s small, though, it’s not as capable as larger models can be, so it isn’t always able to provide accurate responses to complex questions. This is set to change in future models, which will give users the option to scale up.

 

  • A Less Risky Approach: Siri AI’s limited capabilities are largely the result of intentional decisions made by the company. Larger AI models that aren’t carefully controlled are unpredictable, and they can hallucinate. While there are potential ways to deal with that, Apple has taken the easier approach of simplifying the model to avoid these issues altogether.

 

These decisions are what have resulted in Siri AI becoming a classic co-pilot, with the ability to complete specific yet less risky tasks.

The most significant takeaway from this type of model is that while co-pilot models have a lot of utility in certain contexts, their capabilities are simply not comparable to those of agents, and essentially, they’re not meant to be.

Co-pilot models are not anticipated to be the ones that result in massive progress in the AI industry. Rather, they’ve given up some of their potential for increases in productivity and innovation in exchange for improved security and predictability.

 

 

What Does GPT-Voice Do?

 


 

Now, GPT-Voice falls on the opposite end of the AI spectrum to Siri due to its classification as an agent, according to Mollick, and the differences between the two are vast.

While Siri AI is a more cautious approach to using AI, GPT-Voice is an attempt to fully utilise the immense potential of the technology, along with its complexities and risks.

ChatGPT’s GPT-4o has had an AI speech feature for a little while, but the recently updated model takes things to a whole new level, and this is the one that’s achieved status as an agent in the world of AI.

So, what exactly are the most important features of the new model of GPT-Voice?

 

  • Large, Multimodal Model: OpenAI’s new GPT-Voice feature is multimodal, although not all of its capabilities have been enabled yet. In the future, AI models will likely be able to watch, listen to and interact with the world in far more advanced ways. This will allow them not only to observe, but also to use the information they gather to take action on your behalf.

 

  • Natural Speech: While previous versions of AI speech have sounded very robotic, with extended pauses and other odd features, OpenAI’s GPT-Voice model sounds significantly more natural. So much so, in fact, that Mollick referred to it as “just plain weird because it feels so human in pacing, intonation, even fake breathing”.

 

  • Fewer Constraints, More Risk: Co-pilot AI models hold back in many aspects of their technology to ensure security and to decrease the risk of hallucinations and other problems, but GPT-Voice has fewer constraints. This means that it is walking a fine line between the risk associated with having fewer constraints and the potential for growth that this introduces.

 

OpenAI’s unrestrained, or at least less restrained (for now), approach to AI voice technology is where we’re likely to see progress in the industry, even though it brings with it significantly more risk and potential for unpredictable behaviour.

 

A Philosophical Difference

 

While the debate between Siri AI and GPT-Voice is an interesting one, in many instances it actually misses the point of the main difference between the two platforms.

That is, they’re not competing with one another directly. They may both be operating within the world of artificial intelligence, and specifically AI-voice technology, but they have different objectives and, most importantly, different philosophies about the tech.

At the end of the day, this means that both platforms offer a great deal of utility to users and the industry at large. That utility is just different for each platform.