OpenAI has added a new feature to ChatGPT called “Agent”. It’s now live for Pro, Plus, and Team users, and lets the AI actually do things and not just chat. Think calendar management, planning dinners, checking the weather for a trip, comparing products, or building a slide deck from online research. All of that can be done in one place, and in one conversation.
The system uses a virtual computer that can run code, browse websites, click buttons, read text, and make sense of complex instructions. The user stays in control. ChatGPT asks before making any big decision and allows the person to pause, stop, or take over whenever they want. You can also watch what it’s doing in real time through a visual interface.
It can be told to, say, read your Gmail and check which meetings today need preparation, then read the latest news about those companies and summarise them in a few slides. Or it can book a dinner, generate a shopping list based on the menu, and add all of that to your calendar. It’s not just finding links. It’s doing the task from start to finish.
How Is This Different From What ChatGPT Could Already Do?
Before Agent, OpenAI had separate features like Operator, which could interact with websites, and Deep Research, which could analyse and summarise information. These worked well in isolation, but weren’t built to switch between clicking around and doing analysis.
Agent brings all of that together. It can run code, fetch files, click through websites, and write clean summaries in a single run. The interface keeps the memory of a task intact across different tools… so even if the AI switches from browsing a site to analysing a downloaded spreadsheet, it doesn’t lose context.
During a demo, OpenAI showed the Agent planning for a wedding. It visited the wedding website, checked the date, found hotels nearby, looked up tuxedos, added weather forecasts, found gifts, and even helped a user find shoes. It presented everything in a final report, like a digital assistant who didn’t need to be reminded twice.
Does It Actually Work Better Than Before?
On “Humanity’s Last Exam”, which tests expert-level knowledge across subjects, the Agent scored 41.6%. That’s the best result on record for any AI model in this category, and the score jumped to 44.4% when the system was allowed to try a few approaches before picking the best one.
It also performed well on FrontierMath, which includes math problems designed to stump even experts. The Agent reached 27.4% accuracy, almost triple the performance of OpenAI’s previous o3 model.
OpenAI also tested the Agent on real-world tasks like building a financial model, scouting locations for green hydrogen wells, and putting together a competitive analysis. In about half the cases, the Agent produced work equal to or better than that of top human performers. It also beat o3 and Deep Research across almost every task type.
On SpreadsheetBench, which checks how well an AI can handle spreadsheets, ChatGPT Agent scored 71.3% when editing directly in .xlsx format. Copilot in Excel scored just 20%.
Is It Safe To Let AI Act On Your Behalf?
The launch does bring safety concerns, because this is the first time ChatGPT can carry out real-world actions like clicking around the web, downloading files, and accessing services such as Gmail or GitHub.
To prevent trouble, OpenAI has added restrictions. For example, the Agent will not make purchases or send emails without asking first. It avoids anything high risk, like transferring money or giving legal advice. If something feels off, you can stop it mid-task, review what it has done so far, or delete all browsing data in one click.
One of the biggest risks OpenAI is watching for is “prompt injection”. That’s when someone plants a hidden instruction in a web page that tricks the AI into doing something it shouldn’t… like leaking your private data.
Luckily, the company has trained the model to detect these attempts, and extra checks are in place for anything sensitive.