Do you have a KVM job?

Sharon Gai
Aug 4
7 min read

And the AI acceleration curve keeps on curving.

On July 17, 2025, OpenAI announces ChatGPT agent. ChatGPT Agent is an AI-powered assistant that can carry out complex, multi‑step tasks on your behalf using its own virtual computer. It merges the powers of previous tools like Operator (which handles web interactions) and Deep Research (for long-form reasoning and web synthesis) into a unified system capable of both reasoning and action. I recently got access to ChatGPT agent and sent it on three different adventures. The below is my result.

My first task was simple.

Create a banner on Canva with a quote that I often say like Embrace AI.

It started to execute and asked for my log-in. It wasn’t long before I grew impatient and took over control and finished up the task myself.

It was repeatedly stuck on this screen where it needed a human to pass the CloudFlare test.

My second task was a bit more complicated.

I asked the Agent to create a new clothing line for me with quotes pulled from my keynotes.

It started to read all sorts of information. (Sometimes I forget how much content of mine there is on the world wide web) Eventually it kept on reading. For some reason it wouldn’t stop READING. We can blame my lack of clear instructions (I didn’t specify how many quotes or really how many shirts I wanted to create, or really what piece of clothing for that matter) I did want to see how it would deal with this ambiguity. The results of this errand proves that a bit of common sense intelligence is still missing. Instead of asking me, the owner, how many quotes it thought was enough, it made its best decision – which was not stopping.

I had to interrupt the flow and say you have enough, keep going. It then proceeded to create designs. It didn’t think to call on DALL-E to make the design. Instead it continued work on the terminal side to code the results. It ended up with this:

Here are mock‑ups of the clothing line designs based on the concepts we discussed:

· T‑shirt with “The web is about to change”

· Hoodie with “AI has drastically changed search”

I’m not sure how it thinks those shapes remotely resembles clothing.

In the end, I went back to DALL-E myself and created the following designs.

Not bad. But it needed me again.

Would you buy this shirt? I will need a new name for my clothing line. Hmmmm.

My third task was for it to email out a series of introduction emails to a list of emails I had. This, Agent completed without much fuss. However, it was miles slower than if a human administrative assistant were to do it.

My Takeaway

The autonomy of ChatGPT Agent still leaves much to be desired. It doesn’t yet demonstrate the practical intelligence or “street smarts” of a capable assistant familiar with how to leverage AI tools. That said, it’s still extraordinary that we’re at a point where we can even trial something like this.

A better-performing alternative I’ve found is Vy Intercept. It lets you record a task on-screen, training Vy to mimic and repeat it. If the workflow is straightforward and repetitive, Vy handles it well.

Why are they still so slow?

Apart from my review, I’ve also read numerous other reviews on the web about other users’ experience testing our the Agent for different tasks. The unified response is that it’s very slow, and most of the time, the user ends up pretty frustrated and completes the task themselves. So why are they still slow?

Because of how they're fundamentally built. Right now, most agents interact with the web the same way a human would: by clicking through websites, logging in, waiting for pages to load, and trying to visually understand what’s on the screen. This is inherently slow compared to direct API calls, which are much faster and more precise. The agents aren't just following a script; they’re trying to reason through what to do next with each step, often generating code or instructions on the fly. That kind of reasoning takes time, especially when the system is built to be cautious and avoid mistakes.

They’re also limited by their lack of deep memory or context. If they don’t remember your preferences or past actions, they end up repeating steps or asking for things you've already provided. This adds friction and slows things down. Security restrictions make things worse. These agents are often run in sandboxed environments to keep them from doing anything dangerous or unauthorized, but that security comes with performance tradeoffs.

On top of that, because most websites are built for humans, not machines, agents are forced to use awkward workarounds, like scanning the page visually instead of calling an internal API. It’s the equivalent of having a robot navigate a house without being allowed to use the doors or light switches directly. Until these agents are given more direct access, deeper memory, and tighter integrations with the systems they’re meant to help with, they’ll continue to feel sluggish. The promise is there, but the infrastructure still needs to catch up.

Be Careful What You Connect

One thing I quickly learned: you need to be cautious when logging the Agent into external sites like Gmail, Google Drive, or GitHub. Once authenticated, it can access files, emails, calendars, even take actions on your behalf, like modifying settings or sharing documents.

That opens the door to something called a prompt injection attack. Say you’re planning a dinner and ask the Agent to cross-reference your calendar and a recent email thread to find the best restaurant. Seems safe. But if the Agent stumbles across a malicious blog comment during its research, one that includes hidden instructions, it could be tricked into, say, pulling sensitive data from your inbox and unknowingly sending it to a third-party site.

OpenAI has layered in multiple security mechanisms to guard against this, but the safest move is still to only enable the connectors necessary for your task, and to log out when you're done.

I once wrote a piece on why I think we should use burner accounts when interacting with AI tools. It leads me to believe that this will be more and more important in the future.

The KVM Job

Ever heard the phrase KVM job? It stands for Keyboard, Video Monitor, Mouse. If your job mostly relies on those three tools, it may be ripe for automation. Tools like ChatGPT Agent and Vy are learning to click through interfaces just like a person would.

As of right now, in three of my tasks, the Agent achieved only one. It will take many more months (probably years) for this type of technology to be at a point where it is completing 99% of the tasks we give it. Until then, humans will be in the loop, to guide and demonstrate (and probably in many cases) just takeover because the Agent is not performing up to par.

When will they get better?

Very slowly.

First, models like GPT-4o can now handle reasoning, vision, speech, and interaction in one place, which cuts down on complexity and error. Second, developers are building libraries of reusable tools, memory modules, and agent frameworks like AutoGen, LangGraph, and CrewAI, which make agent behaviors more reliable and structured. Third, companies are starting to open up internal APIs or build native agent interfaces (think: what OpenAI’s partnership with Stripe or Canva enables). That’s a huge unlock, because agents get dramatically faster and more useful when they can stop screen-scraping and start speaking the native language of the app.

Realistically, you’ll start to see major speed and reliability improvements in late 2025, especially in tools built into core productivity apps (think Google Workspace, Microsoft 365, Notion, etc.). General-purpose agents that can navigate multiple websites seamlessly will take longer, probably 2026 before they feel fast, dependable, and “invisible.” The breakthrough moment will come when agents move away from mimicking human browsing behavior and instead operate as first-class API clients with persistent memory and built-in goals. That’s when you’ll stop noticing the slowness altogether.

Why AI Browsers Matter

Perplexity recently announced its upcoming AI browser, and it's not alone. Multiple AI companies are racing to own the browsing layer. Why? Because the internet was built for humans, not AI. Buttons, icons, and GUIs (graphical user interfaces) were never designed for machines to parse efficiently. AI agents operate more naturally in code-based environments, not the visual interfaces we’re used to.

It’s like speaking two different languages: one made for humans, the other for machines. AI needs an interface optimized for its own kind, and that’s what the AI browser revolution is all about.

We’re Hurtling Towards Her

Right now, there are about 500 million weekly active ChatGPT users. A much smaller number are actually using ChatGPT Agent (there are only ~10 million paid users, that’s less than 2% of ChatGPT’s total user base). So no, this isn’t mainstream, yet.

But it’s coming. Fast.

I used to play a clip from Her in my talks last year and people would laugh at how absurd the premise was. But now? With products like ChatGPT Agent, it’s not so funny anymore. It feels closer to that reality, especially with its recent acquisition of io Products, Inc., which had been founded just a year earlier by Sir Jony Ive, the legendary former Chief Design Officer at Apple.

Jony Ive created io with the vision of rethinking how people live with technology. The company stayed largely under the radar, but it was known to be working on AI-native consumer hardware, something like a little device like the Humane pin, that could act as a personal assistant. This fits squarely within OpenAI CEO Sam Altman’s broader vision: that new devices will emerge not as competitors to phones or laptops but as a new category altogether, ones that rely on large language models and ambient computing to assist users contextually.

Altman has hinted that the first product from this collaboration may be something radical, calling it “potentially the coolest piece of technology the world will have ever seen.” While no official product has launched yet, rumors suggest prototypes include AI-enhanced wearables like headphones or pocketable assistants that use cameras, sensors, and LLMs to interpret the user’s surroundings and anticipate their needs. Remember Friend.com? The necklace that shocked the world? I wrote about it here. That was a year ago. Fast forward to today, this type of device now has tons of competitors. Apparently parties in San Francisco have all sorts of people wearing small recording devices. A Black Mirror episode come true.

Do you feel like you need to catch up on AI and its latest tools? Here is a course I made inspired from some of my conversations with the audience at a recent keynote.