Now that AI can control your web browser, the next frontier might be to take over your entire computer.
At least that’s what Seattle-based startup Vercept is trying to do with Vy, a currently free Windows and Mac app that can manipulate your mouse and keyboard to automate tedious or repetitive tasks. You just tell it what you’re trying to do, and then it takes control. Vy first launched as a beta for Macs in May, but has now been rebuilt and is available for Windows as well.
My experiments with Vy have yielded mixed results. If you’ve ever yelled at ChatGPT for failing to follow instructions, that frustration is magnified when AI is piloting your entire computer, and the tasks you want to automate might simply go faster by hand. Still, I can see some areas where an AI computer agent could be useful, which is why other companies (including Microsoft) are pursuing the same goal.
I spent a lot of time waiting
Kiana Ehsani, Vercept’s CEO and co-founder, says Vy is more human-like than the agent features in AI web browsers such as Perplexity Comet and ChatGPT Atlas.
While those browsers reportedly work by inspecting the underlying structure of web pages, Vy takes frequent screenshots to analyze what’s happening on your screen. It then executes mouse or keyboard commands to mimic the way you’d control the computer yourself. Ehsani says people are using it to automate Excel work, extract data from the web for sharing into apps like Slack, or figure out how to use new software.
“We want to have a model that understands your screen and takes action very similarly to how you do it,” Ehsani says.
This ends up taking a while, though, as each individual action requires Vy to take a screenshot and upload it to its servers for analysis. Everything from opening an app to clicking a menu button requires another screenshot and more time waiting for a response — so a routine that takes 10 seconds for a human might take Vy five minutes.
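To make the cost of that loop concrete, here is a rough Python sketch of a screenshot-driven agent. It is an illustration only, not Vercept’s code: the names run_agent, capture, decide, and act are hypothetical, and decide stands in for the upload-and-wait round trip to a server.

```python
def run_agent(goal, capture, decide, act, max_steps=50):
    """Hypothetical observe-decide-act loop: one screenshot per action."""
    actions = []
    for _ in range(max_steps):
        screenshot = capture()             # grab whatever is on screen
        action = decide(goal, screenshot)  # stand-in for the server round trip
        if action is None:                 # the model signals the task is done
            break
        act(action)                        # perform the mouse or keyboard step
        actions.append(action)
    return actions


def estimated_seconds(num_actions, round_trip_seconds):
    """Total wait if every action costs one full round trip."""
    return num_actions * round_trip_seconds


# Scripted stand-ins so the sketch runs without touching a real desktop.
script = iter(["minimize windows", "click taskbar icon", None])
performed = []
actions = run_agent(
    "open my to-do list",
    capture=lambda: b"fake-screenshot",
    decide=lambda goal, shot: next(script),
    act=performed.append,
)

# Assumed numbers: 10 actions at 30 seconds each adds up to five minutes.
total = estimated_seconds(10, 30)
```

The 10 actions and 30 seconds per action are assumptions chosen to match the five-minute figure above, not measurements. The point is that the wait grows with the number of actions, because nothing happens until each screenshot has made its round trip.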
Vy has a couple of ways to mitigate this. One option is to run tasks in “Background” mode, which lets you keep using your computer while Vy does its work in an invisible browser window. Vy’s capabilities are limited in this mode, though, as it can interact with files and web pages but can’t control other apps. (I had some impish fun getting Vy to fulfill various Microsoft Rewards tasks on my behalf, such as performing daily Bing searches and filling out quizzes, but felt guilty about how much compute power must’ve been burned along the way.)
The other option is to schedule tasks for when you’re not around. For instance, I set up a daily routine for 7 a.m. that minimizes any open windows on my desktop, opens Obsidian, moves it to the center of the screen, and loads my to-do list. Watching Vy do this in real time is excruciating, but scheduling it to run before I sit down at my computer, thereby forcing me to confront my to-do list, is pretty helpful.
Ehsani hopes that on-device AI will speed things up in the future. Instead of having to constantly upload screenshots and download instructions, the goal is for Vy to process everything directly on the computer, though it’s unclear when that might happen or how powerful a PC you’d need.
It needs a lot of hand-holding
Getting Vy to perform tasks on your computer can be a bit like bossing a child around, in that it’s liable to ignore or misinterpret your instructions.
A quirk of Obsidian, for instance, is that if you load the app while it’s already running, it will load an entirely new instance of Obsidian with a menu for choosing which notebook vault to open. To keep this from happening in my to-do list scenario, I asked Vy to only click the Obsidian icon on the Windows taskbar, which would load any existing instance of Obsidian instead of launching a new one.
But every time I tested the routine, Vy kept ignoring my instructions and would try to click the Obsidian icon on the desktop, thereby opening a new window. I would interrupt the assistant and tell it to focus on clicking the taskbar icon, but it had trouble finding it and kept trying to open the app in other ways. At one point it even clicked the Windows Start menu to launch Obsidian from there.
Ultimately I had to edit my workflow with clear instructions to never click the desktop icon, never open the Windows Start menu, and avoid using other methods to open Obsidian outside of the taskbar. I also had to lay out explicit guidance to look for a purple crystal icon that appears next to other icons in the taskbar. All told, I spent about 20 minutes troubleshooting this tiny routine that mostly involved minimizing some windows and clicking a button.
Vy does have an alternative “Watch and repeat” tool for creating workflows, in which it records your screen while you perform the desired steps. But this was even less reliable in my experience. When I tried setting up my Obsidian automation this way, Vy didn’t minimize any of my open windows and instead just moved its own app to the middle of the screen.
It raises some privacy and security concerns
Watching Vy take persistent screenshots of my desktop was also a reminder of how much personal info could wind up on Vercept’s servers. Every time Vy takes a screenshot, it captures everything on your screen, even if it’s unrelated to the task.
Until I started asking Vercept about its data retention policies, the company did not publish them on its website. Vercept now says it keeps screenshots for six months unless you delete the underlying chat manually. Either way, it keeps data for up to 30 days for safety purposes.
Ehsani says it doesn’t capture screenshots when Vy isn’t actively working on a task, and doesn’t perform any post-processing on screenshot contents. Still, a few people at Vercept have full access to users’ data, including their screenshots.
“There is a trade-off here,” Ehsani acknowledges.
As with any agentic AI system, Vy risks making users vulnerable to prompt injection attacks, in which an attacker hides malicious instructions in web pages, emails, or calendar invites. Vercept says it has some ways to mitigate this—for instance, by instructing Vy to watch for signs of malicious behavior—but no AI system has a foolproof answer to this problem yet.
It seems inevitable anyway
Despite the potential problems and limitations, AI agents that control your devices are coming. Microsoft already has a mode for its Copilot Windows assistant that can scan what’s on your screen and provide guidance, and it’s testing a “Copilot Actions” feature that can perform tasks on your behalf.
Other developers are also pursuing this idea. GitHub is full of experimental AI control projects, and commercial alternatives include NeuralAgent and Screenpipe. Vercept is notable among these efforts for having raised a $16 million seed round in January, with backers including former Google CEO Eric Schmidt and DeepMind Chief Scientist Jeff Dean.
Ehsani says the goal is to expand beyond just a single computer. An Android app is also in the works, and she hopes that you’ll eventually be able to give Vy instructions on your phone and have it carry the actions out on your computer, or vice versa. “One of our main visions is getting rid of mouse, keyboard, and touchscreens altogether,” Ehsani says.
For now, at least, the natural speed at which humans can click around a desktop gives them the edge.