Tech bros are always chasing the next big thing, so much so that some developers are already trying to imply that chatbots like ChatGPT are old hat. The real big AI innovation, they say, is language model-powered AI “agents” able to carry out multiple tasks in a row.
Compared to the “prompt, response” model of current chatbots, these agents like Auto-GPT are potentially capable of writing whole reams of code, building websites — or in one surprising case — making a call to a physical pizza place and placing an order.
Massive Update for Auto-GPT: Code Execution! 🤖💻
Auto-GPT is now able to write it’s own code using #gpt4 and execute python scripts!
This allows it to recursively debug, develop and self-improve… 🤯 👇 pic.twitter.com/GEkMb1LyxV
— Toran Bruce Richards (@SigGravitas) April 1, 2023
These agents are essentially self-contained systems that use modern generative AI models to automate tasks. Most agents use OpenAI’s ChatGPT and GPT-4 as a base, but several other homespun agents also take in generative AI image and voice models to create some surprising, if sometimes creepy, results. These systems feed the AI’s outputs back into themselves, creating a program that can run semi-autonomously with an overarching goal.
Say I wanted the AI agent to create a plan to upgrade my PC with a limited budget. In several agent models, I can set it on tasks like “find and rank the most-current different graphics cards based on price for under $US500 ($694)” and then do the same with a CPU, RAM, and more. Then I can list a task like “Use those lists and determine the best PC one can build for under $US1,000 ($1,388).” Depending on the model, it could give me a good idea of where to find my next upgrade. It could also lock up and tell me it doesn’t know how to complete the task.
Compared to your regular old AI chatbot like ChatGPT, these AI agents can connect to the internet and search for information that isn’t present in their own training data. The other big selling point is these agents have more memory than a regular ChatGPT session. The thing is, while these agents work surprisingly well on very basic, specialised tasks, you really can’t leave them alone for too long. Large language models are already prone to spitting out false information, and running multiple instances of a large language model can dramatically increase the likelihood of failure. AI is fully capable of coding, but even one mistake can make the entire thing fail. Sure, you could automate routine code checks, but what if those fail as well?
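To put rough numbers on that worry, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that every step in an agent's chain succeeds independently and with the same probability. The 95 per cent figure and the function name `chain_success_probability` are made up for this example and are not measurements of any real agent.

```python
def chain_success_probability(per_step_success: float, steps: int) -> float:
    """Chance that every step in a chain succeeds, assuming independent steps."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # Independent events multiply, so n steps give p ** n.
    return per_step_success ** steps


if __name__ == "__main__":
    # Hypothetical figure: each individual step is right 95% of the time.
    for steps in (1, 5, 10, 20):
        chance = chain_success_probability(0.95, steps)
        print(f"{steps:>2} steps: {chance:.0%} chance of a clean run")
```

Under those made-up numbers, a 20-step run comes out clean only about 36 per cent of the time, which is one way to see why long unattended runs tend to go off the rails.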
In my experiments, AutoGPT can solve certain simple & well-defined knowledge tasks well, but is unreliable *most of the time* for harder tasks that are truly useful. I also worry a lot whenever I give it python execution and disk access.
2/
— Jim Fan (@DrJimFan) April 16, 2023
So are agents actually the evolution of AI, or just a chain of Google searches? Well, the answer lies somewhere in the middle. Despite the moniker, AI simply isn’t intelligent by any real standard. These agents need quite a lot of guidance, and along the way there’s plenty of opportunity for the system to produce wrong information, spoiling the entire process. Before they become truly autonomous, these agents are little more than clever toys.
That doesn’t mean they aren’t interesting or don’t have the capacity to radically change how we currently think about AI. We’ve gone through some of the more interesting AI agent models currently out there, plus a few of the more dramatic agents built for specific tasks.
Yo dawg, I heard you like AI…
Created by dedicated AI evangelist and developer Toran Bruce Richards, Auto-GPT is more of a parent program used to generate AI agents. Essentially, the program uses a script to link outputs of the GPT-4 large language model, feeding itself based on its responses so it can iterate and correct itself. It requires a bit of setup, though you can find a good tutorial for creating your own Auto-GPT instance in this Twitter thread by developer Sully Omar.
What’s most impressive about Auto-GPT is how it all runs off natural language prompts. A user can give the AI up to five goals to accomplish based on the original description. By default, users have to give it permission to complete each task, though there is the option of letting it go freestyle.
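For the curious, the general shape of that loop can be sketched in a few lines of Python. This is not Auto-GPT's actual code: `fake_model` is a hypothetical stand-in for a GPT-4 call, `run_agent` and `approve` are names invented here, and the five-goal cap simply mirrors the limit described above.

```python
from typing import Callable, List


def fake_model(goals: List[str], history: List[str]) -> str:
    """Hypothetical stand-in for a language model: proposes the next action."""
    step = len(history)
    if step < len(goals):
        return f"work on: {goals[step]}"
    return "DONE"


def run_agent(
    goals: List[str],
    model: Callable[[List[str], List[str]], str] = fake_model,
    approve: Callable[[str], bool] = lambda action: True,
    max_steps: int = 10,
) -> List[str]:
    """Feed the model its own history until it says it is finished."""
    if len(goals) > 5:
        raise ValueError("this sketch accepts at most five goals")
    history: List[str] = []
    for _ in range(max_steps):
        action = model(goals, history)  # earlier outputs go back in as context
        if action == "DONE":
            break
        if approve(action):  # the human-in-the-loop permission check
            history.append(f"did: {action}")
        else:
            history.append(f"skipped: {action}")
    return history
```

Swapping `approve` for a function that always returns `True` is the sketch's version of letting the agent go freestyle, and `max_steps` is there so a confused model can't loop forever.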
Some users said they were able to get the AI to order food or book flights online for them through a platform built on Auto-GPT. Omar showed how he managed to get Auto-GPT to complete some simple market research. So far, the main application for the agent has been creating lists and performing simple research tasks.
Just when I thought AI couldn’t get any more impressive, AutoGPT and AI Agents blows my mind 🤯
Here, AI agent performs product research and creates a summary on the top headphones.
This is insane! And the craziest part is that it’s powered by GPT-4. pic.twitter.com/47AwQZfaIC
— Sai Rahul (@sairahul1) April 13, 2023
And some of those designs can be malicious. As first reported by VentureBeat, security researcher Simon Willison wrote about his concerns that simple prompt injection techniques could create avenues for bad actors to attack people through external tools like Auto-GPT.
BabyAGI is the little infant that could
Alongside Auto-GPT, BabyAGI is the other major code repository causing waves in the AI scene. Yohei Nakajima, BabyAGI’s creator, said the whole project came together thanks to a side project he was working on with ChatGPT. It’s similar to Auto-GPT, but instead of planning each step individually, it plans a whole sequence of tasks at once and then acts on them.
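That plan-first, act-second idea can be sketched roughly like this. It is a toy illustration, not BabyAGI's real code: `fake_planner` and `fake_executor` are hypothetical stand-ins for language model calls, and the three canned tasks are invented for the example.

```python
from collections import deque
from typing import Callable, Deque, List


def fake_planner(objective: str) -> List[str]:
    """Hypothetical planner: a real agent would ask a language model for this list."""
    return [
        f"research {objective}",
        f"draft notes on {objective}",
        f"summarise {objective}",
    ]


def fake_executor(task: str) -> str:
    """Hypothetical executor: a real agent would call a model or a tool here."""
    return f"result of '{task}'"


def plan_then_act(
    objective: str,
    planner: Callable[[str], List[str]] = fake_planner,
    executor: Callable[[str], str] = fake_executor,
) -> List[str]:
    """Build the whole task queue up front, then work through it in order."""
    queue: Deque[str] = deque(planner(objective))
    results: List[str] = []
    while queue:
        task = queue.popleft()  # oldest task first
        results.append(executor(task))
    return results
```

Calling `plan_then_act("graphics cards")` returns one result per planned task, in the order the planner listed them.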
Since Nakajima open-sourced the code, users have been able to connect it with other online tools. And of course, this has allowed more developers to make their own UI for easier access to the BabyAGI agent.
As with all the general-purpose agents currently around, they only have limited capacity for specific tasks beyond making lists. Some users say they have managed to get BabyAGI to simplify and test accurate code through a separate application, though it took quite a lot of handholding and trial and error.
My brother (@0xDACA) and I worked on a fun project tonight, inspired by @yoheinakajima : Coding Agent that follows the Test Driven Development (TDD) methodology!
You write the tests – and the agent runs in a loop until it creates the feature properly!
1/6#buildinpublic pic.twitter.com/EhkysuIdJW
— Adam C.H. (@adamcohenhillel) April 8, 2023
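The loop described in that tweet boils down to "keep trying until the human's tests pass." Here is a minimal Python sketch of the idea, assuming nothing about the developers' actual project: `fake_attempts` is a hypothetical stand-in for a model proposing code, and `user_tests` and `tdd_loop` are names invented for this example.

```python
from typing import Callable, Iterable, Iterator, Optional

Adder = Callable[[int, int], int]


def user_tests(add: Adder) -> bool:
    """Tests written by the human; the agent only learns pass or fail."""
    try:
        return add(2, 3) == 5 and add(-1, 1) == 0
    except Exception:
        return False


def fake_attempts() -> Iterator[Adder]:
    """Hypothetical stand-in for a model proposing code, wrong guesses first."""
    yield lambda a, b: a - b
    yield lambda a, b: a * b
    yield lambda a, b: a + b


def tdd_loop(
    attempts: Iterable[Adder],
    tests: Callable[[Adder], bool],
    max_tries: int = 5,
) -> Optional[int]:
    """Return how many tries it took to pass the tests, or None if it gave up."""
    for tries, candidate in enumerate(attempts, start=1):
        if tries > max_tries:
            break
        if tests(candidate):
            return tries
    return None
```

The `max_tries` cap is the handholding in code form: without it, an agent that never stumbles on a passing answer would happily keep burning through API calls.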
Camel holds a lot of water
The Camel AI agent is essentially two agents that work side by side with each other. Since humans often need to be there to hold the AI agent’s hand, the developer’s idea is to add another AI agent to “role-play” as the human and guide its counterpart.
Essentially, the program works by creating individual tasks and feeding them into the agent. If I ask it to make a peanut butter and jelly sandwich, the AI will first tell its counterpart to gather all the ingredients and tools, then place the slices of bread on a plate, and on and on until it completes the task.
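A stripped-down Python sketch of that back-and-forth might look like the following. It is only an illustration of the two-role setup, not Camel's code: both "agents" are hypothetical canned functions rather than language models, and every sandwich step after the first two is made up for the example.

```python
from typing import List, Tuple

# The first two steps come from the example above; the rest are invented.
SANDWICH_STEPS = [
    "gather all the ingredients and tools",
    "place the slices of bread on a plate",
    "spread peanut butter on one slice",
    "spread jelly on the other slice",
    "press the slices together",
]


def instructor(turn: int) -> str:
    """Hypothetical 'human' role: hands out one instruction per turn."""
    if turn < len(SANDWICH_STEPS):
        return SANDWICH_STEPS[turn]
    return "TASK_DONE"


def assistant(instruction: str) -> str:
    """Hypothetical worker role: reports back on each instruction."""
    return f"Done: {instruction}"


def role_play(max_turns: int = 10) -> List[Tuple[str, str]]:
    """Alternate between the two roles until the instructor calls it finished."""
    transcript: List[Tuple[str, str]] = []
    for turn in range(max_turns):
        instruction = instructor(turn)
        if instruction == "TASK_DONE":
            break
        transcript.append((instruction, assistant(instruction)))
    return transcript
```

In a real system, each function would be its own model call with its own role prompt, and the transcript would be fed back to both sides on every turn.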
At this point, Camel is more of an experimental model than a user-side platform, but more developers have talked about exploring multiple agents working in tandem, and we can expect to see more of this in the future.
If you don’t want to go through GitHub
There’s a web-based version of Auto-GPT called AgentGPT, though it offers very limited controls on what tasks it will perform, and the demo will eventually shut itself off after a certain period of time. Still, with a few simple prompts I created an agent called “Crunchatize me Captain” trying to create a breakfast cereal combining the worst, most-processed ingredients into one horrid box. The new cereal combined sugary Frosted Flakes with stale Rice Krispies and “chemical-laden” Lucky Charms marshmallows. Yummy.
This program asks for an OpenAI API key. That may be a real sticking point, as OpenAI explicitly tells users not to share their keys with outside clients. It’s also limited in how users can force the system to approach each task.
“God Mode” is less dramatic than it sounds
“God Mode” is essentially like AutoGPT and AgentGPT, though it’s also in-browser and requires a connection to a Google or Twitter account, plus an OpenAI API key. A reminder: OpenAI says not to give out your key, so take that into consideration.
The system first asks for a prompt, then creates a suggested multi-point action plan, and then asks for user input for each part of the process. For instance, I asked God Mode to “Identify the best way to peel a banana.” The system then created action items like “research different methods of peeling bananas” and “conduct experiments to compare the efficiency and ease of each method.” Users can add their own tasks or accept the suggested tasks before running the program.
Cognosys takes the hard work out of implementing AutoGPT and BabyAGI
After trying several different AutoGPT and BabyAGI implementations for ease of use, I’ve found that Cognosys had the most complete UI for automating tasks. It’s essentially the same as God Mode and AgentGPT, but it requires the least amount of user data up front, and I personally found it did not need an OpenAI API key to work. Still, that could change in time.
Cognosys’ creator Sully Omar said this beta is still a very early version, and that he plans to add more search capabilities, custom agents, and connection to GPT-4. Unfortunately, the UI does not include a search function like AutoGPT does when you run it yourself, but Omar said he is working on it.
I asked it my banana peeling question, and it established that both the “backwards peel” and “freeze” methods were the best options for human hands, noting the pros and cons of both. Unfortunately, the system was much less thorough when I asked it to create a 5-point action plan for dealing with New York City rats. It didn’t even manage to finish a fifth point, and the best it could offer was using rat-proof containers on garbage and food waste cans.
The “Do Anything Machine” is AutoGPT that uses your own data
Over the weekend I finished the to-do list that does itself.
Everytime you add a task, a GPT-4 agent is spawned to complete it. It already has the context it needs on you and your company, and has access to your apps.
It’s called the Do Anything Machine (Link in thread) pic.twitter.com/4Mn7cf67va
— Garrett Scott 🕳 (@thegarrettscott) April 11, 2023
While agents like AutoGPT have the ability to search the internet for information, the so-called Do Anything Machine also advertises it can access users’ data and apps, as long as you’re willing to give the platform access to that information on whichever platform you’re using. Developer Garrett Scott based the system on BabyAGI and showed how it acts almost like an AI-based to-do list. It automatically spawns tasks based on an initial prompt and then runs those in the background.
The platform is currently free, though there’s a waitlist to access the AI agent on the Do Anything Machine website. The agent platform’s privacy policy mentions it collects basic user information as well as “other information you choose to share.”
AI Developers really want a forever version of Seinfeld
The Twitch stream Nothing, Forever was an early attempt at an AI-fuelled program generating its own content, and it could be considered an AI agent before the moniker pulled itself out of the evolving AI lexicon. The show was taken down at one point after a character got a little out of hand and was later un-cancelled, but the creators aren’t done with the idea of a self-developing Seinfeld.
YouTuber All About AI created his own version of Seinfeld using AI agents to generate fully-voiced episodes. Two separate agents are involved: one simulates a Jerry clone, while the other pretends to be George. Their voices are provided thanks to ElevenLabs, a program for creating AI-generated speech using voice clips. Using a Python script, the YouTuber managed to get the AI to create a back-and-forth script where the two characters worked off each other like separate characters. The AI could chat back and forth about using GPT-4 to try and find a date. Jerry, it seems, is a “connoisseur of classic television, particularly Seinfeld.” How meta.
And perhaps if you’re not too interested in Jerry and friends, you could watch a different AI-based Twitch stream meant to recreate one of The Simpsons’ most-memed episodes.
ChatGPT Sim is like an autonomous session of The Sims using ChatGPT
One of the most impressive examples to come out of this new push for AI agents is this fun experiment from researchers at Google and Stanford University. ChatGPT Sim is a hands-off version of Harvest Moon that includes cute, Pokemon-like 2D sprites living and working in a row of small apartments. These AI avatars perform daily activities like working and eating, and then they initiate conversations with each other. The AI characters can reference past conversations and express their plans for the day.
In effect, each character is an individual AI agent unto themselves with their own memories, friendships, and goals.
In this environment, users can still interact with the individual agents to see what they’re thinking or what they’re doing. It’s an example of what some hope will be a greater expansion of what’s possible in a virtual environment.
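To make the "agents with their own memories" idea a little more concrete, here is a toy Python sketch. It is an assumption-heavy illustration and not the researchers' implementation: the `Villager` class, the keyword-matching `recall` method, and the `chat` helper are all invented for this example, and a real system would lean on a language model to decide what matters.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Villager:
    """Hypothetical character with a goal and a running list of memories."""
    name: str
    goal: str
    memories: List[str] = field(default_factory=list)

    def remember(self, event: str) -> None:
        self.memories.append(event)

    def recall(self, keyword: str) -> List[str]:
        """Return past memories mentioning the keyword (case-insensitive)."""
        return [m for m in self.memories if keyword.lower() in m.lower()]


def chat(speaker: Villager, listener: Villager, line: str) -> None:
    """Record one line of conversation in both characters' memories."""
    speaker.remember(f"I told {listener.name}: {line}")
    listener.remember(f"{speaker.name} told me: {line}")


if __name__ == "__main__":
    ana = Villager("Ana", goal="open a cafe")
    ben = Villager("Ben", goal="finish a painting")
    chat(ana, ben, "The cafe opens Friday")
    print(ben.recall("cafe"))
```

Because each character keeps its own list, Ben can later bring up the cafe without Ana having to repeat herself, which is the behaviour that makes the simulation feel lived-in.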
Ordering a pizza using an AI agent
This guy built a GPT4 AI that followed a prompt to find and call a Pizza Co and order a pizza without them realizing it was a bot. #AI is doubling its power every 3 months. Scary to think where we will be 6 months from now… #GPT4 #ChatGPT Full video: https://t.co/niOGgbLGZC pic.twitter.com/l3EZWershI
— Roger James Hamilton (@rogerhamilton) April 13, 2023
There’ve been plenty of successful efforts to get an AI to order a pizza for a few months now, but one user combined an AI agent and AI-generated voice software to place an actual order on the phone.
The original creator delisted their video on YouTube, but a separate tweet shows how the creator used GPT-4 to create an agent that could not only find and order a regular ol’ 11-inch pie, but also call the pizza shop and use an AI-generated voice to place the order itself.
The agent uses multiple different interfaces such as ElevenLabs AI voices and Twilio to actually make the call.