Manus: The World's First True General-Purpose AI Agent

Almost simultaneously with Apple's new product launch last night, the tech world was flooded with news about a product called Manus, billed as the world's first truly general-purpose AI Agent. Judging from the use cases showcased on its official website, Manus can think independently, plan and execute complex tasks, and deliver complete results directly. Compared with Claude's Computer Use, which can also handle a variety of tasks, or Agents that can order food delivery and book hotels for you, Manus covers a wider range of domains and executes tasks to a higher standard.

Manus has set a new record on the authoritative GAIA benchmark, with performance well ahead of OpenAI's comparable products. The name "Manus" is Latin for "hand" and comes from the phrase "Mens et Manus", or "mind and hand". This is also MIT's motto, which encourages students to turn ideas into practical results. Just a few hours before Manus' launch, founder Xiao Hong posted on the Jike platform that "the climax is coming" and shared an excerpt from Shakespeare:

It is too early to judge whether the birth of Manus is a milestone for AGI, but it may well bring the Agent era to its "climax moment". Here's the link to apply for Manus: https://manus.im/invitation

Can Manus really "get work done", such as screening resumes, choosing a home, and trading stocks? According to the team, Manus is not just a conversational AI tool that chats, but a truly autonomous agent. While other AIs may only generate ideas, Manus can think independently and take action.

The team sees Manus as a new paradigm of human-computer collaboration, and perhaps even a window onto AGI. Alongside the launch, a 4-minute demo video also went viral. In its use cases, Manus autonomously handles the entire process from planning to execution, demonstrating genuine Agent capabilities rather than simple assistant functions.

Let's start with a common HR task: screening resumes. The demo opens with a bold move. The team sends Manus a compressed file containing 10 resumes, and Manus works as efficiently as a professional recruiter. It first decompresses the file, then browses each resume page by page, recording the important information. Manus also works asynchronously, so you can shut down your computer at any time and it will notify you when the task is done. You can also give it new instructions at any point along the way.

Next, 5 more resumes are uploaded. After carefully reading all 15, Manus provides ranked recommendations, along with candidate information and the evaluation criteria it used. But that's not all: Manus can also be asked to generate a spreadsheet. Because Manus has knowledge and memory capabilities, the next time it performs a similar task it will deliver the results in spreadsheet format directly.
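The screening workflow in this demo (unpack an archive, read each resume, rank the candidates, export a spreadsheet) can be sketched in a few lines of Python. This is a minimal illustration, not how Manus works internally: it assumes the resumes are plain-text `.txt` files inside a zip archive and scores them against a hypothetical keyword table (`CRITERIA`), whereas Manus reads and judges resumes with a language model. The helper names `read_resumes`, `score`, and `rank_to_csv` are illustrative.

```python
import csv
import io
import zipfile

# Hypothetical weighted criteria (keyword -> points). A real screener, Manus
# included, judges resumes far more holistically than keyword hits.
CRITERIA = {"python": 3, "machine learning": 3, "sql": 2, "leadership": 1}


def read_resumes(archive_bytes: bytes) -> dict[str, str]:
    """Decompress a zip archive of plain-text resumes into {filename: text}."""
    resumes = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith(".txt"):  # skip anything that is not a resume
                resumes[name] = zf.read(name).decode("utf-8")
    return resumes


def score(text: str, criteria: dict[str, int]) -> tuple[int, list[str]]:
    """Return the total score and the list of criteria one resume matches."""
    lowered = text.lower()
    matched = [kw for kw in criteria if kw in lowered]
    return sum(criteria[kw] for kw in matched), matched


def rank_to_csv(resumes: dict[str, str], criteria: dict[str, int]) -> str:
    """Rank candidates by score, highest first, and render a CSV spreadsheet."""
    rows = []
    for name, text in resumes.items():
        total, matched = score(text, criteria)
        rows.append((name, total, "; ".join(matched)))
    rows.sort(key=lambda row: (-row[1], row[0]))  # ties broken by filename
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["rank", "candidate", "score", "matched_criteria"])
    for rank, (name, total, matched) in enumerate(rows, start=1):
        writer.writerow([rank, name, total, matched])
    return out.getvalue()


if __name__ == "__main__":
    # Build a tiny in-memory archive with made-up resumes for demonstration.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("alice.txt", "Python and SQL; led a team (leadership).")
        zf.writestr("bob.txt", "Java developer with SQL experience.")
    print(rank_to_csv(read_resumes(buf.getvalue()), CRITERIA))
```

Keyword matching only keeps the example deterministic; the point is the pipeline shape, where ranking and spreadsheet export are separate steps that can simply be rerun when more resumes arrive.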

In another demo, Manus is asked to find a safe, low-crime community in New York and a property to buy there that meets the family's income and the children's schooling requirements. Faced with such a complex task, Manus methodically breaks it down into multiple steps and creates a detailed to-do list:

  1. Search and read articles about the safest communities in New York.
  2. Research middle schools in New York.
  3. Write a Python program to calculate the budget.
  4. Based on the budget, filter suitable properties on real estate websites.
  5. Integrate all information, write a detailed report and organize relevant materials.
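Steps 3 and 4 of that plan, calculating a budget in Python and filtering listings against it, might look something like the sketch below. The affordability rule is an assumption chosen for illustration (monthly housing payment capped at 28% of gross income, a 30-year fixed-rate mortgage, 20% down, taxes and insurance ignored); the `Listing` fields and the `shortlist` helper are hypothetical and do not reflect any real estate website's data.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    """Hypothetical listing record; not any real estate site's schema."""
    address: str
    neighborhood: str
    price: float
    school_rating: int  # assumed 1-10 scale


def max_home_price(annual_income: float, annual_rate: float, years: int = 30,
                   down_fraction: float = 0.20,
                   housing_ratio: float = 0.28) -> float:
    """Highest price whose mortgage payment stays within housing_ratio of income.

    Assumes a fixed-rate, fully amortizing loan; taxes and insurance ignored.
    """
    max_payment = annual_income / 12 * housing_ratio  # monthly housing budget
    r = annual_rate / 12  # monthly interest rate
    n = years * 12        # number of monthly payments
    if r == 0:
        max_loan = max_payment * n
    else:
        # Present value of an annuity: the loan max_payment can pay off in n months.
        max_loan = max_payment * (1 - (1 + r) ** -n) / r
    return max_loan / (1 - down_fraction)  # loan covers the non-down-payment share


def shortlist(listings: list[Listing], budget: float,
              safe_neighborhoods: set[str], min_school_rating: int) -> list[Listing]:
    """Keep affordable listings in safe areas with good schools, cheapest first."""
    picks = [item for item in listings
             if item.price <= budget
             and item.neighborhood in safe_neighborhoods
             and item.school_rating >= min_school_rating]
    return sorted(picks, key=lambda item: item.price)


if __name__ == "__main__":
    budget = max_home_price(annual_income=250_000, annual_rate=0.065)
    print(f"Maximum price under these assumptions: ${budget:,.0f}")
```

Separating the budget formula from the filter mirrors the to-do list: the budget is computed once, then reused as a plain number when screening listings.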

In the third use case, Manus turns into a professional stock analyst. Asked to analyze the correlation between the stock prices of Nvidia, Marvell Technology, and TSMC over the past 3 years, Manus retrieves data from authoritative sources via API. After verifying the data, it writes code for data analysis and visualization.

Upon completing the data analysis and visualization, Manus can also create a website based on this data. With the user's authorization, it can even deploy the site online and provide a shareable link.

After trying Manus, netizen @DavidAIinchina gave it very high praise, calling it "an incredible use case".

According to the team, the demos above are just the tip of the iceberg. The official website (https://manus.im/usecases) shares more cases of Manus handling real-world tasks, including personalized travel planning, in-depth stock analysis, insurance policy comparison, supplier sourcing, financial report analysis, and professional data organization, all of which Manus handles with ease.

Although Manus is not yet fully open, it has already taken the internet by storm. On major platforms, netizens flooded comment sections late into the night asking for invitation codes, a sign of how much buzz it has generated.

On GAIA, the benchmark used to evaluate how well general AI assistants solve real-world problems, Manus achieved state-of-the-art (SOTA) results at all three difficulty levels. To ensure reproducibility, Manus was tested with exactly the same configuration as its production version.
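GAIA groups its questions into three difficulty levels and scores each final answer by quasi-exact match against a ground-truth answer, so results are reported as accuracy per level. The sketch below shows how per-level accuracy could be computed. It is a simplified stand-in, not the official GAIA scorer: the normalization rules in `normalize` and `is_match` are assumptions.

```python
import re
from collections import defaultdict


def normalize(answer: str) -> str:
    """Simplified text normalization: lowercase, strip punctuation and extra spaces."""
    text = re.sub(r"[^\w\s]", "", answer.lower())
    return " ".join(text.split())


def is_match(prediction: str, truth: str) -> bool:
    """Compare as numbers when both parse as numbers, otherwise as normalized text."""
    try:
        return float(prediction.replace(",", "")) == float(truth.replace(",", ""))
    except ValueError:
        return normalize(prediction) == normalize(truth)


def accuracy_by_level(results: list[tuple[int, str, str]]) -> dict[int, float]:
    """Map each difficulty level to the share of its questions answered correctly."""
    correct: dict[int, int] = defaultdict(int)
    total: dict[int, int] = defaultdict(int)
    for level, prediction, truth in results:
        total[level] += 1
        correct[level] += is_match(prediction, truth)  # True counts as 1
    return {level: correct[level] / total[level] for level in sorted(total)}


if __name__ == "__main__":
    # Made-up (level, prediction, ground truth) triples for illustration.
    demo = [(1, "1,000", "1000"), (2, "Paris!", "paris"), (3, "7", "8")]
    print(accuracy_by_level(demo))
```

Scoring only the final answer, rather than the steps taken to reach it, is what lets a benchmark like this compare very different agent designs on equal terms.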

Beyond benchmarks, Manus has also solved real-world problems on platforms such as Upwork and Fiverr and proven its strength in Kaggle competitions. The team credits the open-source community for making this possible and hopes to give back to it.

Manus is a multi-agent system driven by multiple independent models. Later this year, the team plans to open source some of these models, especially the inference part of Manus.
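A system driven by multiple independent models needs some coordinator that decides which model handles which step. The sketch below illustrates only that general idea; Manus' actual architecture has not been published at this level of detail. The roles (`search`, `code`), the stub models, and the one-line-per-step plan format are all hypothetical.

```python
from typing import Callable

# A "model" here is anything that maps a text prompt to a text reply.
Model = Callable[[str], str]


def stub_planner(task: str) -> str:
    """Hypothetical planner: emits one 'role: step' line per step."""
    return f"search: {task}\ncode: summarize findings"


def stub_searcher(step: str) -> str:
    return f"[search results for '{step}']"


def stub_coder(step: str) -> str:
    return f"[script output for '{step}']"


class Coordinator:
    """Routes each planned step to the independent model registered for its role."""

    def __init__(self, plan_model: Model, workers: dict[str, Model]):
        self.plan_model = plan_model
        self.workers = workers

    def run(self, task: str) -> list[str]:
        outputs = []
        for line in self.plan_model(task).splitlines():
            role, _, step = line.partition(": ")
            worker = self.workers.get(role)
            if worker is None:
                raise ValueError(f"no model registered for role {role!r}")
            outputs.append(worker(step))
        return outputs


if __name__ == "__main__":
    team = Coordinator(stub_planner, {"search": stub_searcher, "code": stub_coder})
    for result in team.run("NYC middle schools"):
        print(result)
```

Because each role is just an entry in a dictionary, any one model can be swapped or upgraded without touching the others, which is one practical reason to split an agent across independent models.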

So who is behind this industry-shaking product, built by a Chinese team whose products already serve millions of users? According to reports, Manus AI founder Xiao Hong graduated in 2015 from the Software Engineering program at Huazhong University of Science and Technology and went on to found multiple startups. In 2015 he founded Nightingale Technology, which launched "One Companion Assistant" and "Micro Companion Assistant", served over 2 million business (B-end) users, and received investment from Tencent, ZhenFund, and others.

Another prominent AI product associated with Xiao Hong is Monica, an "All-in-One" AI assistant that first launched as a browser extension. By integrating mainstream large models such as Claude 3.5 and DeepSeek, Monica offers chat, translation, copywriting, and more. Users can create custom tools in natural language and share them in the tool plaza.

Monica also focused on overseas markets early on, gaining over a million users and becoming a leading product among AI browser extensions. In February this year, the Chinese version of Monica (monica.cn) began internal testing, and it is currently free for domestic users. Built on the DeepSeek R1 and V3 models, this version offers deep reasoning, memory, and real-time internet search.

Manus' technical philosophy: less structure, more intelligence

Manus' technical philosophy is a bit different from the mainstream, advocating "less structure, more intelligence." They believe that when the data is high-quality enough, the model is powerful enough, the architecture is flexible enough, and the engineering is solid enough, capabilities like computer use, deep research, coding agents, etc. will naturally emerge without needing to be designed as specific product features.

For comparison, GPT-4-Turbo, one of the most prominent models, averages less than 7% on the GAIA public leaderboard, and even solutions built on complex multi-agent systems only reach 40%. Against that backdrop, Manus' performance can fairly be called "far ahead".

In a recent interview with Zhang Xiaojun, founder Xiao Hong also talked about Manus, which had not yet been released: "It does look like it should just be a chatbot, which fits how people imagine it, but the application side is very complex. Unlike with Monica, just making good use of different models is already quite complex."

Xiao Hong divides current AI applications into two categories: those that fill the gaps and shortcomings of major application products, and those that provide unique solutions for specific scenarios. Perplexity (which adds internet search) and Monica (in browser-extension form) belong to the first category, filling gaps left by existing products.

Model-driven applications for new scenarios appear mainly in the image and video fields, powered directly by advances in model technology. Products like Pika and Runway use model capabilities to create new application scenarios.

Some users joked that Manus is "extremely good at repackaging", but Xiao Hong has never minded users knowing that his products use other companies' models. As early as last year, he compared Monica to a consumer electronics product and put ChatGPT's logo on the official website.

The new era of human-computer interaction has arrived, but don't rush to put Manus on the AGI pedestal.

In early 2024, tong made a prediction: large models will become the new operating system for smartphones, and natural user interfaces (NUI) will gradually replace today's graphical user interfaces (GUI). The key entry point to this new form of interaction is the Agent.

Last year we saw similar cases at many phone launch events: vivo's "Phone GPT", which can order food via AI, Huawei HarmonyOS's Xiaoyi and intent framework, Honor's YOYO agent, and Zhipu's AutoGLM. The core idea is the same: have AI imitate the human Plan-Do-Check-Act loop and operate devices the way people do.
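The Plan-Do-Check-Act loop these phone agents imitate can be sketched as follows. The "device" is a hypothetical counter whose taps occasionally fail to register, standing in for a real screen; the names `FlakyDevice` and `pdca` are illustrative. Checking each step against an expectation is what lets the agent notice a dropped tap and retry instead of assuming success.

```python
from dataclasses import dataclass


@dataclass
class FlakyDevice:
    """Hypothetical device whose taps are sometimes silently dropped."""
    failures: set[int]  # indices of taps that will not register
    counter: int = 0
    taps: int = 0

    def tap(self) -> None:
        if self.taps not in self.failures:
            self.counter += 1
        self.taps += 1


def pdca(device: FlakyDevice, goal: int, max_rounds: int = 20) -> list[str]:
    """Drive device.counter up to goal with a Plan-Do-Check-Act loop."""
    log = []
    for _ in range(max_rounds):
        # Plan: decide whether another tap is needed and what it should produce.
        if device.counter >= goal:
            log.append("goal reached")
            return log
        expected = device.counter + 1
        # Do: perform the action on the device.
        device.tap()
        # Check: compare the observed state with the expectation.
        ok = device.counter == expected
        # Act: keep the result on success, or record the failure so the
        # next round retries from the observed state.
        log.append("tap ok" if ok else "tap failed, retrying")
    log.append("gave up")
    return log


if __name__ == "__main__":
    for entry in pdca(FlakyDevice(failures={1}), goal=3):
        print(entry)
```

The `max_rounds` cap is a deliberate design choice: an agent operating a real device needs a stopping rule so that a persistent failure ends in "gave up" rather than an endless loop.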

Zhipu AI CEO Zhang Peng has said that today's Agent capability is more like an intelligent scheduling layer between the user and applications, linking all applications and even all devices. This can be seen as a prototype of LLM-OS, a universal operating system built on large models, which will have a huge impact on the form of human-computer interaction.

OpenAI founding member and AI expert Andrej Karpathy has also talked about the large language model operating system (LLM OS) many times. He believes that large models are, to some extent, a new kind of computer and operating system: they can connect various software and hardware, as well as peripherals carrying information in every modality, and perform tasks through function calls.

In a traditional operating system, a set of peripherals is built around the CPU, such as the mouse and keyboard, disk storage, and cache. In an LLM OS, the large model itself is the central processing unit. I/O peripherals are no longer just the mouse and keyboard, because LLMs can take in and produce data in more modalities. At the same time, the external tools that large models invoke will also be upgraded from traditional software to agent tools.
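In Karpathy's framing, the model plays the role of the CPU and reaches its "peripherals" through function calls. A minimal sketch of that dispatch layer follows: tools are registered in a table, the model emits a call as JSON, and the runtime executes it. The JSON shape (`name` plus `arguments`), the `tool` decorator, and the stub tools are assumptions for illustration; real function-calling APIs define their own, broadly similar schemas.

```python
import json
from typing import Any, Callable

# Registry of "peripherals" the model may call; the tool names are hypothetical.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator that registers a function as a callable tool under its name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def read_file(path: str) -> str:
    return f"<contents of {path}>"  # stub instead of real disk I/O


@tool
def add(a: float, b: float) -> float:
    return a + b


def dispatch(model_output: str) -> Any:
    """Execute one tool call that the model emitted as JSON.

    Assumes the reply looks like {"name": ..., "arguments": {...}}.
    """
    call = json.loads(model_output)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise KeyError(f"unknown tool: {call['name']}")
    return fn(**call.get("arguments", {}))


if __name__ == "__main__":
    print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
```

Keeping the registry separate from the model means new "peripherals" can be added by registering a function, without changing how the model's output is parsed.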

Cross-application operation is critical: it lets Agents carry out more complex, coherent autonomous operations and could move them toward real commercial deployment. Whether the services of different Internet companies can be connected may be the biggest obstacle to realizing this kind of interaction.

However, many AI assistants currently implement agent operation by using the phone's accessibility permissions to simulate taps on the screen.

Manus shows that, in the Agent paradigm, AI can understand needs and work independently until a task is completed. This is undoubtedly a big step for human-computer interaction, hinting at AI's potential to evolve from a tool into a partner. But it is still premature to say we already have one foot in the door of AGI.

Xiao Hong himself has said that early Agents are more like "feature phones" that need continuous iteration and improvement. Current Agents still depend on better model capabilities and more complete virtual environment support before they can truly handle the long tail of tasks. Compared with intelligent driving, this is probably equivalent to moving from L2 to L3 driving automation.

Although Manus performed well on the GAIA benchmark, that does not mean it already has every characteristic of artificial general intelligence. The road to AGI is still long, and challenges such as model capability, autonomous learning, and task generalization remain to be solved.

But thanks to Manus' breakthroughs in autonomy and generality, we have one more star to light our way on the great voyage toward AGI.