The Technology Behind the Controversial Manus: No MCP, 29 Tools, and Based on Claude and Qwen Models

After causing an overnight sensation and sparking controversy in China's AI circles, the self-described world's first general-purpose AI Agent product, Manus, has been drawing growing attention overseas. Many users who obtained invitation codes have conducted technical teardowns, with mixed reviews. According to figures on the official Manus AI Discord server, the community grew to 185,752 members in just a few days, with more than 15,000 of them online at a time.

Unlike companies that develop large AI models from scratch, Manus did not build its AI Agent entirely from the ground up. Instead, it leveraged the open-source community to integrate existing advanced models and AI tools; its core technology lies in how efficiently it combines and coordinates them.

Within a day of its release, Manus was reverse-engineered and "replicated" by the open-source community. The most popular of these projects, OpenManus, garnered tens of thousands of stars on GitHub in just a few days. How Manus plans to advance its technology and build a moat is a question of great outside interest.

According to the latest technical teardowns and user reviews, hype aside, the Manus team has made some commendable technical explorations in "AI assembly."

Mixed Feedback

Many users who have tried Manus feel that it has only scratched the surface of what integrating existing models with the right tools and integrations can achieve, and that the experience of deeper integration in the future could be very promising.

Victor M, a product lead at Hugging Face, posted after trying it that Manus is the most impressive AI tool he has ever used: its agent capabilities redefine what is possible, and the user experience (UX) largely delivers on its promises. He built an airplane-control game on Manus using only prompts.

Another user casually created an endless-runner game with similar prompts, and the result was quite good.

Deedy, an investor at Menlo Ventures, posted that this is a novel AI product worth recommending. When asked to "conduct a professional analysis of Tesla stock," it completed in about an hour what would normally take roughly two weeks of professional-level work, producing a visual, interactive analysis interface.

Moreover, Deedy argues that although Manus is "just a wrapper," so are products like Cursor, Glean, Perplexity, and Moveworks. Even without their own large models, such wrappers can exceed $50 million in ARR (Annual Recurring Revenue) and reach unicorn valuations. Building excellent products and businesses on top of top-tier models is also a viable path.

Although Manus claimed at launch that it achieved state-of-the-art (SOTA) performance at every difficulty level on GAIA (a benchmark that evaluates general AI assistants on real-world problem solving), outperforming OpenAI's DeepResearch feature, the actual experience does not entirely eclipse DeepResearch.

Derya Unutmaz, a biomedical scientist, said Manus' performance was somewhat disappointing, particularly in specialized scientific research domains. At first glance, he noted, Manus' output quality seems quite close to OpenAI's DeepResearch, but Manus lacks DeepResearch-style citations and references, which matter for scientific work.

In response to the issues arising from the first batch of user experiences, Manus stated: "As a small team, our focus is on continuously improving Manus and creating AI agents that truly help users solve problems. The main goal of the current closed beta is to stress-test various parts of the system and identify issues. We greatly appreciate the valuable insights everyone has shared."

AI Technology Integration

Which technologies Manus integrates and calls has sparked the most curiosity among netizens. Yichao "Peak" Ji (Ji Yichao), Manus' co-founder and chief scientist, did not dodge the question and revealed technical details across multiple reply posts.

Regarding base models, Manus uses Claude together with fine-tuned versions of Qwen, the models developed by Alibaba's Tongyi Lab. When the team started building Manus, only the Claude 3.5 Sonnet v1 version was available, so they had to lean on some auxiliary models. Claude 3.7 now looks promising, and Manus is internally testing an upgrade.
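Manus has not disclosed how it divides work between these models. Purely as an illustration of what such multi-model dispatch can look like, here is a minimal sketch in which a primary frontier model handles planning and reasoning while cheaper fine-tuned models take narrow auxiliary subtasks; every function name, task label, and routing rule below is hypothetical, not Manus' actual code.

```python
# Hypothetical sketch of multi-model dispatch. Manus has not disclosed
# its routing logic; all names and task labels here are illustrative.

from typing import Callable

def call_claude(prompt: str) -> str:
    """Placeholder for a call to a Claude endpoint (primary model)."""
    raise NotImplementedError

def call_qwen_finetune(prompt: str) -> str:
    """Placeholder for a call to a fine-tuned Qwen endpoint (auxiliary)."""
    raise NotImplementedError

# Assumed division of labor: the frontier model plans and reasons,
# while smaller fine-tunes handle narrow, high-volume subtasks.
ROUTES: dict[str, Callable[[str], str]] = {
    "plan": call_claude,
    "reason": call_claude,
    "summarize": call_qwen_finetune,
    "classify": call_qwen_finetune,
}

def dispatch(task_type: str, prompt: str) -> str:
    # Unknown task types fall back to the primary model.
    return ROUTES.get(task_type, call_claude)(prompt)
```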

Many netizens noted that they could probe the toolchain Manus integrates through the "sandbox" (a security isolation mechanism) used during code execution. Ji Yichao replied that this is no great feat: Manus is deliberately designed so that every user can access the sandbox directly.

Specifically, each Manus session has its own sandbox, fully isolated from other sessions, and users can enter it directly through the Manus interface. Since the code in the sandbox exists only to receive commands from the agent, it is only lightly obfuscated.
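Manus has not released its sandbox implementation, but the stated design, one fully isolated sandbox per session that receives commands from the agent, can be roughly sketched with per-session Docker containers. The image, resource limits, and command protocol below are all assumptions made for the example.

```python
# Illustrative sketch of "one isolated sandbox per session" using the
# Docker CLI. This is NOT Manus' implementation; the image name,
# resource limits, and command protocol are assumptions.

import subprocess
import uuid

def start_session_sandbox() -> str:
    """Start a dedicated container for one session and return its name."""
    name = f"session-{uuid.uuid4().hex[:12]}"
    subprocess.run(
        ["docker", "run", "-d", "--name", name,
         "--network", "none",          # no network unless a tool needs it
         "--memory", "1g", "--cpus", "1",
         "python:3.12-slim", "sleep", "infinity"],
        check=True,
    )
    return name

def run_agent_command(sandbox: str, command: str) -> str:
    """Execute one agent-issued shell command inside the session's sandbox."""
    result = subprocess.run(
        ["docker", "exec", sandbox, "sh", "-c", command],
        capture_output=True, text=True, timeout=60,
    )
    return result.stdout + result.stderr

def stop_session_sandbox(sandbox: str) -> None:
    """Tear the container down when the session ends."""
    subprocess.run(["docker", "rm", "-f", sandbox], check=True)
```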

The tool design is not a secret either: the action-space design of the Manus agent does not differ much from common academic approaches. And because of the RAG mechanism, the tool descriptions obtained through jailbreaking vary from task to task.
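That RAG point is worth unpacking: if only the tool descriptions most relevant to the current task are retrieved and injected into the prompt, a jailbreak surfaces a different subset each time. A minimal sketch of such a retrieval step, with invented tool names and a stand-in embedding function (neither taken from Manus), might look like this:

```python
# Sketch of RAG-style tool selection: only the tool descriptions most
# relevant to the current task are injected into the prompt. The tools
# and the embedding function are stand-ins, not Manus internals.

import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model (e.g., an API call)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

TOOLS = {
    "browser.open": "Open a URL in the sandboxed browser.",
    "shell.exec": "Run a shell command inside the session sandbox.",
    "file.write": "Write text content to a file in the workspace.",
    "search.web": "Query a web search engine and return results.",
}

TOOL_VECS = {name: embed(desc) for name, desc in TOOLS.items()}

def relevant_tools(task: str, k: int = 2) -> list[str]:
    """Return the k tool descriptions closest to the task embedding."""
    q = embed(task)
    scored = sorted(TOOLS, key=lambda n: -float(q @ TOOL_VECS[n]))
    return [f"{n}: {TOOLS[n]}" for n in scored[:k]]

# Only these descriptions would be placed in the agent's prompt:
print(relevant_tools("download the quarterly report and summarize it"))
```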

Multi-agent operation is one of Manus' main features. When a user sends a message through Manus, they communicate only with the executor agent, which itself knows nothing about the internals of the knowledge agent, the planner, or other agents. This keeps the context length under control, and it is also why most prompts obtained through jailbreaking are hallucinations.
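Beyond this description, the architecture is unpublished; the toy sketch below only illustrates the stated separation, in which the user-facing executor receives plan steps through a narrow interface and never sees the planner's own context. All class and method names are hypothetical.

```python
# Minimal sketch of the described separation: the user only talks to
# an executor agent, which receives plan steps but never the planner's
# own context. Names and structure are hypothetical, not Manus code.

from dataclasses import dataclass, field

@dataclass
class PlannerAgent:
    # The planner keeps its own (potentially long) context privately.
    _context: list[str] = field(default_factory=list)

    def next_step(self, goal: str) -> str:
        self._context.append(goal)
        return f"step: gather information for '{goal}'"  # placeholder plan

@dataclass
class ExecutorAgent:
    planner: PlannerAgent
    # Only the executor's own short history ever enters its prompt.
    history: list[str] = field(default_factory=list)

    def handle(self, user_message: str) -> str:
        step = self.planner.next_step(user_message)   # opaque handoff
        self.history.append(user_message)
        self.history.append(step)
        return f"executing -> {step}"

executor = ExecutorAgent(planner=PlannerAgent())
print(executor.handle("analyze Tesla stock"))
```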

Manus does indeed use the open-source browser_use code. Ji Yichao said, "In fact, we use many different open-source technologies, which is why I specifically mentioned in the release video that without the open-source community, Manus would not exist. We will roll out a series of acknowledgments and collaborations."
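For reference, the open-source browser_use project exposes a small agent API. The sketch below follows the usage pattern from its public README; the exact API may differ across versions, and this is not Manus' actual integration code (the task string and model choice here are merely examples).

```python
# Usage sketch following browser_use's documented README pattern; the
# API may vary by version, and this is not Manus' integration code.

import asyncio
from browser_use import Agent
from langchain_anthropic import ChatAnthropic

async def main():
    agent = Agent(
        task="Open example.com and summarize the page",
        llm=ChatAnthropic(model="claude-3-5-sonnet-20241022"),
    )
    result = await agent.run()  # drives a real browser step by step
    print(result)

asyncio.run(main())
```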

The last technical question: does Manus use MCP? MCP (Model Context Protocol) has recently spread widely in Silicon Valley. It is an open-source protocol and standardized interface for model context, proposed at the end of last November by Anthropic, the developer of the Claude models, and designed for extending all kinds of AI applications.

Industry analysts liken the MCP protocol to HTTP in the Internet era: it connects large models, Agents, RAG, tools, and other endpoints for data exchange, and it aims to standardize the interfaces between intelligent agents, marking the beginning of an "Agent Internet" era.
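For readers unfamiliar with MCP, a minimal tool server built with Anthropic's official Python SDK (the `mcp` package) looks roughly like the following; note that, as Ji explains below, Manus itself does not use MCP.

```python
# Minimal MCP tool server using Anthropic's official Python SDK
# ("mcp" package). Shown only to illustrate the protocol; per Ji's
# reply below, Manus does not use MCP.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers and return the sum."""
    return a + b

@mcp.resource("greeting://{name}")
def greeting(name: str) -> str:
    """Expose a parameterized resource a client can read."""
    return f"Hello, {name}!"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```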

In a reply post, Ji Yichao revealed that Manus did not use MCP but instead drew on an open-source research result from a Chinese-led team: "Executable Code Actions Elicit Better LLM Agents." The study proposes using executable code to consolidate an LLM agent's actions into a unified action space (CodeAct).

CodeAct offers three key insights (a minimal sketch of the idea follows the list):

  1. Coding is not the ultimate goal but a general method for solving problems.
  2. Since LLMs are good at coding, it makes sense to let agents perform tasks most closely related to their training distribution.
  3. This approach significantly reduces context length and makes the combination of complex operations possible.
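As promised above, here is a minimal sketch of the CodeAct idea: the model's action at each turn is a Python snippet, which is executed in a persistent namespace, and whose captured output is fed back as the next observation. `query_model` is a placeholder for the LLM call; this simplifies the paper's method rather than reproducing its reference implementation.

```python
# Minimal CodeAct-style loop: the model's "action" is Python code,
# which is executed and whose output becomes the next observation.
# query_model() is a placeholder; this simplifies the paper's method.

import io
import contextlib

def query_model(history: list[dict]) -> str:
    """Placeholder for an LLM call that returns a Python snippet."""
    raise NotImplementedError

def run_code(code: str, env: dict) -> str:
    """Execute a snippet in a shared namespace and capture its output."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, env)           # unified action space: just code
    except Exception as e:
        buf.write(f"Error: {e!r}")
    return buf.getvalue()

def codeact_loop(task: str, max_turns: int = 5) -> None:
    env: dict = {}                    # persists across turns, like a REPL
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        code = query_model(history)   # the action is executable code
        observation = run_code(code, env)
        history.append({"role": "assistant", "content": code})
        history.append({"role": "user", "content": observation})
```

Because the executed state (`env`) persists across turns, complex operations compose naturally, and the prompt only needs to carry code and observations rather than verbose tool schemas, which is the context-length saving the third insight refers to.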

As for why MCP was not used, Ji Yichao said the Manus project started before MCP was launched.

In addition to base models and interface protocols,