AI Intelligent Relativity
Exploring the boundaries and future of artificial intelligence, insights into AI applications, and contemplations on the intelligent era—join us to decode how AI is reshaping the world!
2025-03-06 14:08 USA
"Manus" claims to be a "universal task agent" that can help you get things done with a team of multi-agents. Let's talk about how such intelligent tools work in a way that everyone can understand. We'll also discuss its strengths and weaknesses to see if it can truly become the "jack-of-all-trades" in our lives.
Imagine you tell Manus, "I want to travel to Japan; help me make a travel plan." It first needs to understand what you're really asking.
Keyword Extraction: It breaks down your words to grasp the key points—"Japan," "travel," "plan"—and labels the task as "japan-trip" with the type "travel."
Clarifying the Request: If your request is too vague, such as "help me make a plan," it will chat with you a bit more, or even ask you to upload some materials (like pictures or documents) to ensure it understands your meaning.
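For a concrete feel of this first step, here is a minimal sketch in Python. The call_llm helper, the prompt wording, and the JSON field names are all assumptions for illustration, not Manus's actual internals.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical helper: sends the prompt to whatever chat model the
    agent is configured with and returns the raw text reply."""
    raise NotImplementedError("wire this up to your model provider")

def understand_request(user_message: str) -> dict:
    """Ask the model to extract keywords, a short task id, and a task type.
    If the request is too vague, the model replies with a clarifying question."""
    prompt = (
        "Extract from the user request: a list of keywords, a short slug-style "
        "task id, and a task type (e.g. travel, coding, research). If the request "
        "is too vague to act on, put a clarifying question in the 'clarify' field.\n"
        "Answer as JSON with keys: keywords, task_id, task_type, clarify.\n\n"
        f"User request: {user_message}"
    )
    # Assumes the model returns clean JSON; a real system would validate it.
    return json.loads(call_llm(prompt))

# Example: "I want to travel to Japan; help me make a travel plan."
# might come back as:
# {"keywords": ["Japan", "travel", "plan"], "task_id": "japan-trip",
#  "task_type": "travel", "clarify": null}
```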
Once it understands the demand, Manus needs to set up a "workspace."
Creating a Workspace: It builds a folder based on the task keywords and uses Docker (a tool for isolation) to create a small space to ensure that the tasks don't interfere with each other.
Organizing Outputs: Anything generated during the task (such as files or notes) will be stored in this folder, and cleaned up afterward.
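A rough sketch of the workspace idea, again with assumed details (the folder layout, base image, and mount options are illustrative; Manus's real sandboxing setup is not public):

```python
import subprocess
from pathlib import Path

def create_workspace(task_id: str) -> Path:
    """One folder per task; everything the agents produce lands here."""
    workspace = Path("workspaces") / task_id   # e.g. workspaces/japan-trip
    workspace.mkdir(parents=True, exist_ok=True)
    return workspace

def run_in_sandbox(workspace: Path, command: list[str]) -> str:
    """Run a command inside a disposable Docker container, with only the
    task's own folder mounted, so tasks stay isolated from each other."""
    result = subprocess.run(
        ["docker", "run", "--rm",
         "-v", f"{workspace.resolve()}:/workspace",  # expose only this task's files
         "-w", "/workspace",
         "python:3.11-slim",                         # assumed base image
         *command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```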
Next, it needs to plan how to proceed.
Breaking Down Steps: Using your request plus background information, it asks a smart model to help break down the task into smaller parts. For example, "search for Japanese attractions," "check flight prices," and "create an itinerary."
Recording the List: These steps are written into a file called todo.md, like a to-do list.
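Continuing the sketch, the planning step could end with something as simple as writing the model's steps into todo.md as a checklist. The function name and checklist format are assumptions; only the todo.md idea comes from Manus's own description.

```python
from pathlib import Path

def write_plan(workspace: Path, steps: list[str]) -> None:
    """Write the model-generated steps into todo.md as an unchecked checklist."""
    lines = [f"- [ ] {step}" for step in steps]
    (workspace / "todo.md").write_text("\n".join(lines) + "\n")

# write_plan(workspace, ["Search for Japanese attractions",
#                        "Check flight prices",
#                        "Create a day-by-day itinerary"])
# todo.md then contains:
# - [ ] Search for Japanese attractions
# - [ ] Check flight prices
# - [ ] Create a day-by-day itinerary
```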
With the list ready, Manus and its team of agents get to work.
Checking the List: It goes through todo.md one by one, marking unfinished tasks with [ ] and completed tasks with [x].
Assigning Tasks: Each to-do item is handed, along with its context (for example, "search for Japanese attractions" is a travel-related query), to one of the built-in agents: a search agent good at finding information, a code agent for writing code, and a data-analysis agent for analyzing data.
Storing Outputs: After the agents complete their tasks, the results (such as web content or code) are saved to the folder.
Updating the List: The main program updates todo.md and moves on to the next task.
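Put together, the execution phase is essentially a loop over todo.md. The sketch below shows one plausible shape of that loop; the agent registry, routing rule, and file names are placeholders, not the product's real code.

```python
from pathlib import Path

# Hypothetical agent registry: in the real product these would be the built-in
# search / code / data-analysis agents; here they are placeholder callables.
AGENTS = {
    "search": lambda task, ctx: f"(web results for: {task})",
    "code":   lambda task, ctx: f"(code written for: {task})",
    "data":   lambda task, ctx: f"(analysis of: {task})",
}

def pick_agent(task: str) -> str:
    """Very naive routing; the real system presumably lets a model decide."""
    lowered = task.lower()
    if "search" in lowered:
        return "search"
    if "analyz" in lowered or "data" in lowered:
        return "data"
    return "code"

def run_todo_list(workspace: Path, context: dict) -> None:
    todo = workspace / "todo.md"
    lines = todo.read_text().splitlines()
    for i, line in enumerate(lines):
        if not line.startswith("- [ ]"):
            continue                                   # skip finished or non-task lines
        task = line[len("- [ ] "):]
        agent = pick_agent(task)
        output = AGENTS[agent](task, context)          # hand the item to an agent
        (workspace / f"step_{i:02d}_{agent}.txt").write_text(output)  # store the output
        lines[i] = line.replace("- [ ]", "- [x]", 1)   # mark it done
        todo.write_text("\n".join(lines) + "\n")       # update the list each step
```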
Take the "search agent" as an example. When searching for a "Japanese travel plan," it:
This agent acts like a "web detective," using a headless browser and smart models to mimic human web browsing. In comparison, the "code agent" and "data-analysis agent" are simpler: they write code, run it, and save the results.
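To make the "web detective" idea concrete, here is one way such a step could be written, using Playwright as a stand-in for whatever headless browser Manus actually drives. The search engine, CSS selector, and truncation limit are illustrative assumptions.

```python
from urllib.parse import quote_plus
from playwright.sync_api import sync_playwright

def search_and_read(query: str) -> str:
    """Open a headless browser, run a search, grab the visible text of the
    first result page, and return it for a model to summarize."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(f"https://www.bing.com/search?q={quote_plus(query)}")
        page.wait_for_load_state("networkidle")
        first_link = page.locator("h2 a").first   # first organic result (selector is an assumption)
        first_link.click()
        page.wait_for_load_state("domcontentloaded")
        text = page.inner_text("body")
        browser.close()
    return text[:5000]   # trim before handing the text to the model

# A summarizing model would then turn search_and_read("Japan travel plan")
# into a short list of attractions, prices, and tips.
```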
Once the tasks are done, Manus tidies up.
Summarizing the Output: It organizes the results based on your initial request.
Handing Over to You: The output could be documents, code, images, etc., for you to view or download.
Asking for Feedback: It checks whether you are satisfied and asks for your feedback.
How does it search for a Japanese travel plan? As the "search agent" example above shows, the whole process is like hiring a "secretary who can search the web for information."
How can it be improved?
This multi-agent model sounds cool but has room for improvement:
Task Relationships: The current to-do list is linear. In the future, a DAG (directed acyclic graph) could capture more complex dependencies, such as "checking flights" waiting until "deciding dates" is done (see the sketch after this list).
Automated Testing: Adding a "testing agent" to check the quality of results and retry if necessary.
Human-AI Collaboration: It could still run fully automatically, but pause for a few seconds after each step to wait for your confirmation, and continue if there is no feedback.
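For the DAG idea, Python's standard-library graphlib already captures the essentials: each task declares what it depends on, and the scheduler hands out whichever tasks are ready. The task names below reuse the article's travel example; everything else is a sketch.

```python
from graphlib import TopologicalSorter

# Each task lists the tasks it depends on; "check flight prices" cannot
# start until "decide travel dates" is finished.
dependencies = {
    "decide travel dates": set(),
    "search for Japanese attractions": set(),
    "check flight prices": {"decide travel dates"},
    "create an itinerary": {"search for Japanese attractions",
                            "check flight prices"},
}

ts = TopologicalSorter(dependencies)
ts.prepare()
while ts.is_active():
    for task in ts.get_ready():   # tasks whose dependencies are all done
        print("running:", task)   # these could be dispatched to agents in parallel
        ts.done(task)
```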
Overall, Manus has put a lot of effort into engineering and is more user-friendly than many similar products. The interaction experience is good, but there is no particularly high technical barrier: it leans heavily on the underlying models. The problem is that running these models is costly (token consumption is high), and who will bear that cost is an open question. Moreover, whether its results are accurate, and whether users end up satisfied, still needs more real-world examples to prove.
In Conclusion
Manus is like a "task butler" with a team of helpers getting things done for you. From understanding your needs to planning and executing tasks, and finally delivering results, the process is clear. However, whether it will become popular depends on who covers the cost and how appealing the use cases are. Do you think this kind of intelligent assistant will become a standard in the future? Feel free to leave a comment and discuss!