Taking Manus as an example, this article dissects the workflow of a general-purpose intelligent agent based on a multi-agent system:

I. Intent Recognition

  1. User Input Processing: The system captures user input and performs intent recognition and keyword extraction. For example, if the user inputs "I want to travel to Japan and need a travel plan," the system extracts keywords like "Japan-trip" and identifies the task type as "travel".
  2. Guiding User Interaction: If the user's input is too vague or the intent cannot be recognized, the system prompts the user for more information, such as continuing the conversation or uploading relevant documents or images.
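The intent-recognition step above can be sketched as a simple rule-based classifier. This is a minimal illustration, not Manus's actual implementation (which presumably uses an LLM); the `INTENT_RULES` table and `recognize_intent` function are hypothetical:

```python
import re

# Hypothetical trigger words per task type; a production system would
# likely use an LLM classifier instead of a keyword table.
INTENT_RULES = {
    "travel": ["travel", "trip", "itinerary", "flight", "hotel"],
    "coding": ["code", "script", "debug", "implement"],
}

def recognize_intent(user_input: str):
    """Return (task_type, keywords); task_type is None when nothing matches,
    signaling the system to prompt the user for more information."""
    words = re.findall(r"[A-Za-z]+", user_input.lower())
    for task_type, triggers in INTENT_RULES.items():
        hits = [w for w in words if w in triggers]
        if hits:
            # Keep capitalized tokens (e.g. destinations) as extra keywords.
            proper = re.findall(r"[A-Z][a-z]+", user_input)
            return task_type, sorted(set(hits + [p.lower() for p in proper]))
    return None, []

task_type, keywords = recognize_intent("I want to travel to Japan and need a travel plan")
```

Here `recognize_intent` returns `("travel", ["japan", "travel"])`, giving both the task type and the keywords used to name the task folder in the next stage.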

II. Task Initialization

  1. Task Folder Creation: The system creates a task folder using the extracted keywords and launches a Docker container to isolate the environment for task execution.
  2. Content Management: All content generated during task execution is written to the task folder. Once the task is complete, the Docker container is cleaned up.
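The initialization step can be sketched as follows. The container image, mount layout, and naming scheme are illustrative assumptions, not Manus's actual configuration:

```python
import shlex
import tempfile
from pathlib import Path

def init_task(keywords, workspace, image="python:3.11-slim"):
    """Create a task folder named from the extracted keywords and build the
    docker command that launches an isolated container mounting that folder."""
    task_name = "-".join(keywords) or "untitled-task"
    task_dir = Path(workspace) / task_name
    task_dir.mkdir(parents=True, exist_ok=True)
    # --rm removes the container on exit, matching the cleanup step;
    # the bind mount lets every agent write its results into the task folder.
    cmd = (
        f"docker run -d --rm --name {task_name} "
        f"-v {task_dir}:/workspace -w /workspace {image} sleep infinity"
    )
    return task_dir, shlex.split(cmd)

task_dir, cmd = init_task(["japan", "trip"], workspace=tempfile.mkdtemp())
```

The returned command list can then be passed to `subprocess.run(cmd)` to actually start the container.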

III. Step Planning

  1. Task Decomposition: Using the intent recognition results and additional context, the system prompts an inference model to break the task down into ordered steps.
  2. Todo List Creation: The decomposed task steps are written into a todo.md file within the task folder.
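Writing the decomposed steps as a markdown checklist might look like the sketch below; the step texts are made-up placeholders for what an inference model could return for the Japan-trip example:

```python
import tempfile
from pathlib import Path

def write_todo(task_dir, steps):
    """Write decomposed steps as an unchecked markdown checklist (todo.md)."""
    todo = Path(task_dir) / "todo.md"
    todo.write_text("".join(f"- [ ] {s}\n" for s in steps), encoding="utf-8")
    return todo

# Hypothetical decomposition for the Japan-trip task:
todo_path = write_todo(tempfile.mkdtemp(), [
    "Search for Japan travel information",
    "Draft a day-by-day itinerary",
    "Compile the final travel plan",
])
```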

IV. Task Execution

  1. Task Iteration: The system iterates through the todo.md file, where [ ] indicates pending tasks and [x] indicates completed tasks.
  2. Function Call and Agent Dispatch: For each pending task, the system makes a function call with context information, invoking specialized agents such as a search agent, code agent, or data-analysis agent.
  3. Content Generation and Update: The selected agent executes the task, and the generated content is written back to the task folder within the Docker container. The main thread updates the todo.md file and proceeds to the next task.
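The execution loop over todo.md can be sketched as below. The `dispatch` callback is a stand-in for the real function-calling layer that routes each task to a search, code, or data-analysis agent:

```python
import re

def run_pending_tasks(todo_text: str, dispatch):
    """Iterate over todo.md lines: send each '[ ]' task to an agent via
    `dispatch` and mark it '[x]' once it completes."""
    updated = []
    for line in todo_text.splitlines():
        m = re.match(r"- \[ \] (.+)", line)
        if m:
            dispatch(m.group(1))              # route to the appropriate agent
            line = line.replace("[ ]", "[x]", 1)
        updated.append(line)
    return "\n".join(updated)

todo = "- [x] Search for Japan travel information\n- [ ] Draft an itinerary"
updated = run_pending_tasks(todo, dispatch=lambda task: None)
```

After the loop, every task is marked `[x]` and the updated text is written back to todo.md in the task folder.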

V. Summary and Output

  1. Task Completion: Once all tasks in todo.md are executed, the main thread consolidates the results based on the user's initial request.
  2. User Output: The generated content (e.g., documents, code, images, links) is made available for the user to view or download.
  3. User Feedback Collection: The system collects user feedback on task satisfaction.
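Consolidating the output for the user can be as simple as scanning the task folder and grouping artifacts by type. The extension-to-category mapping here is an illustrative assumption:

```python
import tempfile
from pathlib import Path

def collect_outputs(task_dir):
    """Gather everything the agents wrote to the task folder, grouped by
    type, so it can be presented for viewing or download."""
    groups = {".md": "documents", ".txt": "documents", ".py": "code", ".png": "images"}
    manifest = {}
    for f in sorted(Path(task_dir).rglob("*")):
        if f.is_file():
            manifest.setdefault(groups.get(f.suffix, "other"), []).append(f.name)
    return manifest

# Simulate a finished task folder with two generated artifacts:
work = Path(tempfile.mkdtemp())
(work / "itinerary.md").write_text("# Japan trip\n", encoding="utf-8")
(work / "budget.py").write_text("print('budget')\n", encoding="utf-8")
manifest = collect_outputs(work)
```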

Example Execution for a "Japan Trip" Task

Taking the "Japan trip" task as an example, the search agent's main steps are as follows:

  1. Keyword-Based Search: Using keywords like "Japan-trip," the agent calls a third-party API (e.g., Google) to retrieve 10-20 search results.
  2. Web Browsing Simulation: The agent simulates browser behavior by opening the first search result, extracting text and visual information from the webpage.
  3. Multimodal Information Extraction: The agent uses a multimodal model to extract relevant information from the webpage, determining if it meets the task requirements.
  4. Iterative Web Interaction: The agent repeats the process of clicking and scrolling to gather more content until the task requirements are satisfied.
  5. Content Saving: The collected information is saved to the task folder.
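The search agent's loop can be sketched with the external pieces stubbed out. `search_api`, `extract_relevant`, and `is_sufficient` are hypothetical stand-ins for the third-party search call, the multimodal extractor, and the model's sufficiency judgment:

```python
def search_agent(keywords, search_api, extract_relevant, is_sufficient, max_pages=20):
    """Fetch search results for the keywords, then read pages one by one,
    stopping as soon as the gathered notes satisfy the task requirements."""
    results = search_api(" ".join(keywords))[:max_pages]
    notes = []
    for url in results:
        notes.append(extract_relevant(url))   # simulate browsing + extraction
        if is_sufficient(notes):              # model judges whether to stop
            break
    return notes

# Fake collaborators for illustration: stop after three pages of notes.
notes = search_agent(
    ["japan", "trip"],
    search_api=lambda q: [f"https://example.com/{i}" for i in range(10)],
    extract_relevant=lambda url: f"notes from {url}",
    is_sufficient=lambda notes: len(notes) >= 3,
)
```

In the real system, step 4's clicking and scrolling would happen inside `extract_relevant`; here it is collapsed into a single call per page.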

Improvements for Multi-Agent Systems

  1. Complex Task Dependencies: The linear ordering in todo.md can be replaced with a directed acyclic graph (DAG) to support more complex dependencies between tasks.
  2. Automated Testing Agent: An automated testing agent can be introduced to evaluate each step's result and correct errors when a step fails.
  3. Hybrid Automation and User Feedback: A hybrid mode can let users provide feedback after each step executes, with the system continuing automatically if no feedback arrives within a few seconds.
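The first improvement can be illustrated with a standard topological sort (Kahn's algorithm) over a task DAG. The example tasks are hypothetical; the point is that independent branches ("plan itinerary" and "book hotel") no longer force a strict linear order:

```python
from collections import deque

def execution_order(deps):
    """Topologically order tasks whose dependencies form a DAG.
    `deps` maps each task to the list of tasks it depends on."""
    indegree = {t: len(d) for t, d in deps.items()}
    dependents = {t: [] for t in deps}
    for task, parents in deps.items():
        for parent in parents:
            dependents[parent].append(task)
    ready = deque(sorted(t for t, n in indegree.items() if n == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in dependents[task]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(deps):
        raise ValueError("cycle detected: dependencies are not a DAG")
    return order

# Both middle tasks depend only on the search step, not on each other.
order = execution_order({
    "search": [],
    "plan itinerary": ["search"],
    "book hotel": ["search"],
    "summarize": ["plan itinerary", "book hotel"],
})
```

Any order that respects the edges is valid, so the two middle tasks could also be dispatched to agents in parallel.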

Overall Evaluation

Manus represents a significant engineering effort and offers better user interaction than comparable products. However, it still relies heavily on underlying models, which makes token consumption costly, and its ultimate task accuracy and user satisfaction remain to be validated through more case studies.