Agent on KnightLi Blog

AI Terms Explained: Agent, MCP, RAG, and Token in Plain Language

Thu, 23 Apr 2026 13:13:40 +0800

When people first get into AI, what pushes them away is often not the models themselves, but the long list of terms that keeps showing up in every discussion. Agent, MCP, RAG, AIGC, and Token all look familiar, but without a simple explanation, many people only recognize the words without really understanding them.

This article follows a common beginner-friendly line of explanation and condenses 10 high-frequency AI terms into a set of meanings that is easier to remember. The goal is not to sound academic. It is to help you build a basic mental model that lets you follow everyday AI conversations.

10 common AI terms and what they mean

1. Agent: an AI that does more than chat

An Agent can be understood as an AI assistant that actually gets work done.

A normal chatbot usually works in a simple question-and-answer pattern. An Agent goes a step further. It can break a task into steps, arrange a process, call tools, and return a finished result. If you ask it to organize materials, look something up, or generate a document, it may do more than give advice. It may actually chain those actions together and complete them.

That is why the key point of an Agent is not whether it can talk, but whether it can act.
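The "break a task into steps, call tools, return a finished result" pattern can be shown with a toy sketch. The tool names and the fixed plan below are hypothetical; in a real Agent, a model would usually choose the steps itself.

```python
from typing import Callable

# Hypothetical tools: each takes the previous step's output and returns new text.
def look_up(topic: str) -> str:
    return f"notes about {topic}"

def write_document(notes: str) -> str:
    return f"Report based on {notes}"

TOOLS: dict[str, Callable[[str], str]] = {
    "look_up": look_up,
    "write_document": write_document,
}

def run_agent(task: str, plan: list[str]) -> str:
    """Execute each planned step in order, feeding each result into the next."""
    result = task
    for step in plan:
        result = TOOLS[step](result)  # call the tool named by this step
    return result

report = run_agent("solar panels", ["look_up", "write_document"])
print(report)  # Report based on notes about solar panels
```

The point of the sketch is the loop: the output is produced by chaining actions, not by a single reply.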

2. OpenClaw: an AI assistant that stays on your computer

Here, OpenClaw is described as a kind of AI assistant that lives on your computer.

You can think of this type of tool as a more desktop-oriented AI helper. It does not only receive text. It may also observe the interface, call local tools, and execute tasks step by step. Compared with a normal web chat interface, this kind of tool emphasizes operational ability much more.

If Agent is the abstract idea of an execution-oriented AI, this kind of desktop assistant is a more concrete personal-computer version of that idea.

3. Skills: capability packs added to an Agent

Skills can be understood as functional modules or operating instructions for an Agent.

The same Agent can behave very differently depending on which Skills it has. Some may focus on copywriting, some on data organization, and some on code-related work. They are a bit like apps on a phone, and a bit like reusable workflows.

So in many cases, it is not that the model suddenly became smarter. It is that a clearer set of rules, tools, and steps was added behind it.

4. MCP: a unified way for AI to connect to tools

MCP stands for Model Context Protocol.

In everyday terms, it is a bit like a USB-C port for the AI world. In the past, connecting a model to different tools often meant building a separate integration for each one. With a unified protocol, those connections become more standardized and easier to reuse.

For most users, the most important thing to remember is this: MCP is not about whether a model can answer a question. It is about how a model can connect to external tools and resources in a safe and stable way.
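To make "a unified way to connect" a little more concrete, here is a rough sketch of what a tool-call message can look like, assuming MCP's JSON-RPC 2.0 message format and its tools/call method. The tool name search_docs and its arguments are made up for illustration.

```python
import json

def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request asking a server to run one tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)

# "search_docs" is a hypothetical tool name, not part of any real server.
request = build_tool_call(1, "search_docs", {"query": "refund policy"})
print(request)
```

Because every tool is called through the same message shape, a client does not need a custom integration for each tool.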

5. Gacha: AI output is inherently random

The term “gacha” often appears in AI image generation, video generation, and creative work.

The idea is simple. Even with the same prompt and the same general direction, the result can still be different each time. Sometimes the output is great. Sometimes it falls apart. That is why people compare repeated generation attempts to pulling gacha in a game.

What this really reminds us is that AI generation is not a fixed formula. It is a probabilistic process with variation.
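A small sketch of that idea, using weighted random choice as a stand-in for a real generator. The outcome labels and weights are invented for illustration only.

```python
import random

def sample(options: list[str], weights: list[float], seed: int) -> str:
    """Pick one option at random, in proportion to its weight."""
    rng = random.Random(seed)
    return rng.choices(options, weights=weights, k=1)[0]

options = ["great result", "okay result", "broken result"]
weights = [0.2, 0.5, 0.3]

# Same options and weights every time; only the random seed changes per pull.
pulls = [sample(options, weights, seed) for seed in range(10)]
print(pulls)
```

Fixing the seed makes a pull repeatable, which is why some generation tools expose a seed setting.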

6. API: the connection between an app and a model

API stands for Application Programming Interface.

You can think of it as the standard entry point through which programs communicate. When you call a model service from your own app, script, or editor, you are essentially using an API to send a request and receive a result.

If you compare a model service to a restaurant, then:

the menu is like the API documentation
placing an order is like making an API request
the kitchen sending back the dish is like the model returning a result

That is why many tools may look different on the surface while still calling some form of API underneath.
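In code, "placing an order" might look roughly like the sketch below. The endpoint URL and the prompt field are hypothetical; each provider documents its own request format. The request is only prepared here, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint and field names; check your provider's API docs.
API_URL = "https://api.example.com/v1/generate"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the prompt as JSON."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")

req = build_request("Explain RAG in one sentence.", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```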

7. Multimodality: AI handles more than text

Multimodality means AI no longer only reads and writes text. It can process multiple kinds of input and output.

For example, it may be able to read images, understand voice, interpret video, generate pictures, or even support real-time voice and video interaction. Compared with early text-only models, multimodal models are much closer to having the combined abilities to see, hear, speak, and write.

That is also why many AI products are no longer centered around a single text box.

8. RAG: retrieve information first, then generate an answer

RAG stands for Retrieval-Augmented Generation.

It is useful for solving a practical problem: a model’s training data has a time boundary, and it does not automatically know your company’s newest documents, customer-service records, or business rules. The idea behind RAG is to retrieve relevant material from specified sources first, and then generate an answer based on that material.

Its value usually shows up in three ways:

answers are more likely to stay close to real source material
you can trace where the answer came from
new documents can be added and reflected quickly

That is why many enterprise knowledge bases, AI customer-service systems, and internal Q&A tools rely on RAG.
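The "retrieve first, then generate" flow can be sketched in a few lines. This toy version ranks documents by simple word overlap, whereas production systems typically use embedding-based search; the documents and the prompt wording are made up for illustration.

```python
def score(question: str, document: str) -> int:
    """Count how many distinct question words also appear in the document."""
    q_words = set(question.lower().split())
    d_words = set(document.lower().split())
    return len(q_words & d_words)

def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents with the highest word overlap."""
    ranked = sorted(documents, key=lambda d: score(question, d), reverse=True)
    return ranked[:top_k]

def build_prompt(question: str, documents: list[str]) -> str:
    """Put the retrieved text in front of the question for the model to use."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this material:\n{context}\n\nQuestion: {question}"

docs = [
    "Refunds are accepted within 30 days of purchase.",
    "Shipping takes 3 to 5 business days.",
]
prompt = build_prompt("How many days do refunds take?", docs)
print(prompt)
```

The model never needs to have seen the refund policy during training. It only needs to read the retrieved text that arrives inside the prompt.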

9. AIGC: the general term for AI-generated content

AIGC stands for AI-Generated Content.

It is not a single tool. It is a broad label for content produced by AI, including text, images, audio, video, and more. AI writing, AI illustration, AI short-form video generation, and AI voice synthesis all fit under the umbrella of AIGC.

What matters most about this term is that it describes a way of producing content, not one specific model.

10. Token: the unit used to measure model processing

A Token can be understood as the basic unit a model uses to process text.

It is not exactly the same as one character or one word, but in practice you can treat it as the common unit for model computation and billing. Your input consumes Tokens, the model’s output consumes Tokens, and the context carried along in the conversation also takes up Tokens.

That is why model services keep talking about context length, cost control, and prompt compression. At the core, all of those topics are tied to Tokens.
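A back-of-the-envelope sketch of why this matters for billing. The four-characters-per-token ratio is only a common rule of thumb for English text, and the prices are placeholders; real tokenizers and real rates differ by model and provider.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough estimate; real tokenizers vary by model and language."""
    return round(len(text) / chars_per_token)

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Cost when prices are quoted per one million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

prompt = "Summarize this meeting transcript in three bullet points."
tokens_in = estimate_tokens(prompt)
# Placeholder prices, not any provider's real rates.
cost = estimate_cost(tokens_in, 200, input_price=1.0, output_price=2.0)
print(tokens_in, cost)
```

Input and output are counted separately because many services price them differently.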

Claude Code Multi-Agent Collaboration: How to Choose Between Subagents and Agent Teams

Wed, 22 Apr 2026 21:35:52 +0800

When people talk about multi-agent collaboration in Claude Code, the two concepts that are easiest to mix up are Subagents and Agent Teams. They both sound like “spin up several agents to work together,” but they are meant for different kinds of work. In short, Subagents are better for splitting off independent tasks, while Agent Teams are better when several agents need to collaborate on the same problem and cross-check each other over time.

If you have used Skills before, this framing also helps:

A Skill defines the workflow and rules
A Subagent or Agent teammate does the actual execution

So the real question is not which one is “more advanced,” but what kind of collaboration problem you are solving.

Subagents: split off side tasks

Subagents are closer to temporary worker copies launched from the current session. Each one gets its own context window, and when it finishes, it returns only a summary of the result. The main conversation stays cleaner because it does not have to absorb all the intermediate logs and output.

That gives Subagents a few very practical strengths:

The main thread stays clean instead of being flooded by test logs, search results, or long output
Independent research or execution tasks can run in parallel
They work well for tasks where “just bring me the result” is enough
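The isolation idea can be illustrated with ordinary Python threads. This is only an analogy, not how Claude Code implements Subagents: each worker produces a lot of intermediate output locally, and only a one-line summary reaches the caller.

```python
from concurrent.futures import ThreadPoolExecutor

def run_worker(task: str) -> str:
    """Simulate a worker that produces lots of output but returns a summary."""
    log_lines = [f"{task}: step {i}" for i in range(1000)]  # stays local
    return f"{task}: done after {len(log_lines)} steps"

tasks = ["authentication", "database", "api"]

# Run the independent tasks in parallel; map() keeps results in task order.
with ThreadPoolExecutor() as pool:
    summaries = list(pool.map(run_worker, tasks))

main_context = summaries  # only three short lines reach the "main thread"
print(main_context)
```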

The original article notes that Claude Code comes with three built-in kinds of Subagents:

Explore: read-only, useful for quickly searching a codebase
Plan: read-only, useful for gathering information in the background during plan mode
General-purpose: can read and write, suitable for tasks that mix exploration and editing

Custom Subagents

If the built-in options are not enough, you can define your own Subagent. The mechanism is simple: write a Markdown file in one of these locations:

.claude/agents/: only active in the current project
~/.claude/agents/: active across all your projects

The file format looks like this:

---
name: code-reviewer
description: Expert code review specialist. Proactively reviews code for quality, security, and maintainability. Use immediately after writing or modifying code.
tools: Read, Grep, Glob, Bash
model: inherit
---
You are a senior code reviewer ensuring high standards of code quality and security.

When invoked:

1. Run git diff to see recent changes
2. Focus on modified files
3. Begin review immediately

Review checklist:

- Code is clear and readable
- Functions and variables are well-named
- No duplicated code
- Proper error handling
- No exposed secrets or API keys
- Input validation implemented
- Good test coverage
- Performance considerations addressed

Provide feedback organized by priority:

- Critical issues (must fix)
- Warnings (should fix)
- Suggestions (consider improving)

Include specific examples of how to fix issues.

The key field here is description. Claude uses it to decide when this Subagent should be called, so the more precise the description is, the more reliable the trigger tends to be.

A few other common configuration fields are also worth knowing:

tools: limits which tools the Subagent can use
model: chooses between sonnet, opus, haiku, or inherit
permissionMode: controls edit permissions and permission prompt behavior
memory: gives the Subagent a cross-conversation memory directory
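If you keep several of these files around, a quick sanity check can catch a missing field before you rely on the Subagent. The sketch below is a hypothetical helper, not part of Claude Code. It handles only flat key: value headers (a full YAML parser would cover more cases) and assumes name and description are the fields you want to require.

```python
def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a '---' delimited header of 'key: value' lines from the body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file must start with ---")
    end = lines.index("---", 1)  # raises ValueError if the header never closes
    fields: dict[str, str] = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")  # split at the first colon only
        fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body

sample = """---
name: code-reviewer
description: Reviews code after changes.
tools: Read, Grep, Glob, Bash
model: inherit
---
You are a senior code reviewer.
"""

fields, body = parse_frontmatter(sample)
# Assumption: treat these two fields as required for this check.
missing = [key for key in ("name", "description") if not fields.get(key)]
print(fields["name"], missing)
```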

If you only need a Subagent temporarily, you can also define it through the CLI:

claude --agents '{
  "code-reviewer": {
    "description": "Expert code reviewer. Use proactively after code changes.",
    "prompt": "You are a senior code reviewer. Focus on code quality, security, and best practices.",
    "tools": ["Read", "Grep", "Glob", "Bash"],
    "model": "sonnet"
  }
}'

When Subagents fit best

Subagents are usually the best fit for tasks like these:

Running tests and returning only the failure summary instead of flooding the main thread with thousands of log lines
Investigating several unrelated modules in parallel
Splitting “find the issue” and “fix the issue” into a simple pipeline

For example:

Research the authentication, database, and API modules in parallel using separate subagents

Use the code-reviewer subagent to find performance issues, then use the optimizer subagent to fix them

But if a task needs constant back-and-forth adjustments, shares a lot of context across stages, or concentrates changes in only one or two files, handling it directly in the main conversation is often simpler than spinning up a Subagent.

Agent Teams: multiple independent sessions working together

Agent Teams operate at a different level. Instead of launching worker copies inside one session, they start multiple fully independent Claude Code instances that collaborate around a shared task list and can also message one another directly.

That makes an Agent Team feel more like a real small team than a simple side-task worker setup.

The article notes that this is currently an experimental feature and needs to be enabled first:

{
    "env": {
        "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
    }
}

Once this is added to settings.json, you can ask Claude to organize a team around a specific goal. For example:

I'm designing a CLI tool that helps developers track TODO comments across
their codebase. Create an agent team to explore this from different angles: one
teammate on UX, one on technical architecture, one playing devil's advocate.

What an Agent Team consists of

An Agent Team mainly includes three parts:

Team lead: the main session you are using, responsible for organizing, assigning, and summarizing
Teammates: multiple independent Claude Code instances
Task list and Mailbox: the shared task list and communication channel

The biggest difference from Subagents is that teammates can communicate directly with one another instead of routing everything through the lead. Tasks usually move through states such as pending, in progress, and completed, and once a teammate finishes one task, it can pick up the next one.
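The shared task list can be pictured as a small board that teammates claim work from. The sketch below is a simplified model with invented class and task names. It is not Claude Code's actual implementation, and it leaves out the mailbox entirely.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Task:
    title: str
    status: str = "pending"  # pending -> in_progress -> completed
    owner: Optional[str] = None

@dataclass
class TaskList:
    tasks: list[Task] = field(default_factory=list)

    def claim_next(self, teammate: str) -> Optional[Task]:
        """Give the first pending task to the teammate, if any remain."""
        for task in self.tasks:
            if task.status == "pending":
                task.status, task.owner = "in_progress", teammate
                return task
        return None

    def complete(self, task: Task) -> None:
        task.status = "completed"

board = TaskList([Task("review security"), Task("review performance")])
first = board.claim_next("teammate-a")
board.complete(first)
second = board.claim_next("teammate-a")  # picks up the next task after finishing
print([(t.title, t.status, t.owner) for t in board.tasks])
```

Teammates pull work from the list themselves, so the lead does not have to hand out every task one by one.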

When Agent Teams fit best

When a task needs several perspectives, active discussion, conflicting hypotheses, or parallel work across modules, Agent Teams are a better fit.

The article gives several representative examples:

Several reviewers inspect the same PR in parallel, each focusing on a different dimension
Multiple agents investigate the same bug with competing explanations and challenge each other’s conclusions
Frontend, backend, and testing move forward in parallel on different parts of the project

For example, parallel code review:

Create an agent team to review PR #142. Spawn three reviewers:
- One focused on security implications
- One checking performance impact
- One validating test coverage
Have them each review and report findings.

And for debate-style debugging:

Users report the app exits after one message instead of staying connected.
Spawn 5 agent teammates to investigate different hypotheses. Have them talk to
each other to try to disprove each other's theories, like a scientific
debate. Update the findings doc with whatever consensus emerges.

The common pattern here is that you do not just want one answer. You want several agents to exchange judgments, challenge assumptions, and gradually converge on a stronger conclusion.

How to choose between them

If you want a quick rule of thumb, use this:

If you just need the result, use Subagents
If the work requires discussion and cross-validation, use Agent Teams

Expanded a bit further, the main differences are:

Communication style: Subagents mainly report results back to the main session, while Agent Team members can talk directly to one another
Coordination model: Subagents depend more on the main conversation to orchestrate them, while Agent Teams work from a shared task list that members can claim themselves
Token cost: Subagents are cheaper, while Agent Teams cost more because each teammate is an independent instance
Best-fit tasks: Subagents are better for independent, result-oriented work, while Agent Teams are better for discussion-heavy and cross-check-heavy work

Practical cautions

Agent Teams are more powerful, but that does not mean every task deserves a full team. The article specifically calls out a few practical concerns:

token usage is noticeably higher
if multiple teammates edit the same file at once, overwrite conflicts become very likely
adding too many teammates increases coordination cost without guaranteeing better results

A safer default is usually:

start with 3 to 5 teammates
split tasks by module or file to avoid edit conflicts
if the lead starts doing teammate work too early, explicitly tell it to wait for the others first
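Before handing out work, it can help to check that no file is assigned to more than one teammate. The helper and the file paths below are hypothetical, written only to illustrate the "split by module or file" advice.

```python
from collections import defaultdict

def find_conflicts(assignments: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return files that more than one teammate is assigned to edit."""
    owners: defaultdict[str, list[str]] = defaultdict(list)
    for teammate, files in assignments.items():
        for path in files:
            owners[path].append(teammate)
    return {path: who for path, who in owners.items() if len(who) > 1}

# Made-up plan: two teammates both want to touch shared/types.ts.
assignments = {
    "frontend": ["ui/app.tsx", "shared/types.ts"],
    "backend": ["api/server.py", "shared/types.ts"],
    "testing": ["tests/test_api.py"],
}
conflicts = find_conflicts(assignments)
print(conflicts)  # {'shared/types.ts': ['frontend', 'backend']}
```

Any file that shows up in the result is a candidate for giving to a single owner or for sequencing the edits.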

The current experimental version also has a few limitations, such as:

no support for /resume and /rewind for in-process teammates
task status can lag and sometimes needs manual correction
one lead can manage only one team at a time
teammates cannot spawn child teams of their own

Short conclusion

These two features are not substitutes for one another. They solve two different collaboration problems.

If your goal is “parallelize side tasks and keep the main context clean,” start with Subagents. If your goal is “let several agents work like a small team, discuss, and cross-check each other,” then Agent Teams are the better tool.

Trying both in a real task usually makes the distinction obvious very quickly: one is optimized for context isolation and result collection, and the other is optimized for multi-perspective collaboration and ongoing interaction.

Original article: https://cloud.tencent.com/developer/article/2652960