Analyzing Anthropic's docx Agent Skill: Features, Code Structure, Usage, and Caveats

Based on SKILL.md and the supporting scripts under Anthropic's skills/docx, this post breaks down the docx skill's capability boundaries, code structure, practical workflow, and common pitfalls.

Anthropic’s skills/docx is essentially a workflow spec plus a script toolkit for handling Word documents more reliably with AI.
It does not just tell a model to “generate a .docx.” Instead, it breaks document work into explicit paths: create, read, edit existing files, handle tracked changes, add comments, convert formats, and validate OOXML structure.

If we reduce it to one line:

It treats .docx as ZIP + XML + Office compatibility constraints, not as a black box.
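To make that one-liner concrete, here is a minimal sketch using only the Python standard library. It builds a tiny, deliberately incomplete package in memory and reads it back like any ZIP of XML parts. The helper names `build_package` and `read_paragraph_text` are invented for illustration, and the single part shown is not enough for Word to open the file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Illustrative part only; a real package also needs [Content_Types].xml and .rels files.
DOCUMENT_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
    "</w:body></w:document>"
)


def build_package() -> bytes:
    """Write one XML part into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", DOCUMENT_XML)
    return buffer.getvalue()


def read_paragraph_text(package: bytes) -> list[str]:
    """Open the archive, parse word/document.xml, and join the text runs of each paragraph."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    ns = {"w": W_NS}
    return [
        "".join(t.text or "" for t in p.iterfind(".//w:t", ns))
        for p in root.iterfind(".//w:p", ns)
    ]


package = build_package()
part_names = zipfile.ZipFile(io.BytesIO(package)).namelist()
print(part_names)
print(read_paragraph_text(package))
```

Everything the skill does later (comments, tracked changes, validation) is a more careful version of the same two steps: open the archive, then read or modify specific XML parts.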

What this skill solves

When general-purpose models handle Word files, we often see the same failure patterns:

  1. They output text, but not a structurally valid .docx.
  2. They break OOXML while editing existing documents.
  3. They do not know which XML parts to update for comments or tracked changes.
  4. Output opens in one app but behaves inconsistently across Word, LibreOffice, and Google Docs.
  5. They lack clear routing for when to use pandoc vs. unpack/edit/repack.

The value of this skill is that it front-loads those decisions:

  • Use pandoc or unpacking for reading and analysis.
  • Use docx-js for creating new .docx files.
  • Use “unpack -> edit XML -> repack -> validate” for existing documents.
  • Use dedicated scripts for tracked changes/comments/schema-sensitive operations.
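The routing above can be pictured as a simple lookup. The sketch below is hypothetical: the `Task` enum and `route` function are not part of the skill, and the route descriptions only paraphrase the list above.

```python
from enum import Enum


class Task(Enum):
    READ = "read"
    CREATE = "create"
    EDIT = "edit"
    REVIEW = "review"  # tracked changes or comments


# Hypothetical routing table; each value paraphrases one bullet from the list above.
ROUTES = {
    Task.READ: "pandoc or unpack",
    Task.CREATE: "docx-js, then validate",
    Task.EDIT: "unpack -> edit XML -> repack -> validate",
    Task.REVIEW: "dedicated scripts for comments and tracked changes",
}


def route(task: Task) -> str:
    """Return the toolchain description for a task type."""
    return ROUTES[task]


print(route(Task.EDIT))
```

The point of fixing the table up front is that the model does not have to rediscover the right toolchain on every run.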

That approach works because Word problems are usually not about wording quality. They are about structural correctness and compatibility.

Directory and code structure

This skill can be understood in four layers.

1. Guidance layer: SKILL.md

SKILL.md does two important jobs:

  1. It defines trigger conditions.
    If a request mentions Word, .docx, comments, tracked changes, TOC, page numbers, or polished document formatting, this skill should be activated.
  2. It defines execution routes.
    Different task types map to different toolchains, instead of improvising every run.

It also captures practical compatibility rules, for example:

  • docx-js defaults to A4, not US Letter.
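  • For landscape pages, pass portrait dimensions and set the orientation; docx-js swaps width and height internally.
  • Lists should use the library's numbering configuration, not manually typed Unicode bullets.
  • Table widths must be set at both the table level (column widths) and on each cell, and the two must agree.
  • Images need an explicit type (for example png or jpg).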
  • Generated files should be validated.
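The A4 versus US Letter rule is easier to apply once the units are clear. Word page dimensions are expressed in DXA (twentieths of a point, 1440 per inch). The sketch below only does the arithmetic; the helper names are made up for this example, and how the values are passed to docx-js is not shown here.

```python
DXA_PER_INCH = 1440  # 1 inch = 72 points = 1440 twentieths of a point
MM_PER_INCH = 25.4


def inches_to_dxa(inches: float) -> int:
    """Convert inches to DXA, rounded to the nearest unit."""
    return round(inches * DXA_PER_INCH)


def mm_to_dxa(mm: float) -> int:
    """Convert millimetres to DXA, rounded to the nearest unit."""
    return round(mm / MM_PER_INCH * DXA_PER_INCH)


A4 = (mm_to_dxa(210), mm_to_dxa(297))                  # width, height in portrait
US_LETTER = (inches_to_dxa(8.5), inches_to_dxa(11))    # width, height in portrait

print(A4)         # (11906, 16838)
print(US_LETTER)  # (12240, 15840)
```

The two sizes differ in both dimensions, so a document generated with the wrong default will reflow when printed on the other paper size.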

That is a strong signal that the goal is not just “generate something,” but “generate something that is robust.”

2. Office package layer: scripts/office/*

This layer treats .docx/.pptx/.xlsx as Open XML packages.

unpack.py

This script unpacks files and prepares XML for safer editing:

  • Extracts ZIP package content
  • Pretty-prints XML and .rels
  • Optionally runs merge_runs for DOCX
  • Optionally runs simplify_redlines for DOCX
  • Escapes smart quotes into XML entities

So it is not just decompression. It normalizes content into an editing-friendly shape.
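A rough sketch of that normalization idea follows. It is not the skill's unpack.py: the function name `unpack` and the quote table are assumptions, it skips the merge_runs and simplify_redlines steps, it uses the standard-library parser rather than defusedxml, and it does not guard against hostile archive paths.

```python
import io
import zipfile
from pathlib import Path
from tempfile import TemporaryDirectory
from xml.dom import minidom

# Curly quotes mapped to numeric character references (assumed mapping for illustration).
SMART_QUOTES = {
    "\u2018": "&#x2018;",
    "\u2019": "&#x2019;",
    "\u201c": "&#x201C;",
    "\u201d": "&#x201D;",
}


def unpack(package: bytes, target: Path) -> list[str]:
    """Extract every part; pretty-print .xml/.rels parts and escape smart quotes."""
    written = []
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            data = archive.read(name)
            out = target / name
            out.parent.mkdir(parents=True, exist_ok=True)
            if name.endswith((".xml", ".rels")):
                text = minidom.parseString(data).toprettyxml(indent="  ")
                for char, entity in SMART_QUOTES.items():
                    text = text.replace(char, entity)
                out.write_text(text, encoding="utf-8")
            else:
                out.write_bytes(data)  # binary parts such as images are copied as-is
            written.append(name)
    return written


# Demo with a one-part archive built in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<doc><t>\u201cHi\u201d</t></doc>")

with TemporaryDirectory() as tmp:
    names = unpack(buffer.getvalue(), Path(tmp))
    pretty = (Path(tmp) / "word" / "document.xml").read_text(encoding="utf-8")
print(pretty)
```

Pretty-printing makes line-based edits and diffs practical, and escaping the quotes keeps the unpacked XML editable as plain ASCII.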

pack.py

This script repacks a directory into .docx/.pptx/.xlsx.
Before packaging, it can:

  • Run validation and auto-repair
  • Condense XML formatting safely

If --original is provided, it compares and validates against the source context.
That matters because “repacked successfully” does not mean “semantically safe.”

validate.py

This is the quality gate. It checks:

  • XML well-formedness
  • Namespace correctness
  • Unique ID constraints
  • Relationship/content type consistency
  • XSD compliance
  • Whitespace preservation rules
  • Insertion/deletion/comment marker constraints

For DOCX work, this is a core component, not an optional extra.
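Two of the checks in that list, well-formedness and unique IDs, are simple enough to sketch. The function below is an illustration under stated assumptions, not the skill's validate.py: `check_part` is a made-up name, and it only looks at `w:id` on one element type.

```python
import xml.etree.ElementTree as ET
from collections import Counter

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_ID = f"{{{W_NS}}}id"


def check_part(xml_text: str, tag: str = "comment") -> list[str]:
    """Return problems found: a parse error, or duplicate w:id values on <w:tag> elements."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    ids = Counter(el.get(W_ID) for el in root.iter(f"{{{W_NS}}}{tag}"))
    return [
        f"duplicate w:id={value!r} on w:{tag}"
        for value, count in ids.items()
        if count > 1
    ]


good = f'<w:comments xmlns:w="{W_NS}"><w:comment w:id="0"/><w:comment w:id="1"/></w:comments>'
duplicated = f'<w:comments xmlns:w="{W_NS}"><w:comment w:id="0"/><w:comment w:id="0"/></w:comments>'
broken = "<w:comments><w:comment></w:comments>"

print(check_part(good))
print(check_part(duplicated))
print(check_part(broken))
```

A file can pass the first check and fail the second, which is why a ZIP that repacks cleanly can still be rejected or silently repaired by Word.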

soffice.py

This helper wraps LibreOffice execution for restricted/sandboxed environments.
It configures SAL_USE_VCLPLUGIN=svp and can apply a shim for AF_UNIX socket limitations when needed.

That tells us the skill is designed for automated agent workflows, not only local manual usage.

3. Word-specific layer: comments, revisions, and redlines

comment.py

This script adds comments to DOCX, including required package plumbing across multiple parts:

  • word/comments.xml
  • word/commentsExtended.xml
  • word/commentsIds.xml
  • word/commentsExtensible.xml
  • comment range markers in document.xml
  • declarations in [Content_Types].xml and document.xml.rels

If comment parts do not exist yet, it can initialize templates and required relationships/content types.
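To show what "package plumbing" means in practice, the sketch below checks that the main comments part is declared in both places the list mentions. It is an assumption-laden illustration: `comments_declared` is a hypothetical helper, it covers only word/comments.xml, and the three extended comment parts would need analogous declarations of their own.

```python
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
COMMENTS_CT = "application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml"
COMMENTS_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments"


def comments_declared(content_types_xml: str, document_rels_xml: str) -> bool:
    """True if comments.xml has both a content-type override and a document relationship."""
    types = ET.fromstring(content_types_xml)
    rels = ET.fromstring(document_rels_xml)
    has_override = any(
        o.get("PartName") == "/word/comments.xml" and o.get("ContentType") == COMMENTS_CT
        for o in types.iter(f"{{{CT_NS}}}Override")
    )
    has_relationship = any(
        r.get("Type") == COMMENTS_REL and r.get("Target") == "comments.xml"
        for r in rels.iter(f"{{{REL_NS}}}Relationship")
    )
    return has_override and has_relationship


content_types = (
    f'<Types xmlns="{CT_NS}">'
    f'<Override PartName="/word/comments.xml" ContentType="{COMMENTS_CT}"/>'
    "</Types>"
)
rels_ok = (
    f'<Relationships xmlns="{REL_NS}">'
    f'<Relationship Id="rId9" Type="{COMMENTS_REL}" Target="comments.xml"/>'
    "</Relationships>"
)
rels_missing = f'<Relationships xmlns="{REL_NS}"/>'

print(comments_declared(content_types, rels_ok))
print(comments_declared(content_types, rels_missing))
```

Writing comments.xml without these declarations produces an archive that contains the comment text but never shows it, which is the kind of silent failure the script is meant to prevent.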

accept_changes.py

This script accepts all tracked changes via LibreOffice headless + macro (.uno:AcceptAllTrackedChanges) rather than fragile raw XML surgery.

That is a pragmatic choice because accepting revisions is a behavior-level operation, not just deleting <w:ins> / <w:del> tags.

validators/redlining.py

This is one of the most valuable pieces.
It removes tracked changes for a specific author in both original and modified documents, then compares resulting text to verify that changes are properly represented in revision markup.

So it validates revision semantics, not only XML syntax.
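The idea can be sketched in a few lines. This is a simplified illustration, not the skill's redlining.py: the function name and the author "Claude" are assumptions, and real documents contain revision types (moves, formatting changes, paragraph marks) that this ignores.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(tag: str) -> str:
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


INS, DEL, T, DEL_TEXT, AUTHOR = w("ins"), w("del"), w("t"), w("delText"), w("author")


def text_without_author(xml_text: str, author: str) -> str:
    """Text as it would read if the given author's insertions and deletions were undone."""

    def walk(el, in_author_ins=False, in_author_del=False):
        by_author = el.get(AUTHOR) == author
        if el.tag == INS and by_author:
            in_author_ins = True
        elif el.tag == DEL and by_author:
            in_author_del = True
        parts = []
        if el.tag == T and not in_author_ins:
            parts.append(el.text or "")      # keep text the author did not insert
        elif el.tag == DEL_TEXT and in_author_del:
            parts.append(el.text or "")      # restore text the author deleted
        for child in el:
            parts.extend(walk(child, in_author_ins, in_author_del))
        return parts

    return "".join(walk(ET.fromstring(xml_text)))


def paragraph(inner: str) -> str:
    return f'<w:p xmlns:w="{W_NS}">{inner}</w:p>'


original = paragraph("<w:r><w:t>Hello world</w:t></w:r>")
tracked = paragraph(
    '<w:r><w:t xml:space="preserve">Hello </w:t></w:r>'
    '<w:del w:id="1" w:author="Claude"><w:r><w:delText>world</w:delText></w:r></w:del>'
    '<w:ins w:id="2" w:author="Claude"><w:r><w:t>there</w:t></w:r></w:ins>'
)
untracked = paragraph("<w:r><w:t>Hello there</w:t></w:r>")

baseline = text_without_author(original, "Claude")
print(text_without_author(tracked, "Claude") == baseline)    # edit fully tracked
print(text_without_author(untracked, "Claude") == baseline)  # edit bypassed revision markup
```

If undoing the author's tracked changes does not recover the original text, some edit was made outside revision markup, and a reviewer would never see it as a change.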

4. Schema and support layer: schemas/, helpers/, templates/

schemas/

Contains OOXML/ECMA/Microsoft-related XSD files used by validators.
Validation is therefore grounded in formal schema constraints.

helpers/

Includes utilities such as:

  • merge_runs.py
  • simplify_redlines.py

These stabilize XML structure for clearer edits and diffs.

templates/

Contains XML templates needed for comment support, including:

  • comments.xml
  • commentsExtended.xml
  • commentsIds.xml
  • commentsExtensible.xml
  • people.xml

These templates help avoid package-level inconsistencies when creating comment-related parts.

Typical usage patterns

From SKILL.md, the most common workflows are:

Scenario 1: Read/analyze an existing DOCX

Use pandoc for text-level extraction with tracked changes:

pandoc --track-changes=all document.docx -o output.md

Use unpacking for raw XML inspection:

python scripts/office/unpack.py document.docx unpacked/

Scenario 2: Create a new DOCX

Use docx-js for generation. Install the library first, then write a JavaScript script that builds the document:

npm install -g docx

Then validate:

python scripts/office/validate.py doc.docx

Scenario 3: Edit an existing DOCX

Core workflow:

python scripts/office/unpack.py document.docx unpacked/
# edit XML under unpacked/
python scripts/office/pack.py unpacked/ output.docx --original document.docx

--original is the critical part because it enables stronger structural and revision-aware checks.
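When this workflow is driven from a script rather than typed by hand, it helps to build the two commands as argument lists. The sketch below assumes it runs from the skill's root directory so that the script paths shown above resolve; `edit_commands` and `run_edit` are hypothetical helpers, and the XML edit itself is supplied by the caller.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable


def edit_commands(source: Path, workdir: Path, output: Path) -> list[list[str]]:
    """Build the unpack and pack commands from the workflow above as argv lists."""
    return [
        [sys.executable, "scripts/office/unpack.py", str(source), str(workdir)],
        [
            sys.executable, "scripts/office/pack.py",
            str(workdir), str(output), "--original", str(source),
        ],
    ]


def run_edit(source: Path, workdir: Path, output: Path, edit: Callable[[Path], None]) -> None:
    """Unpack, apply a caller-supplied edit to the unpacked tree, then repack."""
    unpack_cmd, pack_cmd = edit_commands(source, workdir, output)
    subprocess.run(unpack_cmd, check=True)  # raises if unpacking fails
    edit(workdir)
    subprocess.run(pack_cmd, check=True)    # raises if packing or validation fails


commands = edit_commands(Path("document.docx"), Path("unpacked"), Path("output.docx"))
print(commands[1])
```

Using `check=True` means a non-zero exit from pack.py stops the pipeline instead of leaving a questionable output file behind.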

Scenario 4: Accept all tracked changes

python scripts/accept_changes.py input.docx output.docx

Requires LibreOffice; useful for producing a clean post-review file.

Scenario 5: Add comments

python scripts/comment.py unpacked/ 0 "Comment text"
python scripts/comment.py unpacked/ 1 "Reply text" --parent 0

You still need to place comment range markers in document.xml where the comment should attach.
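For orientation, the markers look roughly like the output of the sketch below: a range start and end around the commented runs, followed by a run that holds the comment reference. The helper `commented_paragraph` is hypothetical and builds a single-run paragraph only; in a real document you would insert the three marker elements into existing XML by hand.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def commented_paragraph(text: str, comment_id: int) -> str:
    """Return a paragraph whose single run is wrapped in comment range markers."""
    return (
        f'<w:p xmlns:w="{W_NS}">'
        f'<w:commentRangeStart w:id="{comment_id}"/>'
        f"<w:r><w:t>{escape(text)}</w:t></w:r>"
        f'<w:commentRangeEnd w:id="{comment_id}"/>'
        # The reference run ties the range to the entry with the same id in comments.xml.
        '<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr>'
        f'<w:commentReference w:id="{comment_id}"/></w:r>'
        "</w:p>"
    )


xml_text = commented_paragraph("Terms & conditions", 0)
child_tags = [child.tag.split("}")[1] for child in ET.fromstring(xml_text)]
print(child_tags)
```

Note that the range markers are siblings of the runs inside the paragraph, not children of a run, and that the id must match the one passed to the comment script.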

Key caveats to remember

1. .docx is not a plain text file

A single edit may involve body XML, relationships, content types, comment parts, IDs, and schema constraints.

2. docx-js generation still needs explicit guardrails

Defaults such as the A4 page size can be wrong for your target layout, so set page size, list numbering, table widths, and image types explicitly.

3. Comments and tracked changes are multi-part operations

They are package-level features, not single-tag edits.

4. “Opens successfully” does not mean “correctly modified”

Many issues only surface later during editing, reviewing, cross-app opening, or acceptance of changes.

5. Environment readiness matters

You need tools such as pandoc, LibreOffice/soffice, docx-js, and Python deps (defusedxml, lxml) available.

What this skill is good for (and not)

Good fit

  • Batch Word report generation
  • Structured formal document production
  • Automated edits to existing .docx
  • Tracked-changes aware workflows
  • Automated comment insertion
  • Agent/script-driven document pipelines

Not ideal

  • Very simple PDF-only output cases
  • Pure text extraction with no document fidelity requirement
  • Fully manual visual editing workflows
  • Zero-dependency expectations for end-to-end Word automation

Summary

Anthropic’s skills/docx is strong not because it can “generate Word files,” but because it encodes why Word automation fails and how to handle those failure modes systematically.
It combines generation, low-level XML editing, revision semantics, schema validation, and cross-app compatibility into one executable workflow.

If your use case includes editing existing DOCX files, comments, tracked changes, or compatibility-sensitive automation, this design is practical and worth adopting.

Code location: https://github.com/anthropics/skills/tree/main/skills/docx
