web-video-presentation is an agent skill in ConardLi/garden-skills. It solves a concrete problem: turn an article or narration script into a web-based presentation that can be recorded as a video.
Project: https://github.com/ConardLi/garden-skills/tree/main/skills/web-video-presentation
It is not a normal slide template or a React component library. It is a production process for AI agents: rewrite content into narration, turn it into an outline, choose a theme, build a 16:9 click-driven Vite + React + TypeScript web surface, then record it.
It is not trying to make slides
The README makes an important distinction: the skill generates a “video production surface”, not a slide deck.
Each click advances one narration beat. Each step owns the full 1920×1080 stage, and the progress controls stay hidden unless hovered, so recordings come out clean.
It is useful for:
- Turning blog posts into YouTube or Bilibili-style explainers
- Building visuals for narration scripts
- Product demos
- Tutorial videos
- Keynote-style visual talks
- Dynamic presentations that do not feel like PowerPoint
The value is not in replacing video editing software; it is in making the browser a controllable, iterative video canvas.
Core principles
The skill has several clear principles.
First, a fixed 16:9 stage. Design happens in a stable 1920×1080 coordinate system, then scales to the viewport. This prevents layout drift during recording.
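The fixed-stage idea can be sketched as a single scale computation. This is a minimal illustration, not the skill's actual code; the function name `stageScale` is an assumption.

```typescript
// Fixed design space: all layout happens in 1920×1080 coordinates.
const STAGE_W = 1920;
const STAGE_H = 1080;

// Uniform scale that fits the stage into the viewport while preserving 16:9.
// Taking the smaller ratio letterboxes rather than crops, so nothing drifts.
export function stageScale(viewportW: number, viewportH: number): number {
  return Math.min(viewportW / STAGE_W, viewportH / STAGE_H);
}

// Applied in React roughly as:
//   <div style={{ width: 1920, height: 1080,
//                 transform: `scale(${stageScale(w, h)})` }}>…</div>
```

Because scenes are designed against constants rather than the live viewport, a resize during recording only changes the scale factor, never the layout.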
Second, a global step cursor. Clicks and keyboard input advance (chapter, step) and save progress locally. It behaves like a video timeline, but controlled through web state.
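The (chapter, step) cursor could be modeled as a small pure function over per-chapter step counts. A sketch under assumed names (`Cursor`, `advance`); the skill's real state shape may differ, and persistence would wrap this with localStorage.

```typescript
export interface Cursor {
  chapter: number; // 0-based chapter index
  step: number;    // 0-based step within the chapter
}

// Advance one beat: next step in the chapter, else first step of the
// next chapter, else stay put at the end of the presentation.
export function advance(cursor: Cursor, stepsPerChapter: number[]): Cursor {
  const { chapter, step } = cursor;
  if (step + 1 < stepsPerChapter[chapter]) {
    return { chapter, step: step + 1 };
  }
  if (chapter + 1 < stepsPerChapter.length) {
    return { chapter: chapter + 1, step: 0 };
  }
  return cursor;
}
```

Click and keyboard handlers would both call `advance`, which is what makes the cursor behave like a single timeline rather than per-component state.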
Third, one idea per step. Every beat should have its own visual moment, not just more bullets on the same page.
Fourth, narration drives structure. The script defines rhythm; the outline defines chapters and steps; visuals follow the story.
Fifth, motion first. Each scene should have a moving visual anchor. If it is only static text, it has not become video language yet.
Sixth, theme tokens. A theme is not just colors; it controls typography, colors, cards, background, separators, decoration, and tone through semantic tokens.
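A semantic token set along these lines is one plausible shape. The field names and the `paper-press` values below are illustrative assumptions, not the skill's actual schema.

```typescript
// Hypothetical token shape: a theme controls typography, color, surfaces,
// and decoration through named roles, not raw per-component values.
export interface ThemeTokens {
  name: string;
  font: { heading: string; body: string };
  color: { bg: string; fg: string; accent: string };
  card: { radius: string; shadow: string };
  separator: string; // rule/border style used between content blocks
}

// Invented example values in the spirit of a print-like "paper-press" theme.
export const paperPress: ThemeTokens = {
  name: "paper-press",
  font: { heading: "'Playfair Display', serif", body: "Georgia, serif" },
  color: { bg: "#f6f1e7", fg: "#1c1b18", accent: "#b4452f" },
  card: { radius: "4px", shadow: "0 1px 0 rgba(0,0,0,0.2)" },
  separator: "1px solid rgba(0,0,0,0.15)",
};
```

Scenes that read only these roles can be restyled wholesale by swapping the token object, which is what makes a theme a visual language rather than a color swap.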
Four-part workflow
The workflow has four stages.
First is content writing. If the user provides an article, the agent rewrites it into script.md, then creates outline.md. If the user already provides a narration script, it saves it as script.md and generates the outline.
Second is web development. The agent scaffolds a Vite / React / TypeScript project and implements scenes chapter by chapter. Chapter 1 must be completed by the main thread and approved by the user, because it becomes the style anchor.
Third is optional audio generation. The skill can extract narration definitions from each chapter’s narrations.ts and run a voice synthesis flow.
Fourth is recording and post-production. The web app is the recording stage; the user records the click-driven presentation.
The process has hard checkpoints: script, outline, theme, asset plan, and development mode must be aligned first; chapter 1 must be reviewed; audio generation must also be confirmed.
Why outline should not define animation
One interesting constraint is that outline.md plans rhythm and information density, but not concrete animations.
It may describe chapters, step count, screen content, information pools, asset plans, and estimated duration. It should not define CSS animation type, timing, clip-path, or filter implementation.
The reasoning is sound: if the outline locks animation choices early, later implementation becomes mechanical. The video feel should be designed per chapter, based on how the content within it relates.
narrations.ts as the source of truth
Each chapter has a narrations.ts that stores the step count and the corresponding narration text. The skill requires the maximum step used in the chapter's .tsx to align with narrations.length.
This prevents drift across script.md, outline.md, chapter code, chapters.ts, and audio files. For video production, keeping narration, screen, audio, and step count aligned is essential.
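The invariant can be stated in a few lines. The narration strings and the `stepsAligned` helper below are illustrative; the skill enforces this as a convention rather than necessarily with this exact check.

```typescript
// Hypothetical chapter narrations: one entry per step, in beat order.
export const narrations: string[] = [
  "Here is the problem we are solving.",  // step 0
  "This is the shape of the solution.",   // step 1
  "And this is what it looks like live.", // step 2
];

// maxStepUsed is the highest step index the chapter's .tsx renders.
// Steps are 0-based, so maxStepUsed + 1 must equal narrations.length.
export function stepsAligned(maxStepUsed: number, n: string[]): boolean {
  return maxStepUsed + 1 === n.length;
}
```

Making narrations.ts the single count that code, audio files, and the outline are checked against is what keeps the four layers from drifting apart.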
Themes are more than skins
Built-in themes include paper-press, warm-keynote, midnight-press, blueprint, chalk-garden, terminal-green, bauhaus-bold, sunset-zine, newsroom, and monochrome-print.
These are not just color swaps. They define different visual languages: print, keynote, blueprint, terminal, newsroom, and so on.
During planning, the agent should recommend two or three themes based on the topic and tone. The user can also request a custom theme.
Three development modes
Chapter 1 is always built by the main thread and reviewed first. After that, there are three modes.
Mode A: chapter-by-chapter confirmation. Lowest risk and best quality control.
Mode B: sequential development. The main thread builds remaining chapters and reviews at the end.
Mode C: parallel development. After chapter 1 approval, subagents build later chapters in parallel. It is fastest, but visual differences may appear. Theme tokens provide consistency while each chapter can still have its own expression.
Who should use it
This skill is best for people who already have content: an article, script, product description, tutorial, or technical explanation.
If the user has no topic or material, the agent should ask for source content. This is not an ideation tool; it is a content-to-video production flow.
Summary
web-video-presentation is valuable because it turns content video production into a collaborative, reviewable, reusable workflow.
It connects article, narration, outline, theme, chapter implementation, audio, and recording, while hard checkpoints keep the agent from running ahead of the user.
Even if you do not use its scaffold, ideas like “one step, one idea”, “chapter 1 as style anchor”, “narrations.ts as source of truth”, and “outline does not hard-code animation” are worth borrowing.