Pixelle-Video is an open-source, fully automated short-video generation engine from AIDC-AI. Its goal is direct: the user enters a topic, and the system automatically writes the script, generates AI images or video clips, synthesizes voice narration, adds background music, and renders the final video.
This kind of tool is useful for batch short-video creation: knowledge explainers, talking-head content, novel recaps, history and culture videos, and independent-creator experiments. It is not a single text-to-video model; it is a production pipeline that chains several AI capabilities together.
What It Automates
Pixelle-Video’s default flow can be summarized as:
- enter a topic or fixed script;
- use an LLM to generate narration;
- plan scenes and generate images or video clips;
- use TTS to create voice narration;
- add background music;
- apply a video template and render the final result.
The README describes the flow as “script generation → image planning → frame-by-frame processing → video composition.” The modular design is clear: each step can be replaced, tuned, or connected to a custom workflow.
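To make that modularity concrete, here is a minimal, self-contained sketch of the idea: each stage is a swappable callable, so any step can be replaced without touching the rest. All names are illustrative stubs, not Pixelle-Video's actual API.

```python
# Sketch of the modular pipeline idea; every function is a stand-in stub.
from typing import Callable

def write_script(topic: str) -> list[str]:
    # Stub for the LLM script-writing stage.
    return [f"{topic}: point {i}" for i in range(1, 4)]

def make_visual(line: str) -> str:
    # Stub for image/video generation (e.g. a ComfyUI workflow).
    return f"<frame for: {line}>"

def synthesize(line: str) -> str:
    # Stub for TTS narration.
    return f"<audio for: {line}>"

def compose(assets: list[tuple[str, str]]) -> str:
    # Stub for template application and final rendering.
    return f"final_video({len(assets)} scenes + bgm + template)"

# Stages are looked up by name, so any one of them can be swapped out.
STAGES: dict[str, Callable] = {
    "script": write_script,
    "visual": make_visual,
    "tts": synthesize,
    "compose": compose,
}

def run(topic: str) -> str:
    lines = STAGES["script"](topic)
    assets = [(STAGES["visual"](line), STAGES["tts"](line)) for line in lines]
    return STAGES["compose"](assets)

print(run("why volcanoes erupt"))
```

Swapping one entry in STAGES, say replacing make_visual with a real ComfyUI call, changes a single stage without touching the others, which is exactly the replaceability the README describes.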
Key Features
The project covers a fairly complete set of capabilities:
- AI script writing: automatically generate narration from a topic;
- AI image generation: create illustrations for each line or scene;
- AI video generation: connect to video generation models such as Wan2.1;
- TTS voices: support Edge-TTS, Index-TTS, and other engines (a minimal Edge-TTS call is sketched after this list);
- background music: use built-in BGM or custom music;
- multiple aspect ratios: support vertical, horizontal, and other video sizes;
- multiple models: connect to GPT, Qwen, DeepSeek, Ollama, and more;
- ComfyUI workflows: use built-in workflows or replace image, TTS, and video generation steps.
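Edge-TTS in particular is easy to try on its own. A minimal sketch using the open-source edge-tts Python package, independent of Pixelle-Video; the voice name is just one example of the many available:

```python
# Minimal Edge-TTS usage (pip install edge-tts); Pixelle-Video wraps this
# behind its own workflow, so this only illustrates the underlying engine.
import asyncio
import edge_tts

async def main() -> None:
    # The voice is one example; `edge-tts --list-voices` shows the full set.
    tts = edge_tts.Communicate("Hello, this is a test narration.", "en-US-AriaNeural")
    await tts.save("narration.mp3")  # writes the spoken text as an MP3

asyncio.run(main())
```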
Recent updates also mention motion transfer, digital-human talking videos, image-to-video pipelines, multilingual TTS voices, RunningHub support, and a Windows all-in-one package. The project is clearly moving beyond a simple script toward a fuller creation tool.
Installation and Launch
Windows users can start with the official all-in-one package, which is designed to reduce setup friction: no manual Python, uv, or ffmpeg installation is required. After extracting the package, run start.bat, open the web interface, and configure the required APIs and the image generation service.
For source installation, the README describes a standard clone-and-run flow built on uv. The exact commands may differ between versions, so treat the following as a representative sketch rather than a verbatim copy:

```bash
# Representative flow only; see the project README for the exact commands.
git clone <Pixelle-Video-repo-url>   # repository URL omitted here
cd Pixelle-Video
uv run main.py                       # hypothetical entry point; uv resolves dependencies
```
The source route is suitable for macOS and Linux users, and for anyone who wants to modify templates, workflows, or service configuration. The main prerequisites are uv and ffmpeg.
Configuration Priorities
On first use, the key is not to click "generate" immediately but to connect the external capabilities properly first.
LLM configuration determines script quality. You can choose models such as Qwen, GPT, DeepSeek, or Ollama, then fill in the API Key, Base URL, and model name. If you want to minimize cost, local Ollama is one option. If you want more stable results, a cloud model is usually easier.
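These three fields correspond to the OpenAI-compatible interface that all of these providers expose. A minimal sketch of what they mean, using the openai Python client with placeholder values; Pixelle-Video collects the same fields through its web UI:

```python
# What "API Key / Base URL / model name" mean, shown with the openai client.
# All key and model values below are placeholders.
from openai import OpenAI

# Cloud model (example: DeepSeek's OpenAI-compatible endpoint):
client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com/v1")
resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Write a 60-second narration about tea."}],
)
print(resp.choices[0].message.content)

# Local Ollama exposes the same OpenAI-compatible API, so only the values change:
#   base_url = "http://localhost:11434/v1", api_key = "ollama", model = "qwen2.5"
```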
Image and video generation configuration determines visual quality. The project supports local ComfyUI and RunningHub. Users who understand ComfyUI can place their own workflows under workflows/ to replace the default image, video, or TTS pipeline.
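For context on what replacing the default pipeline means mechanically: custom workflows are typically exported from ComfyUI in API format and driven over its HTTP endpoint. A minimal sketch of that raw mechanism follows; the file name is hypothetical, and Pixelle-Video handles this plumbing internally:

```python
# Driving a ComfyUI workflow over its HTTP API (POST /prompt on port 8188).
# The workflow file name is hypothetical; Pixelle-Video does this for you.
import json
import urllib.request

with open("workflows/my_image_workflow.json") as f:  # API-format export
    workflow = json.load(f)

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",  # default local ComfyUI address
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))  # response includes a prompt_id to poll
```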
Template configuration determines the final visual form. The project organizes video templates under templates/, with naming rules for static templates, image templates, and video templates. For creators, this is more practical than generating raw assets only, because the output is a video that can be previewed and downloaded directly.
Who It Is For
Pixelle-Video is especially suitable for three groups:
- Short-video creators who want to turn ideas into draft videos quickly.
- AIGC tool users who want to connect LLMs, ComfyUI, TTS, and video composition.
- Developers and automation users who want to modify templates, workflows, or integrate their own materials and models.
If you only want to make a single polished, premium video, it may not replace manual editing. But if you want to generate many explainers, talking-head videos, or science and education videos with a consistent structure, its pipeline approach is valuable.
Things to Note
The ceiling of a tool like this is set by the weakest link in the chain: a weak script model produces empty content, a weak image model gives scattered visuals, unnatural TTS makes the video feel rough, and a poor template drags down the final result.
So it is better to start with one fixed scenario, such as a “60-second vertical science explainer.” Fix the LLM, visual style, TTS voice, BGM, and template first, then expand to more topics.
The project supports a fully local, free setup, but that route typically requires a GPU, ComfyUI configuration, and model files. Users without a local inference environment can reduce setup difficulty by pairing a cloud LLM with RunningHub, while keeping an eye on usage costs.
Short Take
Pixelle-Video is interesting not merely because it can “generate a video from one sentence.” Its real value is that it breaks short-video production into replaceable modules: script, visuals, voice, music, templates, and rendering. For ordinary users, it is a low-barrier AI video tool. For developers, it is closer to a hackable short-video automation framework.
If you are studying AI short-video pipelines, or want to connect ComfyUI, TTS, LLMs, and template rendering into a usable product, Pixelle-Video is worth trying and dissecting.