w512/Prompt-Vault is a small but useful prompt repository. Instead of collecting "magic prompts", it organizes executable coding prompts into difficulty levels so they can be used to test LLMs and coding agents.
Project: https://github.com/w512/Prompt-Vault
The repository is small, but the structure is clear: Easy, Medium, and Hard. Each Markdown file is a standalone task. The README also says the prompts are suitable for testing language models or for practicing small projects.
Not a prompt scrapbook
Many prompt repositories look impressive in size but are hard to evaluate: the titles are attractive, but the prompts lack acceptance criteria.
Prompt-Vault is closer to a specification library. Each task tries to describe:
- What app to build
- Required features
- UI style
- Technical constraints
- Whether it must run as a single file
- Whether dependencies are allowed
- Whether data should persist
This is much better for testing models than “make a nice Kanban board”, because it reveals whether the model truly understands requirements.
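Treated as data, those bullet points amount to a small specification record. A minimal sketch in TypeScript; the field names are my own reading of the list above, not a schema the repository defines:

```typescript
// Hypothetical shape of a Prompt-Vault task; field names are mine,
// not a schema defined by the repository.
interface TaskSpec {
  level: "Easy" | "Medium" | "Hard";
  app: string;                  // what app to build
  features: string[];           // required features
  uiStyle: string;              // e.g. "dark theme"
  constraints: string[];        // technical constraints
  singleFile: boolean;          // must it run as a single file?
  dependenciesAllowed: boolean; // are dependencies allowed?
  persistence: boolean;         // should data persist?
}

const example: TaskSpec = {
  level: "Easy",
  app: "bubble sort visualizer",
  features: ["bars", "start/reset buttons", "speed slider", "comparison count"],
  uiStyle: "dark theme",
  constraints: ["no build step"], // assumption for illustration
  singleFile: true,
  dependenciesAllowed: false,     // assumption for illustration
  persistence: false,
};
```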
Easy: basic interaction
Easy/Bubble_Sort_Visualizer.md asks for a single-file index.html that visualizes bubble sort with bars, start/reset buttons, a speed slider, comparison count, and a dark theme.
It tests whether a model can connect algorithm state to UI, control animation timing, handle reset and running states, and keep the code readable.
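One way to pass all of those checks at once is to separate the algorithm from the rendering, for example by emitting compare/swap steps from a generator and letting a timer drive the bars. A minimal, DOM-free sketch (my own structure, not the repository's reference solution):

```typescript
type Step =
  | { kind: "compare"; i: number; j: number }
  | { kind: "swap"; i: number; j: number };

// Yields one step per comparison or swap, so the UI can animate each
// step and keep the comparison counter exactly in sync.
function* bubbleSortSteps(a: number[]): Generator<Step> {
  for (let end = a.length - 1; end > 0; end--) {
    for (let i = 0; i < end; i++) {
      yield { kind: "compare", i, j: i + 1 };
      if (a[i] > a[i + 1]) {
        [a[i], a[i + 1]] = [a[i + 1], a[i]];
        yield { kind: "swap", i, j: i + 1 };
      }
    }
  }
}

// In the page, a timer would consume one step per tick at slider speed;
// here we just count comparisons to show the state lives in the steps.
let comparisons = 0;
for (const step of bubbleSortSteps([5, 3, 8, 1])) {
  if (step.kind === "compare") comparisons++;
}
console.log(`comparisons: ${comparisons}`);
```

Because the generator yields one step per comparison, the counter and the animation can never drift apart.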
Easy/ToDo_List.md starts from static HTML and gradually adds task creation, a completed state, deletion, counters with Active / Completed stats, and localStorage persistence.
It is a simple task, but it tests whether a model can evolve code step by step instead of dumping one messy implementation all at once.
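The persistence step is where single-file todo apps tend to quietly fail: state is saved but never reloaded, or counters are cached and go stale. A hedged sketch of the load/save core (the key name and helpers are my own):

```typescript
interface Todo { id: number; text: string; done: boolean }

const KEY = "todos"; // storage key is my choice, not set by the task

function load(): Todo[] {
  try {
    return JSON.parse(localStorage.getItem(KEY) ?? "[]");
  } catch {
    return []; // corrupted storage must not break the app
  }
}

function save(todos: Todo[]): void {
  localStorage.setItem(KEY, JSON.stringify(todos));
}

// Derive Active / Completed from state on every render, never cache.
function stats(todos: Todo[]) {
  const completed = todos.filter(t => t.done).length;
  return { active: todos.length - completed, completed };
}
```

Deriving the stats from state, rather than caching them, removes one whole class of stale-counter bugs.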
Medium: state and animation complexity
Medium/Sorting_Visualization.md raises the difficulty: the same page must support Bubble Sort, Insertion Sort, Selection Sort, Merge Sort, Quick Sort, and Heap Sort.
It also needs algorithm selection, speed and size sliders, reset, start / pause, and a live stats panel.
This catches many failures: an agent can usually animate a single bubble sort, but multiple algorithms plus pause/resume and live stats often break its state management.
Useful checks include:
- Does every algorithm really sort?
- Does the animation match the algorithm steps?
- Can it pause and resume?
- Does reset stop old animation loops?
- Does changing array size break state?
- Are the statistics credible?
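Most of these checks reduce to one question: who owns the animation loop? A common fix is a single controller with a run token, so a reset invalidates any loop still in flight. A sketch of that pattern (my own design, nothing the prompt mandates):

```typescript
type Step = { kind: "compare" | "swap"; i: number; j: number };

const sleep = (ms: number) => new Promise<void>(r => setTimeout(r, ms));

class SortRunner {
  private runId = 0;      // bumped on reset so stale loops exit quietly
  private paused = false;

  async run(
    steps: Iterable<Step>,
    delayMs: () => number,        // re-read the speed slider per step
    applyStep: (s: Step) => void, // update bars and the stats panel
  ): Promise<void> {
    const id = ++this.runId;
    for (const step of steps) {
      while (this.paused && id === this.runId) await sleep(50);
      if (id !== this.runId) return; // a reset happened mid-run: stop
      applyStep(step);
      await sleep(delayMs());
    }
  }

  pause()  { this.paused = true; }
  resume() { this.paused = false; }
  reset()  { this.runId++; this.paused = false; }
}
```

Starting a new run also bumps the token, so rapid Start / Reset clicks can never stack animation loops.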
Hard: product completeness
Hard/Kanban_Board.md asks for a complete board: default columns, custom columns, double-click rename, delete empty columns, cards with title and description, priority, deadline, drag-and-drop, search, priority filter, localStorage, footer stats, glassmorphism dark theme, and responsive horizontal scrolling.
This tests product completeness, not just one feature.
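A data model that survives all of those features might look roughly like the sketch below; the shape is my guess at a reasonable structure, not anything the prompt prescribes:

```typescript
type Priority = "low" | "medium" | "high";

interface Card {
  id: string;
  title: string;
  description: string;
  priority: Priority;
  deadline?: string; // ISO date string
}

interface Column { id: string; name: string; cardIds: string[] }

interface Board { columns: Column[]; cards: Record<string, Card> }

// Search and priority filter stay pure functions over the board, so
// drag-and-drop only reorders cardIds and persistence only serializes
// one Board object to localStorage. (Title-only search, for brevity.)
function visibleCards(
  board: Board, col: Column, query: string, prio?: Priority,
): Card[] {
  return col.cardIds
    .map(id => board.cards[id])
    .filter(c => c.title.toLowerCase().includes(query.toLowerCase()))
    .filter(c => prio === undefined || c.priority === prio);
}
```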
Hard/Markdown_Editor_Desktop.md asks for a Tauri 2 cross-platform Markdown editor. The spec includes split editing and preview, synchronized scrolling, live rendering, a preview mode, a focus mode, open/save/save-as, unsaved-changes markers in the window title, a formatting toolbar, shortcuts, themes, font settings, Vue 3, Pinia, marked.js, prism.js, and Tauri plugins.
This is no longer a simple web prompt. It tests frontend state, Tauri plugins, filesystem permissions, IPC boundaries, and desktop packaging.
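The file dialogs are a good litmus test for the IPC boundary. Assuming the official Tauri 2 dialog and fs plugins are installed and granted the matching capabilities (the capability configuration is where agents most often stumble), the JS side could look like this sketch:

```typescript
// Sketch of the open/save path, assuming the official Tauri 2 dialog
// and fs plugins are installed and granted matching capabilities.
import { open, save } from "@tauri-apps/plugin-dialog";
import { readTextFile, writeTextFile } from "@tauri-apps/plugin-fs";

export async function openMarkdown(): Promise<{ path: string; text: string } | null> {
  const path = await open({ filters: [{ name: "Markdown", extensions: ["md"] }] });
  if (typeof path !== "string") return null; // dialog was cancelled
  return { path, text: await readTextFile(path) };
}

export async function saveMarkdown(text: string, path?: string): Promise<string | null> {
  const target =
    path ?? (await save({ filters: [{ name: "Markdown", extensions: ["md"] }] }));
  if (!target) return null;
  await writeTextFile(target, text);
  return target; // caller clears the unsaved-changes title marker
}
```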
Why it is valuable
Prompt-Vault is valuable because it provides reusable evaluation samples.
If you compare models or coding agents, you can run the same prompt repeatedly and observe:
- Which model follows constraints
- Which model misses fewer features
- Which model handles edge cases
- Which output is easier to maintain
- Which model is better at UI details
- Which model is stable under single-file constraints
This is more reliable than “it feels smarter”.
Frontend tasks are especially useful because many failures are not syntax errors. They are missing button states, broken animation, lost persistence, wrong drag targets, or stale statistics.
How to extend it
The repository could become a stronger benchmark by adding acceptance checklists, failure cases, scoring dimensions, reference implementations, and cross-model result records.
For example, a sorting task should include checks such as “rapid Start / Reset clicks must not create multiple animation loops.” A Kanban task should specify what happens when deleting a non-empty column.
These details make the prompt useful for human review and automated agent evaluation.
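Acceptance checklists could even ship as data next to each prompt, so that humans and harnesses score runs the same way. A hypothetical format, entirely my own invention:

```typescript
interface Check {
  id: string;
  description: string;
  weight: number; // scoring dimension
}

// Checks drawn from the sorting example above; weights are my guesses.
const sortingChecks: Check[] = [
  { id: "sorts",      description: "every algorithm really sorts",           weight: 3 },
  { id: "anim-match", description: "animation matches the algorithm steps",  weight: 2 },
  { id: "pause",      description: "pause and resume work",                  weight: 2 },
  { id: "reset-loop", description: "rapid Start/Reset never stacks loops",   weight: 3 },
  { id: "resize",     description: "changing array size keeps state intact", weight: 2 },
];

function score(checks: Check[], passed: Set<string>): number {
  const total = checks.reduce((s, c) => s + c.weight, 0);
  const got = checks
    .filter(c => passed.has(c.id))
    .reduce((s, c) => s + c.weight, 0);
  return got / total; // 0..1, comparable across models and runs
}
```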
Suggested use
To test an AI coding tool:
- Give one prompt unchanged.
- Do not add extra hints.
- Run the generated result.
- Check features one by one.
- Record missing features and bugs.
- Allow one repair round.
- Compare time, token cost, and final code quality.
This is closer to real development than simply checking whether a page appears.
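Recording every run in one fixed shape makes the final comparison mechanical rather than impressionistic. A hypothetical record type (the field names are mine):

```typescript
// Field names are mine; the repository prescribes no such format.
interface RunRecord {
  prompt: string;           // e.g. "Hard/Kanban_Board.md"
  agent: string;            // which coding tool produced the result
  missingFeatures: string[];
  bugs: string[];
  repairRounds: 0 | 1;      // the protocol above allows at most one
  minutes: number;          // wall-clock time
  tokens: number;           // token cost
}
```

A folder of such records per prompt would also be the "cross-model result records" extension suggested above.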
Summary
Prompt-Vault is a lightweight prompt specification library. It is useful for AI coding tests and for frontend practice projects.
It reminds us that a good coding prompt is not just a wish: it should define requirements, constraints, interactions, state, acceptance criteria, and run mode.
If you compare Codex, Claude Code, Cursor, Gemini CLI, or other coding agents, this kind of leveled prompt set is worth keeping at hand.