ProgramBench raw leaderboard data: model scores, costs, and 200 task records

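A compilation of the raw leaderboard, extended results, and 200 task instance records published on the ProgramBench website, preserving model scores, costs, call counts, test counts, and best scores.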

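ProgramBench is a new benchmark for AI coding ability. Rather than asking a model to fix a bug in an existing repository, it asks the model to rebuild a behaviorally equivalent program from scratch, given only a compiled executable and its usage documentation.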

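This post only organizes the data and adds brief notes. The tables below preserve the raw records from the public pages of the ProgramBench website so they can be cited and compared later. The sources are the ProgramBench home page, Extended Results, and Task Instances, retrieved at 2026-05-10T12:42:41+08:00.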

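Metric definitions

  • Resolved: the share of tasks whose hidden behavioral tests all pass.
  • Almost resolved: the share of tasks that pass at least 95% of the behavioral tests.
  • Cost: the average API cost per task instance, in US dollars.
  • Calls: the average number of LLM calls per task instance.
  • All models were evaluated with mini-SWE-agent on a total of 200 tasks.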
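
To make the two headline metrics concrete, here is a minimal Python sketch of how they could be computed from per-task test pass fractions. The function name `summarize` and the input format (one fraction between 0 and 1 per task) are assumptions made for illustration; the actual ProgramBench scoring code is not shown on the pages this post draws from. The sketch also assumes that a fully resolved task counts toward Almost resolved, which is how "at least 95%" reads.

```python
def summarize(pass_fractions, almost_threshold=0.95):
    """Compute Resolved and Almost resolved percentages.

    pass_fractions: one value per task, the fraction of hidden
    behavioral tests that the rebuilt program passed (0.0 to 1.0).
    This input format is a hypothetical one chosen for the sketch.
    """
    if not pass_fractions:
        raise ValueError("need at least one task")
    n = len(pass_fractions)
    # Resolved: every hidden test passes.
    resolved = sum(1 for f in pass_fractions if f >= 1.0)
    # Almost resolved: at least 95% of tests pass (assumed to include
    # fully resolved tasks).
    almost = sum(1 for f in pass_fractions if f >= almost_threshold)
    return {
        "resolved_pct": 100 * resolved / n,
        "almost_resolved_pct": 100 * almost / n,
    }


if __name__ == "__main__":
    # Illustrative input only: 6 of 200 tasks at 96%, the rest at 50%.
    fractions = [0.96] * 6 + [0.50] * 194
    print(summarize(fractions))
```

With 200 tasks, each task is worth 0.5 percentage points, so every value in the Almost resolved column is a multiple of 0.5%.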

Main leaderboard

# Model Provider Agent Resolved Almost resolved Run
1 Claude Opus 4.7 Anthropic mini-SWE-agent 0% 3.0% https://programbench.com/run/claude-opus-4-7/
2 Claude Opus 4.6 Anthropic mini-SWE-agent 0% 2.5% https://programbench.com/run/claude-opus-4-6/
3 Claude Sonnet 4.6 Anthropic mini-SWE-agent 0% 1.0% https://programbench.com/run/claude-sonnet-4-6/
4 GPT 5.4 OpenAI mini-SWE-agent 0% 0.0% https://programbench.com/run/gpt-5-4/
5 Gemini 3.1 Pro Google mini-SWE-agent 0% 0.0% https://programbench.com/run/gemini-3-1-pro/
6 Gemini 3 Flash Google mini-SWE-agent 0% 0.0% https://programbench.com/run/gemini-3-flash/
7 Claude Haiku 4.5 Anthropic mini-SWE-agent 0% 0.0% https://programbench.com/run/claude-haiku-4-5/
8 GPT 5.4 mini OpenAI mini-SWE-agent 0% 0.0% https://programbench.com/run/gpt-5-4-mini/
9 GPT 5 mini OpenAI mini-SWE-agent 0% 0.0% https://programbench.com/run/gpt-5-mini/

Extended results

# Model Provider Agent Resolved Almost resolved Cost Calls Run
1 Claude Opus 4.7 Anthropic mini-SWE-agent 0% 3.0% $3.81 93 https://programbench.com/run/claude-opus-4-7/
2 Claude Opus 4.6 Anthropic mini-SWE-agent 0% 2.5% $11.38 260 https://programbench.com/run/claude-opus-4-6/
3 Claude Sonnet 4.6 Anthropic mini-SWE-agent 0% 1.0% $26.73 472 https://programbench.com/run/claude-sonnet-4-6/
4 GPT 5.4 OpenAI mini-SWE-agent 0% 0.0% $0.33 16 https://programbench.com/run/gpt-5-4/
5 Gemini 3.1 Pro Google mini-SWE-agent 0% 0.0% $1.51 94 https://programbench.com/run/gemini-3-1-pro/
6 Gemini 3 Flash Google mini-SWE-agent 0% 0.0% $0.30 85 https://programbench.com/run/gemini-3-flash/
7 Claude Haiku 4.5 Anthropic mini-SWE-agent 0% 0.0% $0.80 124 https://programbench.com/run/claude-haiku-4-5/
8 GPT 5.4 mini OpenAI mini-SWE-agent 0% 0.0% $0.04 18 https://programbench.com/run/gpt-5-4-mini/
9 GPT 5 mini OpenAI mini-SWE-agent 0% 0.0% $0.03 15 https://programbench.com/run/gpt-5-mini/
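
Because Cost and Calls are per-task averages, two derived figures follow by simple arithmetic: an estimated cost for a full 200-task run, and an average cost per LLM call. The Python sketch below works these out from the rows above. The helper names are hypothetical, and the totals are estimates derived from the published averages, not numbers reported by ProgramBench.

```python
TASK_COUNT = 200  # total number of tasks, per the metric definitions

# (model, average cost per task in USD, average calls per task),
# copied from the extended results table.
EXTENDED_RESULTS = [
    ("Claude Opus 4.7", 3.81, 93),
    ("Claude Opus 4.6", 11.38, 260),
    ("Claude Sonnet 4.6", 26.73, 472),
    ("GPT 5.4", 0.33, 16),
    ("Gemini 3.1 Pro", 1.51, 94),
    ("Gemini 3 Flash", 0.30, 85),
    ("Claude Haiku 4.5", 0.80, 124),
    ("GPT 5.4 mini", 0.04, 18),
    ("GPT 5 mini", 0.03, 15),
]


def estimated_run_cost(cost_per_task, task_count=TASK_COUNT):
    """Estimate the cost of one full run: average cost times task count."""
    return cost_per_task * task_count


def cost_per_call(cost_per_task, calls_per_task):
    """Average cost of a single LLM call within a task."""
    return cost_per_task / calls_per_task


if __name__ == "__main__":
    for model, cost, calls in EXTENDED_RESULTS:
        print(
            f"{model}: ~${estimated_run_cost(cost):.2f} per run, "
            f"~${cost_per_call(cost, calls):.4f} per call"
        )
```

Since the published averages are rounded to the cent, the totals for the cheapest models carry a large relative rounding error.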

Raw records for the 200 task instances

# Repository Description Lang Stars Tests Best Score Task
1 junegunn/fzf :cherry_blossom: A command-line fuzzy finder go 79,721 1,874 81.9% https://programbench.com/task/junegunn__fzf.b56d614/
2 jesseduffield/lazygit simple terminal UI for git commands go 76,901 855 56.4% https://programbench.com/task/jesseduffield__lazygit.1d0db51/
3 BurntSushi/ripgrep ripgrep recursively searches directories for a regex pattern while respecting your gitignore rs 62,855 1,994 79.7% https://programbench.com/task/burntsushi__ripgrep.3b7fd44/
4 FFmpeg/FFmpeg Mirror of https://git.ffmpeg.org/ffmpeg.git c 59,217 3,050 5.3% https://programbench.com/task/ffmpeg__ffmpeg.360a402/
5 sharkdp/bat A cat(1) clone with wings. rs 58,487 801 33.2% https://programbench.com/task/sharkdp__bat.f822bd0/
6 typst/typst A markup-based typesetting system that is powerful and easy to learn. rs 52,957 1,724 28.0% https://programbench.com/task/typst__typst.88356d0/
7 jgm/pandoc Universal markup converter hs 43,632 5,228 14.1% https://programbench.com/task/jgm__pandoc.5caad90/
8 sharkdp/fd A simple, fast and user-friendly alternative to ‘find’ rs 42,668 1,235 78.1% https://programbench.com/task/sharkdp__fd.40d8eb3/
9 php/php-src The PHP Interpreter c 40,030 14,288 4.8% https://programbench.com/task/php__php-src.c891263/
10 duckdb/duckdb DuckDB is an analytical in-process SQL database management system cpp 37,657 5,650 12.4% https://programbench.com/task/duckdb__duckdb.bdb65ec/
11 ajeetdsouza/zoxide A smarter cd command. Supports all major shells. rs 35,994 531 76.5% https://programbench.com/task/ajeetdsouza__zoxide.67ca1bc/
12 jqlang/jq Command-line JSON processor c 34,541 6,072 89.9% https://programbench.com/task/jqlang__jq.b33a763/
13 dandavison/delta A syntax-highlighting pager for git, diff, grep, rg --json, and blame output rs 30,445 950 37.3% https://programbench.com/task/dandavison__delta.acd758f/
14 sharkdp/hyperfine A command-line benchmarking tool rs 27,960 291 54.3% https://programbench.com/task/sharkdp__hyperfine.327d5f4/
15 ggreer/the_silver_searcher A code-searching tool similar to ack, but faster. c 27,080 1,006 59.3% https://programbench.com/task/ggreer__the_silver_searcher.a61f178/
16 facebook/zstd Zstandard - Fast real-time compression algorithm c 27,013 2,038 68.8% https://programbench.com/task/facebook__zstd.1168da0/
17 facebookresearch/fastText Library for fast text representation and classification. cpp 26,511 312 75.6% https://programbench.com/task/facebookresearch__fasttext.1142dc4/
18 robertdavidgraham/masscan TCP port scanner, spews SYN packets asynchronously, scanning entire Internet in under 5 minutes. c 25,544 2,549 57.0% https://programbench.com/task/robertdavidgraham__masscan.b99d433/
19 tree-sitter/tree-sitter An incremental parsing system for programming tools rs 24,953 1,232 37.2% https://programbench.com/task/tree-sitter__tree-sitter.5e23cca/
20 FiloSottile/age A simple, modern and secure encryption tool (and Go library) with small explicit keys, no config options, and UNIX-style composability. go 22,077 676 63.5% https://programbench.com/task/filosottile__age.706dfc1/
21 rust-lang/mdBook Create book from markdown files. Like Gitbook but implemented in Rust rs 21,541 1,114 55.5% https://programbench.com/task/rust-lang__mdbook.37273ba/
22 jarun/nnn n³ The unorthodox terminal file manager c 21,506 477 98.1% https://programbench.com/task/jarun__nnn.cb2c535/
23 antonmedv/fx Terminal JSON viewer & processor go 20,433 2,047 75.7% https://programbench.com/task/antonmedv__fx.86d0d34/
24 mikefarah/yq yq is a portable command-line YAML, JSON, XML, CSV, TOML, HCL and properties processor go 15,281 2,000 39.5% https://programbench.com/task/mikefarah__yq.602586d/
25 Y2Z/monolith ⬛️ CLI tool and library for saving complete web pages as a single HTML file rs 15,024 713 51.2% https://programbench.com/task/y2z__monolith.8702e66/
26 direnv/direnv unclutter your .profile go 14,998 849 62.0% https://programbench.com/task/direnv__direnv.02040c7/
27 google/brotli Brotli compression format c 14,673 441 90.7% https://programbench.com/task/google__brotli.b3dc9cc/
28 tomnomnom/gron Make JSON greppable! go 14,424 224 90.2% https://programbench.com/task/tomnomnom__gron.88a6234/
29 XAMPPRocky/tokei Count your code, quickly. rs 14,300 732 69.5% https://programbench.com/task/xampprocky__tokei.505d648/
30 ast-grep/ast-grep ⚡A CLI tool for code structural search, lint and rewriting. Written in Rust rs 13,541 882 11.9% https://programbench.com/task/ast-grep__ast-grep.dde0fe0/
31 cheat/cheat cheat allows you to create and view interactive cheatsheets on the command-line. It was designed to help remind *nix system administrators of options for commands that they use frequently, but not frequently enough to remember. go 13,278 297 59.9% https://programbench.com/task/cheat__cheat.b8098dc/
32 jonas/tig Text-mode interface for git c 13,200 1,586 83.9% https://programbench.com/task/jonas__tig.8334123/
33 ninja-build/ninja a small build system with a focus on speed cpp 12,895 1,438 72.3% https://programbench.com/task/ninja-build__ninja.cc60300/
34 Canop/broot A new way to see and navigate directory trees : https://dystroy.org/broot rs 12,619 539 67.0% https://programbench.com/task/canop__broot.d6c798e/
35 orf/gping Ping, but with a graph rs 12,433 339 78.5% https://programbench.com/task/orf__gping.26eb5b9/
36 svenstaro/genact 🌀 A nonsense activity generator rs 11,995 232 59.1% https://programbench.com/task/svenstaro__genact.16f96e3/
37 lz4/lz4 Extremely Fast Compression algorithm c 11,781 1,496 82.7% https://programbench.com/task/lz4__lz4.1519f46/
38 o2sh/onefetch Command-line Git information tool rs 11,745 1,166 81.7% https://programbench.com/task/o2sh__onefetch.e5958ce/
39 bootandy/dust A more intuitive version of du in rust rs 11,609 584 70.9% https://programbench.com/task/bootandy__dust.62bf1e1/
40 ekzhang/bore 🕳 bore is a simple CLI tool for making tunnels to localhost rs 11,075 406 68.7% https://programbench.com/task/ekzhang__bore.8e059cd/
41 BurntSushi/xsv A fast CSV command line toolkit written in Rust. rs 10,757 1,182 82.7% https://programbench.com/task/burntsushi__xsv.f430466/
42 bellard/quickjs Public repository of the QuickJS Javascript Engine. c 10,565 3,034 3.6% https://programbench.com/task/bellard__quickjs.d7ae12a/
43 hatoo/oha Ohayou(おはよう), HTTP load generator, inspired by rakyll/hey with tui animation. rs 10,201 899 72.5% https://programbench.com/task/hatoo__oha.8dc6349/
44 tstack/lnav Log file navigator cpp 10,200 990 13.4% https://programbench.com/task/tstack__lnav.ee34494/
45 sharkdp/hexyl A command-line hex viewer rs 10,086 906 82.8% https://programbench.com/task/sharkdp__hexyl.2e26437/
46 lua/lua A copy of the Lua development repository, as seen by the Lua team. Mirrored irregularly. All communication should be through the Lua mailing list https://www.lua.org/lua-l.html c 9,908 1,338 43.1% https://programbench.com/task/lua__lua.c6b4848/
47 johnkerl/miller Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON go 9,842 14,637 22.9% https://programbench.com/task/johnkerl__miller.8d85b46/
48 sqlite/sqlite Official Git mirror of the SQLite source tree c 9,434 13,514 67.0% https://programbench.com/task/sqlite__sqlite.839433d/
49 boyter/scc Sloc, Cloc and Code: scc is a very fast accurate code counter with complexity calculations and COCOMO estimates written in pure Go go 8,320 464 37.7% https://programbench.com/task/boyter__scc.515f91c/
50 ariga/atlas Declarative schema migrations with schema-as-code workflows go 8,311 1,318 54.8% https://programbench.com/task/ariga__atlas.6d81150/
51 pemistahl/grex A command-line tool and Rust library with Python bindings for generating regular expressions from user-provided test cases rs 8,103 1,312 73.9% https://programbench.com/task/pemistahl__grex.fa3e8ed/
52 htop-dev/htop htop - an interactive process viewer c 8,021 693 85.1% https://programbench.com/task/htop-dev__htop.523600b/
53 peco/peco Simplistic interactive filtering tool go 7,881 1,224 76.7% https://programbench.com/task/peco__peco.4e58dad/
54 bensadeh/tailspin 🌀 A log file highlighter rs 7,793 615 75.8% https://programbench.com/task/bensadeh__tailspin.6278437/
55 ducaale/xh Friendly and fast tool for sending HTTP requests rs 7,754 1,171 50.0% https://programbench.com/task/ducaale__xh.4a6e44f/
56 svenstaro/miniserve 🌟 For when you really just want to serve some files over HTTP right now! rs 7,561 304 78.6% https://programbench.com/task/svenstaro__miniserve.8449e8b/
57 mgdm/htmlq Like jq, but for HTML. rs 7,520 1,455 93.9% https://programbench.com/task/mgdm__htmlq.6e31bc8/
58 parcel-bundler/lightningcss An extremely fast CSS parser, transformer, bundler, and minifier written in Rust. rs 7,515 2,828 53.6% https://programbench.com/task/parcel-bundler__lightningcss.aa2ed1e/
59 universal-ctags/ctags A maintained ctags implementation c 7,149 2,258 13.3% https://programbench.com/task/universal-ctags__ctags.243595e/
60 chmln/sd Intuitive find & replace CLI (sed alternative) rs 7,072 810 90.9% https://programbench.com/task/chmln__sd.87d1ba5/
61 ogham/dog A command-line DNS client. rs 6,640 1,300 84.2% https://programbench.com/task/ogham__dog.721440b/
62 danmar/cppcheck static analysis of C/C++ code cpp 6,599 2,126 14.6% https://programbench.com/task/danmar__cppcheck.0a5b103/
63 doxygen/doxygen Official doxygen git repository c 6,422 229 34.5% https://programbench.com/task/doxygen__doxygen.966d98e/
64 sharkdp/pastel A command-line tool to generate, analyze, convert and manipulate colors rs 6,334 1,114 77.2% https://programbench.com/task/sharkdp__pastel.b60e899/
65 BLAKE3-team/BLAKE3 the official Rust and C implementations of the BLAKE3 cryptographic hash function rs 6,178 647 97.5% https://programbench.com/task/blake3-team__blake3.15e83a5/
66 Nukesor/pueue :stars: Manage your shell commands. rs 6,154 638 15.4% https://programbench.com/task/nukesor__pueue.8b9d6fe/
67 OSGeo/gdal GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats. cpp 5,875 657 25.4% https://programbench.com/task/osgeo__gdal.0847f12/
68 Byron/dua-cli View disk space usage and delete unwanted data, fast. rs 5,794 709 86.9% https://programbench.com/task/byron__dua-cli.8570c15/
69 dundee/gdu Fast disk usage analyzer with console interface written in Go go 5,578 1,161 70.1% https://programbench.com/task/dundee__gdu.ede21d2/
70 eradman/entr Run arbitrary commands when files change c 5,551 586 88.6% https://programbench.com/task/eradman__entr.8e2e8b4/
71 LuaJIT/LuaJIT Mirror of the LuaJIT git repository c 5,518 2,967 71.5% https://programbench.com/task/luajit__luajit.a553b3d/
72 mgechev/revive 🔥 ~6x faster, stricter, configurable, extensible, and beautiful drop-in replacement for golint go 5,486 727 46.4% https://programbench.com/task/mgechev__revive.201451e/
73 cweill/gotests Automatically generate Go test boilerplate from your source code. go 5,294 603 61.9% https://programbench.com/task/cweill__gotests.2a672c5/
74 cordx56/rustowl Visualize Ownership and Lifetimes in Rust rs 5,113 589 75.2% https://programbench.com/task/cordx56__rustowl.655bc5c/
75 abishekvashok/cmatrix Terminal based “The Matrix” like implementation c 5,042 508 97.0% https://programbench.com/task/abishekvashok__cmatrix.5c082c6/
76 quinn-rs/quinn Async-friendly QUIC implementation in Rust rs 5,041 522 61.7% https://programbench.com/task/quinn-rs__quinn.bb359cc/
77 alecthomas/chroma A general purpose syntax highlighter in pure Go go 4,910 515 15.9% https://programbench.com/task/alecthomas__chroma.8d04def/
78 anordal/shellharden The corrective bash syntax highlighter rs 4,778 1,095 81.7% https://programbench.com/task/anordal__shellharden.6a6ffd4/
79 yoav-lavi/melody Melody is a language that compiles to regular expressions and aims to be more readable and maintainable rs 4,748 1,205 78.9% https://programbench.com/task/yoav-lavi__melody.f4af9b4/
80 sayanarijit/xplr A hackable, minimal, fast TUI file explorer rs 4,735 463 60.5% https://programbench.com/task/sayanarijit__xplr.1751065/
81 hpjansson/chafa 📺🗿 Terminal graphics for the 21st century. c 4,648 1,931 58.4% https://programbench.com/task/hpjansson__chafa.dd4d4c1/
82 jhspetersson/fselect Find files with SQL-like queries rs 4,420 3,115 44.0% https://programbench.com/task/jhspetersson__fselect.c3559ca/
83 ivanceras/svgbob Convert your ascii diagram scribbles into happy little SVG rs 4,182 472 41.3% https://programbench.com/task/ivanceras__svgbob.6d00ad9/
84 multiprocessio/dsq Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more. go 3,867 542 80.3% https://programbench.com/task/multiprocessio__dsq.c3ae0ba/
85 rcoh/angle-grinder Slice and dice logs on the command line rs 3,727 1,130 38.0% https://programbench.com/task/rcoh__angle-grinder.9c2fc88/
86 rs/curlie The power of curl, the ease of use of httpie. go 3,637 701 89.3% https://programbench.com/task/rs__curlie.5dfcbb1/
87 antonmedv/walk Terminal file manager go 3,598 470 74.3% https://programbench.com/task/antonmedv__walk.bf802ef/
88 JohannesKaufmann/html-to-markdown ⚙️ Convert HTML to Markdown. Even works with entire websites and can be extended through rules. go 3,586 885 85.5% https://programbench.com/task/johanneskaufmann__html-to-markdown.3006818/
89 TheZoraiz/ascii-image-converter A cross-platform command-line tool to convert images into ascii art and print them on the console. Now supports braille art! go 3,284 465 64.1% https://programbench.com/task/thezoraiz__ascii-image-converter.d05a757/
90 hairyhenderson/gomplate A flexible commandline tool for template rendering. Supports lots of local and remote datasources. go 3,135 2,926 74.7% https://programbench.com/task/hairyhenderson__gomplate.05eb3aa/
91 ip7z/7zip 7-Zip cpp 2,967 1,043 33.9% https://programbench.com/task/ip7z__7zip.839151e/
92 madler/pigz A parallel implementation of gzip for modern multi-processor, multi-core machines. c 2,924 831 83.2% https://programbench.com/task/madler__pigz.fe4894f/
93 tinycc/tinycc Unofficial mirror of mob development branch c 2,843 1,978 12.8% https://programbench.com/task/tinycc__tinycc.9b8765d/
94 raviqqe/muffet Fast website link checker in Go go 2,597 293 88.1% https://programbench.com/task/raviqqe__muffet.a882908/
95 segmentio/chamber CLI for managing secrets go 2,588 1,748 82.0% https://programbench.com/task/segmentio__chamber.5f93f5f/
96 astaxie/bat Go implement CLI, cURL-like tool for humans go 2,563 1,091 71.8% https://programbench.com/task/astaxie__bat.17d1080/
97 zk-org/zk Plain text note-taking assistant go 2,542 1,108 43.1% https://programbench.com/task/zk-org__zk.10d93d5/
98 kisielk/errcheck errcheck checks that you checked errors. go 2,480 341 80.4% https://programbench.com/task/kisielk__errcheck.dacab89/
99 mkj/dropbear Dropbear SSH c 2,231 682 58.1% https://programbench.com/task/mkj__dropbear.75f699b/
100 noborus/trdsql CLI tool that can execute SQL queries on CSV, LTSV, JSON, YAML and TBLN. Can output to various formats. go 2,159 1,312 66.8% https://programbench.com/task/noborus__trdsql.d8c5ff6/
101 sheepla/pingu 🐧ping command but with pingu go 2,087 383 96.6% https://programbench.com/task/sheepla__pingu.926d475/
102 go-critic/go-critic The most opinionated Go source code linter for code audit. go 2,041 493 41.6% https://programbench.com/task/go-critic__go-critic.9aea378/
103 OSGeo/PROJ PROJ - Cartographic Projections and Coordinate Transformations Library cpp 1,974 5,319 73.8% https://programbench.com/task/osgeo__proj.75d455c/
104 noborus/ov 🎑Feature-rich terminal-based text viewer. It is a so-called terminal pager. go 1,935 1,854 87.6% https://programbench.com/task/noborus__ov.b96c2ba/
105 samtools/samtools Tools (written in C using htslib) for manipulating next-generation sequencing data c 1,886 1,425 14.2% https://programbench.com/task/samtools__samtools.aa823b5/
106 gabotechs/dep-tree Tool for helping developers keep their code bases clean and decoupled. It allows visualising a code base complexity using a 3d force-directed graph of files and the dependencies between them. go 1,706 865 65.2% https://programbench.com/task/gabotechs__dep-tree.60a95a2/
107 cmatsuoka/figlet Claudio’s FIGlet tree c 1,606 872 77.5% https://programbench.com/task/cmatsuoka__figlet.202a0a8/
108 lh3/seqtk Toolkit for processing sequences in FASTA/Q formats c 1,537 429 67.4% https://programbench.com/task/lh3__seqtk.94e7070/
109 tukaani-project/xz XZ Utils c 1,522 1,410 36.0% https://programbench.com/task/tukaani-project__xz.1007bf0/
110 skeema/skeema Declarative pure-SQL schema management for MySQL and MariaDB go 1,361 1,708 76.5% https://programbench.com/task/skeema__skeema.6a76243/
111 mfridman/tparse CLI tool for summarizing go test output. Pipe friendly. CI/CD friendly. go 1,246 425 77.6% https://programbench.com/task/mfridman__tparse.2416b4b/
112 lfos/calcurse A text-based calendar and scheduling application c 1,243 666 53.8% https://programbench.com/task/lfos__calcurse.49180d5/
113 hooklift/gowsdl WSDL2Go code generation as well as its SOAP proxy go 1,219 391 86.4% https://programbench.com/task/hooklift__gowsdl.2a06cec/
114 guumaster/hostctl Your dev tool to manage /etc/hosts like a pro! go 1,216 1,051 82.8% https://programbench.com/task/guumaster__hostctl.d6d9699/
115 rs/jplot iTerm2 expvar/JSON monitoring tool go 1,178 583 89.0% https://programbench.com/task/rs__jplot.2a54bcc/
116 naggie/dstask Git powered terminal-based todo/note manager – markdown note page per task. Single binary! go 1,157 1,278 58.8% https://programbench.com/task/naggie__dstask.ff57396/
117 sigoden/argc A Bash CLI framework, also a Bash command runner. rs 1,135 995 44.1% https://programbench.com/task/sigoden__argc.04a08f1/
118 sibprogrammer/xq Command-line XML and HTML beautifier and content extractor go 1,109 792 75.9% https://programbench.com/task/sibprogrammer__xq.b89f681/
119 xorg62/tty-clock Clock using lib ncurses c 1,105 281 84.0% https://programbench.com/task/xorg62__tty-clock.f2f847c/
120 unhappychoice/gittype A CLI code-typing game that turns your source code into typing challenges rs 1,075 741 91.3% https://programbench.com/task/unhappychoice__gittype.34b72d0/
121 eudoxia0/hashcards A plain text-based spaced repetition system. rs 1,071 1,151 56.3% https://programbench.com/task/eudoxia0__hashcards.48aa136/
122 rvben/rumdl Fast Markdown linter and formatter written in Rust rs 1,051 3,322 40.7% https://programbench.com/task/rvben__rumdl.2d75c4d/
123 sclevine/yj CLI - Convert between YAML, TOML, JSON, and HCL. Preserves map order. go 1,041 767 74.4% https://programbench.com/task/sclevine__yj.8016400/
124 arq5x/bedtools2 bedtools - the swiss army knife for genome arithmetic c 1,029 1,053 38.9% https://programbench.com/task/arq5x__bedtools2.dd57059/
125 cslarsen/jp2a Converts jpg images to ASCII c 1,021 631 56.1% https://programbench.com/task/cslarsen__jp2a.61d205f/
126 blacknon/hwatch A modern alternative to the watch command, records the differences in execution results and can check this differences at after. rs 1,016 1,016 81.1% https://programbench.com/task/blacknon__hwatch.edfcb62/
127 eliukblau/pixterm Draw images in your ANSI terminal with true color go 1,014 430 74.9% https://programbench.com/task/eliukblau__pixterm.1a93fd5/
128 Canop/rhit A nginx log explorer rs 1,006 817 53.2% https://programbench.com/task/canop__rhit.ae90bcb/
129 stathissideris/ditaa ditaa is a small command-line utility that can convert diagrams drawn using ascii art (‘drawings’ that contain characters that resemble lines like | / - ), into proper bitmap graphics. java 1,005 609 20.4% https://programbench.com/task/stathissideris__ditaa.f2286c4/
130 rbakbashev/elfcat ELF visualizer. Generates HTML files from ELF binaries. rs 990 564 98.2% https://programbench.com/task/rbakbashev__elfcat.52f8cc7/
131 nuta/nsh A command-line shell like fish, but POSIX compatible. rs 966 1,963 83.7% https://programbench.com/task/nuta__nsh.bdd0702/
132 dalance/amber A code search / replace tool rs 941 567 71.1% https://programbench.com/task/dalance__amber.69a0f52/
133 pls-rs/pls pls is a prettier and powerful ls(1) for the pros. rs 932 332 62.3% https://programbench.com/task/pls-rs__pls.4e1ae50/
134 Esubaalew/run Universal multi-language runner and smart REPL written in Rust. rs 919 1,212 85.2% https://programbench.com/task/esubaalew__run.0fb9dec/
135 chirlu/sox SoX, Swiss Army knife of sound processing c 913 1,202 37.9% https://programbench.com/task/chirlu__sox.42b3557/
136 clog-tool/clog-cli Generate beautiful changelogs from your Git commit history rs 912 575 93.0% https://programbench.com/task/clog-tool__clog-cli.7066cba/
137 tarka/xcp An extended cp rs 911 1,184 92.6% https://programbench.com/task/tarka__xcp.5e5b448/
138 oppiliappan/eva a calculator REPL, similar to bc(1) rs 907 913 88.7% https://programbench.com/task/oppiliappan__eva.41ae245/
139 git-bahn/git-graph Command line tool to show clear git graphs arranged for your branching model rs 904 568 79.6% https://programbench.com/task/git-bahn__git-graph.87b4473/
140 gromacs/gromacs Public/backup repository of the GROMACS molecular simulation toolkit. Please do not mine the metadata blindly; we use https://gitlab.com/gromacs/gromacs for code review and issue tracking. cpp 901 1,245 9.3% https://programbench.com/task/gromacs__gromacs.665ea4c/
141 sirwart/ripsecrets A command-line tool to prevent committing secret keys into your source code rs 901 611 72.8% https://programbench.com/task/sirwart__ripsecrets.34c9e03/
142 Drew-Alleman/DataSurgeon Quickly Extracts IP’s, Email Addresses, Hashes, Files, Credit Cards, Social Security Numbers and a lot More From Text rs 890 502 74.3% https://programbench.com/task/drew-alleman__datasurgeon.d257cee/
143 alexpovel/srgn A grep-like tool which understands source code syntax and allows for manipulation in addition to search rs 889 1,852 69.5% https://programbench.com/task/alexpovel__srgn.89f943b/
144 kyoheiu/felix tui file manager with vim-like key mapping rs 888 502 49.2% https://programbench.com/task/kyoheiu__felix.95df390/
145 oppiliappan/statix lints and suggestions for the nix programming language rs 882 815 42.8% https://programbench.com/task/oppiliappan__statix.e9df54c/
146 nachoparker/dutree a tool to analyze file system usage written in Rust rs 871 641 89.5% https://programbench.com/task/nachoparker__dutree.44e877d/
147 simeg/eureka 💡 CLI tool to input and store your ideas without leaving the terminal rs 867 344 78.8% https://programbench.com/task/simeg__eureka.df3796c/
148 kyoh86/richgo Enrich go test outputs with text decorations. go 863 546 85.0% https://programbench.com/task/kyoh86__richgo.313114f/
149 rochacbruno/marmite Markdown makes sites - A Static Site Generator for Blogs rs 837 668 45.4% https://programbench.com/task/rochacbruno__marmite.7d4bc2d/
150 rust-embedded/svd2rust Generate Rust register maps (structs) from SVD files rs 835 920 72.9% https://programbench.com/task/rust-embedded__svd2rust.1760b5e/
151 konradsz/igrep Interactive Grep rs 827 385 73.5% https://programbench.com/task/konradsz__igrep.aa75630/
152 nikolassv/bartib A simple timetracker for the command line. It saves a log of all tracked activities as a plaintext file and allows you to create flexible reports. rs 827 722 87.3% https://programbench.com/task/nikolassv__bartib.6b9b5ce/
153 yassinebridi/serpl A simple terminal UI for search and replace, ala VS Code. rs 824 446 61.0% https://programbench.com/task/yassinebridi__serpl.c48a9d7/
154 riquito/tuc When cut doesn’t cut it rs 820 1,196 92.7% https://programbench.com/task/riquito__tuc.16fb471/
155 ecumene/rust-sloth A 3D software rasterizer… for the terminal! rs 818 380 52.6% https://programbench.com/task/ecumene__rust-sloth.051c559/
156 crowdagger/crowbook Converts books written in Markdown to HTML, LaTeX/PDF and EPUB rs 813 807 60.3% https://programbench.com/task/crowdagger__crowbook.ea214d7/
157 WGUNDERWOOD/tex-fmt An extremely fast LaTeX formatter written in Rust rs 789 455 80.7% https://programbench.com/task/wgunderwood__tex-fmt.3f1aef6/
158 Stranger6667/jsonschema A high-performance JSON Schema validator for Rust rs 770 2,933 51.7% https://programbench.com/task/stranger6667__jsonschema.d52e881/
159 rhysd/kiro-editor A small terminal UTF-8 text editor written in Rust 📝🦀 rs 761 595 93.3% https://programbench.com/task/rhysd__kiro-editor.4157485/
160 astro/deadnix Scan Nix files for dead code rs 745 602 85.5% https://programbench.com/task/astro__deadnix.d590041/
161 sstadick/hck A sharp cut(1) clone. rs 738 855 95.7% https://programbench.com/task/sstadick__hck.b66c751/
162 trasta298/keifu Git genealogy, untangled. A TUI for navigating commit graphs with color and clarity. rs 729 262 67.2% https://programbench.com/task/trasta298__keifu.3331426/
163 AmmarAbouZor/tui-journal Your journal app if you live in a terminal rs 722 1,402 70.8% https://programbench.com/task/ammarabouzor__tui-journal.2b4540d/
164 incu6us/goimports-reviser Right imports sorting & code formatting tool (goimports alternative) go 715 513 86.4% https://programbench.com/task/incu6us__goimports-reviser.81bd549/
165 yaa110/nomino Batch rename utility for developers rs 710 313 79.9% https://programbench.com/task/yaa110__nomino.f892499/
166 wfxr/csview 📠 Pretty and fast csv viewer for cli with cjk/emoji support. rs 694 335 96.1% https://programbench.com/task/wfxr__csview.8ac4de0/
167 chmln/handlr A better xdg-utils rs 693 722 90.7% https://programbench.com/task/chmln__handlr.90e78ba/
168 Miserlou/Loop UNIX’s missing loop command rs 692 710 94.6% https://programbench.com/task/miserlou__loop.209927c/
169 KSXGitHub/parallel-disk-usage Highly parallelized, blazing fast directory tree analyzer rs 689 531 86.1% https://programbench.com/task/ksxgithub__parallel-disk-usage.96978ed/
170 hush-shell/hush Hush is a unix shell based on the Lua programming language rs 688 1,201 83.3% https://programbench.com/task/hush-shell__hush.560c33a/
171 zevv/duc Dude, where are my bytes: Duc, a library and suite of tools for inspecting disk usage c 682 874 83.4% https://programbench.com/task/zevv__duc.a58fa4e/
172 altdesktop/i3-style 🎨 Make your i3 config a little more stylish. rs 678 539 80.0% https://programbench.com/task/altdesktop__i3-style.f93821b/
173 wintermute-cell/ngrrram A TUI tool to help you type faster and learn new layouts. Includes a free cat. rs 674 303 84.5% https://programbench.com/task/wintermute-cell__ngrrram.8ea13c3/
174 psampaz/go-mod-outdated Find outdated dependencies of your Go projects. go-mod-outdated provides a table view of the go list -u -m -json all command which lists all dependencies of a Go project and their available minor and patch updates. It also provides a way to filter indirect dependencies and dependencies without updates. go 669 285 98.2% https://programbench.com/task/psampaz__go-mod-outdated.bb79367/
175 wfxr/code-minimap 🛰 A high performance code minimap render. rs 660 313 88.8% https://programbench.com/task/wfxr__code-minimap.0ddeea5/
176 kaushiksrini/parqeye Peek inside Parquet files right from your terminal rs 654 479 58.9% https://programbench.com/task/kaushiksrini__parqeye.8072121/
177 stacked-git/stgit Stacked Git rs 652 1,488 20.0% https://programbench.com/task/stacked-git__stgit.430027d/
178 Isona/dirble Fast directory scanning and scraping tool rs 632 718 66.7% https://programbench.com/task/isona__dirble.e2dea9f/
179 YS-L/flamelens Flamegraph viewer in the terminal rs 622 224 59.4% https://programbench.com/task/ys-l__flamelens.0b4dc33/
180 mookid/diffr Yet another diff highlighting tool rs 612 606 84.7% https://programbench.com/task/mookid__diffr.2152742/
181 shashwatah/jot ⚡Rapid note management for the terminal. rs 609 752 84.6% https://programbench.com/task/shashwatah__jot.a92aad8/
182 Epistates/treemd A (TUI/CLI) markdown navigator with tree-based structural navigation. rs 603 1,569 55.1% https://programbench.com/task/epistates__treemd.825c6dd/
183 pier-cli/pier A CLI to organize and run short Unix shell scripts rs 596 692 83.7% https://programbench.com/task/pier-cli__pier.5e1bde9/
184 jrnxf/thokr ✨ sleek typing tui with visualized results and historical logging rs 595 445 82.2% https://programbench.com/task/jrnxf__thokr.09375ef/
185 ismaelgv/rnr A command-line tool to batch rename files and directories rs 581 683 82.1% https://programbench.com/task/ismaelgv__rnr.fc0733b/
186 sitkevij/hex 🔮 Futuristic take on hexdump, made in Rust. rs 563 823 91.7% https://programbench.com/task/sitkevij__hex.61ae69b/
187 brocode/fblog Small command-line JSON Log viewer rs 561 978 86.0% https://programbench.com/task/brocode__fblog.3b54330/
188 codesnap-rs/codesnap 🦀️📸 Pure Rust tool to generate beautiful code snapshots, provide CLI and Library rs 557 730 59.2% https://programbench.com/task/codesnap-rs__codesnap.f81e4f3/
189 foriequal0/git-trim Automatically trims your branches whose tracking remote refs are merged or stray rs 548 509 64.6% https://programbench.com/task/foriequal0__git-trim.07c2f50/
190 axodotdev/oranda 🎁 generate beautiful landing pages for your developer tools rs 542 767 53.6% https://programbench.com/task/axodotdev__oranda.27d60c7/
191 elkowar/pipr A tool to interactively write shell pipelines. rs 541 525 57.1% https://programbench.com/task/elkowar__pipr.fae0b17/
192 paradigmxyz/solar Blazingly fast, modular and contributor friendly Solidity compiler, written in Rust rs 539 1,978 43.3% https://programbench.com/task/paradigmxyz__solar.5190d0e/
193 Lymphatus/caesium-clt Caesium Command Line Tools - Lossy/lossless image compression tool rs 537 575 92.3% https://programbench.com/task/lymphatus__caesium-clt.a529b2e/
194 agourlay/zip-password-finder Find the password of protected ZIP files. rs 534 680 97.9% https://programbench.com/task/agourlay__zip-password-finder.704700d/
195 rust-ethereum/ethabi Encode and decode smart contract invocations rs 525 997 90.9% https://programbench.com/task/rust-ethereum__ethabi.b1710ad/
196 ArthurSonzogni/json-tui A JSON terminal UI made in C++ cpp 438 755 71.0% https://programbench.com/task/arthursonzogni__json-tui.17a22b6/
197 tomarrell/wrapcheck A Go linter to check that errors from external packages are wrapped go 374 480 80.8% https://programbench.com/task/tomarrell__wrapcheck.c058da1/
198 NikolaDucak/caps-log A small TUI journaling tool. 📖 cpp 370 551 61.7% https://programbench.com/task/nikoladucak__caps-log.2cf2d1e/
199 mibk/dupl a tool for code clone detection go 367 373 85.0% https://programbench.com/task/mibk__dupl.1bf052b/
200 HaliteChallenge/Halite @twosigma’s first artificial intelligence programming challenge cpp 202 275 80.4% https://programbench.com/task/halitechallenge__halite.822cfb6/
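
The rows above are space-separated, and the Description column itself contains spaces, so splitting on whitespace does not recover the columns. One workable approach, sketched below in Python, is to anchor a regular expression on the fixed-shape fields at the end of each row (language, stars, tests, score, URL) and let the description absorb whatever sits in between. The function name `parse_task_row` and the output field names are assumptions for illustration, not part of any ProgramBench tooling.

```python
import re

# Columns: # Repository Description Lang Stars Tests Best Score Task.
# The description is free text, so the pattern pins down the trailing
# fields and lets the greedy description group take the rest.
ROW = re.compile(
    r"^(?P<rank>\d+) (?P<repo>\S+) (?P<description>.*) (?P<lang>\S+) "
    r"(?P<stars>[\d,]+) (?P<tests>[\d,]+) (?P<best_score>\d+(?:\.\d+)?)% "
    r"(?P<url>https://\S+)$"
)


def parse_task_row(line):
    """Parse one flattened task row into a dict of typed fields."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    fields = match.groupdict()
    return {
        "rank": int(fields["rank"]),
        "repo": fields["repo"],
        "description": fields["description"],
        "lang": fields["lang"],
        "stars": int(fields["stars"].replace(",", "")),
        "tests": int(fields["tests"].replace(",", "")),
        "best_score": float(fields["best_score"]),
        "url": fields["url"],
    }


if __name__ == "__main__":
    row = (
        "12 jqlang/jq Command-line JSON processor c 34,541 6,072 89.9% "
        "https://programbench.com/task/jqlang__jq.b33a763/"
    )
    print(parse_task_row(row))
```

Once parsed, the records can be sorted or grouped by language, test count, or best score without retyping the table.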

How to read this data

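On the main leaderboard, all 9 models have a Resolved rate of 0%: none of them fully rebuilt even one of the 200 programs. This suggests that, under a uniform lightweight agent setup, current models cannot yet rebuild complete software from black-box behavior and documentation alone.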

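Almost resolved still separates the models. Claude Opus 4.7 reaches 3.0%, Claude Opus 4.6 reaches 2.5%, and Claude Sonnet 4.6 reaches 1.0%, which on 200 tasks corresponds to 6, 5, and 2 tasks; the remaining models are at 0.0%. This metric is better suited to observing how close a model gets to completion than a pass-or-fail count alone.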

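The task instance table matters too. It lists each open-source project's language, star count, test count, and current best score, and it shows that ProgramBench covers compression, search, databases, compilers, command-line tools, media processing, and other kinds of software. For AI coding, this is closer to real engineering pressure than standalone algorithm problems.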
