ProgramBench is a new benchmark for evaluating AI programming ability. Rather than asking the model to fix a bug inside an existing repository, it asks the model to rebuild from scratch a program with equivalent behavior, given only a compiled executable and its usage documentation.
This article serves as a data reference, with minimal explanation. The tables below preserve the original records published on the ProgramBench site to make later citation and comparison easier. The sources are the ProgramBench homepage, Extended Results, and Task Instances pages. The data was retrieved at 2026-05-10T12:42:41+08:00.
Data criteria
- Resolved: share of tasks that pass all of the hidden behavioral tests.
- Almost resolved: share of tasks that pass at least 95% of the behavioral tests.
- Cost: mean API cost per task instance, in US dollars.
- Calls: mean number of LLM calls per task instance.
- All models were evaluated with mini-SWE-agent on 200 tasks.
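
The two headline metrics above can be sketched in a few lines. This is a minimal illustration, not ProgramBench's own scoring code: the `TaskResult` record, its field names, and the `summarize` function are hypothetical, and the sketch assumes that a fully resolved task also counts toward "almost resolved", since it passes at least 95% of its tests.

```python
from dataclasses import dataclass

# Threshold from the criteria above: at least 95% of the behavioral tests.
ALMOST_THRESHOLD_PERCENT = 95


@dataclass
class TaskResult:
    """Hypothetical per-task record: how many hidden tests passed out of the total."""
    task: str
    passed: int
    total: int


def summarize(results):
    """Return the Resolved and Almost resolved shares as fractions of all tasks."""
    n = len(results)
    if n == 0:
        return {"resolved": 0.0, "almost_resolved": 0.0}
    # Resolved: every hidden test passes.
    resolved = sum(1 for r in results if r.total > 0 and r.passed == r.total)
    # Almost resolved: integer comparison avoids floating-point edge cases at exactly 95%.
    almost = sum(
        1
        for r in results
        if r.total > 0 and r.passed * 100 >= ALMOST_THRESHOLD_PERCENT * r.total
    )
    return {"resolved": resolved / n, "almost_resolved": almost / n}
```

Under this reading, a run can score 0% Resolved and still show a nonzero Almost resolved share, which is the pattern in the leaderboard below.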
Main leaderboard
| # | Model | Provider | Agent | Resolved | Almost resolved | Run |
|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.7 | Anthropic | mini-SWE-agent | 0% | 3.0% | https://programbench.com/run/claude-opus-4-7/ |
| 2 | Claude Opus 4.6 | Anthropic | mini-SWE-agent | 0% | 2.5% | https://programbench.com/run/claude-opus-4-6/ |
| 3 | Claude Sonnet 4.6 | Anthropic | mini-SWE-agent | 0% | 1.0% | https://programbench.com/run/claude-sonnet-4-6/ |
| 4 | GPT 5.4 | OpenAI | mini-SWE-agent | 0% | 0.0% | https://programbench.com/run/gpt-5-4/ |
| 5 | Gemini 3.1 Pro | Google | mini-SWE-agent | 0% | 0.0% | https://programbench.com/run/gemini-3-1-pro/ |
| 6 | Gemini 3 Flash | Google | mini-SWE-agent | 0% | 0.0% | https://programbench.com/run/gemini-3-flash/ |
| 7 | Claude Haiku 4.5 | Anthropic | mini-SWE-agent | 0% | 0.0% | https://programbench.com/run/claude-haiku-4-5/ |
| 8 | GPT 5.4 mini | OpenAI | mini-SWE-agent | 0% | 0.0% | https://programbench.com/run/gpt-5-4-mini/ |
| 9 | GPT 5 mini | OpenAI | mini-SWE-agent | 0% | 0.0% | https://programbench.com/run/gpt-5-mini/ |
Extended results
| # | Model | Provider | Agent | Resolved | Almost resolved | Cost | Calls | Run |
|---|---|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.7 | Anthropic | mini-SWE-agent | 0% | 3.0% | $3.81 | 93 | https://programbench.com/run/claude-opus-4-7/ |
| 2 | Claude Opus 4.6 | Anthropic | mini-SWE-agent | 0% | 2.5% | $11.38 | 260 | https://programbench.com/run/claude-opus-4-6/ |
| 3 | Claude Sonnet 4.6 | Anthropic | mini-SWE-agent | 0% | 1.0% | $26.73 | 472 | https://programbench.com/run/claude-sonnet-4-6/ |
| 4 | GPT 5.4 | OpenAI | mini-SWE-agent | 0% | 0.0% | $0.33 | 16 | https://programbench.com/run/gpt-5-4/ |
| 5 | Gemini 3.1 Pro | Google | mini-SWE-agent | 0% | 0.0% | $1.51 | 94 | https://programbench.com/run/gemini-3-1-pro/ |
| 6 | Gemini 3 Flash | Google | mini-SWE-agent | 0% | 0.0% | $0.30 | 85 | https://programbench.com/run/gemini-3-flash/ |
| 7 | Claude Haiku 4.5 | Anthropic | mini-SWE-agent | 0% | 0.0% | $0.80 | 124 | https://programbench.com/run/claude-haiku-4-5/ |
| 8 | GPT 5.4 mini | OpenAI | mini-SWE-agent | 0% | 0.0% | $0.04 | 18 | https://programbench.com/run/gpt-5-4-mini/ |
| 9 | GPT 5 mini | OpenAI | mini-SWE-agent | 0% | 0.0% | $0.03 | 15 | https://programbench.com/run/gpt-5-mini/ |
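
Because Cost and Calls are both means over the same 200 tasks, dividing one by the other gives the run's overall cost per LLM call. The sketch below derives that figure from the values in the table above. The `RUNS` list and the function names are illustrative only, and since the published values are rounded, the ratios are approximate.

```python
# (model, mean cost in USD per task, mean LLM calls per task),
# copied from the extended results table above.
RUNS = [
    ("Claude Opus 4.7", 3.81, 93),
    ("Claude Opus 4.6", 11.38, 260),
    ("Claude Sonnet 4.6", 26.73, 472),
    ("GPT 5.4", 0.33, 16),
    ("Gemini 3.1 Pro", 1.51, 94),
    ("Gemini 3 Flash", 0.30, 85),
    ("Claude Haiku 4.5", 0.80, 124),
    ("GPT 5.4 mini", 0.04, 18),
    ("GPT 5 mini", 0.03, 15),
]


def cost_per_call(cost, calls):
    """Mean cost divided by mean calls, i.e. total cost over total calls."""
    return cost / calls


def rank_by_cost_per_call(runs):
    """Return (model, cost per call) pairs, cheapest per call first."""
    pairs = [(model, cost_per_call(cost, calls)) for model, cost, calls in runs]
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for model, value in rank_by_cost_per_call(RUNS):
        print(f"{model}: ${value:.4f} per call")
```

This figure separates runs that are expensive because they make many calls from runs where each call is itself costly.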
Original records of the 200 task instances
| # | Repository | Description | Lang | Stars | Tests | Best Score | Task |
|---|---|---|---|---|---|---|---|
| 1 | junegunn/fzf | :cherry_blossom: A command-line fuzzy finder | go | 79,721 | 1,874 | 81.9% | https://programbench.com/task/junegunn__fzf.b56d614/ |
| 2 | jesseduffield/lazygit | simple terminal UI for git commands | go | 76,901 | 855 | 56.4% | https://programbench.com/task/jesseduffield__lazygit.1d0db51/ |
| 3 | BurntSushi/ripgrep | ripgrep recursively searches directories for a regex pattern while respecting your gitignore | rs | 62,855 | 1,994 | 79.7% | https://programbench.com/task/burntsushi__ripgrep.3b7fd44/ |
| 4 | FFmpeg/FFmpeg | Mirror of https://git.ffmpeg.org/ffmpeg.git | c | 59,217 | 3,050 | 5.3% | https://programbench.com/task/ffmpeg__ffmpeg.360a402/ |
| 5 | sharkdp/bat | A cat(1) clone with wings. | rs | 58,487 | 801 | 33.2% | https://programbench.com/task/sharkdp__bat.f822bd0/ |
| 6 | typst/typst | A markup-based typesetting system that is powerful and easy to learn. | rs | 52,957 | 1,724 | 28.0% | https://programbench.com/task/typst__typst.88356d0/ |
| 7 | jgm/pandoc | Universal markup converter | hs | 43,632 | 5,228 | 14.1% | https://programbench.com/task/jgm__pandoc.5caad90/ |
| 8 | sharkdp/fd | A simple, fast and user-friendly alternative to ‘find’ | rs | 42,668 | 1,235 | 78.1% | https://programbench.com/task/sharkdp__fd.40d8eb3/ |
| 9 | php/php-src | The PHP Interpreter | c | 40,030 | 14,288 | 4.8% | https://programbench.com/task/php__php-src.c891263/ |
| 10 | duckdb/duckdb | DuckDB is an analytical in-process SQL database management system | cpp | 37,657 | 5,650 | 12.4% | https://programbench.com/task/duckdb__duckdb.bdb65ec/ |
| 11 | ajeetdsouza/zoxide | A smarter cd command. Supports all major shells. | rs | 35,994 | 531 | 76.5% | https://programbench.com/task/ajeetdsouza__zoxide.67ca1bc/ |
| 12 | jqlang/jq | Command-line JSON processor | c | 34,541 | 6,072 | 89.9% | https://programbench.com/task/jqlang__jq.b33a763/ |
| 13 | dandavison/delta | A syntax-highlighting pager for git, diff, grep, rg --json, and blame output | rs | 30,445 | 950 | 37.3% | https://programbench.com/task/dandavison__delta.acd758f/ |
| 14 | sharkdp/hyperfine | A command-line benchmarking tool | rs | 27,960 | 291 | 54.3% | https://programbench.com/task/sharkdp__hyperfine.327d5f4/ |
| 15 | ggreer/the_silver_searcher | A code-searching tool similar to ack, but faster. | c | 27,080 | 1,006 | 59.3% | https://programbench.com/task/ggreer__the_silver_searcher.a61f178/ |
| 16 | facebook/zstd | Zstandard - Fast real-time compression algorithm | c | 27,013 | 2,038 | 68.8% | https://programbench.com/task/facebook__zstd.1168da0/ |
| 17 | facebookresearch/fastText | Library for fast text representation and classification. | cpp | 26,511 | 312 | 75.6% | https://programbench.com/task/facebookresearch__fasttext.1142dc4/ |
| 18 | robertdavidgraham/masscan | TCP port scanner, spews SYN packets asynchronously, scanning entire Internet in under 5 minutes. | c | 25,544 | 2,549 | 57.0% | https://programbench.com/task/robertdavidgraham__masscan.b99d433/ |
| 19 | tree-sitter/tree-sitter | An incremental parsing system for programming tools | rs | 24,953 | 1,232 | 37.2% | https://programbench.com/task/tree-sitter__tree-sitter.5e23cca/ |
| 20 | FiloSottile/age | A simple, modern and secure encryption tool (and Go library) with small explicit keys, no config options, and UNIX-style composability. | go | 22,077 | 676 | 63.5% | https://programbench.com/task/filosottile__age.706dfc1/ |
| 21 | rust-lang/mdBook | Create book from markdown files. Like Gitbook but implemented in Rust | rs | 21,541 | 1,114 | 55.5% | https://programbench.com/task/rust-lang__mdbook.37273ba/ |
| 22 | jarun/nnn | n³ The unorthodox terminal file manager | c | 21,506 | 477 | 98.1% | https://programbench.com/task/jarun__nnn.cb2c535/ |
| 23 | antonmedv/fx | Terminal JSON viewer & processor | go | 20,433 | 2,047 | 75.7% | https://programbench.com/task/antonmedv__fx.86d0d34/ |
| 24 | mikefarah/yq | yq is a portable command-line YAML, JSON, XML, CSV, TOML, HCL and properties processor | go | 15,281 | 2,000 | 39.5% | https://programbench.com/task/mikefarah__yq.602586d/ |
| 25 | Y2Z/monolith | ⬛️ CLI tool and library for saving complete web pages as a single HTML file | rs | 15,024 | 713 | 51.2% | https://programbench.com/task/y2z__monolith.8702e66/ |
| 26 | direnv/direnv | unclutter your .profile | go | 14,998 | 849 | 62.0% | https://programbench.com/task/direnv__direnv.02040c7/ |
| 27 | google/brotli | Brotli compression format | c | 14,673 | 441 | 90.7% | https://programbench.com/task/google__brotli.b3dc9cc/ |
| 28 | tomnomnom/gron | Make JSON greppable! | go | 14,424 | 224 | 90.2% | https://programbench.com/task/tomnomnom__gron.88a6234/ |
| 29 | XAMPPRocky/tokei | Count your code, quickly. | rs | 14,300 | 732 | 69.5% | https://programbench.com/task/xampprocky__tokei.505d648/ |
| 30 | ast-grep/ast-grep | ⚡A CLI tool for code structural search, lint and rewriting. Written in Rust | rs | 13,541 | 882 | 11.9% | https://programbench.com/task/ast-grep__ast-grep.dde0fe0/ |
| 31 | cheat/cheat | cheat allows you to create and view interactive cheatsheets on the command-line. It was designed to help remind *nix system administrators of options for commands that they use frequently, but not frequently enough to remember. | go | 13,278 | 297 | 59.9% | https://programbench.com/task/cheat__cheat.b8098dc/ |
| 32 | jonas/tig | Text-mode interface for git | c | 13,200 | 1,586 | 83.9% | https://programbench.com/task/jonas__tig.8334123/ |
| 33 | ninja-build/ninja | a small build system with a focus on speed | cpp | 12,895 | 1,438 | 72.3% | https://programbench.com/task/ninja-build__ninja.cc60300/ |
| 34 | Canop/broot | A new way to see and navigate directory trees : https://dystroy.org/broot | rs | 12,619 | 539 | 67.0% | https://programbench.com/task/canop__broot.d6c798e/ |
| 35 | orf/gping | Ping, but with a graph | rs | 12,433 | 339 | 78.5% | https://programbench.com/task/orf__gping.26eb5b9/ |
| 36 | svenstaro/genact | 🌀 A nonsense activity generator | rs | 11,995 | 232 | 59.1% | https://programbench.com/task/svenstaro__genact.16f96e3/ |
| 37 | lz4/lz4 | Extremely Fast Compression algorithm | c | 11,781 | 1,496 | 82.7% | https://programbench.com/task/lz4__lz4.1519f46/ |
| 38 | o2sh/onefetch | Command-line Git information tool | rs | 11,745 | 1,166 | 81.7% | https://programbench.com/task/o2sh__onefetch.e5958ce/ |
| 39 | bootandy/dust | A more intuitive version of du in rust | rs | 11,609 | 584 | 70.9% | https://programbench.com/task/bootandy__dust.62bf1e1/ |
| 40 | ekzhang/bore | 🕳 bore is a simple CLI tool for making tunnels to localhost | rs | 11,075 | 406 | 68.7% | https://programbench.com/task/ekzhang__bore.8e059cd/ |
| 41 | BurntSushi/xsv | A fast CSV command line toolkit written in Rust. | rs | 10,757 | 1,182 | 82.7% | https://programbench.com/task/burntsushi__xsv.f430466/ |
| 42 | bellard/quickjs | Public repository of the QuickJS Javascript Engine. | c | 10,565 | 3,034 | 3.6% | https://programbench.com/task/bellard__quickjs.d7ae12a/ |
| 43 | hatoo/oha | Ohayou(おはよう), HTTP load generator, inspired by rakyll/hey with tui animation. | rs | 10,201 | 899 | 72.5% | https://programbench.com/task/hatoo__oha.8dc6349/ |
| 44 | tstack/lnav | Log file navigator | cpp | 10,200 | 990 | 13.4% | https://programbench.com/task/tstack__lnav.ee34494/ |
| 45 | sharkdp/hexyl | A command-line hex viewer | rs | 10,086 | 906 | 82.8% | https://programbench.com/task/sharkdp__hexyl.2e26437/ |
| 46 | lua/lua | A copy of the Lua development repository, as seen by the Lua team. Mirrored irregularly. All communication should be through the Lua mailing list https://www.lua.org/lua-l.html | c | 9,908 | 1,338 | 43.1% | https://programbench.com/task/lua__lua.c6b4848/ |
| 47 | johnkerl/miller | Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON | go | 9,842 | 14,637 | 22.9% | https://programbench.com/task/johnkerl__miller.8d85b46/ |
| 48 | sqlite/sqlite | Official Git mirror of the SQLite source tree | c | 9,434 | 13,514 | 67.0% | https://programbench.com/task/sqlite__sqlite.839433d/ |
| 49 | boyter/scc | Sloc, Cloc and Code: scc is a very fast accurate code counter with complexity calculations and COCOMO estimates written in pure Go | go | 8,320 | 464 | 37.7% | https://programbench.com/task/boyter__scc.515f91c/ |
| 50 | ariga/atlas | Declarative schema migrations with schema-as-code workflows | go | 8,311 | 1,318 | 54.8% | https://programbench.com/task/ariga__atlas.6d81150/ |
| 51 | pemistahl/grex | A command-line tool and Rust library with Python bindings for generating regular expressions from user-provided test cases | rs | 8,103 | 1,312 | 73.9% | https://programbench.com/task/pemistahl__grex.fa3e8ed/ |
| 52 | htop-dev/htop | htop - an interactive process viewer | c | 8,021 | 693 | 85.1% | https://programbench.com/task/htop-dev__htop.523600b/ |
| 53 | peco/peco | Simplistic interactive filtering tool | go | 7,881 | 1,224 | 76.7% | https://programbench.com/task/peco__peco.4e58dad/ |
| 54 | bensadeh/tailspin | 🌀 A log file highlighter | rs | 7,793 | 615 | 75.8% | https://programbench.com/task/bensadeh__tailspin.6278437/ |
| 55 | ducaale/xh | Friendly and fast tool for sending HTTP requests | rs | 7,754 | 1,171 | 50.0% | https://programbench.com/task/ducaale__xh.4a6e44f/ |
| 56 | svenstaro/miniserve | 🌟 For when you really just want to serve some files over HTTP right now! | rs | 7,561 | 304 | 78.6% | https://programbench.com/task/svenstaro__miniserve.8449e8b/ |
| 57 | mgdm/htmlq | Like jq, but for HTML. | rs | 7,520 | 1,455 | 93.9% | https://programbench.com/task/mgdm__htmlq.6e31bc8/ |
| 58 | parcel-bundler/lightningcss | An extremely fast CSS parser, transformer, bundler, and minifier written in Rust. | rs | 7,515 | 2,828 | 53.6% | https://programbench.com/task/parcel-bundler__lightningcss.aa2ed1e/ |
| 59 | universal-ctags/ctags | A maintained ctags implementation | c | 7,149 | 2,258 | 13.3% | https://programbench.com/task/universal-ctags__ctags.243595e/ |
| 60 | chmln/sd | Intuitive find & replace CLI (sed alternative) | rs | 7,072 | 810 | 90.9% | https://programbench.com/task/chmln__sd.87d1ba5/ |
| 61 | ogham/dog | A command-line DNS client. | rs | 6,640 | 1,300 | 84.2% | https://programbench.com/task/ogham__dog.721440b/ |
| 62 | danmar/cppcheck | static analysis of C/C++ code | cpp | 6,599 | 2,126 | 14.6% | https://programbench.com/task/danmar__cppcheck.0a5b103/ |
| 63 | doxygen/doxygen | Official doxygen git repository | c | 6,422 | 229 | 34.5% | https://programbench.com/task/doxygen__doxygen.966d98e/ |
| 64 | sharkdp/pastel | A command-line tool to generate, analyze, convert and manipulate colors | rs | 6,334 | 1,114 | 77.2% | https://programbench.com/task/sharkdp__pastel.b60e899/ |
| 65 | BLAKE3-team/BLAKE3 | the official Rust and C implementations of the BLAKE3 cryptographic hash function | rs | 6,178 | 647 | 97.5% | https://programbench.com/task/blake3-team__blake3.15e83a5/ |
| 66 | Nukesor/pueue | :stars: Manage your shell commands. | rs | 6,154 | 638 | 15.4% | https://programbench.com/task/nukesor__pueue.8b9d6fe/ |
| 67 | OSGeo/gdal | GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats. | cpp | 5,875 | 657 | 25.4% | https://programbench.com/task/osgeo__gdal.0847f12/ |
| 68 | Byron/dua-cli | View disk space usage and delete unwanted data, fast. | rs | 5,794 | 709 | 86.9% | https://programbench.com/task/byron__dua-cli.8570c15/ |
| 69 | dundee/gdu | Fast disk usage analyzer with console interface written in Go | go | 5,578 | 1,161 | 70.1% | https://programbench.com/task/dundee__gdu.ede21d2/ |
| 70 | eradman/entr | Run arbitrary commands when files change | c | 5,551 | 586 | 88.6% | https://programbench.com/task/eradman__entr.8e2e8b4/ |
| 71 | LuaJIT/LuaJIT | Mirror of the LuaJIT git repository | c | 5,518 | 2,967 | 71.5% | https://programbench.com/task/luajit__luajit.a553b3d/ |
| 72 | mgechev/revive | 🔥 ~6x faster, stricter, configurable, extensible, and beautiful drop-in replacement for golint | go | 5,486 | 727 | 46.4% | https://programbench.com/task/mgechev__revive.201451e/ |
| 73 | cweill/gotests | Automatically generate Go test boilerplate from your source code. | go | 5,294 | 603 | 61.9% | https://programbench.com/task/cweill__gotests.2a672c5/ |
| 74 | cordx56/rustowl | Visualize Ownership and Lifetimes in Rust | rs | 5,113 | 589 | 75.2% | https://programbench.com/task/cordx56__rustowl.655bc5c/ |
| 75 | abishekvashok/cmatrix | Terminal based “The Matrix” like implementation | c | 5,042 | 508 | 97.0% | https://programbench.com/task/abishekvashok__cmatrix.5c082c6/ |
| 76 | quinn-rs/quinn | Async-friendly QUIC implementation in Rust | rs | 5,041 | 522 | 61.7% | https://programbench.com/task/quinn-rs__quinn.bb359cc/ |
| 77 | alecthomas/chroma | A general purpose syntax highlighter in pure Go | go | 4,910 | 515 | 15.9% | https://programbench.com/task/alecthomas__chroma.8d04def/ |
| 78 | anordal/shellharden | The corrective bash syntax highlighter | rs | 4,778 | 1,095 | 81.7% | https://programbench.com/task/anordal__shellharden.6a6ffd4/ |
| 79 | yoav-lavi/melody | Melody is a language that compiles to regular expressions and aims to be more readable and maintainable | rs | 4,748 | 1,205 | 78.9% | https://programbench.com/task/yoav-lavi__melody.f4af9b4/ |
| 80 | sayanarijit/xplr | A hackable, minimal, fast TUI file explorer | rs | 4,735 | 463 | 60.5% | https://programbench.com/task/sayanarijit__xplr.1751065/ |
| 81 | hpjansson/chafa | 📺🗿 Terminal graphics for the 21st century. | c | 4,648 | 1,931 | 58.4% | https://programbench.com/task/hpjansson__chafa.dd4d4c1/ |
| 82 | jhspetersson/fselect | Find files with SQL-like queries | rs | 4,420 | 3,115 | 44.0% | https://programbench.com/task/jhspetersson__fselect.c3559ca/ |
| 83 | ivanceras/svgbob | Convert your ascii diagram scribbles into happy little SVG | rs | 4,182 | 472 | 41.3% | https://programbench.com/task/ivanceras__svgbob.6d00ad9/ |
| 84 | multiprocessio/dsq | Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more. | go | 3,867 | 542 | 80.3% | https://programbench.com/task/multiprocessio__dsq.c3ae0ba/ |
| 85 | rcoh/angle-grinder | Slice and dice logs on the command line | rs | 3,727 | 1,130 | 38.0% | https://programbench.com/task/rcoh__angle-grinder.9c2fc88/ |
| 86 | rs/curlie | The power of curl, the ease of use of httpie. | go | 3,637 | 701 | 89.3% | https://programbench.com/task/rs__curlie.5dfcbb1/ |
| 87 | antonmedv/walk | Terminal file manager | go | 3,598 | 470 | 74.3% | https://programbench.com/task/antonmedv__walk.bf802ef/ |
| 88 | JohannesKaufmann/html-to-markdown | ⚙️ Convert HTML to Markdown. Even works with entire websites and can be extended through rules. | go | 3,586 | 885 | 85.5% | https://programbench.com/task/johanneskaufmann__html-to-markdown.3006818/ |
| 89 | TheZoraiz/ascii-image-converter | A cross-platform command-line tool to convert images into ascii art and print them on the console. Now supports braille art! | go | 3,284 | 465 | 64.1% | https://programbench.com/task/thezoraiz__ascii-image-converter.d05a757/ |
| 90 | hairyhenderson/gomplate | A flexible commandline tool for template rendering. Supports lots of local and remote datasources. | go | 3,135 | 2,926 | 74.7% | https://programbench.com/task/hairyhenderson__gomplate.05eb3aa/ |
| 91 | ip7z/7zip | 7-Zip | cpp | 2,967 | 1,043 | 33.9% | https://programbench.com/task/ip7z__7zip.839151e/ |
| 92 | madler/pigz | A parallel implementation of gzip for modern multi-processor, multi-core machines. | c | 2,924 | 831 | 83.2% | https://programbench.com/task/madler__pigz.fe4894f/ |
| 93 | tinycc/tinycc | Unofficial mirror of mob development branch | c | 2,843 | 1,978 | 12.8% | https://programbench.com/task/tinycc__tinycc.9b8765d/ |
| 94 | raviqqe/muffet | Fast website link checker in Go | go | 2,597 | 293 | 88.1% | https://programbench.com/task/raviqqe__muffet.a882908/ |
| 95 | segmentio/chamber | CLI for managing secrets | go | 2,588 | 1,748 | 82.0% | https://programbench.com/task/segmentio__chamber.5f93f5f/ |
| 96 | astaxie/bat | Go implement CLI, cURL-like tool for humans | go | 2,563 | 1,091 | 71.8% | https://programbench.com/task/astaxie__bat.17d1080/ |
| 97 | zk-org/zk | Plain text note-taking assistant | go | 2,542 | 1,108 | 43.1% | https://programbench.com/task/zk-org__zk.10d93d5/ |
| 98 | kisielk/errcheck | errcheck checks that you checked errors. | go | 2,480 | 341 | 80.4% | https://programbench.com/task/kisielk__errcheck.dacab89/ |
| 99 | mkj/dropbear | Dropbear SSH | c | 2,231 | 682 | 58.1% | https://programbench.com/task/mkj__dropbear.75f699b/ |
| 100 | noborus/trdsql | CLI tool that can execute SQL queries on CSV, LTSV, JSON, YAML and TBLN. Can output to various formats. | go | 2,159 | 1,312 | 66.8% | https://programbench.com/task/noborus__trdsql.d8c5ff6/ |
| 101 | sheepla/pingu | 🐧ping command but with pingu | go | 2,087 | 383 | 96.6% | https://programbench.com/task/sheepla__pingu.926d475/ |
| 102 | go-critic/go-critic | The most opinionated Go source code linter for code audit. | go | 2,041 | 493 | 41.6% | https://programbench.com/task/go-critic__go-critic.9aea378/ |
| 103 | OSGeo/PROJ | PROJ - Cartographic Projections and Coordinate Transformations Library | cpp | 1,974 | 5,319 | 73.8% | https://programbench.com/task/osgeo__proj.75d455c/ |
| 104 | noborus/ov | 🎑Feature-rich terminal-based text viewer. It is a so-called terminal pager. | go | 1,935 | 1,854 | 87.6% | https://programbench.com/task/noborus__ov.b96c2ba/ |
| 105 | samtools/samtools | Tools (written in C using htslib) for manipulating next-generation sequencing data | c | 1,886 | 1,425 | 14.2% | https://programbench.com/task/samtools__samtools.aa823b5/ |
| 106 | gabotechs/dep-tree | Tool for helping developers keep their code bases clean and decoupled. It allows visualising a code base complexity using a 3d force-directed graph of files and the dependencies between them. | go | 1,706 | 865 | 65.2% | https://programbench.com/task/gabotechs__dep-tree.60a95a2/ |
| 107 | cmatsuoka/figlet | Claudio’s FIGlet tree | c | 1,606 | 872 | 77.5% | https://programbench.com/task/cmatsuoka__figlet.202a0a8/ |
| 108 | lh3/seqtk | Toolkit for processing sequences in FASTA/Q formats | c | 1,537 | 429 | 67.4% | https://programbench.com/task/lh3__seqtk.94e7070/ |
| 109 | tukaani-project/xz | XZ Utils | c | 1,522 | 1,410 | 36.0% | https://programbench.com/task/tukaani-project__xz.1007bf0/ |
| 110 | skeema/skeema | Declarative pure-SQL schema management for MySQL and MariaDB | go | 1,361 | 1,708 | 76.5% | https://programbench.com/task/skeema__skeema.6a76243/ |
| 111 | mfridman/tparse | CLI tool for summarizing go test output. Pipe friendly. CI/CD friendly. | go | 1,246 | 425 | 77.6% | https://programbench.com/task/mfridman__tparse.2416b4b/ |
| 112 | lfos/calcurse | A text-based calendar and scheduling application | c | 1,243 | 666 | 53.8% | https://programbench.com/task/lfos__calcurse.49180d5/ |
| 113 | hooklift/gowsdl | WSDL2Go code generation as well as its SOAP proxy | go | 1,219 | 391 | 86.4% | https://programbench.com/task/hooklift__gowsdl.2a06cec/ |
| 114 | guumaster/hostctl | Your dev tool to manage /etc/hosts like a pro! | go | 1,216 | 1,051 | 82.8% | https://programbench.com/task/guumaster__hostctl.d6d9699/ |
| 115 | rs/jplot | iTerm2 expvar/JSON monitoring tool | go | 1,178 | 583 | 89.0% | https://programbench.com/task/rs__jplot.2a54bcc/ |
| 116 | naggie/dstask | Git powered terminal-based todo/note manager – markdown note page per task. Single binary! | go | 1,157 | 1,278 | 58.8% | https://programbench.com/task/naggie__dstask.ff57396/ |
| 117 | sigoden/argc | A Bash CLI framework, also a Bash command runner. | rs | 1,135 | 995 | 44.1% | https://programbench.com/task/sigoden__argc.04a08f1/ |
| 118 | sibprogrammer/xq | Command-line XML and HTML beautifier and content extractor | go | 1,109 | 792 | 75.9% | https://programbench.com/task/sibprogrammer__xq.b89f681/ |
| 119 | xorg62/tty-clock | Clock using lib ncurses | c | 1,105 | 281 | 84.0% | https://programbench.com/task/xorg62__tty-clock.f2f847c/ |
| 120 | unhappychoice/gittype | A CLI code-typing game that turns your source code into typing challenges | rs | 1,075 | 741 | 91.3% | https://programbench.com/task/unhappychoice__gittype.34b72d0/ |
| 121 | eudoxia0/hashcards | A plain text-based spaced repetition system. | rs | 1,071 | 1,151 | 56.3% | https://programbench.com/task/eudoxia0__hashcards.48aa136/ |
| 122 | rvben/rumdl | Fast Markdown linter and formatter written in Rust | rs | 1,051 | 3,322 | 40.7% | https://programbench.com/task/rvben__rumdl.2d75c4d/ |
| 123 | sclevine/yj | CLI - Convert between YAML, TOML, JSON, and HCL. Preserves map order. | go | 1,041 | 767 | 74.4% | https://programbench.com/task/sclevine__yj.8016400/ |
| 124 | arq5x/bedtools2 | bedtools - the swiss army knife for genome arithmetic | c | 1,029 | 1,053 | 38.9% | https://programbench.com/task/arq5x__bedtools2.dd57059/ |
| 125 | cslarsen/jp2a | Converts jpg images to ASCII | c | 1,021 | 631 | 56.1% | https://programbench.com/task/cslarsen__jp2a.61d205f/ |
| 126 | blacknon/hwatch | A modern alternative to the watch command, records the differences in execution results and can check this differences at after. | rs | 1,016 | 1,016 | 81.1% | https://programbench.com/task/blacknon__hwatch.edfcb62/ |
| 127 | eliukblau/pixterm | Draw images in your ANSI terminal with true color | go | 1,014 | 430 | 74.9% | https://programbench.com/task/eliukblau__pixterm.1a93fd5/ |
| 128 | Canop/rhit | A nginx log explorer | rs | 1,006 | 817 | 53.2% | https://programbench.com/task/canop__rhit.ae90bcb/ |
| 129 | stathissideris/ditaa | ditaa is a small command-line utility that can convert diagrams drawn using ascii art (‘drawings’ that contain characters that resemble lines like \| / - ), into proper bitmap graphics. | java | 1,005 | 609 | 20.4% | https://programbench.com/task/stathissideris__ditaa.f2286c4/ |
| 130 | rbakbashev/elfcat | ELF visualizer. Generates HTML files from ELF binaries. | rs | 990 | 564 | 98.2% | https://programbench.com/task/rbakbashev__elfcat.52f8cc7/ |
| 131 | nuta/nsh | A command-line shell like fish, but POSIX compatible. | rs | 966 | 1,963 | 83.7% | https://programbench.com/task/nuta__nsh.bdd0702/ |
| 132 | dalance/amber | A code search / replace tool | rs | 941 | 567 | 71.1% | https://programbench.com/task/dalance__amber.69a0f52/ |
| 133 | pls-rs/pls | pls is a prettier and powerful ls(1) for the pros. | rs | 932 | 332 | 62.3% | https://programbench.com/task/pls-rs__pls.4e1ae50/ |
| 134 | Esubaalew/run | Universal multi-language runner and smart REPL written in Rust. | rs | 919 | 1,212 | 85.2% | https://programbench.com/task/esubaalew__run.0fb9dec/ |
| 135 | chirlu/sox | SoX, Swiss Army knife of sound processing | c | 913 | 1,202 | 37.9% | https://programbench.com/task/chirlu__sox.42b3557/ |
| 136 | clog-tool/clog-cli | Generate beautiful changelogs from your Git commit history | rs | 912 | 575 | 93.0% | https://programbench.com/task/clog-tool__clog-cli.7066cba/ |
| 137 | tarka/xcp | An extended cp | rs | 911 | 1,184 | 92.6% | https://programbench.com/task/tarka__xcp.5e5b448/ |
| 138 | oppiliappan/eva | a calculator REPL, similar to bc(1) | rs | 907 | 913 | 88.7% | https://programbench.com/task/oppiliappan__eva.41ae245/ |
| 139 | git-bahn/git-graph | Command line tool to show clear git graphs arranged for your branching model | rs | 904 | 568 | 79.6% | https://programbench.com/task/git-bahn__git-graph.87b4473/ |
| 140 | gromacs/gromacs | Public/backup repository of the GROMACS molecular simulation toolkit. Please do not mine the metadata blindly; we use https://gitlab.com/gromacs/gromacs for code review and issue tracking. | cpp | 901 | 1,245 | 9.3% | https://programbench.com/task/gromacs__gromacs.665ea4c/ |
| 141 | sirwart/ripsecrets | A command-line tool to prevent committing secret keys into your source code | rs | 901 | 611 | 72.8% | https://programbench.com/task/sirwart__ripsecrets.34c9e03/ |
| 142 | Drew-Alleman/DataSurgeon | Quickly Extracts IP’s, Email Addresses, Hashes, Files, Credit Cards, Social Security Numbers and a lot More From Text | rs | 890 | 502 | 74.3% | https://programbench.com/task/drew-alleman__datasurgeon.d257cee/ |
| 143 | alexpovel/srgn | A grep-like tool which understands source code syntax and allows for manipulation in addition to search | rs | 889 | 1,852 | 69.5% | https://programbench.com/task/alexpovel__srgn.89f943b/ |
| 144 | kyoheiu/felix | tui file manager with vim-like key mapping | rs | 888 | 502 | 49.2% | https://programbench.com/task/kyoheiu__felix.95df390/ |
| 145 | oppiliappan/statix | lints and suggestions for the nix programming language | rs | 882 | 815 | 42.8% | https://programbench.com/task/oppiliappan__statix.e9df54c/ |
| 146 | nachoparker/dutree | a tool to analyze file system usage written in Rust | rs | 871 | 641 | 89.5% | https://programbench.com/task/nachoparker__dutree.44e877d/ |
| 147 | simeg/eureka | 💡 CLI tool to input and store your ideas without leaving the terminal | rs | 867 | 344 | 78.8% | https://programbench.com/task/simeg__eureka.df3796c/ |
| 148 | kyoh86/richgo | Enrich go test outputs with text decorations. | go | 863 | 546 | 85.0% | https://programbench.com/task/kyoh86__richgo.313114f/ |
| 149 | rochacbruno/marmite | Markdown makes sites - A Static Site Generator for Blogs | rs | 837 | 668 | 45.4% | https://programbench.com/task/rochacbruno__marmite.7d4bc2d/ |
| 150 | rust-embedded/svd2rust | Generate Rust register maps (structs) from SVD files | rs | 835 | 920 | 72.9% | https://programbench.com/task/rust-embedded__svd2rust.1760b5e/ |
| 151 | konradsz/igrep | Interactive Grep | rs | 827 | 385 | 73.5% | https://programbench.com/task/konradsz__igrep.aa75630/ |
| 152 | nikolassv/bartib | A simple timetracker for the command line. It saves a log of all tracked activities as a plaintext file and allows you to create flexible reports. | rs | 827 | 722 | 87.3% | https://programbench.com/task/nikolassv__bartib.6b9b5ce/ |
| 153 | yassinebridi/serpl | A simple terminal UI for search and replace, ala VS Code. | rs | 824 | 446 | 61.0% | https://programbench.com/task/yassinebridi__serpl.c48a9d7/ |
| 154 | riquito/tuc | When cut doesn’t cut it | rs | 820 | 1,196 | 92.7% | https://programbench.com/task/riquito__tuc.16fb471/ |
| 155 | ecumene/rust-sloth | A 3D software rasterizer… for the terminal! | rs | 818 | 380 | 52.6% | https://programbench.com/task/ecumene__rust-sloth.051c559/ |
| 156 | crowdagger/crowbook | Converts books written in Markdown to HTML, LaTeX/PDF and EPUB | rs | 813 | 807 | 60.3% | https://programbench.com/task/crowdagger__crowbook.ea214d7/ |
| 157 | WGUNDERWOOD/tex-fmt | An extremely fast LaTeX formatter written in Rust | rs | 789 | 455 | 80.7% | https://programbench.com/task/wgunderwood__tex-fmt.3f1aef6/ |
| 158 | Stranger6667/jsonschema | A high-performance JSON Schema validator for Rust | rs | 770 | 2,933 | 51.7% | https://programbench.com/task/stranger6667__jsonschema.d52e881/ |
| 159 | rhysd/kiro-editor | A small terminal UTF-8 text editor written in Rust 📝🦀 | rs | 761 | 595 | 93.3% | https://programbench.com/task/rhysd__kiro-editor.4157485/ |
| 160 | astro/deadnix | Scan Nix files for dead code | rs | 745 | 602 | 85.5% | https://programbench.com/task/astro__deadnix.d590041/ |
| 161 | sstadick/hck | A sharp cut(1) clone. | rs | 738 | 855 | 95.7% | https://programbench.com/task/sstadick__hck.b66c751/ |
| 162 | trasta298/keifu | Git genealogy, untangled. A TUI for navigating commit graphs with color and clarity. | rs | 729 | 262 | 67.2% | https://programbench.com/task/trasta298__keifu.3331426/ |
| 163 | AmmarAbouZor/tui-journal | Your journal app if you live in a terminal | rs | 722 | 1,402 | 70.8% | https://programbench.com/task/ammarabouzor__tui-journal.2b4540d/ |
| 164 | incu6us/goimports-reviser | Right imports sorting & code formatting tool (goimports alternative) | go | 715 | 513 | 86.4% | https://programbench.com/task/incu6us__goimports-reviser.81bd549/ |
| 165 | yaa110/nomino | Batch rename utility for developers | rs | 710 | 313 | 79.9% | https://programbench.com/task/yaa110__nomino.f892499/ |
| 166 | wfxr/csview | 📠 Pretty and fast csv viewer for cli with cjk/emoji support. | rs | 694 | 335 | 96.1% | https://programbench.com/task/wfxr__csview.8ac4de0/ |
| 167 | chmln/handlr | A better xdg-utils | rs | 693 | 722 | 90.7% | https://programbench.com/task/chmln__handlr.90e78ba/ |
| 168 | Miserlou/Loop | UNIX’s missing loop command | rs | 692 | 710 | 94.6% | https://programbench.com/task/miserlou__loop.209927c/ |
| 169 | KSXGitHub/parallel-disk-usage | Highly parallelized, blazing fast directory tree analyzer | rs | 689 | 531 | 86.1% | https://programbench.com/task/ksxgithub__parallel-disk-usage.96978ed/ |
| 170 | hush-shell/hush | Hush is a unix shell based on the Lua programming language | rs | 688 | 1,201 | 83.3% | https://programbench.com/task/hush-shell__hush.560c33a/ |
| 171 | zevv/duc | Dude, where are my bytes: Duc, a library and suite of tools for inspecting disk usage | c | 682 | 874 | 83.4% | https://programbench.com/task/zevv__duc.a58fa4e/ |
| 172 | altdesktop/i3-style | 🎨 Make your i3 config a little more stylish. | rs | 678 | 539 | 80.0% | https://programbench.com/task/altdesktop__i3-style.f93821b/ |
| 173 | wintermute-cell/ngrrram | A TUI tool to help you type faster and learn new layouts. Includes a free cat. | rs | 674 | 303 | 84.5% | https://programbench.com/task/wintermute-cell__ngrrram.8ea13c3/ |
| 174 | psampaz/go-mod-outdated | Find outdated dependencies of your Go projects. go-mod-outdated provides a table view of the go list -u -m -json all command which lists all dependencies of a Go project and their available minor and patch updates. It also provides a way to filter indirect dependencies and dependencies without updates. | go | 669 | 285 | 98.2% | https://programbench.com/task/psampaz__go-mod-outdated.bb79367/ |
| 175 | wfxr/code-minimap | 🛰 A high performance code minimap render. | rs | 660 | 313 | 88.8% | https://programbench.com/task/wfxr__code-minimap.0ddeea5/ |
| 176 | kaushiksrini/parqeye | Peek inside Parquet files right from your terminal | rs | 654 | 479 | 58.9% | https://programbench.com/task/kaushiksrini__parqeye.8072121/ |
| 177 | stacked-git/stgit | Stacked Git | rs | 652 | 1,488 | 20.0% | https://programbench.com/task/stacked-git__stgit.430027d/ |
| 178 | Isona/dirble | Fast directory scanning and scraping tool | rs | 632 | 718 | 66.7% | https://programbench.com/task/isona__dirble.e2dea9f/ |
| 179 | YS-L/flamelens | Flamegraph viewer in the terminal | rs | 622 | 224 | 59.4% | https://programbench.com/task/ys-l__flamelens.0b4dc33/ |
| 180 | mookid/diffr | Yet another diff highlighting tool | rs | 612 | 606 | 84.7% | https://programbench.com/task/mookid__diffr.2152742/ |
| 181 | shashwatah/jot | ⚡Rapid note management for the terminal. | rs | 609 | 752 | 84.6% | https://programbench.com/task/shashwatah__jot.a92aad8/ |
| 182 | Epistates/treemd | A (TUI/CLI) markdown navigator with tree-based structural navigation. | rs | 603 | 1,569 | 55.1% | https://programbench.com/task/epistates__treemd.825c6dd/ |
| 183 | pier-cli/pier | A CLI to organize and run short Unix shell scripts | rs | 596 | 692 | 83.7% | https://programbench.com/task/pier-cli__pier.5e1bde9/ |
| 184 | jrnxf/thokr | ✨ sleek typing tui with visualized results and historical logging | rs | 595 | 445 | 82.2% | https://programbench.com/task/jrnxf__thokr.09375ef/ |
| 185 | ismaelgv/rnr | A command-line tool to batch rename files and directories | rs | 581 | 683 | 82.1% | https://programbench.com/task/ismaelgv__rnr.fc0733b/ |
| 186 | sitkevij/hex | 🔮 Futuristic take on hexdump, made in Rust. | rs | 563 | 823 | 91.7% | https://programbench.com/task/sitkevij__hex.61ae69b/ |
| 187 | brocode/fblog | Small command-line JSON Log viewer | rs | 561 | 978 | 86.0% | https://programbench.com/task/brocode__fblog.3b54330/ |
| 188 | codesnap-rs/codesnap | 🦀️📸 Pure Rust tool to generate beautiful code snapshots, provide CLI and Library | rs | 557 | 730 | 59.2% | https://programbench.com/task/codesnap-rs__codesnap.f81e4f3/ |
| 189 | foriequal0/git-trim | Automatically trims your branches whose tracking remote refs are merged or stray | rs | 548 | 509 | 64.6% | https://programbench.com/task/foriequal0__git-trim.07c2f50/ |
| 190 | axodotdev/oranda | 🎁 generate beautiful landing pages for your developer tools | rs | 542 | 767 | 53.6% | https://programbench.com/task/axodotdev__oranda.27d60c7/ |
| 191 | elkowar/pipr | A tool to interactively write shell pipelines. | rs | 541 | 525 | 57.1% | https://programbench.com/task/elkowar__pipr.fae0b17/ |
| 192 | paradigmxyz/solar | Blazingly fast, modular and contributor friendly Solidity compiler, written in Rust | rs | 539 | 1,978 | 43.3% | https://programbench.com/task/paradigmxyz__solar.5190d0e/ |
| 193 | Lymphatus/caesium-clt | Caesium Command Line Tools - Lossy/lossless image compression tool | rs | 537 | 575 | 92.3% | https://programbench.com/task/lymphatus__caesium-clt.a529b2e/ |
| 194 | agourlay/zip-password-finder | Find the password of protected ZIP files. | rs | 534 | 680 | 97.9% | https://programbench.com/task/agourlay__zip-password-finder.704700d/ |
| 195 | rust-ethereum/ethabi | Encode and decode smart contract invocations | rs | 525 | 997 | 90.9% | https://programbench.com/task/rust-ethereum__ethabi.b1710ad/ |
| 196 | ArthurSonzogni/json-tui | A JSON terminal UI made in C++ | cpp | 438 | 755 | 71.0% | https://programbench.com/task/arthursonzogni__json-tui.17a22b6/ |
| 197 | tomarrell/wrapcheck | A Go linter to check that errors from external packages are wrapped | go | 374 | 480 | 80.8% | https://programbench.com/task/tomarrell__wrapcheck.c058da1/ |
| 198 | NikolaDucak/caps-log | A small TUI journaling tool. 📖 | cpp | 370 | 551 | 61.7% | https://programbench.com/task/nikoladucak__caps-log.2cf2d1e/ |
| 199 | mibk/dupl | a tool for code clone detection | go | 367 | 373 | 85.0% | https://programbench.com/task/mibk__dupl.1bf052b/ |
| 200 | HaliteChallenge/Halite | @twosigma’s first artificial intelligence programming challenge | cpp | 202 | 275 | 80.4% | https://programbench.com/task/halitechallenge__halite.822cfb6/ |
How to read these data
In the main ProgramBench leaderboard, all 9 models have a Resolved rate of 0%. Under a unified setup with a lightweight agent, current models still cannot reliably reconstruct complete software from black-box behavior and documentation.
Almost resolved does separate the models. Claude Opus 4.7 reaches 3.0%, Claude Opus 4.6 reaches 2.5%, Claude Sonnet 4.6 reaches 1.0%, and every other model stays at 0.0%. This metric shows how close a model gets to a complete solution, which Resolved alone cannot show while every model scores 0%.
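To make these percentages concrete, the sketch below converts them into approximate task counts. It assumes that each Almost resolved value is computed over all 200 tasks and rounded to one decimal place; the helper name `pct_to_task_count` is hypothetical and not part of any ProgramBench tooling.

```python
# Number of tasks in the evaluation, as stated in the data criteria.
TOTAL_TASKS = 200

# Almost resolved percentages copied from the main leaderboard.
almost_resolved_pct = {
    "Claude Opus 4.7": 3.0,
    "Claude Opus 4.6": 2.5,
    "Claude Sonnet 4.6": 1.0,
}


def pct_to_task_count(pct, total=TOTAL_TASKS):
    """Convert a percentage of tasks into a whole number of tasks.

    Assumption: the published percentage is taken over all `total` tasks.
    """
    return round(pct / 100 * total)


counts = {model: pct_to_task_count(pct) for model, pct in almost_resolved_pct.items()}

for model, n in counts.items():
    print(f"{model}: about {n} of {TOTAL_TASKS} tasks")
```

Under that assumption, the gap between the top models amounts to only a handful of tasks, so small differences in this column should be read with caution.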
The task instance table matters as well. For each open source project it lists the language, the star count, the number of tests, and the current best result, showing that ProgramBench covers compression, search, databases, compilers, command-line tools, multimedia processing, and other kinds of software. For AI coding, this resembles real engineering pressure far more than a simple algorithm benchmark does.
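For readers who want to reuse the task instance table, here is a minimal Python sketch that parses its pipe-delimited rows and averages the best result per language. The column order (index, repository, description, language, stars, tests, best result, URL) is inferred from the rows themselves and is an assumption; `parse_task_row` and `mean_best_by_language` are hypothetical helper names.

```python
from collections import defaultdict


def parse_task_row(line):
    """Parse one pipe-delimited task row into a dict.

    Assumed column order (inferred from the rows, not confirmed by the site):
    index, repo, description, language, stars, tests, best result, URL.
    """
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    # Read fixed columns from both ends so that a stray "|" inside the
    # description does not shift the numeric columns.
    index, repo = cells[0], cells[1]
    lang, stars, tests, best, url = cells[-5:]
    return {
        "index": int(index),
        "repo": repo,
        "description": " | ".join(cells[2:-5]),
        "lang": lang,
        "stars": int(stars.replace(",", "")),  # "2,933" -> 2933
        "tests": int(tests.replace(",", "")),
        "best": float(best.rstrip("%")),       # "80.8%" -> 80.8
        "url": url,
    }


def mean_best_by_language(rows):
    """Return the mean best-result percentage for each language code."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["lang"]].append(row["best"])
    return {lang: sum(vals) / len(vals) for lang, vals in groups.items()}


# Three rows copied verbatim from the table above.
SAMPLE = [
    "| 158 | Stranger6667/jsonschema | A high-performance JSON Schema validator for Rust | rs | 770 | 2,933 | 51.7% | https://programbench.com/task/stranger6667__jsonschema.d52e881/ |",
    "| 197 | tomarrell/wrapcheck | A Go linter to check that errors from external packages are wrapped | go | 374 | 480 | 80.8% | https://programbench.com/task/tomarrell__wrapcheck.c058da1/ |",
    "| 199 | mibk/dupl | a tool for code clone detection | go | 367 | 373 | 85.0% | https://programbench.com/task/mibk__dupl.1bf052b/ |",
]

rows = [parse_task_row(line) for line in SAMPLE]
averages = mean_best_by_language(rows)
for lang, avg in sorted(averages.items()):
    print(f"{lang}: mean best result {avg:.1f}%")
```

Reading the fixed columns from the right keeps the parser tolerant of long or unusual descriptions. A row that was split across lines during extraction would need to be rejoined before parsing.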