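First, understand these parameters
What is Q4_0
Q4_0 is a 4-bit quantization format. Its purpose is not to make the model "stronger" but to make it smaller, use less VRAM, and fit onto more devices. Most of these leaderboards standardize on Llama 2 7B, Q4_0. The main reason is to reduce variables, so that results from different GPUs are easier to compare side by side.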
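To make "smaller" concrete: in ggml's Q4_0, weights are grouped into blocks of 32. Each block shares one fp16 scale, and each weight is stored as a 4-bit integer. The following is a simplified Python sketch of that idea, based on my understanding of ggml's reference Q4_0 quantizer. It is not code taken from llama.cpp. Several things here are assumptions:

- The function names are hypothetical.
- The 6.74e9 parameter count for Llama 2 7B is approximate.
- Real GGUF files may differ slightly in size, because not every tensor is necessarily stored as Q4_0.

```python
import struct

QK4_0 = 32                    # weights per Q4_0 block
BLOCK_BYTES = 2 + QK4_0 // 2  # one fp16 scale + 32 packed 4-bit values = 18 bytes


def bits_per_weight() -> float:
    """Average storage cost of one weight in Q4_0 (scale overhead included)."""
    return BLOCK_BYTES * 8 / QK4_0


def approx_size_gb(n_params: float) -> float:
    """Rough weight size in GB (1e9 bytes) if every tensor were Q4_0."""
    return n_params * bits_per_weight() / 8 / 1e9


def to_fp16(x: float) -> float:
    """Round to IEEE half precision, the type the per-block scale is stored in."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_block(xs: list[float]) -> tuple[float, list[int]]:
    """Quantize 32 floats to (scale d, 32 integers in 0..15)."""
    assert len(xs) == QK4_0
    # Find the element with the largest magnitude, keeping its sign.
    amax, vmax = 0.0, 0.0
    for v in xs:
        if abs(v) > amax:
            amax, vmax = abs(v), v
    d = vmax / -8.0
    inv = 1.0 / d if d else 0.0
    # Shift into roughly 0..16.5, truncate, then clamp to the 4-bit range.
    qs = [min(15, int(v * inv + 8.5)) for v in xs]
    return to_fp16(d), qs


def pack_nibbles(qs: list[int]) -> bytes:
    """Byte j holds element j in the low nibble and element j + 16 in the high nibble."""
    half = QK4_0 // 2
    return bytes(qs[j] | (qs[j + half] << 4) for j in range(half))


def dequantize_block(d: float, qs: list[int]) -> list[float]:
    """Map each 4-bit value back to a float: (q - 8) * d."""
    return [(q - 8) * d for q in qs]


if __name__ == "__main__":
    print(bits_per_weight())                  # 4.5
    print(round(approx_size_gb(6.74e9), 2))   # Llama 2 7B, about 6.74 B params
```

Including the shared scale, Q4_0 costs 4.5 bits per weight. That puts a 7B model at roughly 3.8 GB, versus about 13.5 GB in F16. The reconstruction error per weight is bounded by about one scale step `d`.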
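What is pp512
pp512 can generally be read as "prompt processing, 512 tokens": the throughput when processing 512 input tokens.

- pp = prompt processing
- 512 = an input length of 512 tokens
- t/s = tokens per second

Think of it as how fast the model "eats" the prompt. All prompt tokens are known up front, so this phase parallelizes well, and the numbers tend to be very high.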
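One way to see why the prompt phase parallelizes and generation does not is to count forward passes through the model. The following sketch makes these assumptions:

- The prompt is fed in chunks of `ubatch` tokens per pass. llama.cpp exposes a physical batch-size setting, but treating 512 as its default here is an assumption.
- Each generated token costs exactly one pass.
- `forward_passes` is a hypothetical helper, not a llama.cpp API.

```python
import math


def forward_passes(n_prompt: int, n_gen: int, ubatch: int = 512) -> tuple[int, int]:
    """Count forward passes for the prompt phase and the generation phase.

    Prompt tokens are all known up front, so up to `ubatch` of them can go
    through the model in one pass. Each generated token depends on the
    previous output, so generation needs one pass per token (this ignores
    tricks such as speculative decoding).
    """
    prompt_passes = math.ceil(n_prompt / ubatch)
    gen_passes = n_gen
    return prompt_passes, gen_passes


if __name__ == "__main__":
    print(forward_passes(512, 128))              # (1, 128)
    print(forward_passes(512, 128, ubatch=128))  # (4, 128)
```

Under these assumptions, pp512 fits in a single pass while tg128 needs 128 sequential passes. That is one reason pp512 t/s figures are so much larger.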
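What is tg128
tg128 can generally be read as "text generation, 128 tokens": the speed of generating 128 tokens in a row.

- tg = text generation
- 128 = 128 tokens generated in a row
- t/s = tokens per second

It is closer to what we feel day to day as "how fast the model answers".

Generation is autoregressive: each token depends on the one before it, so tokens are produced one at a time. Each step also has to read essentially all of the model weights from memory. That makes generation largely memory-bandwidth-bound, which is why tg128 is usually far lower than pp512.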
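To connect the two numbers to "how fast it answers", a rough estimate splits a reply into two phases:

- Prefill: prompt tokens divided by pp t/s.
- Decode: generated tokens divided by tg t/s.

This assumes throughput stays at the benchmarked value. That ignores slowdowns at longer contexts and other overheads, so read the result as optimistic. `response_time_s` is a hypothetical helper, and the example uses the RTX 3060 CUDA no-FA row from the table below.

```python
def response_time_s(n_prompt: int, n_gen: int, pp_tps: float, tg_tps: float) -> dict[str, float]:
    """Split the wait for a reply into prefill (prompt) and decode (generation) time.

    Assumes both throughputs stay at their benchmarked values, which ignores
    slowdowns at longer contexts and other overheads, so this is optimistic.
    """
    prefill = n_prompt / pp_tps
    decode = n_gen / tg_tps
    return {"prefill": prefill, "decode": decode, "total": prefill + decode}


if __name__ == "__main__":
    # RTX 3060, CUDA, no FA: pp512 = 2137.50 t/s, tg128 = 75.57 t/s.
    t = response_time_s(512, 128, 2137.50, 75.57)
    print({k: round(v, 2) for k, v in t.items()})
```

For a 512-token prompt and a 128-token reply, this gives roughly 0.24 s of prefill versus roughly 1.7 s of decode. The generation phase dominates what the user waits for.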
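What is FA
FA stands for Flash Attention. Put simply, it is a switch for an optimized way of computing attention.

- with FA: Flash Attention enabled
- no FA: Flash Attention disabled

On many cards, FA lifts pp512 more noticeably than tg128. The gain is not consistent across backends, drivers and architectures, though. On some devices PP rises while TG barely moves, and some results even drop. In the tables below, Titan V on CUDA falls from 2617.46 to 2481.25 t/s in pp512 with FA, and Instinct MI300X on ROCm falls from 232.92 to 218.53 t/s in tg128.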
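Because the FA effect differs by device, it is worth computing the change per row rather than assuming a fixed gain. Here is a minimal sketch. The four (no FA, with FA) pairs are copied from the tables in this article, and `pct_change` is a hypothetical helper.

```python
def pct_change(no_fa: float, with_fa: float) -> float:
    """Relative change in percent when Flash Attention is switched on."""
    return (with_fa / no_fa - 1.0) * 100.0


# (no FA, with FA) pairs copied from the tables in this article.
SAMPLES = {
    "CUDA RTX 4090 pp512": (11992.70, 14770.63),
    "CUDA RTX 4090 tg128": (186.21, 188.96),
    "CUDA Titan V pp512": (2617.46, 2481.25),
    "ROCm MI300X tg128": (232.92, 218.53),
}

if __name__ == "__main__":
    for name, (off, on) in SAMPLES.items():
        print(f"{name}: {pct_change(off, on):+.1f}%")
```

Run as a script, this prints:

- roughly +23% for the RTX 4090's pp512 and +1.5% for its tg128;
- about -5% for Titan V pp512;
- about -6% for MI300X tg128.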
How to read t/s
t/s means tokens per second. It is not a frame rate, nor a FLOPS figure; it is a direct measure of the model's throughput.
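Each t/s cell in the tables has the form `mean ± spread`. As far as I know, llama-bench reports the average and standard deviation over its repeated runs, so the spread is a useful noise gauge. When two cards differ by less than their spread, the ranking between them is not very meaningful.

Below is a small parser sketch. `parse_cell` and `rel_noise` are hypothetical helpers, and the regex also accepts a stray `+` in place of `±`.

```python
import re

# Matches cells such as "14073.41 ± 115.16"; also tolerates "+" used in place of "±".
_CELL = re.compile(r"^\s*([\d.]+)\s*[±+]\s*([\d.]+)\s*$")


def parse_cell(cell: str) -> tuple[float, float]:
    """Split a t/s table cell into (mean, standard deviation)."""
    m = _CELL.match(cell)
    if m is None:
        raise ValueError(f"unrecognised cell: {cell!r}")
    return float(m.group(1)), float(m.group(2))


def rel_noise(cell: str) -> float:
    """Standard deviation as a fraction of the mean."""
    mean, sd = parse_cell(cell)
    return sd / mean


if __name__ == "__main__":
    print(f"{rel_noise('8870.49 ± 378.76'):.1%}")
```

For example, the Nvidia L40's CUDA no-FA pp512 cell (8870.49 ± 378.76) has a spread of about 4% of the mean.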
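The most important rule when reading the leaderboards: first make sure you are comparing the same kind of test.

- Don't compare pp512 directly against tg128.
- Don't mix no FA and with FA results.
- Don't treat CUDA, ROCm and Vulkan results as one fully equivalent curve.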
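The three rules above boil down to one constraint: only compare results that share the same backend, FA setting and test. The sketch below enforces this with a comparison key. The `Result` type and `speedup` function are hypothetical, not part of llama.cpp, and the sample values are copied from the CUDA and Vulkan no-FA tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    chip: str
    backend: str       # e.g. "CUDA", "ROCm", "Vulkan"
    flash_attn: bool   # True = with FA, False = no FA
    test: str          # "pp512" or "tg128"
    tps: float         # mean t/s from the table


def speedup(a: Result, b: Result) -> float:
    """Return a.tps / b.tps, refusing to compare results from different setups."""
    key_a = (a.backend, a.flash_attn, a.test)
    key_b = (b.backend, b.flash_attn, b.test)
    if key_a != key_b:
        raise ValueError(f"not comparable: {key_a} vs {key_b}")
    return a.tps / b.tps


if __name__ == "__main__":
    r3090 = Result("RTX 3090", "CUDA", False, "tg128", 158.16)
    r3060 = Result("RTX 3060", "CUDA", False, "tg128", 75.57)
    print(round(speedup(r3090, r3060), 2))
    vk3090 = Result("RTX 3090", "Vulkan", False, "tg128", 160.13)
    try:
        speedup(r3090, vk3090)
    except ValueError as err:
        print(err)
```

Comparing the CUDA tg128 rows for RTX 3090 and RTX 3060 gives a ratio of about 2.1. Comparing the CUDA RTX 3090 row with its Vulkan row raises `ValueError` by design.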
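Conclusions first
Based on the data currently visible in these discussion threads, a few rough conclusions are worth remembering:

- CUDA is still the strongest line in llama.cpp GPU benchmarks and has the densest set of samples. High-end Nvidia cards hold a particularly large lead in pp512.
- ROCm already delivers very respectable results on high-end AMD and Instinct cards. Entries such as MI300X, 7900 XTX and W7900 are all strong.
- Vulkan's advantage is breadth rather than absolute speed. Nvidia, AMD, Intel and Apple (Asahi / MoltenVK) all have entries, as do many old cards and integrated GPUs.
- tg128 is usually closer to everyday experience, while pp512 is better for judging throughput capacity. Many "number one" cards do not lead by the same margin in both metrics.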
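To illustrate the last point, take four rows from the CUDA no-FA table and find the leader in each metric. This is a minimal sketch: `ROWS` holds values copied from that table, and `leader` is a hypothetical helper.

```python
# Four CUDA no-FA rows copied from the table below: (pp512 t/s, tg128 t/s).
ROWS = {
    "RTX 5090": (14073.41, 290.02),
    "RTX PRO 6000 Blackwell": (14854.63, 274.20),
    "H100 80 GB": (9918.34, 267.81),
    "RTX 4090": (11992.70, 186.21),
}


def leader(rows: dict[str, tuple[float, float]], metric: int) -> tuple[str, float]:
    """Return (chip, lead over the runner-up in %) for metric 0 = pp512, 1 = tg128."""
    ranked = sorted(rows.items(), key=lambda kv: kv[1][metric], reverse=True)
    (best, best_vals), (_, second_vals) = ranked[0], ranked[1]
    margin = (best_vals[metric] / second_vals[metric] - 1.0) * 100.0
    return best, margin


if __name__ == "__main__":
    for metric, name in ((0, "pp512"), (1, "tg128")):
        chip, margin = leader(ROWS, metric)
        print(f"{name}: {chip} leads by {margin:.1f}%")
```

In this sample, the RTX PRO 6000 Blackwell tops pp512 while the RTX 5090 tops tg128. The H100 is close behind in tg128 but well behind in pp512.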
CUDA full leaderboard
Llama 2 7B, Q4_0, no FA
| Chip | Memory | pp512 t/s | tg128 t/s | Commit | Thanks to |
|---|---|---|---|---|---|
| RTX 5090 | 32 GB / GDDR7 / 512 bit | 14073.41 ± 115.16 | 290.02 ± 1.10 | 8cf6b42 | @totaldev |
| RTX PRO 6000 Blackwell | 96 GB / GDDR7 / 512 bit | 14854.63 ± 22.73 | 274.20 ± 0.14 | 79c1160 | @Tom94 |
| H100 80 GB | 80 GB / HBM3 / 5120 bit | 9918.34 ± 176.97 | 267.81 ± 1.54 | 5143fa8 | @Hedede |
| A100 80 GB | 80 GB / HBM2e / 5120 bit | 4849.53 ± 8.94 | 190.88 ± 0.33 | 5143fa8 | @Hedede |
| RTX 4090 D | 24 GB / GDDR6X / 384 bit | 10293.86 ± 134.72 | 189.33 ± 0.19 | 79c1160 | @autonomous-AI-lab |
| RTX 4090 | 24 GB / GDDR6X / 384 bit | 11992.70 ± 107.99 | 186.21 ± 0.13 | 2241453 | @lhl |
| RTX 5080 | 16 GB / GDDR7 / 256 bit | 8297.36 ± 9.50 | 181.99 ± 0.42 | 8a4280c | @Hedede |
| RTX 5070 Ti | 16 GB / GDDR7 / 256 bit | 6952.38 ± 13.73 | 176.85 ± 0.07 | 933414c | @TinyServal |
| RTX 6000 Ada | 48 GB / GDDR6 / 384 bit | 9229.23 ± 101.78 | 176.07 ± 0.26 | b8e09f0 | @Hedede |
| RTX 3090 Ti | 24 GB / GDDR6X / 384 bit | 6567.49 ± 20.30 | 171.19 ± 3.98 | 9c35706 | @slaren |
| RTX 3090 | 24 GB / GDDR6X / 384 bit | 5174.69 ± 21.83 | 158.16 ± 0.21 | c76b420 | @m18coppola |
| L40 | 48 GB / GDDR6 / 384 bit | 8870.49 ± 378.76 | 152.01 ± 0.28 | ee09828 | @Hedede |
| RTX 4080 SUPER | 16 GB / GDDR6X / 256 bit | 8125.15 ± 41.05 | 148.33 ± 0.20 | 81086cd | @zacharyarnaise |
| RTX 4080 | 16 GB / GDDR6X / 256 bit | 8031.64 ± 26.49 | 142.49 ± 0.16 | 20638e4 | @Ristovski |
| RTX 3080 | 10 GB / GDDR6X / 320 bit | 5013.86 ± 24.80 | 139.65 ± 0.99 | 9c35706 | @slaren |
| RTX A6000 | 48 GB / GDDR6 / 384 bit | 4913.93 ± 6.79 | 138.73 ± 2.75 | 4795c91 | @Hedede |
| RTX 4070 Ti SUPER | 16 GB / GDDR6X / 256 bit | 6924.53 ± 13.87 | 132.26 ± 0.16 | 9c35706 | @Ristovski |
| RTX PRO 4000 Blackwell | 24 GB / GDDR7 / 192 bit | 4992.83 ± 113.52 | 131.66 ± 0.20 | 7d77f07 | @Hedede |
| RTX A5000 | 24 GB / GDDR6 / 384 bit | 4028.16 ± 19.14 | 130.07 ± 2.74 | e5155e6 | @Hedede |
| Tesla V100 | 32 GB / HBM2 / 4096 bit | 3042.64 ± 40.71 | 129.08 ± 0.05 | 51f5a45 | @Hedede |
| RTX 5070 | 12 GB / GDDR7 / 192 bit | 5184.75 ± 18.70 | 127.54 ± 0.46 | N/A | @Spyro000 |
| A40 | 48 GB / GDDR6 / 384 bit | 4609.01 ± 10.67 | 124.11 ± 0.17 | 3470a5c | @Hedede |
| A30 | 24 GB / HBM2e / 3072 bit | 2767.10 ± 1.88 | 124.81 ± 0.16 | 583cb83 | @Hedede |
| Titan V | 12 GB / HBM2 / 3072 bit | 2617.46 ± 2.10 | 108.79 ± 0.05 | e56abd2 | @Hedede |
| RTX 2080 Ti | 11 GB / GDDR6 / 352 bit | 2890.66 ± 2.42 | 107.51 ± 0.21 | 9c35706 | @ariya |
| Quadro RTX 6000 | 24 GB / GDDR6 / 384 bit | 2751.18 ± 19.43 | 102.77 ± 0.04 | b8e09f0 | @Hedede |
| Quadro RTX 8000 | 48 GB / GDDR6 / 384 bit | 2709.95 ± 3.35 | 102.68 ± 0.03 | b8e09f0 | @Hedede |
| RTX A4500 | 20 GB / GDDR6 / 320 bit | 2827.20 ± 66.43 | 97.32 ± 2.80 | 5cdb27e | @aleksyx |
| RTX 5060 Ti 16 GB | 16 GB / GDDR7 / 128 bit | 3737.25 ± 6.79 | 90.94 ± 0.02 | 89d1029 | @mike-llamacpp |
| RTX 2070 SUPER | 8 GB / GDDR6 / 256 bit | 2088.34 ± 1.94 | 88.06 ± 0.28 | bc07349 | @phstudy |
| RTX A4000 | 16 GB / GDDR6 / 256 bit | 2684.06 ± 15.28 | 83.77 ± 0.37 | 65349f2 | @TinyServal |
| Titan Xp | 12 GB / GDDR5X / 384 bit | 1154.96 ± 1.46 | 76.08 ± 0.08 | c4510dc | @Hedede |
| RTX 3060 | 12 GB / GDDR6 / 192 bit | 2137.50 ± 10.12 | 75.57 ± 0.07 | baa9255 | @QuantiusBenignus |
| Quadro RTX 4000 | 8 GB / GDDR6 / 256 bit | 1536.89 ± 0.90 | 65.62 ± 0.62 | 7d77f07 | @Hedede |
| RTX 4060 Ti 8 GB | 8 GB / GDDR6 / 128 bit | 3394.63 ± 7.44 | 63.86 ± 0.01 | 89d1029 | @mike-llamacpp |
| GTX 1080 Ti | 11 GB / GDDR5X / 352 bit | 1084.41 ± 3.01 | 62.49 ± 0.06 | 9c35706 | @ariya |
| RTX A4000 Ada | 20 GB / GDDR6 / 160 bit | 2779.77 ± 9.91 | 61.83 ± 0.04 | a74a0d6 | @sdwolfz |
| RTX 2060 SUPER | 8 GB / GDDR6 / 256 bit | 1420.24 ± 1.95 | 60.04 ± 0.01 | 5c0eb5e | @ggerganov |
| Tesla P100 | 16 GB / HBM2 / 4096 bit | 760.80 ± 2.92 | 58.35 ± 0.00 | b8372ee | @Hedede |
| DGX Spark | 128 GB / LPDDR5x | 3062.31 ± 11.02 | 57.21 ± 0.06 | 5acd455 | @ggerganov |
| Tesla P40 | 24 GB / GDDR5 / 384 bit | 1007.42 ± 1.23 | 54.74 ± 0.07 | c76b420 | @m18coppola |
| RTX 2000 Ada | 16 GB / GDDR6 / 128 bit | 1956.22 ± 7.74 | 50.62 ± 0.04 | 756cfea | @DigitalRudeness |
| Tesla T4 | 16 GB / GDDR6 / 256 bit | 1219.06 ± 4.18 | 46.38 ± 0.73 | d32e03f | @pt13762104 |
| RTX 4050 Laptop | 6 GB / GDDR6 / 96 bit | 1725.85 ± 17.85 | 43.72 ± 0.41 | d79d8f3 | @TimCabbage |
| GTX 1660 | 6 GB / GDDR5 / 192 bit | 148.91 ± 0.01 | 41.35 ± 0.02 | 9515c61 | @ariya |
| Tesla M40 | 24 GB / GDDR5 / 384 bit | 282.65 ± 0.15 | 38.04 ± 0.02 | 97d5117 | @Hedede |
| GTX 1070 Ti | 8 GB / GDDR5 / 256 bit | 714.44 ± 2.04 | 37.82 ± 0.02 | 79c1160 | @pebaryan |
| Jetson AGX Orin | 64 GB / LPDDR5 / 256 bit | 991.31 ± 1.15 | 33.58 ± 0.14 | c1b1876 | @TinyServal |
| Tesla P4 | 8 GB / GDDR5 / 256 bit | 514.53 ± 3.06 | 33.29 ± 0.00 | c76b420 | @m18coppola |
| P106-100 | 6 GB / GDDR5 / 192 bit | 406.94 ± 0.25 | 30.40 ± 0.02 | 5fd160b | @pebaryan |
| GTX 1060 | 6 GB / GDDR5 / 192 bit | 416.85 ± 1.75 | 27.79 ± 0.02 | 5fd160b | @pebaryan |
| Quadro T1000 | 4 GB / GDDR5 / 128 bit | 79.44 ± 0.01 | 27.82 ± 0.18 | f6da8cb | @hanabu |
| Quadro P2000 | 5 GB / GDDR5 / 160 bit | 309.30 ± 0.05 | 23.63 ± 0.00 | baa9255 | @TinyServal |
| Quadro P1000 | 4 GB / GDDR5 / 128 bit | 183.40 ± 0.11 | 13.99 ± 0.13 | 1e74897 | @aleksyx |
| Tesla K80 | 12 GB / GDDR5 / 384 bit | 133.14 ± 0.55 | 13.80 ± 0.02 | 32732f2 | @pebaryan |
Llama 2 7B, Q4_0, with FA
| Chip | Memory | pp512 t/s | tg128 t/s | Commit | Thanks to |
|---|---|---|---|---|---|
| RTX 5090 | 32 GB / GDDR7 / 512 bit | 14970.15 ± 381.06 | 300.40 ± 0.28 | 8cf6b42 | @totaldev |
| RTX PRO 6000 Blackwell | 96 GB / GDDR7 / 512 bit | 16618.98 ± 20.66 | 281.11 ± 0.41 | 5143fa8 | @Tom94 |
| H100 80 GB | 80 GB / HBM3 / 5120 bit | 11263.29 ± 98.34 | 280.74 ± 1.17 | 5143fa8 | @Hedede |
| A100 80 GB | 80 GB / HBM2e / 5120 bit | 5285.96 ± 6.58 | 200.90 ± 0.12 | 5143fa8 | @Hedede |
| RTX 4090 D | 24 GB / GDDR6X / 384 bit | 12506.97 ± 11.51 | 191.57 ± 0.03 | 79c1160 | @autonomous-AI-lab |
| RTX 4090 | 24 GB / GDDR6X / 384 bit | 14770.63 ± 102.93 | 188.96 ± 0.05 | 2241453 | @lhl |
| RTX 5080 | 16 GB / GDDR7 / 256 bit | 9487.70 ± 21.89 | 184.68 ± 0.05 | 8a4280c | @Hedede |
| RTX 5070 Ti | 16 GB / GDDR7 / 256 bit | 8419.56 ± 35.50 | 182.43 ± 0.09 | 933414c | @TinyServal |
| RTX 6000 Ada | 48 GB / GDDR6 / 384 bit | 10576.85 ± 530.21 | 179.47 ± 0.32 | b8e09f0 | @Hedede |
| RTX 3090 Ti | 24 GB / GDDR6X / 384 bit | 6924.01 ± 10.76 | 172.26 ± 1.31 | 9c35706 | @slaren |
| RTX PRO 4500 Blackwell | 32 GB / GDDR7 / 256 bit | 7251.66 ± 92.40 | 168.90 ± 0.20 | becc481 | @Hedede |
| RTX 3090 | 24 GB / GDDR6X / 384 bit | 5560.06 ± 16.28 | 161.89 ± 0.18 | c76b420 | @m18coppola |
| L40 | 48 GB / GDDR6 / 384 bit | 10097.64 ± 671.22 | 153.76 ± 0.12 | ee09828 | @Hedede |
| RTX 4080 SUPER | 16 GB / GDDR6X / 256 bit | 9439.01 ± 56.75 | 147.48 ± 1.41 | 81086cd | @zacharyarnaise |
| RTX 4080 | 16 GB / GDDR6X / 256 bit | 9205.93 ± 22.31 | 143.47 ± 0.02 | 20638e4 | @Ristovski |
| RTX A6000 | 48 GB / GDDR6 / 384 bit | 5662.39 ± 13.87 | 144.87 ± 0.18 | 4795c91 | @Hedede |
| RTX 3080 | 10 GB / GDDR6X / 320 bit | 5569.56 ± 14.04 | 139.95 ± 0.95 | 9c35706 | @slaren |
| RTX PRO 4000 Blackwell | 24 GB / GDDR7 / 192 bit | 5674.44 ± 139.53 | 136.38 ± 0.13 | 7d77f07 | @Hedede |
| RTX A5000 | 24 GB / GDDR6 / 384 bit | 4552.15 ± 9.68 | 135.83 ± 0.11 | e5155e6 | @Hedede |
| Tesla V100 | 32 GB / HBM2 / 4096 bit | 2973.78 ± 3.62 | 134.76 ± 0.02 | 51f5a45 | @Hedede |
| RTX 4070 Ti SUPER | 16 GB / GDDR6X / 256 bit | 7612.32 ± 37.35 | 132.85 ± 0.31 | 9c35706 | @Ristovski |
| A30 | 24 GB / HBM2e / 3072 bit | 3068.72 ± 0.63 | 131.93 ± 0.18 | 583cb83 | @Hedede |
| RTX 5070 | 12 GB / GDDR7 / 192 bit | 5783.44 ± 36.95 | 128.21 ± 2.52 | N/A | @Spyro000 |
| A40 | 48 GB / GDDR6 / 384 bit | 5256.38 ± 19.39 | 126.24 ± 0.06 | 3470a5c | @Hedede |
| Titan V | 12 GB / HBM2 / 3072 bit | 2481.25 ± 1.31 | 112.17 ± 0.01 | e56abd2 | @Hedede |
| RTX 2080 Ti | 11 GB / GDDR6 / 352 bit | 3107.61 ± 4.34 | 109.17 ± 0.07 | 9c35706 | @ariya |
| Quadro RTX 6000 | 24 GB / GDDR6 / 384 bit | 3053.96 ± 1.37 | 104.38 ± 0.04 | b8e09f0 | @Hedede |
| Quadro RTX 8000 | 48 GB / GDDR6 / 384 bit | 3052.35 ± 5.64 | 103.63 ± 0.02 | b8e09f0 | @Hedede |
| RTX A4500 | 20 GB / GDDR6 / 320 bit | 3453.10 ± 49.19 | 103.00 ± 0.25 | 5cdb27e | @aleksyx |
| RTX 5060 Ti 16 GB | 16 GB / GDDR7 / 128 bit | 4195.53 ± 1.98 | 93.46 ± 0.01 | 89d1029 | @mike-llamacpp |
| RTX 2070 SUPER | 8 GB / GDDR6 / 256 bit | 2293.29 ± 5.91 | 87.71 ± 0.29 | bc07349 | @phstudy |
| RTX A4000 | 16 GB / GDDR6 / 256 bit | 2807.83 ± 52.44 | 85.17 ± 0.66 | 65349f2 | @TinyServal |
| RTX 3060 | 12 GB / GDDR6 / 192 bit | 2407.67 ± 3.73 | 76.92 ± 0.03 | baa9255 | @QuantiusBenignus |
| Titan Xp | 12 GB / GDDR5X / 384 bit | 1218.12 ± 1.82 | 73.84 ± 0.04 | c4510dc | @Hedede |
| Quadro RTX 4000 | 8 GB / GDDR6 / 256 bit | 1662.80 ± 2.04 | 67.62 ± 0.67 | 7d77f07 | @Hedede |
| RTX 4060 Ti 8 GB | 8 GB / GDDR6 / 128 bit | 3803.45 ± 70.80 | 64.03 ± 0.53 | 89d1029 | @mike-llamacpp |
| Tesla P100 | 16 GB / HBM2 / 4096 bit | 787.36 ± 3.27 | 61.99 ± 0.00 | b8372ee | @Hedede |
| GTX 1080 Ti | 11 GB / GDDR5X / 352 bit | 1138.14 ± 2.02 | 61.38 ± 0.03 | 9c35706 | @ariya |
| RTX A4000 Ada | 20 GB / GDDR6 / 160 bit | 3171.86 ± 4.34 | 61.37 ± 0.01 | a74a0d6 | @sdwolfz |
| RTX 2060 SUPER | 8 GB / GDDR6 / 256 bit | 1563.77 ± 0.51 | 61.13 ± 0.05 | 5c0eb5e | @ggerganov |
| DGX Spark | 128 GB / LPDDR5x | 3661.37 ± 38.66 | 56.74 ± 0.03 | 5acd455 | @ggerganov |
| Tesla P40 | 24 GB / GDDR5 / 384 bit | 1079.66 ± 0.18 | 53.73 ± 0.05 | c76b420 | @m18coppola |
| RTX 2000 Ada | 16 GB / GDDR6 / 128 bit | 2250.14 ± 5.91 | 50.71 ± 0.01 | 756cfea | @DigitalRudeness |
| Tesla T4 | 16 GB / GDDR6 / 256 bit | 1309.73 ± 1.02 | 44.03 ± 0.57 | d32e03f | @pt13762104 |
| GTX 1660 | 6 GB / GDDR5 / 192 bit | 154.45 ± 0.52 | 41.43 ± 0.01 | 9515c61 | @ariya |
| Tesla M40 | 24 GB / GDDR5 / 384 bit | 290.17 ± 0.11 | 39.98 ± 0.01 | 97d5117 | @Hedede |
| GTX 1070 Ti | 8 GB / GDDR5 / 256 bit | 790.52 ± 2.39 | 37.87 ± 0.00 | 79c1160 | @pebaryan |
| Jetson AGX Orin | 64 GB / LPDDR5 / 256 bit | 1171.96 ± 4.70 | 35.88 ± 0.18 | c1b1876 | @TinyServal |
| Tesla P4 | 8 GB / GDDR5 / 256 bit | 529.53 ± 2.12 | 33.12 ± 0.03 | c76b420 | @m18coppola |
| P106-100 | 6 GB / GDDR5 / 192 bit | 438.49 ± 0.38 | 30.64 ± 0.06 | 5fd160b | @pebaryan |
| GTX 1060 | 6 GB / GDDR5 / 192 bit | 446.19 ± 0.81 | 28.18 ± 0.01 | 5fd160b | @pebaryan |
| Quadro T1000 | 4 GB / GDDR5 / 128 bit | 27.46 ± 0.23 | 27.46 ± 0.23 | f6da8cb | @hanabu |
| Quadro P2000 | 5 GB / GDDR5 / 160 bit | 311.55 ± 0.19 | 23.76 ± 0.01 | baa9255 | @TinyServal |
| Tesla K80 | 12 GB / GDDR5 / 384 bit | 133.36 ± 0.60 | 14.27 ± 0.32 | 32732f2 | @pebaryan |
| Quadro P1000 | 4 GB / GDDR5 / 128 bit | 173.82 ± 0.02 | 13.65 ± 0.14 | 1e74897 | @aleksyx |
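Apple Silicon reference methodology
Discussion #4167 differs from the other three in two ways. It established a unified methodology earlier, and besides Q4_0 it also reports F16 and Q8_0. That makes it very helpful for understanding PP / TG / t/s.
The discussion defines the terms directly:

- PP: prompt processing
- TG: text-generation
- t/s: tokens per second

One time-series example in the thread shows M2 Ultra results on the same machine as the llama.cpp version and FA support evolved: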
| Date | Device | Build / notes | Bandwidth (GB/s) | GPU cores | F16 PP | F16 TG | Q8_0 PP | Q8_0 TG | Q4_0 PP | Q4_0 TG |
|---|---|---|---|---|---|---|---|---|---|---|
| 2023-11-21 | M2 Ultra | 8e672ef | 800 | 76 | 1401.85 | 41.02 | 1248.59 | 66.64 | 1238.48 | 94.27 |
| 2024-11-12 | M2 Ultra | 86ed72d + FA | 800 | 76 | 1525.95 | 43.15 | 1368.18 | 73.11 | 1391.78 | 108.80 |
| 2025-08-02 | M2 Ultra | 5c0eb5e + FA | 800 | 76 | 1561.35 | 43.24 | 1386.97 | 73.35 | 1412.42 | 109.41 |
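The bandwidth column invites a back-of-the-envelope check. Token generation is usually memory-bandwidth-bound, because each new token requires reading roughly all model weights once. Under that simplification, `model size × TG` estimates the bandwidth actually used. The sketch assumes:

- KV-cache traffic can be ignored.
- Llama 2 7B has about 6.74 B parameters.
- Every tensor is stored in the named format, which real GGUF files do not always do.

The per-weight costs are 16 bits for F16, 8.5 bits for Q8_0 (32 int8 values plus one fp16 scale per block) and 4.5 bits for Q4_0 (32 4-bit values plus one fp16 scale). All names in the sketch are hypothetical.

```python
N_PARAMS = 6.74e9  # assumption: approximate parameter count of Llama 2 7B
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}
PEAK_GBPS = 800.0  # M2 Ultra bandwidth from the table

# TG t/s from the 2025-08-02 row (5c0eb5e + FA).
TG = {"F16": 43.24, "Q8_0": 73.35, "Q4_0": 109.41}


def model_gb(fmt: str) -> float:
    """Approximate weight size in GB (1e9 bytes), assuming one format for all tensors."""
    return N_PARAMS * BITS_PER_WEIGHT[fmt] / 8 / 1e9


def effective_gbps(fmt: str, tg_tps: float) -> float:
    """Weight bytes read per second if each generated token reads all weights once."""
    return model_gb(fmt) * tg_tps


def tg_ceiling(fmt: str, peak_gbps: float = PEAK_GBPS) -> float:
    """Upper-bound TG t/s implied by peak bandwidth under the same simplification."""
    return peak_gbps / model_gb(fmt)


if __name__ == "__main__":
    for fmt, tps in TG.items():
        used = effective_gbps(fmt, tps)
        print(f"{fmt}: ~{used:.0f} GB/s used, {used / PEAK_GBPS:.0%} of peak, "
              f"ceiling ~{tg_ceiling(fmt):.0f} t/s")
```

With the 2025-08-02 row, this gives roughly 73% (F16), 66% (Q8_0) and 52% (Q4_0) of the 800 GB/s peak, and a Q4_0 ceiling of about 211 t/s. In this sample the smaller formats use a smaller fraction of peak bandwidth. That hints that factors beyond raw weight traffic (such as dequantization work) matter for them, though three points are not proof.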
Near the top of the discussion there are also standardized results for several Apple Silicon devices:
| Device | Q4_0 PP | Q4_0 TG | Q8_0 PP | Q8_0 TG | F16 PP | F16 TG |
|---|---|---|---|---|---|---|
| M1 Pro 16 GPU | 266.25 | 36.41 | 270.37 | 22.34 | 302.14 | 12.75 |
| M2 Ultra 76 GPU | 1238.48 | 94.27 | 1248.59 | 66.64 | 1401.85 | 41.02 |
| M3 Max 40 GPU | 690.99 | 65.85 | 749.37 | 43.00 | 794.26 | 25.27 |
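The device table also suggests that prompt processing scales with GPU core count. A quick check divides Q4_0 PP by the core count given in each device name. This is a sketch: `DEVICES` is copied from the table above, and `pp_per_core` is a hypothetical helper.

```python
# (GPU cores, Q4_0 PP t/s) copied from the Apple Silicon device table.
DEVICES = {
    "M1 Pro": (16, 266.25),
    "M3 Max": (40, 690.99),
    "M2 Ultra": (76, 1238.48),
}


def pp_per_core(cores: int, pp_tps: float) -> float:
    """Prompt-processing throughput normalized by GPU core count."""
    return pp_tps / cores


if __name__ == "__main__":
    for name, (cores, pp) in DEVICES.items():
        print(f"{name}: {pp_per_core(cores, pp):.1f} t/s per GPU core")
```

All three land around 16 to 17 t/s per core. That is consistent with PP being a compute-heavy, well-parallelized phase, though three data points across different chip generations are only suggestive.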
The Apple line is not reproduced in full here; the rest of this article focuses on the three GPU backend leaderboards.
ROCm / HIP full leaderboard
Llama 2 7B, Q4_0, no FA
| Chip | Memory | pp512 t/s | tg128 t/s | Commit | Thanks to |
|---|---|---|---|---|---|
| Instinct MI300X | 192 GB / HBM3 / 8192 bit | 11476.40 ± 72.79 | 232.92 ± 0.53 | ee3a9fc | @yeahdongcn |
| RX 7900 XTX | 24 GB / GDDR6 / 384 bit | 3552.27 ± 101.96 | 167.11 ± 0.50 | 2f0c2db | @Diablo-D3 |
| Instinct MI210 | 64 GB / HBM2e / 4096 bit | 2486.22 ± 9.58 | 124.51 ± 0.04 | 8160b38 | @65a |
| Pro W7900 | 48 GB / GDDR6 / 384 bit | 3213.17 ± 80.47 | 121.18 ± 0.06 | 8160b38 | @65a |
| RX 7900 XT | 20 GB / GDDR6 / 320 bit | 3098.38 ± 24.02 | 116.15 ± 0.06 | 1e15bfd | @AdamNiederer |
| RX 9070 | 16 GB / GDDR6 / 256 bit | 2381.77 ± 3.68 | 114.48 ± 0.60 | d0660f2 | @andj1210 |
| Instinct MI100 | 32 GB / HBM2 / 4096 bit | 2732.83 ± 1.98 | 110.48 ± 0.14 | 9c35706 | @firefox42 |
| RX 9070 XT | 16 GB / GDDR6 / 256 bit | 5055.19 ± 109.58 | 101.27 ± 0.27 | 583cb83 | @Hadrianneue |
| RX 7800 XT | 16 GB / GDDR6 / 256 bit | 2151.81 ± 17.94 | 100.94 ± 0.10 | 00131d6 | @olegshulyakov |
| Instinct MI50 | 32 GB / HBM2 / 4096 bit | 1057.24 ± 0.53 | 98.95 ± 0.25 | 97d5117 | @wtarreau |
| RX 7900 GRE | 16 GB / GDDR6 / 256 bit | 1456.98 ± 12.39 | 96.07 ± 0.10 | 6fa3b55 | @MihaiBojescu |
| AI PRO R9700 | 32 GB / GDDR6 / 256 bit | 4443.54 ± 339.25 | 93.84 ± 0.26 | bd4ef13 | @gogich77 |
| Instinct MI60 | 32 GB / HBM2 / 4096 bit | 1289.11 ± 0.62 | 91.46 ± 0.13 | 504af20 | @Said-Akbar |
| RX 6900 XT | 16 GB / GDDR6 / 256 bit | 1889.84 ± 31.21 | 88.49 ± 0.00 | a972fae | @notgood |
| Pro VII | 16 GB / HBM2 / 4096 bit | 1064.99 ± 1.18 | 87.45 ± 0.04 | 2739a71 | @8XXD8 |
| RX 6800 XT | 16 GB / GDDR6 / 256 bit | 1447.07 ± 1.36 | 83.92 ± 0.03 | 79c1160 | @MrLavender |
| Pro V620 | 32 GB / GDDR6 / 256 bit | 1803.65 ± 2.54 | 74.66 ± 0.01 | 5c0eb5e | @samteezy |
| RX 9060 XT | 16 GB / GDDR6 / 256 bit | 1419.67 ± 3.64 | 67.58 ± 0.24 | a0e13dc | @lcy0321 |
| RX 5700 XT | 8 GB / GDDR6 / 256 bit | 354.17 ± 0.18 | 67.55 ± 0.04 | c05e8c9 | @daniandtheweb |
| Instinct MI25 | 16 GB / HBM2 / 2048 bit | 409.83 ± 0.23 | 63.94 ± 0.06 | 2739a71 | @8XXD8 |
| AI Max+ 395 | 128 GB / LPDDR5 | 911.36 ± 1.79 | 50.01 ± 0.07 | e60f241 | @firefox42 |
| RX 7600 XT | 16 GB / GDDR6 / 128 bit | 1099.64 ± 2.05 | 48.58 ± 0.06 | 9c35706 | @wbruna |
| RX Vega 64 | 8 GB / HBM2 / 2048 bit | 240.68 ± 0.09 | 48.46 ± 0.09 | ec428b0 | @davispuh |
| Radeon 8060S | System Shared / DDR5 | 351.36 ± 0.67 | 47.97 ± 0.33 | 1d0125b | @hspak |
| Radeon 880M | System Shared / DDR5 | 163.25 ± 13.86 | 12.97 ± 1.63 | c55d53a | @Hedede |
Llama 2 7B, Q4_0, with FA
| Chip | Memory | pp512 t/s | tg128 t/s | Commit | Thanks to |
|---|---|---|---|---|---|
| Instinct MI300X | 192 GB / HBM3 / 8192 bit | 11945.97 ± 54.29 | 218.53 ± 0.09 | ee3a9fc | @yeahdongcn |
| RX 7900 XTX | 24 GB / GDDR6 / 384 bit | 3874.25 ± 11.92 | 170.12 ± 0.56 | 2f0c2db | @Diablo-D3 |
| Pro W7900 | 48 GB / GDDR6 / 384 bit | 3472.86 ± 52.86 | 127.43 ± 0.12 | 8160b38 | @65a |
| Instinct MI210 | 64 GB / HBM2e / 4096 bit | 2571.82 ± 2.89 | 130.18 ± 0.06 | 8160b38 | @65a |
| RX 9070 | 16 GB / GDDR6 / 256 bit | 2452.68 ± 1.33 | 115.32 ± 0.52 | d0660f2 | @andj1210 |
| RX 7900 XT | 20 GB / GDDR6 / 320 bit | 3261.75 ± 9.09 | 112.30 ± 0.06 | 1e15bfd | @AdamNiederer |
| Instinct MI50 | 32 GB / HBM2 / 4096 bit | 1129.43 ± 0.15 | 105.82 ± 0.07 | 97d5117 | @wtarreau |
| Instinct MI100 | 32 GB / HBM2 / 4096 bit | 2755.00 ± 3.68 | 104.71 ± 0.10 | 9c35706 | @firefox42 |
| AI PRO R9700 | 32 GB / GDDR6 / 256 bit | 4773.07 ± 49.30 | 97.98 ± 0.13 | bd4ef13 | @gogich77 |
| RX 7900 GRE | 16 GB / GDDR6 / 256 bit | 1598.79 ± 11.48 | 97.53 ± 0.06 | 6fa3b55 | @MihaiBojescu |
| RX 9070 XT | 16 GB / GDDR6 / 256 bit | 4903.51 ± 96.36 | 97.28 ± 0.13 | 583cb83 | @Hadrianneue |
| RX 7800 XT | 16 GB / GDDR6 / 256 bit | 2304.63 ± 2.85 | 95.99 ± 0.21 | 00131d6 | @olegshulyakov |
| RX 6900 XT | 16 GB / GDDR6 / 256 bit | 1948.31 ± 13.51 | 85.04 ± 0.02 | a972fae | @notgood |
| Pro V620 | 32 GB / GDDR6 / 256 bit | 1256.86 ± 0.55 | 70.83 ± 0.02 | 5c0eb5e | @samteezy |
| RX 9060 XT | 16 GB / GDDR6 / 256 bit | 1479.27 ± 0.71 | 65.42 ± 0.19 | a0e13dc | @lcy0321 |
| RX 5700 XT | 8 GB / GDDR6 / 256 bit | 314.17 ± 0.29 | 62.02 ± 0.05 | c05e8c9 | @daniandtheweb |
| AI Max+ 395 | 128 GB / LPDDR5 | 1003.53 ± 2.91 | 49.87 ± 0.02 | e60f241 | @firefox42 |
| Radeon 8060S | System Shared / DDR5 | 366.08 ± 1.44 | 48.97 ± 0.15 | 1d0125b | @hspak |
| RX 7600 XT | 16 GB / GDDR6 / 128 bit | 1199.16 ± 1.07 | 47.65 ± 0.06 | 9c35706 | @wbruna |
| RX Vega 64 | 8 GB / HBM2 / 2048 bit | 153.17 ± 0.72 | 42.46 ± 0.40 | ec428b0 | @davispuh |
| Radeon 880M | System Shared / DDR5 | 213.31 ± 14.05 | 16.16 ± 1.41 | c55d53a | @Hedede |
Vulkan full leaderboard
Llama 2 7B, Q4_0, no FA
| Chip | pp512 t/s | tg128 t/s | Commit | Comments |
|---|---|---|---|---|
| Nvidia RTX 5090 | 10381.64 ± 508.84 | 263.63 ± 0.91 | ca71fb9 | coopmat2 |
| AMD Radeon RX 7900 XTX | 3531.93 ± 31.74 | 191.28 ± 0.20 | 2f0c2db | |
| Nvidia RTX 4090 | 9452.03 ± 187.70 | 187.97 ± 0.21 | 4ae88d0 | coopmat2 |
| Nvidia RTX 5080 | 7444.99 ± 20.11 | 185.10 ± 0.54 | f6b533d | coopmat2 |
| Nvidia A100 | 6389.86 ± 4.83 | 160.78 ± 0.16 | 2257758 | coopmat2 |
| Nvidia RTX 3090 | 4298.97 ± 10.59 | 160.13 ± 0.25 | 4ae88d0 | coopmat2 |
| Nvidia RTX 4080 Super | 7101.18 ± 269.79 | 147.13 ± 5.64 | 81086cd | coopmat2 |
| Nvidia RTX 3080 | 4287.11 ± 55.50 | 139.15 ± 0.05 | 7c7d6ce | coopmat2 |
| Nvidia RTX A5000 | 3641.55 ± 9.05 | 139.89 ± 0.69 | 4ae88d0 | coopmat2 |
| AMD Radeon RX 9070 XT | 5036.04 ± 88.16 | 137.11 ± 0.02 | e9fd8dc | |
| Nvidia RTX 5070 Ti | 6213.63 ± 27.72 | 135.63 ± 0.18 | d13d0f6 | coopmat2 |
| AMD Radeon AI Pro R9700 | 4036.04 ± 34.58 | 130.19 ± 0.39 | 3191462 | |
| Nvidia Tesla V100 | 1391.39 ± 1.19 | 129.58 ± 0.58 | 7d77f07 | |
| Nvidia RTX 4070 Ti Super | 6099.18 ± 154.30 | 129.45 ± 0.18 | 4ae88d0 | coopmat2 |
| AMD Radeon RX 7900 XT | 2941.58 ± 17.17 | 123.18 ± 0.40 | 71e74a3 | |
| AMD Radeon RX 9070 | 3164.10 ± 66.84 | 119.71 ± 3.40 | 21c17b5 | |
| AMD Radeon RX 7800 XT | 2017.33 ± 19.30 | 118.27 ± 0.27 | 4fdbc1e | |
| AMD Radeon RX 7900 GRE | 2336.31 ± 7.52 | 116.11 ± 0.26 | 4b2a477 | |
| Apple M3 Ultra | 1116.83 ± 0.55 | 115.54 ± 0.78 | 2d451c8 | MoltenVK |
| Intel Arc Pro B70 | 3379.00 ± 47.92 | 112.02 ± 1.08 | b863507 | |
| Nvidia Titan V | 984.36 ± 4.13 | 108.86 ± 0.28 | e56abd2 | |
| AMD Radeon Pro VII | 1078.54 ± 0.86 | 107.82 ± 0.14 | N/A | |
| AMD Radeon RX 6900 XT | 1837.21 ± 25.44 | 104.60 ± 0.30 | a972fae | |
| Intel Arc Pro A60 | 2261.11 ± 9.53 | 104.25 ± 0.07 | 97d5117 | |
| AMD Radeon RX 6800 XT | 1752.92 ± 1.71 | 100.32 ± 0.97 | N/A | |
| AMD Radeon VII | 1059.14 ± 0.56 | 101.19 ± 0.53 | 77d6ae4 | |
| Nvidia RTX 2080 Ti | 1888.24 ± 9.20 | 97.58 ± 6.60 | N/A | |
| AMD Radeon RX 6800 | 1698.69 ± 0.80 | 95.61 ± 0.19 | 4b385bf | |
| AMD Radeon Pro W6800X Duo | 687.71 ± 4.33 | 94.82 ± 0.12 | N/A | |
| Nvidia RTX 5060 Ti | 3460.92 ± 7.16 | 93.51 ± 0.15 | 89f10ba | coopmat2 |
| Nvidia RTX 4070 | 3179.37 ± 46.16 | 92.29 ± 0.28 | 9a48399 | |
| AMD Radeon Pro W6800X | 510.80 ± 0.13 | 86.47 ± 0.46 | 13b4548 | MoltenVK |
| AMD Radeon RX 6700 XT | 1051.20 ± 0.98 | 83.88 ± 0.08 | 6d75883 | |
| AMD Radeon RX 6750 XT | 1040.58 ± 0.35 | 81.98 ± 0.03 | 228f34c | |
| AMD Radeon Pro V620 | 1595.32 ± 1.59 | 81.78 ± 0.06 | 03d4698 | |
| Nvidia RTX 3070 | 2113.02 ± 7.38 | 78.71 ± 0.13 | 1b8fb81 | |
| AMD Radeon Instinct MI60 | 369.26 ± 2.48 | 78.16 ± 1.40 | 504af20 | |
| Nvidia RTX 3060 | 1815.70 ± 5.85 | 75.94 ± 0.80 | 92c0b38 | coopmat2 |
| Apple M4 Max | 724.77 ± 20.93 | 75.02 ± 0.14 | 1ece0cb6 | |
| Nvidia Tesla T10 | 1692.70 ± 2.05 | 75.01 ± 0.21 | 7f76692 | coopmat2 |
| Nvidia RTX A4000 | 2248.14 ± 7.59 | 73.74 ± 0.08 | f5245b5 | coopmat2 |
| AMD Radeon RX 5700 XT | 529.69 ± 0.26 | 70.73 ± 0.04 | 4fdbc1e | |
| AMD Radeon RX 9060 XT | 2141.67 ± 6.87 | 70.54 ± 0.74 | ed52f36 | |
| Intel Arc B580 | 620.94 ± 15.33 | 70.14 ± 0.28 | 7f76692 | |
| AMD Radeon Pro V540 | 583.88 ± 6.56 | 69.64 ± 0.24 | 9da3dcd | |
| AMD Radeon Pro W5700 | 449.85 ± 0.46 | 68.55 ± 0.15 | 23bc779 | |
| Intel Arc Pro B60 | 522.36 ± 3.60 | 68.55 ± 0.01 | 516a4ca | |
| Nvidia GTX 1080 Ti | 540.69 ± 0.71 | 64.99 ± 0.08 | 360d653 | |
| Nvidia RTX 2070 Super | 1199.13 ± 7.70 | 64.64 ± 0.20 | b7552cf | |
| Nvidia RTX 3070 Mobile | 1689.40 ± 19.57 | 63.64 ± 0.39 | ceff6bb | coopmat2 |
| Nvidia Tesla P100 | 678.14 ± 1.40 | 63.16 ± 0.06 | eec1e33 | |
| AMD BC-250 | 370.66 ± 0.04 | 62.32 ± 0.32 | 5886f4f | |
| AMD Radeon RX 6650 XT | 1029.52 ± 1.21 | 62.14 ± 0.02 | dbb852b | |
| Nvidia RTX 4060 Mobile | 2135.66 ± 23.18 | 59.53 ± 0.03 | a5c07dc | coopmat2 |
| Nvidia Tesla P40 | 488.06 ± 0.27 | 59.36 ± 0.16 | N/A | |
| Nvidia GTX 1660 Ti Mobile | 511.67 ± 2.85 | 56.60 ± 0.07 | b43556e | |
| AMD Radeon Instinct MI25 | 439.42 ± 0.34 | 54.69 ± 0.03 | 2739a71 | |
| AMD Radeon RX 6600 XT | 574.65 ± 0.86 | 53.92 ± 0.11 | 091592d | |
| AMD Ryzen AI Max+ 395 | 1288.96 ± 6.49 | 53.59 ± 0.38 | 7f76692 | |
| AMD Radeon RX 7600 XT | 840.85 ± 3.02 | 53.02 ± 0.01 | 01d8eaa | |
| Intel Arc A770 | 1073.85 ± 29.68 | 52.56 ± 0.11 | a69d54f | |
| Nvidia GB10 | 2737.79 ± 19.56 | 52.28 ± 0.03 | b9da444 | coopmat2 |
| AMD FirePro S9300 x2 | 247.26 ± 0.43 | 51.86 ± 0.11 | eec1e33 | Split across two GPUs |
| AMD Radeon RX 6600 | 761.89 ± 1.76 | 50.63 ± 0.02 | b1c70e2 | |
| AMD Radeon RX Vega 56 | 439.87 ± 0.61 | 50.23 ± 0.14 | 92c0b38 | |
| Intel Arc B570 | 913.95 ± 0.90 | 49.64 ± 0.03 | 7f76692 | |
| Nvidia RTX 3060 Mobile | 1059.76 ± 3.54 | 49.03 ± 0.13 | dbb3a47 | |
| AMD Radeon RX 6800M | 861.99 ± 7.67 | 48.71 ± 0.71 | 8e6f8bc | |
| AMD Radeon RX 6600M | 605.59 ± 0.65 | 48.21 ± 0.07 | fe5b78c | |
| Intel Arc A770M | 875.92 ± 2.16 | 47.69 ± 0.16 | eeee367 | |
| Nvidia P104-100 | 311.90 ± 0.22 | 46.18 ± 0.05 | eec1e33 | |
| AMD Radeon RX Vega 64 | 356.08 ± 0.09 | 45.73 ± 0.18 | ec428b0 | |
| Nvidia RTX A2000 | 1245.19 ± 8.76 | 45.52 ± 0.54 | b1afcab | coopmat2 |
| AMD Radeon RX 7600M XT | 459.39 ± 2.34 | 45.28 ± 0.10 | b9ab0a4 | eGPU |
| AMD Radeon Pro V340 | 375.41 ± 0.24 | 45.16 ± 0.06 | 9da3dcd | Split across two GPUs |
| Nvidia GTX 1070 Ti | 297.50 ± 0.54 | 42.86 ± 1.20 | 860a9e4 | eGPU |
| Intel Arc A750 | 1075.94 ± 13.89 | 42.66 ± 0.18 | c1b1876 | |
| Nvidia RTX 4050 Mobile | 1154.28 ± 15.76 | 41.89 ± 0.10 | d79d8f3 | |
| Nvidia GTX 1070 | 321.57 ± 0.93 | 41.48 ± 0.09 | eec1e33 | |
| Intel Arc Pro B50 | 193.50 ± 0.24 | 39.99 ± 0.10 | 7b43f55 | |
| Nvidia Tesla M40 | 92.48 ± 0.02 | 39.35 ± 1.22 | b8372ee | |
| AMD Radeon RX 580 | 258.03 ± 0.71 | 39.32 ± 0.03 | de4c07f | |
| AMD Radeon RX 470 | 218.07 ± 0.56 | 38.63 ± 0.21 | e288693 | |
| AMD Radeon Pro W5500 | 315.39 ± 3.76 | 36.82 ± 0.38 | 860a9e4 | |
| AMD Radeon RX 480 | 248.66 ± 0.28 | 34.71 ± 0.14 | 3b15924 | |
| Apple M2 Ultra | 205.98 ± 0.02 | 34.34 ± 0.12 | dbb852b | Asahi Linux |
| Nvidia GTX 980 | 186.24 ± 0.09 | 33.90 ± 0.51 | 860a9e4 | |
| Nvidia P106-100 | 183.78 ± 0.26 | 29.77 ± 0.04 | 23bc779 | |
| AMD FirePro W8100 | 155.22 ± 0.17 | 29.52 ± 0.05 | 4536363 | |
| Nvidia Tesla P4 | 265.54 ± 0.21 | 28.03 ± 0.14 | 24d2ee0 | |
| AMD Radeon RX 6500 XT | 255.25 ± 0.35 | 27.81 ± 0.10 | g9fdfcd | |
| Apple M3 | 263.70 ± 0.02 | 26.39 ± 0.14 | b9ab0a4 | MoltenVK |
| AMD FirePro S10000 | 94.78 ± 0.02 | 25.32 ± 0.02 | 914a82d | Split across two GPUs |
| Nvidia Quadro P2000 | 169.55 ± 0.17 | 23.05 ± 0.03 | 63f8fe0 | |
| Intel Core Ultra 200 Series | 544.95 ± 4.15 | 22.49 ± 0.09 | cea560f | |
| AMD Ryzen AI 9 300 Series | 479.07 ± 0.41 | 22.41 ± 0.18 | N/A | |
| AMD Ryzen 6000 Series | 240.89 ± 0.52 | 21.26 ± 0.08 | ee09828 | |
| Apple M2 Pro | 62.70 ± 0.03 | 20.95 ± 0.11 | 1fe0029 | Asahi Linux |
| Nvidia GTX 1050 Ti | 136.42 ± 0.67 | 20.96 ± 0.21 | 2f0c2db | |
| AMD Ryzen 8000 Series | 266.19 ± 1.36 | 20.53 ± 0.08 | a5c07dc | |
| AMD Ryzen 7000 Series | 281.62 ± 1.56 | 19.91 ± 0.07 | ebce03e | |
| AMD Ryzen Z1 Extreme | 199.36 ± 7.02 | 18.77 ± 0.02 | 53ff6b9 | |
| AMD FirePro D700 | 69.95 ± 0.04 | 16.62 ± 0.01 | d3bd719 | MoltenVK, running in FP16 mode on FP32 only chip |
| AMD Radeon Pro WX 4100 | 78.79 ± 0.10 | 16.05 ± 0.07 | 860a9e4 | |
| Apple M2 | 50.79 ± 0.16 | 13.50 ± 0.02 | 8c0d6bb | Asahi Linux |
| Apple M1 | 38.29 ± 0.00 | 12.47 ± 0.03 | 2370665 | Asahi Linux |
| AMD Ryzen 5000 Series | 90.55 ± 0.08 | 10.98 ± 0.07 | d84635b | |
| Intel Core 1100 Series | 187.20 ± 1.78 | 10.39 ± 0.04 | abb9f3c | |
| AMD Radeon RX 550 | 52.66 ± 0.49 | 10.20 ± 0.01 | N/A | |
| AMD Ryzen 4000 Series | 103.87 ± 0.02 | 9.63 ± 0.01 | 4b385bf | |
| Nvidia Tesla K80 | 89.46 ± 0.10 | 9.39 ± 0.06 | 5d46bab | Running on single GPU |
| Nvidia Tesla K40 | 64.37 ± 0.09 | 9.30 ± 0.19 | eec1e33 | |
| MediaTek Dimensity 9400 | 38.36 ± 15.15 | 8.92 ± 0.06 | b9ab0a4 | GPU supports coopmat but pp512 is faster with it turned off |
| Intel Core Ultra 100 Series | 185.51 ± 0.22 | 8.21 ± 0.07 | 1d72c84 | |
| AMD Ryzen 3000 Series | 48.63 ± 0.10 | 8.49 ± 0.01 | 1fe0029 | |
| CIX CD8180 | 2.80 ± 0.01 | 5.51 ± 0.00 | 4dca015 | |
| Intel Core 1000 Series | 25.58 ± 0.00 | 4.25 ± 0.18 | N/A | |
| Intel Core 8000 Series | 25.43 ± 0.17 | 3.35 ± 0.03 | c4df49a | |
| Intel N150 | 28.84 ± 0.02 | 2.93 ± 0.00 | 4f63cd7 | |
Llama 2 7B, Q4_0, with FA
| Chip | pp512 t/s | tg128 t/s | Commit | Comments |
|---|---|---|---|---|
| Nvidia RTX 5090 | 11796.38 ± 601.36 | 273.68 ± 0.52 | ca71fb9 | coopmat2 |
| AMD Radeon RX 7900 XTX | 3332.90 ± 11.47 | 195.30 ± 0.23 | 2f0c2db | |
| Nvidia RTX 5080 | 8054.59 ± 35.68 | 192.17 ± 0.21 | f6b533d | coopmat2 |
| Nvidia RTX 4090 | 10830.41 ± 36.25 | 190.10 ± 0.31 | 4ae88d0 | coopmat2 |
| Nvidia A100 | 7064.40 ± 1.63 | 170.56 ± 0.02 | 2257758 | coopmat2 |
| Nvidia RTX 3090 | 4732.33 ± 4.80 | 162.28 ± 0.21 | 4ae88d0 | coopmat2 |
| Nvidia RTX 4080 Super | 8007.37 ± 46.03 | 150.20 ± 0.26 | 81086cd | coopmat2 |
| Nvidia RTX 3080 | 4913.83 ± 21.52 | 145.74 ± 0.16 | 7c7d6ce | coopmat2 |
| Nvidia Tesla V100 | 1411.25 ± 2.12 | 142.13 ± 0.03 | 7d77f07 | |
| Nvidia RTX A5000 | 4071.22 ± 13.13 | 140.43 ± 0.22 | 4ae88d0 | coopmat2 |
| AMD Radeon RX 9070 XT | 4911.74 ± 28.52 | 138.20 ± 0.18 | e9fd8dc | |
| Nvidia RTX 5070 Ti | 6764.53 ± 11.95 | 135.65 ± 0.02 | d13d0f6 | coopmat2 |
| AMD Radeon AI Pro R9700 | 4333.83 ± 29.36 | 130.90 ± 0.12 | 3191462 | |
| AMD Radeon RX 7900 XT | 3043.93 ± 10.42 | 124.20 ± 0.09 | 71e74a3 | |
| AMD Radeon RX 7800 XT | 2094.64 ± 14.38 | 119.63 ± 0.13 | 4fdbc1e | |
| AMD Radeon RX 9070 | 3277.24 ± 18.17 | 119.55 ± 0.06 | 21c17b5 | |
| AMD Radeon RX 7900 GRE | 2402.07 ± 22.50 | 116.77 ± 0.08 | 4b2a477 | |
| Apple M3 Ultra | 1115.55 ± 0.75 | 115.99 ± 0.12 | 2d451c8 | MoltenVK |
| Intel Arc Pro B70 | 3314.53 ± 17.95 | 111.63 ± 0.05 | b863507 | |
| Nvidia Titan V | 792.74 ± 4.30 | 109.21 ± 0.72 | e56abd2 | |
| AMD Radeon Pro VII | 783.94 ± 0.77 | 108.45 ± 0.48 | N/A | |
| AMD Radeon RX 6900 XT | 1761.93 ± 4.75 | 106.15 ± 0.04 | a972fae | |
| Nvidia RTX 2080 Ti | 1936.25 ± 32.08 | 100.99 ± 0.24 | N/A | |
| AMD Radeon RX 6800 XT | 1704.79 ± 0.71 | 100.50 ± 0.06 | N/A | |
| AMD Radeon Pro W6800X Duo | 795.28 ± 0.72 | 100.08 ± 0.02 | N/A | |
| Nvidia RTX 5060 Ti | 3912.65 ± 5.86 | 97.01 ± 0.14 | 89f10ba | coopmat2 |
| AMD Radeon RX 6800 | 1749.46 ± 3.36 | 96.65 ± 0.48 | 4b385bf | |
| Nvidia RTX 4070 | 4293.57 ± 27.70 | 91.49 ± 0.89 | 9a48399 | coopmat2 |
| AMD Radeon RX 6750 XT | 997.05 ± 0.45 | 82.29 ± 0.06 | 228f34c | |
| AMD Radeon RX 6700 XT | 1010.90 ± 12.89 | 81.86 ± 0.19 | 6d75883 | |
| Nvidia RTX 3060 | 2012.88 ± 10.12 | 80.59 ± 0.02 | 92c0b38 | coopmat2 |
| AMD Radeon Pro V620 | 1556.31 ± 2.82 | 79.24 ± 0.09 | 03d4698 | |
| Nvidia RTX A4000 | 2482.74 ± 26.05 | 76.07 ± 0.08 | f5245b5 | coopmat2 |
| Nvidia Tesla T10 | 1840.14 ± 1.22 | 76.05 ± 0.13 | 7f76692 | coopmat2 |
| AMD Radeon RX 5700 XT | 538.31 ± 0.35 | 74.43 ± 0.03 | 4fdbc1e | |
| Intel Arc B580 | 419.49 ± 3.37 | 72.00 ± 0.24 | 7f76692 | |
| Apple M4 Max | 557.46 ± 26.87 | 71.79 ± 4.16 | 1ece0cb6 | |
| AMD Radeon Pro W5700 | 446.98 ± 0.39 | 71.30 ± 0.24 | 23bc779 | |
| Intel Arc Pro B60 | 274.76 ± 0.27 | 70.54 ± 0.03 | 516a4ca | |
| AMD Radeon RX 9060 XT | 1915.41 ± 7.90 | 70.52 ± 0.16 | ed52f36 | |
| Nvidia Tesla P100 | 685.51 ± 0.88 | 66.48 ± 0.02 | eec1e33 | |
| AMD Radeon RX 6650 XT | 1088.90 ± 0.40 | 64.53 ± 0.75 | dbb852b | |
| Nvidia GTX 1080 Ti | 529.96 ± 0.38 | 64.63 ± 0.10 | 360d653 | |
| AMD BC-250 | 356.87 ± 1.24 | 63.14 ± 0.09 | 5886f4f | |
| Nvidia RTX 3070 Mobile | 1832.07 ± 57.14 | 62.92 ± 0.37 | ceff6bb | coopmat2 |
| Nvidia RTX 4060 Mobile | 2358.03 ± 12.17 | 60.01 ± 0.08 | a5c07dc | coopmat2 |
| Nvidia Tesla P40 | 484.37 ± 0.27 | 59.22 ± 0.15 | N/A | |
| Nvidia GTX 1660 Ti Mobile | 514.34 ± 0.88 | 57.30 ± 0.42 | b43556e | |
| AMD Radeon RX 7600 XT | 1024.38 ± 7.56 | 56.11 ± 0.02 | 01d8eaa | |
| AMD FirePro S9300 x2 | 243.33 ± 0.22 | 55.64 ± 0.06 | eec1e33 | Split across two GPUs |
| Nvidia GB10 | 3279.89 ± 26.78 | 53.64 ± 0.05 | b9da444 | coopmat2 |
| AMD Radeon RX 6600 | 808.76 ± 0.15 | 53.24 ± 0.03 | b1c70e2 | |
| Intel Arc A770 | 1119.68 ± 30.25 | 53.07 ± 0.09 | a69d54f | |
| AMD Ryzen AI Max+ 395 | 1357.07 ± 10.94 | 53.00 ± 0.13 | 7f76692 | |
| AMD Radeon RX Vega 56 | 428.54 ± 0.50 | 52.66 ± 0.03 | 92c0b38 | |
| Intel Arc B570 | 288.51 ± 0.09 | 50.49 ± 0.05 | 7f76692 | |
| Nvidia P104-100 | 325.30 ± 0.25 | 48.64 ± 0.04 | eec1e33 | |
| AMD Radeon Pro V340 | 360.23 ± 0.74 | 47.54 ± 0.06 | 9da3dcd | Split across two GPUs |
| AMD Radeon RX 6800M | 784.16 ± 2.76 | 49.06 ± 0.34 | 8e6f8bc | |
| AMD Radeon RX Vega 64 | 320.12 ± 0.22 | 47.06 ± 0.01 | ec428b0 | |
| Nvidia RTX A2000 | 1361.85 ± 3.26 | 45.69 ± 0.20 | b1afcab | coopmat2 |
| Intel Arc A770M | 384.74 ± 0.78 | 45.68 ± 0.06 | eeee367 | |
| Intel Arc A750 | 303.37 ± 1.44 | 43.96 ± 0.03 | c1b1876 | |
| Nvidia GTX 1070 Ti | 292.85 ± 0.23 | 43.42 ± 0.34 | 860a9e4 | eGPU |
| Nvidia GTX 1070 | 330.84 ± 1.02 | 43.33 ± 0.06 | 360d653 | |
| Nvidia Tesla M40 | 93.35 ± 0.01 | 41.68 ± 0.01 | b8372ee | |
| Intel Arc Pro B50 | 132.48 ± 0.04 | 41.02 ± 0.04 | 7b43f55 | |
| AMD Radeon RX 470 | 197.26 ± 0.27 | 37.28 ± 0.11 | 3769fe6 | |
| AMD Radeon RX 480 | 194.52 ± 0.61 | 37.23 ± 0.09 | 0bcb40b | |
| Apple M2 Ultra | 198.83 ± 0.85 | 198.83 ± 0.85 | dbb852b | Asahi Linux |
| Nvidia GTX 980 | 180.97 ± 0.74 | 34.16 ± 0.10 | 860a9e4 | |
| Nvidia P106-100 | 183.40 ± 0.34 | 30.79 ± 0.32 | 23bc779 | |
| AMD FirePro W8100 | 140.52 ± 0.34 | 29.28 ± 0.14 | 4536363 | |
| Nvidia Tesla P4 | 287.14 ± 0.29 | 28.37 ± 0.24 | 24d2ee0 | |
| Nvidia Quadro P2000 | 181.71 ± 0.12 | 23.77 ± 0.02 | 63f8fe0 | |
| Intel Core Ultra 200 Series | 536.48 ± 1.27 | 23.05 ± 0.04 | cea560f | |
| AMD Ryzen AI 9 300 Series | 532.59 ± 3.55 | 22.31 ± 0.06 | N/A | |
| AMD Ryzen 6000 Series | 277.91 ± 0.37 | 21.15 ± 0.09 | ee09828 | |
| Apple M2 Pro | 58.86 ± 0.02 | 20.97 ± 0.03 | 1fe0029 | Asahi Linux |
| AMD Ryzen 8000 Series | 297.39 ± 1.22 | 20.59 ± 0.38 | a5c07dc | |
| AMD Ryzen 7000 Series | 312.85 ± 2.51 | 20.09 ± 0.35 | 835b2b9 | |
| Nvidia GTX 1050 Ti | 127.54 ± 1.03 | 20.08 ± 0.17 | 2f0c2db | |
| AMD Radeon Pro WX 4100 | 75.59 ± 0.19 | 16.56 ± 0.04 | 860a9e4 | |
| Apple M1 | 35.93 ± 0.00 | 12.85 ± 0.02 | 2370665 | Asahi Linux |
| Apple M2 | 46.81 ± 0.08 | 12.25 ± 2.30 | 8c0d6bb | Asahi Linux |
| AMD Ryzen 5000 Series | 79.06 ± 0.01 | 10.75 ± 0.00 | 5d195f1 | |
| Intel Core 1100 Series | 174.77 ± 4.47 | 10.58 ± 0.03 | abb9f3c | |
| Nvidia Tesla K40 | 64.37 ± 0.02 | 9.92 ± 0.06 | eec1e33 | |
| AMD Ryzen 4000 Series | 113.32 ± 0.01 | 9.87 ± 0.01 | 4b385bf | |
| Nvidia Tesla K80 | 88.26 ± 0.19 | 9.49 ± 0.01 | 5d46bab | Running on single GPU |
| AMD Ryzen 5 3000 Series | 47.41 ± 0.14 | 8.47 ± 0.01 | 1fe0029 | |
| Intel Core Ultra 100 Series | 77.66 ± 2.75 | 7.75 ± 0.05 | 2e89f76 | |
| Intel Core 8000 Series | 25.55 ± 0.04 | 3.35 ± 0.02 | c4df49a | |
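| Intel N150 | 25.59 ± 0.00 | 2.91 ± 0.00 | 4f63cd7 | |
How to use these tables
If you just want to buy a card, or see roughly which tier your current machine falls into, the most practical way to read the tables comes down to three steps: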
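
- First decide whether you care about tg128 or pp512. For everyday chat, coding and interactive feel, look at tg128 first. For long-context throughput, batch processing or prompt-heavy server workloads, look at pp512.
- Then look at the backend you actually run. On Nvidia, CUDA usually comes closest to the real ceiling. On AMD machines, compare ROCm and Vulkan first. For cross-platform compatibility, Vulkan is the better reference.
- Finally, look at FA. On many cards enabling FA raises pp512 more noticeably, but tg128 does not necessarily rise with it, so don't judge by a single highest score.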
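The three steps can be sketched as a filter-then-rank routine. The following are illustrative assumptions, not part of any tool:

- the `Row` type;
- the workload labels `"interactive"` and `"throughput"`;
- the sample rows, copied from the ROCm and Vulkan no-FA tables.

```python
from typing import NamedTuple


class Row(NamedTuple):
    chip: str
    backend: str
    flash_attn: bool
    pp512: float
    tg128: float


# Sample rows copied from the ROCm and Vulkan no-FA tables in this article.
ROWS = [
    Row("RX 7900 XTX", "ROCm", False, 3552.27, 167.11),
    Row("RX 7900 XT", "ROCm", False, 3098.38, 116.15),
    Row("RX 9070 XT", "ROCm", False, 5055.19, 101.27),
    Row("RX 7900 XTX", "Vulkan", False, 3531.93, 191.28),
]


def pick_metric(workload: str) -> str:
    """Step 1: interactive use -> tg128; long prompts, batching, serving -> pp512."""
    return "tg128" if workload == "interactive" else "pp512"


def rank(rows: list[Row], workload: str, backend: str, flash_attn: bool) -> list[tuple[str, float]]:
    """Steps 2 and 3: keep one backend and one FA setting, then sort by the metric."""
    metric = pick_metric(workload)
    same_setup = [r for r in rows if r.backend == backend and r.flash_attn == flash_attn]
    return sorted(
        ((r.chip, getattr(r, metric)) for r in same_setup),
        key=lambda item: item[1],
        reverse=True,
    )


if __name__ == "__main__":
    print(rank(ROWS, "interactive", "ROCm", False))
    print(rank(ROWS, "throughput", "ROCm", False))
```

On ROCm without FA, the RX 7900 XTX ranks first for interactive use (tg128), while the RX 9070 XT ranks first for throughput (pp512). The same RX 7900 XTX also reaches 191.28 t/s in tg128 under Vulkan versus 167.11 under ROCm, which is why the backend step matters.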
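Summary in one sentence
They are all llama.cpp benchmarks, yet pp512, tg128, Q4_0, FA and CUDA / ROCm / Vulkan each describe a different dimension, so sort out the methodology first and only then read the numbers.
If you only want to remember the shortest version:

- CUDA is currently the strongest overall.
- ROCm is already very competitive on high-end AMD cards.
- Vulkan has the broadest coverage: old cards, integrated GPUs, Intel Arc and Apple Asahi all have comparable entries.
- tg128 is closer to real everyday experience than pp512.

Original sources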
- CUDA discussion #15013: https://github.com/ggml-org/llama.cpp/discussions/15013
- Apple Silicon discussion #4167: https://github.com/ggml-org/llama.cpp/discussions/4167
- ROCm discussion #15021: https://github.com/ggml-org/llama.cpp/discussions/15021
- Vulkan discussion #10879: https://github.com/ggml-org/llama.cpp/discussions/10879