llama.cpp / Ollama GPU Performance Ladder: CUDA, ROCm, Vulkan

Based on the scoreboard pages in GitHub Discussions, this post collects the full llama.cpp GPU benchmark tables for CUDA, ROCm, and Vulkan, and explains how to read metrics such as pp512, tg128, Q4_0, and FA.

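Understanding the parameters first

What is Q4_0

Q4_0 is a 4-bit quantization format. Its point is not a "stronger model" but a model that is smaller, uses less VRAM, and fits on more devices. Most of these scoreboards standardize on Llama 2 7B, Q4_0; the main goal is to reduce variables so that results from different GPUs are easier to compare side by side.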
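As a rough illustration of why Q4_0 saves memory, the sketch below estimates weight storage. It assumes the ggml Q4_0 layout of 32 weights per block, stored as 16 bytes of 4-bit values plus one 2-byte fp16 scale (18 bytes per block, 4.5 bits per weight), and a parameter count of about 6.74 billion for Llama 2 7B. Neither figure is stated in this article, and real GGUF files differ slightly because some tensors are kept at higher precision.

```python
# Rough estimate of weight storage for Q4_0 versus F16.
# Assumed (not from the article): Q4_0 packs 32 weights into an 18-byte block
# (16 bytes of 4-bit values + one 2-byte fp16 scale), and Llama 2 7B has
# roughly 6.74e9 parameters.

Q4_0_BLOCK_WEIGHTS = 32
Q4_0_BLOCK_BYTES = 18


def bits_per_weight_q4_0() -> float:
    """Effective bits per weight, including the per-block scale."""
    return Q4_0_BLOCK_BYTES * 8 / Q4_0_BLOCK_WEIGHTS


def estimated_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given precision."""
    return n_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    n_params = 6.74e9  # assumed parameter count
    q4_bits = bits_per_weight_q4_0()
    print(f"Q4_0 bits/weight: {q4_bits:.2f}")
    print(f"Q4_0: ~{estimated_size_gib(n_params, q4_bits):.2f} GiB")
    print(f"F16:  ~{estimated_size_gib(n_params, 16):.2f} GiB")
```

Under these assumptions the quantized weights take a bit more than a quarter of the F16 footprint, which is what lets a 7B model fit on cards with 6 to 8 GB of VRAM.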

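What is pp512

pp512 can generally be read as "prompt processing, 512 tokens": the throughput when processing 512 input tokens.

  • pp = prompt processing
  • 512 = input length of 512 tokens
  • t/s = tokens per second

It is essentially "how fast the model ingests the prompt". This phase parallelizes well, so the numbers tend to be very high.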

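What is tg128

tg128 can generally be read as "text generation, 128 tokens": the speed when generating 128 tokens in a row.

  • tg = text generation
  • 128 = 128 tokens generated consecutively
  • t/s = tokens per second

It is closer to what we perceive day to day as "how fast the model answers". Because generation proceeds one token at a time, with each step depending on the previous one, it is usually much lower than pp512.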
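To make the gap between the two phases concrete, here is a back-of-envelope sketch that splits a response into prompt time and generation time. The rates are the RTX 3060 CUDA no-FA figures from the table in this article. The function name is illustrative, and the estimate assumes both rates stay constant, which real runs with longer contexts will not quite honor.

```python
def estimate_latency_s(prompt_tokens: int, gen_tokens: int,
                       pp_tps: float, tg_tps: float) -> tuple[float, float]:
    """Return (prompt_seconds, generation_seconds) from token counts and t/s rates."""
    prompt_s = prompt_tokens / pp_tps   # time to ingest the prompt
    gen_s = gen_tokens / tg_tps         # time to produce the answer token by token
    return prompt_s, gen_s


if __name__ == "__main__":
    # RTX 3060, CUDA, no FA: pp512 = 2137.50 t/s, tg128 = 75.57 t/s
    prompt_s, gen_s = estimate_latency_s(512, 128, 2137.50, 75.57)
    print(f"prompt: {prompt_s:.2f} s, generation: {gen_s:.2f} s")
```

Even though the prompt has four times as many tokens as the answer, generation takes most of the wall-clock time, which is why tg128 tracks perceived speed more closely.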

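What is FA

FA stands for Flash Attention. Put simply, it is an optimization switch for the attention computation.

  • with FA means Flash Attention is enabled
  • no FA means Flash Attention is disabled

On many cards, FA improves pp512 more noticeably than tg128. The gain is not consistent across backends, drivers, and architectures, though: on some devices PP goes up while TG barely changes, and on others PP actually drops.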
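A small sketch of how the FA effect can be quantified from a pair of rows. The numbers are copied from the CUDA and ROCm tables in this article; the helper name and the dictionary layout are illustrative only.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (no FA, with FA) pairs taken from the tables in this article.
ROWS = {
    "RTX 4090 (CUDA)": {"pp512": (11992.70, 14770.63), "tg128": (186.21, 188.96)},
    "RX 9070 XT (ROCm)": {"pp512": (5055.19, 4903.51), "tg128": (101.27, 97.28)},
}

if __name__ == "__main__":
    for chip, metrics in ROWS.items():
        for metric, (no_fa, with_fa) in metrics.items():
            print(f"{chip:18s} {metric}: {percent_change(no_fa, with_fa):+.1f}%")
```

The RTX 4090 gains far more in pp512 than in tg128, while the RX 9070 XT entry gets slightly slower in both, which matches the caveat that the effect depends on backend and architecture.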

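How to read t/s

t/s is tokens per second. It is not a frame rate and not FLOPS; it is the directly measured throughput of the model.

The most important thing when reading a scoreboard: first confirm that you are comparing the same test.

  • Do not compare pp512 and tg128 directly against each other
  • Do not mix no FA and with FA results
  • Do not treat CUDA, ROCm, and Vulkan results as fully equivalent points on the same curve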
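The three rules above can be sketched as a guard that refuses to compare results from different test configurations. The `Result` type and function names are hypothetical; the numbers are CUDA no-FA entries from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    chip: str
    backend: str  # "CUDA", "ROCm", or "Vulkan"
    test: str     # "pp512" or "tg128"
    fa: bool      # Flash Attention enabled?
    tps: float    # tokens per second


def comparable(a: Result, b: Result) -> bool:
    """Two results are comparable only if backend, test, and FA setting match."""
    return (a.backend, a.test, a.fa) == (b.backend, b.test, b.fa)


def faster(a: Result, b: Result) -> Result:
    """Return the faster result, refusing apples-to-oranges comparisons."""
    if not comparable(a, b):
        raise ValueError("results come from different test configurations")
    return a if a.tps >= b.tps else b


if __name__ == "__main__":
    tg_3090 = Result("RTX 3090", "CUDA", "tg128", False, 158.16)
    tg_3080 = Result("RTX 3080", "CUDA", "tg128", False, 139.65)
    pp_3090 = Result("RTX 3090", "CUDA", "pp512", False, 5174.69)
    print(faster(tg_3090, tg_3080).chip)
    print(comparable(tg_3080, pp_3090))
```

Treating the backend as part of the key is the strict reading of the third rule; relaxing it is reasonable when the question is which backend to use on one specific card.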

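Conclusions first

From the data currently visible in these discussion threads, a few takeaways are worth remembering:

  • CUDA is still the strongest and most densely sampled line in llama.cpp GPU benchmarks; high-end Nvidia cards in particular have a large lead in pp512.
  • ROCm already delivers very respectable results on high-end AMD cards and Instinct accelerators; the MI300X, 7900 XTX, and W7900 entries are all strong.
  • Vulkan's strength is not being "the absolute fastest" but having the widest coverage: Nvidia, AMD, Intel, Apple via Asahi / MoltenVK, and even many old cards and integrated GPUs have entries.
  • tg128 is usually closer to everyday feel, while pp512 is better for judging throughput. Many "top of the chart" cards do not lead by the same margin in both.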

CUDA full scoreboard

Llama 2 7B, Q4_0, no FA

Chip Memory pp512 t/s tg128 t/s Commit Thanks to
RTX 5090 32 GB / GDDR7 / 512 bit 14073.41 ± 115.16 290.02 ± 1.10 8cf6b42 @totaldev
RTX PRO 6000 Blackwell 96 GB / GDDR7 / 512 bit 14854.63 ± 22.73 274.20 ± 0.14 79c1160 @Tom94
H100 80 GB 80 GB / HBM3 / 5120 bit 9918.34 ± 176.97 267.81 ± 1.54 5143fa8 @Hedede
A100 80 GB 80 GB / HBM2e / 5120 bit 4849.53 ± 8.94 190.88 ± 0.33 5143fa8 @Hedede
RTX 4090 D 24 GB / GDDR6X / 384 bit 10293.86 ± 134.72 189.33 ± 0.19 79c1160 @autonomous-AI-lab
RTX 4090 24 GB / GDDR6X / 384 bit 11992.70 ± 107.99 186.21 ± 0.13 2241453 @lhl
RTX 5080 16 GB / GDDR7 / 256 bit 8297.36 ± 9.50 181.99 ± 0.42 8a4280c @Hedede
RTX 5070 Ti 16 GB / GDDR7 / 256 bit 6952.38 ± 13.73 176.85 ± 0.07 933414c @TinyServal
RTX 6000 Ada 48 GB / GDDR6 / 384 bit 9229.23 ± 101.78 176.07 ± 0.26 b8e09f0 @Hedede
RTX 3090 Ti 24 GB / GDDR6X / 384 bit 6567.49 ± 20.30 171.19 ± 3.98 9c35706 @slaren
RTX 3090 24 GB / GDDR6X / 384 bit 5174.69 ± 21.83 158.16 ± 0.21 c76b420 @m18coppola
L40 48 GB / GDDR6 / 384 bit 8870.49 ± 378.76 152.01 ± 0.28 ee09828 @Hedede
RTX 4080 SUPER 16 GB / GDDR6X / 256 bit 8125.15 ± 41.05 148.33 ± 0.20 81086cd @zacharyarnaise
RTX 4080 16 GB / GDDR6X / 256 bit 8031.64 ± 26.49 142.49 ± 0.16 20638e4 @Ristovski
RTX 3080 10 GB / GDDR6X / 320 bit 5013.86 ± 24.80 139.65 ± 0.99 9c35706 @slaren
RTX A6000 48 GB / GDDR6 / 384 bit 4913.93 ± 6.79 138.73 ± 2.75 4795c91 @Hedede
RTX 4070 Ti SUPER 16 GB / GDDR6X / 256 bit 6924.53 ± 13.87 132.26 ± 0.16 9c35706 @Ristovski
RTX PRO 4000 Blackwell 24 GB / GDDR7 / 192 bit 4992.83 ± 113.52 131.66 ± 0.20 7d77f07 @Hedede
RTX A5000 24 GB / GDDR6 / 384 bit 4028.16 ± 19.14 130.07 ± 2.74 e5155e6 @Hedede
Tesla V100 32 GB / HBM2 / 4096 bit 3042.64 ± 40.71 129.08 ± 0.05 51f5a45 @Hedede
RTX 5070 12 GB / GDDR7 / 192 bit 5184.75 ± 18.70 127.54 ± 0.46 - @Spyro000
A40 48 GB / GDDR6 / 384 bit 4609.01 ± 10.67 124.11 ± 0.17 3470a5c @Hedede
A30 24 GB / HBM2e / 3072 bit 2767.10 ± 1.88 124.81 ± 0.16 583cb83 @Hedede
Titan V 12 GB / HBM2 / 3072 bit 2617.46 ± 2.10 108.79 ± 0.05 e56abd2 @Hedede
RTX 2080 Ti 11 GB / GDDR6 / 352 bit 2890.66 ± 2.42 107.51 ± 0.21 9c35706 @ariya
Quadro RTX 6000 24 GB / GDDR6 / 384 bit 2751.18 ± 19.43 102.77 ± 0.04 b8e09f0 @Hedede
Quadro RTX 8000 48 GB / GDDR6 / 384 bit 2709.95 ± 3.35 102.68 ± 0.03 b8e09f0 @Hedede
RTX A4500 20 GB / GDDR6 / 320 bit 2827.20 ± 66.43 97.32 ± 2.80 5cdb27e @aleksyx
RTX 5060 Ti 16 GB 16 GB / GDDR7 / 128 bit 3737.25 ± 6.79 90.94 ± 0.02 89d1029 @mike-llamacpp
RTX 2070 SUPER 8 GB / GDDR6 / 256 bit 2088.34 ± 1.94 88.06 ± 0.28 bc07349 @phstudy
RTX A4000 16 GB / GDDR6 / 256 bit 2684.06 ± 15.28 83.77 ± 0.37 65349f2 @TinyServal
Titan Xp 12 GB / GDDR5X / 384 bit 1154.96 ± 1.46 76.08 ± 0.08 c4510dc @Hedede
RTX 3060 12 GB / GDDR6 / 192 bit 2137.50 ± 10.12 75.57 ± 0.07 baa9255 @QuantiusBenignus
Quadro RTX 4000 8 GB / GDDR6 / 256 bit 1536.89 ± 0.90 65.62 ± 0.62 7d77f07 @Hedede
RTX 4060 Ti 8 GB 8 GB / GDDR6 / 128 bit 3394.63 ± 7.44 63.86 ± 0.01 89d1029 @mike-llamacpp
GTX 1080 Ti 11 GB / GDDR5X / 352 bit 1084.41 ± 3.01 62.49 ± 0.06 9c35706 @ariya
RTX A4000 Ada 20 GB / GDDR6 / 160 bit 2779.77 ± 9.91 61.83 ± 0.04 a74a0d6 @sdwolfz
RTX 2060 SUPER 8 GB / GDDR6 / 256 bit 1420.24 ± 1.95 60.04 ± 0.01 5c0eb5e @ggerganov
Tesla P100 16 GB / HBM2 / 4096 bit 760.80 ± 2.92 58.35 ± 0.00 b8372ee @Hedede
DGX Spark 128 GB / LPDDR5x 3062.31 ± 11.02 57.21 ± 0.06 5acd455 @ggerganov
Tesla P40 24 GB / GDDR5 / 384 bit 1007.42 ± 1.23 54.74 ± 0.07 c76b420 @m18coppola
RTX 2000 Ada 16 GB / GDDR6 / 128 bit 1956.22 ± 7.74 50.62 ± 0.04 756cfea @DigitalRudeness
Tesla T4 16 GB / GDDR6 / 256 bit 1219.06 ± 4.18 46.38 ± 0.73 d32e03f @pt13762104
RTX 4050 Laptop 6 GB / GDDR6 / 96 bit 1725.85 ± 17.85 43.72 ± 0.41 d79d8f3 @TimCabbage
GTX 1660 6 GB / GDDR5 / 192 bit 148.91 ± 0.01 41.35 ± 0.02 9515c61 @ariya
Tesla M40 24 GB / GDDR5 / 384 bit 282.65 ± 0.15 38.04 ± 0.02 97d5117 @Hedede
GTX 1070 Ti 8 GB / GDDR5 / 256 bit 714.44 ± 2.04 37.82 ± 0.02 79c1160 @pebaryan
Jetson AGX Orin 64 GB / LPDDR5 / 256 bit 991.31 ± 1.15 33.58 ± 0.14 c1b1876 @TinyServal
Tesla P4 8 GB / GDDR5 / 256 bit 514.53 ± 3.06 33.29 ± 0.00 c76b420 @m18coppola
P106-100 6 GB / GDDR5 / 192 bit 406.94 ± 0.25 30.40 ± 0.02 5fd160b @pebaryan
GTX 1060 6 GB / GDDR5 / 192 bit 416.85 ± 1.75 27.79 ± 0.02 5fd160b @pebaryan
Quadro T1000 4 GB / GDDR5 / 128 bit 79.44 ± 0.01 27.82 ± 0.18 f6da8cb @hanabu
Quadro P2000 5 GB / GDDR5 / 160 bit 309.30 ± 0.05 23.63 ± 0.00 baa9255 @TinyServal
Quadro P1000 4 GB / GDDR5 / 128 bit 183.40 ± 0.11 13.99 ± 0.13 1e74897 @aleksyx
Tesla K80 12 GB / GDDR5 / 384 bit 133.14 ± 0.55 13.80 ± 0.02 32732f2 @pebaryan
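For readers who want to work with these rows programmatically, here is a minimal parsing sketch for the CUDA / ROCm row layout used above. It keeps chip and memory together as one label rather than guessing where the chip name ends, and it accepts either ± or a plain + as the separator, since pasted tables sometimes lose the ± sign. The `Row` type and `parse_row` name are illustrative.

```python
import re
from typing import NamedTuple, Optional

NUM = r"\d+\.\d+"
# label, "pp ± err", "tg ± err", commit, @user
ROW_RE = re.compile(
    rf"^(?P<label>.+?)\s+"
    rf"(?P<pp>{NUM})\s*[±+]\s*(?P<pp_err>{NUM})\s+"
    rf"(?P<tg>{NUM})\s*[±+]\s*(?P<tg_err>{NUM})\s+"
    rf"(?P<commit>\S+)\s+(?P<user>@\S+)$"
)


class Row(NamedTuple):
    label: str        # chip name followed by the memory description
    pp512: float
    pp512_err: float
    tg128: float
    tg128_err: float
    commit: str
    user: str


def parse_row(line: str) -> Optional[Row]:
    """Parse one table row; return None for headers or unrecognized lines."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    return Row(m["label"], float(m["pp"]), float(m["pp_err"]),
               float(m["tg"]), float(m["tg_err"]), m["commit"], m["user"])


if __name__ == "__main__":
    line = "RTX 4090 24 GB / GDDR6X / 384 bit 11992.70 ± 107.99 186.21 ± 0.13 2241453 @lhl"
    print(parse_row(line))
```

The Vulkan tables further down use a different layout (no memory or contributor column, optional comments), so they would need their own pattern.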

Llama 2 7B, Q4_0, with FA

Chip Memory pp512 t/s tg128 t/s Commit Thanks to
RTX 5090 32 GB / GDDR7 / 512 bit 14970.15 ± 381.06 300.40 ± 0.28 8cf6b42 @totaldev
RTX PRO 6000 Blackwell 96 GB / GDDR7 / 512 bit 16618.98 ± 20.66 281.11 ± 0.41 5143fa8 @Tom94
H100 80 GB 80 GB / HBM3 / 5120 bit 11263.29 ± 98.34 280.74 ± 1.17 5143fa8 @Hedede
A100 80 GB 80 GB / HBM2e / 5120 bit 5285.96 ± 6.58 200.90 ± 0.12 5143fa8 @Hedede
RTX 4090 D 24 GB / GDDR6X / 384 bit 12506.97 ± 11.51 191.57 ± 0.03 79c1160 @autonomous-AI-lab
RTX 4090 24 GB / GDDR6X / 384 bit 14770.63 ± 102.93 188.96 ± 0.05 2241453 @lhl
RTX 5080 16 GB / GDDR7 / 256 bit 9487.70 ± 21.89 184.68 ± 0.05 8a4280c @Hedede
RTX 5070 Ti 16 GB / GDDR7 / 256 bit 8419.56 ± 35.50 182.43 ± 0.09 933414c @TinyServal
RTX 6000 Ada 48 GB / GDDR6 / 384 bit 10576.85 ± 530.21 179.47 ± 0.32 b8e09f0 @Hedede
RTX 3090 Ti 24 GB / GDDR6X / 384 bit 6924.01 ± 10.76 172.26 ± 1.31 9c35706 @slaren
RTX PRO 4500 Blackwell 32 GB / GDDR7 / 256 bit 7251.66 ± 92.40 168.90 ± 0.20 becc481 @Hedede
RTX 3090 24 GB / GDDR6X / 384 bit 5560.06 ± 16.28 161.89 ± 0.18 c76b420 @m18coppola
L40 48 GB / GDDR6 / 384 bit 10097.64 ± 671.22 153.76 ± 0.12 ee09828 @Hedede
RTX 4080 SUPER 16 GB / GDDR6X / 256 bit 9439.01 ± 56.75 147.48 ± 1.41 81086cd @zacharyarnaise
RTX 4080 16 GB / GDDR6X / 256 bit 9205.93 ± 22.31 143.47 ± 0.02 20638e4 @Ristovski
RTX A6000 48 GB / GDDR6 / 384 bit 5662.39 ± 13.87 144.87 ± 0.18 4795c91 @Hedede
RTX 3080 10 GB / GDDR6X / 320 bit 5569.56 ± 14.04 139.95 ± 0.95 9c35706 @slaren
RTX PRO 4000 Blackwell 24 GB / GDDR7 / 192 bit 5674.44 ± 139.53 136.38 ± 0.13 7d77f07 @Hedede
RTX A5000 24 GB / GDDR6 / 384 bit 4552.15 ± 9.68 135.83 ± 0.11 e5155e6 @Hedede
Tesla V100 32 GB / HBM2 / 4096 bit 2973.78 ± 3.62 134.76 ± 0.02 51f5a45 @Hedede
RTX 4070 Ti SUPER 16 GB / GDDR6X / 256 bit 7612.32 ± 37.35 132.85 ± 0.31 9c35706 @Ristovski
A30 24 GB / HBM2e / 3072 bit 3068.72 ± 0.63 131.93 ± 0.18 583cb83 @Hedede
RTX 5070 12 GB / GDDR7 / 192 bit 5783.44 ± 36.95 128.21 ± 2.52 - @Spyro000
A40 48 GB / GDDR6 / 384 bit 5256.38 ± 19.39 126.24 ± 0.06 3470a5c @Hedede
Titan V 12 GB / HBM2 / 3072 bit 2481.25 ± 1.31 112.17 ± 0.01 e56abd2 @Hedede
RTX 2080 Ti 11 GB / GDDR6 / 352 bit 3107.61 ± 4.34 109.17 ± 0.07 9c35706 @ariya
Quadro RTX 6000 24 GB / GDDR6 / 384 bit 3053.96 ± 1.37 104.38 ± 0.04 b8e09f0 @Hedede
Quadro RTX 8000 48 GB / GDDR6 / 384 bit 3052.35 ± 5.64 103.63 ± 0.02 b8e09f0 @Hedede
RTX A4500 20 GB / GDDR6 / 320 bit 3453.10 ± 49.19 103.00 ± 0.25 5cdb27e @aleksyx
RTX 5060 Ti 16 GB 16 GB / GDDR7 / 128 bit 4195.53 ± 1.98 93.46 ± 0.01 89d1029 @mike-llamacpp
RTX 2070 SUPER 8 GB / GDDR6 / 256 bit 2293.29 ± 5.91 87.71 ± 0.29 bc07349 @phstudy
RTX A4000 16 GB / GDDR6 / 256 bit 2807.83 ± 52.44 85.17 ± 0.66 65349f2 @TinyServal
RTX 3060 12 GB / GDDR6 / 192 bit 2407.67 ± 3.73 76.92 ± 0.03 baa9255 @QuantiusBenignus
Titan Xp 12 GB / GDDR5X / 384 bit 1218.12 ± 1.82 73.84 ± 0.04 c4510dc @Hedede
Quadro RTX 4000 8 GB / GDDR6 / 256 bit 1662.80 ± 2.04 67.62 ± 0.67 7d77f07 @Hedede
RTX 4060 Ti 8 GB 8 GB / GDDR6 / 128 bit 3803.45 ± 70.80 64.03 ± 0.53 89d1029 @mike-llamacpp
Tesla P100 16 GB / HBM2 / 4096 bit 787.36 ± 3.27 61.99 ± 0.00 b8372ee @Hedede
GTX 1080 Ti 11 GB / GDDR5X / 352 bit 1138.14 ± 2.02 61.38 ± 0.03 9c35706 @ariya
RTX A4000 Ada 20 GB / GDDR6 / 160 bit 3171.86 ± 4.34 61.37 ± 0.01 a74a0d6 @sdwolfz
RTX 2060 SUPER 8 GB / GDDR6 / 256 bit 1563.77 ± 0.51 61.13 ± 0.05 5c0eb5e @ggerganov
DGX Spark 128 GB / LPDDR5x 3661.37 ± 38.66 56.74 ± 0.03 5acd455 @ggerganov
Tesla P40 24 GB / GDDR5 / 384 bit 1079.66 ± 0.18 53.73 ± 0.05 c76b420 @m18coppola
RTX 2000 Ada 16 GB / GDDR6 / 128 bit 2250.14 ± 5.91 50.71 ± 0.01 756cfea @DigitalRudeness
Tesla T4 16 GB / GDDR6 / 256 bit 1309.73 ± 1.02 44.03 ± 0.57 d32e03f @pt13762104
GTX 1660 6 GB / GDDR5 / 192 bit 154.45 ± 0.52 41.43 ± 0.01 9515c61 @ariya
Tesla M40 24 GB / GDDR5 / 384 bit 290.17 ± 0.11 39.98 ± 0.01 97d5117 @Hedede
GTX 1070 Ti 8 GB / GDDR5 / 256 bit 790.52 ± 2.39 37.87 ± 0.00 79c1160 @pebaryan
Jetson AGX Orin 64 GB / LPDDR5 / 256 bit 1171.96 ± 4.70 35.88 ± 0.18 c1b1876 @TinyServal
Tesla P4 8 GB / GDDR5 / 256 bit 529.53 ± 2.12 33.12 ± 0.03 c76b420 @m18coppola
P106-100 6 GB / GDDR5 / 192 bit 438.49 ± 0.38 30.64 ± 0.06 5fd160b @pebaryan
GTX 1060 6 GB / GDDR5 / 192 bit 446.19 ± 0.81 28.18 ± 0.01 5fd160b @pebaryan
Quadro T1000 4 GB / GDDR5 / 128 bit 27.46 ± 0.23 27.46 ± 0.23 f6da8cb @hanabu
Quadro P2000 5 GB / GDDR5 / 160 bit 311.55 ± 0.19 23.76 ± 0.01 baa9255 @TinyServal
Tesla K80 12 GB / GDDR5 / 384 bit 133.36 ± 0.60 14.27 ± 0.32 32732f2 @pebaryan
Quadro P1000 4 GB / GDDR5 / 128 bit 173.82 ± 0.02 13.65 ± 0.14 1e74897 @aleksyx

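Apple Silicon reference methodology

The biggest difference between discussion #4167 and the other three is that it established a unified methodology earlier, and besides Q4_0 it also reports F16 and Q8_0. It is very helpful for understanding PP / TG / t/s.

The explanations given directly in the discussion are:

  • PP means prompt processing
  • TG means text-generation
  • t/s means tokens per second

One over-time comparison visible in the thread is the M2 Ultra, on the same machine, as llama.cpp versions and FA evolved:

Date Device Version/notes Bandwidth (GB/s) GPU cores F16 PP F16 TG Q8_0 PP Q8_0 TG Q4_0 PP Q4_0 TG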
2023-11-21 M2 Ultra 8e672ef 800 76 1401.85 41.02 1248.59 66.64 1238.48 94.27
2024-11-12 M2 Ultra 86ed72d + FA 800 76 1525.95 43.15 1368.18 73.11 1391.78 108.80
2025-08-02 M2 Ultra 5c0eb5e + FA 800 76 1561.35 43.24 1386.97 73.35 1412.42 109.41
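The bandwidth column invites a rough sanity check. The sketch below assumes that generating each token requires reading all model weights once, so memory bandwidth divided by model size gives a ceiling on TG. That reasoning and the 3.56 GiB size used for the Llama 2 7B Q4_0 file are assumptions not stated in this article; the TG values are the Q4_0 figures from the table above.

```python
GIB = 2**30


def bandwidth_ceiling_tps(bandwidth_gb_s: float, model_gib: float) -> float:
    """Upper bound on tokens/s if every token reads all weights exactly once."""
    return bandwidth_gb_s * 1e9 / (model_gib * GIB)


def utilization(measured_tps: float, bandwidth_gb_s: float, model_gib: float) -> float:
    """Fraction of the bandwidth ceiling that a measured TG rate reaches."""
    return measured_tps / bandwidth_ceiling_tps(bandwidth_gb_s, model_gib)


if __name__ == "__main__":
    model_gib = 3.56   # assumed size of the Q4_0 model file
    bandwidth = 800.0  # GB/s, from the table
    runs = [("8e672ef", 94.27), ("86ed72d + FA", 108.80), ("5c0eb5e + FA", 109.41)]
    print(f"ceiling: {bandwidth_ceiling_tps(bandwidth, model_gib):.0f} t/s")
    for label, tg in runs:
        print(f"{label:14s} {tg:7.2f} t/s = {utilization(tg, bandwidth, model_gib):.0%} of ceiling")
```

Read this only as an order-of-magnitude check: it ignores KV-cache traffic and compute time, but it helps explain why TG tends to track memory bandwidth more than raw compute.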

Earlier in the discussion body there is also a unified sample for several Apple Silicon devices:

Device (GPU cores) Q4_0 PP Q4_0 TG Q8_0 PP Q8_0 TG F16 PP F16 TG
M1 Pro 16 GPU 266.25 36.41 270.37 22.34 302.14 12.75
M2 Ultra 76 GPU 1238.48 94.27 1248.59 66.64 1401.85 41.02
M3 Max 40 GPU 690.99 65.85 749.37 43.00 794.26 25.27

The Apple thread is not reproduced in full here; the focus of this post is the three GPU backend scoreboards: CUDA, ROCm, and Vulkan.

ROCm / HIP full scoreboard

Llama 2 7B, Q4_0, no FA

Chip Memory pp512 t/s tg128 t/s Commit Thanks to
Instinct MI300X 192 GB / HBM3 / 8192 bit 11476.40 ± 72.79 232.92 ± 0.53 ee3a9fc @yeahdongcn
RX 7900 XTX 24 GB / GDDR6 / 384 bit 3552.27 ± 101.96 167.11 ± 0.50 2f0c2db @Diablo-D3
Instinct MI210 64 GB / HBM2e / 4096 bit 2486.22 ± 9.58 124.51 ± 0.04 8160b38 @65a
Pro W7900 48 GB / GDDR6 / 384 bit 3213.17 ± 80.47 121.18 ± 0.06 8160b38 @65a
RX 7900 XT 20 GB / GDDR6 / 320 bit 3098.38 ± 24.02 116.15 ± 0.06 1e15bfd @AdamNiederer
RX 9070 16 GB / GDDR6 / 256 bit 2381.77 ± 3.68 114.48 ± 0.60 d0660f2 @andj1210
Instinct MI100 32 GB / HBM2 / 4096 bit 2732.83 ± 1.98 110.48 ± 0.14 9c35706 @firefox42
RX 9070 XT 16 GB / GDDR6 / 256 bit 5055.19 ± 109.58 101.27 ± 0.27 583cb83 @Hadrianneue
RX 7800 XT 16 GB / GDDR6 / 256 bit 2151.81 ± 17.94 100.94 ± 0.10 00131d6 @olegshulyakov
Instinct MI50 32 GB / HBM2 / 4096 bit 1057.24 ± 0.53 98.95 ± 0.25 97d5117 @wtarreau
RX 7900 GRE 16 GB / GDDR6 / 256 bit 1456.98 ± 12.39 96.07 ± 0.10 6fa3b55 @MihaiBojescu
AI PRO R9700 32 GB / GDDR6 / 256 bit 4443.54 ± 339.25 93.84 ± 0.26 bd4ef13 @gogich77
Instinct MI60 32 GB / HBM2 / 4096 bit 1289.11 ± 0.62 91.46 ± 0.13 504af20 @Said-Akbar
RX 6900 XT 16 GB / GDDR6 / 256 bit 1889.84 ± 31.21 88.49 ± 0.00 a972fae @notgood
Pro VII 16 GB / HBM2 / 4096 bit 1064.99 ± 1.18 87.45 ± 0.04 2739a71 @8XXD8
RX 6800 XT 16 GB / GDDR6 / 256 bit 1447.07 ± 1.36 83.92 ± 0.03 79c1160 @MrLavender
Pro V620 32 GB / GDDR6 / 256 bit 1803.65 ± 2.54 74.66 ± 0.01 5c0eb5e @samteezy
RX 9060 XT 16 GB / GDDR6 / 256 bit 1419.67 ± 3.64 67.58 ± 0.24 a0e13dc @lcy0321
RX 5700 XT 8 GB / GDDR6 / 256 bit 354.17 ± 0.18 67.55 ± 0.04 c05e8c9 @daniandtheweb
Instinct MI25 16 GB / HBM2 / 2048 bit 409.83 ± 0.23 63.94 ± 0.06 2739a71 @8XXD8
AI Max+ 395 128 GB / LPDDR5 911.36 ± 1.79 50.01 ± 0.07 e60f241 @firefox42
RX 7600 XT 16 GB / GDDR6 / 128 bit 1099.64 ± 2.05 48.58 ± 0.06 9c35706 @wbruna
RX Vega 64 8 GB / HBM2 / 2048 bit 240.68 ± 0.09 48.46 ± 0.09 ec428b0 @davispuh
Radeon 8060S System Shared / DDR5 351.36 ± 0.67 47.97 ± 0.33 1d0125b @hspak
Radeon 880M System Shared / DDR5 163.25 ± 13.86 12.97 ± 1.63 c55d53a @Hedede

Llama 2 7B, Q4_0, with FA

Chip Memory pp512 t/s tg128 t/s Commit Thanks to
Instinct MI300X 192 GB / HBM3 / 8192 bit 11945.97 ± 54.29 218.53 ± 0.09 ee3a9fc @yeahdongcn
RX 7900 XTX 24 GB / GDDR6 / 384 bit 3874.25 ± 11.92 170.12 ± 0.56 2f0c2db @Diablo-D3
Pro W7900 48 GB / GDDR6 / 384 bit 3472.86 ± 52.86 127.43 ± 0.12 8160b38 @65a
Instinct MI210 64 GB / HBM2e / 4096 bit 2571.82 ± 2.89 130.18 ± 0.06 8160b38 @65a
RX 9070 16 GB / GDDR6 / 256 bit 2452.68 ± 1.33 115.32 ± 0.52 d0660f2 @andj1210
RX 7900 XT 20 GB / GDDR6 / 320 bit 3261.75 ± 9.09 112.30 ± 0.06 1e15bfd @AdamNiederer
Instinct MI50 32 GB / HBM2 / 4096 bit 1129.43 ± 0.15 105.82 ± 0.07 97d5117 @wtarreau
Instinct MI100 32 GB / HBM2 / 4096 bit 2755.00 ± 3.68 104.71 ± 0.10 9c35706 @firefox42
AI PRO R9700 32 GB / GDDR6 / 256 bit 4773.07 ± 49.30 97.98 ± 0.13 bd4ef13 @gogich77
RX 7900 GRE 16 GB / GDDR6 / 256 bit 1598.79 ± 11.48 97.53 ± 0.06 6fa3b55 @MihaiBojescu
RX 9070 XT 16 GB / GDDR6 / 256 bit 4903.51 ± 96.36 97.28 ± 0.13 583cb83 @Hadrianneue
RX 7800 XT 16 GB / GDDR6 / 256 bit 2304.63 ± 2.85 95.99 ± 0.21 00131d6 @olegshulyakov
RX 6900 XT 16 GB / GDDR6 / 256 bit 1948.31 ± 13.51 85.04 ± 0.02 a972fae @notgood
Pro V620 32 GB / GDDR6 / 256 bit 1256.86 ± 0.55 70.83 ± 0.02 5c0eb5e @samteezy
RX 9060 XT 16 GB / GDDR6 / 256 bit 1479.27 ± 0.71 65.42 ± 0.19 a0e13dc @lcy0321
RX 5700 XT 8 GB / GDDR6 / 256 bit 314.17 ± 0.29 62.02 ± 0.05 c05e8c9 @daniandtheweb
AI Max+ 395 128 GB / LPDDR5 1003.53 ± 2.91 49.87 ± 0.02 e60f241 @firefox42
Radeon 8060S System Shared / DDR5 366.08 ± 1.44 48.97 ± 0.15 1d0125b @hspak
RX 7600 XT 16 GB / GDDR6 / 128 bit 1199.16 ± 1.07 47.65 ± 0.06 9c35706 @wbruna
RX Vega 64 8 GB / HBM2 / 2048 bit 153.17 ± 0.72 42.46 ± 0.40 ec428b0 @davispuh
Radeon 880M System Shared / DDR5 213.31 ± 14.05 16.16 ± 1.41 c55d53a @Hedede

Vulkan full scoreboard

Llama 2 7B, Q4_0, no FA

Chip pp512 t/s tg128 t/s Commit Comments
Nvidia RTX 5090 10381.64 ± 508.84 263.63 ± 0.91 ca71fb9 coopmat2
AMD Radeon RX 7900 XTX 3531.93 ± 31.74 191.28 ± 0.20 2f0c2db
Nvidia RTX 4090 9452.03 ± 187.70 187.97 ± 0.21 4ae88d0 coopmat2
Nvidia RTX 5080 7444.99 ± 20.11 185.10 ± 0.54 f6b533d coopmat2
Nvidia A100 6389.86 ± 4.83 160.78 ± 0.16 2257758 coopmat2
Nvidia RTX 3090 4298.97 ± 10.59 160.13 ± 0.25 4ae88d0 coopmat2
Nvidia RTX 4080 Super 7101.18 ± 269.79 147.13 ± 5.64 81086cd coopmat2
Nvidia RTX 3080 4287.11 ± 55.50 139.15 ± 0.05 7c7d6ce coopmat2
Nvidia RTX A5000 3641.55 ± 9.05 139.89 ± 0.69 4ae88d0 coopmat2
AMD Radeon RX 9070 XT 5036.04 ± 88.16 137.11 ± 0.02 e9fd8dc
Nvidia RTX 5070 Ti 6213.63 ± 27.72 135.63 ± 0.18 d13d0f6 coopmat2
AMD Radeon AI Pro R9700 4036.04 ± 34.58 130.19 ± 0.39 3191462
Nvidia Tesla V100 1391.39 ± 1.19 129.58 ± 0.58 7d77f07
Nvidia RTX 4070 Ti Super 6099.18 ± 154.30 129.45 ± 0.18 4ae88d0 coopmat2
AMD Radeon RX 7900 XT 2941.58 ± 17.17 123.18 ± 0.40 71e74a3
AMD Radeon RX 9070 3164.10 ± 66.84 119.71 ± 3.40 21c17b5
AMD Radeon RX 7800 XT 2017.33 ± 19.30 118.27 ± 0.27 4fdbc1e
AMD Radeon RX 7900 GRE 2336.31 ± 7.52 116.11 ± 0.26 4b2a477
Apple M3 Ultra 1116.83 ± 0.55 115.54 ± 0.78 2d451c8 MoltenVK
Intel Arc Pro B70 3379.00 ± 47.92 112.02 ± 1.08 b863507
Nvidia Titan V 984.36 ± 4.13 108.86 ± 0.28 e56abd2
AMD Radeon Pro VII 1078.54 ± 0.86 107.82 ± 0.14 N/A
AMD Radeon RX 6900 XT 1837.21 ± 25.44 104.60 ± 0.30 a972fae
Intel Arc Pro A60 2261.11 ± 9.53 104.25 ± 0.07 97d5117
AMD Radeon RX 6800 XT 1752.92 ± 1.71 100.32 ± 0.97 N/A
AMD Radeon VII 1059.14 ± 0.56 101.19 ± 0.53 77d6ae4
Nvidia RTX 2080 Ti 1888.24 ± 9.20 97.58 ± 6.60 N/A
AMD Radeon RX 6800 1698.69 ± 0.80 95.61 ± 0.19 4b385bf
AMD Radeon Pro W6800X Duo 687.71 ± 4.33 94.82 ± 0.12 N/A
Nvidia RTX 5060 Ti 3460.92 ± 7.16 93.51 ± 0.15 89f10ba coopmat2
Nvidia RTX 4070 3179.37 ± 46.16 92.29 ± 0.28 9a48399
AMD Radeon Pro W6800X 510.80 ± 0.13 86.47 ± 0.46 13b4548 MoltenVK
AMD Radeon RX 6700 XT 1051.20 ± 0.98 83.88 ± 0.08 6d75883
AMD Radeon RX 6750 XT 1040.58 ± 0.35 81.98 ± 0.03 228f34c
AMD Radeon Pro V620 1595.32 ± 1.59 81.78 ± 0.06 03d4698
Nvidia RTX 3070 2113.02 ± 7.38 78.71 ± 0.13 1b8fb81
AMD Radeon Instinct MI60 369.26 ± 2.48 78.16 ± 1.40 504af20
Nvidia RTX 3060 1815.70 ± 5.85 75.94 ± 0.80 92c0b38 coopmat2
Apple M4 Max 724.77 ± 20.93 75.02 ± 0.14 1ece0cb6
Nvidia Tesla T10 1692.70 ± 2.05 75.01 ± 0.21 7f76692 coopmat2
Nvidia RTX A4000 2248.14 ± 7.59 73.74 ± 0.08 f5245b5 coopmat2
AMD Radeon RX 5700 XT 529.69 ± 0.26 70.73 ± 0.04 4fdbc1e
AMD Radeon RX 9060 XT 2141.67 ± 6.87 70.54 ± 0.74 ed52f36
Intel Arc B580 620.94 ± 15.33 70.14 ± 0.28 7f76692
AMD Radeon Pro V540 583.88 ± 6.56 69.64 ± 0.24 9da3dcd
AMD Radeon Pro W5700 449.85 ± 0.46 68.55 ± 0.15 23bc779
Intel Arc Pro B60 522.36 ± 3.60 68.55 ± 0.01 516a4ca
Nvidia GTX 1080 Ti 540.69 ± 0.71 64.99 ± 0.08 360d653
Nvidia RTX 2070 Super 1199.13 ± 7.70 64.64 ± 0.20 b7552cf
Nvidia RTX 3070 Mobile 1689.40 ± 19.57 63.64 ± 0.39 ceff6bb coopmat2
Nvidia Tesla P100 678.14 ± 1.40 63.16 ± 0.06 eec1e33
AMD BC-250 370.66 ± 0.04 62.32 ± 0.32 5886f4f
AMD Radeon RX 6650 XT 1029.52 ± 1.21 62.14 ± 0.02 dbb852b
Nvidia RTX 4060 Mobile 2135.66 ± 23.18 59.53 ± 0.03 a5c07dc coopmat2
Nvidia Tesla P40 488.06 ± 0.27 59.36 ± 0.16 N/A
Nvidia GTX 1660 Ti Mobile 511.67 ± 2.85 56.60 ± 0.07 b43556e
AMD Radeon Instinct MI25 439.42 ± 0.34 54.69 ± 0.03 2739a71
AMD Radeon RX 6600 XT 574.65 ± 0.86 53.92 ± 0.11 091592d
AMD Ryzen AI Max+ 395 1288.96 ± 6.49 53.59 ± 0.38 7f76692
AMD Radeon RX 7600 XT 840.85 ± 3.02 53.02 ± 0.01 01d8eaa
Intel Arc A770 1073.85 ± 29.68 52.56 ± 0.11 a69d54f
Nvidia GB10 2737.79 ± 19.56 52.28 ± 0.03 b9da444 coopmat2
AMD FirePro S9300 x2 247.26 ± 0.43 51.86 ± 0.11 eec1e33 Split across two GPUs
AMD Radeon RX 6600 761.89 ± 1.76 50.63 ± 0.02 b1c70e2
AMD Radeon RX Vega 56 439.87 ± 0.61 50.23 ± 0.14 92c0b38
Intel Arc B570 913.95 ± 0.90 49.64 ± 0.03 7f76692
Nvidia RTX 3060 Mobile 1059.76 ± 3.54 49.03 ± 0.13 dbb3a47
AMD Radeon RX 6800M 861.99 ± 7.67 48.71 ± 0.71 8e6f8bc
AMD Radeon RX 6600M 605.59 ± 0.65 48.21 ± 0.07 fe5b78c
Intel Arc A770M 875.92 ± 2.16 47.69 ± 0.16 eeee367
Nvidia P104-100 311.90 ± 0.22 46.18 ± 0.05 eec1e33
AMD Radeon RX Vega 64 356.08 ± 0.09 45.73 ± 0.18 ec428b0
Nvidia RTX A2000 1245.19 ± 8.76 45.52 ± 0.54 b1afcab coopmat2
AMD Radeon RX 7600M XT 459.39 ± 2.34 45.28 ± 0.10 b9ab0a4 eGPU
AMD Radeon Pro V340 375.41 ± 0.24 45.16 ± 0.06 9da3dcd Split across two GPUs
Nvidia GTX 1070 Ti 297.50 ± 0.54 42.86 ± 1.20 860a9e4 eGPU
Intel Arc A750 1075.94 ± 13.89 42.66 ± 0.18 c1b1876
Nvidia RTX 4050 Mobile 1154.28 ± 15.76 41.89 ± 0.10 d79d8f3
Nvidia GTX 1070 321.57 ± 0.93 41.48 ± 0.09 eec1e33
Intel Arc Pro B50 193.50 ± 0.24 39.99 ± 0.10 7b43f55
Nvidia Tesla M40 92.48 ± 0.02 39.35 ± 1.22 b8372ee
AMD Radeon RX 580 258.03 ± 0.71 39.32 ± 0.03 de4c07f
AMD Radeon RX 470 218.07 ± 0.56 38.63 ± 0.21 e288693
AMD Radeon Pro W5500 315.39 ± 3.76 36.82 ± 0.38 860a9e4
AMD Radeon RX 480 248.66 ± 0.28 34.71 ± 0.14 3b15924
Apple M2 Ultra 205.98 ± 0.02 34.34 ± 0.12 dbb852b Asahi Linux
Nvidia GTX 980 186.24 ± 0.09 33.90 ± 0.51 860a9e4
Nvidia P106-100 183.78 ± 0.26 29.77 ± 0.04 23bc779
AMD FirePro W8100 155.22 ± 0.17 29.52 ± 0.05 4536363
Nvidia Tesla P4 265.54 ± 0.21 28.03 ± 0.14 24d2ee0
AMD Radeon RX 6500 XT 255.25 ± 0.35 27.81 ± 0.10 g9fdfcd
Apple M3 263.70 ± 0.02 26.39 ± 0.14 b9ab0a4 MoltenVK
AMD FirePro S10000 94.78 ± 0.02 25.32 ± 0.02 914a82d Split across two GPUs
Nvidia Quadro P2000 169.55 ± 0.17 23.05 ± 0.03 63f8fe0
Intel Core Ultra 200 Series 544.95 ± 4.15 22.49 ± 0.09 cea560f
AMD Ryzen AI 9 300 Series 479.07 ± 0.41 22.41 ± 0.18 N/A
AMD Ryzen 6000 Series 240.89 ± 0.52 21.26 ± 0.08 ee09828
Apple M2 Pro 62.70 ± 0.03 20.95 ± 0.11 1fe0029 Asahi Linux
Nvidia GTX 1050 Ti 136.42 ± 0.67 20.96 ± 0.21 2f0c2db
AMD Ryzen 8000 Series 266.19 ± 1.36 20.53 ± 0.08 a5c07dc
AMD Ryzen 7000 Series 281.62 ± 1.56 19.91 ± 0.07 ebce03e
AMD Ryzen Z1 Extreme 199.36 ± 7.02 18.77 ± 0.02 53ff6b9
AMD FirePro D700 69.95 ± 0.04 16.62 ± 0.01 d3bd719 MoltenVK, running in FP16 mode on FP32 only chip
AMD Radeon Pro WX 4100 78.79 ± 0.10 16.05 ± 0.07 860a9e4
Apple M2 50.79 ± 0.16 13.50 ± 0.02 8c0d6bb Asahi Linux
Apple M1 38.29 ± 0.00 12.47 ± 0.03 2370665 Asahi Linux
AMD Ryzen 5000 Series 90.55 ± 0.08 10.98 ± 0.07 d84635b
Intel Core 1100 Series 187.20 ± 1.78 10.39 ± 0.04 abb9f3c
AMD Radeon RX 550 52.66 ± 0.49 10.20 ± 0.01 N/A
AMD Ryzen 4000 Series 103.87 ± 0.02 9.63 ± 0.01 4b385bf
Nvidia Tesla K80 89.46 ± 0.10 9.39 ± 0.06 5d46bab Running on single GPU
Nvidia Tesla K40 64.37 ± 0.09 9.30 ± 0.19 eec1e33
MediaTek Dimensity 9400 38.36 ± 15.15 8.92 ± 0.06 b9ab0a4 GPU supports coopmat but pp512 is faster with it turned off
Intel Core Ultra 100 Series 185.51 ± 0.22 8.21 ± 0.07 1d72c84
AMD Ryzen 3000 Series 48.63 ± 0.10 8.49 ± 0.01 1fe0029
CIX CD8180 2.80 ± 0.01 5.51 ± 0.00 4dca015
Intel Core 1000 Series 25.58 ± 0.00 4.25 ± 0.18 N/A
Intel Core 8000 Series 25.43 ± 0.17 3.35 ± 0.03 c4df49a
Intel N150 28.84 ± 0.02 2.93 ± 0.00 4f63cd7
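Since some cards appear under more than one backend, a per-card comparison can be sketched as follows. The values are the no-FA rows for the RX 7900 XTX (ROCm and Vulkan) and the RTX 4090 (CUDA and Vulkan) from this article; the data layout and function name are illustrative. The rows come from different commits and contributors, so small gaps should be read as noise rather than a verdict.

```python
# chip -> backend -> (pp512 t/s, tg128 t/s), all without FA
NO_FA = {
    "RX 7900 XTX": {"ROCm": (3552.27, 167.11), "Vulkan": (3531.93, 191.28)},
    "RTX 4090": {"CUDA": (11992.70, 186.21), "Vulkan": (9452.03, 187.97)},
}
METRIC_INDEX = {"pp512": 0, "tg128": 1}


def best_backend(chip: str, metric: str) -> str:
    """Return the backend with the highest value of `metric` for `chip`."""
    idx = METRIC_INDEX[metric]
    per_backend = NO_FA[chip]
    return max(per_backend, key=lambda backend: per_backend[backend][idx])


if __name__ == "__main__":
    for chip in NO_FA:
        for metric in METRIC_INDEX:
            print(f"{chip:12s} {metric}: {best_backend(chip, metric)}")
```

On these rows the backend that leads in pp512 is not the one that leads in tg128, which is one more reason to pick the metric before picking the backend.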

Llama 2 7B, Q4_0, FA enabled

Chip pp512 t/s tg128 t/s Commit Comments
Nvidia RTX 5090 11796.38 ± 601.36 273.68 ± 0.52 ca71fb9 coopmat2
AMD Radeon RX 7900 XTX 3332.90 ± 11.47 195.30 ± 0.23 2f0c2db
Nvidia RTX 5080 8054.59 ± 35.68 192.17 ± 0.21 f6b533d coopmat2
Nvidia RTX 4090 10830.41 ± 36.25 190.10 ± 0.31 4ae88d0 coopmat2
Nvidia A100 7064.40 ± 1.63 170.56 ± 0.02 2257758 coopmat2
Nvidia RTX 3090 4732.33 ± 4.80 162.28 ± 0.21 4ae88d0 coopmat2
Nvidia RTX 4080 Super 8007.37 ± 46.03 150.20 ± 0.26 81086cd coopmat2
Nvidia RTX 3080 4913.83 ± 21.52 145.74 ± 0.16 7c7d6ce coopmat2
Nvidia Tesla V100 1411.25 ± 2.12 142.13 ± 0.03 7d77f07
Nvidia RTX A5000 4071.22 ± 13.13 140.43 ± 0.22 4ae88d0 coopmat2
AMD Radeon RX 9070 XT 4911.74 ± 28.52 138.20 ± 0.18 e9fd8dc
Nvidia RTX 5070 Ti 6764.53 ± 11.95 135.65 ± 0.02 d13d0f6 coopmat2
AMD Radeon AI Pro R9700 4333.83 ± 29.36 130.90 ± 0.12 3191462
AMD Radeon RX 7900 XT 3043.93 ± 10.42 124.20 ± 0.09 71e74a3
AMD Radeon RX 7800 XT 2094.64 ± 14.38 119.63 ± 0.13 4fdbc1e
AMD Radeon RX 9070 3277.24 ± 18.17 119.55 ± 0.06 21c17b5
AMD Radeon RX 7900 GRE 2402.07 ± 22.50 116.77 ± 0.08 4b2a477
Apple M3 Ultra 1115.55 ± 0.75 115.99 ± 0.12 2d451c8 MoltenVK
Intel Arc Pro B70 3314.53 ± 17.95 111.63 ± 0.05 b863507
Nvidia Titan V 792.74 ± 4.30 109.21 ± 0.72 e56abd2
AMD Radeon Pro VII 783.94 ± 0.77 108.45 ± 0.48 N/A
AMD Radeon RX 6900 XT 1761.93 ± 4.75 106.15 ± 0.04 a972fae
Nvidia RTX 2080 Ti 1936.25 ± 32.08 100.99 ± 0.24 N/A
AMD Radeon RX 6800 XT 1704.79 ± 0.71 100.50 ± 0.06 N/A
AMD Radeon Pro W6800X Duo 795.28 ± 0.72 100.08 ± 0.02 N/A
Nvidia RTX 5060 Ti 3912.65 ± 5.86 97.01 ± 0.14 89f10ba coopmat2
AMD Radeon RX 6800 1749.46 ± 3.36 96.65 ± 0.48 4b385bf
Nvidia RTX 4070 4293.57 ± 27.70 91.49 ± 0.89 9a48399 coopmat2
AMD Radeon RX 6750 XT 997.05 ± 0.45 82.29 ± 0.06 228f34c
AMD Radeon RX 6700 XT 1010.90 ± 12.89 81.86 ± 0.19 6d75883
Nvidia RTX 3060 2012.88 ± 10.12 80.59 ± 0.02 92c0b38 coopmat2
AMD Radeon Pro V620 1556.31 ± 2.82 79.24 ± 0.09 03d4698
Nvidia RTX A4000 2482.74 ± 26.05 76.07 ± 0.08 f5245b5 coopmat2
Nvidia Tesla T10 1840.14 ± 1.22 76.05 ± 0.13 7f76692 coopmat2
AMD Radeon RX 5700 XT 538.31 ± 0.35 74.43 ± 0.03 4fdbc1e
Intel Arc B580 419.49 ± 3.37 72.00 ± 0.24 7f76692
Apple M4 Max 557.46 ± 26.87 71.79 ± 4.16 1ece0cb6
AMD Radeon Pro W5700 446.98 ± 0.39 71.30 ± 0.24 23bc779
Intel Arc Pro B60 274.76 ± 0.27 70.54 ± 0.03 516a4ca
AMD Radeon RX 9060 XT 1915.41 ± 7.90 70.52 ± 0.16 ed52f36
Nvidia Tesla P100 685.51 ± 0.88 66.48 ± 0.02 eec1e33
AMD Radeon RX 6650 XT 1088.90 ± 0.40 64.53 ± 0.75 dbb852b
Nvidia GTX 1080 Ti 529.96 ± 0.38 64.63 ± 0.10 360d653
AMD BC-250 356.87 ± 1.24 63.14 ± 0.09 5886f4f
Nvidia RTX 3070 Mobile 1832.07 ± 57.14 62.92 ± 0.37 ceff6bb coopmat2
Nvidia RTX 4060 Mobile 2358.03 ± 12.17 60.01 ± 0.08 a5c07dc coopmat2
Nvidia Tesla P40 484.37 ± 0.27 59.22 ± 0.15 N/A
Nvidia GTX 1660 Ti Mobile 514.34 ± 0.88 57.30 ± 0.42 b43556e
AMD Radeon RX 7600 XT 1024.38 ± 7.56 56.11 ± 0.02 01d8eaa
AMD FirePro S9300 x2 243.33 ± 0.22 55.64 ± 0.06 eec1e33 Split across two GPUs
Nvidia GB10 3279.89 ± 26.78 53.64 ± 0.05 b9da444 coopmat2
AMD Radeon RX 6600 808.76 ± 0.15 53.24 ± 0.03 b1c70e2
Intel Arc A770 1119.68 ± 30.25 53.07 ± 0.09 a69d54f
AMD Ryzen AI Max+ 395 1357.07 ± 10.94 53.00 ± 0.13 7f76692
AMD Radeon RX Vega 56 428.54 ± 0.50 52.66 ± 0.03 92c0b38
Intel Arc B570 288.51 ± 0.09 50.49 ± 0.05 7f76692
Nvidia P104-100 325.30 ± 0.25 48.64 ± 0.04 eec1e33
AMD Radeon Pro V340 360.23 ± 0.74 47.54 ± 0.06 9da3dcd Split across two GPUs
AMD Radeon RX 6800M 784.16 ± 2.76 49.06 ± 0.34 8e6f8bc
AMD Radeon RX Vega 64 320.12 ± 0.22 47.06 ± 0.01 ec428b0
Nvidia RTX A2000 1361.85 ± 3.26 45.69 ± 0.20 b1afcab coopmat2
Intel Arc A770M 384.74 ± 0.78 45.68 ± 0.06 eeee367
Intel Arc A750 303.37 ± 1.44 43.96 ± 0.03 c1b1876
Nvidia GTX 1070 Ti 292.85 ± 0.23 43.42 ± 0.34 860a9e4 eGPU
Nvidia GTX 1070 330.84 ± 1.02 43.33 ± 0.06 360d653
Nvidia Tesla M40 93.35 ± 0.01 41.68 ± 0.01 b8372ee
Intel Arc Pro B50 132.48 ± 0.04 41.02 ± 0.04 7b43f55
AMD Radeon RX 470 197.26 ± 0.27 37.28 ± 0.11 3769fe6
AMD Radeon RX 480 194.52 ± 0.61 37.23 ± 0.09 0bcb40b
Apple M2 Ultra 198.83 ± 0.85 198.83 ± 0.85 dbb852b Asahi Linux
Nvidia GTX 980 180.97 ± 0.74 34.16 ± 0.10 860a9e4
Nvidia P106-100 183.40 ± 0.34 30.79 ± 0.32 23bc779
AMD FirePro W8100 140.52 ± 0.34 29.28 ± 0.14 4536363
Nvidia Tesla P4 287.14 ± 0.29 28.37 ± 0.24 24d2ee0
Nvidia Quadro P2000 181.71 ± 0.12 23.77 ± 0.02 63f8fe0
Intel Core Ultra 200 Series 536.48 ± 1.27 23.05 ± 0.04 cea560f
AMD Ryzen AI 9 300 Series 532.59 ± 3.55 22.31 ± 0.06 N/A
AMD Ryzen 6000 Series 277.91 ± 0.37 21.15 ± 0.09 ee09828
Apple M2 Pro 58.86 ± 0.02 20.97 ± 0.03 1fe0029 Asahi Linux
AMD Ryzen 8000 Series 297.39 ± 1.22 20.59 ± 0.38 a5c07dc
AMD Ryzen 7000 Series 312.85 ± 2.51 20.09 ± 0.35 835b2b9
Nvidia GTX 1050 Ti 127.54 ± 1.03 20.08 ± 0.17 2f0c2db
AMD Radeon Pro WX 4100 75.59 ± 0.19 16.56 ± 0.04 860a9e4
Apple M1 35.93 ± 0.00 12.85 ± 0.02 2370665 Asahi Linux
Apple M2 46.81 ± 0.08 12.25 ± 2.30 8c0d6bb Asahi Linux
AMD Ryzen 5000 Series 79.06 ± 0.01 10.75 ± 0.00 5d195f1
Intel Core 1100 Series 174.77 ± 4.47 10.58 ± 0.03 abb9f3c
Nvidia Tesla K40 64.37 ± 0.02 9.92 ± 0.06 eec1e33
AMD Ryzen 4000 Series 113.32 ± 0.01 9.87 ± 0.01 4b385bf
Nvidia Tesla K80 88.26 ± 0.19 9.49 ± 0.01 5d46bab Running on single GPU
AMD Ryzen 5 3000 Series 47.41 ± 0.14 8.47 ± 0.01 1fe0029
Intel Core Ultra 100 Series 77.66 ± 2.75 7.75 ± 0.05 2e89f76
Intel Core 8000 Series 25.55 ± 0.04 3.35 ± 0.02 c4df49a
Intel N150 25.59 ± 0.00 2.91 ± 0.00 4f63cd7
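With both Vulkan tables available, it is straightforward to scan for chips where enabling FA hurts prompt processing. The sketch below joins a handful of pp512 values from the two tables above by chip name and lists the ones that drop by more than a chosen threshold; the 10% threshold and the function name are arbitrary choices for illustration.

```python
# pp512 t/s for a few Vulkan entries, copied from the two tables above.
PP512_NO_FA = {
    "Nvidia RTX 4070": 3179.37,
    "AMD Radeon RX 7900 XTX": 3531.93,
    "Intel Arc B580": 620.94,
    "Intel Arc A750": 1075.94,
}
PP512_FA = {
    "Nvidia RTX 4070": 4293.57,
    "AMD Radeon RX 7900 XTX": 3332.90,
    "Intel Arc B580": 419.49,
    "Intel Arc A750": 303.37,
}


def fa_regressions(no_fa: dict, fa: dict, threshold_pct: float = 10.0) -> list:
    """Return (chip, change_pct) pairs where FA lowers pp512 by more than the threshold, worst first."""
    found = []
    for chip, base in no_fa.items():
        if chip not in fa:
            continue  # chip missing from the FA table
        change = (fa[chip] - base) / base * 100.0
        if change < -threshold_pct:
            found.append((chip, change))
    return sorted(found, key=lambda item: item[1])


if __name__ == "__main__":
    for chip, change in fa_regressions(PP512_NO_FA, PP512_FA):
        print(f"{chip}: {change:+.1f}% pp512 with FA")
```

In this small sample the flagged entries are both Intel Arc cards, while the RX 7900 XTX dips only slightly and the RTX 4070 improves.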

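How to use these tables

If you just want to buy a card or see roughly which tier your machine falls in, the most practical way to read the tables is these three steps:

  1. First decide whether you care about tg128 or pp512.
    For everyday chat, coding, and conversational feel, look at tg128 first; for long-context throughput, batch processing, and prompt-heavy server workloads, pp512 matters more.

  2. Then look at the backend you actually run.
    On Nvidia, CUDA is usually closer to the real ceiling; on AMD machines, check ROCm or Vulkan first; for cross-platform compatibility scenarios, Vulkan is the better reference.

  3. Finally, look at FA.
    On many cards pp512 rises more noticeably with FA enabled, but tg128 does not necessarily rise as much, so do not look only at the single highest score.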
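The three steps can be sketched as a filter followed by a sort. The rows are a small hand-picked sample from the no-FA tables in this article, and the `Row` type and `rank` function are hypothetical names.

```python
from typing import NamedTuple


class Row(NamedTuple):
    chip: str
    backend: str
    fa: bool
    pp512: float
    tg128: float


ROWS = [
    Row("RTX 3090", "CUDA", False, 5174.69, 158.16),
    Row("RTX 4080", "CUDA", False, 8031.64, 142.49),
    Row("RTX 4070 Ti SUPER", "CUDA", False, 6924.53, 132.26),
    Row("RX 7900 XTX", "ROCm", False, 3552.27, 167.11),
]


def rank(rows: list, metric: str, backend: str, fa: bool) -> list:
    """Step 1: pick the metric. Step 2: filter by backend. Step 3: fix the FA setting."""
    selected = [r for r in rows if r.backend == backend and r.fa == fa]
    ordered = sorted(selected, key=lambda r: getattr(r, metric), reverse=True)
    return [r.chip for r in ordered]


if __name__ == "__main__":
    print("tg128:", rank(ROWS, "tg128", "CUDA", False))
    print("pp512:", rank(ROWS, "pp512", "CUDA", False))
```

The two rankings come out in different orders for the same three CUDA cards, so the choice of metric in step 1 really does change which card looks best.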

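One-sentence summary

Even within llama.cpp benchmarks, pp512, tg128, Q4_0, FA, and CUDA / ROCm / Vulkan each describe a completely different dimension. Sort out the methodology first and then look at the numbers; only then does the scoreboard mean anything.

If you only want to remember the shortest possible conclusion, it is:

  • CUDA is currently the strongest overall
  • ROCm is already very competitive on high-end AMD cards
  • Vulkan has the widest coverage: old cards, integrated GPUs, Intel Arc, and Apple under Asahi all have comparable entries
  • tg128 is closer than pp512 to real everyday feel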

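Original sources: the llama.cpp GitHub Discussions scoreboard threads for CUDA, ROCm / HIP, and Vulkan, plus the Apple Silicon discussion #4167.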
