llama.cpp GPU Performance Ranking: Full CUDA, ROCm, and Vulkan Scoreboards Explained with pp512 / tg128 / FA

Based on the visible scoreboard data in GitHub Discussions as of 2026-04-23, this article compiles the full llama.cpp GPU benchmark tables for CUDA, ROCm, and Vulkan, and explains what pp512, tg128, Q4_0, and FA actually mean.

Understanding the Metrics First

What is Q4_0

Q4_0 is a 4-bit quantization format. It does not mean the model is stronger. It means the model is smaller, uses less VRAM, and fits on more devices. Most of these scoreboards standardize on Llama 2 7B, Q4_0 so that GPU-to-GPU comparisons are easier.
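
As a rough illustration of why Q4_0 is smaller: in ggml's Q4_0 layout, weights are stored in blocks of 32 four-bit values plus one 16-bit scale per block, which works out to about 4.5 bits per weight. The sketch below estimates weight storage for a 7B-class model. The parameter count of 6.74 billion is an assumption for illustration, and real GGUF files differ somewhat because not every tensor uses the same type.

```python
# Rough estimate of weight storage at different precisions.
# Assumption: ~6.74e9 parameters for a Llama 2 7B-class model.
PARAMS = 6.74e9

# Q4_0 block: 32 four-bit weights (16 bytes) + one 16-bit scale (2 bytes).
Q4_0_BITS_PER_WEIGHT = (16 + 2) * 8 / 32


def weight_gib(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB (ignores KV cache and activations)."""
    return params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    print(f"F16 : {weight_gib(PARAMS, 16):.2f} GiB")
    print(f"Q4_0: {weight_gib(PARAMS, Q4_0_BITS_PER_WEIGHT):.2f} GiB")
```

The estimate covers weights only; the KV cache and runtime buffers add to the actual VRAM footprint.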

What is pp512

pp512 stands for prompt processing with 512 tokens: the throughput measured while the model ingests a 512-token prompt.

  • pp = prompt processing
  • 512 = input length is 512 tokens
  • t/s = tokens per second

This is closer to prompt-ingestion speed, so it is often much higher than generation speed.

What is tg128

tg128 stands for text generation with 128 tokens: the speed measured while generating 128 tokens continuously.

  • tg = text generation
  • 128 = generate 128 tokens continuously
  • t/s = tokens per second

This is usually closer to the speed users actually feel in interactive usage.
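
To make the two metrics concrete, the following sketch turns throughput figures into a rough wall-clock estimate for one request. It uses the RTX 4090 CUDA no-FA values from the scoreboard further down (pp512 11992.70 t/s, tg128 186.21 t/s). The function name is hypothetical, and the model is deliberately simplistic: it ignores startup overhead and assumes throughput stays constant as the context grows.

```python
def estimate_seconds(prompt_tokens: int, gen_tokens: int,
                     pp_tps: float, tg_tps: float) -> tuple:
    """Return (prompt_seconds, generation_seconds) from t/s figures.

    Simplifying assumption: throughput is constant over the whole request.
    """
    return prompt_tokens / pp_tps, gen_tokens / tg_tps


# RTX 4090, CUDA, no FA (values taken from the table in this article).
prompt_s, gen_s = estimate_seconds(512, 128, pp_tps=11992.70, tg_tps=186.21)

if __name__ == "__main__":
    print(f"prompt: {prompt_s:.3f} s, generation: {gen_s:.3f} s")
```

Even though the prompt is four times longer than the generated output here, generation dominates the total time, which is why tg128 tracks perceived speed more closely.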

What is FA

FA stands for Flash Attention.

  • with FA means Flash Attention is enabled
  • no FA means Flash Attention is disabled

On many GPUs, FA improves pp512 more clearly than tg128, but the gain is not identical across backends, drivers, and GPU architectures.
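
As a worked example, the sketch below computes the relative gain from enabling FA for one card, using the RTX 4090 CUDA rows from the tables in this article. The helper name is hypothetical; only the four throughput values come from the scoreboard.

```python
def percent_gain(without_fa: float, with_fa: float) -> float:
    """Relative change in throughput when FA is enabled, in percent."""
    return (with_fa / without_fa - 1.0) * 100.0


# RTX 4090, CUDA backend (values from the no-FA and with-FA tables).
pp_gain = percent_gain(11992.70, 14770.63)  # pp512
tg_gain = percent_gain(186.21, 188.96)      # tg128

if __name__ == "__main__":
    print(f"pp512: {pp_gain:+.1f}%  tg128: {tg_gain:+.1f}%")
```

For this particular card the pp512 gain is roughly 23% while the tg128 gain is under 2%. Other rows behave differently, and some show a negative change, so the calculation is worth repeating per device.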

How to read t/s

t/s means tokens per second. When reading these scoreboards, the key rule is to compare the same type of test with the same settings.

  • Do not compare pp512 and tg128 as if they were the same thing
  • Do not mix no FA and with FA
  • Do not assume CUDA, ROCm, and Vulkan are directly interchangeable
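
The three rules above can be expressed as a small check. This is only a sketch: the `Result` type and `comparison_issues` function are hypothetical names introduced for illustration, not part of llama.cpp. The two sample values are the RTX 3090 tg128 no-FA figures from the CUDA and Vulkan tables.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Result:
    backend: str        # "CUDA", "ROCm", or "Vulkan"
    test: str           # "pp512" or "tg128"
    flash_attn: bool    # True for the "with FA" tables
    tokens_per_s: float


def comparison_issues(a: Result, b: Result) -> List[str]:
    """List the reasons why two results are not directly comparable."""
    issues = []
    if a.test != b.test:
        issues.append(f"different tests: {a.test} vs {b.test}")
    if a.flash_attn != b.flash_attn:
        issues.append("Flash Attention setting differs")
    if a.backend != b.backend:
        issues.append(f"different backends: {a.backend} vs {b.backend}")
    return issues


cuda_3090 = Result("CUDA", "tg128", False, 158.16)
vulkan_3090 = Result("Vulkan", "tg128", False, 160.13)

if __name__ == "__main__":
    print(comparison_issues(cuda_3090, vulkan_3090))
```

An empty list means the two numbers share test, FA setting, and backend. A backend mismatch does not make a comparison meaningless, but it should be read as a comparison of software paths as much as of hardware.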

Quick Takeaways

  • CUDA is still the strongest overall path in llama.cpp GPU benchmarks, especially on high-end Nvidia GPUs.
  • ROCm is already delivering strong results on high-end AMD GPUs and Instinct accelerators.
  • Vulkan has the broadest hardware coverage, including Nvidia, AMD, Intel, older GPUs, and some Apple / Asahi setups.
  • tg128 is closer to everyday perceived speed, while pp512 is better for judging prompt throughput.

CUDA Scoreboards

Llama 2 7B, Q4_0, no FA

Chip Memory pp512 t/s tg128 t/s Commit Thanks to
RTX 5090 32 GB / GDDR7 / 512 bit 14073.41 ± 115.16 290.02 ± 1.10 8cf6b42 @totaldev
RTX PRO 6000 Blackwell 96 GB / GDDR7 / 512 bit 14854.63 ± 22.73 274.20 ± 0.14 79c1160 @Tom94
H100 80 GB 80 GB / HBM3 / 5120 bit 9918.34 ± 176.97 267.81 ± 1.54 5143fa8 @Hedede
A100 80 GB 80 GB / HBM2e / 5120 bit 4849.53 ± 8.94 190.88 ± 0.33 5143fa8 @Hedede
RTX 4090 D 24 GB / GDDR6X / 384 bit 10293.86 ± 134.72 189.33 ± 0.19 79c1160 @autonomous-AI-lab
RTX 4090 24 GB / GDDR6X / 384 bit 11992.70 ± 107.99 186.21 ± 0.13 2241453 @lhl
RTX 5080 16 GB / GDDR7 / 256 bit 8297.36 ± 9.50 181.99 ± 0.42 8a4280c @Hedede
RTX 5070 Ti 16 GB / GDDR7 / 256 bit 6952.38 ± 13.73 176.85 ± 0.07 933414c @TinyServal
RTX 6000 Ada 48 GB / GDDR6 / 384 bit 9229.23 ± 101.78 176.07 ± 0.26 b8e09f0 @Hedede
RTX 3090 Ti 24 GB / GDDR6X / 384 bit 6567.49 ± 20.30 171.19 ± 3.98 9c35706 @slaren
RTX 3090 24 GB / GDDR6X / 384 bit 5174.69 ± 21.83 158.16 ± 0.21 c76b420 @m18coppola
L40 48 GB / GDDR6 / 384 bit 8870.49 ± 378.76 152.01 ± 0.28 ee09828 @Hedede
RTX 4080 SUPER 16 GB / GDDR6X / 256 bit 8125.15 ± 41.05 148.33 ± 0.20 81086cd @zacharyarnaise
RTX 4080 16 GB / GDDR6X / 256 bit 8031.64 ± 26.49 142.49 ± 0.16 20638e4 @Ristovski
RTX 3080 10 GB / GDDR6X / 320 bit 5013.86 ± 24.80 139.65 ± 0.99 9c35706 @slaren
RTX A6000 48 GB / GDDR6 / 384 bit 4913.93 ± 6.79 138.73 ± 2.75 4795c91 @Hedede
RTX 4070 Ti SUPER 16 GB / GDDR6X / 256 bit 6924.53 ± 13.87 132.26 ± 0.16 9c35706 @Ristovski
RTX PRO 4000 Blackwell 24 GB / GDDR7 / 192 bit 4992.83 ± 113.52 131.66 ± 0.20 7d77f07 @Hedede
RTX A5000 24 GB / GDDR6 / 384 bit 4028.16 ± 19.14 130.07 ± 2.74 e5155e6 @Hedede
Tesla V100 32 GB / HBM2 / 4096 bit 3042.64 ± 40.71 129.08 ± 0.05 51f5a45 @Hedede
RTX 5070 12 GB / GDDR7 / 192 bit 5184.75 ± 18.70 127.54 ± 0.46 @Spyro000 -
A40 48 GB / GDDR6 / 384 bit 4609.01 ± 10.67 124.11 ± 0.17 3470a5c @Hedede
A30 24 GB / HBM2e / 3072 bit 2767.10 ± 1.88 124.81 ± 0.16 583cb83 @Hedede
Titan V 12 GB / HBM2 / 3072 bit 2617.46 ± 2.10 108.79 ± 0.05 e56abd2 @Hedede
RTX 2080 Ti 11 GB / GDDR6 / 352 bit 2890.66 ± 2.42 107.51 ± 0.21 9c35706 @ariya
Quadro RTX 6000 24 GB / GDDR6 / 384 bit 2751.18 ± 19.43 102.77 ± 0.04 b8e09f0 @Hedede
Quadro RTX 8000 48 GB / GDDR6 / 384 bit 2709.95 ± 3.35 102.68 ± 0.03 b8e09f0 @Hedede
RTX A4500 20 GB / GDDR6 / 320 bit 2827.20 ± 66.43 97.32 ± 2.80 5cdb27e @aleksyx
RTX 5060 Ti 16 GB 16 GB / GDDR7 / 128 bit 3737.25 ± 6.79 90.94 ± 0.02 89d1029 @mike-llamacpp
RTX 2070 SUPER 8 GB / GDDR6 / 256 bit 2088.34 ± 1.94 88.06 ± 0.28 bc07349 @phstudy
RTX A4000 16 GB / GDDR6 / 256 bit 2684.06 ± 15.28 83.77 ± 0.37 65349f2 @TinyServal
Titan Xp 12 GB / GDDR5X / 384 bit 1154.96 ± 1.46 76.08 ± 0.08 c4510dc @Hedede
RTX 3060 12 GB / GDDR6 / 192 bit 2137.50 ± 10.12 75.57 ± 0.07 baa9255 @QuantiusBenignus
Quadro RTX 4000 8 GB / GDDR6 / 256 bit 1536.89 ± 0.90 65.62 ± 0.62 7d77f07 @Hedede
RTX 4060 Ti 8 GB 8 GB / GDDR6 / 128 bit 3394.63 ± 7.44 63.86 ± 0.01 89d1029 @mike-llamacpp
GTX 1080 Ti 11 GB / GDDR5X / 352 bit 1084.41 ± 3.01 62.49 ± 0.06 9c35706 @ariya
RTX A4000 Ada 20 GB / GDDR6 / 160 bit 2779.77 ± 9.91 61.83 ± 0.04 a74a0d6 @sdwolfz
RTX 2060 SUPER 8 GB / GDDR6 / 256 bit 1420.24 ± 1.95 60.04 ± 0.01 5c0eb5e @ggerganov
Tesla P100 16 GB / HBM2 / 4096 bit 760.80 ± 2.92 58.35 ± 0.00 b8372ee @Hedede
DGX Spark 128 GB / LPDDR5x 3062.31 ± 11.02 57.21 ± 0.06 5acd455 @ggerganov
Tesla P40 24 GB / GDDR5 / 384 bit 1007.42 ± 1.23 54.74 ± 0.07 c76b420 @m18coppola
RTX 2000 Ada 16 GB / GDDR6 / 128 bit 1956.22 ± 7.74 50.62 ± 0.04 756cfea @DigitalRudeness
Tesla T4 16 GB / GDDR6 / 256 bit 1219.06 ± 4.18 46.38 ± 0.73 d32e03f @pt13762104
RTX 4050 Laptop 6 GB / GDDR6 / 96 bit 1725.85 ± 17.85 43.72 ± 0.41 d79d8f3 @TimCabbage
GTX 1660 6 GB / GDDR5 / 192 bit 148.91 ± 0.01 41.35 ± 0.02 9515c61 @ariya
Tesla M40 24 GB / GDDR5 / 384 bit 282.65 ± 0.15 38.04 ± 0.02 97d5117 @Hedede
GTX 1070 Ti 8 GB / GDDR5 / 256 bit 714.44 ± 2.04 37.82 ± 0.02 79c1160 @pebaryan
Jetson AGX Orin 64 GB / LPDDR5 / 256 bit 991.31 ± 1.15 33.58 ± 0.14 c1b1876 @TinyServal
Tesla P4 8 GB / GDDR5 / 256 bit 514.53 ± 3.06 33.29 ± 0.00 c76b420 @m18coppola
P106-100 6 GB / GDDR5 / 192 bit 406.94 ± 0.25 30.40 ± 0.02 5fd160b @pebaryan
GTX 1060 6 GB / GDDR5 / 192 bit 416.85 ± 1.75 27.79 ± 0.02 5fd160b @pebaryan
Quadro T1000 4 GB / GDDR5 / 128 bit 79.44 ± 0.01 27.82 ± 0.18 f6da8cb @hanabu
Quadro P2000 5 GB / GDDR5 / 160 bit 309.30 ± 0.05 23.63 ± 0.00 baa9255 @TinyServal
Quadro P1000 4 GB / GDDR5 / 128 bit 183.40 ± 0.11 13.99 ± 0.13 1e74897 @aleksyx
Tesla K80 12 GB / GDDR5 / 384 bit 133.14 ± 0.55 13.80 ± 0.02 32732f2 @pebaryan
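
For readers who want to work with these rows programmatically, here is a minimal parsing sketch. It assumes the row layout used in this article: a free-text label, then two `value ± error` pairs for pp512 and tg128, then any trailing fields. The names `Row` and `parse_row` are hypothetical. The label is kept as one string because chip name and memory description cannot be split reliably without a separator.

```python
import re
from dataclasses import dataclass
from typing import Optional

NUM = r"\d+(?:\.\d+)?"
# Some rows in the source use "+" where "±" was intended, so accept both.
ROW_RE = re.compile(
    rf"^(?P<label>.+?)\s+(?P<pp>{NUM}) [±+] (?P<pp_err>{NUM})"
    rf"\s+(?P<tg>{NUM}) [±+] (?P<tg_err>{NUM})(?:\s+(?P<rest>.*))?$"
)


@dataclass
class Row:
    label: str      # chip name plus memory description, unsplit
    pp512: float
    pp512_err: float
    tg128: float
    tg128_err: float
    rest: str       # commit hash, contributor, or comments


def parse_row(line: str) -> Optional[Row]:
    """Parse one scoreboard row; return None if it does not match."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    return Row(m["label"], float(m["pp"]), float(m["pp_err"]),
               float(m["tg"]), float(m["tg_err"]), m["rest"] or "")


sample = ("RTX 5090 32 GB / GDDR7 / 512 bit 14073.41 ± 115.16 "
          "290.02 ± 1.10 8cf6b42 @totaldev")
row = parse_row(sample)
```

Header lines such as `Chip Memory pp512 t/s tg128 t/s Commit Thanks to` do not match the pattern and come back as `None`, which makes them easy to skip.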

Llama 2 7B, Q4_0, with FA

Chip Memory pp512 t/s tg128 t/s Commit Thanks to
RTX 5090 32 GB / GDDR7 / 512 bit 14970.15 ± 381.06 300.40 ± 0.28 8cf6b42 @totaldev
RTX PRO 6000 Blackwell 96 GB / GDDR7 / 512 bit 16618.98 ± 20.66 281.11 ± 0.41 5143fa8 @Tom94
H100 80 GB 80 GB / HBM3 / 5120 bit 11263.29 ± 98.34 280.74 ± 1.17 5143fa8 @Hedede
A100 80 GB 80 GB / HBM2e / 5120 bit 5285.96 ± 6.58 200.90 ± 0.12 5143fa8 @Hedede
RTX 4090 D 24 GB / GDDR6X / 384 bit 12506.97 ± 11.51 191.57 ± 0.03 79c1160 @autonomous-AI-lab
RTX 4090 24 GB / GDDR6X / 384 bit 14770.63 ± 102.93 188.96 ± 0.05 2241453 @lhl
RTX 5080 16 GB / GDDR7 / 256 bit 9487.70 ± 21.89 184.68 ± 0.05 8a4280c @Hedede
RTX 5070 Ti 16 GB / GDDR7 / 256 bit 8419.56 ± 35.50 182.43 ± 0.09 933414c @TinyServal
RTX 6000 Ada 48 GB / GDDR6 / 384 bit 10576.85 ± 530.21 179.47 ± 0.32 b8e09f0 @Hedede
RTX 3090 Ti 24 GB / GDDR6X / 384 bit 6924.01 ± 10.76 172.26 ± 1.31 9c35706 @slaren
RTX PRO 4500 Blackwell 32 GB / GDDR7 / 256 bit 7251.66 ± 92.40 168.90 ± 0.20 becc481 @Hedede
RTX 3090 24 GB / GDDR6X / 384 bit 5560.06 ± 16.28 161.89 ± 0.18 c76b420 @m18coppola
L40 48 GB / GDDR6 / 384 bit 10097.64 ± 671.22 153.76 ± 0.12 ee09828 @Hedede
RTX 4080 SUPER 16 GB / GDDR6X / 256 bit 9439.01 ± 56.75 147.48 ± 1.41 81086cd @zacharyarnaise
RTX 4080 16 GB / GDDR6X / 256 bit 9205.93 ± 22.31 143.47 ± 0.02 20638e4 @Ristovski
RTX A6000 48 GB / GDDR6 / 384 bit 5662.39 ± 13.87 144.87 ± 0.18 4795c91 @Hedede
RTX 3080 10 GB / GDDR6X / 320 bit 5569.56 ± 14.04 139.95 ± 0.95 9c35706 @slaren
RTX PRO 4000 Blackwell 24 GB / GDDR7 / 192 bit 5674.44 ± 139.53 136.38 ± 0.13 7d77f07 @Hedede
RTX A5000 24 GB / GDDR6 / 384 bit 4552.15 ± 9.68 135.83 ± 0.11 e5155e6 @Hedede
Tesla V100 32 GB / HBM2 / 4096 bit 2973.78 ± 3.62 134.76 ± 0.02 51f5a45 @Hedede
RTX 4070 Ti SUPER 16 GB / GDDR6X / 256 bit 7612.32 ± 37.35 132.85 ± 0.31 9c35706 @Ristovski
A30 24 GB / HBM2e / 3072 bit 3068.72 ± 0.63 131.93 ± 0.18 583cb83 @Hedede
RTX 5070 12 GB / GDDR7 / 192 bit 5783.44 ± 36.95 128.21 ± 2.52 @Spyro000 -
A40 48 GB / GDDR6 / 384 bit 5256.38 ± 19.39 126.24 ± 0.06 3470a5c @Hedede
Titan V 12 GB / HBM2 / 3072 bit 2481.25 ± 1.31 112.17 ± 0.01 e56abd2 @Hedede
RTX 2080 Ti 11 GB / GDDR6 / 352 bit 3107.61 ± 4.34 109.17 ± 0.07 9c35706 @ariya
Quadro RTX 6000 24 GB / GDDR6 / 384 bit 3053.96 ± 1.37 104.38 ± 0.04 b8e09f0 @Hedede
Quadro RTX 8000 48 GB / GDDR6 / 384 bit 3052.35 ± 5.64 103.63 ± 0.02 b8e09f0 @Hedede
RTX A4500 20 GB / GDDR6 / 320 bit 3453.10 ± 49.19 103.00 ± 0.25 5cdb27e @aleksyx
RTX 5060 Ti 16 GB 16 GB / GDDR7 / 128 bit 4195.53 ± 1.98 93.46 ± 0.01 89d1029 @mike-llamacpp
RTX 2070 SUPER 8 GB / GDDR6 / 256 bit 2293.29 ± 5.91 87.71 ± 0.29 bc07349 @phstudy
RTX A4000 16 GB / GDDR6 / 256 bit 2807.83 ± 52.44 85.17 ± 0.66 65349f2 @TinyServal
RTX 3060 12 GB / GDDR6 / 192 bit 2407.67 ± 3.73 76.92 ± 0.03 baa9255 @QuantiusBenignus
Titan Xp 12 GB / GDDR5X / 384 bit 1218.12 ± 1.82 73.84 ± 0.04 c4510dc @Hedede
Quadro RTX 4000 8 GB / GDDR6 / 256 bit 1662.80 ± 2.04 67.62 ± 0.67 7d77f07 @Hedede
RTX 4060 Ti 8 GB 8 GB / GDDR6 / 128 bit 3803.45 ± 70.80 64.03 ± 0.53 89d1029 @mike-llamacpp
Tesla P100 16 GB / HBM2 / 4096 bit 787.36 ± 3.27 61.99 ± 0.00 b8372ee @Hedede
GTX 1080 Ti 11 GB / GDDR5X / 352 bit 1138.14 ± 2.02 61.38 ± 0.03 9c35706 @ariya
RTX A4000 Ada 20 GB / GDDR6 / 160 bit 3171.86 ± 4.34 61.37 ± 0.01 a74a0d6 @sdwolfz
RTX 2060 SUPER 8 GB / GDDR6 / 256 bit 1563.77 ± 0.51 61.13 ± 0.05 5c0eb5e @ggerganov
DGX Spark 128 GB / LPDDR5x 3661.37 ± 38.66 56.74 ± 0.03 5acd455 @ggerganov
Tesla P40 24 GB / GDDR5 / 384 bit 1079.66 ± 0.18 53.73 ± 0.05 c76b420 @m18coppola
RTX 2000 Ada 16 GB / GDDR6 / 128 bit 2250.14 ± 5.91 50.71 ± 0.01 756cfea @DigitalRudeness
Tesla T4 16 GB / GDDR6 / 256 bit 1309.73 ± 1.02 44.03 ± 0.57 d32e03f @pt13762104
GTX 1660 6 GB / GDDR5 / 192 bit 154.45 ± 0.52 41.43 ± 0.01 9515c61 @ariya
Tesla M40 24 GB / GDDR5 / 384 bit 290.17 ± 0.11 39.98 ± 0.01 97d5117 @Hedede
GTX 1070 Ti 8 GB / GDDR5 / 256 bit 790.52 ± 2.39 37.87 ± 0.00 79c1160 @pebaryan
Jetson AGX Orin 64 GB / LPDDR5 / 256 bit 1171.96 ± 4.70 35.88 ± 0.18 c1b1876 @TinyServal
Tesla P4 8 GB / GDDR5 / 256 bit 529.53 ± 2.12 33.12 ± 0.03 c76b420 @m18coppola
P106-100 6 GB / GDDR5 / 192 bit 438.49 ± 0.38 30.64 ± 0.06 5fd160b @pebaryan
GTX 1060 6 GB / GDDR5 / 192 bit 446.19 ± 0.81 28.18 ± 0.01 5fd160b @pebaryan
Quadro T1000 4 GB / GDDR5 / 128 bit 27.46 ± 0.23 27.46 ± 0.23 f6da8cb @hanabu
Quadro P2000 5 GB / GDDR5 / 160 bit 311.55 ± 0.19 23.76 ± 0.01 baa9255 @TinyServal
Tesla K80 12 GB / GDDR5 / 384 bit 133.36 ± 0.60 14.27 ± 0.32 32732f2 @pebaryan
Quadro P1000 4 GB / GDDR5 / 128 bit 173.82 ± 0.02 13.65 ± 0.14 1e74897 @aleksyx

Apple Silicon as a Reference Baseline

Discussion #4167 is useful because it established a more unified benchmark format early on. Besides Q4_0, it also includes F16 and Q8_0, which helps explain PP / TG / t/s. The thread explicitly defines:

  • PP = prompt processing
  • TG = text generation
  • t/s = tokens per second

A representative example is the M2 Ultra time-series comparison:

Time Device Version / Note Bandwidth GB/s GPU Cores F16 PP F16 TG Q8_0 PP Q8_0 TG Q4_0 PP Q4_0 TG
2023-11-21 M2 Ultra 8e672ef 800 76 1401.85 41.02 1248.59 66.64 1238.48 94.27
2024-11-12 M2 Ultra 86ed72d + FA 800 76 1525.95 43.15 1368.18 73.11 1391.78 108.80
2025-08-02 M2 Ultra 5c0eb5e + FA 800 76 1561.35 43.24 1386.97 73.35 1412.42 109.41

Representative Apple Silicon entries shown in the thread:

Device Q4_0 PP Q4_0 TG Q8_0 PP Q8_0 TG F16 PP F16 TG
M1 Pro 16 GPU 266.25 36.41 270.37 22.34 302.14 12.75
M2 Ultra 76 GPU 1238.48 94.27 1248.59 66.64 1401.85 41.02
M3 Max 40 GPU 690.99 65.85 749.37 43.00 794.26 25.27

ROCm / HIP Scoreboards

Llama 2 7B, Q4_0, no FA

Chip Memory pp512 t/s tg128 t/s Commit Thanks to
Instinct MI300X 192 GB / HBM3 / 8192 bit 11476.40 ± 72.79 232.92 ± 0.53 ee3a9fc @yeahdongcn
RX 7900 XTX 24 GB / GDDR6 / 384 bit 3552.27 ± 101.96 167.11 ± 0.50 2f0c2db @Diablo-D3
Instinct MI210 64 GB / HBM2e / 4096 bit 2486.22 ± 9.58 124.51 ± 0.04 8160b38 @65a
Pro W7900 48 GB / GDDR6 / 384 bit 3213.17 ± 80.47 121.18 ± 0.06 8160b38 @65a
RX 7900 XT 20 GB / GDDR6 / 320 bit 3098.38 ± 24.02 116.15 ± 0.06 1e15bfd @AdamNiederer
RX 9070 16 GB / GDDR6 / 256 bit 2381.77 ± 3.68 114.48 ± 0.60 d0660f2 @andj1210
Instinct MI100 32 GB / HBM2 / 4096 bit 2732.83 ± 1.98 110.48 ± 0.14 9c35706 @firefox42
RX 9070 XT 16 GB / GDDR6 / 256 bit 5055.19 ± 109.58 101.27 ± 0.27 583cb83 @Hadrianneue
RX 7800 XT 16 GB / GDDR6 / 256 bit 2151.81 ± 17.94 100.94 ± 0.10 00131d6 @olegshulyakov
Instinct MI50 32 GB / HBM2 / 4096 bit 1057.24 ± 0.53 98.95 ± 0.25 97d5117 @wtarreau
RX 7900 GRE 16 GB / GDDR6 / 256 bit 1456.98 ± 12.39 96.07 ± 0.10 6fa3b55 @MihaiBojescu
AI PRO R9700 32 GB / GDDR6 / 256 bit 4443.54 ± 339.25 93.84 ± 0.26 bd4ef13 @gogich77
Instinct MI60 32 GB / HBM2 / 4096 bit 1289.11 ± 0.62 91.46 ± 0.13 504af20 @Said-Akbar
RX 6900 XT 16 GB / GDDR6 / 256 bit 1889.84 ± 31.21 88.49 ± 0.00 a972fae @notgood
Pro VII 16 GB / HBM2 / 4096 bit 1064.99 ± 1.18 87.45 ± 0.04 2739a71 @8XXD8
RX 6800 XT 16 GB / GDDR6 / 256 bit 1447.07 ± 1.36 83.92 ± 0.03 79c1160 @MrLavender
Pro V620 32 GB / GDDR6 / 256 bit 1803.65 ± 2.54 74.66 ± 0.01 5c0eb5e @samteezy
RX 9060 XT 16 GB / GDDR6 / 256 bit 1419.67 ± 3.64 67.58 ± 0.24 a0e13dc @lcy0321
RX 5700 XT 8 GB / GDDR6 / 256 bit 354.17 ± 0.18 67.55 ± 0.04 c05e8c9 @daniandtheweb
Instinct MI25 16 GB / HBM2 / 2048 bit 409.83 ± 0.23 63.94 ± 0.06 2739a71 @8XXD8
AI Max+ 395 128 GB / LPDDR5 911.36 ± 1.79 50.01 ± 0.07 e60f241 @firefox42
RX 7600 XT 16 GB / GDDR6 / 128 bit 1099.64 ± 2.05 48.58 ± 0.06 9c35706 @wbruna
RX Vega 64 8 GB / HBM2 / 2048 bit 240.68 ± 0.09 48.46 ± 0.09 ec428b0 @davispuh
Radeon 8060S System Shared / DDR5 351.36 ± 0.67 47.97 ± 0.33 1d0125b @hspak
Radeon 880M System Shared / DDR5 163.25 ± 13.86 12.97 ± 1.63 c55d53a @Hedede

Llama 2 7B, Q4_0, with FA

Chip Memory pp512 t/s tg128 t/s Commit Thanks to
Instinct MI300X 192 GB / HBM3 / 8192 bit 11945.97 ± 54.29 218.53 ± 0.09 ee3a9fc @yeahdongcn
RX 7900 XTX 24 GB / GDDR6 / 384 bit 3874.25 ± 11.92 170.12 ± 0.56 2f0c2db @Diablo-D3
Pro W7900 48 GB / GDDR6 / 384 bit 3472.86 ± 52.86 127.43 ± 0.12 8160b38 @65a
Instinct MI210 64 GB / HBM2e / 4096 bit 2571.82 ± 2.89 130.18 ± 0.06 8160b38 @65a
RX 9070 16 GB / GDDR6 / 256 bit 2452.68 ± 1.33 115.32 ± 0.52 d0660f2 @andj1210
RX 7900 XT 20 GB / GDDR6 / 320 bit 3261.75 ± 9.09 112.30 ± 0.06 1e15bfd @AdamNiederer
Instinct MI50 32 GB / HBM2 / 4096 bit 1129.43 ± 0.15 105.82 ± 0.07 97d5117 @wtarreau
Instinct MI100 32 GB / HBM2 / 4096 bit 2755.00 ± 3.68 104.71 ± 0.10 9c35706 @firefox42
AI PRO R9700 32 GB / GDDR6 / 256 bit 4773.07 ± 49.30 97.98 ± 0.13 bd4ef13 @gogich77
RX 7900 GRE 16 GB / GDDR6 / 256 bit 1598.79 ± 11.48 97.53 ± 0.06 6fa3b55 @MihaiBojescu
RX 9070 XT 16 GB / GDDR6 / 256 bit 4903.51 ± 96.36 97.28 ± 0.13 583cb83 @Hadrianneue
RX 7800 XT 16 GB / GDDR6 / 256 bit 2304.63 ± 2.85 95.99 ± 0.21 00131d6 @olegshulyakov
RX 6900 XT 16 GB / GDDR6 / 256 bit 1948.31 ± 13.51 85.04 ± 0.02 a972fae @notgood
Pro V620 32 GB / GDDR6 / 256 bit 1256.86 ± 0.55 70.83 ± 0.02 5c0eb5e @samteezy
RX 9060 XT 16 GB / GDDR6 / 256 bit 1479.27 ± 0.71 65.42 ± 0.19 a0e13dc @lcy0321
RX 5700 XT 8 GB / GDDR6 / 256 bit 314.17 ± 0.29 62.02 ± 0.05 c05e8c9 @daniandtheweb
AI Max+ 395 128 GB / LPDDR5 1003.53 ± 2.91 49.87 ± 0.02 e60f241 @firefox42
Radeon 8060S System Shared / DDR5 366.08 ± 1.44 48.97 ± 0.15 1d0125b @hspak
RX 7600 XT 16 GB / GDDR6 / 128 bit 1199.16 ± 1.07 47.65 ± 0.06 9c35706 @wbruna
RX Vega 64 8 GB / HBM2 / 2048 bit 153.17 ± 0.72 42.46 ± 0.40 ec428b0 @davispuh
Radeon 880M System Shared / DDR5 213.31 ± 14.05 16.16 ± 1.41 c55d53a @Hedede

Vulkan Scoreboards

Llama 2 7B, Q4_0, no FA

Chip pp512 t/s tg128 t/s Commit Comments
Nvidia RTX 5090 10381.64 ± 508.84 263.63 ± 0.91 ca71fb9 coopmat2
AMD Radeon RX 7900 XTX 3531.93 ± 31.74 191.28 ± 0.20 2f0c2db
Nvidia RTX 4090 9452.03 ± 187.70 187.97 ± 0.21 4ae88d0 coopmat2
Nvidia RTX 5080 7444.99 ± 20.11 185.10 ± 0.54 f6b533d coopmat2
Nvidia A100 6389.86 ± 4.83 160.78 ± 0.16 2257758 coopmat2
Nvidia RTX 3090 4298.97 ± 10.59 160.13 ± 0.25 4ae88d0 coopmat2
Nvidia RTX 4080 Super 7101.18 ± 269.79 147.13 ± 5.64 81086cd coopmat2
Nvidia RTX 3080 4287.11 ± 55.50 139.15 ± 0.05 7c7d6ce coopmat2
Nvidia RTX A5000 3641.55 ± 9.05 139.89 ± 0.69 4ae88d0 coopmat2
AMD Radeon RX 9070 XT 5036.04 ± 88.16 137.11 ± 0.02 e9fd8dc
Nvidia RTX 5070 Ti 6213.63 ± 27.72 135.63 ± 0.18 d13d0f6 coopmat2
AMD Radeon AI Pro R9700 4036.04 ± 34.58 130.19 ± 0.39 3191462
Nvidia Tesla V100 1391.39 ± 1.19 129.58 ± 0.58 7d77f07
Nvidia RTX 4070 Ti Super 6099.18 ± 154.30 129.45 ± 0.18 4ae88d0 coopmat2
AMD Radeon RX 7900 XT 2941.58 ± 17.17 123.18 ± 0.40 71e74a3
AMD Radeon RX 9070 3164.10 ± 66.84 119.71 ± 3.40 21c17b5
AMD Radeon RX 7800 XT 2017.33 ± 19.30 118.27 ± 0.27 4fdbc1e
AMD Radeon RX 7900 GRE 2336.31 ± 7.52 116.11 ± 0.26 4b2a477
Apple M3 Ultra 1116.83 ± 0.55 115.54 ± 0.78 2d451c8 MoltenVK
Intel Arc Pro B70 3379.00 ± 47.92 112.02 ± 1.08 b863507
Nvidia Titan V 984.36 ± 4.13 108.86 ± 0.28 e56abd2
AMD Radeon Pro VII 1078.54 ± 0.86 107.82 ± 0.14 N/A
AMD Radeon RX 6900 XT 1837.21 ± 25.44 104.60 ± 0.30 a972fae
Intel Arc Pro A60 2261.11 ± 9.53 104.25 ± 0.07 97d5117
AMD Radeon RX 6800 XT 1752.92 ± 1.71 100.32 ± 0.97 N/A
AMD Radeon VII 1059.14 ± 0.56 101.19 ± 0.53 77d6ae4
Nvidia RTX 2080 Ti 1888.24 ± 9.20 97.58 ± 6.60 N/A
AMD Radeon RX 6800 1698.69 ± 0.80 95.61 ± 0.19 4b385bf
AMD Radeon Pro W6800X Duo 687.71 ± 4.33 94.82 ± 0.12 N/A
Nvidia RTX 5060 Ti 3460.92 ± 7.16 93.51 ± 0.15 89f10ba coopmat2
Nvidia RTX 4070 3179.37 ± 46.16 92.29 ± 0.28 9a48399
AMD Radeon Pro W6800X 510.80 ± 0.13 86.47 ± 0.46 13b4548 MoltenVK
AMD Radeon RX 6700 XT 1051.20 ± 0.98 83.88 ± 0.08 6d75883
AMD Radeon RX 6750 XT 1040.58 ± 0.35 81.98 ± 0.03 228f34c
AMD Radeon Pro V620 1595.32 ± 1.59 81.78 ± 0.06 03d4698
Nvidia RTX 3070 2113.02 ± 7.38 78.71 ± 0.13 1b8fb81
AMD Radeon Instinct MI60 369.26 ± 2.48 78.16 ± 1.40 504af20
Nvidia RTX 3060 1815.70 ± 5.85 75.94 ± 0.80 92c0b38 coopmat2
Apple M4 Max 724.77 ± 20.93 75.02 ± 0.14 1ece0cb6
Nvidia Tesla T10 1692.70 ± 2.05 75.01 ± 0.21 7f76692 coopmat2
Nvidia RTX A4000 2248.14 ± 7.59 73.74 ± 0.08 f5245b5 coopmat2
AMD Radeon RX 5700 XT 529.69 ± 0.26 70.73 ± 0.04 4fdbc1e
AMD Radeon RX 9060 XT 2141.67 ± 6.87 70.54 ± 0.74 ed52f36
Intel Arc B580 620.94 ± 15.33 70.14 ± 0.28 7f76692
AMD Radeon Pro V540 583.88 ± 6.56 69.64 ± 0.24 9da3dcd
AMD Radeon Pro W5700 449.85 ± 0.46 68.55 ± 0.15 23bc779
Intel Arc Pro B60 522.36 ± 3.60 68.55 ± 0.01 516a4ca
Nvidia GTX 1080 Ti 540.69 ± 0.71 64.99 ± 0.08 360d653
Nvidia RTX 2070 Super 1199.13 ± 7.70 64.64 ± 0.20 b7552cf
Nvidia RTX 3070 Mobile 1689.40 ± 19.57 63.64 ± 0.39 ceff6bb coopmat2
Nvidia Tesla P100 678.14 ± 1.40 63.16 ± 0.06 eec1e33
AMD BC-250 370.66 ± 0.04 62.32 ± 0.32 5886f4f
AMD Radeon RX 6650 XT 1029.52 ± 1.21 62.14 ± 0.02 dbb852b
Nvidia RTX 4060 Mobile 2135.66 ± 23.18 59.53 ± 0.03 a5c07dc coopmat2
Nvidia Tesla P40 488.06 ± 0.27 59.36 ± 0.16 N/A
Nvidia GTX 1660 Ti Mobile 511.67 ± 2.85 56.60 ± 0.07 b43556e
AMD Radeon Instinct MI25 439.42 ± 0.34 54.69 ± 0.03 2739a71
AMD Radeon RX 6600 XT 574.65 ± 0.86 53.92 ± 0.11 091592d
AMD Ryzen AI Max+ 395 1288.96 ± 6.49 53.59 ± 0.38 7f76692
AMD Radeon RX 7600 XT 840.85 ± 3.02 53.02 ± 0.01 01d8eaa
Intel Arc A770 1073.85 ± 29.68 52.56 ± 0.11 a69d54f
Nvidia GB10 2737.79 ± 19.56 52.28 ± 0.03 b9da444 coopmat2
AMD FirePro S9300 x2 247.26 ± 0.43 51.86 ± 0.11 eec1e33 Split across two GPUs
AMD Radeon RX 6600 761.89 ± 1.76 50.63 ± 0.02 b1c70e2
AMD Radeon RX Vega 56 439.87 ± 0.61 50.23 ± 0.14 92c0b38
Intel Arc B570 913.95 ± 0.90 49.64 ± 0.03 7f76692
Nvidia RTX 3060 Mobile 1059.76 ± 3.54 49.03 ± 0.13 dbb3a47
AMD Radeon RX 6800M 861.99 ± 7.67 48.71 ± 0.71 8e6f8bc
AMD Radeon RX 6600M 605.59 ± 0.65 48.21 ± 0.07 fe5b78c
Intel Arc A770M 875.92 ± 2.16 47.69 ± 0.16 eeee367
Nvidia P104-100 311.90 ± 0.22 46.18 ± 0.05 eec1e33
AMD Radeon RX Vega 64 356.08 ± 0.09 45.73 ± 0.18 ec428b0
Nvidia RTX A2000 1245.19 ± 8.76 45.52 ± 0.54 b1afcab coopmat2
AMD Radeon RX 7600M XT 459.39 ± 2.34 45.28 ± 0.10 b9ab0a4 eGPU
AMD Radeon Pro V340 375.41 ± 0.24 45.16 ± 0.06 9da3dcd Split across two GPUs
Nvidia GTX 1070 Ti 297.50 ± 0.54 42.86 ± 1.20 860a9e4 eGPU
Intel Arc A750 1075.94 ± 13.89 42.66 ± 0.18 c1b1876
Nvidia RTX 4050 Mobile 1154.28 ± 15.76 41.89 ± 0.10 d79d8f3
Nvidia GTX 1070 321.57 ± 0.93 41.48 ± 0.09 eec1e33
Intel Arc Pro B50 193.50 ± 0.24 39.99 ± 0.10 7b43f55
Nvidia Tesla M40 92.48 ± 0.02 39.35 ± 1.22 b8372ee
AMD Radeon RX 580 258.03 ± 0.71 39.32 ± 0.03 de4c07f
AMD Radeon RX 470 218.07 ± 0.56 38.63 ± 0.21 e288693
AMD Radeon Pro W5500 315.39 ± 3.76 36.82 ± 0.38 860a9e4
AMD Radeon RX 480 248.66 ± 0.28 34.71 ± 0.14 3b15924
Apple M2 Ultra 205.98 ± 0.02 34.34 ± 0.12 dbb852b Asahi Linux
Nvidia GTX 980 186.24 ± 0.09 33.90 ± 0.51 860a9e4
Nvidia P106-100 183.78 ± 0.26 29.77 ± 0.04 23bc779
AMD FirePro W8100 155.22 ± 0.17 29.52 ± 0.05 4536363
Nvidia Tesla P4 265.54 ± 0.21 28.03 ± 0.14 24d2ee0
AMD Radeon RX 6500 XT 255.25 ± 0.35 27.81 ± 0.10 g9fdfcd
Apple M3 263.70 ± 0.02 26.39 ± 0.14 b9ab0a4 MoltenVK
AMD FirePro S10000 94.78 ± 0.02 25.32 ± 0.02 914a82d Split across two GPUs
Nvidia Quadro P2000 169.55 ± 0.17 23.05 ± 0.03 63f8fe0
Intel Core Ultra 200 Series 544.95 ± 4.15 22.49 ± 0.09 cea560f
AMD Ryzen AI 9 300 Series 479.07 ± 0.41 22.41 ± 0.18 N/A
AMD Ryzen 6000 Series 240.89 ± 0.52 21.26 ± 0.08 ee09828
Apple M2 Pro 62.70 ± 0.03 20.95 ± 0.11 1fe0029 Asahi Linux
Nvidia GTX 1050 Ti 136.42 ± 0.67 20.96 ± 0.21 2f0c2db
AMD Ryzen 8000 Series 266.19 ± 1.36 20.53 ± 0.08 a5c07dc
AMD Ryzen 7000 Series 281.62 ± 1.56 19.91 ± 0.07 ebce03e
AMD Ryzen Z1 Extreme 199.36 ± 7.02 18.77 ± 0.02 53ff6b9
AMD FirePro D700 69.95 ± 0.04 16.62 ± 0.01 d3bd719 MoltenVK, running in FP16 mode on FP32 only chip
AMD Radeon Pro WX 4100 78.79 ± 0.10 16.05 ± 0.07 860a9e4
Apple M2 50.79 ± 0.16 13.50 ± 0.02 8c0d6bb Asahi Linux
Apple M1 38.29 ± 0.00 12.47 ± 0.03 2370665 Asahi Linux
AMD Ryzen 5000 Series 90.55 ± 0.08 10.98 ± 0.07 d84635b
Intel Core 1100 Series 187.20 ± 1.78 10.39 ± 0.04 abb9f3c
AMD Radeon RX 550 52.66 ± 0.49 10.20 ± 0.01 N/A
AMD Ryzen 4000 Series 103.87 ± 0.02 9.63 ± 0.01 4b385bf
Nvidia Tesla K80 89.46 ± 0.10 9.39 ± 0.06 5d46bab Running on single GPU
Nvidia Tesla K40 64.37 ± 0.09 9.30 ± 0.19 eec1e33
MediaTek Dimensity 9400 38.36 ± 15.15 8.92 ± 0.06 b9ab0a4 GPU supports coopmat but pp512 is faster with it turned off
Intel Core Ultra 100 Series 185.51 ± 0.22 8.21 ± 0.07 1d72c84
AMD Ryzen 3000 Series 48.63 ± 0.10 8.49 ± 0.01 1fe0029
CIX CD8180 2.80 ± 0.01 5.51 ± 0.00 4dca015
Intel Core 1000 Series 25.58 ± 0.00 4.25 ± 0.18 N/A
Intel Core 8000 Series 25.43 ± 0.17 3.35 ± 0.03 c4df49a
Intel N150 28.84 ± 0.02 2.93 ± 0.00 4f63cd7

Llama 2 7B, Q4_0, with FA

Chip pp512 t/s tg128 t/s Commit Comments
Nvidia RTX 5090 11796.38 ± 601.36 273.68 ± 0.52 ca71fb9 coopmat2
AMD Radeon RX 7900 XTX 3332.90 ± 11.47 195.30 ± 0.23 2f0c2db
Nvidia RTX 5080 8054.59 ± 35.68 192.17 ± 0.21 f6b533d coopmat2
Nvidia RTX 4090 10830.41 ± 36.25 190.10 ± 0.31 4ae88d0 coopmat2
Nvidia A100 7064.40 ± 1.63 170.56 ± 0.02 2257758 coopmat2
Nvidia RTX 3090 4732.33 ± 4.80 162.28 ± 0.21 4ae88d0 coopmat2
Nvidia RTX 4080 Super 8007.37 ± 46.03 150.20 ± 0.26 81086cd coopmat2
Nvidia RTX 3080 4913.83 ± 21.52 145.74 ± 0.16 7c7d6ce coopmat2
Nvidia Tesla V100 1411.25 ± 2.12 142.13 ± 0.03 7d77f07
Nvidia RTX A5000 4071.22 ± 13.13 140.43 ± 0.22 4ae88d0 coopmat2
AMD Radeon RX 9070 XT 4911.74 ± 28.52 138.20 ± 0.18 e9fd8dc
Nvidia RTX 5070 Ti 6764.53 ± 11.95 135.65 ± 0.02 d13d0f6 coopmat2
AMD Radeon AI Pro R9700 4333.83 ± 29.36 130.90 ± 0.12 3191462
AMD Radeon RX 7900 XT 3043.93 ± 10.42 124.20 ± 0.09 71e74a3
AMD Radeon RX 7800 XT 2094.64 ± 14.38 119.63 ± 0.13 4fdbc1e
AMD Radeon RX 9070 3277.24 ± 18.17 119.55 ± 0.06 21c17b5
AMD Radeon RX 7900 GRE 2402.07 ± 22.50 116.77 ± 0.08 4b2a477
Apple M3 Ultra 1115.55 ± 0.75 115.99 ± 0.12 2d451c8 MoltenVK
Intel Arc Pro B70 3314.53 ± 17.95 111.63 ± 0.05 b863507
Nvidia Titan V 792.74 ± 4.30 109.21 ± 0.72 e56abd2
AMD Radeon Pro VII 783.94 ± 0.77 108.45 ± 0.48 N/A
AMD Radeon RX 6900 XT 1761.93 ± 4.75 106.15 ± 0.04 a972fae
Nvidia RTX 2080 Ti 1936.25 ± 32.08 100.99 ± 0.24 N/A
AMD Radeon RX 6800 XT 1704.79 ± 0.71 100.50 ± 0.06 N/A
AMD Radeon Pro W6800X Duo 795.28 ± 0.72 100.08 ± 0.02 N/A
Nvidia RTX 5060 Ti 3912.65 ± 5.86 97.01 ± 0.14 89f10ba coopmat2
AMD Radeon RX 6800 1749.46 ± 3.36 96.65 ± 0.48 4b385bf
Nvidia RTX 4070 4293.57 ± 27.70 91.49 ± 0.89 9a48399 coopmat2
AMD Radeon RX 6750 XT 997.05 ± 0.45 82.29 ± 0.06 228f34c
AMD Radeon RX 6700 XT 1010.90 ± 12.89 81.86 ± 0.19 6d75883
Nvidia RTX 3060 2012.88 ± 10.12 80.59 ± 0.02 92c0b38 coopmat2
AMD Radeon Pro V620 1556.31 ± 2.82 79.24 ± 0.09 03d4698
Nvidia RTX A4000 2482.74 ± 26.05 76.07 ± 0.08 f5245b5 coopmat2
Nvidia Tesla T10 1840.14 ± 1.22 76.05 ± 0.13 7f76692 coopmat2
AMD Radeon RX 5700 XT 538.31 ± 0.35 74.43 ± 0.03 4fdbc1e
Intel Arc B580 419.49 ± 3.37 72.00 ± 0.24 7f76692
Apple M4 Max 557.46 ± 26.87 71.79 ± 4.16 1ece0cb6
AMD Radeon Pro W5700 446.98 ± 0.39 71.30 ± 0.24 23bc779
Intel Arc Pro B60 274.76 ± 0.27 70.54 ± 0.03 516a4ca
AMD Radeon RX 9060 XT 1915.41 ± 7.90 70.52 ± 0.16 ed52f36
Nvidia Tesla P100 685.51 ± 0.88 66.48 ± 0.02 eec1e33
AMD Radeon RX 6650 XT 1088.90 ± 0.40 64.53 ± 0.75 dbb852b
Nvidia GTX 1080 Ti 529.96 ± 0.38 64.63 ± 0.10 360d653
AMD BC-250 356.87 ± 1.24 63.14 ± 0.09 5886f4f
Nvidia RTX 3070 Mobile 1832.07 ± 57.14 62.92 ± 0.37 ceff6bb coopmat2
Nvidia RTX 4060 Mobile 2358.03 ± 12.17 60.01 ± 0.08 a5c07dc coopmat2
Nvidia Tesla P40 484.37 ± 0.27 59.22 ± 0.15 N/A
Nvidia GTX 1660 Ti Mobile 514.34 ± 0.88 57.30 ± 0.42 b43556e
AMD Radeon RX 7600 XT 1024.38 ± 7.56 56.11 ± 0.02 01d8eaa
AMD FirePro S9300 x2 243.33 ± 0.22 55.64 ± 0.06 eec1e33 Split across two GPUs
Nvidia GB10 3279.89 ± 26.78 53.64 ± 0.05 b9da444 coopmat2
AMD Radeon RX 6600 808.76 ± 0.15 53.24 ± 0.03 b1c70e2
Intel Arc A770 1119.68 ± 30.25 53.07 ± 0.09 a69d54f
AMD Ryzen AI Max+ 395 1357.07 ± 10.94 53.00 ± 0.13 7f76692
AMD Radeon RX Vega 56 428.54 ± 0.50 52.66 ± 0.03 92c0b38
Intel Arc B570 288.51 ± 0.09 50.49 ± 0.05 7f76692
Nvidia P104-100 325.30 ± 0.25 48.64 ± 0.04 eec1e33
AMD Radeon Pro V340 360.23 ± 0.74 47.54 ± 0.06 9da3dcd Split across two GPUs
AMD Radeon RX 6800M 784.16 ± 2.76 49.06 ± 0.34 8e6f8bc
AMD Radeon RX Vega 64 320.12 ± 0.22 47.06 ± 0.01 ec428b0
Nvidia RTX A2000 1361.85 ± 3.26 45.69 ± 0.20 b1afcab coopmat2
Intel Arc A770M 384.74 ± 0.78 45.68 ± 0.06 eeee367
Intel Arc A750 303.37 ± 1.44 43.96 ± 0.03 c1b1876
Nvidia GTX 1070 Ti 292.85 ± 0.23 43.42 ± 0.34 860a9e4 eGPU
Nvidia GTX 1070 330.84 ± 1.02 43.33 ± 0.06 360d653
Nvidia Tesla M40 93.35 ± 0.01 41.68 ± 0.01 b8372ee
Intel Arc Pro B50 132.48 ± 0.04 41.02 ± 0.04 7b43f55
AMD Radeon RX 470 197.26 ± 0.27 37.28 ± 0.11 3769fe6
AMD Radeon RX 480 194.52 ± 0.61 37.23 ± 0.09 0bcb40b
Apple M2 Ultra 198.83 ± 0.85 198.83 ± 0.85 dbb852b Asahi Linux
Nvidia GTX 980 180.97 ± 0.74 34.16 ± 0.10 860a9e4
Nvidia P106-100 183.40 ± 0.34 30.79 ± 0.32 23bc779
AMD FirePro W8100 140.52 ± 0.34 29.28 ± 0.14 4536363
Nvidia Tesla P4 287.14 ± 0.29 28.37 ± 0.24 24d2ee0
Nvidia Quadro P2000 181.71 ± 0.12 23.77 ± 0.02 63f8fe0
Intel Core Ultra 200 Series 536.48 ± 1.27 23.05 ± 0.04 cea560f
AMD Ryzen AI 9 300 Series 532.59 ± 3.55 22.31 ± 0.06 N/A
AMD Ryzen 6000 Series 277.91 ± 0.37 21.15 ± 0.09 ee09828
Apple M2 Pro 58.86 ± 0.02 20.97 ± 0.03 1fe0029 Asahi Linux
AMD Ryzen 8000 Series 297.39 ± 1.22 20.59 ± 0.38 a5c07dc
AMD Ryzen 7000 Series 312.85 ± 2.51 20.09 ± 0.35 835b2b9
Nvidia GTX 1050 Ti 127.54 ± 1.03 20.08 ± 0.17 2f0c2db
AMD Radeon Pro WX 4100 75.59 ± 0.19 16.56 ± 0.04 860a9e4
Apple M1 35.93 ± 0.00 12.85 ± 0.02 2370665 Asahi Linux
Apple M2 46.81 ± 0.08 12.25 ± 2.30 8c0d6bb Asahi Linux
AMD Ryzen 5000 Series 79.06 ± 0.01 10.75 ± 0.00 5d195f1
Intel Core 1100 Series 174.77 ± 4.47 10.58 ± 0.03 abb9f3c
Nvidia Tesla K40 64.37 ± 0.02 9.92 ± 0.06 eec1e33
AMD Ryzen 4000 Series 113.32 ± 0.01 9.87 ± 0.01 4b385bf
Nvidia Tesla K80 88.26 ± 0.19 9.49 ± 0.01 5d46bab Running on single GPU
AMD Ryzen 5 3000 Series 47.41 ± 0.14 8.47 ± 0.01 1fe0029
Intel Core Ultra 100 Series 77.66 ± 2.75 7.75 ± 0.05 2e89f76
Intel Core 8000 Series 25.55 ± 0.04 3.35 ± 0.02 c4df49a
Intel N150 25.59 ± 0.00 2.91 ± 0.00 4f63cd7

How to Use These Tables

  1. Decide whether you care more about tg128 or pp512. For chat and interactive use, tg128 usually matters more. For long prompts and batch throughput, pp512 matters more.
  2. Match the backend you actually use. Nvidia users should usually prioritize CUDA. AMD users should compare ROCm and Vulkan first. Cross-platform users should pay close attention to Vulkan.
  3. Check FA last. On many GPUs, enabling FA improves pp512 more than tg128, so a single headline number can be misleading.
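
The first step, picking a metric before ranking, can be sketched as follows. The four entries are copied from the CUDA no-FA table above; the `rank` helper and the dictionary layout are hypothetical and only meant to show that the order changes with the metric.

```python
from typing import Dict, List

# Subset of the CUDA, no-FA scoreboard from this article.
CUDA_NO_FA: Dict[str, Dict[str, float]] = {
    "RTX 5090": {"pp512": 14073.41, "tg128": 290.02},
    "RTX PRO 6000 Blackwell": {"pp512": 14854.63, "tg128": 274.20},
    "H100 80 GB": {"pp512": 9918.34, "tg128": 267.81},
    "RTX 4090": {"pp512": 11992.70, "tg128": 186.21},
}


def rank(board: Dict[str, Dict[str, float]], metric: str) -> List[str]:
    """Return chip names sorted from fastest to slowest on one metric."""
    return sorted(board, key=lambda chip: board[chip][metric], reverse=True)


by_tg = rank(CUDA_NO_FA, "tg128")
by_pp = rank(CUDA_NO_FA, "pp512")

if __name__ == "__main__":
    print("tg128:", by_tg)
    print("pp512:", by_pp)
```

In this subset the RTX 5090 leads on tg128 while the RTX PRO 6000 Blackwell leads on pp512, and the H100 and RTX 4090 also swap places, so the "best" card depends on which workload matters.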

One-Sentence Summary

In llama.cpp benchmarks, pp512, tg128, Q4_0, FA, and CUDA / ROCm / Vulkan describe different dimensions. Once the benchmark context is clear, the tables become much easier to read.
