Understanding the Metrics First
What is Q4_0
Q4_0 is a 4-bit quantization format. It does not mean the model is stronger. It means the model is smaller, uses less VRAM, and fits on more devices. Most of these scoreboards standardize on Llama 2 7B, Q4_0 so that GPU-to-GPU comparisons are easier.
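To make "smaller" concrete, here is a rough size estimate. It is a sketch under two assumptions that this article does not state: that a Q4_0 block stores 32 weights in 18 bytes (16 bytes of 4-bit values plus one 2-byte FP16 scale, so 4.5 bits per weight), and that Llama 2 7B has about 6.74 billion parameters. Real GGUF files differ slightly because some tensors are kept at higher precision.

```python
# Rough weight-storage estimate for a Q4_0 model (illustrative, not exact).
Q4_0_BLOCK_WEIGHTS = 32      # assumed: weights per Q4_0 block
Q4_0_BLOCK_BYTES = 16 + 2    # assumed: 16 bytes of 4-bit values + 2-byte FP16 scale


def bits_per_weight(block_bytes: int = Q4_0_BLOCK_BYTES,
                    block_weights: int = Q4_0_BLOCK_WEIGHTS) -> float:
    """Effective storage cost of one weight, including the per-block scale."""
    return block_bytes * 8 / block_weights


def approx_size_gib(n_params: float, bpw: float) -> float:
    """Approximate weight storage in GiB for a given bits-per-weight."""
    return n_params * bpw / 8 / 2**30


N_PARAMS = 6.74e9  # assumed parameter count for Llama 2 7B

q4_0_gib = approx_size_gib(N_PARAMS, bits_per_weight())
f16_gib = approx_size_gib(N_PARAMS, 16.0)
print(f"Q4_0: {bits_per_weight():.1f} bits/weight, about {q4_0_gib:.2f} GiB")
print(f"F16 : 16.0 bits/weight, about {f16_gib:.2f} GiB")
```

Under these assumptions the Q4_0 weights take well under a third of the F16 footprint, which is why the format fits on far more GPUs.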
What is pp512
pp512 usually means prompt processing 512 tokens, which is the throughput while processing 512 input tokens.
- pp = prompt processing
- 512 = input length is 512 tokens
- t/s = tokens per second

pp512 reflects prompt-ingestion speed, so it is often much higher than generation speed.
What is tg128
tg128 usually means text generation, 128 tokens: the speed measured while the model generates 128 tokens in a row.
- tg = text generation
- 128 = generate 128 tokens continuously
- t/s = tokens per second

tg128 is usually closer to the speed users actually feel in interactive usage.
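A quick way to see why the text-generation figure dominates perceived speed is to turn both rates into wall-clock time. The sketch below uses the RTX 4090 row from the CUDA no-FA table in this article and assumes, as a simplification, that each rate stays constant for the whole request; `estimate_seconds` is a hypothetical helper, not part of llama.cpp.

```python
def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     pp_tps: float, tg_tps: float) -> tuple[float, float]:
    """Return (prompt_seconds, generation_seconds) assuming constant rates."""
    if pp_tps <= 0 or tg_tps <= 0:
        raise ValueError("throughput must be positive")
    return prompt_tokens / pp_tps, output_tokens / tg_tps


# RTX 4090, CUDA, no FA (pp512 and tg128 values from the scoreboard in this article)
prompt_s, gen_s = estimate_seconds(512, 128, pp_tps=11992.70, tg_tps=186.21)
print(f"prompt: {prompt_s * 1000:.0f} ms, generation: {gen_s * 1000:.0f} ms")
```

Even though the prompt is four times longer than the reply, generation takes more than ten times as long in this estimate, so the tg128 column is the better proxy for how fast a chat feels.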
What is FA
FA stands for Flash Attention.
- with FA means Flash Attention is enabled
- no FA means Flash Attention is disabled

On many GPUs, FA improves pp512 more clearly than tg128, but the gain varies across backends, drivers, and GPU architectures, and a few entries in the tables below are slower with FA enabled.
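The effect of FA can be checked directly from the paired tables. The sketch below computes the relative change for two chips whose numbers appear in this article: the RTX 4090 on CUDA and the RX Vega 64 on ROCm. `percent_gain` is a hypothetical helper name.

```python
def percent_gain(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# RTX 4090, CUDA scoreboards in this article: (no FA, with FA)
pp_gain = percent_gain(11992.70, 14770.63)   # pp512
tg_gain = percent_gain(186.21, 188.96)       # tg128

# RX Vega 64, ROCm scoreboards in this article: pp512 (no FA, with FA)
vega_pp_gain = percent_gain(240.68, 153.17)

print(f"RTX 4090   pp512: {pp_gain:+.1f}%  tg128: {tg_gain:+.1f}%")
print(f"RX Vega 64 pp512: {vega_pp_gain:+.1f}%")
```

On the RTX 4090 the pp512 gain is large while tg128 barely moves, and on the RX Vega 64 pp512 drops with FA, so it is worth checking both rows for your own GPU rather than assuming a uniform speedup.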
How to read t/s
t/s means tokens per second. When reading these scoreboards, the key rule is to compare the same type of test with the same settings.
- Do not compare pp512 and tg128 as if they were the same thing
- Do not mix no-FA and with-FA results
- Do not assume CUDA, ROCm, and Vulkan are directly interchangeable
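If you collect these numbers in a script, the three rules above can be enforced mechanically. This is a minimal sketch; `Result` and `speed_ratio` are hypothetical names, and the strict backend check is a deliberate choice that you may want to relax when you are explicitly comparing backends on the same GPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    chip: str
    backend: str            # "CUDA", "ROCm", or "Vulkan"
    test: str               # "pp512" or "tg128"
    flash_attention: bool
    tokens_per_second: float


def speed_ratio(a: Result, b: Result) -> float:
    """Return a/b throughput, refusing to compare mismatched settings."""
    for field in ("backend", "test", "flash_attention"):
        if getattr(a, field) != getattr(b, field):
            raise ValueError(f"not comparable: {field} differs")
    return a.tokens_per_second / b.tokens_per_second


# tg128 values from the CUDA no-FA table in this article
rtx4090 = Result("RTX 4090", "CUDA", "tg128", False, 186.21)
rtx3090 = Result("RTX 3090", "CUDA", "tg128", False, 158.16)
print(f"RTX 4090 vs RTX 3090 (tg128, no FA): {speed_ratio(rtx4090, rtx3090):.2f}x")
```

Passing a pp512 result alongside a tg128 result, or a with-FA result alongside a no-FA one, raises `ValueError` instead of silently producing a misleading ratio.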
Quick Takeaways
- CUDA is still the strongest overall path in llama.cpp GPU benchmarks, especially on high-end Nvidia GPUs.
- ROCm is already delivering strong results on high-end AMD GPUs and Instinct accelerators.
- Vulkan has the broadest hardware coverage, including Nvidia, AMD, Intel, older GPUs, and some Apple / Asahi setups.
- tg128 is closer to everyday perceived speed, while pp512 is better for judging prompt throughput.
CUDA Scoreboards
Llama 2 7B, Q4_0, no FA
| Chip | Memory | pp512 t/s | tg128 t/s | Commit | Thanks to |
|---|---|---|---|---|---|
| RTX 5090 | 32 GB / GDDR7 / 512 bit | 14073.41 ± 115.16 | 290.02 ± 1.10 | 8cf6b42 | @totaldev |
| RTX PRO 6000 Blackwell | 96 GB / GDDR7 / 512 bit | 14854.63 ± 22.73 | 274.20 ± 0.14 | 79c1160 | @Tom94 |
| H100 80 GB | 80 GB / HBM3 / 5120 bit | 9918.34 ± 176.97 | 267.81 ± 1.54 | 5143fa8 | @Hedede |
| A100 80 GB | 80 GB / HBM2e / 5120 bit | 4849.53 ± 8.94 | 190.88 ± 0.33 | 5143fa8 | @Hedede |
| RTX 4090 D | 24 GB / GDDR6X / 384 bit | 10293.86 ± 134.72 | 189.33 ± 0.19 | 79c1160 | @autonomous-AI-lab |
| RTX 4090 | 24 GB / GDDR6X / 384 bit | 11992.70 ± 107.99 | 186.21 ± 0.13 | 2241453 | @lhl |
| RTX 5080 | 16 GB / GDDR7 / 256 bit | 8297.36 ± 9.50 | 181.99 ± 0.42 | 8a4280c | @Hedede |
| RTX 5070 Ti | 16 GB / GDDR7 / 256 bit | 6952.38 ± 13.73 | 176.85 ± 0.07 | 933414c | @TinyServal |
| RTX 6000 Ada | 48 GB / GDDR6 / 384 bit | 9229.23 ± 101.78 | 176.07 ± 0.26 | b8e09f0 | @Hedede |
| RTX 3090 Ti | 24 GB / GDDR6X / 384 bit | 6567.49 ± 20.30 | 171.19 ± 3.98 | 9c35706 | @slaren |
| RTX 3090 | 24 GB / GDDR6X / 384 bit | 5174.69 ± 21.83 | 158.16 ± 0.21 | c76b420 | @m18coppola |
| L40 | 48 GB / GDDR6 / 384 bit | 8870.49 ± 378.76 | 152.01 ± 0.28 | ee09828 | @Hedede |
| RTX 4080 SUPER | 16 GB / GDDR6X / 256 bit | 8125.15 ± 41.05 | 148.33 ± 0.20 | 81086cd | @zacharyarnaise |
| RTX 4080 | 16 GB / GDDR6X / 256 bit | 8031.64 ± 26.49 | 142.49 ± 0.16 | 20638e4 | @Ristovski |
| RTX 3080 | 10 GB / GDDR6X / 320 bit | 5013.86 ± 24.80 | 139.65 ± 0.99 | 9c35706 | @slaren |
| RTX A6000 | 48 GB / GDDR6 / 384 bit | 4913.93 ± 6.79 | 138.73 ± 2.75 | 4795c91 | @Hedede |
| RTX 4070 Ti SUPER | 16 GB / GDDR6X / 256 bit | 6924.53 ± 13.87 | 132.26 ± 0.16 | 9c35706 | @Ristovski |
| RTX PRO 4000 Blackwell | 24 GB / GDDR7 / 192 bit | 4992.83 ± 113.52 | 131.66 ± 0.20 | 7d77f07 | @Hedede |
| RTX A5000 | 24 GB / GDDR6 / 384 bit | 4028.16 ± 19.14 | 130.07 ± 2.74 | e5155e6 | @Hedede |
| Tesla V100 | 32 GB / HBM2 / 4096 bit | 3042.64 ± 40.71 | 129.08 ± 0.05 | 51f5a45 | @Hedede |
| RTX 5070 | 12 GB / GDDR7 / 192 bit | 5184.75 ± 18.70 | 127.54 ± 0.46 | - | @Spyro000 |
| A40 | 48 GB / GDDR6 / 384 bit | 4609.01 ± 10.67 | 124.11 ± 0.17 | 3470a5c | @Hedede |
| A30 | 24 GB / HBM2e / 3072 bit | 2767.10 ± 1.88 | 124.81 ± 0.16 | 583cb83 | @Hedede |
| Titan V | 12 GB / HBM2 / 3072 bit | 2617.46 ± 2.10 | 108.79 ± 0.05 | e56abd2 | @Hedede |
| RTX 2080 Ti | 11 GB / GDDR6 / 352 bit | 2890.66 ± 2.42 | 107.51 ± 0.21 | 9c35706 | @ariya |
| Quadro RTX 6000 | 24 GB / GDDR6 / 384 bit | 2751.18 ± 19.43 | 102.77 ± 0.04 | b8e09f0 | @Hedede |
| Quadro RTX 8000 | 48 GB / GDDR6 / 384 bit | 2709.95 ± 3.35 | 102.68 ± 0.03 | b8e09f0 | @Hedede |
| RTX A4500 | 20 GB / GDDR6 / 320 bit | 2827.20 ± 66.43 | 97.32 ± 2.80 | 5cdb27e | @aleksyx |
| RTX 5060 Ti 16 GB | 16 GB / GDDR7 / 128 bit | 3737.25 ± 6.79 | 90.94 ± 0.02 | 89d1029 | @mike-llamacpp |
| RTX 2070 SUPER | 8 GB / GDDR6 / 256 bit | 2088.34 ± 1.94 | 88.06 ± 0.28 | bc07349 | @phstudy |
| RTX A4000 | 16 GB / GDDR6 / 256 bit | 2684.06 ± 15.28 | 83.77 ± 0.37 | 65349f2 | @TinyServal |
| Titan Xp | 12 GB / GDDR5X / 384 bit | 1154.96 ± 1.46 | 76.08 ± 0.08 | c4510dc | @Hedede |
| RTX 3060 | 12 GB / GDDR6 / 192 bit | 2137.50 ± 10.12 | 75.57 ± 0.07 | baa9255 | @QuantiusBenignus |
| Quadro RTX 4000 | 8 GB / GDDR6 / 256 bit | 1536.89 ± 0.90 | 65.62 ± 0.62 | 7d77f07 | @Hedede |
| RTX 4060 Ti 8 GB | 8 GB / GDDR6 / 128 bit | 3394.63 ± 7.44 | 63.86 ± 0.01 | 89d1029 | @mike-llamacpp |
| GTX 1080 Ti | 11 GB / GDDR5X / 352 bit | 1084.41 ± 3.01 | 62.49 ± 0.06 | 9c35706 | @ariya |
| RTX A4000 Ada | 20 GB / GDDR6 / 160 bit | 2779.77 ± 9.91 | 61.83 ± 0.04 | a74a0d6 | @sdwolfz |
| RTX 2060 SUPER | 8 GB / GDDR6 / 256 bit | 1420.24 ± 1.95 | 60.04 ± 0.01 | 5c0eb5e | @ggerganov |
| Tesla P100 | 16 GB / HBM2 / 4096 bit | 760.80 ± 2.92 | 58.35 ± 0.00 | b8372ee | @Hedede |
| DGX Spark | 128 GB / LPDDR5x | 3062.31 ± 11.02 | 57.21 ± 0.06 | 5acd455 | @ggerganov |
| Tesla P40 | 24 GB / GDDR5 / 384 bit | 1007.42 ± 1.23 | 54.74 ± 0.07 | c76b420 | @m18coppola |
| RTX 2000 Ada | 16 GB / GDDR6 / 128 bit | 1956.22 ± 7.74 | 50.62 ± 0.04 | 756cfea | @DigitalRudeness |
| Tesla T4 | 16 GB / GDDR6 / 256 bit | 1219.06 ± 4.18 | 46.38 ± 0.73 | d32e03f | @pt13762104 |
| RTX 4050 Laptop | 6 GB / GDDR6 / 96 bit | 1725.85 ± 17.85 | 43.72 ± 0.41 | d79d8f3 | @TimCabbage |
| GTX 1660 | 6 GB / GDDR5 / 192 bit | 148.91 ± 0.01 | 41.35 ± 0.02 | 9515c61 | @ariya |
| Tesla M40 | 24 GB / GDDR5 / 384 bit | 282.65 ± 0.15 | 38.04 ± 0.02 | 97d5117 | @Hedede |
| GTX 1070 Ti | 8 GB / GDDR5 / 256 bit | 714.44 ± 2.04 | 37.82 ± 0.02 | 79c1160 | @pebaryan |
| Jetson AGX Orin | 64 GB / LPDDR5 / 256 bit | 991.31 ± 1.15 | 33.58 ± 0.14 | c1b1876 | @TinyServal |
| Tesla P4 | 8 GB / GDDR5 / 256 bit | 514.53 ± 3.06 | 33.29 ± 0.00 | c76b420 | @m18coppola |
| P106-100 | 6 GB / GDDR5 / 192 bit | 406.94 ± 0.25 | 30.40 ± 0.02 | 5fd160b | @pebaryan |
| GTX 1060 | 6 GB / GDDR5 / 192 bit | 416.85 ± 1.75 | 27.79 ± 0.02 | 5fd160b | @pebaryan |
| Quadro T1000 | 4 GB / GDDR5 / 128 bit | 79.44 ± 0.01 | 27.82 ± 0.18 | f6da8cb | @hanabu |
| Quadro P2000 | 5 GB / GDDR5 / 160 bit | 309.30 ± 0.05 | 23.63 ± 0.00 | baa9255 | @TinyServal |
| Quadro P1000 | 4 GB / GDDR5 / 128 bit | 183.40 ± 0.11 | 13.99 ± 0.13 | 1e74897 | @aleksyx |
| Tesla K80 | 12 GB / GDDR5 / 384 bit | 133.14 ± 0.55 | 13.80 ± 0.02 | 32732f2 | @pebaryan |
Llama 2 7B, Q4_0, with FA
| Chip | Memory | pp512 t/s | tg128 t/s | Commit | Thanks to |
|---|---|---|---|---|---|
| RTX 5090 | 32 GB / GDDR7 / 512 bit | 14970.15 ± 381.06 | 300.40 ± 0.28 | 8cf6b42 | @totaldev |
| RTX PRO 6000 Blackwell | 96 GB / GDDR7 / 512 bit | 16618.98 ± 20.66 | 281.11 ± 0.41 | 5143fa8 | @Tom94 |
| H100 80 GB | 80 GB / HBM3 / 5120 bit | 11263.29 ± 98.34 | 280.74 ± 1.17 | 5143fa8 | @Hedede |
| A100 80 GB | 80 GB / HBM2e / 5120 bit | 5285.96 ± 6.58 | 200.90 ± 0.12 | 5143fa8 | @Hedede |
| RTX 4090 D | 24 GB / GDDR6X / 384 bit | 12506.97 ± 11.51 | 191.57 ± 0.03 | 79c1160 | @autonomous-AI-lab |
| RTX 4090 | 24 GB / GDDR6X / 384 bit | 14770.63 ± 102.93 | 188.96 ± 0.05 | 2241453 | @lhl |
| RTX 5080 | 16 GB / GDDR7 / 256 bit | 9487.70 ± 21.89 | 184.68 ± 0.05 | 8a4280c | @Hedede |
| RTX 5070 Ti | 16 GB / GDDR7 / 256 bit | 8419.56 ± 35.50 | 182.43 ± 0.09 | 933414c | @TinyServal |
| RTX 6000 Ada | 48 GB / GDDR6 / 384 bit | 10576.85 ± 530.21 | 179.47 ± 0.32 | b8e09f0 | @Hedede |
| RTX 3090 Ti | 24 GB / GDDR6X / 384 bit | 6924.01 ± 10.76 | 172.26 ± 1.31 | 9c35706 | @slaren |
| RTX PRO 4500 Blackwell | 32 GB / GDDR7 / 256 bit | 7251.66 ± 92.40 | 168.90 ± 0.20 | becc481 | @Hedede |
| RTX 3090 | 24 GB / GDDR6X / 384 bit | 5560.06 ± 16.28 | 161.89 ± 0.18 | c76b420 | @m18coppola |
| L40 | 48 GB / GDDR6 / 384 bit | 10097.64 ± 671.22 | 153.76 ± 0.12 | ee09828 | @Hedede |
| RTX 4080 SUPER | 16 GB / GDDR6X / 256 bit | 9439.01 ± 56.75 | 147.48 ± 1.41 | 81086cd | @zacharyarnaise |
| RTX 4080 | 16 GB / GDDR6X / 256 bit | 9205.93 ± 22.31 | 143.47 ± 0.02 | 20638e4 | @Ristovski |
| RTX A6000 | 48 GB / GDDR6 / 384 bit | 5662.39 ± 13.87 | 144.87 ± 0.18 | 4795c91 | @Hedede |
| RTX 3080 | 10 GB / GDDR6X / 320 bit | 5569.56 ± 14.04 | 139.95 ± 0.95 | 9c35706 | @slaren |
| RTX PRO 4000 Blackwell | 24 GB / GDDR7 / 192 bit | 5674.44 ± 139.53 | 136.38 ± 0.13 | 7d77f07 | @Hedede |
| RTX A5000 | 24 GB / GDDR6 / 384 bit | 4552.15 ± 9.68 | 135.83 ± 0.11 | e5155e6 | @Hedede |
| Tesla V100 | 32 GB / HBM2 / 4096 bit | 2973.78 ± 3.62 | 134.76 ± 0.02 | 51f5a45 | @Hedede |
| RTX 4070 Ti SUPER | 16 GB / GDDR6X / 256 bit | 7612.32 ± 37.35 | 132.85 ± 0.31 | 9c35706 | @Ristovski |
| A30 | 24 GB / HBM2e / 3072 bit | 3068.72 ± 0.63 | 131.93 ± 0.18 | 583cb83 | @Hedede |
| RTX 5070 | 12 GB / GDDR7 / 192 bit | 5783.44 ± 36.95 | 128.21 ± 2.52 | - | @Spyro000 |
| A40 | 48 GB / GDDR6 / 384 bit | 5256.38 ± 19.39 | 126.24 ± 0.06 | 3470a5c | @Hedede |
| Titan V | 12 GB / HBM2 / 3072 bit | 2481.25 ± 1.31 | 112.17 ± 0.01 | e56abd2 | @Hedede |
| RTX 2080 Ti | 11 GB / GDDR6 / 352 bit | 3107.61 ± 4.34 | 109.17 ± 0.07 | 9c35706 | @ariya |
| Quadro RTX 6000 | 24 GB / GDDR6 / 384 bit | 3053.96 ± 1.37 | 104.38 ± 0.04 | b8e09f0 | @Hedede |
| Quadro RTX 8000 | 48 GB / GDDR6 / 384 bit | 3052.35 ± 5.64 | 103.63 ± 0.02 | b8e09f0 | @Hedede |
| RTX A4500 | 20 GB / GDDR6 / 320 bit | 3453.10 ± 49.19 | 103.00 ± 0.25 | 5cdb27e | @aleksyx |
| RTX 5060 Ti 16 GB | 16 GB / GDDR7 / 128 bit | 4195.53 ± 1.98 | 93.46 ± 0.01 | 89d1029 | @mike-llamacpp |
| RTX 2070 SUPER | 8 GB / GDDR6 / 256 bit | 2293.29 ± 5.91 | 87.71 ± 0.29 | bc07349 | @phstudy |
| RTX A4000 | 16 GB / GDDR6 / 256 bit | 2807.83 ± 52.44 | 85.17 ± 0.66 | 65349f2 | @TinyServal |
| RTX 3060 | 12 GB / GDDR6 / 192 bit | 2407.67 ± 3.73 | 76.92 ± 0.03 | baa9255 | @QuantiusBenignus |
| Titan Xp | 12 GB / GDDR5X / 384 bit | 1218.12 ± 1.82 | 73.84 ± 0.04 | c4510dc | @Hedede |
| Quadro RTX 4000 | 8 GB / GDDR6 / 256 bit | 1662.80 ± 2.04 | 67.62 ± 0.67 | 7d77f07 | @Hedede |
| RTX 4060 Ti 8 GB | 8 GB / GDDR6 / 128 bit | 3803.45 ± 70.80 | 64.03 ± 0.53 | 89d1029 | @mike-llamacpp |
| Tesla P100 | 16 GB / HBM2 / 4096 bit | 787.36 ± 3.27 | 61.99 ± 0.00 | b8372ee | @Hedede |
| GTX 1080 Ti | 11 GB / GDDR5X / 352 bit | 1138.14 ± 2.02 | 61.38 ± 0.03 | 9c35706 | @ariya |
| RTX A4000 Ada | 20 GB / GDDR6 / 160 bit | 3171.86 ± 4.34 | 61.37 ± 0.01 | a74a0d6 | @sdwolfz |
| RTX 2060 SUPER | 8 GB / GDDR6 / 256 bit | 1563.77 ± 0.51 | 61.13 ± 0.05 | 5c0eb5e | @ggerganov |
| DGX Spark | 128 GB / LPDDR5x | 3661.37 ± 38.66 | 56.74 ± 0.03 | 5acd455 | @ggerganov |
| Tesla P40 | 24 GB / GDDR5 / 384 bit | 1079.66 ± 0.18 | 53.73 ± 0.05 | c76b420 | @m18coppola |
| RTX 2000 Ada | 16 GB / GDDR6 / 128 bit | 2250.14 ± 5.91 | 50.71 ± 0.01 | 756cfea | @DigitalRudeness |
| Tesla T4 | 16 GB / GDDR6 / 256 bit | 1309.73 ± 1.02 | 44.03 ± 0.57 | d32e03f | @pt13762104 |
| GTX 1660 | 6 GB / GDDR5 / 192 bit | 154.45 ± 0.52 | 41.43 ± 0.01 | 9515c61 | @ariya |
| Tesla M40 | 24 GB / GDDR5 / 384 bit | 290.17 ± 0.11 | 39.98 ± 0.01 | 97d5117 | @Hedede |
| GTX 1070 Ti | 8 GB / GDDR5 / 256 bit | 790.52 ± 2.39 | 37.87 ± 0.00 | 79c1160 | @pebaryan |
| Jetson AGX Orin | 64 GB / LPDDR5 / 256 bit | 1171.96 ± 4.70 | 35.88 ± 0.18 | c1b1876 | @TinyServal |
| Tesla P4 | 8 GB / GDDR5 / 256 bit | 529.53 ± 2.12 | 33.12 ± 0.03 | c76b420 | @m18coppola |
| P106-100 | 6 GB / GDDR5 / 192 bit | 438.49 ± 0.38 | 30.64 ± 0.06 | 5fd160b | @pebaryan |
| GTX 1060 | 6 GB / GDDR5 / 192 bit | 446.19 ± 0.81 | 28.18 ± 0.01 | 5fd160b | @pebaryan |
| Quadro T1000 | 4 GB / GDDR5 / 128 bit | 27.46 ± 0.23 | 27.46 ± 0.23 | f6da8cb | @hanabu |
| Quadro P2000 | 5 GB / GDDR5 / 160 bit | 311.55 ± 0.19 | 23.76 ± 0.01 | baa9255 | @TinyServal |
| Tesla K80 | 12 GB / GDDR5 / 384 bit | 133.36 ± 0.60 | 14.27 ± 0.32 | 32732f2 | @pebaryan |
| Quadro P1000 | 4 GB / GDDR5 / 128 bit | 173.82 ± 0.02 | 13.65 ± 0.14 | 1e74897 | @aleksyx |
Apple Silicon as a Reference Baseline
Discussion #4167 is useful because it established a more unified benchmark format early on. Besides Q4_0, it also includes F16 and Q8_0, which helps explain PP / TG / t/s. The thread explicitly defines:
- PP = prompt processing
- TG = text generation
- t/s = tokens per second
A representative example is the M2 Ultra time-series comparison:
| Time | Device | Version / Note | Bandwidth GB/s | GPU Cores | F16 PP | F16 TG | Q8_0 PP | Q8_0 TG | Q4_0 PP | Q4_0 TG |
|---|---|---|---|---|---|---|---|---|---|---|
| 2023-11-21 | M2 Ultra | 8e672ef | 800 | 76 | 1401.85 | 41.02 | 1248.59 | 66.64 | 1238.48 | 94.27 |
| 2024-11-12 | M2 Ultra | 86ed72d + FA | 800 | 76 | 1525.95 | 43.15 | 1368.18 | 73.11 | 1391.78 | 108.80 |
| 2025-08-02 | M2 Ultra | 5c0eb5e + FA | 800 | 76 | 1561.35 | 43.24 | 1386.97 | 73.35 | 1412.42 | 109.41 |

Representative Apple Silicon entries shown in the thread:

| Device | Q4_0 PP | Q4_0 TG | Q8_0 PP | Q8_0 TG | F16 PP | F16 TG |
|---|---|---|---|---|---|---|
| M1 Pro 16 GPU | 266.25 | 36.41 | 270.37 | 22.34 | 302.14 | 12.75 |
| M2 Ultra 76 GPU | 1238.48 | 94.27 | 1248.59 | 66.64 | 1401.85 | 41.02 |
| M3 Max 40 GPU | 690.99 | 65.85 | 749.37 | 43.00 | 794.26 | 25.27 |
ROCm / HIP Scoreboards
Llama 2 7B, Q4_0, no FA
| Chip | Memory | pp512 t/s | tg128 t/s | Commit | Thanks to |
|---|---|---|---|---|---|
| Instinct MI300X | 192 GB / HBM3 / 8192 bit | 11476.40 ± 72.79 | 232.92 ± 0.53 | ee3a9fc | @yeahdongcn |
| RX 7900 XTX | 24 GB / GDDR6 / 384 bit | 3552.27 ± 101.96 | 167.11 ± 0.50 | 2f0c2db | @Diablo-D3 |
| Instinct MI210 | 64 GB / HBM2e / 4096 bit | 2486.22 ± 9.58 | 124.51 ± 0.04 | 8160b38 | @65a |
| Pro W7900 | 48 GB / GDDR6 / 384 bit | 3213.17 ± 80.47 | 121.18 ± 0.06 | 8160b38 | @65a |
| RX 7900 XT | 20 GB / GDDR6 / 320 bit | 3098.38 ± 24.02 | 116.15 ± 0.06 | 1e15bfd | @AdamNiederer |
| RX 9070 | 16 GB / GDDR6 / 256 bit | 2381.77 ± 3.68 | 114.48 ± 0.60 | d0660f2 | @andj1210 |
| Instinct MI100 | 32 GB / HBM2 / 4096 bit | 2732.83 ± 1.98 | 110.48 ± 0.14 | 9c35706 | @firefox42 |
| RX 9070 XT | 16 GB / GDDR6 / 256 bit | 5055.19 ± 109.58 | 101.27 ± 0.27 | 583cb83 | @Hadrianneue |
| RX 7800 XT | 16 GB / GDDR6 / 256 bit | 2151.81 ± 17.94 | 100.94 ± 0.10 | 00131d6 | @olegshulyakov |
| Instinct MI50 | 32 GB / HBM2 / 4096 bit | 1057.24 ± 0.53 | 98.95 ± 0.25 | 97d5117 | @wtarreau |
| RX 7900 GRE | 16 GB / GDDR6 / 256 bit | 1456.98 ± 12.39 | 96.07 ± 0.10 | 6fa3b55 | @MihaiBojescu |
| AI PRO R9700 | 32 GB / GDDR6 / 256 bit | 4443.54 ± 339.25 | 93.84 ± 0.26 | bd4ef13 | @gogich77 |
| Instinct MI60 | 32 GB / HBM2 / 4096 bit | 1289.11 ± 0.62 | 91.46 ± 0.13 | 504af20 | @Said-Akbar |
| RX 6900 XT | 16 GB / GDDR6 / 256 bit | 1889.84 ± 31.21 | 88.49 ± 0.00 | a972fae | @notgood |
| Pro VII | 16 GB / HBM2 / 4096 bit | 1064.99 ± 1.18 | 87.45 ± 0.04 | 2739a71 | @8XXD8 |
| RX 6800 XT | 16 GB / GDDR6 / 256 bit | 1447.07 ± 1.36 | 83.92 ± 0.03 | 79c1160 | @MrLavender |
| Pro V620 | 32 GB / GDDR6 / 256 bit | 1803.65 ± 2.54 | 74.66 ± 0.01 | 5c0eb5e | @samteezy |
| RX 9060 XT | 16 GB / GDDR6 / 256 bit | 1419.67 ± 3.64 | 67.58 ± 0.24 | a0e13dc | @lcy0321 |
| RX 5700 XT | 8 GB / GDDR6 / 256 bit | 354.17 ± 0.18 | 67.55 ± 0.04 | c05e8c9 | @daniandtheweb |
| Instinct MI25 | 16 GB / HBM2 / 2048 bit | 409.83 ± 0.23 | 63.94 ± 0.06 | 2739a71 | @8XXD8 |
| AI Max+ 395 | 128 GB / LPDDR5 | 911.36 ± 1.79 | 50.01 ± 0.07 | e60f241 | @firefox42 |
| RX 7600 XT | 16 GB / GDDR6 / 128 bit | 1099.64 ± 2.05 | 48.58 ± 0.06 | 9c35706 | @wbruna |
| RX Vega 64 | 8 GB / HBM2 / 2048 bit | 240.68 ± 0.09 | 48.46 ± 0.09 | ec428b0 | @davispuh |
| Radeon 8060S | System Shared / DDR5 | 351.36 ± 0.67 | 47.97 ± 0.33 | 1d0125b | @hspak |
| Radeon 880M | System Shared / DDR5 | 163.25 ± 13.86 | 12.97 ± 1.63 | c55d53a | @Hedede |
Llama 2 7B, Q4_0, with FA
| Chip | Memory | pp512 t/s | tg128 t/s | Commit | Thanks to |
|---|---|---|---|---|---|
| Instinct MI300X | 192 GB / HBM3 / 8192 bit | 11945.97 ± 54.29 | 218.53 ± 0.09 | ee3a9fc | @yeahdongcn |
| RX 7900 XTX | 24 GB / GDDR6 / 384 bit | 3874.25 ± 11.92 | 170.12 ± 0.56 | 2f0c2db | @Diablo-D3 |
| Pro W7900 | 48 GB / GDDR6 / 384 bit | 3472.86 ± 52.86 | 127.43 ± 0.12 | 8160b38 | @65a |
| Instinct MI210 | 64 GB / HBM2e / 4096 bit | 2571.82 ± 2.89 | 130.18 ± 0.06 | 8160b38 | @65a |
| RX 9070 | 16 GB / GDDR6 / 256 bit | 2452.68 ± 1.33 | 115.32 ± 0.52 | d0660f2 | @andj1210 |
| RX 7900 XT | 20 GB / GDDR6 / 320 bit | 3261.75 ± 9.09 | 112.30 ± 0.06 | 1e15bfd | @AdamNiederer |
| Instinct MI50 | 32 GB / HBM2 / 4096 bit | 1129.43 ± 0.15 | 105.82 ± 0.07 | 97d5117 | @wtarreau |
| Instinct MI100 | 32 GB / HBM2 / 4096 bit | 2755.00 ± 3.68 | 104.71 ± 0.10 | 9c35706 | @firefox42 |
| AI PRO R9700 | 32 GB / GDDR6 / 256 bit | 4773.07 ± 49.30 | 97.98 ± 0.13 | bd4ef13 | @gogich77 |
| RX 7900 GRE | 16 GB / GDDR6 / 256 bit | 1598.79 ± 11.48 | 97.53 ± 0.06 | 6fa3b55 | @MihaiBojescu |
| RX 9070 XT | 16 GB / GDDR6 / 256 bit | 4903.51 ± 96.36 | 97.28 ± 0.13 | 583cb83 | @Hadrianneue |
| RX 7800 XT | 16 GB / GDDR6 / 256 bit | 2304.63 ± 2.85 | 95.99 ± 0.21 | 00131d6 | @olegshulyakov |
| RX 6900 XT | 16 GB / GDDR6 / 256 bit | 1948.31 ± 13.51 | 85.04 ± 0.02 | a972fae | @notgood |
| Pro V620 | 32 GB / GDDR6 / 256 bit | 1256.86 ± 0.55 | 70.83 ± 0.02 | 5c0eb5e | @samteezy |
| RX 9060 XT | 16 GB / GDDR6 / 256 bit | 1479.27 ± 0.71 | 65.42 ± 0.19 | a0e13dc | @lcy0321 |
| RX 5700 XT | 8 GB / GDDR6 / 256 bit | 314.17 ± 0.29 | 62.02 ± 0.05 | c05e8c9 | @daniandtheweb |
| AI Max+ 395 | 128 GB / LPDDR5 | 1003.53 ± 2.91 | 49.87 ± 0.02 | e60f241 | @firefox42 |
| Radeon 8060S | System Shared / DDR5 | 366.08 ± 1.44 | 48.97 ± 0.15 | 1d0125b | @hspak |
| RX 7600 XT | 16 GB / GDDR6 / 128 bit | 1199.16 ± 1.07 | 47.65 ± 0.06 | 9c35706 | @wbruna |
| RX Vega 64 | 8 GB / HBM2 / 2048 bit | 153.17 ± 0.72 | 42.46 ± 0.40 | ec428b0 | @davispuh |
| Radeon 880M | System Shared / DDR5 | 213.31 ± 14.05 | 16.16 ± 1.41 | c55d53a | @Hedede |
Vulkan Scoreboards
Llama 2 7B, Q4_0, no FA
| Chip | pp512 t/s | tg128 t/s | Commit | Comments |
|---|---|---|---|---|
| Nvidia RTX 5090 | 10381.64 ± 508.84 | 263.63 ± 0.91 | ca71fb9 | coopmat2 |
| AMD Radeon RX 7900 XTX | 3531.93 ± 31.74 | 191.28 ± 0.20 | 2f0c2db | |
| Nvidia RTX 4090 | 9452.03 ± 187.70 | 187.97 ± 0.21 | 4ae88d0 | coopmat2 |
| Nvidia RTX 5080 | 7444.99 ± 20.11 | 185.10 ± 0.54 | f6b533d | coopmat2 |
| Nvidia A100 | 6389.86 ± 4.83 | 160.78 ± 0.16 | 2257758 | coopmat2 |
| Nvidia RTX 3090 | 4298.97 ± 10.59 | 160.13 ± 0.25 | 4ae88d0 | coopmat2 |
| Nvidia RTX 4080 Super | 7101.18 ± 269.79 | 147.13 ± 5.64 | 81086cd | coopmat2 |
| Nvidia RTX 3080 | 4287.11 ± 55.50 | 139.15 ± 0.05 | 7c7d6ce | coopmat2 |
| Nvidia RTX A5000 | 3641.55 ± 9.05 | 139.89 ± 0.69 | 4ae88d0 | coopmat2 |
| AMD Radeon RX 9070 XT | 5036.04 ± 88.16 | 137.11 ± 0.02 | e9fd8dc | |
| Nvidia RTX 5070 Ti | 6213.63 ± 27.72 | 135.63 ± 0.18 | d13d0f6 | coopmat2 |
| AMD Radeon AI Pro R9700 | 4036.04 ± 34.58 | 130.19 ± 0.39 | 3191462 | |
| Nvidia Tesla V100 | 1391.39 ± 1.19 | 129.58 ± 0.58 | 7d77f07 | |
| Nvidia RTX 4070 Ti Super | 6099.18 ± 154.30 | 129.45 ± 0.18 | 4ae88d0 | coopmat2 |
| AMD Radeon RX 7900 XT | 2941.58 ± 17.17 | 123.18 ± 0.40 | 71e74a3 | |
| AMD Radeon RX 9070 | 3164.10 ± 66.84 | 119.71 ± 3.40 | 21c17b5 | |
| AMD Radeon RX 7800 XT | 2017.33 ± 19.30 | 118.27 ± 0.27 | 4fdbc1e | |
| AMD Radeon RX 7900 GRE | 2336.31 ± 7.52 | 116.11 ± 0.26 | 4b2a477 | |
| Apple M3 Ultra | 1116.83 ± 0.55 | 115.54 ± 0.78 | 2d451c8 | MoltenVK |
| Intel Arc Pro B70 | 3379.00 ± 47.92 | 112.02 ± 1.08 | b863507 | |
| Nvidia Titan V | 984.36 ± 4.13 | 108.86 ± 0.28 | e56abd2 | |
| AMD Radeon Pro VII | 1078.54 ± 0.86 | 107.82 ± 0.14 | N/A | |
| AMD Radeon RX 6900 XT | 1837.21 ± 25.44 | 104.60 ± 0.30 | a972fae | |
| Intel Arc Pro A60 | 2261.11 ± 9.53 | 104.25 ± 0.07 | 97d5117 | |
| AMD Radeon RX 6800 XT | 1752.92 ± 1.71 | 100.32 ± 0.97 | N/A | |
| AMD Radeon VII | 1059.14 ± 0.56 | 101.19 ± 0.53 | 77d6ae4 | |
| Nvidia RTX 2080 Ti | 1888.24 ± 9.20 | 97.58 ± 6.60 | N/A | |
| AMD Radeon RX 6800 | 1698.69 ± 0.80 | 95.61 ± 0.19 | 4b385bf | |
| AMD Radeon Pro W6800X Duo | 687.71 ± 4.33 | 94.82 ± 0.12 | N/A | |
| Nvidia RTX 5060 Ti | 3460.92 ± 7.16 | 93.51 ± 0.15 | 89f10ba | coopmat2 |
| Nvidia RTX 4070 | 3179.37 ± 46.16 | 92.29 ± 0.28 | 9a48399 | |
| AMD Radeon Pro W6800X | 510.80 ± 0.13 | 86.47 ± 0.46 | 13b4548 | MoltenVK |
| AMD Radeon RX 6700 XT | 1051.20 ± 0.98 | 83.88 ± 0.08 | 6d75883 | |
| AMD Radeon RX 6750 XT | 1040.58 ± 0.35 | 81.98 ± 0.03 | 228f34c | |
| AMD Radeon Pro V620 | 1595.32 ± 1.59 | 81.78 ± 0.06 | 03d4698 | |
| Nvidia RTX 3070 | 2113.02 ± 7.38 | 78.71 ± 0.13 | 1b8fb81 | |
| AMD Radeon Instinct MI60 | 369.26 ± 2.48 | 78.16 ± 1.40 | 504af20 | |
| Nvidia RTX 3060 | 1815.70 ± 5.85 | 75.94 ± 0.80 | 92c0b38 | coopmat2 |
| Apple M4 Max | 724.77 ± 20.93 | 75.02 ± 0.14 | 1ece0cb6 | |
| Nvidia Tesla T10 | 1692.70 ± 2.05 | 75.01 ± 0.21 | 7f76692 | coopmat2 |
| Nvidia RTX A4000 | 2248.14 ± 7.59 | 73.74 ± 0.08 | f5245b5 | coopmat2 |
| AMD Radeon RX 5700 XT | 529.69 ± 0.26 | 70.73 ± 0.04 | 4fdbc1e | |
| AMD Radeon RX 9060 XT | 2141.67 ± 6.87 | 70.54 ± 0.74 | ed52f36 | |
| Intel Arc B580 | 620.94 ± 15.33 | 70.14 ± 0.28 | 7f76692 | |
| AMD Radeon Pro V540 | 583.88 ± 6.56 | 69.64 ± 0.24 | 9da3dcd | |
| AMD Radeon Pro W5700 | 449.85 ± 0.46 | 68.55 ± 0.15 | 23bc779 | |
| Intel Arc Pro B60 | 522.36 ± 3.60 | 68.55 ± 0.01 | 516a4ca | |
| Nvidia GTX 1080 Ti | 540.69 ± 0.71 | 64.99 ± 0.08 | 360d653 | |
| Nvidia RTX 2070 Super | 1199.13 ± 7.70 | 64.64 ± 0.20 | b7552cf | |
| Nvidia RTX 3070 Mobile | 1689.40 ± 19.57 | 63.64 ± 0.39 | ceff6bb | coopmat2 |
| Nvidia Tesla P100 | 678.14 ± 1.40 | 63.16 ± 0.06 | eec1e33 | |
| AMD BC-250 | 370.66 ± 0.04 | 62.32 ± 0.32 | 5886f4f | |
| AMD Radeon RX 6650 XT | 1029.52 ± 1.21 | 62.14 ± 0.02 | dbb852b | |
| Nvidia RTX 4060 Mobile | 2135.66 ± 23.18 | 59.53 ± 0.03 | a5c07dc | coopmat2 |
| Nvidia Tesla P40 | 488.06 ± 0.27 | 59.36 ± 0.16 | N/A | |
| Nvidia GTX 1660 Ti Mobile | 511.67 ± 2.85 | 56.60 ± 0.07 | b43556e | |
| AMD Radeon Instinct MI25 | 439.42 ± 0.34 | 54.69 ± 0.03 | 2739a71 | |
| AMD Radeon RX 6600 XT | 574.65 ± 0.86 | 53.92 ± 0.11 | 091592d | |
| AMD Ryzen AI Max+ 395 | 1288.96 ± 6.49 | 53.59 ± 0.38 | 7f76692 | |
| AMD Radeon RX 7600 XT | 840.85 ± 3.02 | 53.02 ± 0.01 | 01d8eaa | |
| Intel Arc A770 | 1073.85 ± 29.68 | 52.56 ± 0.11 | a69d54f | |
| Nvidia GB10 | 2737.79 ± 19.56 | 52.28 ± 0.03 | b9da444 | coopmat2 |
| AMD FirePro S9300 x2 | 247.26 ± 0.43 | 51.86 ± 0.11 | eec1e33 | Split across two GPUs |
| AMD Radeon RX 6600 | 761.89 ± 1.76 | 50.63 ± 0.02 | b1c70e2 | |
| AMD Radeon RX Vega 56 | 439.87 ± 0.61 | 50.23 ± 0.14 | 92c0b38 | |
| Intel Arc B570 | 913.95 ± 0.90 | 49.64 ± 0.03 | 7f76692 | |
| Nvidia RTX 3060 Mobile | 1059.76 ± 3.54 | 49.03 ± 0.13 | dbb3a47 | |
| AMD Radeon RX 6800M | 861.99 ± 7.67 | 48.71 ± 0.71 | 8e6f8bc | |
| AMD Radeon RX 6600M | 605.59 ± 0.65 | 48.21 ± 0.07 | fe5b78c | |
| Intel Arc A770M | 875.92 ± 2.16 | 47.69 ± 0.16 | eeee367 | |
| Nvidia P104-100 | 311.90 ± 0.22 | 46.18 ± 0.05 | eec1e33 | |
| AMD Radeon RX Vega 64 | 356.08 ± 0.09 | 45.73 ± 0.18 | ec428b0 | |
| Nvidia RTX A2000 | 1245.19 ± 8.76 | 45.52 ± 0.54 | b1afcab | coopmat2 |
| AMD Radeon RX 7600M XT | 459.39 ± 2.34 | 45.28 ± 0.10 | b9ab0a4 | eGPU |
| AMD Radeon Pro V340 | 375.41 ± 0.24 | 45.16 ± 0.06 | 9da3dcd | Split across two GPUs |
| Nvidia GTX 1070 Ti | 297.50 ± 0.54 | 42.86 ± 1.20 | 860a9e4 | eGPU |
| Intel Arc A750 | 1075.94 ± 13.89 | 42.66 ± 0.18 | c1b1876 | |
| Nvidia RTX 4050 Mobile | 1154.28 ± 15.76 | 41.89 ± 0.10 | d79d8f3 | |
| Nvidia GTX 1070 | 321.57 ± 0.93 | 41.48 ± 0.09 | eec1e33 | |
| Intel Arc Pro B50 | 193.50 ± 0.24 | 39.99 ± 0.10 | 7b43f55 | |
| Nvidia Tesla M40 | 92.48 ± 0.02 | 39.35 ± 1.22 | b8372ee | |
| AMD Radeon RX 580 | 258.03 ± 0.71 | 39.32 ± 0.03 | de4c07f | |
| AMD Radeon RX 470 | 218.07 ± 0.56 | 38.63 ± 0.21 | e288693 | |
| AMD Radeon Pro W5500 | 315.39 ± 3.76 | 36.82 ± 0.38 | 860a9e4 | |
| AMD Radeon RX 480 | 248.66 ± 0.28 | 34.71 ± 0.14 | 3b15924 | |
| Apple M2 Ultra | 205.98 ± 0.02 | 34.34 ± 0.12 | dbb852b | Asahi Linux |
| Nvidia GTX 980 | 186.24 ± 0.09 | 33.90 ± 0.51 | 860a9e4 | |
| Nvidia P106-100 | 183.78 ± 0.26 | 29.77 ± 0.04 | 23bc779 | |
| AMD FirePro W8100 | 155.22 ± 0.17 | 29.52 ± 0.05 | 4536363 | |
| Nvidia Tesla P4 | 265.54 ± 0.21 | 28.03 ± 0.14 | 24d2ee0 | |
| AMD Radeon RX 6500 XT | 255.25 ± 0.35 | 27.81 ± 0.10 | g9fdfcd | |
| Apple M3 | 263.70 ± 0.02 | 26.39 ± 0.14 | b9ab0a4 | MoltenVK |
| AMD FirePro S10000 | 94.78 ± 0.02 | 25.32 ± 0.02 | 914a82d | Split across two GPUs |
| Nvidia Quadro P2000 | 169.55 ± 0.17 | 23.05 ± 0.03 | 63f8fe0 | |
| Intel Core Ultra 200 Series | 544.95 ± 4.15 | 22.49 ± 0.09 | cea560f | |
| AMD Ryzen AI 9 300 Series | 479.07 ± 0.41 | 22.41 ± 0.18 | N/A | |
| AMD Ryzen 6000 Series | 240.89 ± 0.52 | 21.26 ± 0.08 | ee09828 | |
| Apple M2 Pro | 62.70 ± 0.03 | 20.95 ± 0.11 | 1fe0029 | Asahi Linux |
| Nvidia GTX 1050 Ti | 136.42 ± 0.67 | 20.96 ± 0.21 | 2f0c2db | |
| AMD Ryzen 8000 Series | 266.19 ± 1.36 | 20.53 ± 0.08 | a5c07dc | |
| AMD Ryzen 7000 Series | 281.62 ± 1.56 | 19.91 ± 0.07 | ebce03e | |
| AMD Ryzen Z1 Extreme | 199.36 ± 7.02 | 18.77 ± 0.02 | 53ff6b9 | |
| AMD FirePro D700 | 69.95 ± 0.04 | 16.62 ± 0.01 | d3bd719 | MoltenVK, running in FP16 mode on FP32 only chip |
| AMD Radeon Pro WX 4100 | 78.79 ± 0.10 | 16.05 ± 0.07 | 860a9e4 | |
| Apple M2 | 50.79 ± 0.16 | 13.50 ± 0.02 | 8c0d6bb | Asahi Linux |
| Apple M1 | 38.29 ± 0.00 | 12.47 ± 0.03 | 2370665 | Asahi Linux |
| AMD Ryzen 5000 Series | 90.55 ± 0.08 | 10.98 ± 0.07 | d84635b | |
| Intel Core 1100 Series | 187.20 ± 1.78 | 10.39 ± 0.04 | abb9f3c | |
| AMD Radeon RX 550 | 52.66 ± 0.49 | 10.20 ± 0.01 | N/A | |
| AMD Ryzen 4000 Series | 103.87 ± 0.02 | 9.63 ± 0.01 | 4b385bf | |
| Nvidia Tesla K80 | 89.46 ± 0.10 | 9.39 ± 0.06 | 5d46bab | Running on single GPU |
| Nvidia Tesla K40 | 64.37 ± 0.09 | 9.30 ± 0.19 | eec1e33 | |
| MediaTek Dimensity 9400 | 38.36 ± 15.15 | 8.92 ± 0.06 | b9ab0a4 | GPU supports coopmat but pp512 is faster with it turned off |
| Intel Core Ultra 100 Series | 185.51 ± 0.22 | 8.21 ± 0.07 | 1d72c84 | |
| AMD Ryzen 3000 Series | 48.63 ± 0.10 | 8.49 ± 0.01 | 1fe0029 | |
| CIX CD8180 | 2.80 ± 0.01 | 5.51 ± 0.00 | 4dca015 | |
| Intel Core 1000 Series | 25.58 ± 0.00 | 4.25 ± 0.18 | N/A | |
| Intel Core 8000 Series | 25.43 ± 0.17 | 3.35 ± 0.03 | c4df49a | |
| Intel N150 | 28.84 ± 0.02 | 2.93 ± 0.00 | 4f63cd7 | |
Llama 2 7B, Q4_0, with FA
| Chip | pp512 t/s | tg128 t/s | Commit | Comments |
|---|---|---|---|---|
| Nvidia RTX 5090 | 11796.38 ± 601.36 | 273.68 ± 0.52 | ca71fb9 | coopmat2 |
| AMD Radeon RX 7900 XTX | 3332.90 ± 11.47 | 195.30 ± 0.23 | 2f0c2db | |
| Nvidia RTX 5080 | 8054.59 ± 35.68 | 192.17 ± 0.21 | f6b533d | coopmat2 |
| Nvidia RTX 4090 | 10830.41 ± 36.25 | 190.10 ± 0.31 | 4ae88d0 | coopmat2 |
| Nvidia A100 | 7064.40 ± 1.63 | 170.56 ± 0.02 | 2257758 | coopmat2 |
| Nvidia RTX 3090 | 4732.33 ± 4.80 | 162.28 ± 0.21 | 4ae88d0 | coopmat2 |
| Nvidia RTX 4080 Super | 8007.37 ± 46.03 | 150.20 ± 0.26 | 81086cd | coopmat2 |
| Nvidia RTX 3080 | 4913.83 ± 21.52 | 145.74 ± 0.16 | 7c7d6ce | coopmat2 |
| Nvidia Tesla V100 | 1411.25 ± 2.12 | 142.13 ± 0.03 | 7d77f07 | |
| Nvidia RTX A5000 | 4071.22 ± 13.13 | 140.43 ± 0.22 | 4ae88d0 | coopmat2 |
| AMD Radeon RX 9070 XT | 4911.74 ± 28.52 | 138.20 ± 0.18 | e9fd8dc | |
| Nvidia RTX 5070 Ti | 6764.53 ± 11.95 | 135.65 ± 0.02 | d13d0f6 | coopmat2 |
| AMD Radeon AI Pro R9700 | 4333.83 ± 29.36 | 130.90 ± 0.12 | 3191462 | |
| AMD Radeon RX 7900 XT | 3043.93 ± 10.42 | 124.20 ± 0.09 | 71e74a3 | |
| AMD Radeon RX 7800 XT | 2094.64 ± 14.38 | 119.63 ± 0.13 | 4fdbc1e | |
| AMD Radeon RX 9070 | 3277.24 ± 18.17 | 119.55 ± 0.06 | 21c17b5 | |
| AMD Radeon RX 7900 GRE | 2402.07 ± 22.50 | 116.77 ± 0.08 | 4b2a477 | |
| Apple M3 Ultra | 1115.55 ± 0.75 | 115.99 ± 0.12 | 2d451c8 | MoltenVK |
| Intel Arc Pro B70 | 3314.53 ± 17.95 | 111.63 ± 0.05 | b863507 | |
| Nvidia Titan V | 792.74 ± 4.30 | 109.21 ± 0.72 | e56abd2 | |
| AMD Radeon Pro VII | 783.94 ± 0.77 | 108.45 ± 0.48 | N/A | |
| AMD Radeon RX 6900 XT | 1761.93 ± 4.75 | 106.15 ± 0.04 | a972fae | |
| Nvidia RTX 2080 Ti | 1936.25 ± 32.08 | 100.99 ± 0.24 | N/A | |
| AMD Radeon RX 6800 XT | 1704.79 ± 0.71 | 100.50 ± 0.06 | N/A | |
| AMD Radeon Pro W6800X Duo | 795.28 ± 0.72 | 100.08 ± 0.02 | N/A | |
| Nvidia RTX 5060 Ti | 3912.65 ± 5.86 | 97.01 ± 0.14 | 89f10ba | coopmat2 |
| AMD Radeon RX 6800 | 1749.46 ± 3.36 | 96.65 ± 0.48 | 4b385bf | |
| Nvidia RTX 4070 | 4293.57 ± 27.70 | 91.49 ± 0.89 | 9a48399 | coopmat2 |
| AMD Radeon RX 6750 XT | 997.05 ± 0.45 | 82.29 ± 0.06 | 228f34c | |
| AMD Radeon RX 6700 XT | 1010.90 ± 12.89 | 81.86 ± 0.19 | 6d75883 | |
| Nvidia RTX 3060 | 2012.88 ± 10.12 | 80.59 ± 0.02 | 92c0b38 | coopmat2 |
| AMD Radeon Pro V620 | 1556.31 ± 2.82 | 79.24 ± 0.09 | 03d4698 | |
| Nvidia RTX A4000 | 2482.74 ± 26.05 | 76.07 ± 0.08 | f5245b5 | coopmat2 |
| Nvidia Tesla T10 | 1840.14 ± 1.22 | 76.05 ± 0.13 | 7f76692 | coopmat2 |
| AMD Radeon RX 5700 XT | 538.31 ± 0.35 | 74.43 ± 0.03 | 4fdbc1e | |
| Intel Arc B580 | 419.49 ± 3.37 | 72.00 ± 0.24 | 7f76692 | |
| Apple M4 Max | 557.46 ± 26.87 | 71.79 ± 4.16 | 1ece0cb6 | |
| AMD Radeon Pro W5700 | 446.98 ± 0.39 | 71.30 ± 0.24 | 23bc779 | |
| Intel Arc Pro B60 | 274.76 ± 0.27 | 70.54 ± 0.03 | 516a4ca | |
| AMD Radeon RX 9060 XT | 1915.41 ± 7.90 | 70.52 ± 0.16 | ed52f36 | |
| Nvidia Tesla P100 | 685.51 ± 0.88 | 66.48 ± 0.02 | eec1e33 | |
| AMD Radeon RX 6650 XT | 1088.90 ± 0.40 | 64.53 ± 0.75 | dbb852b | |
| Nvidia GTX 1080 Ti | 529.96 ± 0.38 | 64.63 ± 0.10 | 360d653 | |
| AMD BC-250 | 356.87 ± 1.24 | 63.14 ± 0.09 | 5886f4f | |
| Nvidia RTX 3070 Mobile | 1832.07 ± 57.14 | 62.92 ± 0.37 | ceff6bb | coopmat2 |
| Nvidia RTX 4060 Mobile | 2358.03 ± 12.17 | 60.01 ± 0.08 | a5c07dc | coopmat2 |
| Nvidia Tesla P40 | 484.37 ± 0.27 | 59.22 ± 0.15 | N/A | |
| Nvidia GTX 1660 Ti Mobile | 514.34 ± 0.88 | 57.30 ± 0.42 | b43556e | |
| AMD Radeon RX 7600 XT | 1024.38 ± 7.56 | 56.11 ± 0.02 | 01d8eaa | |
| AMD FirePro S9300 x2 | 243.33 ± 0.22 | 55.64 ± 0.06 | eec1e33 | Split across two GPUs |
| Nvidia GB10 | 3279.89 ± 26.78 | 53.64 ± 0.05 | b9da444 | coopmat2 |
| AMD Radeon RX 6600 | 808.76 ± 0.15 | 53.24 ± 0.03 | b1c70e2 | |
| Intel Arc A770 | 1119.68 ± 30.25 | 53.07 ± 0.09 | a69d54f | |
| AMD Ryzen AI Max+ 395 | 1357.07 ± 10.94 | 53.00 ± 0.13 | 7f76692 | |
| AMD Radeon RX Vega 56 | 428.54 ± 0.50 | 52.66 ± 0.03 | 92c0b38 | |
| Intel Arc B570 | 288.51 ± 0.09 | 50.49 ± 0.05 | 7f76692 | |
| Nvidia P104-100 | 325.30 ± 0.25 | 48.64 ± 0.04 | eec1e33 | |
| AMD Radeon Pro V340 | 360.23 ± 0.74 | 47.54 ± 0.06 | 9da3dcd | Split across two GPUs |
| AMD Radeon RX 6800M | 784.16 ± 2.76 | 49.06 ± 0.34 | 8e6f8bc | |
| AMD Radeon RX Vega 64 | 320.12 ± 0.22 | 47.06 ± 0.01 | ec428b0 | |
| Nvidia RTX A2000 | 1361.85 ± 3.26 | 45.69 ± 0.20 | b1afcab | coopmat2 |
| Intel Arc A770M | 384.74 ± 0.78 | 45.68 ± 0.06 | eeee367 | |
| Intel Arc A750 | 303.37 ± 1.44 | 43.96 ± 0.03 | c1b1876 | |
| Nvidia GTX 1070 Ti | 292.85 ± 0.23 | 43.42 ± 0.34 | 860a9e4 | eGPU |
| Nvidia GTX 1070 | 330.84 ± 1.02 | 43.33 ± 0.06 | 360d653 | |
| Nvidia Tesla M40 | 93.35 ± 0.01 | 41.68 ± 0.01 | b8372ee | |
| Intel Arc Pro B50 | 132.48 ± 0.04 | 41.02 ± 0.04 | 7b43f55 | |
| AMD Radeon RX 470 | 197.26 ± 0.27 | 37.28 ± 0.11 | 3769fe6 | |
| AMD Radeon RX 480 | 194.52 ± 0.61 | 37.23 ± 0.09 | 0bcb40b | |
| Apple M2 Ultra | 198.83 ± 0.85 | 198.83 ± 0.85 | dbb852b | Asahi Linux |
| Nvidia GTX 980 | 180.97 ± 0.74 | 34.16 ± 0.10 | 860a9e4 | |
| Nvidia P106-100 | 183.40 ± 0.34 | 30.79 ± 0.32 | 23bc779 | |
| AMD FirePro W8100 | 140.52 ± 0.34 | 29.28 ± 0.14 | 4536363 | |
| Nvidia Tesla P4 | 287.14 ± 0.29 | 28.37 ± 0.24 | 24d2ee0 | |
| Nvidia Quadro P2000 | 181.71 ± 0.12 | 23.77 ± 0.02 | 63f8fe0 | |
| Intel Core Ultra 200 Series | 536.48 ± 1.27 | 23.05 ± 0.04 | cea560f | |
| AMD Ryzen AI 9 300 Series | 532.59 ± 3.55 | 22.31 ± 0.06 | N/A | |
| AMD Ryzen 6000 Series | 277.91 ± 0.37 | 21.15 ± 0.09 | ee09828 | |
| Apple M2 Pro | 58.86 ± 0.02 | 20.97 ± 0.03 | 1fe0029 | Asahi Linux |
| AMD Ryzen 8000 Series | 297.39 ± 1.22 | 20.59 ± 0.38 | a5c07dc | |
| AMD Ryzen 7000 Series | 312.85 ± 2.51 | 20.09 ± 0.35 | 835b2b9 | |
| Nvidia GTX 1050 Ti | 127.54 ± 1.03 | 20.08 ± 0.17 | 2f0c2db | |
| AMD Radeon Pro WX 4100 | 75.59 ± 0.19 | 16.56 ± 0.04 | 860a9e4 | |
| Apple M1 | 35.93 ± 0.00 | 12.85 ± 0.02 | 2370665 | Asahi Linux |
| Apple M2 | 46.81 ± 0.08 | 12.25 ± 2.30 | 8c0d6bb | Asahi Linux |
| AMD Ryzen 5000 Series | 79.06 ± 0.01 | 10.75 ± 0.00 | 5d195f1 | |
| Intel Core 1100 Series | 174.77 ± 4.47 | 10.58 ± 0.03 | abb9f3c | |
| Nvidia Tesla K40 | 64.37 ± 0.02 | 9.92 ± 0.06 | eec1e33 | |
| AMD Ryzen 4000 Series | 113.32 ± 0.01 | 9.87 ± 0.01 | 4b385bf | |
| Nvidia Tesla K80 | 88.26 ± 0.19 | 9.49 ± 0.01 | 5d46bab | Running on single GPU |
| AMD Ryzen 5 3000 Series | 47.41 ± 0.14 | 8.47 ± 0.01 | 1fe0029 | |
| Intel Core Ultra 100 Series | 77.66 ± 2.75 | 7.75 ± 0.05 | 2e89f76 | |
| Intel Core 8000 Series | 25.55 ± 0.04 | 3.35 ± 0.02 | c4df49a | |
| Intel N150 | 25.59 ± 0.00 | 2.91 ± 0.00 | 4f63cd7 | |
How to Use These Tables
- Decide whether you care more about tg128 or pp512. For chat and interactive use, tg128 usually matters more. For long prompts and batch throughput, pp512 matters more.
- Match the backend you actually use. Nvidia users should usually prioritize CUDA. AMD users should compare ROCm and Vulkan first. Cross-platform users should pay close attention to Vulkan.
- Check FA last. On many GPUs, enabling FA improves pp512 more than tg128, so a single headline number can be misleading.
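The first step, choosing a metric, changes the ranking more than you might expect. The sketch below parses a few rows copied from the CUDA no-FA table and sorts them by either column. It assumes the six-column CUDA/ROCm layout (the Vulkan tables have no Memory column, so the cell indexes would shift), and `parse_measurement`, `parse_row`, and `rank` are hypothetical helper names.

```python
import re

# Matches "mean ± std"; also accepts "+" because some rows use it instead of "±".
CELL = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*(?:±|\+)\s*([0-9]+(?:\.[0-9]+)?)\s*$")


def parse_measurement(cell: str) -> tuple[float, float]:
    """Parse 'mean ± std' into (mean, std)."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized measurement: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def parse_row(line: str) -> dict:
    """Parse one row laid out as: Chip | Memory | pp512 | tg128 | Commit | Thanks to."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    return {
        "chip": cells[0],
        "pp512": parse_measurement(cells[2])[0],
        "tg128": parse_measurement(cells[3])[0],
    }


def rank(lines: list[str], metric: str) -> list[str]:
    """Return chip names sorted from fastest to slowest on the given metric."""
    parsed = [parse_row(line) for line in lines]
    return [r["chip"] for r in sorted(parsed, key=lambda r: r[metric], reverse=True)]


rows = [
    "| RTX 3090 | 24 GB / GDDR6X / 384 bit | 5174.69 ± 21.83 | 158.16 ± 0.21 | c76b420 | @m18coppola |",
    "| RTX 4090 | 24 GB / GDDR6X / 384 bit | 11992.70 ± 107.99 | 186.21 ± 0.13 | 2241453 | @lhl |",
    "| RTX 4080 | 16 GB / GDDR6X / 256 bit | 8031.64 ± 26.49 | 142.49 ± 0.16 | 20638e4 | @Ristovski |",
]

by_tg = rank(rows, "tg128")
by_pp = rank(rows, "pp512")
print("by tg128:", by_tg)
print("by pp512:", by_pp)
```

With these three rows, the RTX 3090 ranks ahead of the RTX 4080 on tg128 but behind it on pp512, which is exactly why the metric has to be chosen before reading the tables.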
One-Sentence Summary
In llama.cpp benchmarks, pp512, tg128, Q4_0, FA, and CUDA / ROCm / Vulkan describe different dimensions. Once the benchmark context is clear, the tables become much easier to read.
Sources
- CUDA discussion #15013: https://github.com/ggml-org/llama.cpp/discussions/15013
- Apple Silicon discussion #4167: https://github.com/ggml-org/llama.cpp/discussions/4167
- ROCm discussion #15021: https://github.com/ggml-org/llama.cpp/discussions/15021
- Vulkan discussion #10879: https://github.com/ggml-org/llama.cpp/discussions/10879