Terminology
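What Q4_0 means
Q4_0 is a 4-bit quantization format: model weights are stored at roughly 4 bits each instead of 16, which shrinks the model and lowers VRAM use at some cost in accuracy. The scoreboard uses Llama 2 7B, Q4_0 as its common test model, so results from different GPUs can be compared.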
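As a rough illustration of why a 4-bit format saves VRAM, the sketch below estimates weight storage for a 7B-parameter model. It assumes 4.5 bits per weight for Q4_0 (32 four-bit values plus one 16-bit scale per block, which is my understanding of the ggml layout, not something the scoreboard states) and 16 bits per weight for F16. The parameter count is a round 7e9 rather than the exact Llama 2 figure, and KV cache and activations are ignored.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store n_params weights at the given bit width."""
    return n_params * bits_per_weight / 8


# Assumed block layout: 32 weights x 4 bits, plus one 16-bit scale per block.
Q4_0_BITS = (32 * 4 + 16) / 32   # 4.5 bits per weight
F16_BITS = 16.0

N_PARAMS = 7e9  # round number for illustration only

q4_gb = weight_bytes(N_PARAMS, Q4_0_BITS) / 1e9
f16_gb = weight_bytes(N_PARAMS, F16_BITS) / 1e9

print(f"Q4_0 weights: ~{q4_gb:.2f} GB")
print(f"F16 weights:  ~{f16_gb:.2f} GB")
print(f"F16 is ~{f16_gb / q4_gb:.1f}x larger")
```

Under these assumptions the Q4_0 weights fit comfortably on an 8 GB card, while the F16 weights would not.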
What pp512 means
pp512 stands for prompt processing of 512 tokens: how fast the model processes a 512-token prompt.
- pp = prompt processing
- 512 = a prompt length of 512 tokens
- t/s = tokens per second; higher means the prompt is processed faster
What tg128 means
tg128 stands for text generation of 128 tokens: how fast the model generates 128 new tokens.
- tg = text generation
- 128 = 128 generated tokens
- t/s = tokens per second; higher means faster output
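To make the two metrics concrete, here is a minimal sketch that turns a pp512 and a tg128 figure into an estimated wall-clock time for one request. The function name is hypothetical, the numbers are the RTX 4090 CUDA no FA figures from the scoreboard, and the model is deliberately simple: it assumes both speeds stay constant regardless of context length, which real runs do not guarantee.

```python
def estimate_seconds(prompt_tokens: int, gen_tokens: int,
                     pp_tps: float, tg_tps: float) -> float:
    """Time to process the prompt plus time to generate the reply."""
    return prompt_tokens / pp_tps + gen_tokens / tg_tps


# RTX 4090, CUDA, no FA (values copied from the scoreboard)
PP512_TPS = 11992.70
TG128_TPS = 186.21

prompt_time = 512 / PP512_TPS
gen_time = 128 / TG128_TPS
total = estimate_seconds(512, 128, PP512_TPS, TG128_TPS)

print(f"prompt: {prompt_time:.3f} s, generation: {gen_time:.3f} s, total: {total:.3f} s")
```

Even though the prompt is four times longer than the reply, generation accounts for most of the time here, which is why tg128 usually dominates the feel of interactive use.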
What FA means
FA stands for Flash Attention.
- with FA = Flash Attention enabled
- no FA = Flash Attention disabled

On many GPUs in the tables below, enabling FA raises pp512, while tg128 changes much less and sometimes drops slightly.
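The effect of FA on a given card can be read off by comparing its no FA and with FA rows. The sketch below does that for two entries copied from the scoreboard; the helper name is hypothetical and the calculation is a plain percentage change, nothing llama.cpp-specific.

```python
def fa_change_percent(no_fa: float, with_fa: float) -> float:
    """Percentage change in t/s when Flash Attention is enabled."""
    return (with_fa / no_fa - 1) * 100


# RTX 4090, CUDA backend
rtx4090_pp = fa_change_percent(11992.70, 14770.63)
rtx4090_tg = fa_change_percent(186.21, 188.96)

# Instinct MI300X, ROCm backend, tg128 only
mi300x_tg = fa_change_percent(232.92, 218.53)

print(f"RTX 4090 pp512: {rtx4090_pp:+.1f}%")
print(f"RTX 4090 tg128: {rtx4090_tg:+.1f}%")
print(f"MI300X tg128:   {mi300x_tg:+.1f}%")
```

For these rows the pp512 gain is large, the tg128 gain is small, and one tg128 result is negative, so it is worth checking your own card rather than assuming FA always helps.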
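What t/s means
t/s means tokens per second. Every figure on the scoreboard is reported in t/s, and higher is better. When comparing numbers, compare like with like:
- pp512 against pp512, and tg128 against tg128
- no FA against no FA, and with FA against with FA
- CUDA, ROCm, and Vulkan results come from different backends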
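Quick overview
- CUDA is the llama.cpp GPU backend for Nvidia hardware.
- ROCm is the backend for AMD GPUs and Instinct accelerators.
- Vulkan is a cross-vendor backend; its table covers Nvidia, AMD, Intel, and Apple hardware, among others.
- tg128 tends to track memory bandwidth, while pp512 tends to track compute throughput.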
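A common rule of thumb, assumed here rather than taken from the scoreboard, is that generating one token reads every weight once, so memory bandwidth divided by model size gives a ceiling on tg t/s. The sketch applies that to the M2 Ultra entry in the Apple Silicon table (800 GB/s, Q4_0 TG of 94.27 at commit 8e672ef). The model size is an assumption: 7e9 parameters at 4.5 bits each.

```python
def tg_ceiling_tps(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/s if each token reads all weights once."""
    return bandwidth_gb_s / model_gb


# Assumed model size: 7e9 weights at 4.5 bits per weight (Q4_0 estimate).
MODEL_GB = 7e9 * 4.5 / 8 / 1e9

BANDWIDTH_GB_S = 800.0   # M2 Ultra, from the Apple Silicon table
MEASURED_TG = 94.27      # M2 Ultra Q4_0 TG, commit 8e672ef

ceiling = tg_ceiling_tps(BANDWIDTH_GB_S, MODEL_GB)
efficiency = MEASURED_TG / ceiling

print(f"ceiling: ~{ceiling:.0f} t/s, measured: {MEASURED_TG} t/s "
      f"({efficiency:.0%} of ceiling)")
```

The measured figure lands well below the ceiling, which is expected: the bound ignores KV cache reads, kernel overhead, and anything else that keeps the memory bus from being saturated.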
CUDA scoreboard
Llama 2 7B, Q4_0, no FA
| Chip | Memory | pp512 t/s | tg128 t/s | Commit | Thanks to |
|---|---|---|---|---|---|
| RTX 5090 | 32 GB / GDDR7 / 512 bit | 14073.41 ± 115.16 | 290.02 ± 1.10 | 8cf6b42 | @totaldev |
| RTX PRO 6000 Blackwell | 96 GB / GDDR7 / 512 bit | 14854.63 ± 22.73 | 274.20 ± 0.14 | 79c1160 | @Tom94 |
| H100 80 GB | 80 GB / HBM3 / 5120 bit | 9918.34 ± 176.97 | 267.81 ± 1.54 | 5143fa8 | @Hedede |
| A100 80 GB | 80 GB / HBM2e / 5120 bit | 4849.53 ± 8.94 | 190.88 ± 0.33 | 5143fa8 | @Hedede |
| RTX 4090 D | 24 GB / GDDR6X / 384 bit | 10293.86 ± 134.72 | 189.33 ± 0.19 | 79c1160 | @autonomous-AI-lab |
| RTX 4090 | 24 GB / GDDR6X / 384 bit | 11992.70 ± 107.99 | 186.21 ± 0.13 | 2241453 | @lhl |
| RTX 5080 | 16 GB / GDDR7 / 256 bit | 8297.36 ± 9.50 | 181.99 ± 0.42 | 8a4280c | @Hedede |
| RTX 5070 Ti | 16 GB / GDDR7 / 256 bit | 6952.38 ± 13.73 | 176.85 ± 0.07 | 933414c | @TinyServal |
| RTX 6000 Ada | 48 GB / GDDR6 / 384 bit | 9229.23 ± 101.78 | 176.07 ± 0.26 | b8e09f0 | @Hedede |
| RTX 3090 Ti | 24 GB / GDDR6X / 384 bit | 6567.49 ± 20.30 | 171.19 ± 3.98 | 9c35706 | @slaren |
| RTX 3090 | 24 GB / GDDR6X / 384 bit | 5174.69 ± 21.83 | 158.16 ± 0.21 | c76b420 | @m18coppola |
| L40 | 48 GB / GDDR6 / 384 bit | 8870.49 ± 378.76 | 152.01 ± 0.28 | ee09828 | @Hedede |
| RTX 4080 SUPER | 16 GB / GDDR6X / 256 bit | 8125.15 ± 41.05 | 148.33 ± 0.20 | 81086cd | @zacharyarnaise |
| RTX 4080 | 16 GB / GDDR6X / 256 bit | 8031.64 ± 26.49 | 142.49 ± 0.16 | 20638e4 | @Ristovski |
| RTX 3080 | 10 GB / GDDR6X / 320 bit | 5013.86 ± 24.80 | 139.65 ± 0.99 | 9c35706 | @slaren |
| RTX A6000 | 48 GB / GDDR6 / 384 bit | 4913.93 ± 6.79 | 138.73 ± 2.75 | 4795c91 | @Hedede |
| RTX 4070 Ti SUPER | 16 GB / GDDR6X / 256 bit | 6924.53 ± 13.87 | 132.26 ± 0.16 | 9c35706 | @Ristovski |
| RTX PRO 4000 Blackwell | 24 GB / GDDR7 / 192 bit | 4992.83 ± 113.52 | 131.66 ± 0.20 | 7d77f07 | @Hedede |
| RTX A5000 | 24 GB / GDDR6 / 384 bit | 4028.16 ± 19.14 | 130.07 ± 2.74 | e5155e6 | @Hedede |
| Tesla V100 | 32 GB / HBM2 / 4096 bit | 3042.64 ± 40.71 | 129.08 ± 0.05 | 51f5a45 | @Hedede |
| RTX 5070 | 12 GB / GDDR7 / 192 bit | 5184.75 ± 18.70 | 127.54 ± 0.46 | - | @Spyro000 |
| A40 | 48 GB / GDDR6 / 384 bit | 4609.01 ± 10.67 | 124.11 ± 0.17 | 3470a5c | @Hedede |
| A30 | 24 GB / HBM2e / 3072 bit | 2767.10 ± 1.88 | 124.81 ± 0.16 | 583cb83 | @Hedede |
| Titan V | 12 GB / HBM2 / 3072 bit | 2617.46 ± 2.10 | 108.79 ± 0.05 | e56abd2 | @Hedede |
| RTX 2080 Ti | 11 GB / GDDR6 / 352 bit | 2890.66 ± 2.42 | 107.51 ± 0.21 | 9c35706 | @ariya |
| Quadro RTX 6000 | 24 GB / GDDR6 / 384 bit | 2751.18 ± 19.43 | 102.77 ± 0.04 | b8e09f0 | @Hedede |
| Quadro RTX 8000 | 48 GB / GDDR6 / 384 bit | 2709.95 ± 3.35 | 102.68 ± 0.03 | b8e09f0 | @Hedede |
| RTX A4500 | 20 GB / GDDR6 / 320 bit | 2827.20 ± 66.43 | 97.32 ± 2.80 | 5cdb27e | @aleksyx |
| RTX 5060 Ti 16 GB | 16 GB / GDDR7 / 128 bit | 3737.25 ± 6.79 | 90.94 ± 0.02 | 89d1029 | @mike-llamacpp |
| RTX 2070 SUPER | 8 GB / GDDR6 / 256 bit | 2088.34 ± 1.94 | 88.06 ± 0.28 | bc07349 | @phstudy |
| RTX A4000 | 16 GB / GDDR6 / 256 bit | 2684.06 ± 15.28 | 83.77 ± 0.37 | 65349f2 | @TinyServal |
| Titan Xp | 12 GB / GDDR5X / 384 bit | 1154.96 ± 1.46 | 76.08 ± 0.08 | c4510dc | @Hedede |
| RTX 3060 | 12 GB / GDDR6 / 192 bit | 2137.50 ± 10.12 | 75.57 ± 0.07 | baa9255 | @QuantiusBenignus |
| Quadro RTX 4000 | 8 GB / GDDR6 / 256 bit | 1536.89 ± 0.90 | 65.62 ± 0.62 | 7d77f07 | @Hedede |
| RTX 4060 Ti 8 GB | 8 GB / GDDR6 / 128 bit | 3394.63 ± 7.44 | 63.86 ± 0.01 | 89d1029 | @mike-llamacpp |
| GTX 1080 Ti | 11 GB / GDDR5X / 352 bit | 1084.41 ± 3.01 | 62.49 ± 0.06 | 9c35706 | @ariya |
| RTX A4000 Ada | 20 GB / GDDR6 / 160 bit | 2779.77 ± 9.91 | 61.83 ± 0.04 | a74a0d6 | @sdwolfz |
| RTX 2060 SUPER | 8 GB / GDDR6 / 256 bit | 1420.24 ± 1.95 | 60.04 ± 0.01 | 5c0eb5e | @ggerganov |
| Tesla P100 | 16 GB / HBM2 / 4096 bit | 760.80 ± 2.92 | 58.35 ± 0.00 | b8372ee | @Hedede |
| DGX Spark | 128 GB / LPDDR5x | 3062.31 ± 11.02 | 57.21 ± 0.06 | 5acd455 | @ggerganov |
| Tesla P40 | 24 GB / GDDR5 / 384 bit | 1007.42 ± 1.23 | 54.74 ± 0.07 | c76b420 | @m18coppola |
| RTX 2000 Ada | 16 GB / GDDR6 / 128 bit | 1956.22 ± 7.74 | 50.62 ± 0.04 | 756cfea | @DigitalRudeness |
| Tesla T4 | 16 GB / GDDR6 / 256 bit | 1219.06 ± 4.18 | 46.38 ± 0.73 | d32e03f | @pt13762104 |
| RTX 4050 Laptop | 6 GB / GDDR6 / 96 bit | 1725.85 ± 17.85 | 43.72 ± 0.41 | d79d8f3 | @TimCabbage |
| GTX 1660 | 6 GB / GDDR5 / 192 bit | 148.91 ± 0.01 | 41.35 ± 0.02 | 9515c61 | @ariya |
| Tesla M40 | 24 GB / GDDR5 / 384 bit | 282.65 ± 0.15 | 38.04 ± 0.02 | 97d5117 | @Hedede |
| GTX 1070 Ti | 8 GB / GDDR5 / 256 bit | 714.44 ± 2.04 | 37.82 ± 0.02 | 79c1160 | @pebaryan |
| Jetson AGX Orin | 64 GB / LPDDR5 / 256 bit | 991.31 ± 1.15 | 33.58 ± 0.14 | c1b1876 | @TinyServal |
| Tesla P4 | 8 GB / GDDR5 / 256 bit | 514.53 ± 3.06 | 33.29 ± 0.00 | c76b420 | @m18coppola |
| P106-100 | 6 GB / GDDR5 / 192 bit | 406.94 ± 0.25 | 30.40 ± 0.02 | 5fd160b | @pebaryan |
| GTX 1060 | 6 GB / GDDR5 / 192 bit | 416.85 ± 1.75 | 27.79 ± 0.02 | 5fd160b | @pebaryan |
| Quadro T1000 | 4 GB / GDDR5 / 128 bit | 79.44 ± 0.01 | 27.82 ± 0.18 | f6da8cb | @hanabu |
| Quadro P2000 | 5 GB / GDDR5 / 160 bit | 309.30 ± 0.05 | 23.63 ± 0.00 | baa9255 | @TinyServal |
| Quadro P1000 | 4 GB / GDDR5 / 128 bit | 183.40 ± 0.11 | 13.99 ± 0.13 | 1e74897 | @aleksyx |
| Tesla K80 | 12 GB / GDDR5 / 384 bit | 133.14 ± 0.55 | 13.80 ± 0.02 | 32732f2 | @pebaryan |
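If you want to work with these rows programmatically, each one can be split on `|` and the `mean ± std` cells parsed into numbers. The following is a minimal sketch, not a tool that ships with llama.cpp; the function names are hypothetical, and it assumes the six-column layout used by the CUDA and ROCm tables.

```python
import re

# Matches "mean ± std"; a plain "+" is accepted too, in case a row uses it.
MEASUREMENT = re.compile(r"([0-9.]+)\s*[±+]\s*([0-9.]+)")


def parse_measurement(cell: str) -> tuple[float, float]:
    """Return (mean, std) from a cell such as '11992.70 ± 107.99'."""
    match = MEASUREMENT.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"not a 'mean ± std' cell: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def parse_row(line: str) -> dict:
    """Parse one six-column scoreboard row into a dictionary."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    chip, memory, pp512, tg128, commit, thanks = cells
    return {
        "chip": chip,
        "memory": memory,
        "pp512": parse_measurement(pp512),
        "tg128": parse_measurement(tg128),
        "commit": commit,
        "thanks": thanks,
    }


row = parse_row(
    "| RTX 4090 | 24 GB / GDDR6X / 384 bit | 11992.70 ± 107.99 "
    "| 186.21 ± 0.13 | 2241453 | @lhl |"
)
print(row["chip"], row["pp512"], row["tg128"])
```

The Vulkan tables have five columns with a different meaning, so they would need their own unpacking line.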
Llama 2 7B, Q4_0, with FA
| Chip | Memory | pp512 t/s | tg128 t/s | Commit | Thanks to |
|---|---|---|---|---|---|
| RTX 5090 | 32 GB / GDDR7 / 512 bit | 14970.15 ± 381.06 | 300.40 ± 0.28 | 8cf6b42 | @totaldev |
| RTX PRO 6000 Blackwell | 96 GB / GDDR7 / 512 bit | 16618.98 ± 20.66 | 281.11 ± 0.41 | 5143fa8 | @Tom94 |
| H100 80 GB | 80 GB / HBM3 / 5120 bit | 11263.29 ± 98.34 | 280.74 ± 1.17 | 5143fa8 | @Hedede |
| A100 80 GB | 80 GB / HBM2e / 5120 bit | 5285.96 ± 6.58 | 200.90 ± 0.12 | 5143fa8 | @Hedede |
| RTX 4090 D | 24 GB / GDDR6X / 384 bit | 12506.97 ± 11.51 | 191.57 ± 0.03 | 79c1160 | @autonomous-AI-lab |
| RTX 4090 | 24 GB / GDDR6X / 384 bit | 14770.63 ± 102.93 | 188.96 ± 0.05 | 2241453 | @lhl |
| RTX 5080 | 16 GB / GDDR7 / 256 bit | 9487.70 ± 21.89 | 184.68 ± 0.05 | 8a4280c | @Hedede |
| RTX 5070 Ti | 16 GB / GDDR7 / 256 bit | 8419.56 ± 35.50 | 182.43 ± 0.09 | 933414c | @TinyServal |
| RTX 6000 Ada | 48 GB / GDDR6 / 384 bit | 10576.85 ± 530.21 | 179.47 ± 0.32 | b8e09f0 | @Hedede |
| RTX 3090 Ti | 24 GB / GDDR6X / 384 bit | 6924.01 ± 10.76 | 172.26 ± 1.31 | 9c35706 | @slaren |
| RTX PRO 4500 Blackwell | 32 GB / GDDR7 / 256 bit | 7251.66 ± 92.40 | 168.90 ± 0.20 | becc481 | @Hedede |
| RTX 3090 | 24 GB / GDDR6X / 384 bit | 5560.06 ± 16.28 | 161.89 ± 0.18 | c76b420 | @m18coppola |
| L40 | 48 GB / GDDR6 / 384 bit | 10097.64 ± 671.22 | 153.76 ± 0.12 | ee09828 | @Hedede |
| RTX 4080 SUPER | 16 GB / GDDR6X / 256 bit | 9439.01 ± 56.75 | 147.48 ± 1.41 | 81086cd | @zacharyarnaise |
| RTX 4080 | 16 GB / GDDR6X / 256 bit | 9205.93 ± 22.31 | 143.47 ± 0.02 | 20638e4 | @Ristovski |
| RTX A6000 | 48 GB / GDDR6 / 384 bit | 5662.39 ± 13.87 | 144.87 ± 0.18 | 4795c91 | @Hedede |
| RTX 3080 | 10 GB / GDDR6X / 320 bit | 5569.56 ± 14.04 | 139.95 ± 0.95 | 9c35706 | @slaren |
| RTX PRO 4000 Blackwell | 24 GB / GDDR7 / 192 bit | 5674.44 ± 139.53 | 136.38 ± 0.13 | 7d77f07 | @Hedede |
| RTX A5000 | 24 GB / GDDR6 / 384 bit | 4552.15 ± 9.68 | 135.83 ± 0.11 | e5155e6 | @Hedede |
| Tesla V100 | 32 GB / HBM2 / 4096 bit | 2973.78 ± 3.62 | 134.76 ± 0.02 | 51f5a45 | @Hedede |
| RTX 4070 Ti SUPER | 16 GB / GDDR6X / 256 bit | 7612.32 ± 37.35 | 132.85 ± 0.31 | 9c35706 | @Ristovski |
| A30 | 24 GB / HBM2e / 3072 bit | 3068.72 ± 0.63 | 131.93 ± 0.18 | 583cb83 | @Hedede |
| RTX 5070 | 12 GB / GDDR7 / 192 bit | 5783.44 ± 36.95 | 128.21 ± 2.52 | - | @Spyro000 |
| A40 | 48 GB / GDDR6 / 384 bit | 5256.38 ± 19.39 | 126.24 ± 0.06 | 3470a5c | @Hedede |
| Titan V | 12 GB / HBM2 / 3072 bit | 2481.25 ± 1.31 | 112.17 ± 0.01 | e56abd2 | @Hedede |
| RTX 2080 Ti | 11 GB / GDDR6 / 352 bit | 3107.61 ± 4.34 | 109.17 ± 0.07 | 9c35706 | @ariya |
| Quadro RTX 6000 | 24 GB / GDDR6 / 384 bit | 3053.96 ± 1.37 | 104.38 ± 0.04 | b8e09f0 | @Hedede |
| Quadro RTX 8000 | 48 GB / GDDR6 / 384 bit | 3052.35 ± 5.64 | 103.63 ± 0.02 | b8e09f0 | @Hedede |
| RTX A4500 | 20 GB / GDDR6 / 320 bit | 3453.10 ± 49.19 | 103.00 ± 0.25 | 5cdb27e | @aleksyx |
| RTX 5060 Ti 16 GB | 16 GB / GDDR7 / 128 bit | 4195.53 ± 1.98 | 93.46 ± 0.01 | 89d1029 | @mike-llamacpp |
| RTX 2070 SUPER | 8 GB / GDDR6 / 256 bit | 2293.29 ± 5.91 | 87.71 ± 0.29 | bc07349 | @phstudy |
| RTX A4000 | 16 GB / GDDR6 / 256 bit | 2807.83 ± 52.44 | 85.17 ± 0.66 | 65349f2 | @TinyServal |
| RTX 3060 | 12 GB / GDDR6 / 192 bit | 2407.67 ± 3.73 | 76.92 ± 0.03 | baa9255 | @QuantiusBenignus |
| Titan Xp | 12 GB / GDDR5X / 384 bit | 1218.12 ± 1.82 | 73.84 ± 0.04 | c4510dc | @Hedede |
| Quadro RTX 4000 | 8 GB / GDDR6 / 256 bit | 1662.80 ± 2.04 | 67.62 ± 0.67 | 7d77f07 | @Hedede |
| RTX 4060 Ti 8 GB | 8 GB / GDDR6 / 128 bit | 3803.45 ± 70.80 | 64.03 ± 0.53 | 89d1029 | @mike-llamacpp |
| Tesla P100 | 16 GB / HBM2 / 4096 bit | 787.36 ± 3.27 | 61.99 ± 0.00 | b8372ee | @Hedede |
| GTX 1080 Ti | 11 GB / GDDR5X / 352 bit | 1138.14 ± 2.02 | 61.38 ± 0.03 | 9c35706 | @ariya |
| RTX A4000 Ada | 20 GB / GDDR6 / 160 bit | 3171.86 ± 4.34 | 61.37 ± 0.01 | a74a0d6 | @sdwolfz |
| RTX 2060 SUPER | 8 GB / GDDR6 / 256 bit | 1563.77 ± 0.51 | 61.13 ± 0.05 | 5c0eb5e | @ggerganov |
| DGX Spark | 128 GB / LPDDR5x | 3661.37 ± 38.66 | 56.74 ± 0.03 | 5acd455 | @ggerganov |
| Tesla P40 | 24 GB / GDDR5 / 384 bit | 1079.66 ± 0.18 | 53.73 ± 0.05 | c76b420 | @m18coppola |
| RTX 2000 Ada | 16 GB / GDDR6 / 128 bit | 2250.14 ± 5.91 | 50.71 ± 0.01 | 756cfea | @DigitalRudeness |
| Tesla T4 | 16 GB / GDDR6 / 256 bit | 1309.73 ± 1.02 | 44.03 ± 0.57 | d32e03f | @pt13762104 |
| GTX 1660 | 6 GB / GDDR5 / 192 bit | 154.45 ± 0.52 | 41.43 ± 0.01 | 9515c61 | @ariya |
| Tesla M40 | 24 GB / GDDR5 / 384 bit | 290.17 ± 0.11 | 39.98 ± 0.01 | 97d5117 | @Hedede |
| GTX 1070 Ti | 8 GB / GDDR5 / 256 bit | 790.52 ± 2.39 | 37.87 ± 0.00 | 79c1160 | @pebaryan |
| Jetson AGX Orin | 64 GB / LPDDR5 / 256 bit | 1171.96 ± 4.70 | 35.88 ± 0.18 | c1b1876 | @TinyServal |
| Tesla P4 | 8 GB / GDDR5 / 256 bit | 529.53 ± 2.12 | 33.12 ± 0.03 | c76b420 | @m18coppola |
| P106-100 | 6 GB / GDDR5 / 192 bit | 438.49 ± 0.38 | 30.64 ± 0.06 | 5fd160b | @pebaryan |
| GTX 1060 | 6 GB / GDDR5 / 192 bit | 446.19 ± 0.81 | 28.18 ± 0.01 | 5fd160b | @pebaryan |
| Quadro T1000 | 4 GB / GDDR5 / 128 bit | 27.46 ± 0.23 | 27.46 ± 0.23 | f6da8cb | @hanabu |
| Quadro P2000 | 5 GB / GDDR5 / 160 bit | 311.55 ± 0.19 | 23.76 ± 0.01 | baa9255 | @TinyServal |
| Tesla K80 | 12 GB / GDDR5 / 384 bit | 133.36 ± 0.60 | 14.27 ± 0.32 | 32732f2 | @pebaryan |
| Quadro P1000 | 4 GB / GDDR5 / 128 bit | 173.82 ± 0.02 | 13.65 ± 0.14 | 1e74897 | @aleksyx |
Apple Silicon scoreboard
Discussion #4167 collects the Apple Silicon results. In addition to Q4_0 it reports F16 and Q8_0. PP, TG, and t/s mean the same as in the sections above:
- PP = prompt processing
- TG = text generation
- t/s = tokens per second
M2 Ultra results at three different commits:

| Date | Chip | Commit / config | Bandwidth (GB/s) | GPU cores | F16 PP | F16 TG | Q8_0 PP | Q8_0 TG | Q4_0 PP | Q4_0 TG |
|---|---|---|---:|---:|---:|---:|---:|---:|---:|---:|
| 2023-11-21 | M2 Ultra | 8e672ef | 800 | 76 | 1401.85 | 41.02 | 1248.59 | 66.64 | 1238.48 | 94.27 |
| 2024-11-12 | M2 Ultra | 86ed72d + FA | 800 | 76 | 1525.95 | 43.15 | 1368.18 | 73.11 | 1391.78 | 108.80 |
| 2025-08-02 | M2 Ultra | 5c0eb5e + FA | 800 | 76 | 1561.35 | 43.24 | 1386.97 | 73.35 | 1412.42 | 109.41 |

Other entries from the Apple Silicon table:

| Chip | Q4_0 PP | Q4_0 TG | Q8_0 PP | Q8_0 TG | F16 PP | F16 TG |
|---|---:|---:|---:|---:|---:|---:|
| M1 Pro 16 GPU | 266.25 | 36.41 | 270.37 | 22.34 | 302.14 | 12.75 |
| M2 Ultra 76 GPU | 1238.48 | 94.27 | 1248.59 | 66.64 | 1401.85 | 41.02 |
| M3 Max 40 GPU | 690.99 | 65.85 | 749.37 | 43.00 | 794.26 | 25.27 |
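Because the Apple Silicon data covers three weight formats on the same chip, it shows how quantization shifts PP and TG differently. The sketch below compares each format against F16 using the M2 Ultra figures at commit 8e672ef; the dictionary layout and helper name are my own, only the numbers come from the table.

```python
# M2 Ultra, commit 8e672ef: (PP t/s, TG t/s) per weight format
M2_ULTRA = {
    "F16": (1401.85, 41.02),
    "Q8_0": (1248.59, 66.64),
    "Q4_0": (1238.48, 94.27),
}


def relative_to_f16(results: dict) -> dict:
    """Return {format: (PP ratio, TG ratio)} with F16 as the baseline."""
    base_pp, base_tg = results["F16"]
    return {
        fmt: (pp / base_pp, tg / base_tg)
        for fmt, (pp, tg) in results.items()
    }


ratios = relative_to_f16(M2_ULTRA)
for fmt, (pp_ratio, tg_ratio) in ratios.items():
    print(f"{fmt:5s} PP x{pp_ratio:.2f}  TG x{tg_ratio:.2f}")
```

On this chip the smaller formats generate tokens noticeably faster than F16 while processing prompts slightly slower, which is one reason PP and TG should be read separately.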
ROCm / HIP scoreboard
Llama 2 7B, Q4_0, no FA
| Chip | Memory | pp512 t/s | tg128 t/s | Commit | Thanks to |
|---|---|---|---|---|---|
| Instinct MI300X | 192 GB / HBM3 / 8192 bit | 11476.40 ± 72.79 | 232.92 ± 0.53 | ee3a9fc | @yeahdongcn |
| RX 7900 XTX | 24 GB / GDDR6 / 384 bit | 3552.27 ± 101.96 | 167.11 ± 0.50 | 2f0c2db | @Diablo-D3 |
| Instinct MI210 | 64 GB / HBM2e / 4096 bit | 2486.22 ± 9.58 | 124.51 ± 0.04 | 8160b38 | @65a |
| Pro W7900 | 48 GB / GDDR6 / 384 bit | 3213.17 ± 80.47 | 121.18 ± 0.06 | 8160b38 | @65a |
| RX 7900 XT | 20 GB / GDDR6 / 320 bit | 3098.38 ± 24.02 | 116.15 ± 0.06 | 1e15bfd | @AdamNiederer |
| RX 9070 | 16 GB / GDDR6 / 256 bit | 2381.77 ± 3.68 | 114.48 ± 0.60 | d0660f2 | @andj1210 |
| Instinct MI100 | 32 GB / HBM2 / 4096 bit | 2732.83 ± 1.98 | 110.48 ± 0.14 | 9c35706 | @firefox42 |
| RX 9070 XT | 16 GB / GDDR6 / 256 bit | 5055.19 ± 109.58 | 101.27 ± 0.27 | 583cb83 | @Hadrianneue |
| RX 7800 XT | 16 GB / GDDR6 / 256 bit | 2151.81 ± 17.94 | 100.94 ± 0.10 | 00131d6 | @olegshulyakov |
| Instinct MI50 | 32 GB / HBM2 / 4096 bit | 1057.24 ± 0.53 | 98.95 ± 0.25 | 97d5117 | @wtarreau |
| RX 7900 GRE | 16 GB / GDDR6 / 256 bit | 1456.98 ± 12.39 | 96.07 ± 0.10 | 6fa3b55 | @MihaiBojescu |
| AI PRO R9700 | 32 GB / GDDR6 / 256 bit | 4443.54 ± 339.25 | 93.84 ± 0.26 | bd4ef13 | @gogich77 |
| Instinct MI60 | 32 GB / HBM2 / 4096 bit | 1289.11 ± 0.62 | 91.46 ± 0.13 | 504af20 | @Said-Akbar |
| RX 6900 XT | 16 GB / GDDR6 / 256 bit | 1889.84 ± 31.21 | 88.49 ± 0.00 | a972fae | @notgood |
| Pro VII | 16 GB / HBM2 / 4096 bit | 1064.99 ± 1.18 | 87.45 ± 0.04 | 2739a71 | @8XXD8 |
| RX 6800 XT | 16 GB / GDDR6 / 256 bit | 1447.07 ± 1.36 | 83.92 ± 0.03 | 79c1160 | @MrLavender |
| Pro V620 | 32 GB / GDDR6 / 256 bit | 1803.65 ± 2.54 | 74.66 ± 0.01 | 5c0eb5e | @samteezy |
| RX 9060 XT | 16 GB / GDDR6 / 256 bit | 1419.67 ± 3.64 | 67.58 ± 0.24 | a0e13dc | @lcy0321 |
| RX 5700 XT | 8 GB / GDDR6 / 256 bit | 354.17 ± 0.18 | 67.55 ± 0.04 | c05e8c9 | @daniandtheweb |
| Instinct MI25 | 16 GB / HBM2 / 2048 bit | 409.83 ± 0.23 | 63.94 ± 0.06 | 2739a71 | @8XXD8 |
| AI Max+ 395 | 128 GB / LPDDR5 | 911.36 ± 1.79 | 50.01 ± 0.07 | e60f241 | @firefox42 |
| RX 7600 XT | 16 GB / GDDR6 / 128 bit | 1099.64 ± 2.05 | 48.58 ± 0.06 | 9c35706 | @wbruna |
| RX Vega 64 | 8 GB / HBM2 / 2048 bit | 240.68 ± 0.09 | 48.46 ± 0.09 | ec428b0 | @davispuh |
| Radeon 8060S | System Shared / DDR5 | 351.36 ± 0.67 | 47.97 ± 0.33 | 1d0125b | @hspak |
| Radeon 880M | System Shared / DDR5 | 163.25 ± 13.86 | 12.97 ± 1.63 | c55d53a | @Hedede |
Llama 2 7B, Q4_0, with FA
| Chip | Memory | pp512 t/s | tg128 t/s | Commit | Thanks to |
|---|---|---|---|---|---|
| Instinct MI300X | 192 GB / HBM3 / 8192 bit | 11945.97 ± 54.29 | 218.53 ± 0.09 | ee3a9fc | @yeahdongcn |
| RX 7900 XTX | 24 GB / GDDR6 / 384 bit | 3874.25 ± 11.92 | 170.12 ± 0.56 | 2f0c2db | @Diablo-D3 |
| Pro W7900 | 48 GB / GDDR6 / 384 bit | 3472.86 ± 52.86 | 127.43 ± 0.12 | 8160b38 | @65a |
| Instinct MI210 | 64 GB / HBM2e / 4096 bit | 2571.82 ± 2.89 | 130.18 ± 0.06 | 8160b38 | @65a |
| RX 9070 | 16 GB / GDDR6 / 256 bit | 2452.68 ± 1.33 | 115.32 ± 0.52 | d0660f2 | @andj1210 |
| RX 7900 XT | 20 GB / GDDR6 / 320 bit | 3261.75 ± 9.09 | 112.30 ± 0.06 | 1e15bfd | @AdamNiederer |
| Instinct MI50 | 32 GB / HBM2 / 4096 bit | 1129.43 ± 0.15 | 105.82 ± 0.07 | 97d5117 | @wtarreau |
| Instinct MI100 | 32 GB / HBM2 / 4096 bit | 2755.00 ± 3.68 | 104.71 ± 0.10 | 9c35706 | @firefox42 |
| AI PRO R9700 | 32 GB / GDDR6 / 256 bit | 4773.07 ± 49.30 | 97.98 ± 0.13 | bd4ef13 | @gogich77 |
| RX 7900 GRE | 16 GB / GDDR6 / 256 bit | 1598.79 ± 11.48 | 97.53 ± 0.06 | 6fa3b55 | @MihaiBojescu |
| RX 9070 XT | 16 GB / GDDR6 / 256 bit | 4903.51 ± 96.36 | 97.28 ± 0.13 | 583cb83 | @Hadrianneue |
| RX 7800 XT | 16 GB / GDDR6 / 256 bit | 2304.63 ± 2.85 | 95.99 ± 0.21 | 00131d6 | @olegshulyakov |
| RX 6900 XT | 16 GB / GDDR6 / 256 bit | 1948.31 ± 13.51 | 85.04 ± 0.02 | a972fae | @notgood |
| Pro V620 | 32 GB / GDDR6 / 256 bit | 1256.86 ± 0.55 | 70.83 ± 0.02 | 5c0eb5e | @samteezy |
| RX 9060 XT | 16 GB / GDDR6 / 256 bit | 1479.27 ± 0.71 | 65.42 ± 0.19 | a0e13dc | @lcy0321 |
| RX 5700 XT | 8 GB / GDDR6 / 256 bit | 314.17 ± 0.29 | 62.02 ± 0.05 | c05e8c9 | @daniandtheweb |
| AI Max+ 395 | 128 GB / LPDDR5 | 1003.53 ± 2.91 | 49.87 ± 0.02 | e60f241 | @firefox42 |
| Radeon 8060S | System Shared / DDR5 | 366.08 ± 1.44 | 48.97 ± 0.15 | 1d0125b | @hspak |
| RX 7600 XT | 16 GB / GDDR6 / 128 bit | 1199.16 ± 1.07 | 47.65 ± 0.06 | 9c35706 | @wbruna |
| RX Vega 64 | 8 GB / HBM2 / 2048 bit | 153.17 ± 0.72 | 42.46 ± 0.40 | ec428b0 | @davispuh |
| Radeon 880M | System Shared / DDR5 | 213.31 ± 14.05 | 16.16 ± 1.41 | c55d53a | @Hedede |
Vulkan scoreboard
Llama 2 7B, Q4_0, no FA
| Chip | pp512 t/s | tg128 t/s | Commit | Comments |
|---|---|---|---|---|
| Nvidia RTX 5090 | 10381.64 ± 508.84 | 263.63 ± 0.91 | ca71fb9 | coopmat2 |
| AMD Radeon RX 7900 XTX | 3531.93 ± 31.74 | 191.28 ± 0.20 | 2f0c2db | |
| Nvidia RTX 4090 | 9452.03 ± 187.70 | 187.97 ± 0.21 | 4ae88d0 | coopmat2 |
| Nvidia RTX 5080 | 7444.99 ± 20.11 | 185.10 ± 0.54 | f6b533d | coopmat2 |
| Nvidia A100 | 6389.86 ± 4.83 | 160.78 ± 0.16 | 2257758 | coopmat2 |
| Nvidia RTX 3090 | 4298.97 ± 10.59 | 160.13 ± 0.25 | 4ae88d0 | coopmat2 |
| Nvidia RTX 4080 Super | 7101.18 ± 269.79 | 147.13 ± 5.64 | 81086cd | coopmat2 |
| Nvidia RTX 3080 | 4287.11 ± 55.50 | 139.15 ± 0.05 | 7c7d6ce | coopmat2 |
| Nvidia RTX A5000 | 3641.55 ± 9.05 | 139.89 ± 0.69 | 4ae88d0 | coopmat2 |
| AMD Radeon RX 9070 XT | 5036.04 ± 88.16 | 137.11 ± 0.02 | e9fd8dc | |
| Nvidia RTX 5070 Ti | 6213.63 ± 27.72 | 135.63 ± 0.18 | d13d0f6 | coopmat2 |
| AMD Radeon AI Pro R9700 | 4036.04 ± 34.58 | 130.19 ± 0.39 | 3191462 | |
| Nvidia Tesla V100 | 1391.39 ± 1.19 | 129.58 ± 0.58 | 7d77f07 | |
| Nvidia RTX 4070 Ti Super | 6099.18 ± 154.30 | 129.45 ± 0.18 | 4ae88d0 | coopmat2 |
| AMD Radeon RX 7900 XT | 2941.58 ± 17.17 | 123.18 ± 0.40 | 71e74a3 | |
| AMD Radeon RX 9070 | 3164.10 ± 66.84 | 119.71 ± 3.40 | 21c17b5 | |
| AMD Radeon RX 7800 XT | 2017.33 ± 19.30 | 118.27 ± 0.27 | 4fdbc1e | |
| AMD Radeon RX 7900 GRE | 2336.31 ± 7.52 | 116.11 ± 0.26 | 4b2a477 | |
| Apple M3 Ultra | 1116.83 ± 0.55 | 115.54 ± 0.78 | 2d451c8 | MoltenVK |
| Intel Arc Pro B70 | 3379.00 ± 47.92 | 112.02 ± 1.08 | b863507 | |
| Nvidia Titan V | 984.36 ± 4.13 | 108.86 ± 0.28 | e56abd2 | |
| AMD Radeon Pro VII | 1078.54 ± 0.86 | 107.82 ± 0.14 | N/A | |
| AMD Radeon RX 6900 XT | 1837.21 ± 25.44 | 104.60 ± 0.30 | a972fae | |
| Intel Arc Pro A60 | 2261.11 ± 9.53 | 104.25 ± 0.07 | 97d5117 | |
| AMD Radeon RX 6800 XT | 1752.92 ± 1.71 | 100.32 ± 0.97 | N/A | |
| AMD Radeon VII | 1059.14 ± 0.56 | 101.19 ± 0.53 | 77d6ae4 | |
| Nvidia RTX 2080 Ti | 1888.24 ± 9.20 | 97.58 ± 6.60 | N/A | |
| AMD Radeon RX 6800 | 1698.69 ± 0.80 | 95.61 ± 0.19 | 4b385bf | |
| AMD Radeon Pro W6800X Duo | 687.71 ± 4.33 | 94.82 ± 0.12 | N/A | |
| Nvidia RTX 5060 Ti | 3460.92 ± 7.16 | 93.51 ± 0.15 | 89f10ba | coopmat2 |
| Nvidia RTX 4070 | 3179.37 ± 46.16 | 92.29 ± 0.28 | 9a48399 | |
| AMD Radeon Pro W6800X | 510.80 ± 0.13 | 86.47 ± 0.46 | 13b4548 | MoltenVK |
| AMD Radeon RX 6700 XT | 1051.20 ± 0.98 | 83.88 ± 0.08 | 6d75883 | |
| AMD Radeon RX 6750 XT | 1040.58 ± 0.35 | 81.98 ± 0.03 | 228f34c | |
| AMD Radeon Pro V620 | 1595.32 ± 1.59 | 81.78 ± 0.06 | 03d4698 | |
| Nvidia RTX 3070 | 2113.02 ± 7.38 | 78.71 ± 0.13 | 1b8fb81 | |
| AMD Radeon Instinct MI60 | 369.26 ± 2.48 | 78.16 ± 1.40 | 504af20 | |
| Nvidia RTX 3060 | 1815.70 ± 5.85 | 75.94 ± 0.80 | 92c0b38 | coopmat2 |
| Apple M4 Max | 724.77 ± 20.93 | 75.02 ± 0.14 | 1ece0cb6 | |
| Nvidia Tesla T10 | 1692.70 ± 2.05 | 75.01 ± 0.21 | 7f76692 | coopmat2 |
| Nvidia RTX A4000 | 2248.14 ± 7.59 | 73.74 ± 0.08 | f5245b5 | coopmat2 |
| AMD Radeon RX 5700 XT | 529.69 ± 0.26 | 70.73 ± 0.04 | 4fdbc1e | |
| AMD Radeon RX 9060 XT | 2141.67 ± 6.87 | 70.54 ± 0.74 | ed52f36 | |
| Intel Arc B580 | 620.94 ± 15.33 | 70.14 ± 0.28 | 7f76692 | |
| AMD Radeon Pro V540 | 583.88 ± 6.56 | 69.64 ± 0.24 | 9da3dcd | |
| AMD Radeon Pro W5700 | 449.85 ± 0.46 | 68.55 ± 0.15 | 23bc779 | |
| Intel Arc Pro B60 | 522.36 ± 3.60 | 68.55 ± 0.01 | 516a4ca | |
| Nvidia GTX 1080 Ti | 540.69 ± 0.71 | 64.99 ± 0.08 | 360d653 | |
| Nvidia RTX 2070 Super | 1199.13 ± 7.70 | 64.64 ± 0.20 | b7552cf | |
| Nvidia RTX 3070 Mobile | 1689.40 ± 19.57 | 63.64 ± 0.39 | ceff6bb | coopmat2 |
| Nvidia Tesla P100 | 678.14 ± 1.40 | 63.16 ± 0.06 | eec1e33 | |
| AMD BC-250 | 370.66 ± 0.04 | 62.32 ± 0.32 | 5886f4f | |
| AMD Radeon RX 6650 XT | 1029.52 ± 1.21 | 62.14 ± 0.02 | dbb852b | |
| Nvidia RTX 4060 Mobile | 2135.66 ± 23.18 | 59.53 ± 0.03 | a5c07dc | coopmat2 |
| Nvidia Tesla P40 | 488.06 ± 0.27 | 59.36 ± 0.16 | N/A | |
| Nvidia GTX 1660 Ti Mobile | 511.67 ± 2.85 | 56.60 ± 0.07 | b43556e | |
| AMD Radeon Instinct MI25 | 439.42 ± 0.34 | 54.69 ± 0.03 | 2739a71 | |
| AMD Radeon RX 6600 XT | 574.65 ± 0.86 | 53.92 ± 0.11 | 091592d | |
| AMD Ryzen AI Max+ 395 | 1288.96 ± 6.49 | 53.59 ± 0.38 | 7f76692 | |
| AMD Radeon RX 7600 XT | 840.85 ± 3.02 | 53.02 ± 0.01 | 01d8eaa | |
| Intel Arc A770 | 1073.85 ± 29.68 | 52.56 ± 0.11 | a69d54f | |
| Nvidia GB10 | 2737.79 ± 19.56 | 52.28 ± 0.03 | b9da444 | coopmat2 |
| AMD FirePro S9300 x2 | 247.26 ± 0.43 | 51.86 ± 0.11 | eec1e33 | Split across two GPUs |
| AMD Radeon RX 6600 | 761.89 ± 1.76 | 50.63 ± 0.02 | b1c70e2 | |
| AMD Radeon RX Vega 56 | 439.87 ± 0.61 | 50.23 ± 0.14 | 92c0b38 | |
| Intel Arc B570 | 913.95 ± 0.90 | 49.64 ± 0.03 | 7f76692 | |
| Nvidia RTX 3060 Mobile | 1059.76 ± 3.54 | 49.03 ± 0.13 | dbb3a47 | |
| AMD Radeon RX 6800M | 861.99 ± 7.67 | 48.71 ± 0.71 | 8e6f8bc | |
| AMD Radeon RX 6600M | 605.59 ± 0.65 | 48.21 ± 0.07 | fe5b78c | |
| Intel Arc A770M | 875.92 ± 2.16 | 47.69 ± 0.16 | eeee367 | |
| Nvidia P104-100 | 311.90 ± 0.22 | 46.18 ± 0.05 | eec1e33 | |
| AMD Radeon RX Vega 64 | 356.08 ± 0.09 | 45.73 ± 0.18 | ec428b0 | |
| Nvidia RTX A2000 | 1245.19 ± 8.76 | 45.52 ± 0.54 | b1afcab | coopmat2 |
| AMD Radeon RX 7600M XT | 459.39 ± 2.34 | 45.28 ± 0.10 | b9ab0a4 | eGPU |
| AMD Radeon Pro V340 | 375.41 ± 0.24 | 45.16 ± 0.06 | 9da3dcd | Split across two GPUs |
| Nvidia GTX 1070 Ti | 297.50 ± 0.54 | 42.86 ± 1.20 | 860a9e4 | eGPU |
| Intel Arc A750 | 1075.94 ± 13.89 | 42.66 ± 0.18 | c1b1876 | |
| Nvidia RTX 4050 Mobile | 1154.28 ± 15.76 | 41.89 ± 0.10 | d79d8f3 | |
| Nvidia GTX 1070 | 321.57 ± 0.93 | 41.48 ± 0.09 | eec1e33 | |
| Intel Arc Pro B50 | 193.50 ± 0.24 | 39.99 ± 0.10 | 7b43f55 | |
| Nvidia Tesla M40 | 92.48 ± 0.02 | 39.35 ± 1.22 | b8372ee | |
| AMD Radeon RX 580 | 258.03 ± 0.71 | 39.32 ± 0.03 | de4c07f | |
| AMD Radeon RX 470 | 218.07 ± 0.56 | 38.63 ± 0.21 | e288693 | |
| AMD Radeon Pro W5500 | 315.39 ± 3.76 | 36.82 ± 0.38 | 860a9e4 | |
| AMD Radeon RX 480 | 248.66 ± 0.28 | 34.71 ± 0.14 | 3b15924 | |
| Apple M2 Ultra | 205.98 ± 0.02 | 34.34 ± 0.12 | dbb852b | Asahi Linux |
| Nvidia GTX 980 | 186.24 ± 0.09 | 33.90 ± 0.51 | 860a9e4 | |
| Nvidia P106-100 | 183.78 ± 0.26 | 29.77 ± 0.04 | 23bc779 | |
| AMD FirePro W8100 | 155.22 ± 0.17 | 29.52 ± 0.05 | 4536363 | |
| Nvidia Tesla P4 | 265.54 ± 0.21 | 28.03 ± 0.14 | 24d2ee0 | |
| AMD Radeon RX 6500 XT | 255.25 ± 0.35 | 27.81 ± 0.10 | g9fdfcd | |
| Apple M3 | 263.70 ± 0.02 | 26.39 ± 0.14 | b9ab0a4 | MoltenVK |
| AMD FirePro S10000 | 94.78 ± 0.02 | 25.32 ± 0.02 | 914a82d | Split across two GPUs |
| Nvidia Quadro P2000 | 169.55 ± 0.17 | 23.05 ± 0.03 | 63f8fe0 | |
| Intel Core Ultra 200 Series | 544.95 ± 4.15 | 22.49 ± 0.09 | cea560f | |
| AMD Ryzen AI 9 300 Series | 479.07 ± 0.41 | 22.41 ± 0.18 | N/A | |
| AMD Ryzen 6000 Series | 240.89 ± 0.52 | 21.26 ± 0.08 | ee09828 | |
| Apple M2 Pro | 62.70 ± 0.03 | 20.95 ± 0.11 | 1fe0029 | Asahi Linux |
| Nvidia GTX 1050 Ti | 136.42 ± 0.67 | 20.96 ± 0.21 | 2f0c2db | |
| AMD Ryzen 8000 Series | 266.19 ± 1.36 | 20.53 ± 0.08 | a5c07dc | |
| AMD Ryzen 7000 Series | 281.62 ± 1.56 | 19.91 ± 0.07 | ebce03e | |
| AMD Ryzen Z1 Extreme | 199.36 ± 7.02 | 18.77 ± 0.02 | 53ff6b9 | |
| AMD FirePro D700 | 69.95 ± 0.04 | 16.62 ± 0.01 | d3bd719 | MoltenVK, running in FP16 mode on FP32 only chip |
| AMD Radeon Pro WX 4100 | 78.79 ± 0.10 | 16.05 ± 0.07 | 860a9e4 | |
| Apple M2 | 50.79 ± 0.16 | 13.50 ± 0.02 | 8c0d6bb | Asahi Linux |
| Apple M1 | 38.29 ± 0.00 | 12.47 ± 0.03 | 2370665 | Asahi Linux |
| AMD Ryzen 5000 Series | 90.55 ± 0.08 | 10.98 ± 0.07 | d84635b | |
| Intel Core 1100 Series | 187.20 ± 1.78 | 10.39 ± 0.04 | abb9f3c | |
| AMD Radeon RX 550 | 52.66 ± 0.49 | 10.20 ± 0.01 | N/A | |
| AMD Ryzen 4000 Series | 103.87 ± 0.02 | 9.63 ± 0.01 | 4b385bf | |
| Nvidia Tesla K80 | 89.46 ± 0.10 | 9.39 ± 0.06 | 5d46bab | Running on single GPU |
| Nvidia Tesla K40 | 64.37 ± 0.09 | 9.30 ± 0.19 | eec1e33 | |
| MediaTek Dimensity 9400 | 38.36 ± 15.15 | 8.92 ± 0.06 | b9ab0a4 | GPU supports coopmat but pp512 is faster with it turned off |
| Intel Core Ultra 100 Series | 185.51 ± 0.22 | 8.21 ± 0.07 | 1d72c84 | |
| AMD Ryzen 3000 Series | 48.63 ± 0.10 | 8.49 ± 0.01 | 1fe0029 | |
| CIX CD8180 | 2.80 ± 0.01 | 5.51 ± 0.00 | 4dca015 | |
| Intel Core 1000 Series | 25.58 ± 0.00 | 4.25 ± 0.18 | N/A | |
| Intel Core 8000 Series | 25.43 ± 0.17 | 3.35 ± 0.03 | c4df49a | |
| Intel N150 | 28.84 ± 0.02 | 2.93 ± 0.00 | 4f63cd7 | |
Llama 2 7B, Q4_0, FA enabled
| Chip | pp512 t/s | tg128 t/s | Commit | Comments |
|---|---|---|---|---|
| Nvidia RTX 5090 | 11796.38 ± 601.36 | 273.68 ± 0.52 | ca71fb9 | coopmat2 |
| AMD Radeon RX 7900 XTX | 3332.90 ± 11.47 | 195.30 ± 0.23 | 2f0c2db | |
| Nvidia RTX 5080 | 8054.59 ± 35.68 | 192.17 ± 0.21 | f6b533d | coopmat2 |
| Nvidia RTX 4090 | 10830.41 ± 36.25 | 190.10 ± 0.31 | 4ae88d0 | coopmat2 |
| Nvidia A100 | 7064.40 ± 1.63 | 170.56 ± 0.02 | 2257758 | coopmat2 |
| Nvidia RTX 3090 | 4732.33 ± 4.80 | 162.28 ± 0.21 | 4ae88d0 | coopmat2 |
| Nvidia RTX 4080 Super | 8007.37 ± 46.03 | 150.20 ± 0.26 | 81086cd | coopmat2 |
| Nvidia RTX 3080 | 4913.83 ± 21.52 | 145.74 ± 0.16 | 7c7d6ce | coopmat2 |
| Nvidia Tesla V100 | 1411.25 ± 2.12 | 142.13 ± 0.03 | 7d77f07 | |
| Nvidia RTX A5000 | 4071.22 ± 13.13 | 140.43 ± 0.22 | 4ae88d0 | coopmat2 |
| AMD Radeon RX 9070 XT | 4911.74 ± 28.52 | 138.20 ± 0.18 | e9fd8dc | |
| Nvidia RTX 5070 Ti | 6764.53 ± 11.95 | 135.65 ± 0.02 | d13d0f6 | coopmat2 |
| AMD Radeon AI Pro R9700 | 4333.83 ± 29.36 | 130.90 ± 0.12 | 3191462 | |
| AMD Radeon RX 7900 XT | 3043.93 ± 10.42 | 124.20 ± 0.09 | 71e74a3 | |
| AMD Radeon RX 7800 XT | 2094.64 ± 14.38 | 119.63 ± 0.13 | 4fdbc1e | |
| AMD Radeon RX 9070 | 3277.24 ± 18.17 | 119.55 ± 0.06 | 21c17b5 | |
| AMD Radeon RX 7900 GRE | 2402.07 ± 22.50 | 116.77 ± 0.08 | 4b2a477 | |
| Apple M3 Ultra | 1115.55 ± 0.75 | 115.99 ± 0.12 | 2d451c8 | MoltenVK |
| Intel Arc Pro B70 | 3314.53 ± 17.95 | 111.63 ± 0.05 | b863507 | |
| Nvidia Titan V | 792.74 ± 4.30 | 109.21 ± 0.72 | e56abd2 | |
| AMD Radeon Pro VII | 783.94 ± 0.77 | 108.45 ± 0.48 | N/A | |
| AMD Radeon RX 6900 XT | 1761.93 ± 4.75 | 106.15 ± 0.04 | a972fae | |
| Nvidia RTX 2080 Ti | 1936.25 ± 32.08 | 100.99 ± 0.24 | N/A | |
| AMD Radeon RX 6800 XT | 1704.79 ± 0.71 | 100.50 ± 0.06 | N/A | |
| AMD Radeon Pro W6800X Duo | 795.28 ± 0.72 | 100.08 ± 0.02 | N/A | |
| Nvidia RTX 5060 Ti | 3912.65 ± 5.86 | 97.01 ± 0.14 | 89f10ba | coopmat2 |
| AMD Radeon RX 6800 | 1749.46 ± 3.36 | 96.65 ± 0.48 | 4b385bf | |
| Nvidia RTX 4070 | 4293.57 ± 27.70 | 91.49 ± 0.89 | 9a48399 | coopmat2 |
| AMD Radeon RX 6750 XT | 997.05 ± 0.45 | 82.29 ± 0.06 | 228f34c | |
| AMD Radeon RX 6700 XT | 1010.90 ± 12.89 | 81.86 ± 0.19 | 6d75883 | |
| Nvidia RTX 3060 | 2012.88 ± 10.12 | 80.59 ± 0.02 | 92c0b38 | coopmat2 |
| AMD Radeon Pro V620 | 1556.31 ± 2.82 | 79.24 ± 0.09 | 03d4698 | |
| Nvidia RTX A4000 | 2482.74 ± 26.05 | 76.07 ± 0.08 | f5245b5 | coopmat2 |
| Nvidia Tesla T10 | 1840.14 ± 1.22 | 76.05 ± 0.13 | 7f76692 | coopmat2 |
| AMD Radeon RX 5700 XT | 538.31 ± 0.35 | 74.43 ± 0.03 | 4fdbc1e | |
| Intel Arc B580 | 419.49 ± 3.37 | 72.00 ± 0.24 | 7f76692 | |
| Apple M4 Max | 557.46 ± 26.87 | 71.79 ± 4.16 | 1ece0cb6 | |
| AMD Radeon Pro W5700 | 446.98 ± 0.39 | 71.30 ± 0.24 | 23bc779 | |
| Intel Arc Pro B60 | 274.76 ± 0.27 | 70.54 ± 0.03 | 516a4ca | |
| AMD Radeon RX 9060 XT | 1915.41 ± 7.90 | 70.52 ± 0.16 | ed52f36 | |
| Nvidia Tesla P100 | 685.51 ± 0.88 | 66.48 ± 0.02 | eec1e33 | |
| AMD Radeon RX 6650 XT | 1088.90 ± 0.40 | 64.53 ± 0.75 | dbb852b | |
| Nvidia GTX 1080 Ti | 529.96 ± 0.38 | 64.63 ± 0.10 | 360d653 | |
| AMD BC-250 | 356.87 ± 1.24 | 63.14 ± 0.09 | 5886f4f | |
| Nvidia RTX 3070 Mobile | 1832.07 ± 57.14 | 62.92 ± 0.37 | ceff6bb | coopmat2 |
| Nvidia RTX 4060 Mobile | 2358.03 ± 12.17 | 60.01 ± 0.08 | a5c07dc | coopmat2 |
| Nvidia Tesla P40 | 484.37 ± 0.27 | 59.22 ± 0.15 | N/A | |
| Nvidia GTX 1660 Ti Mobile | 514.34 ± 0.88 | 57.30 ± 0.42 | b43556e | |
| AMD Radeon RX 7600 XT | 1024.38 ± 7.56 | 56.11 ± 0.02 | 01d8eaa | |
| AMD FirePro S9300 x2 | 243.33 ± 0.22 | 55.64 ± 0.06 | eec1e33 | Split across two GPUs |
| Nvidia GB10 | 3279.89 ± 26.78 | 53.64 ± 0.05 | b9da444 | coopmat2 |
| AMD Radeon RX 6600 | 808.76 ± 0.15 | 53.24 ± 0.03 | b1c70e2 | |
| Intel Arc A770 | 1119.68 ± 30.25 | 53.07 ± 0.09 | a69d54f | |
| AMD Ryzen AI Max+ 395 | 1357.07 ± 10.94 | 53.00 ± 0.13 | 7f76692 | |
| AMD Radeon RX Vega 56 | 428.54 ± 0.50 | 52.66 ± 0.03 | 92c0b38 | |
| Intel Arc B570 | 288.51 ± 0.09 | 50.49 ± 0.05 | 7f76692 | |
| Nvidia P104-100 | 325.30 ± 0.25 | 48.64 ± 0.04 | eec1e33 | |
| AMD Radeon Pro V340 | 360.23 ± 0.74 | 47.54 ± 0.06 | 9da3dcd | Split across two GPUs |
| AMD Radeon RX 6800M | 784.16 ± 2.76 | 49.06 ± 0.34 | 8e6f8bc | |
| AMD Radeon RX Vega 64 | 320.12 ± 0.22 | 47.06 ± 0.01 | ec428b0 | |
| Nvidia RTX A2000 | 1361.85 ± 3.26 | 45.69 ± 0.20 | b1afcab | coopmat2 |
| Intel Arc A770M | 384.74 ± 0.78 | 45.68 ± 0.06 | eeee367 | |
| Intel Arc A750 | 303.37 ± 1.44 | 43.96 ± 0.03 | c1b1876 | |
| Nvidia GTX 1070 Ti | 292.85 ± 0.23 | 43.42 ± 0.34 | 860a9e4 | eGPU |
| Nvidia GTX 1070 | 330.84 ± 1.02 | 43.33 ± 0.06 | 360d653 | |
| Nvidia Tesla M40 | 93.35 ± 0.01 | 41.68 ± 0.01 | b8372ee | |
| Intel Arc Pro B50 | 132.48 ± 0.04 | 41.02 ± 0.04 | 7b43f55 | |
| AMD Radeon RX 470 | 197.26 ± 0.27 | 37.28 ± 0.11 | 3769fe6 | |
| AMD Radeon RX 480 | 194.52 ± 0.61 | 37.23 ± 0.09 | 0bcb40b | |
| Apple M2 Ultra | 198.83 ± 0.85 | 198.83 ± 0.85 | dbb852b | Asahi Linux |
| Nvidia GTX 980 | 180.97 ± 0.74 | 34.16 ± 0.10 | 860a9e4 | |
| Nvidia P106-100 | 183.40 ± 0.34 | 30.79 ± 0.32 | 23bc779 | |
| AMD FirePro W8100 | 140.52 ± 0.34 | 29.28 ± 0.14 | 4536363 | |
| Nvidia Tesla P4 | 287.14 ± 0.29 | 28.37 ± 0.24 | 24d2ee0 | |
| Nvidia Quadro P2000 | 181.71 ± 0.12 | 23.77 ± 0.02 | 63f8fe0 | |
| Intel Core Ultra 200 Series | 536.48 ± 1.27 | 23.05 ± 0.04 | cea560f | |
| AMD Ryzen AI 9 300 Series | 532.59 ± 3.55 | 22.31 ± 0.06 | N/A | |
| AMD Ryzen 6000 Series | 277.91 ± 0.37 | 21.15 ± 0.09 | ee09828 | |
| Apple M2 Pro | 58.86 ± 0.02 | 20.97 ± 0.03 | 1fe0029 | Asahi Linux |
| AMD Ryzen 8000 Series | 297.39 ± 1.22 | 20.59 ± 0.38 | a5c07dc | |
| AMD Ryzen 7000 Series | 312.85 ± 2.51 | 20.09 ± 0.35 | 835b2b9 | |
| Nvidia GTX 1050 Ti | 127.54 ± 1.03 | 20.08 ± 0.17 | 2f0c2db | |
| AMD Radeon Pro WX 4100 | 75.59 ± 0.19 | 16.56 ± 0.04 | 860a9e4 | |
| Apple M1 | 35.93 ± 0.00 | 12.85 ± 0.02 | 2370665 | Asahi Linux |
| Apple M2 | 46.81 ± 0.08 | 12.25 ± 2.30 | 8c0d6bb | Asahi Linux |
| AMD Ryzen 5000 Series | 79.06 ± 0.01 | 10.75 ± 0.00 | 5d195f1 | |
| Intel Core 1100 Series | 174.77 ± 4.47 | 10.58 ± 0.03 | abb9f3c | |
| Nvidia Tesla K40 | 64.37 ± 0.02 | 9.92 ± 0.06 | eec1e33 | |
| AMD Ryzen 4000 Series | 113.32 ± 0.01 | 9.87 ± 0.01 | 4b385bf | |
| Nvidia Tesla K80 | 88.26 ± 0.19 | 9.49 ± 0.01 | 5d46bab | Running on single GPU |
| AMD Ryzen 5 3000 Series | 47.41 ± 0.14 | 8.47 ± 0.01 | 1fe0029 | |
| Intel Core Ultra 100 Series | 77.66 ± 2.75 | 7.75 ± 0.05 | 2e89f76 | |
| Intel Core 8000 Series | 25.55 ± 0.04 | 3.35 ± 0.02 | c4df49a | |
| Intel N150 | 25.59 ± 0.00 | 2.91 ± 0.00 | 4f63cd7 | |
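How to read the scoreboard
- Decide whether tg128 or pp512 matters more for your workload. For chat-style use, where you wait on each generated token, look at tg128; for long prompts or large documents, look at pp512.
- Match the backend to your hardware. Nvidia cards normally use CUDA; AMD cards can use ROCm or Vulkan; for other hardware, check the Vulkan table.
- Check both FA tables. On many GPUs enabling FA raises pp512, but the gain varies by GPU and backend, so compare the no FA and with FA rows for your own card.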
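The first point can be sketched in a few lines: the same set of cards ranks differently depending on which metric you sort by. The three entries below are copied from the CUDA no FA table; the data structure and function are illustrative only.

```python
# (pp512 t/s, tg128 t/s) means from the CUDA "no FA" table
RESULTS = {
    "RTX 5090": {"pp512": 14073.41, "tg128": 290.02},
    "RTX PRO 6000 Blackwell": {"pp512": 14854.63, "tg128": 274.20},
    "RTX 4090": {"pp512": 11992.70, "tg128": 186.21},
}


def rank(results: dict, metric: str) -> list[str]:
    """Chips ordered from fastest to slowest on the chosen metric."""
    return sorted(results, key=lambda chip: results[chip][metric], reverse=True)


by_generation = rank(RESULTS, "tg128")   # chat-style workloads
by_prompt = rank(RESULTS, "pp512")       # long-prompt workloads

print("by tg128:", by_generation)
print("by pp512:", by_prompt)
```

Here the top two cards swap places between the two rankings, so the "fastest" card depends on what you plan to run.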
Summary
The llama.cpp scoreboard comes down to a handful of terms: pp512, tg128, Q4_0, FA, and the CUDA / ROCm / Vulkan backends. Once those are clear, the tables can be read directly.
Sources
- CUDA discussion #15013: https://github.com/ggml-org/llama.cpp/discussions/15013
- Apple Silicon discussion #4167: https://github.com/ggml-org/llama.cpp/discussions/4167
- ROCm discussion #15021: https://github.com/ggml-org/llama.cpp/discussions/15021
- Vulkan discussion #10879: https://github.com/ggml-org/llama.cpp/discussions/10879