llama.cpp GPU benchmark scoreboards: CUDA, ROCm, and Vulkan, with pp512 / tg128 / FA explained

Based on the scoreboards published in GitHub Discussions as of 2026-04-23, this post collects llama.cpp results for GPUs on the CUDA, ROCm, and Vulkan backends and explains what pp512, tg128, Q4_0, and FA mean.

Terminology

What is Q4_0?

Q4_0 is a 4-bit quantization format. All of the CUDA, ROCm, and Vulkan tables below were measured with the same model, Llama 2 7B, Q4_0.

What is pp512?

pp512 stands for prompt processing of 512 tokens: how fast the backend processes a 512-token input prompt.

  • pp = prompt processing
  • 512 = the input length, 512 tokens
  • t/s = tokens per second; higher is better

What is tg128?

tg128 stands for text generation of 128 tokens: how fast the backend generates 128 tokens one after another.

  • tg = text generation
  • 128 = the output length, 128 tokens
  • t/s = tokens per second; higher is better
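As a rough illustration of how the two rates combine, the sketch below estimates the time for one request that processes a 512-token prompt and then generates 128 tokens. The function name `estimate_seconds` is hypothetical, the rates are the RTX 4090 no-FA values from the CUDA table in this post, and the model assumes both phases run back to back at exactly the benchmarked rates, which real workloads only approximate.

```python
def estimate_seconds(prompt_tokens: int, gen_tokens: int,
                     pp_rate: float, tg_rate: float) -> float:
    """Estimate total time as prompt phase plus generation phase.

    pp_rate and tg_rate are in tokens per second (t/s).
    """
    if pp_rate <= 0 or tg_rate <= 0:
        raise ValueError("rates must be positive")
    return prompt_tokens / pp_rate + gen_tokens / tg_rate


# RTX 4090, CUDA, no FA (values copied from the table in this post).
pp512, tg128 = 11992.70, 186.21

prompt_s = 512 / pp512   # time spent reading the prompt
gen_s = 128 / tg128      # time spent generating the answer
total_s = estimate_seconds(512, 128, pp512, tg128)
print(f"prompt: {prompt_s:.3f} s, generation: {gen_s:.3f} s, total: {total_s:.3f} s")
```

With these numbers the generation phase takes roughly 0.69 s against roughly 0.04 s for the prompt, which is why a short-prompt request is dominated by tg128.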

What is FA?

FA stands for Flash Attention.

  • with FA = Flash Attention enabled
  • no FA = Flash Attention disabled

Enabling FA usually raises pp512 noticeably, while tg128 changes much less; the effect depends on the hardware and the backend.

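What is t/s?

t/s stands for tokens per second, the unit of every speed figure in this post. Before comparing two values, check:

  • whether they are pp512 or tg128 figures
  • whether they were measured with no FA or with FA
  • whether they come from the CUDA, ROCm, or Vulkan tables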

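Quick takeaways

  • CUDA is still the strongest llama.cpp GPU backend in these tables, with Nvidia cards posting the highest pp512 figures.
  • ROCm is the AMD path, with an Instinct accelerator at the top of its table.
  • Vulkan covers the widest range of hardware.
  • tg128 mostly tracks memory bandwidth; pp512 mostly tracks compute throughput.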

CUDA scoreboard

Llama 2 7B, Q4_0, no FA

Chip Memory pp512 t/s tg128 t/s Commit Thanks to
RTX 5090 32 GB / GDDR7 / 512 bit 14073.41 ± 115.16 290.02 ± 1.10 8cf6b42 @totaldev
RTX PRO 6000 Blackwell 96 GB / GDDR7 / 512 bit 14854.63 ± 22.73 274.20 ± 0.14 79c1160 @Tom94
H100 80 GB 80 GB / HBM3 / 5120 bit 9918.34 ± 176.97 267.81 ± 1.54 5143fa8 @Hedede
A100 80 GB 80 GB / HBM2e / 5120 bit 4849.53 ± 8.94 190.88 ± 0.33 5143fa8 @Hedede
RTX 4090 D 24 GB / GDDR6X / 384 bit 10293.86 ± 134.72 189.33 ± 0.19 79c1160 @autonomous-AI-lab
RTX 4090 24 GB / GDDR6X / 384 bit 11992.70 ± 107.99 186.21 ± 0.13 2241453 @lhl
RTX 5080 16 GB / GDDR7 / 256 bit 8297.36 ± 9.50 181.99 ± 0.42 8a4280c @Hedede
RTX 5070 Ti 16 GB / GDDR7 / 256 bit 6952.38 ± 13.73 176.85 ± 0.07 933414c @TinyServal
RTX 6000 Ada 48 GB / GDDR6 / 384 bit 9229.23 ± 101.78 176.07 ± 0.26 b8e09f0 @Hedede
RTX 3090 Ti 24 GB / GDDR6X / 384 bit 6567.49 ± 20.30 171.19 ± 3.98 9c35706 @slaren
RTX 3090 24 GB / GDDR6X / 384 bit 5174.69 ± 21.83 158.16 ± 0.21 c76b420 @m18coppola
L40 48 GB / GDDR6 / 384 bit 8870.49 ± 378.76 152.01 ± 0.28 ee09828 @Hedede
RTX 4080 SUPER 16 GB / GDDR6X / 256 bit 8125.15 ± 41.05 148.33 ± 0.20 81086cd @zacharyarnaise
RTX 4080 16 GB / GDDR6X / 256 bit 8031.64 ± 26.49 142.49 ± 0.16 20638e4 @Ristovski
RTX 3080 10 GB / GDDR6X / 320 bit 5013.86 ± 24.80 139.65 ± 0.99 9c35706 @slaren
RTX A6000 48 GB / GDDR6 / 384 bit 4913.93 ± 6.79 138.73 ± 2.75 4795c91 @Hedede
RTX 4070 Ti SUPER 16 GB / GDDR6X / 256 bit 6924.53 ± 13.87 132.26 ± 0.16 9c35706 @Ristovski
RTX PRO 4000 Blackwell 24 GB / GDDR7 / 192 bit 4992.83 ± 113.52 131.66 ± 0.20 7d77f07 @Hedede
RTX A5000 24 GB / GDDR6 / 384 bit 4028.16 ± 19.14 130.07 ± 2.74 e5155e6 @Hedede
Tesla V100 32 GB / HBM2 / 4096 bit 3042.64 ± 40.71 129.08 ± 0.05 51f5a45 @Hedede
RTX 5070 12 GB / GDDR7 / 192 bit 5184.75 ± 18.70 127.54 ± 0.46 - @Spyro000
A40 48 GB / GDDR6 / 384 bit 4609.01 ± 10.67 124.11 ± 0.17 3470a5c @Hedede
A30 24 GB / HBM2e / 3072 bit 2767.10 ± 1.88 124.81 ± 0.16 583cb83 @Hedede
Titan V 12 GB / HBM2 / 3072 bit 2617.46 ± 2.10 108.79 ± 0.05 e56abd2 @Hedede
RTX 2080 Ti 11 GB / GDDR6 / 352 bit 2890.66 ± 2.42 107.51 ± 0.21 9c35706 @ariya
Quadro RTX 6000 24 GB / GDDR6 / 384 bit 2751.18 ± 19.43 102.77 ± 0.04 b8e09f0 @Hedede
Quadro RTX 8000 48 GB / GDDR6 / 384 bit 2709.95 ± 3.35 102.68 ± 0.03 b8e09f0 @Hedede
RTX A4500 20 GB / GDDR6 / 320 bit 2827.20 ± 66.43 97.32 ± 2.80 5cdb27e @aleksyx
RTX 5060 Ti 16 GB 16 GB / GDDR7 / 128 bit 3737.25 ± 6.79 90.94 ± 0.02 89d1029 @mike-llamacpp
RTX 2070 SUPER 8 GB / GDDR6 / 256 bit 2088.34 ± 1.94 88.06 ± 0.28 bc07349 @phstudy
RTX A4000 16 GB / GDDR6 / 256 bit 2684.06 ± 15.28 83.77 ± 0.37 65349f2 @TinyServal
Titan Xp 12 GB / GDDR5X / 384 bit 1154.96 ± 1.46 76.08 ± 0.08 c4510dc @Hedede
RTX 3060 12 GB / GDDR6 / 192 bit 2137.50 ± 10.12 75.57 ± 0.07 baa9255 @QuantiusBenignus
Quadro RTX 4000 8 GB / GDDR6 / 256 bit 1536.89 ± 0.90 65.62 ± 0.62 7d77f07 @Hedede
RTX 4060 Ti 8 GB 8 GB / GDDR6 / 128 bit 3394.63 ± 7.44 63.86 ± 0.01 89d1029 @mike-llamacpp
GTX 1080 Ti 11 GB / GDDR5X / 352 bit 1084.41 ± 3.01 62.49 ± 0.06 9c35706 @ariya
RTX A4000 Ada 20 GB / GDDR6 / 160 bit 2779.77 ± 9.91 61.83 ± 0.04 a74a0d6 @sdwolfz
RTX 2060 SUPER 8 GB / GDDR6 / 256 bit 1420.24 ± 1.95 60.04 ± 0.01 5c0eb5e @ggerganov
Tesla P100 16 GB / HBM2 / 4096 bit 760.80 ± 2.92 58.35 ± 0.00 b8372ee @Hedede
DGX Spark 128 GB / LPDDR5x 3062.31 ± 11.02 57.21 ± 0.06 5acd455 @ggerganov
Tesla P40 24 GB / GDDR5 / 384 bit 1007.42 ± 1.23 54.74 ± 0.07 c76b420 @m18coppola
RTX 2000 Ada 16 GB / GDDR6 / 128 bit 1956.22 ± 7.74 50.62 ± 0.04 756cfea @DigitalRudeness
Tesla T4 16 GB / GDDR6 / 256 bit 1219.06 ± 4.18 46.38 ± 0.73 d32e03f @pt13762104
RTX 4050 Laptop 6 GB / GDDR6 / 96 bit 1725.85 ± 17.85 43.72 ± 0.41 d79d8f3 @TimCabbage
GTX 1660 6 GB / GDDR5 / 192 bit 148.91 ± 0.01 41.35 ± 0.02 9515c61 @ariya
Tesla M40 24 GB / GDDR5 / 384 bit 282.65 ± 0.15 38.04 ± 0.02 97d5117 @Hedede
GTX 1070 Ti 8 GB / GDDR5 / 256 bit 714.44 ± 2.04 37.82 ± 0.02 79c1160 @pebaryan
Jetson AGX Orin 64 GB / LPDDR5 / 256 bit 991.31 ± 1.15 33.58 ± 0.14 c1b1876 @TinyServal
Tesla P4 8 GB / GDDR5 / 256 bit 514.53 ± 3.06 33.29 ± 0.00 c76b420 @m18coppola
P106-100 6 GB / GDDR5 / 192 bit 406.94 ± 0.25 30.40 ± 0.02 5fd160b @pebaryan
GTX 1060 6 GB / GDDR5 / 192 bit 416.85 ± 1.75 27.79 ± 0.02 5fd160b @pebaryan
Quadro T1000 4 GB / GDDR5 / 128 bit 79.44 ± 0.01 27.82 ± 0.18 f6da8cb @hanabu
Quadro P2000 5 GB / GDDR5 / 160 bit 309.30 ± 0.05 23.63 ± 0.00 baa9255 @TinyServal
Quadro P1000 4 GB / GDDR5 / 128 bit 183.40 ± 0.11 13.99 ± 0.13 1e74897 @aleksyx
Tesla K80 12 GB / GDDR5 / 384 bit 133.14 ± 0.55 13.80 ± 0.02 32732f2 @pebaryan
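The rows above are flat text, so sorting or filtering them first requires splitting each row into fields. The sketch below is one way to do that; `Row` and `parse_row` are hypothetical names, and it assumes every row has the shape "label, pp512 mean and deviation, tg128 mean and deviation, trailing text". The chip name and memory description are left together in `label` because nothing in the flat text marks where one ends and the other begins.

```python
import re
from typing import NamedTuple


class Row(NamedTuple):
    label: str        # chip name plus memory description, left unsplit
    pp512: float
    pp512_sd: float
    tg128: float
    tg128_sd: float
    rest: str         # commit hash and contributor, as written


# "mean ± sd"; a few rows in the tables use "+" instead of "±".
_PAIR = r"(\d+(?:\.\d+)?)\s*[±+]\s*(\d+(?:\.\d+)?)"
_ROW = re.compile(rf"^(.*?)\s+{_PAIR}\s+{_PAIR}\s*(.*)$")


def parse_row(line: str) -> Row:
    """Split one scoreboard row into its label, the two measurements, and the rest."""
    m = _ROW.match(line.strip())
    if m is None:
        raise ValueError(f"not a benchmark row: {line!r}")
    label, pp, pp_sd, tg, tg_sd, rest = m.groups()
    return Row(label, float(pp), float(pp_sd), float(tg), float(tg_sd), rest.strip())


row = parse_row("RTX 4090 24 GB / GDDR6X / 384 bit 11992.70 ± 107.99 186.21 ± 0.13 2241453 @lhl")
print(row.label, row.pp512, row.tg128, row.rest)
```

The same pattern also matches the Vulkan rows, which have no memory column; there `label` is just the chip name.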

Llama 2 7B, Q4_0, with FA

Chip Memory pp512 t/s tg128 t/s Commit Thanks to
RTX 5090 32 GB / GDDR7 / 512 bit 14970.15 ± 381.06 300.40 ± 0.28 8cf6b42 @totaldev
RTX PRO 6000 Blackwell 96 GB / GDDR7 / 512 bit 16618.98 ± 20.66 281.11 ± 0.41 5143fa8 @Tom94
H100 80 GB 80 GB / HBM3 / 5120 bit 11263.29 ± 98.34 280.74 ± 1.17 5143fa8 @Hedede
A100 80 GB 80 GB / HBM2e / 5120 bit 5285.96 ± 6.58 200.90 ± 0.12 5143fa8 @Hedede
RTX 4090 D 24 GB / GDDR6X / 384 bit 12506.97 ± 11.51 191.57 ± 0.03 79c1160 @autonomous-AI-lab
RTX 4090 24 GB / GDDR6X / 384 bit 14770.63 ± 102.93 188.96 ± 0.05 2241453 @lhl
RTX 5080 16 GB / GDDR7 / 256 bit 9487.70 ± 21.89 184.68 ± 0.05 8a4280c @Hedede
RTX 5070 Ti 16 GB / GDDR7 / 256 bit 8419.56 ± 35.50 182.43 ± 0.09 933414c @TinyServal
RTX 6000 Ada 48 GB / GDDR6 / 384 bit 10576.85 ± 530.21 179.47 ± 0.32 b8e09f0 @Hedede
RTX 3090 Ti 24 GB / GDDR6X / 384 bit 6924.01 ± 10.76 172.26 ± 1.31 9c35706 @slaren
RTX PRO 4500 Blackwell 32 GB / GDDR7 / 256 bit 7251.66 ± 92.40 168.90 ± 0.20 becc481 @Hedede
RTX 3090 24 GB / GDDR6X / 384 bit 5560.06 ± 16.28 161.89 ± 0.18 c76b420 @m18coppola
L40 48 GB / GDDR6 / 384 bit 10097.64 ± 671.22 153.76 ± 0.12 ee09828 @Hedede
RTX 4080 SUPER 16 GB / GDDR6X / 256 bit 9439.01 ± 56.75 147.48 ± 1.41 81086cd @zacharyarnaise
RTX 4080 16 GB / GDDR6X / 256 bit 9205.93 ± 22.31 143.47 ± 0.02 20638e4 @Ristovski
RTX A6000 48 GB / GDDR6 / 384 bit 5662.39 ± 13.87 144.87 ± 0.18 4795c91 @Hedede
RTX 3080 10 GB / GDDR6X / 320 bit 5569.56 ± 14.04 139.95 ± 0.95 9c35706 @slaren
RTX PRO 4000 Blackwell 24 GB / GDDR7 / 192 bit 5674.44 ± 139.53 136.38 ± 0.13 7d77f07 @Hedede
RTX A5000 24 GB / GDDR6 / 384 bit 4552.15 ± 9.68 135.83 ± 0.11 e5155e6 @Hedede
Tesla V100 32 GB / HBM2 / 4096 bit 2973.78 ± 3.62 134.76 ± 0.02 51f5a45 @Hedede
RTX 4070 Ti SUPER 16 GB / GDDR6X / 256 bit 7612.32 ± 37.35 132.85 ± 0.31 9c35706 @Ristovski
A30 24 GB / HBM2e / 3072 bit 3068.72 ± 0.63 131.93 ± 0.18 583cb83 @Hedede
RTX 5070 12 GB / GDDR7 / 192 bit 5783.44 ± 36.95 128.21 ± 2.52 - @Spyro000
A40 48 GB / GDDR6 / 384 bit 5256.38 ± 19.39 126.24 ± 0.06 3470a5c @Hedede
Titan V 12 GB / HBM2 / 3072 bit 2481.25 ± 1.31 112.17 ± 0.01 e56abd2 @Hedede
RTX 2080 Ti 11 GB / GDDR6 / 352 bit 3107.61 ± 4.34 109.17 ± 0.07 9c35706 @ariya
Quadro RTX 6000 24 GB / GDDR6 / 384 bit 3053.96 ± 1.37 104.38 ± 0.04 b8e09f0 @Hedede
Quadro RTX 8000 48 GB / GDDR6 / 384 bit 3052.35 ± 5.64 103.63 ± 0.02 b8e09f0 @Hedede
RTX A4500 20 GB / GDDR6 / 320 bit 3453.10 ± 49.19 103.00 ± 0.25 5cdb27e @aleksyx
RTX 5060 Ti 16 GB 16 GB / GDDR7 / 128 bit 4195.53 ± 1.98 93.46 ± 0.01 89d1029 @mike-llamacpp
RTX 2070 SUPER 8 GB / GDDR6 / 256 bit 2293.29 ± 5.91 87.71 ± 0.29 bc07349 @phstudy
RTX A4000 16 GB / GDDR6 / 256 bit 2807.83 ± 52.44 85.17 ± 0.66 65349f2 @TinyServal
RTX 3060 12 GB / GDDR6 / 192 bit 2407.67 ± 3.73 76.92 ± 0.03 baa9255 @QuantiusBenignus
Titan Xp 12 GB / GDDR5X / 384 bit 1218.12 ± 1.82 73.84 ± 0.04 c4510dc @Hedede
Quadro RTX 4000 8 GB / GDDR6 / 256 bit 1662.80 ± 2.04 67.62 ± 0.67 7d77f07 @Hedede
RTX 4060 Ti 8 GB 8 GB / GDDR6 / 128 bit 3803.45 ± 70.80 64.03 ± 0.53 89d1029 @mike-llamacpp
Tesla P100 16 GB / HBM2 / 4096 bit 787.36 ± 3.27 61.99 ± 0.00 b8372ee @Hedede
GTX 1080 Ti 11 GB / GDDR5X / 352 bit 1138.14 ± 2.02 61.38 ± 0.03 9c35706 @ariya
RTX A4000 Ada 20 GB / GDDR6 / 160 bit 3171.86 ± 4.34 61.37 ± 0.01 a74a0d6 @sdwolfz
RTX 2060 SUPER 8 GB / GDDR6 / 256 bit 1563.77 ± 0.51 61.13 ± 0.05 5c0eb5e @ggerganov
DGX Spark 128 GB / LPDDR5x 3661.37 ± 38.66 56.74 ± 0.03 5acd455 @ggerganov
Tesla P40 24 GB / GDDR5 / 384 bit 1079.66 ± 0.18 53.73 ± 0.05 c76b420 @m18coppola
RTX 2000 Ada 16 GB / GDDR6 / 128 bit 2250.14 ± 5.91 50.71 ± 0.01 756cfea @DigitalRudeness
Tesla T4 16 GB / GDDR6 / 256 bit 1309.73 ± 1.02 44.03 ± 0.57 d32e03f @pt13762104
GTX 1660 6 GB / GDDR5 / 192 bit 154.45 ± 0.52 41.43 ± 0.01 9515c61 @ariya
Tesla M40 24 GB / GDDR5 / 384 bit 290.17 ± 0.11 39.98 ± 0.01 97d5117 @Hedede
GTX 1070 Ti 8 GB / GDDR5 / 256 bit 790.52 ± 2.39 37.87 ± 0.00 79c1160 @pebaryan
Jetson AGX Orin 64 GB / LPDDR5 / 256 bit 1171.96 ± 4.70 35.88 ± 0.18 c1b1876 @TinyServal
Tesla P4 8 GB / GDDR5 / 256 bit 529.53 ± 2.12 33.12 ± 0.03 c76b420 @m18coppola
P106-100 6 GB / GDDR5 / 192 bit 438.49 ± 0.38 30.64 ± 0.06 5fd160b @pebaryan
GTX 1060 6 GB / GDDR5 / 192 bit 446.19 ± 0.81 28.18 ± 0.01 5fd160b @pebaryan
Quadro T1000 4 GB / GDDR5 / 128 bit 27.46 ± 0.23 27.46 ± 0.23 f6da8cb @hanabu
Quadro P2000 5 GB / GDDR5 / 160 bit 311.55 ± 0.19 23.76 ± 0.01 baa9255 @TinyServal
Tesla K80 12 GB / GDDR5 / 384 bit 133.36 ± 0.60 14.27 ± 0.32 32732f2 @pebaryan
Quadro P1000 4 GB / GDDR5 / 128 bit 173.82 ± 0.02 13.65 ± 0.14 1e74897 @aleksyx
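To see what FA changes on a given card, the two CUDA tables can be compared row by row. The sketch below does this for three chips; `pct_change` is a hypothetical helper, and the numbers are copied from the no FA and with FA tables above.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# chip: ((pp512, tg128) with no FA, (pp512, tg128) with FA), CUDA tables above.
cuda = {
    "RTX 4090": ((11992.70, 186.21), (14770.63, 188.96)),
    "RTX 3090": ((5174.69, 158.16), (5560.06, 161.89)),
    "Tesla V100": ((3042.64, 129.08), (2973.78, 134.76)),
}

for chip, ((pp0, tg0), (pp1, tg1)) in cuda.items():
    print(f"{chip:<11} pp512 {pct_change(pp0, pp1):+6.1f}%  tg128 {pct_change(tg0, tg1):+6.1f}%")
```

On these rows the RTX 4090 gains roughly 23% in pp512 but under 2% in tg128, while the Tesla V100 loses about 2% in pp512, so the effect of FA is worth checking per card.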

Apple Silicon scoreboard

Discussion #4167 covers Apple Silicon. Besides Q4_0 it also reports F16 and Q8_0, and its columns are labelled PP and TG, both in t/s.

  • PP = prompt processing
  • TG = text generation
  • t/s = tokens per second

The M2 Ultra rows below show how its results changed across commits and with FA enabled:
Date        Chip      Commit / config  BW (GB/s)  GPU cores  F16 PP   F16 TG  Q8_0 PP  Q8_0 TG  Q4_0 PP  Q4_0 TG
2023-11-21  M2 Ultra  8e672ef          800        76         1401.85  41.02   1248.59  66.64    1238.48  94.27
2024-11-12  M2 Ultra  86ed72d + FA     800        76         1525.95  43.15   1368.18  73.11    1391.78  108.80
2025-08-02  M2 Ultra  5c0eb5e + FA     800        76         1561.35  43.24   1386.97  73.35    1412.42  109.41

The rows below compare several Apple Silicon chips:

Chip             Q4_0 PP  Q4_0 TG  Q8_0 PP  Q8_0 TG  F16 PP   F16 TG
M1 Pro 16 GPU    266.25   36.41    270.37   22.34    302.14   12.75
M2 Ultra 76 GPU  1238.48  94.27    1248.59  66.64    1401.85  41.02
M3 Max 40 GPU    690.99   65.85    749.37   43.00    794.26   25.27

ROCm / HIP scoreboard

Llama 2 7B, Q4_0, no FA

Chip Memory pp512 t/s tg128 t/s Commit Thanks to
Instinct MI300X 192 GB / HBM3 / 8192 bit 11476.40 ± 72.79 232.92 ± 0.53 ee3a9fc @yeahdongcn
RX 7900 XTX 24 GB / GDDR6 / 384 bit 3552.27 ± 101.96 167.11 ± 0.50 2f0c2db @Diablo-D3
Instinct MI210 64 GB / HBM2e / 4096 bit 2486.22 ± 9.58 124.51 ± 0.04 8160b38 @65a
Pro W7900 48 GB / GDDR6 / 384 bit 3213.17 ± 80.47 121.18 ± 0.06 8160b38 @65a
RX 7900 XT 20 GB / GDDR6 / 320 bit 3098.38 ± 24.02 116.15 ± 0.06 1e15bfd @AdamNiederer
RX 9070 16 GB / GDDR6 / 256 bit 2381.77 ± 3.68 114.48 ± 0.60 d0660f2 @andj1210
Instinct MI100 32 GB / HBM2 / 4096 bit 2732.83 ± 1.98 110.48 ± 0.14 9c35706 @firefox42
RX 9070 XT 16 GB / GDDR6 / 256 bit 5055.19 ± 109.58 101.27 ± 0.27 583cb83 @Hadrianneue
RX 7800 XT 16 GB / GDDR6 / 256 bit 2151.81 ± 17.94 100.94 ± 0.10 00131d6 @olegshulyakov
Instinct MI50 32 GB / HBM2 / 4096 bit 1057.24 ± 0.53 98.95 ± 0.25 97d5117 @wtarreau
RX 7900 GRE 16 GB / GDDR6 / 256 bit 1456.98 ± 12.39 96.07 ± 0.10 6fa3b55 @MihaiBojescu
AI PRO R9700 32 GB / GDDR6 / 256 bit 4443.54 ± 339.25 93.84 ± 0.26 bd4ef13 @gogich77
Instinct MI60 32 GB / HBM2 / 4096 bit 1289.11 ± 0.62 91.46 ± 0.13 504af20 @Said-Akbar
RX 6900 XT 16 GB / GDDR6 / 256 bit 1889.84 ± 31.21 88.49 ± 0.00 a972fae @notgood
Pro VII 16 GB / HBM2 / 4096 bit 1064.99 ± 1.18 87.45 ± 0.04 2739a71 @8XXD8
RX 6800 XT 16 GB / GDDR6 / 256 bit 1447.07 ± 1.36 83.92 ± 0.03 79c1160 @MrLavender
Pro V620 32 GB / GDDR6 / 256 bit 1803.65 ± 2.54 74.66 ± 0.01 5c0eb5e @samteezy
RX 9060 XT 16 GB / GDDR6 / 256 bit 1419.67 ± 3.64 67.58 ± 0.24 a0e13dc @lcy0321
RX 5700 XT 8 GB / GDDR6 / 256 bit 354.17 ± 0.18 67.55 ± 0.04 c05e8c9 @daniandtheweb
Instinct MI25 16 GB / HBM2 / 2048 bit 409.83 ± 0.23 63.94 ± 0.06 2739a71 @8XXD8
AI Max+ 395 128 GB / LPDDR5 911.36 ± 1.79 50.01 ± 0.07 e60f241 @firefox42
RX 7600 XT 16 GB / GDDR6 / 128 bit 1099.64 ± 2.05 48.58 ± 0.06 9c35706 @wbruna
RX Vega 64 8 GB / HBM2 / 2048 bit 240.68 ± 0.09 48.46 ± 0.09 ec428b0 @davispuh
Radeon 8060S System Shared / DDR5 351.36 ± 0.67 47.97 ± 0.33 1d0125b @hspak
Radeon 880M System Shared / DDR5 163.25 ± 13.86 12.97 ± 1.63 c55d53a @Hedede

Llama 2 7B, Q4_0, with FA

Chip Memory pp512 t/s tg128 t/s Commit Thanks to
Instinct MI300X 192 GB / HBM3 / 8192 bit 11945.97 ± 54.29 218.53 ± 0.09 ee3a9fc @yeahdongcn
RX 7900 XTX 24 GB / GDDR6 / 384 bit 3874.25 ± 11.92 170.12 ± 0.56 2f0c2db @Diablo-D3
Pro W7900 48 GB / GDDR6 / 384 bit 3472.86 ± 52.86 127.43 ± 0.12 8160b38 @65a
Instinct MI210 64 GB / HBM2e / 4096 bit 2571.82 ± 2.89 130.18 ± 0.06 8160b38 @65a
RX 9070 16 GB / GDDR6 / 256 bit 2452.68 ± 1.33 115.32 ± 0.52 d0660f2 @andj1210
RX 7900 XT 20 GB / GDDR6 / 320 bit 3261.75 ± 9.09 112.30 ± 0.06 1e15bfd @AdamNiederer
Instinct MI50 32 GB / HBM2 / 4096 bit 1129.43 ± 0.15 105.82 ± 0.07 97d5117 @wtarreau
Instinct MI100 32 GB / HBM2 / 4096 bit 2755.00 ± 3.68 104.71 ± 0.10 9c35706 @firefox42
AI PRO R9700 32 GB / GDDR6 / 256 bit 4773.07 ± 49.30 97.98 ± 0.13 bd4ef13 @gogich77
RX 7900 GRE 16 GB / GDDR6 / 256 bit 1598.79 ± 11.48 97.53 ± 0.06 6fa3b55 @MihaiBojescu
RX 9070 XT 16 GB / GDDR6 / 256 bit 4903.51 ± 96.36 97.28 ± 0.13 583cb83 @Hadrianneue
RX 7800 XT 16 GB / GDDR6 / 256 bit 2304.63 ± 2.85 95.99 ± 0.21 00131d6 @olegshulyakov
RX 6900 XT 16 GB / GDDR6 / 256 bit 1948.31 ± 13.51 85.04 ± 0.02 a972fae @notgood
Pro V620 32 GB / GDDR6 / 256 bit 1256.86 ± 0.55 70.83 ± 0.02 5c0eb5e @samteezy
RX 9060 XT 16 GB / GDDR6 / 256 bit 1479.27 ± 0.71 65.42 ± 0.19 a0e13dc @lcy0321
RX 5700 XT 8 GB / GDDR6 / 256 bit 314.17 ± 0.29 62.02 ± 0.05 c05e8c9 @daniandtheweb
AI Max+ 395 128 GB / LPDDR5 1003.53 ± 2.91 49.87 ± 0.02 e60f241 @firefox42
Radeon 8060S System Shared / DDR5 366.08 ± 1.44 48.97 ± 0.15 1d0125b @hspak
RX 7600 XT 16 GB / GDDR6 / 128 bit 1199.16 ± 1.07 47.65 ± 0.06 9c35706 @wbruna
RX Vega 64 8 GB / HBM2 / 2048 bit 153.17 ± 0.72 42.46 ± 0.40 ec428b0 @davispuh
Radeon 880M System Shared / DDR5 213.31 ± 14.05 16.16 ± 1.41 c55d53a @Hedede

Vulkan scoreboard

Llama 2 7B, Q4_0, no FA

Chip pp512 t/s tg128 t/s Commit Comments
Nvidia RTX 5090 10381.64 ± 508.84 263.63 ± 0.91 ca71fb9 coopmat2
AMD Radeon RX 7900 XTX 3531.93 ± 31.74 191.28 ± 0.20 2f0c2db
Nvidia RTX 4090 9452.03 ± 187.70 187.97 ± 0.21 4ae88d0 coopmat2
Nvidia RTX 5080 7444.99 ± 20.11 185.10 ± 0.54 f6b533d coopmat2
Nvidia A100 6389.86 ± 4.83 160.78 ± 0.16 2257758 coopmat2
Nvidia RTX 3090 4298.97 ± 10.59 160.13 ± 0.25 4ae88d0 coopmat2
Nvidia RTX 4080 Super 7101.18 ± 269.79 147.13 ± 5.64 81086cd coopmat2
Nvidia RTX 3080 4287.11 ± 55.50 139.15 ± 0.05 7c7d6ce coopmat2
Nvidia RTX A5000 3641.55 ± 9.05 139.89 ± 0.69 4ae88d0 coopmat2
AMD Radeon RX 9070 XT 5036.04 ± 88.16 137.11 ± 0.02 e9fd8dc
Nvidia RTX 5070 Ti 6213.63 ± 27.72 135.63 ± 0.18 d13d0f6 coopmat2
AMD Radeon AI Pro R9700 4036.04 ± 34.58 130.19 ± 0.39 3191462
Nvidia Tesla V100 1391.39 ± 1.19 129.58 ± 0.58 7d77f07
Nvidia RTX 4070 Ti Super 6099.18 ± 154.30 129.45 ± 0.18 4ae88d0 coopmat2
AMD Radeon RX 7900 XT 2941.58 ± 17.17 123.18 ± 0.40 71e74a3
AMD Radeon RX 9070 3164.10 ± 66.84 119.71 ± 3.40 21c17b5
AMD Radeon RX 7800 XT 2017.33 ± 19.30 118.27 ± 0.27 4fdbc1e
AMD Radeon RX 7900 GRE 2336.31 ± 7.52 116.11 ± 0.26 4b2a477
Apple M3 Ultra 1116.83 ± 0.55 115.54 ± 0.78 2d451c8 MoltenVK
Intel Arc Pro B70 3379.00 ± 47.92 112.02 ± 1.08 b863507
Nvidia Titan V 984.36 ± 4.13 108.86 ± 0.28 e56abd2
AMD Radeon Pro VII 1078.54 ± 0.86 107.82 ± 0.14 N/A
AMD Radeon RX 6900 XT 1837.21 ± 25.44 104.60 ± 0.30 a972fae
Intel Arc Pro A60 2261.11 ± 9.53 104.25 ± 0.07 97d5117
AMD Radeon RX 6800 XT 1752.92 ± 1.71 100.32 ± 0.97 N/A
AMD Radeon VII 1059.14 ± 0.56 101.19 ± 0.53 77d6ae4
Nvidia RTX 2080 Ti 1888.24 ± 9.20 97.58 ± 6.60 N/A
AMD Radeon RX 6800 1698.69 ± 0.80 95.61 ± 0.19 4b385bf
AMD Radeon Pro W6800X Duo 687.71 ± 4.33 94.82 ± 0.12 N/A
Nvidia RTX 5060 Ti 3460.92 ± 7.16 93.51 ± 0.15 89f10ba coopmat2
Nvidia RTX 4070 3179.37 ± 46.16 92.29 ± 0.28 9a48399
AMD Radeon Pro W6800X 510.80 ± 0.13 86.47 ± 0.46 13b4548 MoltenVK
AMD Radeon RX 6700 XT 1051.20 ± 0.98 83.88 ± 0.08 6d75883
AMD Radeon RX 6750 XT 1040.58 ± 0.35 81.98 ± 0.03 228f34c
AMD Radeon Pro V620 1595.32 ± 1.59 81.78 ± 0.06 03d4698
Nvidia RTX 3070 2113.02 ± 7.38 78.71 ± 0.13 1b8fb81
AMD Radeon Instinct MI60 369.26 ± 2.48 78.16 ± 1.40 504af20
Nvidia RTX 3060 1815.70 ± 5.85 75.94 ± 0.80 92c0b38 coopmat2
Apple M4 Max 724.77 ± 20.93 75.02 ± 0.14 1ece0cb6
Nvidia Tesla T10 1692.70 ± 2.05 75.01 ± 0.21 7f76692 coopmat2
Nvidia RTX A4000 2248.14 ± 7.59 73.74 ± 0.08 f5245b5 coopmat2
AMD Radeon RX 5700 XT 529.69 ± 0.26 70.73 ± 0.04 4fdbc1e
AMD Radeon RX 9060 XT 2141.67 ± 6.87 70.54 ± 0.74 ed52f36
Intel Arc B580 620.94 ± 15.33 70.14 ± 0.28 7f76692
AMD Radeon Pro V540 583.88 ± 6.56 69.64 ± 0.24 9da3dcd
AMD Radeon Pro W5700 449.85 ± 0.46 68.55 ± 0.15 23bc779
Intel Arc Pro B60 522.36 ± 3.60 68.55 ± 0.01 516a4ca
Nvidia GTX 1080 Ti 540.69 ± 0.71 64.99 ± 0.08 360d653
Nvidia RTX 2070 Super 1199.13 ± 7.70 64.64 ± 0.20 b7552cf
Nvidia RTX 3070 Mobile 1689.40 ± 19.57 63.64 ± 0.39 ceff6bb coopmat2
Nvidia Tesla P100 678.14 ± 1.40 63.16 ± 0.06 eec1e33
AMD BC-250 370.66 ± 0.04 62.32 ± 0.32 5886f4f
AMD Radeon RX 6650 XT 1029.52 ± 1.21 62.14 ± 0.02 dbb852b
Nvidia RTX 4060 Mobile 2135.66 ± 23.18 59.53 ± 0.03 a5c07dc coopmat2
Nvidia Tesla P40 488.06 ± 0.27 59.36 ± 0.16 N/A
Nvidia GTX 1660 Ti Mobile 511.67 ± 2.85 56.60 ± 0.07 b43556e
AMD Radeon Instinct MI25 439.42 ± 0.34 54.69 ± 0.03 2739a71
AMD Radeon RX 6600 XT 574.65 ± 0.86 53.92 ± 0.11 091592d
AMD Ryzen AI Max+ 395 1288.96 ± 6.49 53.59 ± 0.38 7f76692
AMD Radeon RX 7600 XT 840.85 ± 3.02 53.02 ± 0.01 01d8eaa
Intel Arc A770 1073.85 ± 29.68 52.56 ± 0.11 a69d54f
Nvidia GB10 2737.79 ± 19.56 52.28 ± 0.03 b9da444 coopmat2
AMD FirePro S9300 x2 247.26 ± 0.43 51.86 ± 0.11 eec1e33 Split across two GPUs
AMD Radeon RX 6600 761.89 ± 1.76 50.63 ± 0.02 b1c70e2
AMD Radeon RX Vega 56 439.87 ± 0.61 50.23 ± 0.14 92c0b38
Intel Arc B570 913.95 ± 0.90 49.64 ± 0.03 7f76692
Nvidia RTX 3060 Mobile 1059.76 ± 3.54 49.03 ± 0.13 dbb3a47
AMD Radeon RX 6800M 861.99 ± 7.67 48.71 ± 0.71 8e6f8bc
AMD Radeon RX 6600M 605.59 ± 0.65 48.21 ± 0.07 fe5b78c
Intel Arc A770M 875.92 ± 2.16 47.69 ± 0.16 eeee367
Nvidia P104-100 311.90 ± 0.22 46.18 ± 0.05 eec1e33
AMD Radeon RX Vega 64 356.08 ± 0.09 45.73 ± 0.18 ec428b0
Nvidia RTX A2000 1245.19 ± 8.76 45.52 ± 0.54 b1afcab coopmat2
AMD Radeon RX 7600M XT 459.39 ± 2.34 45.28 ± 0.10 b9ab0a4 eGPU
AMD Radeon Pro V340 375.41 ± 0.24 45.16 ± 0.06 9da3dcd Split across two GPUs
Nvidia GTX 1070 Ti 297.50 ± 0.54 42.86 ± 1.20 860a9e4 eGPU
Intel Arc A750 1075.94 ± 13.89 42.66 ± 0.18 c1b1876
Nvidia RTX 4050 Mobile 1154.28 ± 15.76 41.89 ± 0.10 d79d8f3
Nvidia GTX 1070 321.57 ± 0.93 41.48 ± 0.09 eec1e33
Intel Arc Pro B50 193.50 ± 0.24 39.99 ± 0.10 7b43f55
Nvidia Tesla M40 92.48 ± 0.02 39.35 ± 1.22 b8372ee
AMD Radeon RX 580 258.03 ± 0.71 39.32 ± 0.03 de4c07f
AMD Radeon RX 470 218.07 ± 0.56 38.63 ± 0.21 e288693
AMD Radeon Pro W5500 315.39 ± 3.76 36.82 ± 0.38 860a9e4
AMD Radeon RX 480 248.66 ± 0.28 34.71 ± 0.14 3b15924
Apple M2 Ultra 205.98 ± 0.02 34.34 ± 0.12 dbb852b Asahi Linux
Nvidia GTX 980 186.24 ± 0.09 33.90 ± 0.51 860a9e4
Nvidia P106-100 183.78 ± 0.26 29.77 ± 0.04 23bc779
AMD FirePro W8100 155.22 ± 0.17 29.52 ± 0.05 4536363
Nvidia Tesla P4 265.54 ± 0.21 28.03 ± 0.14 24d2ee0
AMD Radeon RX 6500 XT 255.25 ± 0.35 27.81 ± 0.10 g9fdfcd
Apple M3 263.70 ± 0.02 26.39 ± 0.14 b9ab0a4 MoltenVK
AMD FirePro S10000 94.78 ± 0.02 25.32 ± 0.02 914a82d Split across two GPUs
Nvidia Quadro P2000 169.55 ± 0.17 23.05 ± 0.03 63f8fe0
Intel Core Ultra 200 Series 544.95 ± 4.15 22.49 ± 0.09 cea560f
AMD Ryzen AI 9 300 Series 479.07 ± 0.41 22.41 ± 0.18 N/A
AMD Ryzen 6000 Series 240.89 ± 0.52 21.26 ± 0.08 ee09828
Apple M2 Pro 62.70 ± 0.03 20.95 ± 0.11 1fe0029 Asahi Linux
Nvidia GTX 1050 Ti 136.42 ± 0.67 20.96 ± 0.21 2f0c2db
AMD Ryzen 8000 Series 266.19 ± 1.36 20.53 ± 0.08 a5c07dc
AMD Ryzen 7000 Series 281.62 ± 1.56 19.91 ± 0.07 ebce03e
AMD Ryzen Z1 Extreme 199.36 ± 7.02 18.77 ± 0.02 53ff6b9
AMD FirePro D700 69.95 ± 0.04 16.62 ± 0.01 d3bd719 MoltenVK, running in FP16 mode on FP32 only chip
AMD Radeon Pro WX 4100 78.79 ± 0.10 16.05 ± 0.07 860a9e4
Apple M2 50.79 ± 0.16 13.50 ± 0.02 8c0d6bb Asahi Linux
Apple M1 38.29 ± 0.00 12.47 ± 0.03 2370665 Asahi Linux
AMD Ryzen 5000 Series 90.55 ± 0.08 10.98 ± 0.07 d84635b
Intel Core 1100 Series 187.20 ± 1.78 10.39 ± 0.04 abb9f3c
AMD Radeon RX 550 52.66 ± 0.49 10.20 ± 0.01 N/A
AMD Ryzen 4000 Series 103.87 ± 0.02 9.63 ± 0.01 4b385bf
Nvidia Tesla K80 89.46 ± 0.10 9.39 ± 0.06 5d46bab Running on single GPU
Nvidia Tesla K40 64.37 ± 0.09 9.30 ± 0.19 eec1e33
MediaTek Dimensity 9400 38.36 ± 15.15 8.92 ± 0.06 b9ab0a4 GPU supports coopmat but pp512 is faster with it turned off
Intel Core Ultra 100 Series 185.51 ± 0.22 8.21 ± 0.07 1d72c84
AMD Ryzen 3000 Series 48.63 ± 0.10 8.49 ± 0.01 1fe0029
CIX CD8180 2.80 ± 0.01 5.51 ± 0.00 4dca015
Intel Core 1000 Series 25.58 ± 0.00 4.25 ± 0.18 N/A
Intel Core 8000 Series 25.43 ± 0.17 3.35 ± 0.03 c4df49a
Intel N150 28.84 ± 0.02 2.93 ± 0.00 4f63cd7

Llama 2 7B, Q4_0, with FA

Chip pp512 t/s tg128 t/s Commit Comments
Nvidia RTX 5090 11796.38 ± 601.36 273.68 ± 0.52 ca71fb9 coopmat2
AMD Radeon RX 7900 XTX 3332.90 ± 11.47 195.30 ± 0.23 2f0c2db
Nvidia RTX 5080 8054.59 ± 35.68 192.17 ± 0.21 f6b533d coopmat2
Nvidia RTX 4090 10830.41 ± 36.25 190.10 ± 0.31 4ae88d0 coopmat2
Nvidia A100 7064.40 ± 1.63 170.56 ± 0.02 2257758 coopmat2
Nvidia RTX 3090 4732.33 ± 4.80 162.28 ± 0.21 4ae88d0 coopmat2
Nvidia RTX 4080 Super 8007.37 ± 46.03 150.20 ± 0.26 81086cd coopmat2
Nvidia RTX 3080 4913.83 ± 21.52 145.74 ± 0.16 7c7d6ce coopmat2
Nvidia Tesla V100 1411.25 ± 2.12 142.13 ± 0.03 7d77f07
Nvidia RTX A5000 4071.22 ± 13.13 140.43 ± 0.22 4ae88d0 coopmat2
AMD Radeon RX 9070 XT 4911.74 ± 28.52 138.20 ± 0.18 e9fd8dc
Nvidia RTX 5070 Ti 6764.53 ± 11.95 135.65 ± 0.02 d13d0f6 coopmat2
AMD Radeon AI Pro R9700 4333.83 ± 29.36 130.90 ± 0.12 3191462
AMD Radeon RX 7900 XT 3043.93 ± 10.42 124.20 ± 0.09 71e74a3
AMD Radeon RX 7800 XT 2094.64 ± 14.38 119.63 ± 0.13 4fdbc1e
AMD Radeon RX 9070 3277.24 ± 18.17 119.55 ± 0.06 21c17b5
AMD Radeon RX 7900 GRE 2402.07 ± 22.50 116.77 ± 0.08 4b2a477
Apple M3 Ultra 1115.55 ± 0.75 115.99 ± 0.12 2d451c8 MoltenVK
Intel Arc Pro B70 3314.53 ± 17.95 111.63 ± 0.05 b863507
Nvidia Titan V 792.74 ± 4.30 109.21 ± 0.72 e56abd2
AMD Radeon Pro VII 783.94 ± 0.77 108.45 ± 0.48 N/A
AMD Radeon RX 6900 XT 1761.93 ± 4.75 106.15 ± 0.04 a972fae
Nvidia RTX 2080 Ti 1936.25 ± 32.08 100.99 ± 0.24 N/A
AMD Radeon RX 6800 XT 1704.79 ± 0.71 100.50 ± 0.06 N/A
AMD Radeon Pro W6800X Duo 795.28 ± 0.72 100.08 ± 0.02 N/A
Nvidia RTX 5060 Ti 3912.65 ± 5.86 97.01 ± 0.14 89f10ba coopmat2
AMD Radeon RX 6800 1749.46 ± 3.36 96.65 ± 0.48 4b385bf
Nvidia RTX 4070 4293.57 ± 27.70 91.49 ± 0.89 9a48399 coopmat2
AMD Radeon RX 6750 XT 997.05 ± 0.45 82.29 ± 0.06 228f34c
AMD Radeon RX 6700 XT 1010.90 ± 12.89 81.86 ± 0.19 6d75883
Nvidia RTX 3060 2012.88 ± 10.12 80.59 ± 0.02 92c0b38 coopmat2
AMD Radeon Pro V620 1556.31 ± 2.82 79.24 ± 0.09 03d4698
Nvidia RTX A4000 2482.74 ± 26.05 76.07 ± 0.08 f5245b5 coopmat2
Nvidia Tesla T10 1840.14 ± 1.22 76.05 ± 0.13 7f76692 coopmat2
AMD Radeon RX 5700 XT 538.31 ± 0.35 74.43 ± 0.03 4fdbc1e
Intel Arc B580 419.49 ± 3.37 72.00 ± 0.24 7f76692
Apple M4 Max 557.46 ± 26.87 71.79 ± 4.16 1ece0cb6
AMD Radeon Pro W5700 446.98 ± 0.39 71.30 ± 0.24 23bc779
Intel Arc Pro B60 274.76 ± 0.27 70.54 ± 0.03 516a4ca
AMD Radeon RX 9060 XT 1915.41 ± 7.90 70.52 ± 0.16 ed52f36
Nvidia Tesla P100 685.51 ± 0.88 66.48 ± 0.02 eec1e33
AMD Radeon RX 6650 XT 1088.90 ± 0.40 64.53 ± 0.75 dbb852b
Nvidia GTX 1080 Ti 529.96 ± 0.38 64.63 ± 0.10 360d653
AMD BC-250 356.87 ± 1.24 63.14 ± 0.09 5886f4f
Nvidia RTX 3070 Mobile 1832.07 ± 57.14 62.92 ± 0.37 ceff6bb coopmat2
Nvidia RTX 4060 Mobile 2358.03 ± 12.17 60.01 ± 0.08 a5c07dc coopmat2
Nvidia Tesla P40 484.37 ± 0.27 59.22 ± 0.15 N/A
Nvidia GTX 1660 Ti Mobile 514.34 ± 0.88 57.30 ± 0.42 b43556e
AMD Radeon RX 7600 XT 1024.38 ± 7.56 56.11 ± 0.02 01d8eaa
AMD FirePro S9300 x2 243.33 ± 0.22 55.64 ± 0.06 eec1e33 Split across two GPUs
Nvidia GB10 3279.89 ± 26.78 53.64 ± 0.05 b9da444 coopmat2
AMD Radeon RX 6600 808.76 ± 0.15 53.24 ± 0.03 b1c70e2
Intel Arc A770 1119.68 ± 30.25 53.07 ± 0.09 a69d54f
AMD Ryzen AI Max+ 395 1357.07 ± 10.94 53.00 ± 0.13 7f76692
AMD Radeon RX Vega 56 428.54 ± 0.50 52.66 ± 0.03 92c0b38
Intel Arc B570 288.51 ± 0.09 50.49 ± 0.05 7f76692
Nvidia P104-100 325.30 ± 0.25 48.64 ± 0.04 eec1e33
AMD Radeon Pro V340 360.23 ± 0.74 47.54 ± 0.06 9da3dcd Split across two GPUs
AMD Radeon RX 6800M 784.16 ± 2.76 49.06 ± 0.34 8e6f8bc
AMD Radeon RX Vega 64 320.12 ± 0.22 47.06 ± 0.01 ec428b0
Nvidia RTX A2000 1361.85 ± 3.26 45.69 ± 0.20 b1afcab coopmat2
Intel Arc A770M 384.74 ± 0.78 45.68 ± 0.06 eeee367
Intel Arc A750 303.37 ± 1.44 43.96 ± 0.03 c1b1876
Nvidia GTX 1070 Ti 292.85 ± 0.23 43.42 ± 0.34 860a9e4 eGPU
Nvidia GTX 1070 330.84 ± 1.02 43.33 ± 0.06 360d653
Nvidia Tesla M40 93.35 ± 0.01 41.68 ± 0.01 b8372ee
Intel Arc Pro B50 132.48 ± 0.04 41.02 ± 0.04 7b43f55
AMD Radeon RX 470 197.26 ± 0.27 37.28 ± 0.11 3769fe6
AMD Radeon RX 480 194.52 ± 0.61 37.23 ± 0.09 0bcb40b
Apple M2 Ultra 198.83 ± 0.85 198.83 ± 0.85 dbb852b Asahi Linux
Nvidia GTX 980 180.97 ± 0.74 34.16 ± 0.10 860a9e4
Nvidia P106-100 183.40 ± 0.34 30.79 ± 0.32 23bc779
AMD FirePro W8100 140.52 ± 0.34 29.28 ± 0.14 4536363
Nvidia Tesla P4 287.14 ± 0.29 28.37 ± 0.24 24d2ee0
Nvidia Quadro P2000 181.71 ± 0.12 23.77 ± 0.02 63f8fe0
Intel Core Ultra 200 Series 536.48 ± 1.27 23.05 ± 0.04 cea560f
AMD Ryzen AI 9 300 Series 532.59 ± 3.55 22.31 ± 0.06 N/A
AMD Ryzen 6000 Series 277.91 ± 0.37 21.15 ± 0.09 ee09828
Apple M2 Pro 58.86 ± 0.02 20.97 ± 0.03 1fe0029 Asahi Linux
AMD Ryzen 8000 Series 297.39 ± 1.22 20.59 ± 0.38 a5c07dc
AMD Ryzen 7000 Series 312.85 ± 2.51 20.09 ± 0.35 835b2b9
Nvidia GTX 1050 Ti 127.54 ± 1.03 20.08 ± 0.17 2f0c2db
AMD Radeon Pro WX 4100 75.59 ± 0.19 16.56 ± 0.04 860a9e4
Apple M1 35.93 ± 0.00 12.85 ± 0.02 2370665 Asahi Linux
Apple M2 46.81 ± 0.08 12.25 ± 2.30 8c0d6bb Asahi Linux
AMD Ryzen 5000 Series 79.06 ± 0.01 10.75 ± 0.00 5d195f1
Intel Core 1100 Series 174.77 ± 4.47 10.58 ± 0.03 abb9f3c
Nvidia Tesla K40 64.37 ± 0.02 9.92 ± 0.06 eec1e33
AMD Ryzen 4000 Series 113.32 ± 0.01 9.87 ± 0.01 4b385bf
Nvidia Tesla K80 88.26 ± 0.19 9.49 ± 0.01 5d46bab Running on single GPU
AMD Ryzen 5 3000 Series 47.41 ± 0.14 8.47 ± 0.01 1fe0029
Intel Core Ultra 100 Series 77.66 ± 2.75 7.75 ± 0.05 2e89f76
Intel Core 8000 Series 25.55 ± 0.04 3.35 ± 0.02 c4df49a
Intel N150 25.59 ± 0.00 2.91 ± 0.00 4f63cd7
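Some chips appear in more than one scoreboard, which makes it possible to compare backends on the same hardware. The sketch below picks the faster backend per metric for two such chips; `best_backend` is a hypothetical helper, and the values are the no FA rows from the CUDA, ROCm, and Vulkan tables in this post. Those rows were measured at different commits by different contributors, so a gap of a percent or two should not be read as a real difference.

```python
# chip -> backend -> (pp512 t/s, tg128 t/s), no FA, copied from the tables above.
results = {
    "RTX 4090": {"CUDA": (11992.70, 186.21), "Vulkan": (9452.03, 187.97)},
    "RX 7900 XTX": {"ROCm": (3552.27, 167.11), "Vulkan": (3531.93, 191.28)},
}

_METRIC_INDEX = {"pp512": 0, "tg128": 1}


def best_backend(per_backend: dict, metric: str) -> str:
    """Return the backend with the highest value for the given metric."""
    idx = _METRIC_INDEX[metric]
    return max(per_backend, key=lambda backend: per_backend[backend][idx])


for chip, per_backend in results.items():
    print(f"{chip}: pp512 -> {best_backend(per_backend, 'pp512')}, "
          f"tg128 -> {best_backend(per_backend, 'tg128')}")
```

For the RX 7900 XTX the pp512 figures are nearly tied while Vulkan is clearly ahead on tg128; for the RTX 4090 CUDA is clearly ahead on pp512 while the tg128 figures are nearly tied.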

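How to read the tables

  1. tg128 or pp512? For interactive chat, where you wait on generated text, look at tg128 first; for long prompts, pp512 matters more.
  2. Which backend? On Nvidia, prefer CUDA; on AMD, choose between ROCm and Vulkan; on other hardware, use Vulkan.
  3. Should you enable FA? In most cases FA improves pp512, while tg128 changes little; check the rows for your own hardware.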
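Which metric matters more depends on the shape of the workload, and the ranking of two cards can flip with it. The sketch below compares two cards from the CUDA no FA table on two hypothetical workloads; `total_seconds` and `fastest` are illustrative names. It assumes the pp512 rate still holds for a 32768-token prompt, which is an extrapolation: the benchmark only measures a 512-token prompt, and throughput generally drops at longer contexts.

```python
def total_seconds(prompt_tokens, gen_tokens, pp_rate, tg_rate):
    """Prompt phase plus generation phase, both at the benchmarked t/s."""
    return prompt_tokens / pp_rate + gen_tokens / tg_rate


# (pp512 t/s, tg128 t/s), CUDA, no FA, copied from the table in this post.
gpus = {
    "RTX 5070 Ti": (6952.38, 176.85),
    "RTX 4080": (8031.64, 142.49),
}

# Hypothetical workloads: (prompt tokens, generated tokens).
workloads = {
    "chat": (512, 128),
    "long prompt, short answer": (32768, 16),
}


def fastest(prompt_tokens, gen_tokens):
    """Name of the card with the lowest estimated total time."""
    return min(gpus, key=lambda g: total_seconds(prompt_tokens, gen_tokens, *gpus[g]))


for name, (p, n) in workloads.items():
    print(f"{name}: {fastest(p, n)}")
```

Under these assumptions the card with the higher tg128 wins the chat workload and the card with the higher pp512 wins the long-prompt one.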

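One-line summary

To read a llama.cpp benchmark, first pin down pp512 versus tg128, the Q4_0 quantization, the FA setting, and the backend (CUDA / ROCm / Vulkan); only then compare the numbers.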

