How to Get GGUF Models from Hugging Face with llama.cpp

A short guide to downloading GGUF models with llama.cpp from Hugging Face, switching compatible endpoints, and converting non-GGUF formats.

llama.cpp can work directly with GGUF models hosted on Hugging Face, so you do not always need to download model files manually first.

If a model repository already provides GGUF files, you can use the -hf argument in the CLI, for example:

llama-cli -hf ggml-org/gemma-3-1b-it-GGUF

By default, this downloads from Hugging Face.
If you use another service that exposes a Hugging Face compatible API, you can switch the download endpoint with the MODEL_ENDPOINT environment variable.
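As a minimal sketch, switching the endpoint looks like the following; the base URL here is a placeholder, not a real mirror, so substitute the endpoint of the service you actually use:

```shell
# Point -hf downloads at a Hugging Face compatible endpoint.
# The URL below is a placeholder; replace it with your service's base URL.
export MODEL_ENDPOINT=https://example-mirror.invalid/
llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
```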

One important detail is that llama.cpp only works directly with the GGUF format.
If your model is in another format, you need to convert it first with the convert_*.py scripts provided in the repository.
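For instance, converting a Hugging Face checkpoint with convert_hf_to_gguf.py might look like this; the paths ./my-model and my-model-f16.gguf are placeholders, and a local clone of the llama.cpp repository is assumed:

```shell
# Install the conversion script's Python dependencies,
# then convert a locally downloaded Hugging Face checkpoint to GGUF.
# ./my-model and my-model-f16.gguf are placeholder paths.
pip install -r llama.cpp/requirements.txt
python llama.cpp/convert_hf_to_gguf.py ./my-model \
    --outfile my-model-f16.gguf \
    --outtype f16
```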

Hugging Face also offers several online tools related to llama.cpp, including:

  • converting models to GGUF
  • quantizing weights to reduce size
  • converting LoRA adapters
  • editing GGUF metadata in the browser
  • hosting llama.cpp inference endpoints
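The quantization step from the list above can also be done locally with the llama-quantize tool that ships with llama.cpp. A small sketch, assuming you already have an f16 GGUF file (the file names are placeholders):

```shell
# Shrink an f16 GGUF to 4-bit weights.
# Q4_K_M is a commonly used size/quality trade-off; file names are placeholders.
llama-quantize my-model-f16.gguf my-model-q4_k_m.gguf Q4_K_M
```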

If you only want the practical takeaway, start with repositories that already provide GGUF, then use llama-cli -hf <user>/<model>. In most cases, that is the simplest path.
