llama.cpp can work directly with GGUF models hosted on Hugging Face, so you do not always need to download model files manually first.
If a model repository already provides GGUF files, you can use the -hf argument in the CLI, for example:

```shell
llama-cli -hf <user>/<model>
```
By default, this downloads from Hugging Face.
If you use another service that exposes a Hugging Face compatible API, you can switch the download endpoint with the MODEL_ENDPOINT environment variable.
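As a sketch, the override is just an environment variable set for the command; the endpoint URL below is a placeholder for whatever Hugging Face compatible service you use:

```shell
# MODEL_ENDPOINT changes where llama.cpp downloads models from.
# The URL here is a placeholder; substitute your mirror's base URL.
MODEL_ENDPOINT=https://example-mirror.example/ llama-cli -hf <user>/<model>
```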
One important detail is that llama.cpp only works directly with the GGUF format.
If your model is in another format, you need to convert it first with the convert_*.py scripts provided in the repository.
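A minimal conversion sketch, assuming a Hugging Face format model directory on disk (the paths and output name are placeholders; convert_hf_to_gguf.py is one of the convert_*.py scripts in the repository):

```shell
# Convert a Hugging Face model directory to a single GGUF file.
# /path/to/hf-model and model-f16.gguf are placeholders.
python convert_hf_to_gguf.py /path/to/hf-model \
    --outfile model-f16.gguf \
    --outtype f16
```

The resulting .gguf file can then be passed to llama-cli or further quantized to reduce its size.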
Hugging Face also offers several online tools related to llama.cpp, including:
- converting models to GGUF
- quantizing weights to reduce size
- converting LoRA adapters
- editing GGUF metadata in the browser
- hosting llama.cpp inference endpoints
If you only want the practical takeaway, start with repositories that already provide GGUF, then use llama-cli -hf <user>/<model>. In most cases, that is the simplest path.
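To make that concrete, a full invocation might look like the following; the repository name is an assumed example of a repo that ships GGUF files, and any such repository can be substituted:

```shell
# Fetch (and cache) a GGUF model from Hugging Face, then run a prompt.
# ggml-org/gemma-3-1b-it-GGUF is an example repo name, not a requirement.
llama-cli -hf ggml-org/gemma-3-1b-it-GGUF -p "Explain GGUF in one sentence."
```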