How to Download a GGUF Model from Hugging Face and Import It into Ollama

If a model is not available in the official Ollama library, or if you want to use a specific GGUF file from Hugging Face, you can download the file manually and import it into Ollama with a Modelfile.

Step 1: Download the GGUF file from Hugging Face

First, find the target model’s GGUF file on Hugging Face. You will usually see multiple quantized versions, such as:

  • Q4_K_M
  • Q5_K_M
  • Q8_0

Which version you choose depends on your VRAM, RAM, and your tradeoff between speed and quality. After downloading, place the .gguf file in a fixed directory so you can reference it from the Modelfile.
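As a rough rule of thumb, a quantized model's weights take about (parameter count × bits per weight ÷ 8) bytes, before context and runtime overhead. A small sketch of that estimate; the bits-per-weight figures below are approximations, not exact values for any particular file:

```python
# Rough memory estimate for common GGUF quantization levels.
# Bits-per-weight values are approximations; real file sizes vary
# by model architecture and quantization details.
APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
}

def approx_size_gb(params_billion: float, quant: str) -> float:
    """Approximate weight size in GB for a model with the given
    parameter count (in billions) at the given quantization level."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    return params_billion * 1e9 * bits / 8 / 1e9

# Example: a 12B model at Q4_K_M needs roughly 7 GB for weights alone.
print(f"{approx_size_gb(12, 'Q4_K_M'):.1f} GB")
```

If the estimate is close to (or above) your available VRAM, pick a smaller quantization or expect the model to spill into system RAM and run slower.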

Step 2: Write the Modelfile

Create a Modelfile in the same directory as the model file. The most basic version looks like this:

FROM ./model.gguf

If the filename is different, replace it with the actual filename, for example:

FROM ./gemma-3-12b-it-q4_k_m.gguf

If your goal is just to get it running, this single FROM line is usually enough.
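If you want more than the bare minimum, a Modelfile can also set parameters and a system prompt. A sketch of what that can look like; the values here are illustrative, not recommendations for any specific model:

```
FROM ./gemma-3-12b-it-q4_k_m.gguf

# Illustrative values only; tune them for your model and hardware
PARAMETER temperature 0.7
PARAMETER num_ctx 4096

SYSTEM "You are a helpful assistant."
```

The `ollama show --modelfile` command described later is a good way to see how official models fill in these sections.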

Step 3: Import it into Ollama

Then run:

ollama create myModelName -f Modelfile

  • myModelName is the local model name you want to use inside Ollama
  • -f Modelfile tells Ollama to create the model from that file

Once the creation succeeds, the GGUF file becomes a local model that you can call directly.

Step 4: Run the model

After creation, run:

ollama run myModelName

From that point on, it works much like a model pulled with ollama pull.

How to inspect an existing model’s Modelfile

If you are not sure how to write a Modelfile, you can inspect the configuration of an existing model directly:

ollama show --modelfile llama3.2

This command prints the Modelfile for llama3.2, which is useful as a reference for:

  • How FROM should be written
  • How the template and system prompt are structured
  • How parameters are declared

When this approach makes sense

This manual Hugging Face import flow is useful when:

  • The model you want is not available in Ollama’s official library
  • You want a specific quantized variant
  • You have already downloaded the GGUF file manually
  • You want finer control over how the model is packaged

If Ollama already provides an official version, using pull is usually simpler. But when you need a specific quantization or a custom wrapper, GGUF + Modelfile gives you more flexibility.

Common notes

  • The path after FROM must match the actual location of the .gguf file.
  • If the filename contains spaces or special characters, it is better to rename it first.
  • Different GGUF quantization levels can greatly affect memory use and speed, so successful import does not guarantee smooth runtime performance.
  • If the model is a chat model, you may still need to adjust the prompt template later for better results.
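For the filename note above, a small helper can clean up a downloaded filename before you reference it in the FROM line. This is a sketch; adjust the allowed character set to taste:

```python
import re
from pathlib import Path

def sanitize_gguf_name(path: str) -> str:
    """Return a filesystem-friendly name for a .gguf file:
    spaces and unusual characters become underscores."""
    p = Path(path)
    safe_stem = re.sub(r"[^A-Za-z0-9._-]+", "_", p.stem).strip("_")
    return safe_stem + p.suffix

# A filename with spaces and parentheses becomes safe to reference:
print(sanitize_gguf_name("gemma 3 12b it (q4_k_m).gguf"))
# → gemma_3_12b_it_q4_k_m.gguf
```

Rename the file on disk to the sanitized name, then use that name after FROM.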

Conclusion

Downloading a GGUF file from Hugging Face and importing it into Ollama is not complicated. Prepare the model file, write a minimal Modelfile, then run ollama create, and you can bring a third-party GGUF model into your Ollama workflow.
