Local LLMs on KnightLi Blog

What are Ollama cloud models and how do you use them

Thu, 09 Apr 2026 18:42:32 +0800

If you already use Ollama to run local models, cloud models are easy to understand.

There is only one core difference:
local models run on your own machine, while cloud models run on Ollama’s cloud infrastructure and return the result to you.

What are Ollama cloud models

Ollama cloud models keep the Ollama workflow, but move the actual computation from your local machine to the cloud.

The main benefits are:

Less pressure on local hardware
Easier access to larger models that your machine cannot run well
You can keep using the familiar Ollama workflow

How they differ from local models

Item	Local models	Cloud models
Runtime location	Your machine	Cloud
Hardware requirements	High	Low
Latency	Usually lower	Affected by network
Privacy	Stronger	Requests are sent to the cloud

If you care more about privacy, low latency, and offline use, local models are a better fit.
If your hardware is limited but you still want to use larger models, cloud models are more convenient.

How to identify a cloud model

At the moment, Ollama cloud models are typically labeled with a -cloud suffix, for example:

`1`	`gpt-oss:120b-cloud`

The available model list may change over time, so the official Ollama pages should be treated as the source of truth.

How to use them

First, sign in:

`1`	`ollama signin`

After that, run a cloud model directly:

`1`	`ollama run gpt-oss:120b-cloud`

If you are calling it from code, you can also configure an API key:

`1`	`export OLLAMA_API_KEY=your_api_key`

Python example:

import os
from ollama import Client

client = Client(
    host="https://ollama.com",
    headers={"Authorization": "Bearer " + os.environ["OLLAMA_API_KEY"]},
)

messages = [
    {"role": "user", "content": "Why is the sky blue?"}
]

for part in client.chat("gpt-oss:120b-cloud", messages=messages, stream=True):
    print(part["message"]["content"], end="", flush=True)

Summary

Ollama cloud models can be summarized in one sentence:

the commands are almost the same, but the model is no longer running on your local machine.

If your computer cannot handle large models well, but you still want to keep the Ollama workflow, cloud models are a very direct option.

How to Download a GGUF Model from Hugging Face and Import It into Ollama

Thu, 09 Apr 2026 11:00:07 +0800

If a model is not available in the official Ollama library, or if you want to use a specific GGUF file from Hugging Face, you can download it manually and then import it into Ollama.

Step 1: Download the GGUF file from Hugging Face

First, find the target model’s GGUF file on Hugging Face. You will usually see multiple quantized versions, such as:

Q4_K_M
Q5_K_M
Q8_0

Which version you choose depends on your VRAM, RAM, and your tradeoff between speed and quality. After downloading, place the .gguf file in a fixed directory so you can reference it from the Modelfile.

Step 2: Write the Modelfile

Create a Modelfile in the same directory as the model file. The most basic version looks like this:

`1`	`FROM ./model.gguf`

If the filename is different, replace it with the actual filename, for example:

`1`	`FROM ./gemma-3-12b-it-q4_k_m.gguf`

If your goal is just to get it running, this single FROM line is usually enough.

Step 3: Import it into Ollama

Then run:

`1`	`ollama create myModelName -f Modelfile`

myModelName is the local model name you want to use inside Ollama
-f Modelfile tells Ollama to create the model from that file

Once the creation succeeds, the GGUF file becomes a local model that you can call directly.

Step 4: Run the model

After creation, run:

`1`	`ollama run myModelName`

From that point on, it works much like a model pulled with ollama pull.

How to inspect an existing model’s Modelfile

If you are not sure how to write a Modelfile, you can inspect the configuration of an existing model directly:

`1`	`ollama show --modelfile llama3.2`

This command prints the Modelfile for llama3.2, which is useful as a reference for:

How FROM should be written
How the template and system prompt are structured
How parameters are declared

When this approach makes sense

This manual Hugging Face import flow is useful when:

The model you want is not available in Ollama’s official library
You want a specific quantized variant
You have already downloaded the GGUF file manually
You want finer control over how the model is packaged

If Ollama already provides an official version, using pull is usually simpler. But when you need a specific quantization or a custom wrapper, GGUF + Modelfile gives you more flexibility.

Common notes

The path after FROM must match the actual location of the .gguf file.
If the filename contains spaces or special characters, it is better to rename it first.
Different GGUF quantization levels can greatly affect memory use and speed, so successful import does not guarantee smooth runtime performance.
If the model is a chat model, you may still need to adjust the prompt template later for better results.

Conclusion

Downloading a GGUF file from Hugging Face and importing it into Ollama is not complicated. Prepare the model file, write a minimal Modelfile, then run ollama create, and you can bring a third-party GGUF model into your Ollama workflow.

How to Troubleshoot Slow `ollama pull` Model Downloads

Thu, 09 Apr 2026 10:42:39 +0800

ollama pull model_name:tag can be very slow in some regions, and the download process is not always stable.

If your issue looks like repeated interruptions halfway through a large model download, with errors such as TLS handshake timeout or unexpected EOF, the bottleneck may not be registry.ollama.ai itself, but the actual download path after the redirect.

This article walks through a simple troubleshooting approach: first get the real model file URLs, then confirm where the traffic actually ends up, and finally optimize only the domains that matter.

Get the model file download URLs

You can use the following project to extract the manifest and blob download URLs for an Ollama model directly:

https://github.com/Gholamrezadar/ollama-direct-downloader

Using gemma4:latest as an example, you can extract links like the following.

Manifest URL

`1`	`https://registry.ollama.ai/v2/library/gemma4/manifests/latest`

Blob URLs

https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:f0988ff50a2458c598ff6b1b87b94d0f5c44d73061c2795391878b00b2285e11
https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a
https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:7339fa418c9ad3e8e12e74ad0fd26a9cc4be8703f9c110728a992b193be85cb2
https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:56380ca2ab89f1f68c283f4d50863c0bcab52ae3f1b9a88e4ab5617b176f71a3

If you only want a quick verification, you can also download the manifest and blobs directly with curl:

curl -L "https://registry.ollama.ai/v2/library/gemma4/manifests/latest" -o "latest"
curl -L "https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:f0988ff50a2458c598ff6b1b87b94d0f5c44d73061c2795391878b00b2285e11" -o "sha256-f0988ff50a2458c598ff6b1b87b94d0f5c44d73061c2795391878b00b2285e11"
curl -L "https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a" -o "sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a"
curl -L "https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:7339fa418c9ad3e8e12e74ad0fd26a9cc4be8703f9c110728a992b193be85cb2" -o "sha256-7339fa418c9ad3e8e12e74ad0fd26a9cc4be8703f9c110728a992b193be85cb2"

The real download URL after the redirect

If you try downloading one of the blobs with wget, you will notice that the request does not stay on registry.ollama.ai. It gets redirected to a Cloudflare R2 object storage URL:

`1`	`wget https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a`

There are a few key details in the log:

registry.ollama.ai returns 307 Temporary Redirect
The final download URL lands on *.r2.cloudflarestorage.com
The large file transfer is actually being served by the object storage domain behind the redirect

This matters because if your proxy or routing rules only cover registry.ollama.ai but not *.r2.cloudflarestorage.com, downloads can still be slow or repeatedly interrupted.

Here is one example of an actual redirect log:

wget https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a
--2026-04-09 09:22:04--  https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a
Resolving registry.ollama.ai (registry.ollama.ai)... 104.21.75.227, 172.67.182.229, 2606:4700:3034::ac43:b6e5, ...
Connecting to registry.ollama.ai (registry.ollama.ai)|104.21.75.227|:443... connected.
HTTP request sent, awaiting response... 307 Temporary Redirect
Location: https://dd20bb891979d25aebc8bec07b2b3bbc.r2.cloudflarestorage.com/ollama/docker/registry/v2/blobs/sha256/4c/4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a/data?... [following]
--2026-04-09 09:22:05--  https://dd20bb891979d25aebc8bec07b2b3bbc.r2.cloudflarestorage.com/ollama/docker/registry/v2/blobs/sha256/4c/4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a/data?...
Resolving dd20bb891979d25aebc8bec07b2b3bbc.r2.cloudflarestorage.com (dd20bb891979d25aebc8bec07b2b3bbc.r2.cloudflarestorage.com)... 172.64.66.1, 2606:4700:2ff9::1
Connecting to dd20bb891979d25aebc8bec07b2b3bbc.r2.cloudflarestorage.com|172.64.66.1|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9608338848 (8.9G) [application/octet-stream]

Adjust your network settings

Once you confirm the real download path, the troubleshooting direction becomes much clearer.

If you are using a proxy, split routing, or custom DNS, check these first:

Whether registry.ollama.ai and *.r2.cloudflarestorage.com are using the same stable route
Whether your proxy rules cover only the former but miss the latter
Whether your current outbound path is suitable for sustained multi-GB downloads

The key issue here is not simply whether the official site opens, but whether the redirected object storage path is stable enough for long-running large-file transfers. In many cases, the real bottleneck is the Cloudflare R2 layer rather than the registry domain in front of it.

Before-and-after comparison

Here is one real-world example while downloading gemma4:31b-it-q8_0.

Before adjusting the network path, the download was slow and failed midway:

PS C:\Users\knightli> ollama run gemma4:31b-it-q8_0
pulling manifest
pulling a0feadb736f5:  38% ▕██████████████████████                                    ▏  12 GB/ 33 GB  1.2 MB/s   4h40m
Error: max retries exceeded: unexpected EOF

After the adjustment, the same model download became noticeably faster and more stable:

1
2
3

PS C:\Users\knightli> ollama run gemma4:31b-it-q8_0
pulling manifest
pulling a0feadb736f5:  46% ▕████████████████████████████████████████████████████████████████▏ 15 GB/ 33 GB  8.5 MB/s  35m23s

This does not mean every network environment will see the same improvement, but it does support one useful conclusion: the bottleneck may be the actual large-file download path rather than the Ollama client itself.

A more practical troubleshooting order

If you run into the same issue, this order usually works well:

Run ollama pull or ollama run once and confirm the issue is reproducible.
Test a blob URL with wget or curl -L and confirm whether it redirects to *.r2.cloudflarestorage.com.
Adjust your proxy or routing only for the real download domain, then test speed and stability again.

The benefit of this order is that each step validates one clear hypothesis, so you do not have to troubleshoot blindly.

Conclusion

When ollama pull is slow, the problem is often not that registry.ollama.ai is unreachable, but that the Cloudflare R2 path actually serving the large files is unstable.

So instead of retrying over and over, a better approach is to identify the real download path first and optimize the network route where the traffic actually lands.