Model Deployment on KnightLi Blog

What are Ollama cloud models and how do you use them

Thu, 09 Apr 2026 18:42:32 +0800

If you already use Ollama to run local models, cloud models are easy to understand.

There is only one core difference: local models run on your own machine, while cloud models run on Ollama’s cloud infrastructure and return the result to you.

What are Ollama cloud models

Ollama cloud models keep the Ollama workflow, but move the actual computation from your local machine to the cloud.

The main benefits are:

Less pressure on local hardware
Easier access to larger models that your machine cannot run well
You can keep using the familiar Ollama workflow

How they differ from local models

Item	Local models	Cloud models
Runtime location	Your machine	Cloud
Hardware requirements	High	Low
Latency	Usually lower	Affected by network
Privacy	Stronger	Requests are sent to the cloud

If you care more about privacy, low latency, and offline use, local models are a better fit.
If your hardware is limited but you still want to use larger models, cloud models are more convenient.

How to identify a cloud model

At the moment, Ollama cloud models are typically labeled with a -cloud suffix, for example:

`gpt-oss:120b-cloud`

The available model list may change over time, so the official Ollama pages should be treated as the source of truth.
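
As a small illustration of the naming convention described above, the following sketch checks a model name for the -cloud suffix. The helper name is_cloud_model is hypothetical, and the check only mirrors the current labeling habit, so it does not replace the official model list.

```python
def is_cloud_model(model_name: str) -> bool:
    """Return True if the name follows the "-cloud" suffix convention.

    Assumption: the suffix is only a naming convention and may change.
    """
    return model_name.strip().lower().endswith("-cloud")


# Classify a cloud name and a local name taken from this article.
for name in ["gpt-oss:120b-cloud", "llama3.2"]:
    kind = "cloud" if is_cloud_model(name) else "local"
    print(f"{name}: {kind}")
```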

How to use them

First, sign in:

`ollama signin`

After that, run a cloud model directly:

`ollama run gpt-oss:120b-cloud`

If you are calling it from code, you can also configure an API key:

`export OLLAMA_API_KEY=your_api_key`

Python example:

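import os
from ollama import Client

# Point the client at Ollama's hosted API and authenticate with the
# key exported as OLLAMA_API_KEY (raises KeyError if it is not set).
client = Client(
    host="https://ollama.com",
    headers={"Authorization": "Bearer " + os.environ["OLLAMA_API_KEY"]},
)

messages = [
    {"role": "user", "content": "Why is the sky blue?"}
]

# Stream the reply and print each chunk as soon as it arrives.
for part in client.chat("gpt-oss:120b-cloud", messages=messages, stream=True):
    print(part["message"]["content"], end="", flush=True)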
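
If the environment variable is missing, the example above stops with a bare KeyError. A minimal sketch of a friendlier check follows. The helper name auth_headers and the error message are illustrative assumptions, not part of the Ollama library.

```python
import os


def auth_headers(env=None):
    """Build the Authorization header from OLLAMA_API_KEY.

    `env` defaults to os.environ; passing a plain dict makes the
    helper easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    api_key = env.get("OLLAMA_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "OLLAMA_API_KEY is not set; export it before calling the cloud API"
        )
    return {"Authorization": "Bearer " + api_key}
```

The returned dictionary can be passed as the headers argument of Client in place of the inline expression.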

Summary

Ollama cloud models can be summarized in one sentence: the commands are almost the same, but the model no longer runs on your local machine.

If your computer cannot handle large models well, but you still want to keep the Ollama workflow, cloud models are a straightforward option.

How to Download a GGUF Model from Hugging Face and Import It into Ollama

Thu, 09 Apr 2026 11:00:07 +0800

If a model is not available in the official Ollama library, or if you want to use a specific GGUF file from Hugging Face, you can download it manually and then import it into Ollama.

Step 1: Download the GGUF file from Hugging Face

First, find the target model’s GGUF file on Hugging Face. You will usually see multiple quantized versions, such as:

Q4_K_M
Q5_K_M
Q8_0

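Which version you choose depends on your VRAM, RAM, and your tradeoff between speed and quality: lower-bit quantizations produce smaller files that need less memory, while higher-bit ones stay closer to the original model’s quality. After downloading, place the .gguf file in a fixed directory so you can reference it from the Modelfile.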
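
When several downloads sit in the same directory, it can help to read the quantization label back out of the filename. The sketch below assumes the label appears in the filename in the form used above (for example Q4_K_M or Q8_0). Hugging Face uploaders are free to name files differently, so treat the pattern and the helper name quant_tag as illustrative.

```python
import re

# Matches labels such as Q4_K_M, Q5_K_S, Q6_K, Q8_0 (case-insensitive),
# but only when they are not glued to other letters or digits.
QUANT_PATTERN = re.compile(
    r"(?<![a-z0-9])(q\d_(?:k_[sml]|k|[01]))(?![a-z0-9])",
    re.IGNORECASE,
)


def quant_tag(filename):
    """Return the quantization label found in a GGUF filename, or None."""
    match = QUANT_PATTERN.search(filename)
    return match.group(1).upper() if match else None


for name in ["gemma-3-12b-it-q4_k_m.gguf", "model.gguf"]:
    print(name, "->", quant_tag(name))
```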

Step 2: Write the Modelfile

Create a Modelfile in the same directory as the model file. The most basic version looks like this:

`FROM ./model.gguf`

If the filename is different, replace it with the actual filename, for example:

`FROM ./gemma-3-12b-it-q4_k_m.gguf`

If your goal is just to get it running, this single FROM line is usually enough.
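
If you import models often, writing the one-line Modelfile can be scripted. The following is a minimal sketch: the function name write_modelfile is hypothetical, and it assumes the Modelfile should sit next to the .gguf file, as described in this step.

```python
from pathlib import Path


def write_modelfile(gguf_path):
    """Create a minimal Modelfile next to the given .gguf file.

    Returns the path of the Modelfile that was written.
    """
    gguf = Path(gguf_path)
    if gguf.suffix.lower() != ".gguf":
        raise ValueError(f"expected a .gguf file, got {gguf.name!r}")
    modelfile = gguf.parent / "Modelfile"
    # A single FROM line pointing at the file in the same directory.
    modelfile.write_text(f"FROM ./{gguf.name}\n", encoding="utf-8")
    return modelfile
```

Calling write_modelfile("gemma-3-12b-it-q4_k_m.gguf") would write the same FROM line shown above into a file named Modelfile in that directory.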

Step 3: Import it into Ollama

Then run:

`ollama create myModelName -f Modelfile`

myModelName is the local model name you want to use inside Ollama
-f Modelfile tells Ollama to create the model from that file

Once the creation succeeds, the GGUF file becomes a local model that you can call directly.
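
The same command can be driven from a script. This sketch wraps it with Python's subprocess module. The function names are hypothetical, the whitespace check on the model name is a cautious assumption rather than a documented Ollama rule, and the ollama binary is assumed to be on your PATH.

```python
import subprocess


def build_create_command(model_name, modelfile="Modelfile"):
    """Return the argument list for `ollama create <name> -f <Modelfile>`."""
    if not model_name or any(ch.isspace() for ch in model_name):
        # Assumption: avoid names that would be awkward on the command line.
        raise ValueError("model name must be non-empty and contain no whitespace")
    return ["ollama", "create", model_name, "-f", modelfile]


def create_model(model_name, modelfile="Modelfile", cwd=None):
    """Run the create command in `cwd`; raises CalledProcessError on failure."""
    subprocess.run(build_create_command(model_name, modelfile), cwd=cwd, check=True)
```

Passing the directory that holds the Modelfile as cwd keeps the relative FROM path and the -f argument pointing at the same place.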

Step 4: Run the model

After creation, run:

`ollama run myModelName`

From that point on, it works much like a model pulled with ollama pull.
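
An imported model can also be called from code through the local Ollama server. The sketch below uses only the standard library and assumes the server is listening on its default address, http://localhost:11434, with the /api/chat endpoint. The function names build_chat_request and ask are illustrative.

```python
import json
import urllib.request

# Assumption: the local Ollama server uses its default host and port.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_request(model, prompt, url=OLLAMA_URL):
    """Build a non-streaming chat request for a locally created model."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model, prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        return json.loads(resp.read())["message"]["content"]
```

With the server running, ask("myModelName", "Why is the sky blue?") would return the model's answer as a string.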

How to inspect an existing model’s Modelfile

If you are not sure how to write a Modelfile, you can inspect the configuration of an existing model directly:

`ollama show --modelfile llama3.2`

This command prints the Modelfile for llama3.2, which is useful as a reference for:

How FROM should be written
How the template and system prompt are structured
How parameters are declared
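
To focus on the parameter declarations in that output, a few lines of Python can pull out the PARAMETER entries. This is a rough sketch: the function name is hypothetical, the sample text is made up for illustration, and a line inside a multi-line TEMPLATE block that happens to start with PARAMETER would be picked up by mistake.

```python
def modelfile_parameters(modelfile_text):
    """Return (name, value) pairs for each PARAMETER line.

    A list is used instead of a dict because the same parameter
    can be declared more than once.
    """
    params = []
    for line in modelfile_text.splitlines():
        parts = line.strip().split(None, 2)
        if len(parts) == 3 and parts[0].upper() == "PARAMETER":
            params.append((parts[1], parts[2]))
    return params


# Made-up Modelfile text, only to show the expected input shape.
sample = """\
FROM ./model.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
"""
for name, value in modelfile_parameters(sample):
    print(name, "=", value)
```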

When this approach makes sense

This manual Hugging Face import flow is useful when:

The model you want is not available in Ollama’s official library
You want a specific quantized variant
You have already downloaded the GGUF file manually
You want finer control over how the model is packaged

If Ollama already provides an official version, using pull is usually simpler. But when you need a specific quantization or a custom wrapper, GGUF + Modelfile gives you more flexibility.

Common notes

The path after FROM must match the actual location of the .gguf file.
If the filename contains spaces or special characters, it is better to rename it first.
Different GGUF quantization levels can greatly affect memory use and speed, so a successful import does not guarantee smooth runtime performance.
If the model is a chat model, you may still need to adjust the prompt template later for better results.
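
The renaming advice above can be automated. The sketch below replaces anything other than letters, digits, dots, hyphens, and underscores with a single underscore. The allowed character set and the helper name safe_gguf_name are assumptions chosen for illustration, not an Ollama requirement.

```python
import re
from pathlib import Path


def safe_gguf_name(filename):
    """Return a filename without spaces or special characters."""
    path = Path(filename)
    # Collapse each run of disallowed characters into one underscore.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", path.stem).strip("_")
    return (stem or "model") + path.suffix.lower()


for name in ["My Model (Q4_K_M).gguf", "gemma-3-12b-it-q4_k_m.gguf"]:
    print(name, "->", safe_gguf_name(name))
```

Rename the file on disk to the returned name before writing the FROM line, so the two stay in sync.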

Conclusion

Downloading a GGUF file from Hugging Face and importing it into Ollama is not complicated. Prepare the model file, write a minimal Modelfile, then run ollama create, and you can bring a third-party GGUF model into your Ollama workflow.