<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Model Deployment on KnightLi Blog</title>
        <link>https://www.knightli.com/en/tags/model-deployment/</link>
        <description>Recent content in Model Deployment on KnightLi Blog</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Thu, 09 Apr 2026 18:42:32 +0800</lastBuildDate><atom:link href="https://www.knightli.com/en/tags/model-deployment/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>What are Ollama cloud models and how do you use them</title>
        <link>https://www.knightli.com/en/2026/04/09/ollama-cloud-models-guide/</link>
        <pubDate>Thu, 09 Apr 2026 18:42:32 +0800</pubDate>
        
        <guid>https://www.knightli.com/en/2026/04/09/ollama-cloud-models-guide/</guid>
        <description>&lt;p&gt;If you already use &lt;code&gt;Ollama&lt;/code&gt; to run local models, cloud models are easy to understand.&lt;/p&gt;
&lt;p&gt;There is only one core difference:&lt;br&gt;
local models run on your own machine, while cloud models run on Ollama&amp;rsquo;s cloud infrastructure and return the result to you.&lt;/p&gt;
&lt;h2 id=&#34;what-are-ollama-cloud-models&#34;&gt;What are Ollama cloud models
&lt;/h2&gt;&lt;p&gt;Ollama cloud models keep the Ollama workflow, but move the actual computation from your local machine to the cloud.&lt;/p&gt;
&lt;p&gt;The main benefits are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Less pressure on local hardware&lt;/li&gt;
&lt;li&gt;Easier access to larger models that your machine cannot run well&lt;/li&gt;
&lt;li&gt;You can keep using the familiar Ollama workflow&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;how-they-differ-from-local-models&#34;&gt;How they differ from local models
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Item&lt;/th&gt;
          &lt;th&gt;Local models&lt;/th&gt;
          &lt;th&gt;Cloud models&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Runtime location&lt;/td&gt;
          &lt;td&gt;Your machine&lt;/td&gt;
          &lt;td&gt;Cloud&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Hardware requirements&lt;/td&gt;
          &lt;td&gt;High&lt;/td&gt;
          &lt;td&gt;Low&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Latency&lt;/td&gt;
          &lt;td&gt;Usually lower&lt;/td&gt;
          &lt;td&gt;Affected by network&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Privacy&lt;/td&gt;
          &lt;td&gt;Stronger&lt;/td&gt;
          &lt;td&gt;Requests are sent to the cloud&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;If you care more about privacy, low latency, and offline use, local models are a better fit.&lt;br&gt;
If your hardware is limited but you still want to use larger models, cloud models are more convenient.&lt;/p&gt;
&lt;h2 id=&#34;how-to-identify-a-cloud-model&#34;&gt;How to identify a cloud model
&lt;/h2&gt;&lt;p&gt;At the moment, Ollama cloud models are typically labeled with a &lt;code&gt;-cloud&lt;/code&gt; suffix, for example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;gpt-oss:120b-cloud
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The available model list may change over time, so the official Ollama pages should be treated as the source of truth.&lt;/p&gt;
&lt;h2 id=&#34;how-to-use-them&#34;&gt;How to use them
&lt;/h2&gt;&lt;p&gt;First, sign in:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama signin
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;After that, run a cloud model directly:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama run gpt-oss:120b-cloud
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you are calling it from code, you can also configure an API key:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;export&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;OLLAMA_API_KEY&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;your_api_key
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Python example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;os&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;ollama&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Client&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;client&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Client&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;host&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;https://ollama.com&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;headers&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;Authorization&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;Bearer &amp;#34;&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;+&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;os&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;environ&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;OLLAMA_API_KEY&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;messages&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;role&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;user&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;Why is the sky blue?&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;part&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;in&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;client&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;chat&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;gpt-oss:120b-cloud&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;messages&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;messages&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;stream&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;part&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;message&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;][&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;],&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;end&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;flush&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;Ollama cloud models can be summarized in one sentence:&lt;/p&gt;
&lt;p&gt;the commands are almost the same, but the model is no longer running on your local machine.&lt;/p&gt;
&lt;p&gt;If your computer cannot handle large models well, but you still want to keep the Ollama workflow, cloud models are a very direct option.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>How to Download a GGUF Model from Hugging Face and Import It into Ollama</title>
        <link>https://www.knightli.com/en/2026/04/09/import-huggingface-gguf-into-ollama/</link>
        <pubDate>Thu, 09 Apr 2026 11:00:07 +0800</pubDate>
        
        <guid>https://www.knightli.com/en/2026/04/09/import-huggingface-gguf-into-ollama/</guid>
        <description>&lt;p&gt;If a model is not available in the official Ollama library, or if you want to use a specific &lt;code&gt;GGUF&lt;/code&gt; file from Hugging Face, you can download it manually and then import it into Ollama.&lt;/p&gt;
&lt;h2 id=&#34;step-1-download-the-gguf-file-from-hugging-face&#34;&gt;Step 1: Download the GGUF file from Hugging Face
&lt;/h2&gt;&lt;p&gt;First, find the target model&amp;rsquo;s &lt;code&gt;GGUF&lt;/code&gt; file on Hugging Face. You will usually see multiple quantized versions, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Q4_K_M&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Q5_K_M&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Q8_0&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Which version you choose depends on your VRAM, RAM, and your tradeoff between speed and quality. After downloading, place the &lt;code&gt;.gguf&lt;/code&gt; file in a fixed directory so you can reference it from the &lt;code&gt;Modelfile&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;step-2-write-the-modelfile&#34;&gt;Step 2: Write the Modelfile
&lt;/h2&gt;&lt;p&gt;Create a &lt;code&gt;Modelfile&lt;/code&gt; in the same directory as the model file. The most basic version looks like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;FROM ./model.gguf
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If the filename is different, replace it with the actual filename, for example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;FROM ./gemma-3-12b-it-q4_k_m.gguf
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If your goal is just to get it running, this single &lt;code&gt;FROM&lt;/code&gt; line is usually enough.&lt;/p&gt;
&lt;h2 id=&#34;step-3-import-it-into-ollama&#34;&gt;Step 3: Import it into Ollama
&lt;/h2&gt;&lt;p&gt;Then run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama create myModelName -f Modelfile
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;myModelName&lt;/code&gt; is the local model name you want to use inside Ollama&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-f Modelfile&lt;/code&gt; tells Ollama to create the model from that file&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Once the creation succeeds, the GGUF file becomes a local model that you can call directly.&lt;/p&gt;
&lt;h2 id=&#34;step-4-run-the-model&#34;&gt;Step 4: Run the model
&lt;/h2&gt;&lt;p&gt;After creation, run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama run myModelName
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;From that point on, it works much like a model pulled with &lt;code&gt;ollama pull&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;how-to-inspect-an-existing-models-modelfile&#34;&gt;How to inspect an existing model&amp;rsquo;s Modelfile
&lt;/h2&gt;&lt;p&gt;If you are not sure how to write a &lt;code&gt;Modelfile&lt;/code&gt;, you can inspect the configuration of an existing model directly:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama show --modelfile llama3.2
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This command prints the &lt;code&gt;Modelfile&lt;/code&gt; for &lt;code&gt;llama3.2&lt;/code&gt;, which is useful as a reference for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How &lt;code&gt;FROM&lt;/code&gt; should be written&lt;/li&gt;
&lt;li&gt;How the template and system prompt are structured&lt;/li&gt;
&lt;li&gt;How parameters are declared&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;when-this-approach-makes-sense&#34;&gt;When this approach makes sense
&lt;/h2&gt;&lt;p&gt;This manual Hugging Face import flow is useful when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The model you want is not available in Ollama&amp;rsquo;s official library&lt;/li&gt;
&lt;li&gt;You want a specific quantized variant&lt;/li&gt;
&lt;li&gt;You have already downloaded the &lt;code&gt;GGUF&lt;/code&gt; file manually&lt;/li&gt;
&lt;li&gt;You want finer control over how the model is packaged&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If Ollama already provides an official version, using &lt;code&gt;pull&lt;/code&gt; is usually simpler. But when you need a specific quantization or a custom wrapper, &lt;code&gt;GGUF + Modelfile&lt;/code&gt; gives you more flexibility.&lt;/p&gt;
&lt;h2 id=&#34;common-notes&#34;&gt;Common notes
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;The path after &lt;code&gt;FROM&lt;/code&gt; must match the actual location of the &lt;code&gt;.gguf&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;If the filename contains spaces or special characters, it is better to rename it first.&lt;/li&gt;
&lt;li&gt;Different &lt;code&gt;GGUF&lt;/code&gt; quantization levels can greatly affect memory use and speed, so successful import does not guarantee smooth runtime performance.&lt;/li&gt;
&lt;li&gt;If the model is a chat model, you may still need to adjust the prompt template later for better results.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;Downloading a &lt;code&gt;GGUF&lt;/code&gt; file from Hugging Face and importing it into Ollama is not complicated. Prepare the model file, write a minimal &lt;code&gt;Modelfile&lt;/code&gt;, then run &lt;code&gt;ollama create&lt;/code&gt;, and you can bring a third-party &lt;code&gt;GGUF&lt;/code&gt; model into your Ollama workflow.&lt;/p&gt;
</description>
        </item>
        
    </channel>
</rss>
