<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Hugging Face on KnightLi Blog</title>
        <link>https://www.knightli.com/en/tags/hugging-face/</link>
        <description>Recent content in Hugging Face on KnightLi Blog</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Thu, 09 Apr 2026 11:00:07 +0800</lastBuildDate><atom:link href="https://www.knightli.com/en/tags/hugging-face/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>How to Download a GGUF Model from Hugging Face and Import It into Ollama</title>
        <link>https://www.knightli.com/en/2026/04/09/import-huggingface-gguf-into-ollama/</link>
        <pubDate>Thu, 09 Apr 2026 11:00:07 +0800</pubDate>
        
        <guid>https://www.knightli.com/en/2026/04/09/import-huggingface-gguf-into-ollama/</guid>
        <description>&lt;p&gt;If a model is not available in the official Ollama library, or if you want to use a specific &lt;code&gt;GGUF&lt;/code&gt; file from Hugging Face, you can download it manually and then import it into Ollama.&lt;/p&gt;
&lt;h2 id=&#34;step-1-download-the-gguf-file-from-hugging-face&#34;&gt;Step 1: Download the GGUF file from Hugging Face
&lt;/h2&gt;&lt;p&gt;First, find the target model&amp;rsquo;s &lt;code&gt;GGUF&lt;/code&gt; file on Hugging Face. You will usually see multiple quantized versions, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Q4_K_M&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Q5_K_M&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Q8_0&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Which version you choose depends on your VRAM, RAM, and your tradeoff between speed and quality. After downloading, place the &lt;code&gt;.gguf&lt;/code&gt; file in a fixed directory so you can reference it from the &lt;code&gt;Modelfile&lt;/code&gt;.&lt;/p&gt;
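&lt;p&gt;If you prefer the command line over the browser, the download can also be scripted. A sketch, assuming the &lt;code&gt;huggingface_hub&lt;/code&gt; CLI is installed and using placeholder repository and file names:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;# placeholder repo and file -- substitute the model you actually chose
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;huggingface-cli download some-user/some-model-GGUF some-model-q4_k_m.gguf --local-dir ./models
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Downloading into a fixed directory such as &lt;code&gt;./models&lt;/code&gt; makes the &lt;code&gt;FROM&lt;/code&gt; path in the next step predictable.&lt;/p&gt;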
&lt;h2 id=&#34;step-2-write-the-modelfile&#34;&gt;Step 2: Write the Modelfile
&lt;/h2&gt;&lt;p&gt;Create a &lt;code&gt;Modelfile&lt;/code&gt; in the same directory as the model file. The most basic version looks like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;FROM ./model.gguf
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If your file has a different name, use that name instead, for example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;FROM ./gemma-3-12b-it-q4_k_m.gguf
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If your goal is just to get it running, this single &lt;code&gt;FROM&lt;/code&gt; line is usually enough.&lt;/p&gt;
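&lt;p&gt;If you want more than the bare minimum, a &lt;code&gt;Modelfile&lt;/code&gt; can also set inference parameters and a system prompt. A sketch (the values here are illustrative, not recommendations):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;FROM ./model.gguf
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;PARAMETER temperature 0.7
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;PARAMETER num_ctx 4096
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;SYSTEM &#34;&#34;&#34;You are a helpful assistant.&#34;&#34;&#34;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Here &lt;code&gt;num_ctx&lt;/code&gt; controls the context window size; larger values use more memory.&lt;/p&gt;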
&lt;h2 id=&#34;step-3-import-it-into-ollama&#34;&gt;Step 3: Import it into Ollama
&lt;/h2&gt;&lt;p&gt;Then run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama create myModelName -f Modelfile
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;myModelName&lt;/code&gt; is the local model name you want to use inside Ollama&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-f Modelfile&lt;/code&gt; tells Ollama which &lt;code&gt;Modelfile&lt;/code&gt; to build from&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Once the creation succeeds, the GGUF file becomes a local model that you can call directly.&lt;/p&gt;
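&lt;p&gt;To confirm the import worked, list your local models; the new name should appear:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama list
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;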
&lt;h2 id=&#34;step-4-run-the-model&#34;&gt;Step 4: Run the model
&lt;/h2&gt;&lt;p&gt;After creation, run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama run myModelName
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;From that point on, it works much like a model pulled with &lt;code&gt;ollama pull&lt;/code&gt;.&lt;/p&gt;
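&lt;p&gt;The imported model is also reachable through Ollama&amp;rsquo;s local HTTP API, not just the interactive CLI. A sketch, assuming the default port &lt;code&gt;11434&lt;/code&gt; and the model name created above:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl http://localhost:11434/api/generate -d &#39;{
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &#34;model&#34;: &#34;myModelName&#34;,
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &#34;prompt&#34;: &#34;Why is the sky blue?&#34;,
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &#34;stream&#34;: false
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;}&#39;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This is convenient when you want to call the model from scripts or other applications rather than the terminal.&lt;/p&gt;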
&lt;h2 id=&#34;how-to-inspect-an-existing-models-modelfile&#34;&gt;How to inspect an existing model&amp;rsquo;s Modelfile
&lt;/h2&gt;&lt;p&gt;If you are not sure how to write a &lt;code&gt;Modelfile&lt;/code&gt;, you can inspect the configuration of an existing model directly:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama show --modelfile llama3.2
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This command prints the &lt;code&gt;Modelfile&lt;/code&gt; for &lt;code&gt;llama3.2&lt;/code&gt;, which is useful as a reference for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How &lt;code&gt;FROM&lt;/code&gt; should be written&lt;/li&gt;
&lt;li&gt;How the template and system prompt are structured&lt;/li&gt;
&lt;li&gt;How parameters are declared&lt;/li&gt;
&lt;/ul&gt;
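&lt;p&gt;A practical follow-up: redirect that output into a file and use it as the starting point for your own &lt;code&gt;Modelfile&lt;/code&gt;, then change the &lt;code&gt;FROM&lt;/code&gt; line to point at your local &lt;code&gt;.gguf&lt;/code&gt; file:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama show --modelfile llama3.2 &amp;gt; Modelfile
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;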
&lt;h2 id=&#34;when-this-approach-makes-sense&#34;&gt;When this approach makes sense
&lt;/h2&gt;&lt;p&gt;This manual Hugging Face import flow is useful when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The model you want is not available in Ollama&amp;rsquo;s official library&lt;/li&gt;
&lt;li&gt;You want a specific quantized variant&lt;/li&gt;
&lt;li&gt;You have already downloaded the &lt;code&gt;GGUF&lt;/code&gt; file manually&lt;/li&gt;
&lt;li&gt;You want finer control over how the model is packaged&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If Ollama already provides an official version, using &lt;code&gt;pull&lt;/code&gt; is usually simpler. But when you need a specific quantization or a custom wrapper, &lt;code&gt;GGUF + Modelfile&lt;/code&gt; gives you more flexibility.&lt;/p&gt;
&lt;h2 id=&#34;common-notes&#34;&gt;Common notes
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;The path after &lt;code&gt;FROM&lt;/code&gt; must match the actual location of the &lt;code&gt;.gguf&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;If the filename contains spaces or special characters, it is better to rename it first.&lt;/li&gt;
&lt;li&gt;Different &lt;code&gt;GGUF&lt;/code&gt; quantization levels can greatly affect memory use and speed, so successful import does not guarantee smooth runtime performance.&lt;/li&gt;
&lt;li&gt;If the model is a chat model, you may still need to adjust the prompt template later for better results.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;Downloading a &lt;code&gt;GGUF&lt;/code&gt; file from Hugging Face and importing it into Ollama is not complicated. Prepare the model file, write a minimal &lt;code&gt;Modelfile&lt;/code&gt;, then run &lt;code&gt;ollama create&lt;/code&gt;, and you can bring a third-party &lt;code&gt;GGUF&lt;/code&gt; model into your Ollama workflow.&lt;/p&gt;
</description>
        </item>
        
    </channel>
</rss>
