<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Multimodality on KnightLi Blog</title>
        <link>https://www.knightli.com/en/tags/multimodality/</link>
        <description>Recent content in Multimodality on KnightLi Blog</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Thu, 23 Apr 2026 15:08:19 +0800</lastBuildDate><atom:link href="https://www.knightli.com/en/tags/multimodality/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>What image vectorization is: from pixel images to searchable, analyzable vector representations</title>
        <link>https://www.knightli.com/en/2026/04/23/what-is-image-vectorization-vector-search-vision-workflow/</link>
        <pubDate>Thu, 23 Apr 2026 15:08:19 +0800</pubDate>
        
        <guid>https://www.knightli.com/en/2026/04/23/what-is-image-vectorization-vector-search-vision-workflow/</guid>
        <description>&lt;p&gt;Images are everywhere, but an image does not automatically become something a system can understand or use well.&lt;/p&gt;
&lt;p&gt;For people, it is easy to look at an image and quickly tell whether it contains a cat, whether it shows the same product, or whether it reveals a certain kind of defect. For a system, though, a raw image starts out as a grid of pixels. Without additional processing, it is closer to a pile of colored dots than to a piece of data that can be directly searched, clustered, recommended, or recognized.&lt;/p&gt;
&lt;p&gt;Image vectorization solves that step. It converts images from pixel-based files into vector representations that can be compared and computed efficiently by machines. Many capabilities such as image-to-image search, similar image recommendation, visual retrieval, image clustering, and multimodal understanding rely on this layer underneath.&lt;/p&gt;
&lt;h2 id=&#34;1-what-image-vectorization-actually-means&#34;&gt;1. What image vectorization actually means
&lt;/h2&gt;&lt;p&gt;The shortest way to describe it is this:&lt;/p&gt;
&lt;p&gt;image vectorization turns an image into a numerical vector that captures its visual features.&lt;/p&gt;
&lt;p&gt;That vector is usually not meant for people to read. It is meant for models and retrieval systems. Its value is that an image stops being only a file and becomes an object that can participate in similarity comparison, ranking, and computation.&lt;/p&gt;
&lt;p&gt;Take a photo of a cat as an example. In its raw form, the file stores pixel information. After vectorization, the system gets a fixed-length numerical vector instead. The vector does not literally say &amp;ldquo;this is a cat&amp;rdquo;, but it encodes features such as shape, texture, color distribution, local structure, and higher-level semantics. That makes it possible to calculate distances between it and other images and decide which ones are more similar.&lt;/p&gt;
&lt;p&gt;So image vectorization does not mainly change the image itself. It changes how the image can be processed by a system.&lt;/p&gt;
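&lt;p&gt;As a rough illustration of what calculating distances between vectors can look like, here is a minimal sketch using cosine similarity. The three 4-dimensional vectors are made up for the example; real embeddings come from a model and usually have hundreds of dimensions.&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for illustration only.
cat_photo = [0.9, 0.1, 0.3, 0.0]
other_cat = [0.8, 0.2, 0.4, 0.1]
truck_photo = [0.0, 0.9, 0.1, 0.7]

sim_cat = cosine_similarity(cat_photo, other_cat)
sim_truck = cosine_similarity(cat_photo, truck_photo)
print(f"cat vs other cat: {sim_cat:.3f}")
print(f"cat vs truck:     {sim_truck:.3f}")
```

&lt;p&gt;With these made-up values, the two cat vectors score much higher than the cat and truck pair, which is the property a retrieval system relies on.&lt;/p&gt;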
&lt;h2 id=&#34;2-why-raw-pixels-are-not-enough-for-search-and-analysis&#34;&gt;2. Why raw pixels are not enough for search and analysis
&lt;/h2&gt;&lt;p&gt;Raw pixels can be compared directly, but the comparison is both expensive and unreliable.&lt;/p&gt;
&lt;p&gt;The main problems usually fall into three groups:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the dimensionality is high, so direct comparison is expensive&lt;/li&gt;
&lt;li&gt;pixel similarity is not the same thing as semantic similarity&lt;/li&gt;
&lt;li&gt;lighting, cropping, background, and resolution changes can all distort the result&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A typical example is product image retrieval. Two product photos may still clearly represent the same kind of item to a person, even if the angle, background, or image size is different. But if a system only compares pixels directly, it can easily judge them as completely different images.&lt;/p&gt;
&lt;p&gt;The purpose of vectorization is to move the definition of similarity from raw pixel comparison to something much closer to semantic and structural similarity.&lt;/p&gt;
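&lt;p&gt;A toy example can make the gap concrete. The sketch below shifts a tiny grayscale image by one pixel and compares it with the original in two ways: position by position, and through an intensity histogram. The histogram is only a simple handcrafted stand-in for a feature vector, not a learned embedding, and the 4x4 image is invented for the example.&lt;/p&gt;

```python
from collections import Counter

def pixel_distance(img_a, img_b):
    """Sum of absolute differences between pixels at the same position."""
    return sum(
        abs(pa - pb)
        for row_a, row_b in zip(img_a, img_b)
        for pa, pb in zip(row_a, row_b)
    )

def intensity_histogram(img, levels=4):
    """Count how many pixels fall on each intensity level."""
    counts = Counter(p for row in img for p in row)
    return [counts.get(level, 0) for level in range(levels)]

def shift_right(img):
    """Rotate every row one pixel to the right, wrapping around."""
    return [row[-1:] + row[:-1] for row in img]

# Vertical stripes with intensities 0 and 3.
original = [
    [0, 3, 0, 3],
    [0, 3, 0, 3],
    [0, 3, 0, 3],
    [0, 3, 0, 3],
]
shifted = shift_right(original)

raw_gap = pixel_distance(original, shifted)
hist_gap = sum(
    abs(x - y)
    for x, y in zip(intensity_histogram(original), intensity_histogram(shifted))
)
print("pixel-by-pixel gap:", raw_gap)
print("histogram gap:", hist_gap)
```

&lt;p&gt;A person would call the two images the same pattern, yet every pixel position disagrees. The feature-based comparison is unaffected by the shift, which is the kind of robustness that learned vectors aim for in a far richer form.&lt;/p&gt;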
&lt;h2 id=&#34;3-how-image-vectorization-is-usually-done&#34;&gt;3. How image vectorization is usually done
&lt;/h2&gt;&lt;p&gt;In practice, image vectorization is rarely a single step. It is usually a pipeline:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;preprocess the image&lt;/li&gt;
&lt;li&gt;extract image features&lt;/li&gt;
&lt;li&gt;compress those features into a fixed-length vector&lt;/li&gt;
&lt;li&gt;store the vector in a vector database or retrieval system&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Each stage affects the final quality.&lt;/p&gt;
&lt;h3 id=&#34;1-preprocessing&#34;&gt;1. Preprocessing
&lt;/h3&gt;&lt;p&gt;Preprocessing usually includes things like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;resizing the image&lt;/li&gt;
&lt;li&gt;normalizing the input&lt;/li&gt;
&lt;li&gt;removing part of the noise&lt;/li&gt;
&lt;li&gt;unifying color format or input structure&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The goal here is not visual beautification. It is to make later model input more stable.&lt;/p&gt;
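&lt;p&gt;Two of the steps above, resizing and normalizing, can be sketched without any library. This is a simplified illustration on a nested-list grayscale image; the function names are hypothetical, and a real pipeline would normally use an image library and the exact input size and normalization that the chosen model expects.&lt;/p&gt;

```python
def resize_nearest(img, out_h, out_w):
    """Resize a 2D grayscale image with nearest-neighbor sampling."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def normalize(img, max_value=255):
    """Scale integer pixel values into the range 0.0 to 1.0."""
    return [[p / max_value for p in row] for row in img]

# A made-up 4x4 image with values in 0..255.
raw = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [255, 255, 0, 0],
    [255, 255, 0, 0],
]
model_input = normalize(resize_nearest(raw, 2, 2))
print(model_input)
```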
&lt;h3 id=&#34;2-feature-extraction&#34;&gt;2. Feature extraction
&lt;/h3&gt;&lt;p&gt;This is the core of image vectorization.&lt;/p&gt;
&lt;p&gt;Earlier approaches relied more on handcrafted features such as &lt;code&gt;SIFT&lt;/code&gt;, &lt;code&gt;SURF&lt;/code&gt;, and &lt;code&gt;HOG&lt;/code&gt;, which are good at capturing edges, corners, and local structures. Today, deep learning models are much more common for this step, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ResNet&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;VGG&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Inception&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ViT&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CLIP&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These models encode images into higher-level and more abstract visual features. Compared with traditional feature engineering, they are better at expressing semantics and are more suitable for similarity search, multimodal understanding, and large-scale clustering.&lt;/p&gt;
&lt;h3 id=&#34;3-vector-generation&#34;&gt;3. Vector generation
&lt;/h3&gt;&lt;p&gt;After feature extraction, the system usually reduces the model&#39;s internal representation to a fixed-length vector, commonly with &lt;code&gt;512&lt;/code&gt;, &lt;code&gt;768&lt;/code&gt;, or &lt;code&gt;1024&lt;/code&gt; dimensions.&lt;/p&gt;
&lt;p&gt;The point here is not that higher dimensionality is always better. The real problem is finding a balance between representational power, storage cost, and retrieval speed.&lt;/p&gt;
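&lt;p&gt;The storage side of that trade-off is simple arithmetic. The sketch below assumes uncompressed 32-bit floats and a hypothetical collection of 10 million images, and it counts only the vectors themselves, not any index overhead.&lt;/p&gt;

```python
BYTES_PER_FLOAT32 = 4

def raw_index_size_gib(num_vectors, dims, bytes_per_value=BYTES_PER_FLOAT32):
    """Size of the uncompressed vectors alone, in GiB (no index overhead)."""
    return num_vectors * dims * bytes_per_value / 2**30

# 10 million images is an assumed collection size for illustration.
for dims in (512, 768, 1024):
    size = raw_index_size_gib(10_000_000, dims)
    print(f"{dims} dims: {size:.1f} GiB")
```

&lt;p&gt;Under these assumptions the raw vectors take roughly 19.1 GiB at 512 dimensions and 38.1 GiB at 1024, so doubling the dimensionality doubles the memory that has to be stored and scanned.&lt;/p&gt;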
&lt;h3 id=&#34;4-storage-and-retrieval&#34;&gt;4. Storage and retrieval
&lt;/h3&gt;&lt;p&gt;Once the vector is generated, it is usually no longer managed like a normal image file. It enters a system that supports vector retrieval, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Faiss&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Milvus&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;search systems with vector capabilities&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At that point, the image can participate in approximate nearest-neighbor search, clustering, and similarity ranking.&lt;/p&gt;
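&lt;p&gt;To show what a nearest-neighbor query does, here is a minimal exact (brute-force) search in plain Python. It is a baseline sketch, not how &lt;code&gt;Faiss&lt;/code&gt; or &lt;code&gt;Milvus&lt;/code&gt; work internally: those systems use approximate indexes to avoid scanning every vector. The image ids and 3-dimensional vectors are hypothetical.&lt;/p&gt;

```python
import heapq
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query, index, k=2):
    """Return the k (distance, image_id) pairs closest to the query."""
    scored = ((l2_distance(query, vec), image_id) for image_id, vec in index.items())
    return heapq.nsmallest(k, scored)

# Hypothetical image ids and vectors, for illustration only.
index = {
    "shoe_front.jpg": [0.9, 0.1, 0.0],
    "shoe_side.jpg": [0.8, 0.2, 0.1],
    "mug.jpg": [0.1, 0.9, 0.3],
    "lamp.jpg": [0.0, 0.2, 0.9],
}
results = top_k([0.88, 0.12, 0.02], index, k=2)
for distance, image_id in results:
    print(f"{image_id}: {distance:.3f}")
```

&lt;p&gt;Brute force scans the whole collection for every query, which is fine for thousands of vectors but becomes the bottleneck at larger scale. That cost is the reason approximate nearest-neighbor indexes exist.&lt;/p&gt;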
&lt;h2 id=&#34;4-how-the-technical-path-evolved&#34;&gt;4. How the technical path evolved
&lt;/h2&gt;&lt;p&gt;Image vectorization is not a brand-new idea. What changed in recent years is the quality and range of applications.&lt;/p&gt;
&lt;p&gt;You can roughly think of the evolution in three stages:&lt;/p&gt;
&lt;h3 id=&#34;1-traditional-feature-engineering&#34;&gt;1. Traditional feature engineering
&lt;/h3&gt;&lt;p&gt;At this stage, the focus was on manually defined image features such as edges, textures, corners, and local descriptors. The advantage was maturity and interpretability. The weakness was limited semantic understanding in more complex scenes.&lt;/p&gt;
&lt;h3 id=&#34;2-cnn-driven-stage&#34;&gt;2. CNN-driven stage
&lt;/h3&gt;&lt;p&gt;Convolutional neural networks moved image vectorization into a stage where features could be learned automatically. Compared with handcrafted features, they could capture richer and more stable visual representations for classification, recognition, and similarity search.&lt;/p&gt;
&lt;h3 id=&#34;3-transformer-and-multimodal-stage&#34;&gt;3. Transformer and multimodal stage
&lt;/h3&gt;&lt;p&gt;This stage pushed image vectorization beyond visual features alone toward image-text semantic alignment. &lt;code&gt;ViT&lt;/code&gt; applied the Transformer architecture to images, and &lt;code&gt;CLIP&lt;/code&gt; trained an image encoder and a text encoder together so that matching images and captions land close to each other in one vector space. As a result, images can enter larger multimodal systems and work together with text, labels, and knowledge bases.&lt;/p&gt;
&lt;p&gt;That is why many image retrieval systems today are no longer limited to image-to-image search. They can already support text-to-image search or mixed image-text retrieval.&lt;/p&gt;
&lt;h2 id=&#34;5-the-most-common-application-scenarios&#34;&gt;5. The most common application scenarios
&lt;/h2&gt;&lt;p&gt;Image vectorization is not only for academic research. It is highly practical in real systems.&lt;/p&gt;
&lt;h3 id=&#34;1-similar-image-retrieval&#34;&gt;1. Similar image retrieval
&lt;/h3&gt;&lt;p&gt;This is the most intuitive use case.&lt;/p&gt;
&lt;p&gt;Once images are converted into vectors, systems can do:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;image-to-image search&lt;/li&gt;
&lt;li&gt;duplicate image detection&lt;/li&gt;
&lt;li&gt;similar product matching&lt;/li&gt;
&lt;li&gt;visual deduplication&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is common in e-commerce, content platforms, and media asset systems.&lt;/p&gt;
&lt;h3 id=&#34;2-recommendation-systems&#34;&gt;2. Recommendation systems
&lt;/h3&gt;&lt;p&gt;Many recommendation problems come down to one question: is this image similar to what the user has just viewed?&lt;/p&gt;
&lt;p&gt;After vectorization, the content of the image itself can become part of the recommendation logic, instead of relying only on text labels or manual categories. That is valuable for product recommendation, content recommendation, and ad matching.&lt;/p&gt;
&lt;h3 id=&#34;3-image-clustering-and-automatic-classification&#34;&gt;3. Image clustering and automatic classification
&lt;/h3&gt;&lt;p&gt;When image collections become large, manual organization becomes slow.&lt;/p&gt;
&lt;p&gt;With vectorization, images can first be grouped by similarity, and then used for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;archiving&lt;/li&gt;
&lt;li&gt;scene grouping&lt;/li&gt;
&lt;li&gt;material organization&lt;/li&gt;
&lt;li&gt;automatic tag suggestions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is common in manufacturing, healthcare, education, and media content management.&lt;/p&gt;
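&lt;p&gt;Grouping by similarity can be sketched with a very simple single-pass method, sometimes called leader clustering: each image joins the nearest existing group if it is close enough, otherwise it starts a new group. This is a stand-in chosen for brevity, not a recommendation over methods such as k-means, and the file names, vectors, and threshold are all invented for the example.&lt;/p&gt;

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def leader_clusters(vectors, threshold):
    """Single pass: join the nearest leader if within threshold, else start a group."""
    leaders = []  # the first vector of each group
    groups = []   # item ids per group, in the same order as leaders
    for item_id, vec in vectors.items():
        if leaders:
            dist, idx = min(
                (l2_distance(vec, leader), i) for i, leader in enumerate(leaders)
            )
            if threshold >= dist:
                groups[idx].append(item_id)
                continue
        leaders.append(vec)
        groups.append([item_id])
    return groups

# Hypothetical 2-dimensional vectors, for illustration only.
vectors = {
    "beach_1.jpg": [0.9, 0.1],
    "beach_2.jpg": [0.85, 0.15],
    "forest_1.jpg": [0.1, 0.9],
    "forest_2.jpg": [0.2, 0.8],
}
groups = leader_clusters(vectors, threshold=0.3)
print(groups)
```

&lt;p&gt;The result depends on the input order and on the threshold, so in practice the groups are usually treated as suggestions for a human or a later step to confirm.&lt;/p&gt;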
&lt;h3 id=&#34;4-anomaly-detection-and-quality-inspection&#34;&gt;4. Anomaly detection and quality inspection
&lt;/h3&gt;&lt;p&gt;If normal samples consistently map to a compact region of the vector space, images that fall far outside that region become easier to detect.&lt;/p&gt;
&lt;p&gt;Typical examples include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;industrial defect detection&lt;/li&gt;
&lt;li&gt;surveillance anomaly recognition&lt;/li&gt;
&lt;li&gt;abnormal document or imaging screening&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here, vectorization does not directly produce the final judgment. It turns the image into input that is easier to compare and model.&lt;/p&gt;
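&lt;p&gt;One simple way to use vectors as input for such a judgment is to measure how far a new image lies from the center of the normal samples. The sketch below assumes a cutoff of the mean distance plus three standard deviations; that rule, the function names, and the 2-dimensional vectors are illustrative assumptions, and a production system would tune the decision rule on real data.&lt;/p&gt;

```python
import math
import statistics

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [statistics.fmean(component) for component in zip(*vectors)]

def fit_threshold(normal_vectors, k=3.0):
    """Return (center, cutoff), with cutoff = mean distance + k * stdev."""
    center = centroid(normal_vectors)
    dists = [l2_distance(v, center) for v in normal_vectors]
    return center, statistics.fmean(dists) + k * statistics.pstdev(dists)

def is_anomalous(vec, center, cutoff):
    """Flag a vector that lies farther from the center than the cutoff."""
    return l2_distance(vec, center) > cutoff

# Hypothetical vectors of defect-free samples.
normal = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05], [1.05, 0.95]]
center, cutoff = fit_threshold(normal)

flag_ok = is_anomalous([1.02, 0.98], center, cutoff)
flag_defect = is_anomalous([3.0, 0.2], center, cutoff)
print("typical sample flagged:", flag_ok)
print("distant sample flagged:", flag_defect)
```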
&lt;h3 id=&#34;5-multimodal-retrieval-and-image-text-understanding&#34;&gt;5. Multimodal retrieval and image-text understanding
&lt;/h3&gt;&lt;p&gt;This is one of the more important areas today.&lt;/p&gt;
&lt;p&gt;When images and text can both be encoded into a shared vector space, systems can support:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;text-to-image search&lt;/li&gt;
&lt;li&gt;image-text alignment&lt;/li&gt;
&lt;li&gt;image content retrieval&lt;/li&gt;
&lt;li&gt;multimodal knowledge retrieval&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These capabilities connect naturally with many current generative AI systems, visual question answering pipelines, and enterprise retrieval-augmented workflows.&lt;/p&gt;
&lt;h2 id=&#34;6-what-enterprises-actually-have-to-deal-with&#34;&gt;6. What enterprises actually have to deal with
&lt;/h2&gt;&lt;p&gt;Image vectorization sounds straightforward in theory. In practice, the hard part is rarely the concept itself. It is dealing with details such as these:&lt;/p&gt;
&lt;h3 id=&#34;1-how-to-balance-vector-dimensionality-and-cost&#34;&gt;1. How to balance vector dimensionality and cost
&lt;/h3&gt;&lt;p&gt;If the vector is too small, representation may be weak. If it is too large, storage and retrieval costs rise. There is no single universal answer. It depends on data size, response time, and target accuracy.&lt;/p&gt;
&lt;h3 id=&#34;2-whether-a-model-can-generalize-across-scenarios&#34;&gt;2. Whether a model can generalize across scenarios
&lt;/h3&gt;&lt;p&gt;A model that works well on public datasets may not work equally well on your own images. Product photos, industrial images, medical images, and surveillance screenshots differ a lot in distribution, so they often need separate evaluation.&lt;/p&gt;
&lt;h3 id=&#34;3-whether-the-retrieval-system-can-scale&#34;&gt;3. Whether the retrieval system can scale
&lt;/h3&gt;&lt;p&gt;When the number of images grows from tens of thousands to millions or tens of millions, vector generation is only the first half. Index design, recall strategy, update behavior, and online query performance become the factors that really define the user experience.&lt;/p&gt;
&lt;h3 id=&#34;4-vectorization-is-not-the-business-loop-by-itself&#34;&gt;4. Vectorization is not the business loop by itself
&lt;/h3&gt;&lt;p&gt;This is easy to overlook.&lt;/p&gt;
&lt;p&gt;Vectorization solves the problem of turning images into computable objects, but it is not a complete solution by itself. After that, you still need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;retrieval logic&lt;/li&gt;
&lt;li&gt;a label system&lt;/li&gt;
&lt;li&gt;evaluation criteria&lt;/li&gt;
&lt;li&gt;human review processes&lt;/li&gt;
&lt;li&gt;integration with business systems&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Without those pieces, vectors alone do not automatically create value.&lt;/p&gt;
&lt;h2 id=&#34;7-how-to-think-about-its-real-value&#34;&gt;7. How to think about its real value
&lt;/h2&gt;&lt;p&gt;From a purely technical perspective, image vectorization can sound like a low-level detail. From a business perspective, though, its value is quite concrete:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;it makes images searchable by their content, not only by text labels or manual categories&lt;/li&gt;
&lt;li&gt;it moves similarity comparison from the pixel layer to the semantic layer&lt;/li&gt;
&lt;li&gt;it allows images to enter recommendation, retrieval, clustering, and recognition pipelines&lt;/li&gt;
&lt;li&gt;it turns visual data into something that can participate in analytics and automation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can think of it as the standard entry point for visual data into AI systems. Without it, many image-related capabilities stay at the file-management level. With it, images begin to turn into data assets that can support decision-making and automation.&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;Image vectorization is not an isolated trick. It is a very basic layer in modern vision systems.&lt;/p&gt;
&lt;p&gt;What it does is not mysterious: it turns images from collections of pixels into vector representations that can be searched, compared, and analyzed. But that one step determines whether images can really enter AI, search, recommendation, and multimodal application pipelines.&lt;/p&gt;
&lt;p&gt;If you want to remember just one sentence, make it this:&lt;/p&gt;
&lt;p&gt;the essence of image vectorization is not image compression, but turning images into a representation that machines can actually use.&lt;/p&gt;
</description>
        </item>
        
    </channel>
</rss>
