<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>HDD on KnightLi Blog</title>
        <link>https://www.knightli.com/en/tags/hdd/</link>
        <description>Recent content in HDD on KnightLi Blog</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Sat, 16 May 2026 21:02:33 +0800</lastBuildDate><atom:link href="https://www.knightli.com/en/tags/hdd/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>Why AI Data Centers Are Driving HDD Demand Again</title>
        <link>https://www.knightli.com/en/2026/05/16/ai-data-center-hdd-storage-demand/</link>
        <pubDate>Sat, 16 May 2026 21:02:33 +0800</pubDate>
        
        <guid>https://www.knightli.com/en/2026/05/16/ai-data-center-hdd-storage-demand/</guid>
        <description>&lt;p&gt;Over the past two years, most AI infrastructure discussions have focused on GPUs, HBM, advanced packaging, and power supply. Behind training and inference systems, however, there is another bottleneck that is easier to overlook: storage.&lt;/p&gt;
&lt;p&gt;A large model does not finish its work with a single computation inside a GPU. During training, it continuously produces checkpoints, optimizer states, training logs, dataset versions, and intermediate results. During inference, it also generates user interaction records, compliance archives, audit data, and system logs. These datasets do not always need to sit on the fastest media, but they often cannot be deleted immediately.&lt;/p&gt;
&lt;p&gt;That is why hard drives are becoming important again.&lt;/p&gt;
&lt;h2 id=&#34;ai-training-creates-massive-cold-data&#34;&gt;AI Training Creates Massive Cold Data
&lt;/h2&gt;&lt;p&gt;Large model training needs to save checkpoints regularly. A checkpoint is essentially a saved state of the training process: if a training run crashes halfway through, the system can resume from a checkpoint instead of starting over.&lt;/p&gt;
&lt;p&gt;For a large model, a single checkpoint can be several terabytes. A full training run may last weeks or even months, producing many checkpoints along the way. Even if some are later cleaned up, experiment replay, reproducibility, rollback, and model audits still require large amounts of data to be retained.&lt;/p&gt;
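&lt;p&gt;As a rough back-of-envelope illustration (the parameter count, precision, and optimizer layout below are assumptions for the sketch, not figures from any specific model), the checkpoint sizes mentioned above can be estimated like this:&lt;/p&gt;

```python
# Hedged back-of-envelope estimate of checkpoint size for a dense model
# trained with an Adam-style optimizer. All numbers are assumptions:
#   - 400e9 parameters (a model in the hundreds of billions of parameters)
#   - bf16 weights saved in the checkpoint: 2 bytes per parameter
#   - fp32 master weights plus two fp32 Adam moments: 4 + 4 + 4 bytes
params = 400e9
bytes_per_param = 2 + 4 + 4 + 4   # weights + master copy + Adam m and v
checkpoint_tb = params * bytes_per_param / 1e12   # decimal terabytes

# If, say, 50 checkpoints from a run are retained for replay and audits:
retained = 50
total_tb = checkpoint_tb * retained

print(f"one checkpoint: {checkpoint_tb:.1f} TB, {retained} retained: {total_tb:.0f} TB")
# prints: one checkpoint: 5.6 TB, 50 retained: 280 TB
```

&lt;p&gt;Even with much smaller assumed numbers, the same arithmetic quickly lands in the tens-of-terabytes range per run, which is why checkpoint archives end up on capacity-oriented storage.&lt;/p&gt;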
&lt;p&gt;Training data itself is also expanding. High-quality text, images, videos, and code need to be cleaned, deduplicated, split, and versioned. As synthetic data, reinforcement learning data, and multimodal data become part of training pipelines, storage pressure will keep increasing.&lt;/p&gt;
&lt;p&gt;This kind of data has several traits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It is enormous in volume;&lt;/li&gt;
&lt;li&gt;It is not always accessed frequently;&lt;/li&gt;
&lt;li&gt;It needs long-term retention;&lt;/li&gt;
&lt;li&gt;It is highly sensitive to cost per unit of capacity.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It does not make sense to store all of this data on expensive high-speed media.&lt;/p&gt;
&lt;h2 id=&#34;why-not-use-only-ssds&#34;&gt;Why Not Use Only SSDs
&lt;/h2&gt;&lt;p&gt;SSDs are obviously faster, but data centers cannot optimize only for speed. For petabyte-scale cold data and anything beyond that, cost per unit of capacity directly determines whether the system is sustainable.&lt;/p&gt;
&lt;p&gt;Storage in an AI cluster can be divided into several tiers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;HBM and GPU memory handle the hottest and most urgent data;&lt;/li&gt;
&lt;li&gt;DRAM handles temporary movement and staging;&lt;/li&gt;
&lt;li&gt;SSDs handle frequently accessed data with stronger low-latency requirements;&lt;/li&gt;
&lt;li&gt;HDDs handle massive cold data, backups, logs, checkpoint archives, and long-term retention.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, SSDs are important, but they cannot replace every tier. Truly large-scale systems usually need tiered storage: hot data prioritizes speed, while cold data prioritizes capacity, cost, and reliability.&lt;/p&gt;
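&lt;p&gt;The capacity/cost trade-off behind that tiering can be sketched with placeholder prices (the dollars-per-terabyte figures below are illustrative assumptions, not quotes from any vendor):&lt;/p&gt;

```python
# Illustrative cost comparison for keeping 10 PB of cold data on HDD vs SSD.
# Prices are placeholder assumptions for the sketch, not real quotes:
hdd_usd_per_tb = 15    # assumed nearline HDD capacity price
ssd_usd_per_tb = 50    # assumed data-center SSD capacity price

capacity_tb = 10_000   # 10 PB of cold data
hdd_cost = capacity_tb * hdd_usd_per_tb
ssd_cost = capacity_tb * ssd_usd_per_tb

print(f"HDD: ${hdd_cost:,}  SSD: ${ssd_cost:,}  difference: ${ssd_cost - hdd_cost:,}")
# prints: HDD: $150,000  SSD: $500,000  difference: $350,000
```

&lt;p&gt;Under these assumed prices the gap is already more than 3x at 10 PB; at exabyte scale, the same ratio is the difference between a sustainable archive tier and one that is not.&lt;/p&gt;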
&lt;p&gt;As AI companies start retaining training artifacts, model versions, synthetic data, inference logs, and audit records for longer periods, the value of HDDs becomes more visible again.&lt;/p&gt;
&lt;h2 id=&#34;why-hdd-supply-is-getting-tight&#34;&gt;Why HDD Supply Is Getting Tight
&lt;/h2&gt;&lt;p&gt;The hard drive market has not looked especially exciting for years, and consumer PCs have increasingly shifted to SSDs. Data centers, however, follow a different demand logic.&lt;/p&gt;
&lt;p&gt;Cloud providers and AI companies need high-capacity nearline drives with predictable delivery and low cost per terabyte. For hard drive vendors, these customers usually sign long-term supply agreements and receive higher priority than fragmented consumer channels.&lt;/p&gt;
&lt;p&gt;That leads to several effects:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Capacity for high-capacity enterprise drives is locked in early by large customers.&lt;/li&gt;
&lt;li&gt;Consumer hard drives and ordinary retail channels receive less supply.&lt;/li&gt;
&lt;li&gt;New capacity takes time to come online, so short-term shortages are hard to fix quickly.&lt;/li&gt;
&lt;li&gt;Hard drives shift from overlooked commodity hardware to a strategic part of AI infrastructure.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;More importantly, the hard drive industry itself is already highly concentrated. There are only a few mainstream suppliers, and ramping production of advanced high-capacity drives is not as simple as building more factories. Technologies such as HAMR can increase capacity per drive, but moving from initial mass production to stable, large-scale delivery still takes time.&lt;/p&gt;
&lt;h2 id=&#34;storage-price-increases-can-reach-consumers&#34;&gt;Storage Price Increases Can Reach Consumers
&lt;/h2&gt;&lt;p&gt;AI data centers are not only absorbing GPUs and power. They can also affect the storage supply chain.&lt;/p&gt;
&lt;p&gt;When more enterprise SSD, memory, and HDD capacity flows toward cloud providers and AI infrastructure, the consumer market may begin to feel price pressure. Higher retail prices for SSDs, memory, or hard drives are not always just retail volatility. They may come from upstream capacity being reallocated.&lt;/p&gt;
&lt;p&gt;This effect is usually not linear. Large customers sign long-term agreements with more stable pricing, delivery, and capacity planning. Consumers are more exposed to spot-market fluctuations. The result is a familiar pattern: rising AI data center demand eventually makes storage devices more expensive for ordinary buyers too.&lt;/p&gt;
&lt;h2 id=&#34;the-investment-view-requires-more-caution&#34;&gt;The Investment View Requires More Caution
&lt;/h2&gt;&lt;p&gt;AI-driven storage demand is real, but that does not mean every storage-related company will benefit over the long term.&lt;/p&gt;
&lt;p&gt;Hard drives and flash memory still have cyclical characteristics. Rising prices, tight capacity, and long-term customer contracts can improve short-term performance. But once new capacity comes online or demand growth slows, the industry may return to supply-demand rebalancing. For hardware companies, the most important questions are not about one price increase, but whether demand can persist, margins can improve, capacity expansion becomes excessive, and the customer mix remains healthy.&lt;/p&gt;
&lt;p&gt;A steadier interpretation is that AI is changing the demand structure of the storage industry. In the past, outsiders paid more attention to compute. Now more costs are shifting toward data retention, data governance, and model lifecycle management.&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;AI does not only consume compute. It also keeps producing data.&lt;/p&gt;
&lt;p&gt;GPUs handle computation, HBM feeds data at high speed, SSDs support hot data access, and hard drives carry the enormous cold data base. As long as large model training, synthetic data, inference logs, and compliance retention continue to grow, data centers will need large amounts of low-cost, high-capacity storage media.&lt;/p&gt;
&lt;p&gt;Hard drives may not look like the star hardware of the AI era, but they are becoming an indispensable layer of AI infrastructure. The more advanced the model, the more it depends on massive storage systems. The more expensive the compute, the more it needs reliable checkpoints and archives to protect the cost already invested.&lt;/p&gt;
</description>
        </item>
        
    </channel>
</rss>
