<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Computer Use on KnightLi Blog</title>
        <link>https://www.knightli.com/en/tags/computer-use/</link>
        <description>Recent content in Computer Use on KnightLi Blog</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Wed, 29 Apr 2026 11:28:25 +0800</lastBuildDate><atom:link href="https://www.knightli.com/en/tags/computer-use/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>Codex Is Starting to Control the Computer. What Does That Mean for the Future?</title>
        <link>https://www.knightli.com/en/2026/04/29/codex-computer-use-update/</link>
        <pubDate>Wed, 29 Apr 2026 11:28:25 +0800</pubDate>
        
        <guid>https://www.knightli.com/en/2026/04/29/codex-computer-use-update/</guid>
        <description>&lt;p&gt;The most important part of this Codex update is not that it added another ordinary button. It is that Codex is starting to move toward &amp;ldquo;controlling the computer.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;In the past, using AI usually meant asking questions in a chat box, copying, pasting, and then manually operating software.&lt;br&gt;
Now that boundary is expanding: AI does not just answer you. It can operate desktop applications according to your goal.&lt;/p&gt;
&lt;p&gt;In the short term, this is a new feature. In the long term, it may change how many people use computers.&lt;/p&gt;
&lt;h2 id=&#34;what-this-feature-is&#34;&gt;What This Feature Is
&lt;/h2&gt;&lt;p&gt;Simply put, Codex&amp;rsquo;s computer use capability lets it access and operate the desktop environment.&lt;/p&gt;
&lt;p&gt;It can do things such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;select and control an application&lt;/li&gt;
&lt;li&gt;receive tasks in natural language&lt;/li&gt;
&lt;li&gt;open browsers, AI tools, local files, or other software&lt;/li&gt;
&lt;li&gt;enter text, click buttons, and wait for results&lt;/li&gt;
&lt;li&gt;connect multiple steps into one task&lt;/li&gt;
&lt;li&gt;keep running in the background without requiring the user to follow every step manually&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Its role is not just to write a piece of text for you, but to complete an operation flow for you.&lt;/p&gt;
&lt;p&gt;That is the key difference between an Agent and an ordinary chatbot:&lt;br&gt;
a chatbot mainly gives answers; an Agent is closer to &amp;ldquo;receiving a goal and then executing it.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;why-this-matters&#34;&gt;Why This Matters
&lt;/h2&gt;&lt;p&gt;In the past, much automation required you to know how to write scripts.&lt;/p&gt;
&lt;p&gt;For example, suppose you want to complete a cross-software workflow:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;open a web page&lt;/li&gt;
&lt;li&gt;find information&lt;/li&gt;
&lt;li&gt;copy content&lt;/li&gt;
&lt;li&gt;pass it to another AI tool&lt;/li&gt;
&lt;li&gt;save a file&lt;/li&gt;
&lt;li&gt;open the local directory and check the result&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To automate this traditionally, you might need browser scripts, APIs, local programs, and even window automation.&lt;/p&gt;
&lt;p&gt;But many ordinary users do not know how to write these things.&lt;br&gt;
Even if they do, it may not be worth writing a script for a temporary task.&lt;/p&gt;
&lt;p&gt;This is where computer use matters: it pushes &amp;ldquo;script-like capability&amp;rdquo; toward natural language.&lt;/p&gt;
&lt;p&gt;You do not necessarily need to tell it exactly where to click.&lt;br&gt;
You can tell it what result you want and let it try to complete the task.&lt;/p&gt;
&lt;h2 id=&#34;workflows-it-may-change&#34;&gt;Workflows It May Change
&lt;/h2&gt;&lt;p&gt;I think the first workflows to change will not be extremely serious or high-risk work, but the tasks that are annoying, fragmented, repetitive, and not worth writing a dedicated program for.&lt;/p&gt;
&lt;h3 id=&#34;1-moving-information-across-software&#34;&gt;1. Moving Information Across Software
&lt;/h3&gt;&lt;p&gt;The most typical case is moving information between applications.&lt;/p&gt;
&lt;p&gt;Previously, you might switch back and forth between a browser, a document, a chat window, and a local folder.&lt;br&gt;
In the future, you can hand this kind of task to an Agent:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;find a certain kind of information&lt;/li&gt;
&lt;li&gt;summarize it into a document&lt;/li&gt;
&lt;li&gt;save it to a specified directory&lt;/li&gt;
&lt;li&gt;open the result for you to review&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This work is not hard, but it consumes attention.&lt;br&gt;
The value of an Agent is that it absorbs these small operations.&lt;/p&gt;
&lt;h3 id=&#34;2-coordination-between-multiple-ai-tools&#34;&gt;2. Coordination Between Multiple AI Tools
&lt;/h3&gt;&lt;p&gt;Many people&amp;rsquo;s real workflow is no longer based on a single AI tool.&lt;/p&gt;
&lt;p&gt;It may look like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;one tool writes code&lt;/li&gt;
&lt;li&gt;one tool researches information&lt;/li&gt;
&lt;li&gt;one tool generates images&lt;/li&gt;
&lt;li&gt;one tool organizes documents&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Previously, these tools were connected by manual copy and paste.&lt;br&gt;
In the future, an Agent can become the middle layer: it opens tools, passes context, waits for output, and organizes results.&lt;/p&gt;
&lt;p&gt;This can turn &amp;ldquo;multiple AI tools working together&amp;rdquo; from a manual process into a semi-automated process.&lt;/p&gt;
&lt;h3 id=&#34;3-office-software-automation&#34;&gt;3. Office Software Automation
&lt;/h3&gt;&lt;p&gt;Spreadsheets, presentations, documents, and email share one trait: they are powerful, but many operations are fragmented.&lt;/p&gt;
&lt;p&gt;If Agents can reliably control this software, the barrier to office automation will drop noticeably.&lt;/p&gt;
&lt;p&gt;You do not need to remember where a menu is or learn complicated shortcuts.&lt;br&gt;
You only need to describe the goal, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;turn this spreadsheet into a monthly report&lt;/li&gt;
&lt;li&gt;make a one-page summary from this document&lt;/li&gt;
&lt;li&gt;combine these materials into a clearly structured explanation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The tedious button operations will gradually be hidden behind natural language.&lt;/p&gt;
&lt;h2 id=&#34;what-it-means-for-ordinary-users&#34;&gt;What It Means for Ordinary Users
&lt;/h2&gt;&lt;p&gt;For ordinary users, this kind of feature may have a more direct impact than &amp;ldquo;the model got a bit smarter.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Because it lowers the operation barrier, not just the knowledge barrier.&lt;/p&gt;
&lt;p&gt;Many people can describe what they want, but they do not know where to click or how to combine features inside software.&lt;br&gt;
If Agents can take over this part, using a computer may become:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;I describe the goal
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Agent operates the software
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;I check the result
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;That is closer to real productivity than simple chat.&lt;/p&gt;
&lt;h2 id=&#34;its-impact-on-software&#34;&gt;Its Impact on Software
&lt;/h2&gt;&lt;p&gt;If this kind of Agent capability continues to mature, software itself will also be affected.&lt;/p&gt;
&lt;p&gt;In the past, software design mainly served human clicking.&lt;br&gt;
In the future, software may also need to serve Agent operation.&lt;/p&gt;
&lt;p&gt;This means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;interface elements need to be clearer&lt;/li&gt;
&lt;li&gt;operation feedback needs to be more stable&lt;/li&gt;
&lt;li&gt;local permissions need to be more granular&lt;/li&gt;
&lt;li&gt;software may provide interfaces better suited for Agent calls&lt;/li&gt;
&lt;li&gt;users may care more about whether software can be operated smoothly by AI&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the long run, the boundaries between applications may become thinner.&lt;br&gt;
Users may care less about &amp;ldquo;which app should I open&amp;rdquo; and more about &amp;ldquo;what task do I want to complete.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;do-not-overhype-it-yet&#34;&gt;Do Not Overhype It Yet
&lt;/h2&gt;&lt;p&gt;Of course, it is not time to fully let go yet.&lt;/p&gt;
&lt;p&gt;This kind of capability still has several clear limitations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;stability still needs observation&lt;/li&gt;
&lt;li&gt;complex tasks may fail in the middle&lt;/li&gt;
&lt;li&gt;permission boundaries must be handled carefully&lt;/li&gt;
&lt;li&gt;account, payment, and file deletion operations should not be delegated casually&lt;/li&gt;
&lt;li&gt;quota consumption is not something you can completely ignore&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So at this stage, the best use case is not letting it take over the whole computer, but letting it handle low-risk, reviewable, step-heavy tasks.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;organizing materials&lt;/li&gt;
&lt;li&gt;generating drafts&lt;/li&gt;
&lt;li&gt;moving content across tools&lt;/li&gt;
&lt;li&gt;opening and checking files&lt;/li&gt;
&lt;li&gt;running semi-automated workflows that can be reviewed by a human&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;one-last-line&#34;&gt;One Last Line
&lt;/h2&gt;&lt;p&gt;The real importance of this Codex update is that it pushes AI from &amp;ldquo;answering questions&amp;rdquo; toward &amp;ldquo;operating the environment.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;In the short term, it is a computer use feature.&lt;br&gt;
In the long term, it may mark a shift in how personal computers are used.&lt;/p&gt;
&lt;p&gt;In the future, we may spend less time remembering buttons, finding menus, and switching windows.&lt;br&gt;
More often, we will describe the goal, let an Agent execute it, and then let humans make the final judgment.&lt;/p&gt;
</description>
        </item>
        
    </channel>
</rss>
