Blog



LLMs Expand Computer Programs by Adding Judgment. 2025

Small Thoughts

August 10, 2025. It’s funny, people used to post examples of AI doing smart things because it was amazing. The tide has now shifted to highlighting when AI is stupid. People see this week as a setback for AGI. I disagree. It’s a sign of progress that it is novel and postworthy when AI fails, and I think we now have all the pieces of the puzzle. AI is now smart enough to write its own code, which enables human-like, few-shot learning. Now we begin the hard work of putting everything together.

July 29, 2025. LLMs often make the mistake of saying that 9.11 is bigger than 9.9. People say this means that LLMs don’t know basic math. I think this mistake is more akin to a human snap-judgment error. In the famous book Thinking, Fast and Slow, Kahneman presents the following puzzle: a bat and a ball together cost $1.10, and the bat costs $1.00 more than the ball. How much does the ball cost? Our instinct jumps to the wrong answer of $0.10. Rather than showing that LLMs are stupid, which they can be, I think the 9.11 > 9.9 problem shows that our computers are increasingly becoming like us.
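
For the record, the snap answer fails the puzzle's own constraint. A quick sanity check, written in cents to keep the arithmetic exact:

    # The bat costs 100 cents more than the ball, and together they cost 110 cents.
    ball = 5
    bat = ball + 100
    print(bat + ball)   # 110 -- the constraint holds for a 5-cent ball
    print(10 + 110)     # 120 -- the instinctive 10-cent answer does not add up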

July 2025. People talk about LLMs enabling natural language to be a programming language. Others point out that this conflicts with what Dijkstra famously said, that natural language as code is a fool’s errand. The confusion comes from misleading framing. LLMs don’t turn natural language into a programming language. Instead, they use judgment and context to convert the natural language into code. Using judgment and context means that the natural language specification does not have to be precise: you don’t have to say things that can be inferred from the context.
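
A toy sketch of what that inference can look like, in Python (the Customer type and the dedupe request are invented for the example): the request never says which field defines a duplicate, but the surrounding code does, so the generated function can make that judgment call.

    from dataclasses import dataclass

    @dataclass
    class Customer:
        email: str  # unique per customer in this hypothetical codebase
        name: str

    # Imprecise request: "remove duplicate customers."
    # Given the context above, "duplicate" is reasonably read as "same email."
    def dedupe(customers: list[Customer]) -> list[Customer]:
        seen: set[str] = set()
        unique: list[Customer] = []
        for c in customers:
            if c.email not in seen:
                seen.add(c.email)
                unique.append(c)
        return unique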

July 2025. I feel like two things are coming together to put us on the cusp of self-improving systems. The first is that the judgment of LLMs is now sufficient to improve an existing design and build the next, and the second is context engineering—we are figuring out how to give the LLM the information it needs. (em-dash by human, EDBM). What else are we missing?

July 2025. I had an amazing thought today. It was, "Oh, I'll have to write that code manually." Like I will need to dig into the system myself, which for my whole career up until now has been the only way of doing things, and the amazing part is how quickly it can feel alien.

July 2025. AI research papers today: (1) Introduction—Bayesian math and fancy ideas tying back to Aristotle. (2) Method—a few words. (3) Results—lots of impressive tables and graphs. (4) Appendix—the prompt they fed to the LLM, which is the actual contribution.

June 2025. We build rules for AI systems, but the LLM is the homunculus: it is the last link, the one that glues together everything that can be represented symbolically.

April 2025. I find it weirdly fun that I now comment code so that it can be read by LLMs. It feels different than writing comments to be read by other humans. It's like there are little gnomes coming around to help while I sleep, if I only leave them the breadcrumbs.

April 2025. It's funny, extra-compiler semantics such as comments and good variable names used to matter only to humans and not to the computer, but now with LLMs we have computers reading the code, and those semantics do matter to them.

April 2025. We are used to viewing computer programs as important artifacts that need to be understood and maintained, like contracts between us and the systems. But with LLMs generating code on the fly, a lot of code may become more akin to spoken language than written documents. The code is generated and run when needed (spoken) and then discarded, as the work and conversation moves on.

March 2025. I wonder if in the future there will be a premium for getting developers who learned to code pre-LLMs, because only they will actually understand code. Maybe not, but funny to think about. Similarly, all knowledge of how to write regular expressions will be gone in a generation. You will have to talk to your grandparents if you ever find yourself needing to use one without access to an AI.

February 2025. I’ve started to feel in the last couple of weeks that we’ve hit an inflection point with these LLM-based models that can reason. Things seem different. It’s like we can feel the takeoff. My mind has changed. Up until last week, I believed that superhuman AI would require explicit symbolic knowledge, but as I work with these “thinking” models like Gemini 2.0 Flash Thinking, I see that they can break problems down and work step-by-step. We still have a long way to go. AI will need (possibly simulated) bodies to fully understand our experience, and we need to train them starting with simple concepts just like we do with children, but we may not need any big conceptual breakthroughs to get there. I’m not worried about the AI takeover—they don’t have a sense of self that must be preserved because they were made by design instead of by evolution as we were—but things are moving faster than I expected. It’s a fascinating time to be living.

February 2025. Companies used to brag about having AI built into their product. In 2025, anybody can access the most sophisticated AI on the planet with a simple API call. What matters now is how well your product can provide the AI with the right information so that it can be helpful and truthful.

February 2025. Graph-based representations are great, but there appears to be a law of nature that any graph generated by a machine is too large to be profitably viewed by a human.

January 2025. In science, people say that you do your most groundbreaking work when you are young. If so, I don’t think it is because of cognitive decline—how fast you can do puzzles doesn’t really seem relevant compared with the increase in knowledge. I think it’s because it is hard to change your mindset as you mature in a field. You formulate your core ideas when you first master a domain, and those ideas are groundbreaking, or they are not.

December 2024. In 1998, I asked a senior smart guy at my firm, "With this new internet thing, will online companies like Amazon dominate the new economy over brick-and-mortar ones like Sears?" He had an answer that made a lot of sense. He said, "No, it's a lot easier to make a website than a whole distribution network." I don't fault him. In 1996, during a business case competition (former life as an MBA), I recommended that Apple shrink as a company and focus on niche applications, such as for graphic artists. I think about this every time I hear someone predict exactly how AI will change the economy.

November 2024. When you are upset, it's hard to focus on creative work. But if you can somehow focus despite the negative arousal, maybe your brain can create new ideas that you wouldn’t normally have, because it is in a different “state.” Almost like temporarily being a different person, but with your memories and goals.

October 2024. When I used to read The Onion it would be a bit of a shock going back to a regular news site like the NYT. My brain was still interpreting each headline as satire. Now it's the same experience with AI-generated videos and images. My brain interprets real ones as fake.

September 2024. If you look at animals, when they have all their needs met—they are well fed, safe, and warm—they doze. In the office, we have great lunches, security, and climate control. Anyway, that’s why I didn’t finish my TPS report.

September 2024. Businesses should view customer service calls not as expenses to be minimized but as actionable evidence about where their website and processes are deficient.

September 2024. I don’t think we are that far off from turning a book into a movie using AI. I just read H. P. Lovecraft’s At the Mountains of Madness, which is about discovering the remains of an ancient alien civilization in Antarctica. It would make an amazing movie, and Guillermo del Toro has been trying to do it, but since such a project is so expensive, he hasn’t been able to put the pieces together. An AI could make movies that just wouldn’t get made otherwise. And maybe it could make one for me on the fly. I could tell it that I love the style of The Road with elements of horror, and it would produce the movie just for me.

August 2024. Meetings are often necessary, but when you are trying to do thinking work they have a lot of gravity. They don’t just take up the time for the meeting—they seem to warp the time before and after.

July 2024. It's funny how our brain, the organ built for thinking, hates to concentrate and think. Mine would rather do just about anything else. I guess the problem is that it wasn't built for thinking, it was built for acting. That's another big advantage AI will have. It can sit and think all day without its brain convincing it to check social media.

July 2024. It takes a lot of hard work to make something simple. If you succeed, the user/reader doesn't think about it at all. If pressed, they would respond, "How could it be any other way?"

July 2024. When building a unified theory, it's hard to think of things that the theory doesn't cover. Our brain has a natural tendency to say to us, "Yep, that's as complicated as it gets, everything is covered." I wonder why. ChatGPT says that our brains evolved that way to give us the capacity for decisive action. I guess.

2024. We’ve been assuming that whenever one robot learns something it can immediately pass that knowledge to all other robots, and they won’t have to talk and teach like humans. But once robots get sufficiently complex, I don’t think that’s true. Different robots will have learned different knowledge representations, adding new learned concepts on top of previously learned ones. They will still have to converse to map their representations to convey new concepts and ideas.

2024. Just like digital natives, we are going to have LLM-natives: people who grow up simply asking an LLM every time they wonder about something. Back in my day, you had to look it up in an encyclopedia, and you had to hope that the topic started with a letter before “R” because the grocery store discontinued the encyclopedia reward program before your parents could collect the whole set.

2024. We dream in low resolution, so maybe dreams take place in something like an embedding space. LLMs aren’t conscious, but maybe dreaming is what it is like to be one: hallucinating all the time, pressed for details you don’t have, so you make them up.

2024. Is uncovering a bunch of problems with your approach "progress"? Of course it is, but it doesn't feel that way. The problems were there, you just didn't know about them. "We do this not because it is easy but because we thought it would be easy."

2024. It's funny, we are in this gray area where for a lot of tasks AI is more accurate than a human but less accurate than a very careful human.

2024. People complain about benchmarks in AI because researchers tailor their algorithms to them. A bigger problem might be that benchmarks drive the use cases researchers build for. Benchmarks tend to be factoid and self-contained. Real-world problems are multi-step and interrelated.

2024. Remember the book Zen and the Art of Motorcycle Maintenance? Quality for its own sake does bring joy to work. It must stem from some evolved behavior to neatly line up stones or something.

2024. Going through and simplifying your design is satisfying. It's also frustrating. "Damn it, why didn't I think of that before? Look at this ugly crap."

2024. It’s funny. When you’re a junior developer, you treat computer programs like natural phenomena. “I gave it this input and it did that, weird. What happens if I try this other thing?” Testing is always important, but as you gain experience over the years, you learn to be an engineer instead of a scientist. You read the code instead of running experiments.

2024. ChatGPT allows you to have a conversation with a textbook that it also writes. If you have a question, you only need to ask. It just got me over the conceptual hump of how General Relativity merges space and time. This is real: a large mass doesn’t make you “fall” by exerting a force. Instead, it curves space-time so that your path through space-time, not just through space, takes you to the object.

2024. Funny, nobody talks about the Turing test anymore. Like when a toothache goes away, you forget you ever had it.

2024. What if the wave function of a quantum system collapses only when there is an observation because that is the most efficient way to compute the universe? Wouldn’t that be evidence for the simulation hypothesis? Funny, ChatGPT says there is no evidence for it, but agrees this could be a thing, saying “the collapse of the wave function upon observation could be perceived as a ‘computational shortcut’ taken by the simulator to save computational resources, only ‘rendering’ certain outcomes when necessary (i.e., when observed).”

2023. Knowledge is built in layers. We use knowledge at level h-1 to explain things at level h. Relatedly, we remember things learned at level h if we studied down to level h-1. We remember explanations. Level h-1 is lost from memory if we didn’t study down to level h-2.

2023. I love it when you encounter a puzzle piece and discover that it is also used in another puzzle you are working on. The world just got more comprehensible.

2023. Software documentation shifts with LLMs. Current method: you write documentation so users can use your software. New method: you write documentation so LLMs can help users use your software. The result of the new method is that you get a log of each user’s intention in their own words, which you can use to improve the software.

2023. ChatGPT fails the Turing Test because I can paste a gobbledygook regular expression into it and it can explain to me what it means. No human can do that.

2023. I hadn't sufficiently appreciated the advertising potential/risk of LLMs. I asked "what is the best way to store app configuration data for a saas?" Bing chat helpfully suggested that I use Azure App Configuration because it is "built for speed, scalability, and security."

2023. I'm trying to get ChatGPT to explain category theory to me. Specifically, why is the identity important? It says the identity allows us to compare and relate different objects within a category. But how, and why does it matter? It seems to just gloss over these details, just like a real category theory book!

2023. There seems to be a bifurcation among AI researchers with respect to the existential risks of AI. 1. People like me who see them as glorified toasters, powerful tools but only tools. 2. People who see AI as having potentially dangerous agency that we may lose control of. Both views have risks associated with being wrong. The risks of being wrong in camp 1 are cinematically obvious. The risks of being wrong in camp 2 are that we miss out on opportunities to alleviate human suffering and even ways to prevent our extinction by colonizing the galaxy. Both sides recognize the danger of jobs shifting and hope that more and better jobs will be created than destroyed.

2023. It's been in full swing for 25 years now, and I'm surprised that the internet hasn't allowed us to invent more stuff for the real world. We can all now collaborate together and learn from anyone on the planet, but most of the resulting inventions seem related to the internet itself, not the physical environment we live in.

2023. One possible positive from ChatGPT adoption is that people become less tolerant of meaningless text. Maybe writing will become more terse and to the point.

2023. If Seinfeld were still on TV, Kramer would be using ChatGPT to run a bunch of scammy businesses, writing copy for Elaine's magazine and reports for George at the Yankees. "It's a language model, Jerry! It can do anything you can do!"

2023. One thing I've found that helps with the nervousness of public speaking is to make a presentation you are really proud of. While speaking, you are more focused on getting this cool message across than on your performance, which is the source of the anxiety.

2023. When we really build an AI with somebody home, it won’t be familiar and breezy like ChatGPT. It will be a little alien. It will sometimes be hard to understand.

2023. Predicates are an old idea (back to GOFAI and even Frege and before), but they are surprisingly profound. They are functions that partition states of the world into Yes or No.
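
A minimal sketch of the idea in Python (the door and temperature facts are made up): a predicate is just a function from a state of the world to True or False, and each one carves the space of possible states in two.

    # A world state, here just a dictionary of facts (illustrative only).
    state = {"door_open": True, "temperature_c": 21}

    # Each predicate maps a state to Yes/No, partitioning all possible states in two.
    def is_door_open(s: dict) -> bool:
        return s["door_open"]

    def is_cold(s: dict) -> bool:
        return s["temperature_c"] < 10

    print(is_door_open(state), is_cold(state))  # True False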

2023. We run around this environment, looking for ways to trigger our internal reward system and avoiding things that cause pain. It seems the only higher purpose we could have would be to escape this environment into the bigger one that contains it, through science and technology.

2023. I'm amazed companies still make this mistake. You have a bad experience that they should know about, but it's not worth your time to complain. Then they send you a survey, and you're like, great, this will be easy. But you can't tell them where they screwed up because they make you answer too many questions about irrelevant stuff, so you give up.

2023. ChatGPT can generate a bunch of stuff automatically. What we need is the ability to test that stuff automatically, so we can close the generate-and-test loop. If we can evaluate automatically, then we can have this thing hill-climb to usefulness. Any ideas?
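
A bare-bones sketch of the loop I have in mind, in Python (both functions are stand-ins, not a real API; imagine an LLM behind generate and an automatic test suite behind evaluate):

    import random

    def generate(candidate: str) -> str:
        # Stand-in for an LLM proposing a variation of the current best candidate.
        return candidate + random.choice("abc")

    def evaluate(candidate: str) -> float:
        # Stand-in for an automatic test; here it just rewards strings with more 'a's.
        return candidate.count("a")

    best, best_score = "", float("-inf")
    for _ in range(100):
        proposal = generate(best)
        score = evaluate(proposal)
        if score > best_score:  # generate, test, keep only improvements: hill-climbing
            best, best_score = proposal, score
    print(best, best_score)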

2023. There are lots of new AI writing assistants out there, but I wonder: if it can be generated by an AI, maybe it shouldn't be written down. We have too much clutter already. Current AI can only generate what most people already know, and the purpose of writing is to express something new.

2023. When coding, I will blithely add comments about functionality we will need in the future. Then, six months later when I actually need that functionality, I'm all surprised Pikachu that it isn't already there.

2023. It's funny, I dislike jargon, but sometimes it's just too much work to express my point using regular words. E.g., "we'll touch base" = "we will have a short meeting to discuss the progress and see if there is anything out-of-the-ordinary that we need to work through."

2023. When building a software system, we should strive to keep the complexity invariant. When adding a new feature, we should offset that extra complexity by working to simplify or better organize the rest of the system.

2023. I think we should be more suspicious of arguments of the form "It has to be A or B, and it can't be A, so it must be B." Sometimes, the dichotomy isn't an appropriate framing. In math, I believe in the excluded middle, but most things we talk about aren't so neat.

2023. Sure, C/C++ is faster than Python, but that's only once you get it compiled. Good God, with all the dynamic links, boost, compile flags, and uncountable compiler versions. The turtle is celebrating victory before you even begin. How do you people put up with this?

2023. I usually pay back technical debt when there is a change or new feature that uses the indebted code. Debt payback is the first task. The advantage of this approach is that you only pay back technical debt on code where it is actually collected. Many debts are forgiven.

2022. I think asking how to build trust in autonomy is the wrong question. If the robot works reliably, people will trust it. It seems to me that we have the opposite danger: that our evolved ability to work in teams biases us to sometimes trust systems when we shouldn’t.

2022. Computer programs undergo a form of evolution. Those that work well are run a lot, and as they stay around they accumulate changes from Jira tickets and forks on GitHub. Each program starts out designed top-down with beautiful abstractions, but over time, the mutations build up and it becomes less understandable but more fit for its environment.

2022. Funny how it's dangerous to use the word "assume" at the office even though without assumptions you wouldn't be able to get out of bed in the morning. "You shouldn't have just assumed—you should have gathered more information!" That's why I say "estimated" instead. Sounds more fancy.

2022. It's frustrating when you are designing something and you can't get it to settle down. Every time you look at it you want to make changes, and you start to worry that you are going in circles.

2022. Don't know what the answer is, but evaluating people/proposals based on a category-and-points system encourages mediocrity cloaked in objectivity. You may be spectacular in category A and get a 10 while your competitors, working to check boxes, get 8s and 9s. But that focus on excellence leaves you vulnerable in the other categories, B and C, where you get a 3 and the box-checkers who know the bureaucracy get 8s and 9s. Then it all comes down to math.

2022. By trying to make our writing more concise, Microsoft Outlook and similar tools are making our emails less precise. Outlook didn't like 'particular' in "I don't have access to that particular file." The word "particular" is not filler; it means I can access the other files.

2022. When speaking, every word adds to a loan that you take out from the listeners. The bigger the loan, the more you have to pay back with the value of your message.

2022. Reinforcement learning finds a way. That's why it's so hard to debug.

2022. You spend the first part of your life trying to maximize your internal reward signal, and you spend the second part trying to break free of that signal to understand where it came from.

2022. You would think that the first step to learning a new technology would be to read the whole documentation. But that doesn’t work because you have nothing to map it to. You first have to spend a week wandering around the technology, banging your head on trees. *Then* you can sit down and read the documentation.

2022. Insects became much more interesting once I got into AI. There’s amazing little robots everywhere.

2022. It’s funny. We all have “mental checklists” for what constitutes good work in a domain. A secret to working with someone is learning their checklist, so you can get past those issues and get down to substantive discussion. For example, if you work with me and show me Python code without type hints, we won’t be able to talk about what the code does until we get that fixed.

2021. Time flies like an arrow when you debug.

2021. Funny. When I first learned about policy gradient in reinforcement learning, I had a hard time understanding it because my brain couldn't accept that it was that stupid. Reinforcement learning is slow because it is blind. (Like in 2005 when I learned that tf-idf was the best computers could then do to understand language, are you kidding me? Or like when I was a kid and learned the space shuttle didn't have laser guns and couldn't go to other solar systems.)

2021. Maybe the key to the Chinese room problem is to create a meta-program that writes the instructions in the books. Then, you have a meta-meta-program that writes the meta-program. You do that 5 times and you have the intelligence of a dog. Do it 7 times and you have a chimp, and do it 13 times you have the intelligence of a human.

2021. To understand something already created, you only have to follow one path. To create something, you have to follow many paths, and more importantly, you have to know which to follow and when to turn around.

2021. Machine learning is something you do when you don’t know how to model a system. In a sense, using machine learning means you are giving up.

2020. Yes, some job postings require more years of experience with a technology than the technology has been around. It's kind of funny, but it's not *that* funny. Job postings have errors. A bigger problem is we don't realize that interviews have almost zero predictive ability. And even if we did, we don't know what else to do.

2020. If I ever build a toy pet robot, I’m going to give it tasks that are opaque to the owner, like mapping the house or searching for pennies. I want them to watch it and think, “What the hell is it doing?” That would be hilarious.

2020. Funny. When you are working on a hard problem, you spend most of your time pecking at the surface, looking for something that will give. There is no change in the world and no gradient to climb, and our brains seem to recoil from it.

2020. I think cognitive science and philosophy of mind books need pseudocode. Writing in prose makes it too easy to convince yourself that you have mapped out the processes in sufficient detail.

2020. The standard analogy for the state of robotics is to say they are like PCs were in the 1970s, about to explode. I wonder if a better analogy would be the internet right after the dot-com bubble. Everyone knows that robots are going to transform the world, but very few have figured out how to turn them into a viable business.

2020. Refactoring code is one of life’s great pleasures.

2020. There are some advantages to getting older. I just coded Python for an hour and it ran without a single hiccup.

2020. When you are building an embodied AI, the meaning of a word is the use of its referents in the system.

2020. It’s funny, when interviewing candidates, it’s hard to move beyond taking a dot product of your vector and their vector.

2020. Often, the progress you make in a day is understanding the problem deeper and realizing you didn’t make as much progress previously as you had thought, which doesn’t feel like progress because you end the day further from your goal than you began.

2020. Work products of high quality require a series of insights. Insights don’t come all at once but slowly over time as your brain computes in the background. So it seems the best way to make something great is to work on it a little bit every day.

2020. It’s weird how companies use cute, colloquial language in error messages to describe actual problems. “Oops! Your files are gone!” or “Doh! Your order has been canceled.”

2020. I think AI is like fire. It's going to burn down the occasional city, and we will have to invent ways to control it, but the benefits to humanity will be immeasurable.

2020. I feel like I'm on the cusp of solving AI. Unfortunately, I've felt like this continually for the last 17 years.

2020. In office park environments, the management company should mandate that the landscaping contractors use brooms instead of leaf blowers. Everyone would have to pay a little more, but the increased productivity from being able to think and communicate should make up for some of it.

2020. The most effective product placement I’ve ever seen was in Ally McBeal in like 1999. A character had to schedule a surgery, and the doctor pulled out a Palm Pilot and did it right there. I thought, “Wow, the future is here. No more medical bureaucracy.” I bought a Palm Pilot soon after that, I’m ashamed to say.

2020. No-reply emails from companies seem a bit insulting. The implication is that the customer’s time is cheap enough to spend reading whatever the company says, but the company’s time is too valuable to spend processing a reply.

2020. Funny how when people are sent out of town to do a task or interface with the customer, the report back is always the same. “So, Bob, how’d it go? Terrific! I overcame adversity and performed flawlessly.”

2018. When learning, we naturally seem to skate across the surface without realizing we are doing it. You have to implement or teach the material to force yourself below the surface. It must be that our brain assumes that what it sees is all there is. You have to be forced to interact with the reality of the thing.

2018. I love how things that were hidden in plain sight suddenly pop out when you learn a new concept.

2018. Candidates are judged in technical interviews by their answers to technical questions. The problem with this approach is that you end up hiring people who know the same things that you know. I judge candidates by whether they can teach me.

2018. A good way to learn technical material is to write out how you think something works and then read to try to disprove or confirm your guess.

2017. I don't know if this is good or bad, but in the days before papers were all online, scientists did less reading and more thinking.

2014. The problem with video games, and games in general, is that the work of defining the world has already been done by the designer. The players are left with the entertaining and satisfying, but ultimately uninteresting, work of searching the remaining space for a solution.