mfro 1 days ago [-]
I think you're misunderstanding the paradigm shift completely -- AI does not just generate code N(x) more quickly. It thinks N(x) faster, it researches N(x) faster, it tests N(x) faster. There are hundreds of tasks that you'll find engineers are offloading to AI every day. The major hurdle right now is actually pivoting LLMs from just generating code: integrating those tasks into workflows. This is why tool-use and agentic workflows have taken engineering by storm.
michaelchisari 1 days ago [-]
Debugging, sanity checking, testing, etc. are the best uses of LLMs. Much better than writing code.
Developers should write their own code and use LLMs to design and verify. Better, faster architecture and planning, pre-cleaned PRs and no skill atrophy or loss of understanding on the part of the developer.
jb1991 1 days ago [-]
Funny, I have the complete opposite impression after using claude code for a while. I would never trust it to design anything. Never again. But it can code pretty well given a very tight and limited scope.
michaelchisari 1 days ago [-]
To clarify, AI should not do the design itself. You develop the design in conversation with AI.
I come in knowing what I need to build and at least one idea or more of how it should be done. I present the problem, constraints, potential solutions, and ask for criticisms and alternatives. I can keep it as broad as possible or I can get more granular like struct layouts, api endpoints, etc. I go back and forth until there's an approach I prefer and then I code that approach.
> it can code pretty well given a very tight and limited scope.
It's wildly better at tight and limited scope than large scale changes but even then I would rather code it myself.
radarsat1 1 days ago [-]
> It's wildly better at tight and limited scope than large scale changes but even then I would rather code it myself.
One thing I would like to see is the use of LLMs for smarter semi-manual editing.
While programming I often need to make very similar changes in several places. If the instances are similar enough I can get away with recording a one-off keyboard macro to repeat, but if there are differences that are too difficult to handle this way I end up needing to do a lot of manual editing.
It would be nice to see LLMs tightly integrated into the editor so I can do a simple "place the cursor at things like this" based on an example or two. I'm sure more ideas are possible for using LLMs to more quickly perform the semantic changes you intend, instead of just prompting for a big diff. I feel there's a lot more innovation possible in this direction, where you're still "coding it yourself" but just faster.
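To sketch what I mean, something in this direction could probably be prototyped as an editor script today; the model name and prompt here are just illustrative assumptions, not an existing feature:

    # Rough sketch: given one example edit, ask a model where else in the
    # file the same kind of change applies, so the editor can drop a
    # cursor at each site and let the human drive the actual edits.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def find_similar_sites(source: str, before: str, after: str) -> list[int]:
        """Return 1-based line numbers of likely edit sites."""
        prompt = (
            f"Source file:\n{source}\n\n"
            f"I changed this:\n{before}\ninto this:\n{after}\n\n"
            "List the line numbers of other places in the file where the same "
            "kind of change applies. Reply with one number per line, nothing else."
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # any fast, cheap model would do here
            messages=[{"role": "user", "content": prompt}],
        )
        text = resp.choices[0].message.content or ""
        return [int(tok) for tok in text.split() if tok.isdigit()]

A plugin would then place a multi-cursor at each returned line, so you're still making the edits yourself, just without hunting for the sites.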
strange_quark 1 days ago [-]
I've had a similar thought. A super refactor feature would be amazing, but wouldn't fit into the current zeitgeist of agent everything. Hopefully as the hype starts to die down and prices go up, we'll get some of these smaller, more targeted features.
empthought 1 days ago [-]
You don't need a special feature for this. Just tell the coding assistant what to do.
soco 19 hours ago [-]
Then watch it f'up half your codebase because it thinks it's slightly related to your examples. The alternative, giving it 10 examples, is actually more work.
empthought 8 hours ago [-]
I don’t think you’ve actually used any of these tools. 10 different examples in the same session would almost certainly make them perform worse.
empthought 1 days ago [-]
You should try using the existing agents for your semi-manual editing. You don't need editor support. The coding agent can find "things like this" faster than you can. Just tell it what to look for and how to change it.
What I did was make one commit by hand (involving multiple files), and then told Codex (last year's Codex!) to make the equivalent changes to other instances in the code base.
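That workflow is even scriptable; roughly something like this (the git command is real, the agent invocation is illustrative and depends on your tool):

    # Sketch of the "one example commit, then generalize" workflow:
    # capture the hand-made change as a diff, then hand it to a coding
    # agent as the example to replicate across the repo.
    import subprocess

    def generalize_commit(commit: str = "HEAD") -> None:
        diff = subprocess.run(
            ["git", "show", commit],
            capture_output=True, text=True, check=True,
        ).stdout
        prompt = (
            "Here is a commit I made by hand:\n\n" + diff +
            "\n\nFind every other place in this repository where the same "
            "kind of change applies and make the equivalent change there."
        )
        # Hypothetical non-interactive agent call; substitute your CLI of choice.
        subprocess.run(["codex", "exec", prompt], check=True)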
xixixao 21 hours ago [-]
Wait, have you been using Cursor? This is exactly what it does fairly well.
skydhash 1 days ago [-]
> I come in knowing what I need to build and at least one idea or more of how it should be done. I present the problem, constraints, potential solutions, and ask for criticisms and alternatives
Never understood that argument, because there are two steps in design: finding a good solution (discussing prior art, tradeoffs, …) and then nailing the technical side of that solution (data structures, formulas, …). Is it the former, the latter, or both?
dyauspitr 1 days ago [-]
They’re actually really good at both. Writing code and all the paraphernalia around it.
prmph 20 hours ago [-]
> It thinks faster
It does not actually think, let alone think faster.
Again, I've lost count of how many times I've had an in-depth architectural discussion with ChatGPT, with it giving me the final mark of approval ("This is excellent"), only for me to discover a flaw in my approach or a radically simpler and better one, bring that back to it, and have it proclaim "Yeah, this is a much better approach".
These LLMs are in many cases sycophantic confirmation machines. Yes, they are useful to some extent in helping you refine your ideas and think of edge cases. But they are nowhere close to actually thinking better and faster. Faster in the wrong direction is not just slow, you are actually going backward.
gerdesj 1 days ago [-]
"paradigm shift"
A paradigm shift is an earth shattering, very important change - a complete change in thinking etc. LLMs are not that. They are simply some pretty new tools. Nice tools, but they will whip off your metaphorical thumb just as quickly as a misused table saw.
You'll note that you mention "engineers are offloading": that's not a paradigm shift. That's a bunch of engineers discovering a better slide rule.
I'm old enough to remember moving on from slide rules (I still have mine) through calculators (ditto) to using fag packets and napkins for their real intended purpose.
The drill-driver also took engineering by storm but no-one ever used the term paradigm shift (to be fair, I don't think it was invented at the time and I can't be arsed to look it up).
aspenmartin 1 days ago [-]
I would argue LLMs are possibly the largest paradigm shift the world has ever seen, and we are only at the beginning. The entire scaffolding and structure of programming is in the process of changing — coding has moved to orchestration and testing and governance of how to manage and productionalize code that has surpassed the capacity of human review.
If this sounds melodramatic it’s likely that it hasn’t fully taken root where you are yet.
I see opinions split between "it's just a dirty untrustworthy tool that is making our lives and the world a living hell" and "this is the second coming of Christ". The reality is that right now we lie on the first part of that spectrum, but I am looking over the hill and seeing 4 horses, and they are stampeding this way.
gerdesj 2 hours ago [-]
"I would argue LLMs are possibly the largest paradigm shift the world has ever seen"
LLMs are next token guessers with knobs on. That is not a paradigm shift.
nothinkjustai 23 hours ago [-]
No offense but this reads like AI psychosis
aspenmartin 14 hours ago [-]
Well I'm not offended, but it sounds like you may not be paying attention? Do you know the capital outlay that has gone into infra buildouts? Several people here have described "6 months" of AI mania—the fact that people are saying 6 months is exactly the point. Development has been going on since the 2010s. All of the "boosters", as HN likes to say, have been saying "hey this thing is huge and the performance trends are startling, get ready" and people then say "that's psychotic, I can't even get Siri to understand my name". Sure enough, 6 months ago we hit a performance inflection point where "madness" has begun. That's just when you started paying attention; the rate of change has not stopped. Pretty easy to predict what happens next…
taneq 14 hours ago [-]
Maybe it’s time to ask Siri again, “hey are you smart yet or are you still just a script?”
If she says “I’m sorry, I don’t know how to are you still just a script” then I have my answer. :P
LLMs are remarkable these days but they're still missing some essential insight. I'm far less confident now, though, that this will require another big breakthrough and not just a combination of tweaks.
aspenmartin 13 hours ago [-]
"I accept your skepticism" is all I can say, but just consider that we're not talking about the most important numbers and topics in this conversation. We have a lot of mileage left in the current stack. Nothing is plateauing, though you wouldn't know it if you read HN.
bigtex88 11 hours ago [-]
Siri is not the same as an LLM in this context but thank you.
nothinkjustai 10 hours ago [-]
What does Siri have to do with anything?
Honestly your posts read like satire. You’re treating LLMs like a religion and the release of Opus 4.6 as like some type of rapture. Idk man, if it’s some sort of bit or false flag thing well played, if not…well good luck.
aspenmartin 8 hours ago [-]
What exactly do you find satirical?
- obviously LLMs are not a religion I’m using it to illustrate a point
- 5-6 months ago was when agent perf hit a meaningful inflection point where adoption has exploded. It’s why people in this thread reference “the past 6 months” whether or not they realize we’ve been on the same path for years now
So to overextend the metaphor, opus 4.5 was really kind of the right fit for the rapture.
I mean no need to take any of this seriously, but I have worked on benchmarks and measurement in an AI lab professionally for over 4 years now, in software and data science for 8, and before that got a PhD in Astro, so I'm not some sort of armchair person with no understanding of this field. Though I do find it entertaining when my background in an AI lab is people's favorite reason to dismiss this :)
I find that when people find stuff like this satirical, they often don't really know the industry or the underlying mechanics that well. Not saying that's you, but as ridiculous as I apparently sound to you, do consider that it sounds even more ridiculous to not understand the tsunami that is coming right for you…
gerdesj 1 days ago [-]
[How did you bang out this: — on your keyboard? Why did you decide to use backticks and 66/99 for quotes? Nice, but it's not you, is it?]
Engage as a person, please.
aspenmartin 1 days ago [-]
I typed this out, character by painful human character, on an iPhone. It is indeed me!
aspenmartin 1 days ago [-]
Oh also! Two dashes on my phone convert to an EN dash I think (not an em dash!)
taneq 17 hours ago [-]
Let’s test that — hmm, I think it did?
taneq 17 hours ago [-]
iOS automatically ‘replaces’ “quotes” with open/close quotes… and triple full stops with ellipses.
viking123 20 hours ago [-]
lmao
raincole 15 hours ago [-]
In the past half-century, product design went from making precise diagrams on paper to CAD. Do you think it's fair to call this a paradigm shift, or is CAD just some pretty new tool?
Leynos 21 hours ago [-]
When the nature of your job changes fundamentally in the space of a year, "paradigm shift" feels unsettlingly appropriate.
slopinthebag 23 hours ago [-]
"paradigm shift"
And it's literally just a black box that generates more Javascript for their Next.js app
oytis 1 days ago [-]
The article addresses exactly this objection. Most importantly, it cites findings that AI coding tools have a detrimental effect on software stability - which is basically the raison d'être of our profession. When it produces more robust software and handles on-call shifts better than humans, I will consider programming done.
tptacek 1 days ago [-]
I'm excited to read the first cogent piece making this point that doesn't devolve to gatekeeping, a detached and vaguely hostile professional software developer telling people with a newfound capability to solve practical problems for themselves with new software that they don't or shouldn't want the thing that they want, because whatever it is they come up with won't be "fit for purpose" until blessed by the guild, which has bylaws extrapolated from Brooks about the fundamental "limitations of LLMs".
sov 1 days ago [-]
I think you've misread the article, specifically the purpose of the Brooks quotes. They're clearly not to denigrate anyone for wanting a more useful or convenient way of generating software--they're specifically about the "LLMs will obliterate software engineering as a profession" claims put forth by many LLM marketers. In fact, Brooks is never once mentioned in the "Power to the people?" section of the essay.
Generally, the whole point of the "Power to the people?" (and to some extent the "On being left behind") section(s) is to underscore the two antithetical claims made by many LLM marketers:
1. LLMs are so powerful and so natural and easy that someone with no experience can create amazing software, and
2. LLM usage is a core skill, one that if you don't begin training now you'll be left behind.
Obviously, both of these can't be simultaneously 100% true--either it's easy enough for the non-programming layperson to successfully generate software for an intentional purpose, or, LLM assisted programming is a skill you need to train to avoid professional obsolescence in modern society. So, the article disagrees with the majority of both claims, and accepts a weakened/minor portion of each:
1. LLM output is easy to generate but accurate prompting matters, and
2. when used for software development professionally, some amount of skilled human intervention does indeed seem necessary.
And now these two claims do align.
However, if professional software engineers who work with and read code constantly, armed with the best software practices to aid LLMs we can determine, cannot use modern AI tools without shooting their feet off at relatively frequent rates, certainly you'd expect the layperson who must put an even greater amount of undue faith in the validity of the results to be at extremely high-risk of foot-shooting. It's not "gatekeeping" to forewarn people against unwarranted trust in LLM output, nor is it "gatekeeping" to suggest that modern tech communicators/marketers describing an overly flowery LLM tooling landscape might be doing people a disservice.
oytis 1 days ago [-]
I am less sure about his argument about democratising software indeed. The only problem in my own life that I solve with software is a problem of getting paid, so what do I know. If someone can generate a piece of code for their needs, and they don't risk harming anyone but themselves, then it's a great application of LLMs.
ekidd 1 days ago [-]
The unfortunate reality is that a lot of software does have hard constraints. And a lot of these constraints are "gatekept" by regulators, compliance policies, insurance companies, etc. If someone slops together a medical record system, and leaks a bunch of PHI, there will be consequences, even in the US. Similarly, good luck getting insurance against cyber attacks without a SOC2 audit or equivalent.
I've had this conversation with managers in multiple organizations this year: "Yes, you could totally vibe code that instead of paying for a SaaS. But you have strict contractual and professional obligations about data security. Do you want to be deposed and asked, 'So, did you really just vibe code the system that led to the data leak? Did the vibe coders have any professional qualifications? Did they even look at the code?'"
Similarly, a backend server that handles 8 million users a day is expected to stay up.
Now, there are 10,000 things that have less demanding requirements. I'm actually really delighted that people are able to vibe code their own tools with minimal knowledge of software engineering! We have been chronically underproducing niche software all along.
But if your software already has on-call shifts (and SLAs, etc) like the GP, then I think you want to be smart about how you combine human expertise with LLMs.
tptacek 1 days ago [-]
OK, I have no idea who you are, and this isn't personal, I'm responding to a comment and not a person --- but this is an argument that posits that one of the big problems with LLM software is "SOC2 audits". Since SOC2 audits are basically not a meaningful thing, I'm left wondering if the rest of your argument is similarly poorly supported.
It feels like a dunk to write that. But I genuinely do think there's so much motivated reasoning on both sides of this issue, and one signal of that is when people tip their hands like this.
ekidd 1 days ago [-]
No offense taken.
I was going to argue that companies got to choose their own auditors, so of course there were some bad ones out there. But looking at the market, it seems like (1) the race to the bottom has gotten ridiculous, and (2) the insurance companies do not currently trust the auditors in any meaningful way. So, yeah, point to you.
Once upon a time, I went through SOC2 audits where the auditors asked lots of questions about Vault and really tried to understand how credentials got handled. Sure, that was exceptional even at the time.
But that still leaves a whole pile of other audits and regulatory frameworks I need to comply with. Probably most of these frameworks will eventually accept "The code was written by an LLM and reviewed by an actual programmer." I am less certain that you'll be able to get away with vibe coding regulated systems any time soon.
tptacek 1 days ago [-]
SOC2 has never been about software resilience. You can create a set of attestations that will require you to present evidence to your auditors (who are ~accountants and will not know what the dotted quads of an IP address mean) about software quality, but there is no reason to do that and most organizations don't. SOC2 cares a great deal more about access management (in the "plotting on spreadsheet" sense) than it does about vulnerabilities.
My thing here is: you want to summon some kind of deus ex machina reason why the unpredictability (say) of agent-generated software will fail in the real world, but the concrete one you came up with fails to make that argument, pretty abruptly. Which makes me think the argument is less about the world as it is and more about the world as you'd hope it would be, if that makes sense.
yellowapple 1 days ago [-]
Since when are SOC audits not a meaningful thing?
kasey_junk 1 days ago [-]
If SOC audits are driving your development process you are doing it backwards. And _certainly_ a time is coming when just using the LLM will be SOC compliant.
threecheese 1 days ago [-]
I’d think any company big enough or working in certain markets which has a Compliance Officer cares about this; regulations are a legitimate business risk, and software integration contracts have security control compliance requirements which very much impact the sdlc.
Would you have the same reaction to requiring an approval for a production deployment? That’s driving the development process.
---
Also jfc I need to cool it with the buzzwords, sorry I just got home from “talk like this all day” $job
tptacek 1 days ago [-]
SOC2 is generally regarded as a joke and has in fact almost nothing to do with software resilience even on its own terms.
skydhash 1 days ago [-]
That's why the biggest proponents of LLM tooling are managers and entrepreneurs (aka people who are incentivized to reduce salary costs). But anyone who has to keep the system running and doesn't want to wake up in the middle of the night is rightly cautious.
kasey_junk 1 days ago [-]
I'm literally tasked with reliability engineering, and LLMs are far and away the biggest boost in that in my career.
threecheese 1 days ago [-]
To be fair, that’s a role which most companies don’t have; even if they have a titled “SRE”, many times it’s a sysadmin in a hat, looking very tired and nervous. It must be fun right now tho
cfloyd 1 days ago [-]
Nailed it
paganel 1 days ago [-]
> ...it tests N(x) faster.
It does? You mean "it tests itself faster", which is not really a test now, is it?
cfloyd 1 days ago [-]
I use one model for coding and another for writing tests, for that very reason. It's surprisingly good at TDD.
guille_ 1 days ago [-]
I find this fascinating because it's the sort of anthropomorphism that betrays a fundamental misunderstanding of what an LLM is. Language models are not people. You can just achieve the same thing with a fresh context window. The only solid technical reason you'd want a different model is if you find a certain model produces better code and another produces better reviews. Nobody has really tested this, of course.
Izkata 22 hours ago [-]
I believe the theory isn't that one is better than the other, but that different models would make different mistakes, so you can be more confident in the places where the code and tests agree.
kefirlife 1 days ago [-]
I read that to mean you can arm it with a harness you design that informs the user that tests pass. An LLM can leverage this to run tests faster than I would run the same harness myself. You can then have any programmatic logic needed to support that usage, sufficient to cover your use case, and have a degree of certainty that the product at least passed those tests.
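As a minimal sketch of what I mean by a harness (assuming a pytest suite; the agent calls this instead of asserting on its own that the tests pass):

    import json
    import subprocess

    def run_tests() -> str:
        """Run the suite; return a machine-readable verdict for the agent."""
        proc = subprocess.run(
            ["pytest", "-q", "--tb=short"],
            capture_output=True, text=True,
        )
        return json.dumps({
            "passed": proc.returncode == 0,
            "tail": proc.stdout[-2000:],  # keep the feedback given back to the model small
        })

That way "tests pass" is a programmatic fact rather than the model's claim.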
imiric 1 days ago [-]
> The major hurdle right now is actually pivoting LLMs from just generating code: integrating those tasks into workflows.
Funny, I thought that the major hurdle is improving accuracy and reliability, as it's always been. Engineering is necessary and useful, but it's a much simpler problem, which is why everyone is jumping on it.
mfro 1 days ago [-]
As much as that's true, it's clear a huge amount of people have accepted the current state and are working around it, successfully (in terms of ticking an executive's checkbox) in a lot of cases. And it's worth considering that we're seeing strong strides outside of model quality, in the tooling and integration.
zapataband1 1 days ago [-]
I think you are misunderstanding something, AI does not think, it is a token prediction algorithm.
pingou 1 days ago [-]
Not sure why you are downvoted, but I agree. Additionally, perhaps LLMs are just another higher-level programming language, as the author said, and they still need someone to steer them.
I'm sure it was very difficult to program in machine code, but if now (or soon) anyone can just write software using an LLM without any sort of learning, it changes everything. LLMs can plan and create something usable from simple instructions or ideas, and they will only get better.
I think LLMs will be (and already are) useful for many more things than programming anyway.
smartmic 1 days ago [-]
> I'm sure it was very difficult to program in machine code, but if now (or soon) anyone can just write software using an LLM without any sort of learning, it changes everything. LLMs can plan and create something usable from simple instructions or ideas, and they will only get better.
Did you read the section "Power to the People?"? In it, the author dismantles your thesis with powerful, highly plausible arguments.
hombre_fatal 1 days ago [-]
I read that section but I disagree with it.
1. You don't have to be an LLM expert to get good, consistent results with LLMs.
My best vibe-code process after years of using LLMs is to have Claude Code create a plan file and then cycle it through Codex until Codex finds nothing more to review, then have an agent implement it. This process is trivial yet produces amazing results.
It's solved by better and better harnesses (a rough sketch of the loop is at the end of this comment).
2. You don't have to write technical specs. The LLM does that for you. You just tell it "I want the next-tab button to wrap back to the first one" and it generates a technical plan. Natural language is fine.
3. Software that seems to work only to fail down the line in production is already how software works today. With LLMs you can paste the stacktrace or user bug email and it will fix it.
This is why vibe-coding works. Instead of simulating in your head how an app will run by looking at its code, you run the app and tell the LLM what isn't working correctly. The app spec is derived iteratively through a UX feedback loop.
4. I don't understand TFA's goalposts, but letting people who are only interested in the LLM process (rather than the software craftsmanship) create software would be a huge democratization of software.
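To sketch the loop from point 1 (the CLI invocations are illustrative; substitute whatever agents you actually use):

    # One agent drafts a plan file, a second reviews it until it has no
    # further objections, then implementation starts from the refined plan.
    import subprocess

    def ask(cmd: list[str], prompt: str) -> str:
        return subprocess.run(cmd + [prompt], capture_output=True, text=True).stdout

    def refine_and_implement(task: str, max_rounds: int = 5) -> None:
        ask(["claude", "-p"], f"Write an implementation plan for: {task}. Save it to PLAN.md.")
        for _ in range(max_rounds):
            review = ask(["codex", "exec"],
                         "Review PLAN.md. If it is sound, reply only LGTM; "
                         "otherwise list concrete problems.")
            if "LGTM" in review:
                break
            ask(["claude", "-p"], f"Revise PLAN.md to address this review:\n{review}")
        ask(["claude", "-p"], "Implement PLAN.md.")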
prmph 20 hours ago [-]
This sounds like someone who has never had to write serious software.
> 1. You don't have to be an LLM expert to get good, consistent results with LLMs.
You don't get good consistent results with LLMs, expert or not
> 2. You don't have to write technical specs. The LLM does that for you. You just tell it "I want the next-tab button to wrap back to the first one" and it generates a technical plan. Natural language is fine.
Try this, have Claude write a section in your specs titled "Performance Optimizations" and see the gibberish it will come up with. Fluffy lists with no actually useful content specific to the project. This is a severe problem with LLM-driven speccing I have encountered uncountable times. I now rarely allow them to touch the specs document.
> 3. Software that seems to work only to fail down the line in production is already how software works today. With LLMs you can paste the stacktrace or user bug email and it will fix it.
And pretty soon you have a big ball of mud. But I guess if the rate of bugs accelerates, the LLMs can also "fix" them faster.
> This is why vibe-coding works. Instead of simulating how an app will run in your head looking at its code, you run the app and tell the LLM what isn't working correctly. The app spec is derived iteratively through a UX feedback look.
I should tell you about the markdown viewer with specific features I want, that I have wanted to build only with LLM vibe-coding, and how none of them are able to do it.
hombre_fatal 12 hours ago [-]
> This sounds like someone who have never had to write serious software.
Why the insult? You never know who you're talking to on HN.
Your points have to do with process failure, not intractable LLM limitations. Most of which already apply to human-conceived software.
Your "Performance Optimizations" bit exemplifies this since you baked in the assumption that it will have no connection with your project. Well, why not? You need to figure out how to use your source code and relevant data as ground truth when working with LLMs.
A markdown viewer is on the simpler side of things I've built with LLMs, so this too suggests that you have a weak process. A common mistake is to expect LLMs to one-shot everything (the spec, the plan, or the actual impl). Instead you should use LLMs to review-revise-cycle one of those until it's refined, ideally the spec/plan since impl is derived from it. You will have much better and consistent results.
I recommend finding an engineer you respect/trust that has found a way to build good software with LLMs, and then tap them for their process.
prmph 11 hours ago [-]
Thanks for your response. I did not mean to insult; my mild jab was meant to draw attention to the idea that using LLMs for serious production software is a whole different game than using them for casual software.
You said
> Your "Performance Optimizations" bit exemplifies this since you baked in the assumption that it will have no connection with your project. Well, why not?
OK, I am talking from experience. Using LLMs for speccing is almost useless above certain complexity levels; what you get is an assemblage of the most average points you can imagine, the kinds of things almost every project in the category you are working on will address without any thought. Ask it to spec auth for a specific design, and all you'll get is: cookie-based login, input validation, password hashing, etc, etc. Which you don't need an LLM for. Nothing like an actual in-depth design. Even asking them to update specs based on discussions is hit or miss.
> A markdown viewer is on the simpler side of things I've built with LLMs, so this too suggests that you have a weak process. A common mistake is to expect LLMs to one-shot everything (the spec, the plan, or the actual impl). Instead you should use LLMs to review-revise-cycle one of those until it's refined, ideally the spec/plan since impl is derived from it. You will have much better and consistent results.
But what you are describing is NOT vibe-coding. I have no doubt I could build the viewer I want (which by the way is not your usual plain vanilla markdown viewer, but one with some very specific features) with LLM assistance. My point is: if you can't even vibe code your way to this specific viewer, how are you supposed to vibe code serious software?
Indeed, the declining quality of Claude Code is, I suspect, testament to the fact that vibe-coding any sufficiently complex piece of software does not work in the long run.
hombre_fatal 11 hours ago [-]
Oh, I see. I'll grant whatever you take vibe-code to mean since that seems to be the hang-up -- vibe-code prob suggests there's no process at all.
My point is that the planning phase and implementing phase are basically unsupervised, and all the work goes into the planning phase.
Yet I've noticed that over time, I'm not even needed in the planning phase, because a simple revision loop on a plan file produces a really good plan. My role is mostly to decide what the agents should do next and to drive the revision loop by hand (mostly because it's the best place for me to follow what's happening).
I've been getting really good results, though I've also developed a simple process that ensures the LLMs aren't relying on their internal model but rather on external resources, which is critical.
mfro 1 days ago [-]
While I think the author is entirely right about 'natural language programming' in the current day, if LLMs (or some other AI architecture) continue to improve, it is easy to believe touching code could become unnecessary for even large projects. Consider that this is what software co. executives do all the time: outline a high level goal (software product) to their engineering director, who largely handles the details. We just don't yet know if LLMs will ever manage a level of intelligence and independence in open-ended tasks like this. And, to expand on that, I don't know that intelligence is necessarily the bottleneck for this goal. They can clearly tackle even large engineering tasks, but often complaints are that they miss on important architectural context or choose a suboptimal solution. Maybe with better training, context handling, documentation, these things will cease to be problems.
pingou 1 days ago [-]
I have indeed missed the arguments that are so powerful that they dismantle my thesis.
Would there even be a debate in the tech community if such unassailable arguments existed?
The author is entirely entitled to his opinion, just as I am allowed to disagree with him (not sure why I am also downvoted). The good thing is, if I'm right, we will see it in less than 10 years.
fragmede 1 days ago [-]
> they will only get better.
I don't buy that's true. The "only" part, anyway. Look at how UX with software has evolved. This is gonna be an old man yells at clouds take, but before smartphones, there were hotkeys. And man, you could fly with those things. The computers running things weren't as fast as they are today, but you could mash in a whole sequence thru muscle memory, and just wait for it to complete. Now, you have to poke at your phone, wait for it to respond, poke at it some more. It's really not great for getting fast at it. AI advancement is going to be like that. Directionally generally it will be better, but there's going to be some niche where, y'know what, ChatGPT-4o really had it in a way that 5.5 does not. (Rose colored glasses not included.)
Animats 1 days ago [-]
> they will only get better.
Then came the new Claude update, which many people say is worse. Even Anthropic says it got worse.[1] HN discussion back on April 15th: [2]
Some of this is a pricing issue. Turning "default reasoning effort" down from "high" to "medium" was a form of shrinkflation. Maybe this technology is hitting a price/performance wall.
Claude connected to Postgres (readonly, obviously) and Datadog MCP servers, in addition to having access to the codebase, can debug prod issues so quickly. That's easily a 10x win compared to a senior engineer doing the exact same debugging steps. IMHO that's where the actual productivity boost is.
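"Readonly, obviously" is worth enforcing at the session or role level rather than trusting the model; a minimal sketch of such a query tool (connection details are placeholders):

    # Read-only query tool an agent can call while debugging prod.
    # A dedicated role with only SELECT grants is stronger still.
    import psycopg2

    def run_readonly_query(sql: str) -> list[tuple]:
        conn = psycopg2.connect("dbname=prod user=debug_ro host=db.internal")
        try:
            conn.set_session(readonly=True)  # the server rejects writes in this session
            with conn.cursor() as cur:
                cur.execute(sql)
                return cur.fetchall()
        finally:
            conn.close()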
marcus_holmes 1 days ago [-]
I was waiting for the "so I tried coding something with an LLM myself, and I found..." paragraph. But apparently the author never did try it, or at least if they did, they didn't write about it.
This is a very academic approach to the subject - read what other people have written about it without ever doing it yourself. Study what someone said about LLM coding 50 years ago, before they were even invented, to see what you think about it.
I would strongly suggest to the author that you just give it a go, and see what you think, without the preconception of other people's opinions.
My experience has been remarkable, and, like others, I'm finding real joy in being able to move past the code to actually design and play with whole systems and architectures.
It gets to the essence of code, which is not about the code itself, but about the system that code implements. Being able to write code in 3 minutes, not 30 minutes, does not bog us down in review (the LLM is perfectly capable of reviewing code too). It frees us to explore systems and architectures without worrying about the sunk cost of the existing code, or the effort of changing it.
spopejoy 1 days ago [-]
> I was waiting for the "so I tried coding something with an LLM myself, and I found..."
Why? Most of the article was about the productivity of teams.
> This is a very academic approach to the subject - read what other people have written about it
Meta-studies have tremendous value. He's asking a simple question: if LLMs are changing the world, let's look at what studies are showing.
> My experience has been remarkable, and, like others, I'm finding real joy in being able to move past the code to actually design and play with whole systems and architectures
Great! What does that have to do with the age-old problem that software development doesn't scale to teams well? It is indeed a "50 year old problem", so please tell us how LLMs solve it.
marcus_holmes 24 hours ago [-]
I had to go re-read the article to make sure, but it doesn't address teams or scaling to teams at all, so I'm not sure why you're asking about that?
The article is talking about inherent vs accidental complexity, amongst other points, and if the author had actually tried developing with an LLM, they might have worked out how LLM coding does address some of this.
spopejoy 10 hours ago [-]
- The DORA report is about organizations not individuals
- Mythical man-month is about organizations not individuals
- No Silver Bullet: "I believe the hard part of building software to be the specification, design, and testing of this conceptual construct, not the labor of representing it and testing the fidelity of the representation." Clearly he's NOT talking about the 10x dev building the whole thing themselves, which everybody knows is faster, better, probably doesn't even need a spec. Organizations are who need specs -- they have clients, business people etc. An organization with a single developer moves at light speed -- but this doesn't scale.
Nobody's disputing that LLMs give multiples for certain development tasks. The main thrust of the argument centers on how unimportant coding time is ... for organizations. Coding time is a HUGE lever if you're the one dev building everything, but that's not a repeatable pattern.
marcus_holmes 3 hours ago [-]
Meh, I'll concede that Fred Brooks was mostly writing about developing software within an organisation, and therefore writing about teams.
Coding time is important if it gates experiments and spikes. If you have to work out your architecture on paper because actually coding it up is a serious expense, then it becomes harder to experiment with different designs. In an LLM world where coding time is very cheap, it becomes easier to experiment and try things out. Developing an entire architecture and then abandoning it because it turned out that it didn't scale too well, or couldn't handle some edge cases, is not a major mistake or problem any more. There's no pressure to keep old code because it cost a lot of money to develop. You can spike an entire system, decide that it was a useful experiment, but didn't work, delete the repo, and go get lunch. This is new, and important.
zouhair 1 days ago [-]
The problem I have with it is the price (I am not talking about the money). I don't know if the price is worth it. For example we are literally witnessing the death of the personal computers, it will soon become a rich people's hobby. I don't know how the whole Free Software/Open source will survive that.
At best we will end up owning nothing, not even the programming skills, as everyone will be at the mercy of AI companies for their coding.
We are still in the honeymoon phase of AI coding; I have a very pessimistic view of the future.
marcus_holmes 23 hours ago [-]
I'm not sure what LLMs have to do with the death of personal computers? Can you explain, please?
zouhair 20 hours ago [-]
Prices of RAM, GPUs, SSDs and even HDDs are now way out of reach for many people [0]. An SSD I bought 2 years ago at $300 CAD now costs $1K CAD, for example, and it's not gonna go down any time soon.
This feels like classic economics, though - if the price of something goes up because of demand, then more suppliers enter the market and supply increases.
Also, the AI thing is a bubble, and bubbles burst. Sooner or later all that demand is going to disappear and we'll be oversupplied.
But yes, interesting times indeed.
weakfish 14 hours ago [-]
Are you asking, essentially, to move past the data and evidence and get anecdotes? That seems the opposite of useful, and tbh LLM coding has wayyyy too much anecdotal 'evidence' going on.
marcus_holmes 3 hours ago [-]
I'm not sure what "the evidence" is in this case?
I mean, we have lots of people using LLMs to write software in different ways, as we explore this space. I don't really see how "the evidence" can be different from "anecdotes" at this stage of the exploration?
There have been a couple of studies done on LLM-assisted dev vs non-LLM-assisted dev, but the author doesn't cite them.
kelnos 1 days ago [-]
>> Within just this group the ratios between best and worst performances averaged about 10:1 on productivity measurements and an amazing 5:1 on program speed and space measurements!
> (although I’m personally skeptical of the “10x programmer” concept, the software industry overall does seem to accept it as true)
To be fair, this statement from Brooks doesn't entirely match with the "10x programmer" we talk about. My take on it is when someone says "10x programmer" today, they mean 10x more productive than the average, not 10x more productive than the worst. Brooks' statement is about the latter. If he'd looked at the difference between average and best, I would assume you'd get something more like a 2x or 4x programmer.
leptons 1 days ago [-]
There's no such thing as a "10x" programmer, and anyone who uses it doesn't know what they're talking about.
10x relative to what exactly? It's not a statement grounded in any kind of reality.
bonesss 9 hours ago [-]
There are 10x developers, your assertions are overly broad and unanchored from the insights large developer stables can create.
10x relative to <drumroll> other developers. Developers in the same place doing the same-ish things, where on metrics and outcomes you objectively have someone(s) whose outputs outstrip teams and all their players.
Not 10x LoC, but 10x problem-to-solution time, maintenance costs, and time/cost to create. At everything, always? No, at relevant things. In fact, some of those nerds arguably go past that by being able to solve things the -4x to 5x'ers cannot. Any fulfillment is infinitely faster than non-fulfillment.
I've worked with several and have seen their projects' numbers year on year in a large pool. SQL gurus who could get there inconceivably fast because their guruship let them conceive better. Independently created solutions that obviate existing systems and components, and got there >10x cheaper and were >10x cheaper to maintain.
Never outbid someone by planning to do way less, smarter and faster? 10x ain’t that much if the other dudes are average consultant houses.
simonw 1 days ago [-]
10x makes sense only in terms of specific technology platforms.
I'm a 10x programmer at building Django apps compared to a developer who has never worked with Django before.
Someone who develops against WordPress on a daily basis will easily 10x my own attempts at building things on that platform.
randallsquared 16 hours ago [-]
I don't think it's only about specific technologies. I have occasionally worked with someone who was 10x (or more) the average in the org, and it wasn't just producing new code: it was debugging faster, reviewing faster, providing an insight to another dev with a moment's thought that unblocks their whole sprint, and, yes, still producing many times as many PRs as typical. In a modern corporate environment, the main problem is giving such a person enough to do in enough variety so that they don't get bored.
jay_kyburz 1 days ago [-]
I've never understood it to be literal, but from my experience there is a big difference between the folks that show up to work on time, jump right into their work, pay attention in meetings, know the code base, and have the ability to "lock in", as my kids say. On the other hand, you have folks that show up late, spend all day chatting at the water cooler, get distracted with home stuff, comment on Hacker News all day, and only manage to squeeze in a few hours of actual work a day.
hackable_sand 1 days ago [-]
It's a transparent exercise in ego-stroking to justify one's commitment to capital incentives.
atleastoptimal 1 days ago [-]
"LLM's Aren't Going to Fundamentally Change Software Development" Says Increasingly Nervous Man For Seventh Time This Year
slopinthebag 1 days ago [-]
I didn't get the sense that the author is nervous. What I tend to see are people who are nervous that going all-in on LLM workflows might not have the payoff they are expecting, and are becoming increasingly fanatical as a result.
Just one more harness bro. Just one more agentic swarm. Please bro, just one more Claude Max subscription. Please bro.
atleastoptimal 1 days ago [-]
Complaining about every one-off issue with LLMs ignores the bigger picture: they are getting better every month and there is no fundamental reason why they wouldn't surpass humans in coding. Everything else is secondary.
All I would need from an LLM doubter is evidence that, at tractable software engineering tasks, LLMs are not improving. The strongest argument against the increasing general capabilities of LLMs is the ARC-AGI tasks; however, the creators admit that each generation of LLMs exceeds their expectations, and that AGI will be achieved within the decade.
wavemode 1 days ago [-]
Your logic is flawed because a thing can improve for an infinite amount of time while never surpassing a certain limit. It's called an asymptote.
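A toy example: the following score improves at every generation, forever, and still never reaches 100 (the numbers are arbitrary):

    def score(generation: int) -> float:
        # Monotonically increasing, bounded above by 100.
        return 100 - 50 * 0.8 ** generation

    for g in range(0, 25, 4):
        print(g, round(score(g), 2))  # 50.0, 79.52, 91.61, 96.56, 98.59, 99.42, 99.76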
That being said, I don't even think that arguing about this from a mathematical perspective is a worthwhile use of time. Calling something an asymptote in the first place requires defining a quantifiable "X" and "Y", which we don't even have. What we have are a bunch of synthetic benchmarks. Even ignoring the fact that the answers to the questions are known to regularly leak into the training data (in other words, it's possible for scores to increase while capabilities remain the same), there's also the fundamental fact that performance on benchmarks is not the same thing as performance in the real world. And being able to answer some arbitrary set of arbitrary questions on a benchmark which the previous model couldn't, does not have a quantifiable correlation to some specific amount of real-world improvement.
The OP article focuses on research papers which assess real-world impact of LLMs within software organizations, which I think are more representative.
I wouldn't call myself an "AI doubter" - I use LLMs every day. When you say "doubter" you're not referring to "AI" in general, or the fact that AI is helpful or boosts productivity (which I believe it does). You're rather referring to the very specific, very extraordinary claim, that LLMs will surpass humans in coding. If that's the case then yeah I'm a doubter, at least on any foreseeable timescale.
atleastoptimal 1 days ago [-]
1. There's no reason to believe AI capability improvement is approaching an asymptote: METR timelines, improvements on benchmarks, and ARC-AGI are all at least linear
2. Even if it were asymptotic, it would be a huge assumption to assert that the asymptote is below general human intelligence, as if human pattern recognition and cognition were some sort of universal limit like c
Also, if LLMs weren't really getting better in general but just benchmaxxing, then it would be extremely lucky that this also happens to be leading to the general increase in coding capabilities that has been observed in more recent models.
AI has already surpassed 99% of humans in coding in narrow domains. The question is, how wide does the domain have to be before models no longer ever surpass humans? I’d wager we’d have to wait until scaling of compute infrastructure stops, wait 6 months, then see.
handoflixue 1 days ago [-]
> Your logic is flawed because, a thing can improve for an infinite amount of time while never surpassing a certain limit. It's called an asymptote.
> there's also the fundamental fact that performance on benchmarks is not the same thing as performance in the real world
Again, yes, you're correct in the general case but it has very little to do with the specific case.
Would you find it convincing if I simply said "some internet arguments are wrong"? It's certainly a true statement, and you've made an internet argument here, so clearly you should accept that you're wrong, right?
wavemode 1 days ago [-]
You're scoring rhetorical points while talking past my entire comment. Hard to say if you even read it.
I'm not "convincing" anyone of anything. I'm stating the reasons that I, personally, am unconvinced of a specific claim being made to me.
handoflixue 15 hours ago [-]
I mean, I quoted multiple passages and established why I think your logic is flawed. If you're convinced by bad logic, so be it.
wavemode 5 hours ago [-]
If you read my entire comment and thought that showing me a benchmark chart remotely addresses the point I'm making, well... I don't know what to tell you.
aspenmartin 1 days ago [-]
You say this as though performance has not followed a very clear and extremely rapid improvement in a startlingly short amount of time.
You’re definitely right that people adopt agentic workflows and are disappointed or worse, but the point is the disappointment has already reduced substantially and will continue to do so. We know this because we know the scaling laws, and also because learning theory has been around for many decades.
strange_quark 1 days ago [-]
What rapid improvement has occurred? Because in this six-month AI coding fever dream we've been living in, I really haven't seen anything new in a while, either in terms of new ideas for AI coding or in new consumer products or services.
I'll give you the coding harnesses themselves are better because that was a new product category with a lot of low-hanging fruit, but have the models actually improved in a way that isn't just benchmaxxing? I'd argue the models seem to be regressing. Even the most AI-pilled people at my company have all complained that Opus 4.7 is a dud. Anecdotally, GPT 5.5 seems decent, but it's rumored to be a 10T parameter model, isn't noticeably better than 5.4 or 5.3, is insanely expensive to use, and seems to be experiencing model collapse since the system prompt has to beg the thing to not talk about goblins and raccoons.
jatora 1 days ago [-]
Uninformed opinion of someone who clearly doesn't consistently use AI coding tools. And why are you limiting it to 6 months? What's wrong with you?
weakfish 14 hours ago [-]
Why does this _always_ happen in agentic coding convos?
> I don't find $MODEL useful
> CLEARLY you're doing it wrong
It's so dumb.
(I write code w/ agents btw, I'm just also skeptical)
zapataband1 1 days ago [-]
How many years of real-life, in-production problem solving/coding have you done? That's what I base how informed you are on, not how much you use your favorite new $100/month token-prediction subscription.
jatora 1 days ago [-]
15 years. But that's irrelevant to this point. The person I'm replying to clearly doesn't use the tools if they think there hasn't been constant improvement. "Token-prediction subscription" is funny, coming from a glorified biological token predictor.
slopinthebag 18 hours ago [-]
I'm starting to think the AI maxis are just misanthropes.
zapataband1 22 hours ago [-]
Ah yes, another feeble fool who thinks his $100 subscription is equivalent to 400 billion years of evolution simply because he is stupid and watches a lot of scifi.
jatora 21 hours ago [-]
Nope not at all, but it's most certainly superior to the tokens your neural net outputs
viking123 19 hours ago [-]
say that you are alive
"i am alive"
OH MY GOD!!
aspenmartin 1 days ago [-]
I’m going to parrot back what you’re saying and you tell me if I’m getting close
- AI coding is a disappointing fad (“fever dream?”).
- that has not made meaningful progress in…6 months?
- coding harness is improving
- model improvements are lies: it’s just businesses “benchmaxxing” and misleading people. Real performance has not meaningfully improved
- “opus 4.7 is a dud”
- 5.5 suffering from "model collapse" (I’ve never heard this term before)
Since you asked and I assume you are rational and really are interested to know:
- we have many measures of performance and have studied how one particularly important but unintuitive measure (pertaining to perplexity) scales with data, compute, and model size. These laws continue to hold and have satisfying theoretical origins.
- whatever the scale of 5.5, consider we have far more room to go on the scaling front. Probably another 2-3 orders of magnitude before we hit limiting bottlenecks.
- that’s also fine because scaling is only part of the puzzle. RL on verifiable rewards is virtually guaranteed to get you optimal performance and that’s the entirety of the excitement around coding agents
- while you are right about benchmarks and measurement science having a ton of weaknesses, they are not at all garbage. There are probably around 40,000 benchmarks in the literature (this is not a made-up number, by the way; it really is around that many). Epoch made a great composite measure using good stats (IRT) called their Epoch Capability Index, METR has done and redone their time-horizon measure, and it holds up beautifully. There is a ton of signal in many benchmarks and they all tell a pretty compelling story.
- additionally, this is not some unknowable thing. It strikes me as odd that people's prior on HN a lot of the time is "it's all dumb rich people putting way too much dumb money in this". Sorry, but the world is not that dumb. Trillions of CapEx is usually pretty rationally allocated. And it is!
- why? Because it is already known what happens when you do what we're doing. When you have a verifiable reward system, have a certain amount of compute available, and have seed data to get you to where you can do RL, you will be almost guaranteed to get superhuman performance
jatora 21 hours ago [-]
I'm pretty sure their mindset is pure cope. All top AI labs are agentically coding 100% now. There's a reason for that. Anyone not on that paradigm yet is either slow acting or purposefully resistant. (excluding workplace policies that hamstring you of course)
aspenmartin 14 hours ago [-]
Yea, that's what I just can't wrap my mind around. It's a cacophony of engineers with authoritative-sounding blog posts explaining a subject they seem to have only the most tenuous grasp on. It's hard to watch a population of tech people I used to really revere getting things so wrong. I thought "surely once we're <literally where we are today, which is what you describe> no one with any self respect would still claim AI is a useless fad or that it shouldn't be used", and yet, to my disappointment, that's where we seem to be.
cyclopeanutopia 1 days ago [-]
Perhaps you are confusing performance with instability?
aspenmartin 1 days ago [-]
No. The time horizon I'm talking about spans years. "We don't know" is just wrong; we've had scaling laws for many years and they continue to hold up. Benchmarks, in all their ugliness, tell a consistent story.
paganel 1 days ago [-]
> very clear and extremely rapid improvement in a startlingly short amount of time.
We're almost 6 months into all this AI-code madness and I've yet to see that "rapid improvement" you mention. As in software products that are genuinely better compared to 6 months ago, or new software products (and good software products at that) which would have not existed had this AI craze not happened.
aspenmartin 1 days ago [-]
Way more than six months. You may be talking about how the world looks from your vantage point, as well you should. But there’s a reason why the world doesn’t allocate trillions of dollars of capital based on that.
I really value skeptical people and skepticism generally. But what I think skeptical people would prefer to consider themselves is: rational and reasonable, with their beliefs well calibrated.
You're not the only one to think that literally nothing major or significant has happened with AI, but that's simply wrong. Every major tech company - the ones poised to get the first, best rewards - has already gotten good incremental revenue from AI via ads ranking/recommendations (Google, Meta, etc.) and good productivity increases due to the scale of their workforce and advanced in-house tooling. You won't see these numbers and you don't have to believe them. But I have seen them and I believe them, and I, like you, hate bullshit.
ThrowawayR2 24 hours ago [-]
Classic argumentum ad populum fallacy. The world allocated the equivalent of trillions to the dotcom bubble shortly before it became the dotcom bust, mortgage CDOs before the 2008 debt crisis, and the cryptocurrency mania before its bubble popped. The world has allocated vast sums of money to rather stupid things many, many times in the past.
brazukadev 24 hours ago [-]
> Every major tech company - the ones poised to get the first best rewards, have already gotten good incremental revenue from AI via ads ranking/recommendations (Google, Meta, etc.)
That's just software evolving. It happened before LLMs, it would happen without LLMs.
> good productivity increases due to scale of workforce and advanced in house tooling.
Exactly same case.
aspenmartin 14 hours ago [-]
But I don't really understand: the ask is for evidence AI is generating meaningful returns, and it demonstrably is, even while we have integrated these tools only partially. "Just software evolving"? Um, yes, I agree, just that now this happens faster and more efficiently. It is also more than that: the models that power advertising and content recommendation at TikTok, Google, Facebook, Instagram, etc. are not just "software evolving"; they are meaningful improvements to models that are only possible with good AI.
brazukadev 13 hours ago [-]
Yes, those are meaningful improvements, but AI changes working, profitable software. It is more difficult to create exponential value from that compared to new platforms like the internet or mobile. Git and then GitHub, for example, have had a much bigger impact on increasing software development productivity than AI, with a fraction of the investment.
aspenmartin 12 hours ago [-]
Are you saying AI is tackling the wrong bottlenecks? I’m not sure what you mean by “AI changes profitable software”. Maybe you mean: AI will not create something new, only do the existing things we do?
I agree the foundations (git, GitHub, compilers, etc.) are arguably "a fraction of the price" and today arguably have more impact (though not sure by which measure). But literally since January we have been rolling out their replacements; I don't really see how that wouldn't be an earth-shattering impact. You talk about GitHub and that's fine, but ignore the fact that huge swaths of the profession aren't even directly using any of these tools anymore.
I’m not sure what you imagine the promise of AI to be, and without that I can’t really be specific in any refutation. I would just say coding is only the beginning: it is the most powerful and also the easiest thing to solve first. Improved coding performance also improves generalization and performance on non-coding tasks, so that’s a nice bonus, and we’re maybe 5 years away from decent embodied systems which, after an inflection point of consumer adoption, will quickly get better via data flywheels and on-policy learning. Basically there are very few bottlenecks that will not be touched.
handoflixue 1 days ago [-]
Can you name literally any other technology that had hundreds of millions of users within the first six months of being invented?
Six months after the internet was invented, you could send email between a few universities.
Six months after the computer was invented, they still hadn't actually built one.
The first transcontinental railroad took about six YEARS just to build.
nothinkjustai 23 hours ago [-]
GPT did not have hundreds of millions of users when it was invented almost a decade ago…
handoflixue 15 hours ago [-]
If you want to move the goal posts, that's fine, acknowledge it: the original claim I'm responding to was "We're almost 6 months into all this AI-code madness"
If you want to set GPT as the target, that's even easier! In that decade it has passed the Turing Test, solved novel open math problems, generates audio, video, and music, and can write coherent code. Again, there is no technology that has improved more rapidly than LLMs.
nothinkjustai 10 hours ago [-]
I’d say the internet did, since it literally connected people across the globe in real time, which actually provided the technology that allows LLMs and other similar tech to exist in the first place.
I think it’s pretty clear the internet has had 10x the impact of LLMs so far. Maybe 100x
aspenmartin 8 hours ago [-]
The internet has been around since the 80s… ChatGPT came out 4 years ago. The internet took decades to build out the infrastructure; inflation-adjusted capex for AI infrastructure already far surpasses that of the internet. You’re talking about a technology that doesn’t just make things easier, it replaces entire swaths of work. Under some weird measurement you may be right, but I mean, c’mon.
leptons 1 days ago [-]
You say this as though AI company debt has not followed a very clear and extremely rapid ballooning in a startlingly short amount of time.
It's the "YOLO" of business strategies.
aspenmartin 1 days ago [-]
Always amazes me how we’re on a platform with “ycombinator” in the URL and people don’t understand how private companies scale to capture market share. You’re right, Uber was that company that ran at a loss for so long and then collapsed - another YOLO business strategy. Or maybe it was Amazon, or… hmm, I forget
leptons 9 hours ago [-]
I can't afford to take Ubers anymore. A trip that used to cost $7 now costs $40. AI is going to be the same to cover all the massive amounts of money already spent. You like your $200/mo plan now? How about when it's $2000/mo?
aspenmartin 8 hours ago [-]
You are not at all wrong. Also get ready for the insidious advertising!
slopinthebag 1 days ago [-]
Yes but we don't know the shape of the curve and where we are on it.
aspenmartin 1 days ago [-]
See the Chinchilla scaling laws - we have the functional form of the curve and know the constants (though they change and are domain- and model-specific):
L(N, D) ≈ 1.69 + 406 / N^0.339 + 411 / D^0.285
where L is the loss (pre-training test loss), N is the number of model parameters, and D is the number of training tokens.
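To see the diminishing returns concretely, here's a minimal sketch that plugs numbers into that functional form (a toy calculation; the constants are the quoted fits and are not universal):

    # Chinchilla-style loss estimate: L(N, D) = E + A/N^alpha + B/D^beta.
    # Constants are the ones quoted above; they are domain- and model-specific.
    def chinchilla_loss(n_params: float, n_tokens: float) -> float:
        E, A, alpha, B, beta = 1.69, 406.0, 0.339, 411.0, 0.285
        return E + A / n_params**alpha + B / n_tokens**beta

    print(chinchilla_loss(70e9, 1.4e12))   # ~1.92: a 70B model on 1.4T tokens
    print(chinchilla_loss(700e9, 14e12))   # ~1.80: 10x both, creeping toward E = 1.69

Each 10x of parameters and data buys a shrinking slice of the gap to the irreducible term E.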
slopinthebag 18 hours ago [-]
You need to touch grass dude, seriously.
aspenmartin 14 hours ago [-]
Why deflect from the conversation and attempt to insult someone? What I’m saying is literally canonical and extremely well-known literature.
dabedee 1 days ago [-]
It was a welcome change to have a deliberate, well-thought-out, and well-written article that tries to bring readers through a rational journey. Thank you
ilia-a 1 days ago [-]
Even without writing code, LLMs are a huge help: analyzing code, doing code reviews, documenting code, etc... Even without writing a line of "code", LLMs hugely speed up development and take away the annoying/boring work.
nijave 1 days ago [-]
Been using Claude Code for cost ops and reporting at work and it's saved an insane amount of time. I can generate a report in 10-15 minutes that would have taken 2-3 days of scripting/SQL and CC can even spit out a script to repro later.
It's not terribly hard to check either. You can do some spot checks with cost dashboards in AWS, Datadog, etc., and see if the numbers line up
Can also tell Claude "go right-size the environment, pull p95 usage metrics for the last 3 months" and a couple hours later, a bunch of money is saved. Much easier than manually pulling trend data, and also easier than installing/configuring/managing tools that do it for you.
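For the spot-check step, the repro script can be tiny. A hedged sketch of the idea against AWS Cost Explorer via boto3 (dates, metric, and grouping are made up for illustration; match them to whatever the report claims):

    import boto3

    # Pull last month's unblended cost per AWS service to compare against
    # the generated report's numbers.
    ce = boto3.client("ce")
    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": "2025-10-01", "End": "2025-11-01"},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )
    for group in resp["ResultsByTime"][0]["Groups"]:
        service = group["Keys"][0]
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        print(f"{service}: ${amount:,.2f}")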
riknos314 1 days ago [-]
In pretty much every case where I've previously thought "I wish we had a tool for this but I can't get the time funded to make it" I now just get ai to work on the tool in the background and check in on it whenever I have a few minutes of deadtime before / after meetings.
The benefits of the time savings from having progressively better tooling add up quickly.
zapataband1 1 days ago [-]
I can read and write code; why do I need an LLM to produce something I still have to read and test?
furyofantares 1 days ago [-]
Considering some project I'm able to accomplish today by hand: it is much larger than what I could have done 25 years ago, but back then the difficulty would have been mostly inherent, and now the work is mostly accidental difficulty. The libraries available have solved almost all of the inherent difficulty for many small (solo-dev) projects, and so for many small projects I already know everything I need to know to do it (even though I don't know everything I would have needed to do it 25 years ago). All that's left is doing it.
In fact, AI might be the opposite of a managerial "silver bullet". The more we automate what is repetitive, the less predictability remains overall. Things can get more productive on average, but managing it becomes harder, as productivity amplifies risks.
moritzwarhier 7 hours ago [-]
No, not again. Unless you had a point, which would probably be apparent from the headline.
rglover 1 days ago [-]
We are obsessed with fortune telling.
Use the damn thing or don't.
It's that simple.
zapataband1 1 days ago [-]
it would be that simple if a mental-health-altering token-predictor wasn't being consistently shoved down our collective throats.
rglover 1 days ago [-]
Fair. But it's worth considering that anything "they" need to shove is a tell that they need you more than you need them. The sky-hath-fallen narrative is just what top-dollar marketing gets you these days. Clever, but mostly bark.
mwaddoups 1 days ago [-]
This was a great read - thanks so much for taking the time to write this. Well researched and thought provoking. Long live the em dash.
exographicskip 12 hours ago [-]
I've been a fan of the em dash since college. Only recently learned what the shortcut on macOS is (opt-shift-dash), but I set up text expansion a decade ago.
Think we need better AI tells than that
andai 1 days ago [-]
On my device the article displayed 5 words per line, so I switched to "Desktop site" in the hamburger menu which made it much more readable.
(Sorry for bikeshedding, but you can't discuss an article if you can't read it.)
jwpapi 1 days ago [-]
It's the biggest swindle...
You could fetch some unfinished GitHub repos or download free templates. It's actually faster than LLMs; still, nobody would do it.
I don't start my project with the ecommerce Next.js starter repo. I build it from scratch, because it's faster...
empthought 1 days ago [-]
> But ultimately, the only situation in which LLMs could meaningfully democratize access to software development is one where they achieve a true silver bullet, by significantly reducing or removing essential difficulty from the software development process.
The author didn't seem to read the Brooks essay for comprehension. There is an entire section about expert systems that foreshadows agents. While there is no singular silver bullet, Brooks explores the most promising techniques to reduce essential complexity that were anticipated in 1986.
> The most powerful contribution of expert systems will surely be to put at the service of the inexperienced programmer the experience and accumulated wisdom of the best programmers. This is no small contribution.
Furthermore, his objection to automatic programming was simply an argument from incredulity, which is an understandable opinion at the time, yet quite vacuous in hindsight.
slopinthebag 1 days ago [-]
I really enjoyed this article; it's well written and does a good job of dismantling the flawed arguments of the language-model maxis while presenting a more realistic outlook on where we are now and where we are going.
I think the biggest benefit language models have provided me is in the auxiliary aspects to programming: search, debugging, rubber ducking, planning, refactoring. The actual code generation has been mixed.
I had an LLM try to implement a fairly involved feature the other day, providing it with API spec details, examples from other open source libraries, and plenty of specifications. It's also something readily available in training data, but still fairly involved.
On first glance it looked great, and had I not spent the time to investigate deeper I would have missed some glaring deficiencies and omissions that render its implementation worthless. I am now going back and writing it by hand, but with language models providing assistance along the way, and it's going much better.
I think people are being unrealistic by thinking that the usage of language models in their side projects represents something broader. It's almost the perfect situation for language models: small, greenfield code bases, no review, no responsibility, and no users. It goes up on GitHub with a pretty readme, and then off to social media where they post about how developers are "cooked". It's just not a very realistic test.
In the end we will probably see large productivity increases by integrating language models, but they won't be replacing developers but rather augmenting them.
trwhite 1 days ago [-]
A well researched and written piece
senko 1 days ago [-]
The accidental vs essential difficulty argument ignores the fact that you can abstract away (some) essential difficulty if you're willing to take a performance hit.
Design patterns in an older (programming) language become core language features in a newer one. As we internalize and abstract away the best patterns for something, it becomes accidental but it's only obvious in retrospect.
The article quotes Brooks (quoting Parnas) about just that (later, in context of LLMs):
> automatic programming always has been a euphemism for programming with a higher-level language than was presently available to the programmer. [...] Once those accidents have been removed, the remaining ones are smaller, and the payoff from their removal will surely be less.
Considering this was written when C was the hot new stuff, let's compare the ability to code a CRUD web app in Python/Django vs C. What Brooks and Parnas are saying is that Python/Django cannot bring big improvements in building a CRUD web app compared to C, because they can only make it easier to program, reducing accidental complexity. But we've since redefined "accidental", and I would argue that you can write a CRUD web app in Python/Django at least 100x faster than in C (and probably at least 100x more securely), although it may take 1000x more CPU and RAM while running.
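To make that concrete, here is roughly all the application code one CRUD resource needs in Django (a sketch with made-up model and field names; URL wiring, settings, and migrations omitted). The C version would need hand-rolled HTTP parsing, SQL, and memory management before it even reached feature parity:

    from django.db import models
    from django.http import JsonResponse
    from django.views.decorators.http import require_http_methods

    class Note(models.Model):
        title = models.CharField(max_length=200)
        body = models.TextField(blank=True)

    @require_http_methods(["GET", "POST"])
    def notes(request):
        # POST creates a note; GET lists them. Validation kept minimal.
        if request.method == "POST":
            note = Note.objects.create(
                title=request.POST["title"], body=request.POST.get("body", "")
            )
            return JsonResponse({"id": note.id}, status=201)
        return JsonResponse({"notes": list(Note.objects.values("id", "title"))})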
So "we removed most of the accidental difficulties and the most that remains is essential" is a kind of "end of history" argument.
> I’d be surprised if there’s even a doubling of productivity still available from a complete elimination of remaining accidental difficulty.
It's good that this statement has a conditional subjective guard, because that's just punditry.
> LLM coding does not represent a silver bullet
Here I agree with the author completely, but probably not for the same reasons. The definition of "silver bullet" the article uses (quoting Brooks):
> There is no single development, in either technology or management technique, which by itself promises even a single order-of-magnitude improvement within a decade in productivity, in reliability, in simplicity.
AI-assisted development is not a single technique, the same way "devops" or "testing" or "agile" is not a single technique. But more importantly, I agree it will take time to find best practices, for the technology change to slow down, and for the best approaches to diffuse across the industry.
The article's conclusion:
> You should be adopting and perfecting solid foundational software development practices like version control, comprehensive test suites, continuous integration, meaningful documentation, fast feedback cycles, iterative development, focus on users, small batches of work… things that have been known and proven for decades, but are still far too rare in actual real-world software shops.
These are great and I'm gonna let him/her finish, but it's curious that actual coding isn't mentioned anywhere. The author doesn't suggest "polish your understanding of C pointer semantics" or "the Rust ownership model" or "the Django ORM", or to really, deeply understand B-trees. Looks like pedestrian details like those are left as an exercise for the reader ... or the reader's LLM.
rambojohnson 1 days ago [-]
that's all we've been doing now.
AIorNot 1 days ago [-]
the problem with this article is that he is right, of course, but only right now. There is no reason to believe that future AI platforms won't be able to review code themselves and manage some aspects of themselves with minimal human oversight. Yes, we likely will always need a few experts.
We're done. I, for one, welcome our new AI Overlords - or, more accurately, still welcome the tech bro billionaires who are pulling the strings
frizlab 1 days ago [-]
> There is no reason to believe that future AI platforms won't be able to review code themselves and manage some aspects of themselves with minimal human oversight
There are, IMHO, fewer reasons to believe they will be able to do that rather than not, though.
CamperBob2 1 days ago [-]
LLMs became much better at both reviewing and writing code over the last 12-18 months. Did you?
The current state of the art is irrelevant. Only the first couple of time derivatives matter.
paulhebert 1 days ago [-]
> Did you?
I would say I got better at both of those over the last 12-18 months. Are your skills static?
eiekeww 1 days ago [-]
My brain got better at thinking deeply when I stopped using LLMs.
Lmao why does it seem outlandish to other people? Perhaps they never thought too deeply in the first place to recognise it.
CamperBob2 1 days ago [-]
Compared to Claude or GPT 5.5? Yeah, my skills are static relative to the progress seen recently. So are yours, unless your grandpa was named von Neumann or Szilard.
slopinthebag 1 days ago [-]
> There is no reason to believe that future AI platforms won't be able to review code themselves and manage some aspects of themselves with minimal human oversight
Really? That's like someone during an economic boom saying "The economy is the worst it'll ever be. There is no reason to expect things to not continue to improve".
pheaded_while9 11 hours ago [-]
That simile breaks down because - unlike the state of the economy - the collective human capacity to understand, design, and produce these systems essentially only goes one way, barring the apocalyptic.
keybored 1 days ago [-]
I have no stake in Fred Brooks. But No Silver Bullet seemed to be taken as gospel on this board. Sufficiently productivity-enhancing technology? Gimme a break man. Maybe you’ll get a 30% boost. Not a 10X boost.
Until recently. dramatic pause
And then AI happened.
taormina 1 days ago [-]
Great! So all of this 10x boosting is visible in which economic indicator?
slopinthebag 1 days ago [-]
Debt.
stackghost 1 days ago [-]
Let's actually not talk about LLMs.
I honestly couldn't force myself to finish yet another blog post about how "we're not yet sure what impact LLMs will have on society" or whatever belabored point the author was attempting to make.
"Some random person's take on LLMs" was maybe interesting in 2024. Today it is not even remotely interesting.
There are a gazillion more interesting things happening today that ought to be of interest to the median HN reader. Can we talk about those instead?
jubilanti 1 days ago [-]
I'm confused. If you don't want to talk about LLMs then why didn't you just flag the post and move on? Submit something interesting, upvote and comment on interesting posts, instead of feeding the engagement on this thread.
It sounds like you actually do want to talk about how much you don't want other people to talk about LLMs.
famouswaffles 1 days ago [-]
You're not supposed to flag a post for something like that. Ideally you downvote and move on if you feel that strongly about it. Flagging is meant to be reserved for stuff that breaks the rules or guidelines.
WolfeReader 1 days ago [-]
Stories can't be downvoted.
stackghost 1 days ago [-]
Oh, I definitely flagged the post also.
mettamage 1 days ago [-]
I am an AI engineer and I honestly agree. Talking about LLMs feels like the new crypto, with some nuances (i.e. many innovative things being possible and done with LLMs whereas crypto innovations were… few and far between).
dijksterhuis 1 days ago [-]
it’s felt like the new crypto to me for about 2-3 years now.
i was doing an ML Sec phd a year or two before all this hype took off. i took one of the OG transformer papers along to present at our official little phd reading group when the paper was only a few months old (the details of this might be a bit sketchy here, was years ago now).
now i want nothing to do with the field in any way shape or form. i’m just done.
edit -- i got incredibly angry after writing this comment. pure hatred and spite for all the charlatans and accompanying bullshit.
eiekeww 1 days ago [-]
Sadly investing is all about making money… you should be more pissed at the naive people who have contributed to the effort and in particular those who don’t care about truth, but about cash flow potential.
dijksterhuis 14 hours ago [-]
everyone involved is responsible, just to different degrees.
keybored 1 days ago [-]
Tedious LLM discourse isn’t aimed at AI engineers. It’s doomscrolling fodder for regular programmers.
gizajob 1 days ago [-]
Actually can we not thanks.
cadamsdotcom 1 days ago [-]
> If its two empirical premises—that the accidental/essential distinction is real and that the accidental difficulty remaining today does not represent 90%+ of total—are true, then the conclusion which rules out an order-of-magnitude gain from reducing accidental difficulty follows automatically.
The article goes on to assume there’s no 10x gain to be had but misses one big truth.
Needing to type the code is an enormous source of accidental difficulty (typing speed, typos, whether you can be arsed to put your hands on the keyboard today…) and it is gone thanks to coding agents.
What I did was make one commit by hand (involving multiple files), and then told Codex (last year's Codex!) to make the equivalent changes to other instances in the code base.
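A minimal version of that workflow (the instruction below is illustrative, not the actual prompt): commit the first instance by hand, then tell the agent something like "Commit HEAD updates one call site to the new API; find the remaining call sites still using the old one and apply the same change."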
Never understood that argument, because there are two steps in design: finding a good solution (discussing prior art, tradeoffs, …) and then nailing the technical side of that solution (data structures, formulas, …). Is it the former, the latter, or both?
It does not actually, and not any faster.
Again, I've lost count of how many times I've had an in-depth architectural discussion with ChatGPT, with it giving my approach the final mark of approval ("This is excellent"), only for me to discover a flaw in it, or a radically simpler and better approach, bring that back to it, and have it proclaim "Yeah, this is a much better approach".
These LLMs are in many cases sycophantic confirmation machines. Yes, they are useful to some extent in helping you refine your ideas and think of edge cases. But they are nowhere close to actually thinking better and faster. Faster in the wrong direction is not just slow; you are actually going backward.
A paradigm shift is an earth-shattering, very important change - a complete change in thinking, etc. LLMs are not that. They are simply some pretty nice new tools, but they will whip off your metaphorical thumb just as quickly as a misused table saw.
You'll note that you mention "engineers are offloading": that's not a paradigm shift. That's a bunch of engineers discovering a better slide rule.
I'm old enough to remember moving on from slide rules (I still have mine) through calculators (ditto) to using fag packets and napkins for their real intended purpose.
The drill-driver also took engineering by storm but no-one ever used the term paradigm shift (to be fair, I don't think it was invented at the time and I can't be arsed to look it up).
If this sounds melodramatic it’s likely that it hasn’t fully taken root where you are yet.
I see opinions split along the lines of “it’s just a dirty, untrustworthy tool that is making our lives and the world a living hell” and “this is the second coming of Christ”. The reality is that right now we lie on the first part of that spectrum, but I am looking over the hill and seeing 4 horses, and they are stampeding this way
LLMs are next token guessers with knobs on. That is not a paradigm shift.
If she says “I’m sorry, I don’t know how to are you still just a script” then I have my answer. :P
LLMs are remarkable these days but they’re still missing some essential insight. I’m far less confident now, though, that this will require another big breakthrough and not just a combination of tweaks.
Honestly your posts read like satire. You’re treating LLMs like a religion and the release of Opus 4.6 as some type of rapture. Idk man, if it’s some sort of bit or false-flag thing, well played; if not… well, good luck.
- obviously LLMs are not a religion; I’m using that to illustrate a point
- 5-6 months ago was when agent perf hit a meaningful inflection point and adoption exploded. It’s why people in this thread reference “the past 6 months”, whether or not they realize we’ve been on the same path for years now
So to overextend the metaphor, opus 4.5 was really kind of the right fit for the rapture.
I mean, no need to take any of this seriously. I have worked on benchmarks and measurement in an AI lab professionally for over 4 years now, in software and data science for 8, and before that got a PhD in Astro - like, I’m not some armchair person with no understanding of this field. Though I do find it entertaining when my background in an AI lab is people’s favorite reason to dismiss this :)
I find that when people find stuff like this satirical they often don’t really know the industry or underlying mechanics that well. Not saying that’s you, but as ridiculous as I apparently sound to you, do consider that it sounds even more ridiculous not to understand the tsunami that is coming right for you…
Engage as a person, please.
And it's literally just a black box that generates more Javascript for their Next.js app
Generally, the whole point of the "Power to the people?" (and to some extent the "On being left behind") section(s) is to underscore the two antithetical claims made by many LLM marketers: 1. LLMs are so powerful and so natural and easy that someone with no experience can create amazing software, and 2. LLM usage is a core skill, one that if you don't begin training now you'll be left behind.
Obviously, both of these can't be simultaneously 100% true--either it's easy enough for the non-programming layperson to successfully generate software for an intentional purpose, or LLM-assisted programming is a skill you need to train to avoid professional obsolescence in modern society. So, the article disagrees with the majority of both claims, and accepts a weakened/minor portion of each: 1. LLM output is easy to generate but accurate prompting matters, and 2. when used for software development professionally, some amount of skilled human intervention does indeed seem necessary. And now these two claims do align.
However, if professional software engineers who work with and read code constantly, armed with the best software practices we can determine to aid LLMs, cannot use modern AI tools without shooting their feet off at relatively frequent rates, certainly you'd expect the layperson, who must put an even greater amount of undue faith in the validity of the results, to be at extremely high risk of foot-shooting. It's not "gatekeeping" to forewarn people against unwarranted trust in LLM output, nor is it "gatekeeping" to suggest that modern tech communicators/marketers describing an overly flowery LLM tooling landscape might be doing people a disservice.
I've had this conversation with managers in multiple organizations this year: "Yes, you could totally vibe code that instead of paying for a SaaS. But you have strict contractual and professional obligations about data security. Do you want to be deposed and asked, 'So, did you really just vibe code the system that led to the data leak? Did the vibe coders have any professional qualifications? Did they even look at the code?'"
Similarly, a backend server that handles 8 million users a day is expected to stay up.
Now, there are 10,000 things that have less demanding requirements. I'm actually really delighted that people are able to vibe code their own tools with minimal knowledge of software engineering! We have been chronically underproducing niche software all along.
But if your software already has on-call shifts (and SLAs, etc) like the GP, then I think you want to be smart about how you combine human expertise with LLMs.
It feels like a dunk to write that. But I genuinely do think there's so much motivated reasoning on both sides of this issue, and one signal of that is when people tip their hands like this.
I was going to argue that companies got to choose their own auditors, so of course there were some bad ones out there. But looking at the market, it seems like (1) the race to the bottom has gotten ridiculous, and (2) the insurance companies do not currently trust the auditors in any meaningful way. So, yeah, point to you.
Once upon a time, I went through SOC2 audits where the auditors asked lots of questions about Vault and really tried to understand how credentials got handled. Sure, that was exceptional even at the time.
But that still leaves a whole pile of other audits and regulatory frameworks I need to comply with. Probably most of these frameworks will eventually accept "The code was written by an LLM and reviewed by an actual programmer." I am less certain that you'll be able to get away with vibe coding regulated systems any time soon.
My thing here is: you want to summon some kind of deus ex machina reason why the unpredictability (say) of agent-generated software will fail in the real world, but the concrete one you came up with fails to make that argument, pretty abruptly. Which makes me think the argument is less about the world as it is and more about the world as you'd hope it would be, if that makes sense.
Would you have the same reaction to requiring an approval for a production deployment? That’s driving the development process.
—-
Also jfc I need to cool it with the buzzwords, sorry I just got home from “talk like this all day” $job
It does? You mean "it tests itself faster", which is not really a test now, is it?
Funny, I thought that the major hurdle is improving accuracy and reliability, as it's always been. Engineering is necessary and useful, but it's a much simpler problem, which is why everyone is jumping on it.
I'm sure it was very difficult to program in machine code, but if now (or soon) anyone can just write software using an LLM without any sort of learning, it changes everything. LLMs can plan and create something usable from simple instructions or ideas, and they will only get better.
I think LLMs will be (and already are) useful for many more things than programming anyway.
Did you read the "Power to the People?" section? In it, the author dismantles your thesis with powerful, highly plausible arguments.
1. You don't have to be an LLM expert to get good, consistent results with LLMs.
My best vibe-code process after years of using LLMs is to have Claude Code create a plan file and then cycle it through Codex until Codex finds nothing more to review, then have an agent implement it (a sketch of this loop follows the list below). This process is trivial yet produces amazing results.
It's solved by better and better harnesses.
2. You don't have to write technical specs. The LLM does that for you. You just tell it "I want the next-tab button to wrap back to the first one" and it generates a technical plan. Natural language is fine.
3. Software that seems to work only to fail down the line in production is already how software works today. With LLMs you can paste the stacktrace or user bug email and it will fix it.
This is why vibe-coding works. Instead of simulating in your head how an app will run by looking at its code, you run the app and tell the LLM what isn't working correctly. The app spec is derived iteratively through a UX feedback loop.
4. I don't understand TFA's goalposts, but letting people who are only interested in the LLM process (rather than the software craftsmanship) create software would be a huge democratization of software.
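A minimal sketch of the plan/review loop from point 1, driven from Python (the `claude -p` and `codex exec` invocations and the prompts are assumptions about a local setup, not a canonical harness):

    import subprocess

    def run(cmd: list[str]) -> str:
        # Run a CLI agent non-interactively and capture its output.
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    # One agent drafts the plan...
    run(["claude", "-p",
         "Write an implementation plan for the feature in TASK.md; save it to PLAN.md"])

    # ...a second reviews and revises it until it has nothing left to flag...
    while True:
        review = run(["codex", "exec",
                      "Review PLAN.md. Reply exactly LGTM if you have no concerns; "
                      "otherwise revise PLAN.md in place."])
        if "LGTM" in review:
            break

    # ...then an agent implements the settled plan.
    run(["claude", "-p", "Implement PLAN.md"])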
> 1. You don't have to be an LLM expert to get good, consistent results with LLMs.
You don't get good consistent results with LLMs, expert or not
> 2. You don't have to write technical specs. The LLM does that for you. You just tell it "I want the next-tab button to wrap back to the first one" and it generates a technical plan. Natural language is fine.
Try this: have Claude write a section in your specs titled "Performance Optimizations" and see the gibberish it will come up with - fluffy lists with no actually useful content specific to the project. This is a severe problem with LLM-driven speccing that I have encountered uncountable times. I now rarely allow them to touch the specs document.
> 3. Software that seems to work only to fail down the line in production is already how software works today. With LLMs you can paste the stacktrace or user bug email and it will fix it.
And pretty soon you have a big ball of mud. But I guess if the rate of bugs accelerates, the LLMs can also "fix" them faster
> This is why vibe-coding works. Instead of simulating in your head how an app will run by looking at its code, you run the app and tell the LLM what isn't working correctly. The app spec is derived iteratively through a UX feedback loop.
I should tell you about the markdown viewer with specific features I want, which I have wanted to build using only LLM vibe-coding, and how none of them are able to do it.
Why the insult? You never know who you're talking to on HN.
Your points have to do with process failure, not intractable LLM limitations. Most of which already apply to human-conceived software.
Your "Performance Optimizations" bit exemplifies this since you baked in the assumption that it will have no connection with your project. Well, why not? You need to figure out how to use your source code and relevant data as ground truth when working with LLMs.
A markdown viewer is on the simpler side of things I've built with LLMs, so this too suggests that you have a weak process. A common mistake is to expect LLMs to one-shot everything (the spec, the plan, or the actual impl). Instead you should use LLMs to review-revise-cycle one of those until it's refined, ideally the spec/plan since impl is derived from it. You will have much better and consistent results.
I recommend finding an engineer you respect/trust that has found a way to build good software with LLMs, and then tap them for their process.
You said:
> Your "Performance Optimizations" bit exemplifies this since you baked in the assumption that it will have no connection with your project. Well, why not?
OK, I am talking from experience. Using LLMs for speccing is almost useless above certain complexity levels; what you get is an assemblage of the most average points you can imagine - the kinds of things almost every project in the category you are working on will address without any thought. Ask it to spec auth for a specific design, and all you'll get is cookie-based login, input validation, password hashing, etc., which you don't need an LLM for. Nothing like an actual in-depth design. Even asking them to update specs based on discussions is hit or miss.
> A markdown viewer is on the simpler side of things I've built with LLMs, so this too suggests that you have a weak process. A common mistake is to expect LLMs to one-shot everything (the spec, the plan, or the actual impl). Instead you should use LLMs to review-revise-cycle one of those until it's refined, ideally the spec/plan since impl is derived from it. You will have much better and consistent results.
But what you are describing is NOT vibe-coding. I have no doubt I could build the viewer I want (which, by the way, is not your usual plain-vanilla markdown viewer, but one with some very specific features) with LLM assistance. My point is: if you can't even vibe-code your way to this specific viewer, how are you supposed to vibe-code serious software?
Indeed, the declining quality of Claude Code is, I suspect, testament to the fact that vibe-coding any sufficiently complex piece of software does not work in the long run.
My point is that the planning phase and implementing phase are basically unsupervised, and all the work goes into the planning phase.
Yet I've noticed that over time, I'm not even needed in the planning phase, because a simple revision loop on a plan file produces a really good plan. My role is mostly to decide what the agents should do next and to drive the revision loop by hand (mostly because it's the best place for me to follow what's happening).
I've been getting really good results, though I've also developed a simple process that ensures the LLMs aren't relying on their internal model but rather on external resources, which is critical.
Would there even be a debate in the tech community if such unassailable arguments existed? The author is entirely entitled to his opinion, just as I am allowed to disagree with him (not sure why I am also downvoted). The good thing is, if I'm right, we will see it in less than 10 years.
I don't buy that that's true - the "only" part, anyway. Look at how UX with software has evolved. This is gonna be an old-man-yells-at-clouds take, but before smartphones, there were hotkeys. And man, you could fly with those things. The computers running things weren't as fast as they are today, but you could mash in a whole sequence through muscle memory and just wait for it to complete. Now, you have to poke at your phone, wait for it to respond, poke at it some more. It's really not great for getting fast at it. AI advancement is going to be like that. Directionally it will generally be better, but there's going to be some niche where, y'know what, ChatGPT-4o really had it in a way that 5.5 does not. (Rose-colored glasses not included.)
Then came the new Claude update, which many people say is worse. Even Anthropic says it got worse.[1] HN discussion back on April 15th: [2]
Some of this is a pricing issue. Turning "default reasoning effort" down from "high" to "medium" was a form of shrinkflation. Maybe this technology is hitting a price/performance wall.
[1] https://www.anthropic.com/engineering/april-23-postmortem
[2] https://news.ycombinator.com/item?id=47778035
This is a very academic approach to the subject - read what other people have written about it without ever doing it yourself. Study what someone said about LLM coding 50 years ago, before they were even invented, to see what you think about it.
I would strongly suggest that the author just give it a go and see what they think, without the preconception of other people's opinions.
My experience has been remarkable, and, like others, I'm finding real joy in being able to move past the code to actually design and play with whole systems and architectures.
It gets to the essence of code, which is not about the code but about the system that code implements. Being able to write code in 3 minutes rather than 30 does not bog us down in review (the LLM is perfectly capable of reviewing code too). It frees us to explore systems and architectures without worrying about the sunk cost of the existing code, or the effort of changing it.
Why? Most of the article was about the productivity of teams.
> This is a very academic approach to the subject - read what other people have written about it
Meta-studies have tremendous value. He's asking a simple question: if LLMs are changing the world, let's look at what studies are showing.
> My experience has been remarkable, and, like others, I'm finding real joy in being able to move past the code to actually design and play with whole systems and architectures
Great! What does that have to do with the age-old problem that software development doesn't scale to teams well? It is indeed a "50 year old problem", so please tell us how LLMs solve it.
The article is talking about inherent vs accidental complexity, amongst other points, and if the author had actually tried developing with an LLM, they might have worked out how LLM coding does address some of this.
- Mythical man-month is about organizations not individuals
- No Silver Bullet: "I believe the hard part of building software to be the specification, design, and testing of this conceptual construct, not the labor of representing it and testing the fidelity of the representation." Clearly he's NOT talking about the 10x dev building the whole thing themselves, which everybody knows is faster, better, probably doesn't even need a spec. Organizations are who need specs -- they have clients, business people etc. An organization with a single developer moves at light speed -- but this doesn't scale.
Nobody's disputing that LLMs give multiples for certain development tasks. The main thrust of the argument centers on how unimportant coding time is ... for organizations. Coding time is a HUGE lever if you're the one dev building everything, but that's not a repeatable pattern.
Coding time is important if it gates experiments and spikes. If you have to work out your architecture on paper because actually coding it up is a serious expense, then it becomes harder to experiment with different designs. In an LLM world where coding time is very cheap, it becomes easier to experiment and try things out. Developing an entire architecture and then abandoning it because it turned out that it didn't scale too well, or couldn't handle some edge cases, is not a major mistake or problem any more. There's no pressure to keep old code because it cost a lot of money to develop. You can spike an entire system, decide that it was a useful experiment, but didn't work, delete the repo, and go get lunch. This is new, and important.
At best we will end up owning nothing, not even our programming skills, as everyone will be at the mercy of AI companies for their coding.
We are still in the honeymoon phase of AI coding; I have a very pessimistic view of the future.
This feels like classic economics, though - if the price of something goes up because of demand, then more suppliers enter the market and supply increases.
Also, the AI thing is a bubble, and bubbles burst. Sooner or later all that demand is going to disappear and we'll be oversupplied.
But yes, interesting times indeed.
I mean, we have lots of people using LLMs to write software in different ways, as we explore this space. I don't really see how "the evidence" can be different from "anecdotes" at this stage of the exploration?
There have been a couple of studies done on LLM-assisted dev vs non-LLM-assisted dev, but the author doesn't cite them.
> (although I’m personally skeptical of the “10x programmer” concept, the software industry overall does seem to accept it as true)
To be fair, this statement from Brooks doesn't entirely match the "10x programmer" we talk about. My take is that when someone says "10x programmer" today, they mean 10x more productive than the average, not 10x more productive than the worst; Brooks' statement is about the latter. If he'd looked at the difference between average and best, I would assume you'd get something more like a 2x or 4x programmer.
10x relative to what exactly? It's not a statement grounded in any kind of reality.
10x relative to <drumroll> other developers. Developers in the same place doing the same-ish things, only on objective metrics and outcomes you have someone(s) producing outputs that outstrip teams and all their players.
Not 10x LoC: 10x problem-to-solution time, maintenance costs, and time/cost to create. At everything, always? No, at relevant things. In fact, some of those nerds arguably go past that by being able to solve things the -4x to 5x'ers cannot. Any fulfillment is infinitely faster than non-fulfillment.
I’ve worked with several and have seen their projects’ numbers year on year in a large pool. SQL gurus who could get there inconceivably fast because their guruship let them conceive better. Independently created solutions that obviated existing systems and components, got there >10x cheaper, and were >10x cheaper to maintain.
Never outbid someone by planning to do way less, smarter and faster? 10x ain’t that much if the other dudes are average consultant houses.
I'm a 10x programmer at building Django apps compared to a developer who has never worked with Django before.
Someone who develops against WordPress on a daily basis will easily 10x my own attempts at building things on that platform.
Just one more harness bro. Just one more agentic swarm. Please bro, just one more Claude Max subscription. Please bro.
All I would need from an LLM doubter is evidence that, at tractable software engineering tasks, LLMs are not improving. The strongest argument against the increasing general capabilities of LLMs is the ARC-AGI tasks; however, the creators admit that each generation of LLMs exceeds their expectations, and that AGI will be achieved within the decade.
That being said, I don't even think arguing about this from a mathematical perspective is a worthwhile use of time. Calling something an asymptote in the first place requires defining a quantifiable "X" and "Y", which we don't even have. What we have are a bunch of synthetic benchmarks. Even ignoring the fact that the answers to the questions are known to regularly leak into the training data (in other words, it's possible for scores to increase while capabilities remain the same), there's also the fundamental fact that performance on benchmarks is not the same thing as performance in the real world. And being able to answer some arbitrary set of questions on a benchmark which the previous model couldn't does not have a quantifiable correlation to some specific amount of real-world improvement.
The OP article focuses on research papers which assess real-world impact of LLMs within software organizations, which I think are more representative.
I wouldn't call myself an "AI doubter" - I use LLMs every day. When you say "doubter" you're not referring to "AI" in general, or the fact that AI is helpful or boosts productivity (which I believe it does). You're rather referring to the very specific, very extraordinary claim, that LLMs will surpass humans in coding. If that's the case then yeah I'm a doubter, at least on any foreseeable timescale.
Also, if LLMs weren’t really getting better in general but just benchmaxxing, then it would be extremely lucky that this also happens to lead to the general increase in coding capabilities observed in more recent models.
AI has already surpassed 99% of humans in coding in narrow domains. The question is, how wide does the domain have to be before models no longer ever surpass humans? I’d wager we’d have to wait until scaling of compute infrastructure stops, wait 6 months, then see.
Have you ever once looked at a METR chart? https://files.civai.org/assets/METR_Chart.jpg
That's not an asymptote.
> there's also the fundamental fact that performance on benchmarks is not the same thing as performance in the real world
Again, yes, you're correct in the general case but it has very little to do with the specific case.
Would you find it convincing if I simply said "some internet arguments are wrong"? It's certainly a true statement, and you've made an internet argument here, so clearly you should accept that you're wrong, right?
I'm not "convincing" anyone of anything. I'm stating the reasons that I, personally, am unconvinced of a specific claim being made to me.
You’re definitely right that people adopt agentic workflows and are disappointed or worse, but the point is the disappointment has already reduced substantially and will continue to do so. We know this because we know the scaling laws, and also because learning theory has been around for many decades.
I'll give you the coding harnesses themselves are better because that was a new product category with a lot of low-hanging fruit, but have the models actually improved in a way that isn't just benchmaxxing? I'd argue the models seem to be regressing. Even the most AI-pilled people at my company have all complained that Opus 4.7 is a dud. Anecdotally, GPT 5.5 seems decent, but it's rumored to be a 10T parameter model, isn't noticeably better than 5.4 or 5.3, is insanely expensive to use, and seems to be experiencing model collapse since the system prompt has to beg the thing to not talk about goblins and raccoons.
> I don't find $MODEL useful
> CLEARLY you're doing it wrong
It's so dumb.
(I write code w/ agents btw, I'm just also skeptical)
"i am alive"
OH MY GOD!!
- AI coding is a disappointing fad (“fever dream?”)
- that has not made meaningful progress in… 6 months?
- coding harness is improving
- model improvements are lies: it’s just businesses “benchmaxxing” and misleading people. Real performance has not meaningfully improved
- “opus 4.7 is a dud”
- 5.5 suffering from “model collapse” (I’ve never heard this term before)
Since you asked and I assume you are rational and really are interested to know:
- we have many measures of performance and have studied how one particularly important but unintuitive measure (pretraining perplexity) scales with data, compute, and model size. These laws continue to hold and have satisfying theoretical origins.
- whatever the scale of 5.5, consider we have far more room to go on the scaling front. Probably another 2-3 orders of magnitude before we hit limiting bottlenecks.
- that’s also fine because scaling is only part of the puzzle. RL on verifiable rewards is virtually guaranteed to get you optimal performance and that’s the entirety of the excitement around coding agents
- while you are right about benchmarks and measurement science having a ton of weaknesses, they are not at all garbage. There are probably around 40,000 benchmarks in the literature (this is not a made up number by the way it really is around that many). Epoch made a great composite measure using good stats (IRT) called their epoch capability index, METR has done and redone their time horizon measure and it holds up beautifully. There is a ton of signal in many benchmarks and they all tell a pretty compelling story.
- additionally, this is not some unknowable thing. It strikes me as odd that people’s prior on HN a lot of the time is “it’s all dumb rich people putting way too much dumb money in this”. Sorry, but the world is not that dumb. Trillions of CapEx is usually pretty rationally allocated. And it is!
- why? Because it is already known what happens when you do what we’re doing. When you have a verifiable reward system, a certain amount of compute available, and seed data to get you to where you can do RL, you are almost guaranteed to get superhuman performance
We're almost 6 months into all this AI-code madness and I've yet to see that "rapid improvement" you mention. As in software products that are genuinely better compared to 6 months ago, or new software products (and good software products at that) which would have not existed had this AI craze not happened.
I really value skeptical people and skepticism generally. But what I think skeptical people would prefer to consider themselves is: rational and reasonable, with their beliefs well calibrated.
You’re not the only one to think that literally nothing major or significant has happened with AI but that’s simply wrong. Every major tech company - the ones poised to get the first best rewards, have already gotten good incremental revenue from AI via ads ranking/recommendations (Google, Meta, etc.), good productivity increases due to scale of workforce and advanced in house tooling. You won’t see these numbers and you don’t have to believe them. But I have seen them and I believe them, and I, like you, hate bullshit.
That's just software evolving. It happened before LLMs, it would happen without LLMs.
> good productivity increases due to scale of workforce and advanced in house tooling.
Exactly same case.
I agree the foundations: git, GitHub, compilers, etc. are arguably are “a fraction of the price” and today they have arguably more impact (though not sure by which measure). But literally since January we have been rolling out our replacements, I don’t really see how that wouldn’t be an earth shattering impact. You talk about GitHub and that’s fine but ignore the fact that huge swaths of the profession aren’t even directly using any of these tools anymore.
I’m not sure what you imagine the promise of AI to be, and without that I can’t really be specific in any refutation I would just say coding is only the beginning. It is the most powerful and also the easiest thing to solve first. Improved coding performance also improves generalization and performance on non-coding tasks, so that’s a nice bonus, and we’re maybe 5 years away from decent embodied systems which after an inflection point of consumer adoption will quickly get better via data flywheels and on policy learning. Basically there are very few bottlenecks that will not be touched.
Six months after the internet was invented, you could send email between a few universities.
Six months after the computer was invented, they still hadn't actually built one.
The first transcontinental railroad, took about six YEARS just to build.
If you want to set GPT as the target, that's even easier! In that decade it has passed the Turing Test, solved novel open math problems, generates audio, video, and music, and can write coherent code. Again, there is no technology that has improved more rapidly than LLMs.
I think it’s pretty clear the internet has had 10x the impact of LLMs so far. Maybe 100x
It's the "YOLO" of business strategies.
L(N,D) ~= 1.69 + 406 / N^0.339 + 411 / D^0.285
L is loss (pre training test loss) D is the scale of the data N is the number of model parameters
It's not terribly hard to check either. You can do some spot checks with cost dashboards in AWS, Datadog, etc and see if the numbers line up
Can also tell Claude "go right size the environment, pull p95 usage metrics for the last 3 months" and a couple hours later, a bunch of money is saved. Much easier than manually pulling trend data and also easier than installing/configuring/managing tools that do it for you.
The benefits of the time savings of having progressily better tooling over time add up quickly.
Use the damn thing or don't.
It's that simple.
Think we need better AI tells than that
(Sorry for bikeshedding, but you can't discuss an article if you can't read it.)
You could fetch some unfinished github repos or download free templates. It’s actually faster than LLMs, still no body would do it.
I don’t start my project with the ecommerce nextjs starter repo. I build it from scratch, because it’s faster...
The author didn't seem to read the Brooks essay for comprehension. There is an entire section about expert systems that foreshadows agents. While there is no singular silver bullet, Brooks explores the most promising techniques to reduce essential complexity that were anticipated in 1986.
> The most powerful contribution of expert systems will surely be to put at the service of the inexperienced programmer the experience and accumulated wisdom of the best programmers. This is no small contribution.
Furthermore, his objection to automatic programming was simply an argument from incredulity, which is an understandable opinion at the time, yet quite vacuous in hindsight.
I think the biggest benefit language models have provided me is in the auxiliary aspects to programming: search, debugging, rubber ducking, planning, refactoring. The actual code generation has been mixed.
I had an LLM try and implement a fairly involved feature the other day, providing it with API spec details, examples from other open source libraries, and plenty of specifications. It's also something readily available in training data as well, but still fairly involved.
On first glance it looked great, and had I not spent the time to investigate deeper I would have missed some glaring deficiencies and omissions that render its implementation worthless. I am now going back and writing it by hand, but with language models providing assistance along the way, and it's going much better.
I think people are being unrealistic by thinking that the usage of language models in their side projects represent something broader. It's almost the perfect situation for language models: small, greenfield code bases, no review, no responsibility, and no users. It goes up on GitHub with a pretty readme, and then off to social media where they post about how developers are "cooked". It's just not a very realistic test.
In the end we will probably see large productivity increases by integrating language models, but they won't be replacing developers but rather augmenting them.
Design patterns in an older (programming) language become core language features in a newer one. As we internalize and abstract away the best patterns for something, it becomes accidental but it's only obvious in retrospect.
The article quotes Brooks (quoting Parnas) about just that (later, in context of LLMs):
> automatic programming always has been a euphemism for programming with a higher-level language than was presently available to the programmer. [...] Once those accidents have been removed, the remaining ones are smaller, and the payoff from their removal will surely be less.
Considering this was written when C was the hot new thing, let's compare the ability to code a CRUD web app in Python/Django vs. C. What Brooks and Parnas are saying is that Python/Django cannot bring big improvements in building a CRUD web app compared to C, because they can only make it easier to program, reducing accidental complexity. But we've since redefined "accidental", and I would argue that you can write a CRUD web app in Python/Django at least 100x faster than in C (and probably at least 100x more securely), even if it takes 1000x more CPU and RAM while running.
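To make the gap concrete, here is roughly all the application code a read endpoint takes in Django. This is a schematic sketch (it assumes it lives inside a Django app, and the model and names are my own illustration); the C equivalent means hand-rolling HTTP parsing, routing, SQL escaping, and serialization.

    from django.db import models
    from django.http import JsonResponse
    from django.views.decorators.http import require_http_methods

    class Item(models.Model):
        name = models.CharField(max_length=200)
        price = models.DecimalField(max_digits=8, decimal_places=2)

    @require_http_methods(["GET"])
    def list_items(request):
        # The ORM generates and escapes the SQL; JsonResponse handles
        # serialization (its default encoder copes with Decimal).
        items = list(Item.objects.values("id", "name", "price"))
        return JsonResponse({"items": items})
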
So "we removed most of the accidental difficulties and the most that remains is essential" is a kind of "end of history" argument.
> I’d be surprised if there’s even a doubling of productivity still available from a complete elimination of remaining accidental difficulty.
It's good that this statement comes with a subjective hedge ("I'd be surprised if"), because otherwise it's just punditry.
> LLM coding does not represent a silver bullet
Here I agree with the author completely, but probably not for the same reasons. The definition of "silver bullet" the article uses (quoting Brooks):
> There is no single development, in either technology or management technique, which by itself promises even a single order-of-magnitude improvement within a decade in productivity, in reliability, in simplicity.
AI-assisted development is not a single technique, the same way "devops" or "testing" or "agile" is not a single technique. But more importantly, I agree it will take time to find best practices, for the technology change to slow down, and for the best approaches to diffuse across the industry.
The article's conclusion:
> You should be adopting and perfecting solid foundational software development practices like version control, comprehensive test suites, continuous integration, meaningful documentation, fast feedback cycles, iterative development, focus on users, small batches of work… things that have been known and proven for decades, but are still far too rare in actual real-world software shops.
These are great, and I'm gonna let him/her finish, but it's curious that actual coding isn't mentioned anywhere. The author doesn't suggest "polish your understanding of C pointer semantics" or "the Rust ownership model" or "the Django ORM", or to really, deeply understand B-trees. It looks like pedestrian details like those are left as an exercise for the reader... or the reader's LLM.
I'm reminded of this scene from The Matrix: https://www.youtube.com/watch?v=cD4nhYR-VRA where the older wise man discusses society's reliance on AI.
"Nobody cares how it works, as long as it works"
We're done. I, for one, welcome our new AI overlords, or, more accurately, the tech-bro billionaires who are pulling the strings.
There are, IMHO, fewer reasons to believe they will be able to do that than reasons to believe they won't, though.
The current state of the art is irrelevant. Only the first couple of time derivatives matter.
I would say I got better at both of those over the last 12-18 months. Are your skills static?
Lmao, why does it seem outlandish to other people? Perhaps they never thought about it deeply enough in the first place to recognise it.
Really? That's like someone during an economic boom saying "The economy is the worst it'll ever be. There is no reason to expect things to not continue to improve".
Until recently. *dramatic pause*
And then AI happened.
I honestly couldn't force myself to finish yet another blog post about how "we're not yet sure what impact LLMs will have on society" or whatever belabored point the author was attempting to make.
"Some random person's take on LLMs" was maybe interesting in 2024. Today it is not even remotely interesting.
There are a gazillion more interesting things happening today that ought to appeal to the median HN reader. Can we talk about those instead?
It sounds like you actually do want to talk about how much you don't want other people to talk about LLMs.
I was doing an ML Sec PhD a year or two before all this hype took off. I took one of the OG transformer papers along to present at our official little PhD reading group when the paper was only a few months old (the details might be a bit sketchy here; it was years ago now).
Now I want nothing to do with the field in any way, shape, or form. I'm just done.
Edit -- I got incredibly angry after writing this comment. Pure hatred and spite for all the charlatans and the accompanying bullshit.
The article goes on to assume there's no 10x gain to be had, but it misses one big truth.
Needing to type the code is an enormous source of accidental difficulty (typing speed, typos, whether you can be arsed to put your hands on the keyboard today…), and it is gone thanks to coding agents.