cobolcomesback 20 hours ago [-]
This “mandatory meeting” is just the usual weekly company-wide meeting where recent operational issues are discussed. There was a big operational issue last week, so of course this week will have more attendance and discussion.
This meeting happens literally every week, and has for years. Feels like the media is making a mountain out of a mole hill here.
davidclark 20 hours ago [-]
The article claims:
>He asked staff to attend the meeting, which is normally optional.
Is that false? It also discusses a new policy:
>Junior and mid-level engineers will now require more senior engineers to sign off any AI-assisted changes, Treadwell added.
Is that inaccurate? It is good context that this is a regularly scheduled meeting. But, regularly scheduled meetings can have newsworthy things happen at them.
djb_hackernews 15 hours ago [-]
When an SVP asks you to do something in a mass email, it's very much optional. Dave Treadwell is an SVP, his org is likely in the 10's of thousands, there is no way to even have a mandatory meeting for that many people.
My SVP asks me to do things all the time, indirectly. I do probably 5% of them.
MikeTheGreat 13 hours ago [-]
> org is likely in the 10's of thousands, there is no way to even have a mandatory meeting for that many people.
Ok, this is pretty off-topic, but is this still true? I get that you can't have 10K people all actively participate in the meeting at the same time, but doesn't Zoom have a feature where you can broadcast to thousands and thousands?
Doesn't X/Twitter have a feature like this? (Although, to be fair, the last time I heard about that it was part of a headline like "DeSantis announcement of Presidential run on X/Twitter delayed for hours as X/Twitter's tech stack collapses under 200K viewers")
But still - nowadays it seems like it should be possible to have 10K employees all tune in at the same time and then call it a meeting, yes?
hibikir 13 hours ago [-]
Yes, but at that point it's an all-hands presentation, and you are basically doing a very careful presentation, thinking about every minute, because of how many hours the "meeting" is costing you.
Very different from the typical weekly/monthly outage meeting, where discussion is actually expected, instead of being a ritual.
helsinkiandrew 5 hours ago [-]
> but doesn't Zoom have a feature where you can broadcast to thousands and thousands?
They have webinar/event support for 5000+ participants, viewers can raise hands/use chat feedback for questions etc. and the meeting host can invite people to be visible.
sheept 13 hours ago [-]
The meeting isn't the hard part—after all, shareholder meetings have huge audiences too. Enforcing mandatory attendance for myriads of employees is the hard part, so it's more likely mandatory in name only.
javcasas 15 hours ago [-]
With tens of thousands in a meeting, cracking a 30-second stupid joke is probably costing several thousand dollars.
hyperpape 14 hours ago [-]
Right, but if you say something essential in a meeting with 10 people and it has to percolate through five levels of management to reach the front-lines and gets watered down, that could be much more lost, even millions.
Scale cuts both ways.
What matters isn't how big the meeting is, it's how important the material is, and how well presented it is.
wolvoleo 13 hours ago [-]
I don't think I've ever heard a top leader say anything essential in such a meeting. The stuff they work on is not related to my job at all. It's all gartner level strategy stuff. In our company they do take time talking about it in large calls but it's always boring and never relevant. And a lot of political spin you have to poke through to see the real message.
If I ever attend, I just put it on mute and look at the slides while I do some real work. That way my attendance gets registered and it doesn't stress me out later with too much stuff left hanging.
That percolation is also translation of what they say to things that are relevant at my level. Like what we will be working on next year, if there's going to be bonus or job losses.
I couldn't give a crap about the company's strategy as a whole and that's not my job anyway. Why should I. I'm not here because I believe in some holy mission. I just wanna do something I like and get paid.
hyperpape 13 hours ago [-]
Most of those meetings are pretty damn fluffy. No one goes back to their desk and does anything different because they've introduced new company values and the acronym is S.M.I.L.E.
But this meeting is a course correction for how they're using AI, which is a huge initiative. He'll be trying to sell the right balance of "keep using the technology, but don't fuck anything up."
Too cautious, everyone freezes and there's a slowdown[0]. Too soft, everyone thinks it's "another empty warning not to fuck up" and they go right back to fucking everything up because the real message was "don't you dare slow down." After the talk, people will have conversations about "what did they really mean?"
[0] If you hate AI, feel free to flip the direction of the effect.
wolvoleo 12 hours ago [-]
Well this is the main problem with AI right now isn't it? How to use it successfully without having it fuck up.
How are they expecting some juniors to do this when the industry as a whole doesn't know where to begin yet?
Like that Meta AI expert who wiped her whole mailbox with openclaw. These are the people who should come up with the answers.
Ps I mostly hate AI but I do see some potential. Right now it feels like we're entering a fireworks bunker looking for a pot of gold and having only a box of matches for illumination.
What we need to know from management is exactly what you mention. Do we go all out and accept that shit will hit the fan once in a while (the old move fast and break things) or do we micromanage and basically work manually like old. And that they accept the risk either way. That kind of strategy is really business leader kind of work. Blaming it on your techs when it inevitably goes wrong is not.
Because the tech as it is right now is very non-deterministic. One day it works magic and the next day it blows up.
And yes that SMILE thing was a good example. Been in too many of those time wasters.
swader999 14 hours ago [-]
It's worth 10x that because they are all AI powered super devs now /sarc
tmoertel 13 hours ago [-]
Unless that 30-second stupid joke is what gets the audience to take your request seriously. Sometimes people will help you when you don't come across like a self-interested corporate tool.
encom 12 hours ago [-]
I have never in my long life heard a joke from upper management during a meeting/presentation that wasn't awkward and cringe. Just get to the point - tell us how many people are getting fired, so the people who aren't fired can get back to work, and you go back to running this company into the ground.
Sorry, I got flashbacks...
FuckButtons 14 hours ago [-]
If you assume everyone is making 100k it only takes 20 people in a meeting for it to cost 1k.
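The napkin math in this subthread can be sketched as below. This is a rough illustration, not anyone's actual tooling: the 2,080 working hours per year and the salary-only cost model (no benefits or overhead) are assumptions, and `meeting_cost` is a hypothetical helper name.

```python
# Assumption: 40 hours/week * 52 weeks; real loaded costs would be higher.
WORK_HOURS_PER_YEAR = 2080


def meeting_cost(attendees: int, annual_salary: float, minutes: float) -> float:
    """Salary-only cost of a meeting in dollars (hypothetical helper)."""
    hourly_rate = annual_salary / WORK_HOURS_PER_YEAR
    return attendees * hourly_rate * (minutes / 60)


# 20 people at $100k each, in a one-hour meeting.
one_hour_20_people = meeting_cost(20, 100_000, 60)

# A 30-second joke told to 10,000 people at $100k each.
joke_10k_people = meeting_cost(10_000, 100_000, 0.5)

print(f"20 people, 1 hour: ${one_hour_20_people:,.0f}")
print(f"10,000 people, 30 seconds: ${joke_10k_people:,.0f}")
```

Under these assumptions the first figure comes out a little under $1k and the second at roughly $4k, so the "1k for 20 people" claim holds for a meeting of about an hour. Higher salaries or a larger audience push the joke toward the tens of thousands.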
airstrike 13 hours ago [-]
Wasn't it Shopify who had a system for tracking how much each meeting cost based on attendees? I may be misremembering the company though
LPisGood 11 hours ago [-]
I was thinking about this in recent weeks and I think I’ve actually changed my mind on it.
It’s not really possible to measure how much it would cost to not have a meeting, and I think it’s pretty obvious that if there were no meetings ever, it would hurt a company a lot
airstrike 11 hours ago [-]
Yeah, I agree it's a silly metric. But it's kinda also a good reminder that meetings do have a cost associated with them, so they should stay short, focused, and held only when necessary.
"This could have been an e-mail" should never need to be said.
tibbar 15 hours ago [-]
i think closer to tens-of-thousands-of-dollars, by my napkin math!
RealityVoid 14 hours ago [-]
Worth it!
ljm 13 hours ago [-]
Is that because you delegate or descope?
Why is an SVP doing this if it's just gonna be ignored?
hnguyen1412 11 hours ago [-]
are you saying SVP’s words are not important and should be ignored? This is not what I remember back in the day when Bezos sent his email with a question mark (or maybe !)
messh 13 hours ago [-]
so.... is RTO optional
skeeter2020 19 hours ago [-]
That's not really what the headline attempts to communicate though. It specifically emphasizes "Mandatory" and "AI breaking things". Nobody was going to click on "Regularly scheduled Amazon staff meeting will include discussion on operational improvement"
ceejayoz 16 hours ago [-]
> He asked staff to attend the meeting, which is normally optional.
If I get a note from my boss like that, I consider it mandatory.
mock-possum 6 hours ago [-]
Yeah I don’t understand why people are pretending not to understand this -
> He asked staff to attend the meeting, which is normally optional.
Clearly means that while normally the meeting would be optional, this time it’s not
idiotsecant 14 hours ago [-]
But it gets less mandatory the more layers up you go. If I get an email from an SVP that is CC: the entire division saying everyone should go to a meeting I will almost certainly be able to ascertain the contents of that meeting in 10 seconds from someone else who did attend
brewdad 13 hours ago [-]
Surely your boss notices your non-attendance.
delecti 13 hours ago [-]
If it's actually really mandatory, my manager will probably also relay that directly to me. And that resets the count for "less mandatory the more layers up you go".
dpark 13 hours ago [-]
Starting to wonder if some people who complain about all day meetings just don’t realize they are optional.
the_arun 8 hours ago [-]
The day is not far off when my agents will attend meetings, share my opinions, and collect a summary for me. If everyone does the same, agents run the meetings and share summaries with their parents (humans). Each of us has LLMs/agents with our contextual data. It is another level of multitasking.
xp84 6 hours ago [-]
Then I spin up another agent to listen to the agent who went to the meeting and make any necessary adjustments to the output of my coding agents based on the new rules it heard about from the meeting agent.
s3p 15 hours ago [-]
>>He asked staff to attend the meeting, which is normally optional.
>Is that false?
Judging from the comment above, no, the meeting happens every week, and this week they were asked to attend.
cobolcomesback 19 hours ago [-]
It’s not false. But it’s also weaselly worded.
Note that the article doesn’t say that he told staff they have to attend the meeting. It says he “asked” staff to attend the meeting. Which again, it’s really really normal for there to be an encouragement of “hey, since we just had an operational event, it would be good to prioritize attending this meeting where we discuss how to avoid operational events”.
As for the second quote: senior engineers have always been required to sign off on changes from junior engineers. There’s nothing new there. And there is nothing specific to AI that was announced.
This entire meeting and message is basically just saying “hey we’ve been getting a little sloppy at following our operational best practices, this is a reminder to be less sloppy”. It’s a massive nothingburger.
BigTTYGothGF 18 hours ago [-]
> It says he “asked” staff to attend the meeting
Being "asked" by your boss to attend an optional meeting is pretty close to being required, it's just got a little anti-friction coating on it.
cobolcomesback 15 hours ago [-]
That really isn’t the culture at Amazon. There are all-team meetings that happen all the time, and every now and then there is a reminder that “hey we’re gonna be talking about an interesting topic so you might want to join”, but it is certainly not a mandate or expectation that everyone will join.
Different companies have different cultures. Weird that people can’t grok this.
ryandrake 14 hours ago [-]
"If you could just go ahead and attend that meeting, that would be greaaaaaaat..."
"Did ya get the memo... about that meeting? I'll just have my secretary forward you another copy of that memo, OK? Yeaaaaaaah..."
ragall 15 hours ago [-]
Exactly. It's just West coast passive aggressive managerial behavior.
i_cannot_hack 18 hours ago [-]
Your characterization of the event as a simple reminder to follow established best practices is directly contradicted by the briefing note of the meeting, which specifically mentions a lack of best practices related to AI. Which makes me skeptical of your assessment of the situation in general.
> Under “contributing factors” the note included “novel GenAI usage for which best practices and safeguards are not yet fully established”.
8note 19 hours ago [-]
> senior engineers have always been required to sign off on changes from junior engineers.
definitely a team by team question. if it was required it would be a crux rule that the code review isn't approved without an L6 approver.
BikiniPrince 14 hours ago [-]
It’s part of the change management process that all code is reviewed. This is needed as per several different compliance agreements. What’s probably happened is that poor peer reviews from other junior engineers get missed. That’s a lot of code reviews to send upstream.
CoolGuySteve 20 hours ago [-]
It didn't seem to make the news but at least in NYC the entire Amazon storefront was broken all afternoon on Friday.
Items weren't displaying prices and it was impossible to add anything to your cart. It lasted from about 2pm to 5pm.
It's especially strange because if a computer glitch brought down a large retail competitor like Walmart I probably would have seen something even though their sales volume is lower.
malfist 19 hours ago [-]
Over the weekend I was trying to return a pair of shoes and get a different size and I kept getting 500s trying to go to the store page for the shoes.
chatmasta 14 hours ago [-]
Funny, I was automatically refunded for a pair of shoes that Amazon thought I never received even though I’m wearing them right now. I couldn’t even find a way to dispute the refund so I just took the win…
BikiniPrince 14 hours ago [-]
That explains why it kept changing the estimated received date. It was doing weird things.
A little birdie told me someone pushed duplicate data into one of Amazon’s core noSQL systems that runs most of e-commerce. The front end of the site broke in weird ways but it certainly wasn’t taking orders.
groundzeros2015 11 hours ago [-]
It’s always sobering to see a news story about something you have insider perspective on.
belval 20 hours ago [-]
I am not in that specific meeting but it made me chuckle that a weekly ops meeting will somehow get media attention. It's been an Amazon thing forever. Wait until the public learns about CoEs!
cmiles74 14 hours ago [-]
A weekly ops meeting where they talk about ensuring PRs with AI contributions get extra scrutiny? I think that's significant news.
osigurdson 14 hours ago [-]
Exactly. This is real world pushback on the "software is solved" narrative from AI labs. Also, most orgs try to copy Amazon for some reason more than big tech firms. "At our org, we disagree and commit" - yeah you made that one up yourself. Anyway, this is going to have a lot of impact in my view.
cobolcomesback 12 hours ago [-]
There was nothing mentioned in the meeting or messaging about PRs with AI contributions. There are no extra requirements for review or scrutiny of AI-generated code. The media reports about this have been excessively misleading.
falsemyrmidon 13 hours ago [-]
It's not extra scrutiny. Doing code reviews for every commit is a standard practice at Amazon and has been for a decade plus.
8note 19 hours ago [-]
I'd expect COEs to be coming up with AI code action items though, not to have more thorough human checks
coredog64 15 hours ago [-]
There's an explicit tension: SWEs would love that as a "get out of jail free" card, but their management chain is being evaluated by ajassy on AI/ML adoption. Admitting AI code as the root cause of a CoE is gonna look really bad unless/until your peers are also copping to it.
8note 13 hours ago [-]
I think it's question 2 or 3 in a why chain, but 4 and 5 need to be why the agent screwed up, and there need to be action items around giving the AI better guardrails, context, or tooling.
"get a person to look at it" is a cop-out action item, and best intentions only. nothing that you could actually apply to make development better across the whole company
otterley 20 hours ago [-]
> Feels like the media is making a mountain out of a mole hill here.
That's been their job ever since cable news was invented.
It probably goes back as long as they have been shouting news in the town square in Rome or before that even.
lukan 15 hours ago [-]
Word around the campfire is, telling stories and exaggerating them to get people's attention is as old as humanity.
But good journalism is still something else.
otterley 19 hours ago [-]
True enough!
furyofantares 15 hours ago [-]
This reply chain is confusing but I'm guessing got merged from another thread that had a different title?
Must have as the comments are hours older than OP.
embedding-shape 20 hours ago [-]
> This meeting happens literally every week, and has for years. Feels like the media is making a mountain out of a mole hill here.
Are you completely missing the point of the submission? It's not about "Amazon has a mandatory weekly meeting" but about the contents of that specific meeting, about AI-assisted tooling leading to "trends of incidents", having a "large blast radius" and "best practices and safeguards are not yet fully established".
No one cares how often the meeting in general is held, or if it's mandatory or not.
skeeter2020 19 hours ago [-]
>> Are you completely missing the point of the submission
no, and that's what people are noting: the headline deliberately tries to blow this up into a big deal. When did you last see the HN post about Amazon's mandatory meeting to discuss a human-caused outage, or a post mortem? It's not because they don't happen...
ummonk 13 hours ago [-]
Amazon has had a really bad string of various outages recently. Assuming they're internally treating this as business as usual in post-mortems then perhaps the newsworthy thing is actually that they aren't taking their outages seriously enough.
thepasch 15 hours ago [-]
> the headline deliberately tries to blow this up into a big deal
I do not understand how “company that runs half the internet has had major recent outages and now explicitly names lax/non-existent LLM usage guidelines as a major reason” can possibly not be a big deal in the midst of an industry-wide hype wave over how the world’s biggest companies now run agent teams shipping 150 pull requests an hour.
The chain of events is “AWS has been having a pretty awful time as far as outages go”, and now “result of an operational meeting is that the company will cut down on the use of autonomous AI.” You don’t need CoT-level reasoning to come to the natural conclusion here.
If we could, as a species, collectively, stop measuring the relevance of a piece of news proportionally by how much we like hearing it, please?
mattgreenrocks 15 hours ago [-]
The defensiveness is almost as interesting as the meeting itself.
emp17344 14 hours ago [-]
Way too many people have tied their egos to the success of AI.
cobolcomesback 12 hours ago [-]
And too many people have their egos tied to its failure, too.
I’m a massive AI skeptic. If anyone were to be jumping up and down on the corpse of AI and this incessant drive to use it everywhere, it’d be me. But I also work at Amazon. I got the email. I attended the meeting. I can personally attest that there are no new requirements for AI-generated code. The articles about this meeting are extremely misleading, if not outright wrong. But instead of believing the person that was actually there in the room, this thread is full of people dismissing my first-hand account of the situation because it doesn’t align with the “haha AI failed” viewpoint.
autoexec 13 hours ago [-]
Not just their egos, but their paychecks. This place is either going to get very quiet or really weird when the hype train derails and the AI bubble bursts.
shermantanktop 9 hours ago [-]
The subject of the media coverage is not AWS, it is a peer organization to AWS that runs using significant amounts of non-AWS infrastructure. They are both part of an umbrella called Amazon but are not at all the same thing.
Maybe your CoT-level reasoning isn’t so robust.
saghm 6 hours ago [-]
It's hard to take this objection seriously. The publication is literally called the Financial Times. It's not exactly crazy for them to think that their readers might care about the entity that shows up on the stock ticker rather than how the company happens to divide up things internally.
Even if it weren't a finance publication, I have trouble imagining you making this argument if a headline said something like "Google deals with outages in the cloud" because of the idea that it's misleading to refer to it as anything other than GCP. I think you're fundamentally not understanding how people communicate about this sort of thing if you actually think that someone saying "Amazon" is misleading in any meaningful way.
cobolcomesback 15 hours ago [-]
The message and meeting being discussed here have nothing to do with AWS or any outages AWS has faced recently. I think you’re missing the point of the discussion.
I don’t blame you, because this is just bad reporting (and potentially intentionally malicious to make you think it’s about AWS). But the meeting and discussion was with the Amazon retail teams, talking about Amazon retail processes, and Amazon retail services. The teams and processes that handle this are entirely separate from any AWS outages you are thinking of.
The outages that Amazon retail has faced also have nothing to do with AI, and there was no “explicit call out” about AI causing anything.
rahbert 5 hours ago [-]
This is correct. We ran them on Wednesdays in Alexa. Jassy actually used to come and sit in ours once a quarter or so when he was running AWS.
cmiles8 19 hours ago [-]
The core message of the article is that Amazon has been having issues with AI slop causing operational reliability concerns, and that seems to be 100% accurate.
coredog64 15 hours ago [-]
/with AI slop//
inquirerGeneral 14 hours ago [-]
[dead]
age1mlclg6 15 hours ago [-]
What has really happened is that those employees were made into "reverse centaurs":
Who is the media you're accusing here? This is a Twitter post. As far as I can tell they do not work for a media company.
What is worth being pointed out is how quickly people blame "The Media" for how people use, consume and spread information on social networks.
otterley 19 hours ago [-]
The source is not a Twitter post, it's a Financial Times article (that the poster failed to cite).
niwtsol 20 hours ago [-]
I believe it is by group - AWS started the weekly operations meeting, effectively every service's oncall from the last week had to attend. Then it grew massive, so they made it optional. Alexa had a similar meeting that tried to replicate what AWS did. A lot of time was spent reviewing load tests getting ready for holiday season, Prime Day, and the Super Bowl (Super Bowl ads used to cause crazy TPS spikes for Alexa). And there was a lot of finger pointing if there was an outage from one team. While it probably did help raise the operational bar, so much time was wasted by engineers on busywork/paperwork documenting an error or fix vs improving the actual service.
happytoexplain 20 hours ago [-]
>Junior and mid-level engineers can no longer push AI-assisted code without a senior signing off
Review by a senior is one of the biggest "silver bullet" illusions managers suffer from. For a person (senior or otherwise) to examine code or configuration with the granularity required to verify that it even approximates the result of their own level of experience, even only in terms of security/stability/correctness, requires an amount of time approaching the time spent if they had just done it themselves.
I.e. senior review is valuable, but it does not make bad code good.
This is one major facet of probably the single biggest problem of the last couple decades in system management: The misunderstanding by management that making something idiot proof means you can now hire idiots (not intended as an insult, just using the terminology of the phrase "idiot proof").
ardeaver 20 hours ago [-]
When I was really early in my career, a mentor told me that code review is not about catching bugs but spreading context (i.e. increasing bus factor.) Catching bugs is a side effect, but unless you have a lot of people review each pull request, it's basically just gambling.
The more expensive and less sexy option is to actually make testing easier (both programmatically and manually), write more tests and more levels of tests, and spend time reducing code complexity. The problem, I think, is people don't get promoted for preventing issues.
VorpalWay 13 hours ago [-]
This depends on the industry. I work on industrial machine control software, and we spend a huge amount of time on tests. We have to for some parts (human safety critical), but other parts would just be expensive if they failed (loss of income for customers, and possibly damaged equipment).
The key to making this scalable is to make as few parts as possible critical, and make the potential bad outcomes as benign as possible. (This lets you go to a lower rating in whatever safety standard applies to your industry.) You still need tests for the less critical parts though, while downtime is better than injury, if you want to sell future machines to your customers you need to have a good track record. At least if you don't want to compete on cost.
happyghost 13 hours ago [-]
> make as few parts as possible critical, and make the potential bad outcomes as benign as possible
This is a good lesson for anyone I think. Definitely something I’m going to think more about. Thanks for sharing!
asdfman123 13 hours ago [-]
One of the major things code review does is prevent that one guy on your team who is sloppy or incompetent from messing up the codebase without singling him out.
If you told someone "I don't trust you, run all code by me first" it wouldn't go well. If you tell them "everyone's code gets reviewed" they're ok with it.
behehebd 9 hours ago [-]
Everyone is sloppy sometimes. I wonder if what code review does is prevent velocity (acts as a brake) so that things don't change too fast (which is often a good thing).
You don't get paid for features or code shipped. People don't pay $200 a head for fine dining based on the number of carrot chops or garlic crushes. The chops and crushes are necessary but not what you should be optimizing for.
bluGill 19 hours ago [-]
> people don't get promoted for preventing issues.
they do - but only after a company has been burned hard. They also can be promoted for their area being enough better that everyone notices.
still the best way to a promotion is write a major bug that you can come in at the last moment and be the hero for fixing.
tartoran 19 hours ago [-]
That could work but plenty of quiet heroes weren’t promoted for fixing critical bugs.
recursive 19 hours ago [-]
They fixed it too soon. You have to wait until the effect is visible on someone's dashboard somewhere.
marcta 18 hours ago [-]
Goodhart's Law strikes again... "When a measure becomes a target, it ceases to be a good measure."
bluGill 19 hours ago [-]
You have to make sure it doesn't arrive at you before it is on the dashboard. Otherwise you are why it is blowing up the time to fix a bug metric. Unless you can make the problem so obscure other smart people asked to help you can't figure it out thus making you look bad.
joquarky 13 hours ago [-]
That is in no way guaranteed. Sometimes finding too many security issues makes you unpopular.
Two years afterward, we got hit with ransomware. And obviously "I told you so" isn't a productive discussion topic at that point.
johnnyanmac 17 hours ago [-]
That's not preventing the issue, though. The closest you can get to this is to have another competitor be burned hard and demonstrate how your code base has the exact same issue. But even that isn't guaranteed. "that can't happen here" is a hard mindset to disrupt unless you yourself are already a C suite.
I think of code review as more about ensuring understandability. When you spend hours gathering context, designing, iterating, debugging, and finally polishing a commit, your ability to judge the readability of your own change has been tainted by your intimate familiarity with it. When you get a fresh pair of eyes to read it and leave comments like "why did you do it this way" or "please refactor to use XYZ for maintainability", you end up with something that will be easier to navigate and maintain for the junior interns who will end up fixing your latent bugs 5 years later.
brianwawok 13 hours ago [-]
Alternately, have a small team where you trust everyone.
8note 19 hours ago [-]
> The problem, I think, is people don't get promoted for preventing issues.
cleaning up structural issues across a couple orgs is a senior => principal promo ive seen a couple of times
wiseowise 5 hours ago [-]
> When I was really early in my career, a mentor told me that code review is not about catching bugs but spreading context (i.e. increasing bus factor.) Catching bugs is a side effect
This bs is what I tell my juniors when I want them to fuck off with their reviews and focus on my actual work.
Sounds very insightful though.
marginalia_nu 20 hours ago [-]
Expert reviews are just about the only thing that makes AI generated code viable, though doing them after the fact is a bit sketchy, to be efficient you kinda need to keep an eye on what the model is doing as it's working.
Unchecked, AI models output code that is as buggy as it is inefficient. In smaller green field contexts, it's not so bad, but in a large code base, it performs much worse as it will not have access to the bigger picture.
In my experience, you should be spending something like 5-15X the time the model takes to implement a feature on reviewing and making it fix its errors and inefficiencies. If you do that (with an expert's eye), the changes will usually have a high quality and will be correct and good.
If you do not do that due diligence, the model will produce a staggering amount of low quality code, at a rate that is probably something like 100x what a human could output in a similar timespan. Unchecked, it's like having a small army of the most eager junior devs you can find going completely fucking ape in the codebase.
locusofself 20 hours ago [-]
If you spend 5-15x the time reviewing what the LLM is doing, are you saving any time by using it?
happytoexplain 19 hours ago [-]
No, but that's the crux of the AI problem in software. Time to write code was never the bottleneck. AI is most useful for learning, either via conversation or by seeing examples. It makes writing code faster too, but only a little after you take into account review. The cases where it shines are high-profile and exciting to managers, but not common enough to make a big difference in practice. E.g AI can one-shot a script to get logs from a paginated API, convert it to ndjson, and save to files grouped by week, with minimal code review, but only if I'm already experienced enough to describe those requirements, and, most importantly, that's not what I'm doing every day anyway.
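For a sense of scale, the kind of script described above (pull logs from a paginated API, convert to ndjson, save to files grouped by week) might look roughly like this. It is a sketch under stated assumptions, not the commenter's code: the page shape (`items` plus a `next` cursor), the `timestamp` field, the file naming, and the in-memory `fake_fetch` standing in for a real HTTP call are all hypothetical.

```python
import json
import tempfile
from collections import defaultdict
from datetime import datetime
from pathlib import Path
from typing import Callable, Iterable, Iterator, Optional


def iter_logs(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Follow cursor-based pagination until the API reports no next page."""
    cursor = None
    while True:
        page = fetch_page(cursor)  # assumed shape: {"items": [...], "next": str | None}
        yield from page["items"]
        cursor = page.get("next")
        if cursor is None:
            break


def week_key(record: dict) -> str:
    """ISO year and week of the record's timestamp, e.g. '2026-W10'."""
    iso = datetime.fromisoformat(record["timestamp"]).isocalendar()
    return f"{iso[0]}-W{iso[1]:02d}"


def write_weekly_ndjson(records: Iterable[dict], out_dir: Path) -> dict:
    """Write one ndjson file per ISO week; return record counts per week."""
    out_dir.mkdir(parents=True, exist_ok=True)
    by_week = defaultdict(list)
    for record in records:
        by_week[week_key(record)].append(record)
    counts = {}
    for week, rows in by_week.items():
        with (out_dir / f"logs-{week}.ndjson").open("w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")  # one JSON object per line
        counts[week] = len(rows)
    return counts


# Hypothetical stand-in for the real API: two pages, three records, two weeks.
FAKE_PAGES = {
    None: {"items": [{"timestamp": "2026-03-02T10:00:00", "msg": "a"}], "next": "p2"},
    "p2": {
        "items": [
            {"timestamp": "2026-03-03T11:00:00", "msg": "b"},
            {"timestamp": "2026-03-10T09:00:00", "msg": "c"},
        ],
        "next": None,
    },
}


def fake_fetch(cursor: Optional[str]) -> dict:
    return FAKE_PAGES[cursor]


with tempfile.TemporaryDirectory() as tmp:
    counts = write_weekly_ndjson(iter_logs(fake_fetch), Path(tmp))
    files_written = sorted(p.name for p in Path(tmp).iterdir())

print(counts, files_written)
```

Even at this size, the requirements have to be spelled out by someone who already knows them: how pagination ends, which field carries the time, and what "week" means (ISO weeks here).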
brandensilva 18 hours ago [-]
I'm finding that in some cases I'm dealing with even more code given how much code AI outputs. So yeah, for some tasks I find myself extremely fast but for others I find myself spending ungodly amounts of time reviewing the code I never wrote to make sure it doesn't destroy the project from unforeseen convincing slop.
ritlo 19 hours ago [-]
A related Dirty Secret that's going to become clear from all this is that a very large proportion of code in the wild (yes, even in 2026—maybe not in FAANG and friends, IDK, but across all code that is written for pay in the entire economy) has limited or no automated test coverage, and is often being written with only a limited recorded spec that's usually fleshed out only to the degree needed (very partial) as a given feature is being worked on.
What do the relatively hands-off "it can do whole features at a time" coding systems need to function without taking up a shitload of time in reviews? Great automated test coverage, and extensive specs.
I think we're going to find there's very little time-savings to be had for most real-world software projects from heavy application of LLMs, because the time will just go into tests that wouldn't otherwise have been written, and much more detailed specs that otherwise never would have been generated. I guess the bright-side take of this is that we may end up with better-tested and better-specified software? Though so very much of the industry is used to skipping those parts, and especially the less-capable (so far as software goes) orgs that really need the help and the relative amateurs and non-software-professionals that some hope will be able to become extremely productive with these tools, that I'm not sure we'll manage to drag processes & practices to where they need to be to get the most out of LLM coding tools anyway. Especially if the benefit to companies is "you will have better tests for... about the same amount of software as you'd have written without LLMs".
We may end up stuck at "it's very-aggressive autocomplete" as far as LLMs' useful role in them, for most projects, indefinitely.
On the plus side for "AI" companies, low-code solutions are still big business even though they usually fail to deliver the benefits the buyer hopes for, so there's likely a good deal of money to be made selling companies LLM solutions that end up not really being all that great.
ansibsha 17 hours ago [-]
> better-specified software
Code is the most precise specification we have for interfacing with computers.
xp84 6 hours ago [-]
Sure, but if you define the code as the only spec, then it is usually a terrible spec, since the code itself specifies bugs too. And one of the benefits of having a spec (or tests) is that you have something against which to evaluate the program in order to decide if its behavior is correct or not.
Incidentally, I think in many scenarios, LLMs are pretty great at converting code to a spec and indeed spec to code (of equal quality to that of the input spec).
tmaly 16 hours ago [-]
There are some cases where AI is generating binary machine code, albeit small amounts. What do we have when we don't have the code?
marginalia_nu 15 hours ago [-]
Machine code is still code, even if the representation is a bit less legible than the punch cards we used to use.
interestpiqued 13 hours ago [-]
You’re missing the point of a spec
unselect5917 7 hours ago [-]
The spec is as much for humans as it is the machine, yes?
interestpiqued 6 hours ago [-]
Spec should be made beforehand and agreed on by stakeholders. It says what it should do. So it’s for whoever is implementing, modifying, and/or testing the code. And unfortunately devs have a tendency toward poor documentation
slopinthebag 18 hours ago [-]
Re. productivity, if LLMs are a genuine boost with 1/3 of the work, neutral 1/3 of the time, and actually worse 1/3 of the time, it's likely we aren't really seeing performance improvements as 1) people are using them for everything and 2) we're still learning how to best use them.
So I expect over time we will see genuine performance improvements, but Amdahl's law dictates it won't be as much as some people and CEOs are expecting.
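To put rough numbers on the Amdahl's law point, here is a small sketch. The formula is the standard one; the 30% share of time spent writing code and the 10x factor are made-up inputs for illustration, not measurements.

```python
def amdahl_speedup(fraction_accelerated: float, factor: float) -> float:
    """Overall speedup when only `fraction_accelerated` of the work gets `factor` times faster."""
    if not 0.0 <= fraction_accelerated <= 1.0:
        raise ValueError("fraction_accelerated must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / factor)


# Hypothetical inputs: coding is 30% of the job and an LLM makes it 10x faster.
print(f"{amdahl_speedup(0.3, 10):.2f}x overall")
# Upper bound for the same 30% share, even with an infinitely fast tool.
print(f"{amdahl_speedup(0.3, float('inf')):.2f}x ceiling")
```

Under those assumed inputs the overall gain is about 1.37x, and it can never exceed roughly 1.43x no matter how fast the tool gets, because the other 70% of the work is untouched.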
dboreham 15 hours ago [-]
Bingo. Hopefully there are some business opportunities for us in that truth.
_wire_ 19 hours ago [-]
> because the time will just go into tests that wouldn't otherwise have been written
Writing tests to ensure a program is correct is the same problem as writing a correct program.
Evaluating conformance is a different category of concern from ensuring correctness. Tests are about conformance not correctness.
Ensuring correct programs is like cleaning in the sense that you can only push dirt around, you can't get rid of it.
You can push uncertainty around but you can't eliminate it.
This is the point of Gödel's theorem. Shannon's information theory observes similar aspects for fidelity in communication.
As Douglas Adams noted: ultimately you've got to know where your towel is.
layer8 14 hours ago [-]
A competent programmer proves the program he writes correct in his head. He can certainly make mistakes in that, but it’s very different from writing tests, because proofs abstract (or quantify) over all states and inputs, which tests cannot do.
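A toy illustration of the difference, as a sketch only: the function below is deliberately wrong, yet every test chosen for it passes, because tests sample inputs while a proof would have to cover all of them.

```python
def is_leap_year(year: int) -> bool:
    """Deliberately incomplete: ignores the century rules of the Gregorian calendar."""
    return year % 4 == 0


# A plausible-looking test suite. All three assertions pass.
assert is_leap_year(2024)
assert not is_leap_year(2023)
assert is_leap_year(2000)

# Reasoning over all inputs exposes what the samples missed:
# 1900 is divisible by 100 but not by 400, so it was not a leap year,
# yet is_leap_year(1900) returns True.
```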
shimman 19 hours ago [-]
These companies don't care about saving time or lowering operating costs, they have massive monopolies to subsidize their extremely poor engineering practices with. If the mandate is to force LLM usage or lose your job, you don't care about saving time; you care about saving your job.
One thing I hope we'll all collectively learn from this is how grossly incompetent the elite managerial class has become. They're destroying society because they don't know what to do outside of copying each other.
It has to end.
SchemaLoad 13 hours ago [-]
The submitter with their name on the Jira ticket saves time, the reviewer who has to actually verify the work loses a lot of time and likely just lets issues slip through.
marginalia_nu 19 hours ago [-]
To be honest, sometimes it's still beneficial.
For fairly straightforward changes it's probably a wash, but ironically enough it's often the trickier jobs where they can be beneficial as it will provide an ansatz that can be refined. It's also very good at tedious chores.
misnome 18 hours ago [-]
And spotting stuff in review! Sometimes it’s false positives but on several occasions I’ve spent ~15-30 minutes teaching-reviewing a PR in person, checked afterwards and it matched every one of the points.
bluGill 19 hours ago [-]
Some, but not very much. Writing code is hard. AI will do a lot of tedious code that you procrastinate writing.
hard24 19 hours ago [-]
Also when you are writing code yourself you are implicitly checking it whilst at the back of your mind retaining some form of the entire system as a whole.
People seem to gloss over this... As a CEO if people don't function like this I'd be awake at night sweating.
bonesss 19 hours ago [-]
That’s the reverse-centaur issue I see: humans are not great at repetitive nuanced similar seeming tasks, putting the onus on humans to retroactively approve high volumes of critical code has them managing a critical failure mode at their weakest and worst. Automated reviews should be enhancing known good-faith code, manual reviews of high volume superficially sound but subversive code is begging for issues over time.
Which results in the software engineering issue I’m not seeing addressed by the hype: bugs cost tens to hundreds of times their coding cost to resolve if they require internal or external communication to address. Even if everyone has been 10x’ed, the math still strongly favours not making mistakes in the first place.
An LLM workflow that yields 10x an engineer but psychopathically lies and sabotages client facing processes/resources once a quarter is likely a NNPP (net negative producing programmer), once opportunity and volatility costs are factored in.
demosito666 15 hours ago [-]
> Even if everyone has been 10x’ed, the math still strongly favours not making mistakes in the first place
The math depends on importance of the software. A mistake in a typical CRUD enterprise app with 100 users has zero impact on anything. You will fix it when you have time, the important thing is that the app was delivered in a week a year ago and was solving some problem ever since. It has already made enormous profit if you compare it with today’s (yesterday’s ?) manual development that would take half a year and cost millions.
A mistake in nuclear reactor control code would be a totally different thing. Whatever time savings you made on coding are irrelevant if it allowed a critical bug to slip through.
Between the two extremes you thus have a whole spectrum of tasks that either benefit or lose from applying coding with LLMs. And there are also more axes than this low to high failure cost, which also affect the math. For example, even a non-important but large app will likely soon degrade into an unmanageable state if developed with too little human intervention, and you will be forced to start from scratch, losing a lot of time.
bluGill 14 hours ago [-]
I have found AI extremely good at finding all those really hard bugs though. AI is a greater force multiplier when there is a complex bug than in green field code.
bluGill 19 hours ago [-]
Sort of. I work on a system too large for anyone to know the whole thing. Often people who don't know each other do something that will break the other. (Often because of the number of different people - most individuals go years between this)
raw_anon_1111 18 hours ago [-]
No I’m keeping up with the system as a whole because I’m always working at a system level when I’m using AI instead of worrying about the “how”
ansibsha 17 hours ago [-]
No you’re not. The “how” is your job to understand, and if you don’t you’ll end up like the devs in the article.
We as an industry have been able to offload a lot of “how” via deterministic systems built by humans with expert understanding. LLMs give you the illusion of this.
raw_anon_1111 17 hours ago [-]
No in my case the “how” is
1. I spoke to sales to find out about the customer
2. I read every line of the contract (SOW)
3. I did the initial requirements gathering over a couple of days with the client - or maybe up to 3 weeks
4. I designed every single bit of AWS architecture and code
5. I did the design review with the client
6. I led the customer acceptance testing
> We as an industry have been able to offload a lot of “how” via deterministic systems built by humans with expert understanding. LLMs
I assure you the mid level developers or god forbid foreign contractors were not “experts” with 30 years of coding experience and at the time 8 years of pre LLM AWS experience. It’s been well over a decade - ironically before LLMs - that my responsibility was only for code I wrote with my own two hands
ansibsha 13 hours ago [-]
Yes, and trusting an LLM here is not a good idea. You know it will make important mistakes.
I’m not saying trusting cheap devs is a good idea either. I do think cheap devs are actually at risk here.
raw_anon_1111 12 hours ago [-]
I am not “trusting” either - I’m validating that they meet the functional and non functional requirements just like with an LLM. I have never blindly trusted any developer when my neck was the one on the line in front of my CTO/director or customer.
I didn’t blindly trust the Salesforce consultants either. I also didn’t verify every line of oSql (not a typo) they wrote.
icedchai 11 hours ago [-]
Actually, it's SOQL. I did Salesforce crap for many years.
rectang 19 hours ago [-]
> Expert reviews are just about the only thing that makes AI generated code viable
I disagree, in the sense that an engineer who knows how to work with LLMs can produce code which only needs light review.
* Work in small increments
* Explicitly instruct the LLM to make minimal changes
* Think through possible failure modes
* Build in error-checking and validation for those failure modes
* Write tests which exercise all paths
This is a means to produce "viable" code using an LLM without close review. However, to your point, engineers able to execute this plan are likely to be pretty experienced, so it may not be economically viable.
marginalia_nu 19 hours ago [-]
By the time you're working in increments small enough that it doesn't introduce significant issues, you really might as well write the code yourself.
rectang 19 hours ago [-]
That's not my experience — I'm significantly faster while guiding an LLM using this methodology.
The gains are especially notable when working in unfamiliar domains. I can glance over code and know "if this compiles and the tests succeed, it will work", even if I didn't have the knowledge to write it myself.
johnnyanmac 17 hours ago [-]
> I'm significantly faster while guiding an LLM using this methodology.
>When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.
If we're being honest with ourselves, it's not making devs work faster. It at best frees their time up so they feel more productive.
Fair point. I have definitely caught myself taking longer to revise a prompt repeatedly after the AI gets things wrong several times than it would have taken to write the code myself.
I'd like to think that I have this under control because the methodology of working in small increments helps me to recognize when I've gotten stuck in an eddy, but I'll have to watch out for it.
I still maintain that the LLM is saving me time overall. Besides helping in unfamiliar domains, it's also faster than me at leaf-node tasks like writing unit tests.
tmaly 16 hours ago [-]
How long will that 19% hold as models grow in capability?
johnnyanmac 16 hours ago [-]
I'm a bit tired of waiting for "tomorrow", so I'll just live in today's world. We'll burn that bridge when we get to it.
rafaelmn 5 hours ago [-]
The study you quoted is sonnet 3.5/3.7 era. You could see the promise with those models but the agentic/task performance of Opus 4.5/4.6 makes a huge difference - the models are pretty amazing at building context from a mid size codebase at this point.
otabdeveloper4 2 hours ago [-]
Google's latest research shows AI coding increases speed by 3% while also increasing bugs by 9%. (I.e., a net negative.)
AI doesn't make you code faster, it just makes the boring stretches somewhat more exciting.
rsynnott 14 hours ago [-]
> I can glance over code and know "if this compiles and the tests succeed, it will work", even if I didn't have the knowledge to write it myself.
... Errr... Yeah, that's not a great approach, unless you are defining 'work' extremely vaguely.
rectang 12 hours ago [-]
Haha I have usually found myself on the conservative side of any engineering team I’ve been on, and it’s refreshing to catch some flak for perceived carelessness.
I still make an effort to understand the generated code. If there’s a section I don’t get, I ask the LLM to explain it.
Most of the time it’s just API conventions and idioms I’m not yet familiar with. I have strong enough fundamentals that I generally know what I’m trying to accomplish and how it’s supposed to work and how to achieve it securely.
For example, I was writing some backend code that I knew needed a nonce check but I didn’t know what the conventions were for the framework. So I asked the LLM to add a nonce check, then scanned the docs for the code it generated.
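For readers who haven't met the term, a nonce check boils down to something like the sketch below. This is a generic illustration, not the convention of any particular framework: the `NonceStore` class, its in-memory dict, and the 300 second lifetime are all assumptions, and a real application would use the framework's own helper and shared storage.

```python
import secrets
import time


class NonceStore:
    """Hypothetical in-memory store of single-use nonces with an expiry."""

    def __init__(self, ttl_seconds: float = 300.0):
        self._ttl = ttl_seconds
        self._issued: dict[str, float] = {}  # nonce -> expiry (monotonic clock)

    def issue(self) -> str:
        """Create an unguessable nonce to embed in a form or request."""
        nonce = secrets.token_urlsafe(32)
        self._issued[nonce] = time.monotonic() + self._ttl
        return nonce

    def consume(self, candidate: str) -> bool:
        """Return True exactly once for a known, unexpired nonce; False otherwise."""
        expires_at = self._issued.pop(candidate, None)  # pop makes it single-use
        if expires_at is None:
            return False
        return time.monotonic() <= expires_at
```

The two properties worth checking in whatever the LLM generates are the same ones this sketch encodes: a nonce is rejected on reuse, and it is rejected after it expires.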
marginalia_nu 19 hours ago [-]
That's where the Gell-Mann amnesia will get you though. As much it trips up on the domains you're familiar with, it also trips up in unfamiliar domains. You just don't see it.
rectang 19 hours ago [-]
You're not telling me anything I don't know already. Only a person who accepts that they're fallible can execute this methodology anyway, because that's the kind of mentality that it takes to think through potential failure modes.
Yes, code produced this way will have bugs, especially of the "unknown unknown" variety — but so would the code that I would have written by hand.
I think a bigger factor contributing to unforeseen bugs is whether the LLM's code is statistically likely to be correct:
* Is this a domain that the LLM has trained on a lot? (i.e. lots of React code out there, not much in your home-grown DSL)
* Is the codebase itself easy to understand, written with best practices, and adhering to popular conventions? Code which is hard for humans to understand is also hard for an LLM to understand.
marginalia_nu 19 hours ago [-]
Right, I think the latter part is my concern with AI generated code. Often it isn't easy to read (or as easy to read as it could be), and the harder it is to navigate, the more code problems the AI model introduces.
It introduces unnecessary indirection, additional abstractions, fails to re-use code. Humans do this too, but AI models can introduce this type of architectural rot much faster (because it's so fast), and humans usually notice when things start to go off the rails, whereas an AI model will just keep piling on bad code.
rectang 19 hours ago [-]
I agree that under default settings, LLMs introduce way too many changes and are way too willing to refactor everything. I was only able to get the situation under control by adding this standing instruction:
---
applyTo: '**'
---
By default:
Make the smallest possible change.
Do not refactor existing code unless I explicitly ask.
Under this, Claude Opus at least produces pretty reliable code with my methodology even under surprisingly challenging circumstances, and recent ChatGPTs weren't bad either (though I'm no longer using them). Less powerful LLMs struggle, though.
raw_anon_1111 18 hours ago [-]
Besides building web apps for internal use, I’m never going to let AI architect something I’m not familiar with. I could care less whether it uses “clean code” or what design pattern it uses. Meaning I will go from an empty AWS account to fully fledged app + architecture because I’ve been coding for 30 years and dealing with every nook and cranny of AWS for a decade.
But I would never do the same for Azure.
jonnycoder 18 hours ago [-]
I tend to agree. I spent a lot of time revising skills for my brownfield repo, writing better prompts to create a plan with clear requirements, writing a skill/command to decompose a plan, having a clear testing skill to write tests and validate, and finally having a code reviewer step using a different model (in my case it's codex since claude did the development). My last PR was as close to perfect as I have got so far.
Skidaddle 19 hours ago [-]
Just lead with “You are an expert software engineer…”, easy!
UncleMeat 15 hours ago [-]
Sadly, the way people become expert in a codebase is through coding. The process of coding is the process of learning. If we offload the coding to AI tools we will never be as expert in the codebase, its complexity, its sharp corners, or its unusual requirements. While you can apply general best practices for a code review you can never do as much as if you really got your hands dirty first.
"Seniors will do expert review" will slowly collapse.
raw_anon_1111 19 hours ago [-]
In my experience, inefficient code is rarely the issue outside of data engineering type ETL jobs. It’s mostly architectural. Inefficient code isn’t the reason your login is taking 30 seconds. Yes I know at Amazon/AWS scale (former employee) every efficiency matters. But even at Salesforce scale, wringing out every bit of efficiency doesn’t matter.
No one cares about handcrafted artisanal code as long as it meets both functional and non functional requirements. The minute geeks get over themselves thinking they are some type of artists, the happier they will be.
I’ve had a job that requires coding for 30 years, before that I was a hobbyist, and I’ve worked for everything from 60 person startups to BigTech.
For my last two projects (consulting) and my current project, I led the project, got the requirements, designed the architecture from an empty AWS account (yes, using IaC), and delivered it. I didn’t look at a line of code. I verified the functional and non functional requirements, wrote the hand off documentation etc.
The customer is happy, my company is happy, and I bet you not a single person will ever look at a line of code I wrote. If they do get a developer to take it over, the developer will be grateful for my detailed AGENTS.md file.
sarchertech 18 hours ago [-]
It’s not about hand crafted code or even code performance.
We know from experimentation that agents will change anything that isn’t nailed down. No natural language spec or test suite has ever come close to fully describing all observable behaviors of a non-trivial system.
This means that if no one is reviewing the code, agents adding features will change observable behaviors.
This gets exposed to users as churn, jank, and broken work flows.
raw_anon_1111 18 hours ago [-]
That’s easy enough to prevent with modular code; that’s what “plan mode” is for. But you probably never worked with a bunch of C# developers using R#
sarchertech 17 hours ago [-]
1. Preventing agents from crossing boundaries, creating implicit and explicit dependencies, and building false layers requires much more human control over every PR and involvement with the code than you seem to espouse.
2. Assuming that techniques that work with human developers will also work with agents, which have severely impaired judgement but are massively faster at producing code, is a bad idea.
3. There’s no way you have enough experience with maintaining code written in this way to confidently hand wave away concerns.
rixed 6 hours ago [-]
I solve this issue (agent looking at too much and changing too much) with the best abstraction ever invented : files and permissions.
One task is usually composed of 2 input files, a specification and a header file, and the task is to output the implementation and nothing more. The agent user has no other permissions in the file system and has no tools; it just outputs the code, which is directed into a file. I run 'make' whenever I update a specification. Token count is minimal.
Do I save time? Not much, but having to specify and argue about everything is interesting, and I trust myself that I'm not losing any knowledge this way; be it the why or the how.
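A rough sketch of that workflow's shape, with loud caveats: this is not the actual setup described above. The `generate` callable is a hypothetical stand-in for the model invocation, the staleness check imitates what a make rule does with timestamps, and the file-system permission restriction is an OS-level concern that this code does not implement.

```python
from pathlib import Path
from typing import Callable


def rebuild_if_stale(
    spec: Path,
    header: Path,
    output: Path,
    generate: Callable[[str, str], str],
) -> bool:
    """Regenerate `output` only when the spec or header is newer, like a make rule.

    `generate` is a hypothetical function: it receives the spec text and the
    header text and returns implementation source. It gets nothing else.
    """
    if output.exists():
        newest_input = max(spec.stat().st_mtime, header.stat().st_mtime)
        if output.stat().st_mtime >= newest_input:
            return False  # up to date, no tokens spent
    output.write_text(generate(spec.read_text(), header.read_text()))
    return True
```

The design choice worth noting is that the model only ever sees two strings and only ever produces one, so there is nothing else for it to wander off and change.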
raw_anon_1111 17 hours ago [-]
Absolutely no one in the value chain cares about how many layers of abstractions your code has - not your management or your customers. They care about functional and non functional requirements
sarchertech 14 hours ago [-]
Of course they don’t. Please reread what I said, give it the slightest bit of thought, and re-respond if you want a response from me.
raw_anon_1111 14 hours ago [-]
By definition, coding agents are right now the worst they will ever be and the industry as a whole by definition is the least experienced it will ever be at using them.
So many people on HN are so insulted that the people who put money in our bank accounts and in some cases stock in our brokerage accounts never cared about their bespoke clean code or GOF patterns. LLMs just made it more apparent.
It’s always been dumb for PR to be focused on for loops vs while loops instead of focusing on whether functional and non functional requirements are met
sarchertech 11 hours ago [-]
Wow you have completely lost the plot. It’s like you’re a bot that’s mixing up who he’s replying to.
raw_anon_1111 11 hours ago [-]
Just maybe you aren’t making the strong argument you think you are making
sarchertech 14 minutes ago [-]
I wouldn’t know sir, because you didn’t address a single lick of it. You went off and argued with some other argument you constructed in your head. I believe we have a name for that. It is an awfully good way to win arguments in your own mind and retain absolute confidence in your position if that’s your goal though.
hard24 19 hours ago [-]
"No one cares about handcrafted artisanal code as long as it meets both functional and non functional requirements"
Speak for yourself. I don't hire people like you.
raw_anon_1111 19 hours ago [-]
And guess what? You probably don’t pay as much as I make now either…
Even in late 2023 with the shit show of the current market, I had no issues having multiple offers within three weeks just by reaching out to my network and companies looking for people with my set of skills.
hard24 19 hours ago [-]
I field a small team of experts who are paid upwards of a million GBP in cold-hard cash in London. Not stock. Cash.
You sound like a bozo, I can sniff it through my screen.
zepolen 14 hours ago [-]
This sounds like a place I want to work at.
YCpedohaven 19 hours ago [-]
[flagged]
raw_anon_1111 19 hours ago [-]
Yes because I didn’t check to see if Claude code used a for loop instead of a while loop? Or that it didn’t use my preferred GOF pattern and didn’t use what I read in “Clean Code”?
Guess what? I also stopped caring how registers are used and counting clock cycles in my assembly language code like it’s the 80s and I’m still programming on a 1 MHz 65C02
icedchai 17 hours ago [-]
I can see the argument both ways. Some code is just not worth looking at...
But do you look at any of the AI output? Or is it just "it works, ship it"?
raw_anon_1111 16 hours ago [-]
My last project was basically an ETL implementation on AWS starting with an empty AWS account and an internal web admin site that had 10 pages. I am yada yada yadaing over a little bit.
What I checked.
1. The bash shell scripts I had it write as my integration test suite
2. To make sure it wasn’t loading the files into Postgres the naive way (loading the file from S3 and doing bulk inserts) instead of using the AWS extension that lets it load directly from S3. It’s the difference between taking 20 minutes and 20 seconds.
3. I had strict concurrency and failure recovery requirements. I made sure it was done the right way.
4. Various security, logging, log retention requirements
What I didn’t look at - a line of the code for the web admin site. I used AWS Cognito for authentication and checked to make sure that unauthorized users couldn’t use the website. Even that didn’t require looking at the code - I had automated tests that tested all of the endpoints.
icedchai 14 hours ago [-]
This all makes sense.
I've witnessed human developers produce incredibly convoluted, slow "ETL pipelines" that took 10+ minutes to load single digit megabytes of data. It could've been reduced to a shell script that called psql \copy.
getly_store 2 hours ago [-]
Yep. Heavy ETL often adds latency; a staging table plus COPY into Postgres, then idempotent upserts, is usually enough. Keep it incremental and observable: checksums, counts, and replayable loads. For bigger scales, add CDC (logical decoding like Debezium) and parallelize ingestion across partitions; minimize in-Python transforms and push work into SQL.
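The staging-table-plus-idempotent-upsert pattern can be sketched as follows. Note the substitutions: SQLite (from the Python standard library, version 3.24 or later for upsert support) stands in for Postgres so the example runs anywhere, `executemany` stands in for `COPY`, and the `events` and `staging` table names are made up. The `ON CONFLICT ... DO UPDATE` clause has the same general shape in both databases.

```python
import sqlite3


def load_batch(conn: sqlite3.Connection, rows: list[tuple[int, str]]) -> int:
    """Stage rows, then upsert into the target so replaying a batch adds no duplicates.

    Returns the row count of the target table, a cheap observability check.
    """
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
        )
        conn.execute("CREATE TEMP TABLE IF NOT EXISTS staging (id INTEGER, payload TEXT)")
        conn.execute("DELETE FROM staging")
        # Stand-in for COPY: bulk load the raw batch into the staging table.
        conn.executemany("INSERT INTO staging (id, payload) VALUES (?, ?)", rows)
        # Idempotent upsert: push the merge into SQL rather than looping in Python.
        # SQLite needs "WHERE true" here to parse INSERT ... SELECT with ON CONFLICT.
        conn.execute(
            """
            INSERT INTO events (id, payload)
            SELECT id, payload FROM staging WHERE true
            ON CONFLICT (id) DO UPDATE SET payload = excluded.payload
            """
        )
        (count,) = conn.execute("SELECT COUNT(*) FROM events").fetchone()
    return count
```

Loading the same batch twice leaves the count unchanged, which is what makes the load replayable after a failure.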
unshavedyak 13 hours ago [-]
> For a person (senior or otherwise) to examine code or configuration with the granularity required to verify that it even approximates the result of their own level of experience, even only in terms of security/stability/correctness, requires an amount of time approaching the time spent if they had just done it themselves.
Hell, often it feels slower/worse. Foreign code is easily confusing at first, which slows you down - and bad code quickly gets bewildering and sends you down paths of clarifications that waste time.
SchemaLoad 13 hours ago [-]
So many times I get AI generated PRs from juniors where I don't feel comfortable with the code, I wouldn't do it like this myself, but I can't strictly find faults that I can reject the PR with. Usually it's just a massive amount of code being generated which is extremely difficult to review, much harder than it was for the submitter to generate and send it for review.
Then often it blows up in production. Makes me almost want to blanket reject PRs for being too difficult to understand. Hand written code almost has an aversion to complexity, you'd search around for existing examples, libraries, reusable components, or just a simpler idea before building something crazy complex. While with AI you can spit out your first idea quickly no matter how complex or flawed the original concept was.
MarkSweep 12 hours ago [-]
Rejecting a PR for being overly complicated or difficult to understand is valid. Breaking a large change into understandable pieces is an important skill both for making changes reviewable as well as helping the author understand the problem.
js8 20 hours ago [-]
> requires an amount of time approaching the time spent if they had just done it themselves
It's actually often harder to fix something sloppy than to write it from scratch. To fix it, you need to hold in your head both the original, the new solution, and calculate the difference, which can be very confusing. The original solution can also anchor your thinking to some approach to the problem, which you wouldn't have if you solve it from scratch.
ummonk 13 hours ago [-]
In fairness though, it does give you good practice for the essential skill of maintaining / improving an old codebase.
bluGill 19 hours ago [-]
Sloppy code that has been around for a while works. It likely has support for edge cases you forgot about. Often the sloppyness is because of those edge cases.
js8 15 hours ago [-]
That's the incidental (necessary) vs accidental (avoidable) complexity distinction. But I don't think it makes it any easier to deal with.
bluGill 14 hours ago [-]
Those are different things. Often you don't plan for all the necessary things and so they don't fit in, even though a better design exists in which they would fit more neatly. But only years later do you see it, and getting there is now a massive effort you can't afford. The result looks sloppy because in hindsight the right design is obvious.
steveBK123 20 hours ago [-]
Right, code reviews should already have been happening with human written junior code.
If AI is a productivity boost and juniors are going to generate 10x the PRs, do you need 10x the seniors (expensive) or 1/10th the juniors (cost save).
A reminder that in many situations, pure code velocity was never the limiting factor.
Re: idiot proofing
I think this is a natural evolution as companies get larger they try to limit their downside & manage for the median rather than having a growth mindset in hiring/firing/performance.
AgentOrange1234 18 hours ago [-]
Seniors are going to need to hold Juniors to a high bar for understanding and explaining what they are committing. Otherwise it will become totally soul destroying to have a bunch of juniors submitting piles of nonsense and claiming they are blocked on you all the time.
sethops1 18 hours ago [-]
This was challenging enough pre AI. Now that everybody has an AI slop button, the life of an effective code reviewer just got so much more miserable.
esafak 11 hours ago [-]
Make them first go through an AI reviewer that is informed by the code base's standards.
onion2k 19 hours ago [-]
> I.e. senior review is valuable, but it does not make bad code good.
I suspect that isn't the goal.
Review by more senior people shifts accountability from the Junior to a Senior, and reframes the problem from "Oh dear, the junior broke everything because they didn't know any better" to "Ah, that Senior is underperforming because they approved code that broke everything."
jetrink 20 hours ago [-]
It could create the right sort of incentives though. If I'm a junior and I suddenly have to take my work to a senior every time I use AI, I'm going to be much more selective about how I use it and much more careful when I do use it. AI is dangerous because it is so frictionless and this is a way to add friction.
Maybe I don't have the correct mental model for how the typical junior engineer thinks though. I never wanted to bug senior people and make demands on their time if I could help it.
devonbleak 19 hours ago [-]
What you're actually going to see is seniors inundated by slop and burning out and quitting because what used to be enjoyable solving of problems has become wading through slop that took 10 minutes to generate and submit but 30+ minutes to understand and write up a critique for it.
kavalg 4 hours ago [-]
30+ minutes is a gross underestimation IMHO. It is probably somewhere in the range 1:10, 10:100, especially if you include the cost of context switches a senior has to do. In my experience, the loss of flow, due to context switch is very prominent and sometimes painful.
8note 12 hours ago [-]
Not even that. Implementing this requirement could mean a general work stoppage whenever the senior engineer is in all-day meetings or on vacation.
With a layout of 4 juniors, 5 intermediates, and 0-1 senior per team, putting all the changes through senior engineer review means you mostly won't be able to get CRs approved.
I guess it could result in forcing everyone who's sandbagging as intermediate instead of going to senior to have to get promoted?
wiseowise 5 hours ago [-]
What vacation? Didn’t you hear we unlocked unlimited productivity?
SpicyLemonZest 18 hours ago [-]
In my experience, Claude and the juniors piloting it are usually receptive to quick feedback along the lines of "This is unreasonably hard to understand, please try refactoring it this way and let me know when it's cleaner".
suzzer99 18 hours ago [-]
Can I interest you in a bunch of emoji-laden comments?
bs7280 19 hours ago [-]
This is also why I think we will enter a world without Jr's. The time it takes for a Senior to review the Jr's AI code is more expensive than if the Sr produced their own AI code from scratch. Factor in the lack of meetings from a Sr only team, and the productivity gains will appear to be massive.
Whether or not these productivity gains are realized is another question, but spreadsheet based decision makers are going to try.
czscout 19 hours ago [-]
In this scenario, how might one become a senior without first being a junior? Seniors just pop into existence?
bs7280 17 hours ago [-]
The business leaders do not care about this yet. I think a lot of people think we already have more Seniors than we will need in the next 5-10 years.
Also - the definition of Senior will change, and a lot of current Seniors will not transition, while plenty of Juniors that put in a lot of time using code agents will transition.
ForHackernews 14 hours ago [-]
>while plenty of Juniors that put in a lot of time using code agents will transition.
But will they? I'm not at all convinced that babysitting an AI churning out volumes of code you don't understand will help you acquire the knowledge to understand and debug it.
esafak 11 hours ago [-]
Some will, some will not; hiring interviews and promotion committees will take care of the rest.
kavalg 4 hours ago [-]
Apprenticeship. You will have to prove to the company that working at a minimal wage is still beneficial. Or we can take it even further, you will have to pay the company for getting the necessary experience. Maybe you sign a 5 year contract with a big cancellation fee. It is not unheard of. I remember some of the navy schools having something like this. You study for 5 years for free (bed and food are paid by the school) and then you have to work for at least 5 years for the navy or pay a very big fine if you refuse to do so.
simplyluke 19 hours ago [-]
The bet from various industry leaders appears to be that the current generation of engineers will be the last who will ever need to think about complex systems and engineering, as the AI will just get good enough to do all of that by the time they retire.
lovich 18 hours ago [-]
I think it’s deeper than that because it’s affected more industries than software and already started pre AI.
American corporate culture has decided that training costs are someone else’s problem. Since every corporation acts this way it means all training costs have been pushed onto the labor market. Combine that with the past few decades of “oops, looks like you picked the wrong career that took years of learning and/or 10 to 100s of thousands of dollars to acquire but we’ve obsoleted that field” and new entrants into the labor market are just choosing not to join.
Take trucking for example. For the past decade I’ve heard logistics companies bemoan the lack of CDL holders, while simultaneously gleefully talk about how the moment self driving is figured out they are going to replace all of them.
We’re going to be outpaced by countries like China at some point because we’re doing the industrial equivalent of eating our seed corn and there is seemingly no will to slow that trend down, much less reverse it.
bluefirebrand 14 hours ago [-]
> we’re doing the industrial equivalent of eating our seed corn and there is seemingly no will to slow that trend down, much less reverse it.
I know I'm probably coming across as a lunatic lately on HN but I really do think we're on the path towards violence thanks to AI
You just cannot destroy this many people's livelihoods without backlash. It's leading nowhere good
But a handful of people are getting stupidly rich/richer so they'll never stop
lovich 14 hours ago [-]
I don't think you're a lunatic.
If you look at the luddite rebellion they weren't actually against industrial technology like looms. They were against being told they weren't needed anymore and thrown to the wolves because of the machines.
The rich have forgotten they are made of meat and/or are planning on returning to feudalism ala Yarvin, Thiel, Musk, and co's politics.
wiseowise 4 hours ago [-]
> The rich have forgotten they are made of meat and/or are planning on returning to feudalism ala Yarvin, Thiel, Musk, and co's politics.
What are you realistically willing to do? How far can you go?
bluefirebrand 13 hours ago [-]
> They were against being told they weren't needed anymore and thrown to the wolves because of the machines.
I guess that makes me a modern luddite then
A software engineer luddite
A techno-luddite if you will
Maybe I have a new username
hintymad 15 hours ago [-]
> Review by a senior is one of the biggest "silver bullet" illusions managers suffer from
Especially in a big co like Amazon, most senior engineers are box drawers, meeting goers, gatekeepers, vision setters, org lubricants, VP's trustees, glorified product managers, etc. They don't necessarily know more context than the more junior engineers, and they most likely will review slowly while uncovering fewer issues.
strogonoff 5 hours ago [-]
I would argue that the amount of time needed for a proper review exceeds the amount of time needed to just do it yourself.
When reviewing, you need to go through every step of implementing it yourself (understand the problem, solve the problem, etc.), but you additionally need to 1) understand someone else’s solution and 2) diff your solution against theirs to provide meaningful feedback.
Review could take roughly equivalent time, but only if I am allowed to reject/approve in a binary way (“my solution would not be the same, therefore denied”) which is not considered appropriate in most places.
This is why I am not a fan of being the reviewer.
raw_anon_1111 19 hours ago [-]
Why only AI generated code? I wouldn’t let a junior or mid level developer’s code go into production without at least verifying the known hotspots - concurrency, security, database schema, and various other non functional requirements that only bite you in production.
I’m probably not going to review a random website built by someone except for usability, requirements and security.
happytoexplain 19 hours ago [-]
I didn't restrict my opinion to genAI code. I'm expressing a general thought that was relevant before AI. AI is just salient in relation to it.
I also said senior review is valuable, but I'm not 100% sure if you're implying I didn't.
OrangeDelonge 13 hours ago [-]
I’ve seen hundreds of PR’s produced by a junior and reviewed by a mid lvl go into prod. I don’t see any problem with that
lionkor 2 hours ago [-]
> [...] requires an amount of time approaching the time spent if they had just done it themselves.
Yes, but with the caveat that the junior learns and eventually can become the senior.
zamalek 14 hours ago [-]
> Review by a senior is one of the biggest "silver bullet" illusions managers suffer from.
My manager has been urging us to truly vibe code, just yesterday saying that "language is irrelevant because we've reached the point where it works - so you don't need to see it." This article is a godsend; I'll take this flawed silver bullet any day of the week.
mrothroc 17 hours ago [-]
Senior review can definitely help, regardless of whether the code comes from a junior or an LLM. We've done this since the dawn of time. However, it doesn't scale, and since LLM volume far exceeds what juniors can do, you end up overwhelming the seniors, who are normally overbooked anyway.
The other problem is that the types of errors LLMs make are different from the ones juniors make. There are huge sections of genuinely good code. So the senior gets "review fatigue": because so much looks good, they just start rubber stamping.
I use an automated pipeline to generate code (including terraform, risking infrastructure nukes), and I am the senior reviewer. But I have gates that do a whole range of checks, both deterministic and stochastic, before it ever gets to me. Easy things are pushed back to the LLM for it to autofix. I only see things where my eyes can actually make a difference.
Amazon's instinct is right (add a gate), but the implementation is wrong (make it human). Automated checks first, humans for what's left.
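For readers who want to picture the "automated checks first, humans for what's left" flow, here is a minimal Python sketch. It is only an illustration under stated assumptions: the commenter's actual pipeline is not described, so the `Change` type, the two example gates, the 400-line threshold, and the `triage` function are all hypothetical, and only deterministic gates are shown (stochastic ones, such as an LLM reviewer, are left out).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Change:
    diff: str  # text of the proposed change (hypothetical shape)


# A gate returns None when the change passes, or a short reason when it fails.
Gate = Callable[[Change], Optional[str]]


def no_debug_markers(change: Change) -> Optional[str]:
    # Hypothetical gate: reject diffs that still contain a debug marker.
    return "debug marker left in diff" if "NOCOMMIT" in change.diff else None


def small_enough(change: Change, max_lines: int = 400) -> Optional[str]:
    # Hypothetical gate: the 400-line limit is an arbitrary example value.
    count = len(change.diff.splitlines())
    return f"diff too large ({count} lines)" if count > max_lines else None


def triage(change: Change, gates: List[Gate]) -> Tuple[str, List[str]]:
    """Run every gate; only changes with no failures reach a human."""
    failures = [reason for reason in (gate(change) for gate in gates) if reason]
    if failures:
        # Easy problems go back to the author (or the LLM) to fix first.
        return "push_back", failures
    return "human_review", []
```

Calling `triage(Change(diff_text), [no_debug_markers, small_enough])` returns either `"push_back"` with the list of reasons or `"human_review"` with an empty list, so the reviewer's queue only ever contains changes that already cleared the cheap checks.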
belval 20 hours ago [-]
The unwritten thing is that if you need seniors to review every single change from junior and mid-level engineers, and those engineers are mostly using Kiro to write their CRs, then what stops the senior from just writing the CRs with Kiro themselves?
qnleigh 19 hours ago [-]
I seriously doubt that they think senior reviewers will meticulously hunt down and fix all the AI bugs. Even if they could, they surely don't have the time. But it offers other benefits here:
1. They can assess whether the use of AI is appropriate without looking in detail. E.g. if the AI changed 1000 lines of code to fix a minor bug, or changed code that is essential for security.
2. To discourage AI use, because of the added friction.
yifanl 20 hours ago [-]
Senior reviews are useful, but as I understand it, Amazon has a fairly high turnover rate, so I wonder just how many seniors with deep knowledge of the codebase they could possibly have.
tartoran 19 hours ago [-]
From "engineers are interchangeable" to high turnover, these are decisions that the company made. Payback time always comes at some point.
grvdrm 20 hours ago [-]
What a statement at the end. You are absolutely right.
I hear “x tool doesn’t really work well” and then I immediately ask: “does someone know how to use it well?” The answer “yes” is infrequent. Even a yes is often a maybe.
The problem is pervasive in my world (insurance). Number-producing features need to work in a UX and product sense but also produce the right numbers, and within range of expectations. Just checking the UX does what it’s supposed to do is one job, and checking the numbers an entirely separate task.
I don’t know many folks that do both well.
hnthrow0287345 19 hours ago [-]
>requires an amount of time approaching the time spent if they had just done it themselves.
I would actually say having at least 2 people on any given work item should probably be the norm at Amazon's size if you also want to churn through people as Amazon does and also want quality.
Doing code reviews is not as highly valued in terms of incentives for employees, and it blocks them from working on things they would get more compensation for.
rafaelmn 5 hours ago [-]
Eventually they'll get rid of juniors and mid level devs because realistically it's easier to review when you're the one doing the prompting.
lokar 20 hours ago [-]
The goal of Sr code review is not to make the code better, it's to make the author better.
skeeter2020 19 hours ago [-]
Agree but even broader: authors. I always viewed reviews as targeting Brooks's less famous finding about the optimal team size being one, and asking how we can get better at building systems too big for the individual. I think code review is about shared, consistent understanding, with catching bugs a nice side effect (or justification for the bean counters).
lokar 19 hours ago [-]
I agree, made (mostly) that point in my top level comment. Code reviews (both in the normal GitHub flow, but also small meetings, design reviews, etc) all help to tie the team together and improve quality.
sumeno 14 hours ago [-]
That's not going to work when the author is an LLM
radiator 18 hours ago [-]
Deming's point 3 (of 14): Cease dependence on inspection to achieve quality. Eliminate the need for massive inspection by building quality into the product in the first place.
rco8786 13 hours ago [-]
Going to systemically turn off your senior staff over time also. Most Senior Engineers aren't that interested in doing even more code review.
mmcconnell1618 12 hours ago [-]
Also, have massive layoffs every few months just to keep people on edge. AWS wants people to leave with RTO and badging policies, comp range shifts lower unless you have year over year ratings, and an obsessive push to force AI into every process. Top talent is leaving and will continue to leave AWS.
mrbonner 20 hours ago [-]
What stops the senior from using AI to review the AI generated code the junior published?
tartoran 19 hours ago [-]
That’s something that the junior can do. What companies want to do is put responsibility on someone who has more knowledge and skin in the game
femiagbabiaka 20 hours ago [-]
the outcome of the review isn't just that the code gets shipped, it's knowledge transfer from the senior engineer to the junior engineers that then creates more senior engineers
remarkEon 18 hours ago [-]
Other than “don’t hire idiots”, what is the solution to this problem? I agree with you, and this particular systems management issue is not constrained to software.
happytoexplain 12 hours ago [-]
I don't know.
We need smart people at every layer. If leadership isn't in that category, it spreads to all layers.
I don't know how we defeat capitalism to incentivize smart leadership. It's fundamentally opposed to market forces.
yalogin 13 hours ago [-]
Don’t forget that this auto-generated code will have subtle bugs and still feel complete at the outset
munk-a 15 hours ago [-]
Reviewing code changes (generally) takes more time than writing code changes for a pretty significant chunk of engineers. If we're optimizing slop code writing at the expense of more of our seniors' time being funneled into detailed reviews then we're _doing it wrong_.
napolux 19 hours ago [-]
LGTM
RamblingCTO 20 hours ago [-]
Who said PR reviews need to solve all the things and result in proof against idiots?
So you're saying that peer reviews are a waste of time and only idiots would use/propose them?
happytoexplain 19 hours ago [-]
None of that, sorry if I wasn't clear.
To partially clarify: "Idiot proof" is a broad concept that here refers specifically to abstraction layers, more or less (e.g. a UI framework is a little "idiot proof"; a WYSIWYG builder is more "idiot proof"). With AI, it's complicated, but bad leadership is over-interpreting the "idiot proof" aspects of it. It's a phrase, not an insult to users of these tools.
33MHz-i486 10 hours ago [-]
In case it isn’t completely obvious from this, it is indeed hellish to work there. Most of AWS has a 2-reviewer requirement. If AI is writing most of the code (and it is, because most Amazon code is copypasta boilerplate) you need 3 developers to sign off to ship anything. But of course due to headcount attrition, managers have ~1.5 developers to a project. Meanwhile the L8 manager is doing nothing except stack ranking each level of engineers according to number of commits merged & customer-facing features shipped, and firing 15% of the bottom at the end of each year. There is no notion of subject matter expertise or technical depth; they're happy to replace whoever with fresh grads (they're all just cogs anyway, right!). Between that and voluntary departures, teams having 80-100% turnover every 5 years is basically par.
Also, while this is happening, most developers are getting constantly hammered by operational issues and critical security tasks because 1) the legacy toolchain imports 6 different language package ecosystems and 2) no one ever pays down tech debt in legacy code until it's a high-severity ticket count in a KPI dashboard visible to senior management.
mikert89 8 hours ago [-]
The thing is, this management philosophy worked when AWS knew what they needed to build and just needed to execute with top notch operations.
But now with AI, they are getting disrupted. Most AWS services might become obsolete: why does an AI need the janky higher-level abstractions AWS piles on?
So now they need innovation, but the company isn’t set up for it. They are forcing short deadlines for product launches that don’t matter
prakhar897 19 hours ago [-]
From the Amazon I know, people only care about a. not getting fired and b. promotions. For devs, the matrix looks like this:
1. Shipping: deliver tickets or be PIP'd.
2. Having fewer comments on their PRs: for some drastically dumb reason, having a PR thoroughly reviewed is a sign of bad quality. L7 and above use this metric to PIP folks.
3. Docs: write docs, get them reviewed to show you're high level.
Without AI, an employee is worse off in all of the above compared to folks who will cheat to get ahead.
I can't see how "requesting" that folks forgo their own self-preservation will work, especially when you've spent years pitting people against each other.
malfist 18 hours ago [-]
Not only is having too many comments on your PRs bad for you, but so is not leaving comments on other people's PRs. Both are metrics used
dude250711 15 hours ago [-]
I'd leave lots of comments out of spite whenever I would feel my PRs had been treated unfairly. If I am going down, you all are coming with me.
grogenaut 12 hours ago [-]
I specifically look at the quality / substance of the comments when I'm reviewing someone for promo/transfer/fire.
malfist 14 hours ago [-]
Welcome to Amazon, you'll fit right in.
embedding-shape 16 hours ago [-]
> 2. Having Less comments on their PRs: for some drastically dumb reason, having a PR thoroughly reviewed
I'm very far away from liking Amazon's engineering culture and general work culture, but having PRs with countless discussions and feedback on them does signal that you've done a lot of work without collaborating with others before doing the work. Generally in teams that work well together and build great software, the PRs tend to have very little on them, as most of the issues were resolved while designing together with others.
joeframbach 14 hours ago [-]
I've been involved in so many CRs where I've given feedback over 10 revs, then the submitter cancels the CR and files a new one, for the metrics.
tom_ 11 hours ago [-]
If the review tooling is any good, getting the code somewhere it can see it is a convenient way for people to give and receive feedback. As the saying goes, the system is what it does!
(And/but yes/no, I have never worked at NAGFAM...)
ex-aws-dude 14 hours ago [-]
Eh I feel like there are some features where you just have to get in the weeds to even design it and the code review itself is part of the process of designing/figuring out the edge cases.
embedding-shape 37 minutes ago [-]
> Eh I feel like there are some features where you just have to get in the weeds to even design it and the code review
I agree, but those are separate tasks completely (in my view) compared to "Someone writes code that goes into production", usually called "spikes" or something else to differentiate them from "normal" tasks. They're quite literally just about exploration and figuring out the design, before the "real" work starts.
dboreham 15 hours ago [-]
4. Don't work in the corporate equivalent of The Hunger Games.
999900000999 13 hours ago [-]
At least in the past the idea was you do the dance, vest and leave.
I missed my FAANG chance during the good years. No retirement for me!
philip1209 14 hours ago [-]
I think the deeper need is a "self-review" flow.
People push AI-reviewed code like they wrote it. In the past, "wrote it" implies "reviewed it." With AI, that's no longer true.
I advocate for GitHub and other code review systems to add a "Require self-review" option, where people must attest that they reviewed and approved their own code. This change might seem symbolic, but it clearly sets workflows and expectations.
billbrown 11 hours ago [-]
Yes, underthinking is rampant. Glancing at "AI" output is not reviewing code: you have to grok it (in the Heinlein sense) in order to treat it as your own.
userbinator 10 hours ago [-]
You have to grok it, and not just Grok it.
Tyr42 14 hours ago [-]
Heck, doing a self review when you wrote the code catches stuff like forgetting debug prints.
nothrabannosir 13 hours ago [-]
(tangent of the decade : prefixing your debug printfs with NOCOMMIT helps catching them before commit :) sample precommit hook and GitHub ci action I wrote is at https://github.com/nobssoftware/nocommit but it’s just a grep)
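The "it's just a grep" idea above can be sketched in a few lines of Python. This is an illustration only, not the code from the linked repository: the function name `find_nocommit` and the word-boundary regex are assumptions made for the example.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: the marker is the bare word NOCOMMIT, as described in the comment above.
MARKER = re.compile(r"\bNOCOMMIT\b")


def find_nocommit(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that contains the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if MARKER.search(line)
    ]


if __name__ == "__main__":
    # Tiny demonstration on in-memory lines instead of a real staged diff.
    sample = ['total = compute()\n', 'print("NOCOMMIT total =", total)\n']
    for number, text in find_nocommit(sample):
        print(f"line {number}: {text}")
```

A pre-commit hook built on this would feed it the staged changes and exit non-zero whenever the returned list is non-empty, which is what blocks the commit.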
therealdrag0 10 hours ago [-]
Self review should also include adding guiding comments for other reviewers.
fireant 3 hours ago [-]
Do you add these into the code or into the review itself? I sometimes write these into the review, but I wonder if it's a useful information that should actually be inside the code that will get lost when the PR is merged
jeremyjh 13 hours ago [-]
We have it in a checklist in our PR template. I can’t imagine a first-class feature that would be much more meaningful. It surprised me to learn there are developers who have to be reminded to review their own code and test it, but it does seem to help.
kuekacang 12 hours ago [-]
I've been lucky to discover git relatively late and sublime merge relatively soon. It seems like separating the concern of editing and reviewing code is making me consider each more as separate thing.
It also makes me more comfortable figuring out what a project's pull acceptance process is like (maybe due to how fast a local UI is compared to web-based git). On the other hand, I can only run some basic git CLI commands and can't quickly comprehend raw text-based diffs, especially when encountering some Linux patches from time to time.
paxys 13 hours ago [-]
If someone was confident enough to push through an AI change without even reading/reviewing it themselves adding more buttons to the UI isn't going to change anything.
8note 12 hours ago [-]
The tooling doesn't make it easy currently.
Working at Amazon, when I wanted to review code myself through the CR tool, I'd still end up publishing it to the whole team and have to add some title shenanigans saying it was a self review or WIP and for others to not look at it yet.
koinedad 13 hours ago [-]
Self review is #1
sdevonoes 20 hours ago [-]
Reviewing AI generated code at PR time is a bottleneck. It cancels most of the benefits senior leadership thinks AI offers (delivery speed).
There’s also this implicit imbalance engineers typically don’t like: it takes me 10 min to submit a complete feature thanks to Claude… but for the human reviewing my PR in a manual way it will take them 10-20 times that.
Edit: in the end, real engineers know that what takes effort is a) knowing what to build and why, and b) verifying that what was built is correct. Currently AI doesn’t help much with either of these two points.
The inbetweens are needed but they are a byproduct. Senior leadership doesn’t know this, though.
hard24 19 hours ago [-]
Indeed. My view as a CEO is, if you are still reviewing the code yourself then what use is it that you can produce a bunch of text at a faster rate?
I'd prefer people wrote good quality code and checked it as they went along... whilst allowing room for other stuff they didn't think of to come to the front.
The production process of using LLMs is entirely different, in its current state I don't see the net benefit.
E.g. if you have a very crystalised vision of what you want, why would I want an engineer to use an LLM to write it, when the LLM can't do both raw production and review? Could this change? Sure. But there's no benefit for me personally to shift toward working that way now - I'd rather it came into existence first before I expose myself to incremental risk that affects business operations. I want a comprehensive solution.
beardedetim 19 hours ago [-]
This is what I don't understand about this policy. There's no way a senior has enough spare capacity to be the gate keeper on every PR made by AI below them. So now we are just making it so the senior people use more AI to keep up but now they're to blame for letting it happen.
It sounds like a piss poor deal for seniors unless senior engineer now means professional code reviewer.
malfist 14 hours ago [-]
That's amazon in a nutshell though. Create conflicting metrics for performance, push credit up and responsibility down, punish everyone below you for not meeting the double standards
znpy 14 hours ago [-]
> Create conflicting metrics for performance, push credit up and responsibility down, punish everyone below you for not meeting the double standards
This resonates with my experience.
The only thing you forgot is that you can also use the 12^H^H 14 leadership principles to argue whatever you want (and then the opposite of what you argued last month, still using the same leadership principles).
malfist 12 hours ago [-]
Got a project finished early? Well, you didn't insist on the highest standards. Made sure things were held to a high standard? Well, you weren't biased for action.
Were you a knowledge source for the entire team? Well, you weren't learning and being curious. Did you ask a lot of questions to learn everything? Well, then you weren't "are right a lot".
Did you think big and come up with an architecture that saved Amazon a lot of money? Then you weren't inventing and simplifying. Build something simple to get it out the door quick? Well, you weren't thinking big.
Did you act quickly without consulting others to fix an issue? Well you weren't earning trust. Did you consult people to make sure they were happy with the solution? Well you weren't biased for action.
That's just a few examples; there are so many more.
Terr_ 10 hours ago [-]
Very nice, I can imagine someone turning it into a little satirical webpage, which implements a kind of decision tree:
1. Choose from a set of challenge types (e.g. meeting a deadline, reliability)
2. Choose whether the challenge was "met" or "failed".
3. Choose whether you want to make the person look good or bad, by following/ignoring a principle.
4. Results: A list of relevant principles with short rationalizations.
I'm almost tempted to try, except perhaps I should treasure my ignorance.
If a tool like that gets popular enough that most employees are using it for office-politics, it might even start to deflate the whole Leadership Principles thing.
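As a rough idea of what the core of such a page might look like, here is a small Python sketch covering only the "make the person look bad" branch. Everything in it is hypothetical: the `CRITIQUES` table and `critique` function are invented for illustration, and the table keys are paraphrases of the examples given in the parent comment.

```python
from typing import Dict

# Each entry pairs something you did with the principle it can be said to
# violate, paraphrased from the examples in the parent comment.
CRITIQUES: Dict[str, str] = {
    "finished the project early": "Insist on the Highest Standards",
    "held things to a high standard": "Bias for Action",
    "was a knowledge source for the team": "Learn and Be Curious",
    "asked a lot of questions": "Are Right, A Lot",
    "designed a big money-saving architecture": "Invent and Simplify",
    "built something simple to ship quickly": "Think Big",
    "fixed an issue without consulting others": "Earn Trust",
    "consulted others before fixing an issue": "Bias for Action",
}


def critique(action: str) -> str:
    """Look up the principle that can be cited against the given action."""
    principle = CRITIQUES.get(action)
    if principle is None:
        return f"No rationalization on file for: {action}"
    return f"You {action}? Then you weren't following '{principle}'."
```

The point of the joke survives in the data: opposite actions map to different principles, and "Bias for Action" shows up on both sides, so some entry always applies.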
rhubarbtree 13 hours ago [-]
Most AI advocates I know believe this period, reviewing every line of code, will come to an end when models improve. So there will be no bottleneck. We will simply test and ship, with AI doing all the code and review.
bandrami 8 hours ago [-]
Possibly, but it doesn't make sense to restructure things in advance of that actually happening, particularly since there's no roadmap for getting there right now.
rhubarbtree 5 hours ago [-]
They are already at this point - they just think the world needs to catch up. They don’t review the code most of the time. They believe it’s just a matter of becoming comfortable with the idea you don’t write code. Seems plenty of startups in SV are also doing this.
bandrami 5 hours ago [-]
Like the fake facebook that had a security hole so severe every single participant had their API keys exposed?
rhubarbtree 4 hours ago [-]
Yep, that’s the most prominent example of a system built without code review I can also reach for. Whether all such systems also suffer critical flaws is another question entirely. And whether that matters is a further unknown.
qnleigh 19 hours ago [-]
Surely they know all this. They're worried about AI code degrading codebase quality, so they're putting on the brakes.
radiator 18 hours ago [-]
> Senior leadership doesn’t know this, though.
Well, you'd think senior leadership should know how their business and their people work.
Barrin92 15 hours ago [-]
To be fair, senior engineering leads in the software world are like Voltaire's joke about the Holy Roman Empire: neither holy, nor Roman, nor an empire.
Despite the name not a lot of seniority, leadership or engineering going around
asadotzler 15 hours ago [-]
LOL
cmiles8 19 hours ago [-]
The optics here are really bad for Amazon. The continuing mass departures of long tenured folks, second-rate AI products, and a string of bad outages paint a picture that current leadership is overseeing a once respected engineering train flying off the tracks.
News from the inside makes it sound like things are getting pretty bad.
the_biot 14 hours ago [-]
> The continuing mass departures of long tenured folks
You mean senior programmers that have been there for ages don't want to spend their time reviewing AI slop? Who'd a thunk it!
petterroea 9 hours ago [-]
I feel bad for the seniors who have to take on this workload. The general pattern I am seeing is that seniors at "AI-first" companies are being held back from doing their work by reviewing junior PRs, who are now able to ship much more code they don't understand the badness of.
Mentoring Juniors is an important part of the job and crucial service to the industry, but juniors equipped with LLMs make the deal a bit more sour. Anecdotally, they don't really remember the feedback as well, because they weren't involved in writing the code. It's burnout-inducing to see your hard work and feedback go in one ear and out the other.
I personally know people looking to jump ship because they waste too much time at their current employer on this.
wiseowise 4 hours ago [-]
> Mentoring Juniors is an important part of the job and crucial service to the industry
Not really.
ritlo 20 hours ago [-]
The only way to see the kinds of speed-up companies want from these things, right now, is to do way too little review. I think we're going to see a lot of failures in a lot of sectors where companies set goals for reduced hours on various things they do, based on what they expected from LLM speed-ups, and it will have turned out the only way to hit those goals was by spending way too little time reviewing LLM output.
They're torn between "we want to fire 80% of you" and "... but if we don't give up quality/reliability, LLMs only save a little time, not a ton, so we can only fire like 5% of you max".
(It's the same in writing, these things are only a huge speed-up if it's OK for the output to be low-quality, but good output using LLMs only saves a little time versus writing entirely by-hand—so far, anyway, of course these systems are changing by the day, but this specific limitation has remained true for about four years now, without much improvement)
SoftTalker 19 hours ago [-]
So will it turn out that actually writing code was never the time sink in the first place?
That has always been my feeling. Once I really understand what I need to implement, the code is the easy part. Sure it takes some time, but it's not the majority. And for me, actually writing the code will often trigger some additional insight or awareness of edge cases that I hadn't considered.
8note 12 hours ago [-]
At least in my experience at Amazon, it wasn't.
If I wanted, I could queue up weeks' worth of review in a couple of days, but that's not making the whole team more productive.
Spending more time on documents and chatting proved much more useful for getting more output overall.
Even without LLMs, I've been near and on teams where the review burden from developers building away-team code was already so high that you'd need to bake an extra month into your estimates for getting somebody to actually look.
hard24 19 hours ago [-]
"So will it turn out that actually writing code was never the time sink in the first place?"
Of course it wasn't! Do you think people can envision the right objects to produce all the time? Yeah.. we have a lot of Steve Jobs walking around lol.
As you say, there's 'other stuff' that happens naturally during the production process that add value.
somewhereoutth 13 hours ago [-]
> actually writing the code will often trigger some additional insight or awareness of edge cases that I hadn't considered.
Thinking through making.
hard24 20 hours ago [-]
My prediction is that a Concorde-like incident is going to shatter trust and make people rethink their expectations of what LLMs can do today.
Essentially something big has to happen that affects the revenue/trust of a large provider of goods, stemming from LLM-use.
They won't go away entirely. But this idea that they can displace engineers at a high rate will.
Terr_ 19 hours ago [-]
Assuming you mean this crash [0], it reads to me more like a confluence of bad events versus a big fundamental design flaw in the THERAC-25 mold.
I feel the current proliferation of LLMs is going to resemble the asbestos problem: Cheap miracle thingy, overused in several places, with slow gradual regret and chronic harms/costs. Although I suppose the "undocumented nasty surprise" aspect would depend on adoption of local LLMs. If it's a monthly subscription to cloud-stuff, people are far less likely to lose track of where the systems are and what they're doing.
Like bombing a building full of little kids? Oops too late...
827a 11 hours ago [-]
> Company that lays-off 20% of its staff every year in an attempt to "reduce inefficiency" and "remain agile in the adoption of new technologies and workflows" finds they cannot run a stable service, have more inefficiency than ever, and have also failed to establish leadership in the adoption of any new technologies or workflows. They plan to solve these problems by introducing more inefficiency (making your most expensive employees review the work of others).
We love this for Amazon, they're a very strong company making bold decisions.
lokar 20 hours ago [-]
If this is true, it misunderstands the primary goals of code review.
Code review should not be (primarily) about catching serious errors. If there are always a lot of errors, you can’t catch most of them with review. If there are few it’s not the best use of time.
The goal is to ensure the team is in sync on design, standards, etc. To train and educate Jr engineers, to spread understanding of the system. To bring more points of view to complex and important decisions.
These goals help you reduce the number of errors going into the review process, this should be the actual goal.
rossdavidh 15 hours ago [-]
As Deming once said in regard to manufacturing inspections: "Inspection does not improve the quality, nor guarantee quality. Inspection is too late. The quality, good or bad, is already in the product."
The fact that software is "soft" makes it seem like this doesn't apply, but it does, not least because of the fact that once you have gone down the wrong path with software design, it is very difficult to pull back and realize you need to go down an entirely different one.
lokar 12 hours ago [-]
I agree, but it's worse. Even a "simple" coding error (so, no long term arch issues) is a problem, if the review that catches it does not educate the author.
The analogy to manufacturing would be something like if the parts coming out a machine are all bad, just sending them to re-work is not a solution, you need to re-calibrate the machine.
paxys 12 hours ago [-]
Someone should teach the decision makers how pipelines work. If AI-created diffs are being churned out at 10x the previous rate but manual reviews are the bottleneck then the overall system is producing at the exact same rate as before. The only thing you have added is cost, uncertainty and engineers being less familiar with the system.
The next thing these geniuses will think of will be to have AI review the diffs.
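To put numbers on the pipeline point, here is a minimal sketch. The stage names and rates are made up for illustration; the only claim is that a serial pipeline's steady-state output is capped by its slowest stage, and whatever the faster stage produces beyond that just piles up in the queue.

```python
def pipeline_throughput(stage_rates):
    """Steady-state output of a serial pipeline is capped by its slowest stage."""
    return min(stage_rates.values())


# Hypothetical rates in diffs per week; not measurements from any real team.
before = {"write": 10, "review": 10}
after = {"write": 100, "review": 10}  # writing gets 10x faster, review does not

throughput_before = pipeline_throughput(before)
throughput_after = pipeline_throughput(after)

# Everything written beyond review capacity accumulates as unreviewed backlog.
backlog_growth_per_week = after["write"] - throughput_after

print(throughput_before, throughput_after, backlog_growth_per_week)
```

With these assumed rates the shipped output is identical before and after, and the only new number is the backlog growing every week.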
rglover 14 hours ago [-]
The amount of time and money being wasted chasing this dragon is unreal.
dana321 14 hours ago [-]
It's useful but it makes wrong assumptions; not checking the code is essentially gambling.
alexyz12 14 hours ago [-]
there's nothing else left to chase apparently
Lalabadie 20 hours ago [-]
I'm not sure the sustainable solution is to treat an excess of lower-quality code output as the fixed thing to work with, and operationalize around that, but sure.
gtowey 20 hours ago [-]
It's the same as the offshoring episode of the early 2000's. There is such a massive financial incentive to somehow make the low quality code work. And they will try to resist the reality that it's a huge net negative for as long as they can.
ndr42 22 hours ago [-]
I think the problem of responsibility will come for many more companies sooner than later. It is possible that some of the alleged efficacy gains by using ai are not so big anymore when someone has to be accountable for it.
sethops1 20 hours ago [-]
> The response for now? Junior and mid-level engineers can no longer push AI-assisted code without a senior signing off.
So basically, kill the productivity of senior engineers, kill the ability for junior engineers to learn anything, and ensure those senior engineers hate their jobs.
Bold move, we'll see how that goes.
whateveracct 20 hours ago [-]
Juniors could just code things the old fashioned way. It isn't hard. And if they do find it too hard, they aren't cut out for this job.
sdevonoes 19 hours ago [-]
But aren’t companies enforcing AI usage? If not, wait for it
ritlo 19 hours ago [-]
Mine's tracking it complete with a leaderboard (LOL) and it's been suggested to me that it'd be in my best interest not to be too low on that list, so I suspect in the back half of the year some sterner conversations and/or pink-slips are going to be coming the way of those who've not caught on that they need to at least be sending some make-work crap to their LLMs every day, even if they immediately throw the output in the metaphorical garbage bin.
It's basically an even-more-ridiculous version of ranking programmers by lines-of-code/week.
What's especially comical is I've seen enormous gains in my (longish, at this point) career from learning other tools (e.g. expanding my familiarity with Unix or otherwise fairly common command line tools) and never, ever has anyone measured how much I'm using them, and never, ever has management become in any way involved in pushing them on me. It's like the CEO coming down to tell everyone they'll be making sure all the programmers are using regular expressions enough, and tracking time spent engaging with regular expressions, or they'll be counting how many breakpoints they're setting in their debuggers per week. WTF? That kind of thing should be leads' and seniors' business, to spread and encourage knowledge and appropriate tool use among themselves and with juniors, to the degree it should be anyone's business. Seems like yet another smell indicating that this whole LLM boom is built on shaky ground.
tavavex 18 hours ago [-]
> It's like the CEO coming down to tell everyone they'll be making sure all the programmers are using regular expressions enough, and tracking time spent engaging with regular expressions, or they'll be counting how many breakpoints they're setting in their debuggers per week.
That's because they weren't sold regex as a service by a massive company, while also being reassured by everyone that any person not using at least one regular expression per line of code is effectively worthless and exposes their business to a threat of immediate obsolescence and destruction. They finally found a way to sell the same kind of FOMO to a majority of execs in the software industry.
catlifeonmars 8 hours ago [-]
Vibe code a side project at work. I’m willing to bet the tools aren’t mapping the code contribution locations to business impact (hard problem).
to11mtm 19 hours ago [-]
> even if they immediately throw the output in the metaphorical garbage bin.
Gotta be careful if you do that tho; e.g. Copilot can monitor 'accept' rate, so at bare minimum you'd have to accept the changes then immediately back them out...
tavavex 18 hours ago [-]
In a couple years, we'll have office workspaces equipped with EEG helmets that you must wear while working, to measure your sentiment upon seeing LLM-generated code. The worst performers get the boot, so you better be happy!
ourmandave 18 hours ago [-]
I wonder if Copilot can write a commit and backout routine for them.
lovich 18 hours ago [-]
If you use AI to back it out, sounds like you’ve found an infinite feedback loop for those metrics.
Did industrial psychology die out as a field? Why do we keep reinventing the wheel when it comes to perverse incentives. It’s like working on a team working with scrum where the big bosses expect the average velocity to go up every sprint, forever, but the engineers are the ones deciding the point totals on tickets.
bonesss 18 hours ago [-]
From a management perspective I would be highly skeptical of token leaderboards. You are incentivizing people to piss away company money with uncertain rewards.
I mean… throw some docs into the context window, see it explode. Repeat that a few times with some multi-step workflows. Presto, hundreds of dollars in “AI” spending accomplishing nothing. In olden days we’d just burn the cash in a waste paper basket.
tren_hard 16 hours ago [-]
My company doesn’t enforce AI usage but for those who choose to use it, every month they highlight the biggest users. It’s always non-tech people who absolutely don’t understand how LLMs work and just run a single chat for as long as possible before our system cuts them off and forces them into a new chat context.
dboreham 15 hours ago [-]
"Can't fix stupid"
slopinthebag 18 hours ago [-]
What's stopping someone from just having the AI churn out garbage all day long? Or like, put your AI into plan mode with extra high reasoning and have it churn for 10 minutes to make a microscopic change in some source file. Repeat ad infinitum.
baal80spam 15 hours ago [-]
> What's stopping someone from just having the AI churn out garbage all day long?
In my case it's morality.
bravetraveler 13 hours ago [-]
Interesting consideration, 'mandates' and all. Definitely in camp 'toss the output', here. I think I'll see 'morality' leaving when $EMPLOYER fires 'professional discretion'... forcing usage and, ultimately, debasing the position.
edit: Peer said it well, IMO. The consequences aren't really yours. Also: something, something, Goodhart's Law.
ummonk 12 hours ago [-]
I would argue that making the company experience the consequences of its choice of metrics / mandates is in fact a moral imperative.
throw_m239339 19 hours ago [-]
Aren't these companies mandating the use of these tools in the first place? Juniors aren't the problem.
thewhitetulip 20 hours ago [-]
Well, not when they are mandated to use AI tools and asked for justification about their usage!
I am saying this in general; I've never worked at Amazon
dragonelite 20 hours ago [-]
Accelerate a person's speed toward being burned out..
altairprime 20 hours ago [-]
..and you lower overall engineering salary spend by rotating out seniority-paid engineers for newly-promoted AI reviewers with lower specs
dude250711 15 hours ago [-]
But Amazon is something you tolerate for a year or two early in the career, before moving somewhere better (which is anywhere else)?
almostdeadguy 20 hours ago [-]
I'm sorry what? Junior engineers can't learn anything without using AI assistants (or is the implication that having seniors review their code makes them incapable of learning?) and senior engineer would hate their jobs reviewing more code from their teammates? What reality do people live in now?
zdragnar 20 hours ago [-]
I thought the implication was that juniors would continue to use AI to stay "productive" (AWS is not a rest and vest job for juniors, from what I've heard) and seniors would no longer have time to do anything but review code from juniors who just spin the AI wheel.
There's a lot of learning opportunity in failing, but if failure just means spam the AI button with a new prompt, there's not much learning to be had.
ritlo 20 hours ago [-]
> senior engineer would hate their jobs reviewing more code from their teammates
Jesus, yes. Maybe I'm an oddball but there's a limit to how much PR reviewing I could do per week and stay sane. It's not terribly high, either. I'd say like 5 hours per week max, and no more than one hour per half-workday, before my eyes glaze over and my reviews become useless.
Reviewing code is important and is part of the job but if you're asking me to spend far more of my time on it, and across (presumably) a wider set of projects or sections of projects so I've got more context-switching to figure out WTF I'm even looking at, yes, I would hate my job by the end of day 1 of that.
almostdeadguy 18 hours ago [-]
If we can't spend that much time reviewing code, what are we exactly doing with this AI stuff?
I don't disagree, I think reviewing is laborious, I just don't see how this causes any unintended consequences that aren't effectively baked into using an AI assistant.
bluefirebrand 13 hours ago [-]
Yes, this is part of why AI tools are bad
Code Review is hard and tiring, much moreso than writing it
I've never met anyone who would be okay reviewing code for their full time job
mentos 14 hours ago [-]
What are we going to do about software for critical infrastructure in the coming decade?
Feels inevitable that code for aviation will slowly rot from the same forces at play but with lethal results.
rhubarbtree 13 hours ago [-]
It will be written much the same as now, but using AI to improve quality through code inspection etc.
Just because nearly all software is going to be written by AI, does not mean critical infrastructure will be.
bandrami 8 hours ago [-]
At some point people are going to start asking why software that doesn't need to be right should exist in the first place
rhubarbtree 4 hours ago [-]
I’m not sure I follow, but I don’t hear anyone asking those questions of the bad software we already have today.
AlotOfReading 20 hours ago [-]
I'm not surprised by the outages, but I am surprised that they're leaning into human code review as a solution rather than a neverending succession of LLM PR reviewers.
I wonder if it's an early step towards an apprenticeship system.
monarchwadia 20 hours ago [-]
Interesting. How would it be an early step towards an apprenticeship system?
bilbo0s 20 hours ago [-]
You shouldn't be surprised.
How else would they train the LLM PR reviewers to their standards?
I've never personally been in the position, because my entire career has been in startups, but I've had many friends be in the unenviable position of training their replacements. Here's the thing though, at least they knew they were training their replacements. We could be looking at a potential future where an employee or contractor doesn't realize s/he is actually just hired to generate training data for an LLM to replace them, and then be cut.
AlexeyBrin 22 hours ago [-]
I wonder how this will work in practice. Say I'm a senior engineer and I produce myself thousands of lines of code per day with the help of LLMs as mandated by the company. I still need to presumably read and test the code that I push to production. When will I have time to read and evaluate similar amounts of code produced by a junior or a mid level engineer ?
hrmtst93837 15 hours ago [-]
Sign-off requirements like this quickly become performative when LLMs generate code faster than anyone can review it in detail. Relying on human oversight at scale is unrealistic unless the volume of changes drops or the review process itself becomes more automated.
quantified 22 hours ago [-]
This is an important bottleneck. You can have LLM-based reviewers help you. But unless you yourself understood your thousands of lines, it's "somebody else's" code and that somebody else cannot be fired or taken to court.
The presumably human mid-level or junior engineer has their own issues with this, but the point of the LLM is that you don't need that engineer. For productivity purposes, the dev org only needs the seniors to wrangle all the LLMs they can. That doesn't sustain, so a couple of more-junior engineers can do similar work to mature.
MichaelRo 16 hours ago [-]
>> I wonder how this will work in practice. Say I'm a senior engineer and I produce myself thousands of lines of code per day with the help of LLMs as mandated by the company.
LOL, it's the age old "responsibility without authority". The pressure to use AI will increase and basically you'll be fired for not using it. Simultaneously with the pressure to take the blame when AI fucks up and you can't keep up with the bullshit, leading you to get fired. One way or the other, get some training on how to stack shelves at the supermarket because that's how our future looks, one way or the other.
captainkrtek 11 hours ago [-]
One challenge with code review as an antidote to poor quality gen-AI code, is that we largely see only the code itself, not the process or inputs.
In the pre-gen-AI days, if an engineer put up a PR, it implied (somewhat) they wrote their code, reviewed it implicitly as they wrote it, and made choices (ie: why is this the best approach).
If Claude is just the new high level programming language, in terms of prompting in natural language, the challenge is that we're not reviewing the natural language, we're reviewing the machine code without knowing what the inputs were. I'm not sure of a solution to this, but something along the lines of knowing the history of the prompting that ultimately led to the PR, the time/tokens involved, etc. may inform the "quality" or "effort" spent in producing the PR. A one-shotted feature vs. a multi-iteration feature may produce the same lines of code and general shape, but one is likely to be higher "quality" in terms of minimal defects.
Along the same lines, when I review some gen-AI produced PR, it feels like I'm reading assembly and having to reverse how we got here. It may be code that runs and is perfectly fine, but I can't tell what the higher level inputs were that produced it, and if they were sufficient.
zcw100 18 hours ago [-]
I just met a guy from Amazon this past weekend who was bragging, "We've got unlimited access to LLMs and our developers have 10 agents going at a time.". I tried telling him it wasn't all unicorns and rainbows but I didn't get the impression he cared and just kept crapping out skittles.
agoodusername63 13 hours ago [-]
Because he doesn’t.
The impression I get from SWEs I’ve met throughout my life is that most of them don’t actually care about their job. They got in because it paid well and demand was plentiful.
booleandilemma 12 hours ago [-]
It's rare when I meet one who actually likes writing code nowadays. At my last company everyone was trying to be an architect.
fmajid 14 hours ago [-]
If they do not also increase the senior devs’ allotted time for code reviews to make up for the increased volume of changes due to increased productivity of the junior to mid level devs, or hire more seniors, this will just lead to burnout (on top of Amazon’s already high levels) and scapegoating seniors for having waved through a change because they materially can’t review them fast enough.
8note 12 hours ago [-]
I'd maybe consider spinning out a different job role, like a "review engineer" who's not busy making strategic decisions and near-term planning, just making sure the code is actually good
still within the engineering IC role, but on a different track
VorpalWay 13 hours ago [-]
I'm bewildered by Amazon here. I would assume every change requires code review by another engineer already, as is standard practice in the industry I work in (industrial equipment). Is the change just that it has to be a senior engineer specifically, rather than any engineer? Or did Amazon really not have mandatory code review before?
shepherdjerred 13 hours ago [-]
Code review is mandatory at Amazon
julienchastang 18 hours ago [-]
> best practices and safeguards are not yet fully established
The way I am working with AI agents (codex) these days is have the AI generate a spec in a series of MD documents where the AI implementation of each document is a bite sized chunk that can be tested and evaluated by the human before moving to the next step and roughly matches a commit in version control. The version control history reflects the logical progression of the code. In this manner, I have a decent knowledge of the code, and one that I am more comfortable with than one-shotting.
danjl 14 hours ago [-]
Yes, more time on up front spec and plan building. Bite sized specifically to fit within the context window of a single implementation session. Each step should have a verification process that includes new tests.
Prior to each step, I prompt the AI to review the step and ask clarifying questions to fill any missing details. Then implement. Then prompt the AI after to review the changes for any fixes before moving on to the next step. Rinse, repeat.
The specs and plans are actually better for sharing context with the rest of the team than a traditional review process.
I find the code generated by this process to be better in general than the code I've generated over my previous 35+ years of coding. More robust, more complete, better tested. I used to "rush" through this process before, with less upfront planning, and more of a focus on getting a working scaffold up and running as fast as possible, with each step along the way implemented a bit quicker and less robustly, with the assumption I'd return to fix up the corner cases later.
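The loop described above (clarify, implement, review, verify, then move on) can be sketched roughly like this. Everything here is hypothetical scaffolding: `Step`, `run_plan` and the four callables are names invented for illustration, and the lambdas are trivial stand-ins for what would really be agent prompts and a test runner.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    spec: str  # one bite-sized chunk of the plan


def run_plan(
    steps: List[Step],
    clarify: Callable[[Step], str],
    implement: Callable[[str], str],
    review: Callable[[str], str],
    verify: Callable[[str], bool],
) -> List[str]:
    """Run each step through clarify -> implement -> review -> verify.

    Stops at the first step that fails verification, so later steps never
    build on an unverified one. Each completed step maps to roughly one commit.
    """
    completed = []
    for step in steps:
        spec = clarify(step)      # fill in missing details before coding
        change = implement(spec)  # generate the change for this step only
        change = review(change)   # second pass over the change for fixes
        if not verify(change):    # e.g. run the new tests for this step
            break
        completed.append(step.name)
    return completed


# Trivial stand-ins so the sketch runs; a spec still marked TODO fails verification.
plan = [Step("parse-config", "read settings file"), Step("cli", "TODO")]
done = run_plan(
    plan,
    clarify=lambda s: s.spec,
    implement=lambda spec: f"code for: {spec}",
    review=lambda change: change,
    verify=lambda change: "TODO" not in change,
)
print(done)
```

The point of the structure is that a human can stop and inspect between steps, which is where the familiarity with the code comes from.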
sizzzzlerz 13 hours ago [-]
Who fixes code that gets rejected? Do you simply try again and hope or does someone go into this computer-generated code that they didn't write and do the equivalent of battlefield triage?
And what are they going to do when they've fired all the senior engineers because they make too much money, leaving just juniors and AI?
rhubarbtree 13 hours ago [-]
AI will fix it, same way AI wrote it. At the behest of a human.
When they fire everyone, juniors will fix it with AI.
This is in general. I wouldn’t recommend this at critical services like AWS.
sizzzzlerz 13 hours ago [-]
Or in airplanes, nuclear power plants, spacecraft, CAT scanners, ECGs, traffic control systems, navigation devices, warehouse management systems, banking. Feel free to add your own.
rhubarbtree 12 hours ago [-]
Doubt warehouse management systems will fit in there, and only critical systems at banking.
But yes agree with the rest, which probably makes up a tiny tiny fraction of the software created today, and will be orders of magnitude smaller as a fraction in the future.
throwaw12 19 hours ago [-]
If Seniors are going to review every GenAI generated code, how do they keep up with the volume of changes?
So you have 2 systems of engineers: Sr- and Sr+
1. Both should write code to justify their work and impact
2. Sr- code must be reviewed by Sr+
What happens:
a. Sr+ output drops because review takes their time more and more
b. Sr+ just blindly accepts because the volume is too high, and they should also do their own work
c. Sr+ asks Sr- to slow-down, then Sr- can get bad reviews for the output, because on average Sr+ will produce more code
I think (b) will happen
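A back-of-the-envelope version of outcomes (a) and (b), as a sketch only. The hours, PR counts and minutes per review are assumed numbers, and `senior_week` is a name made up for this example.

```python
def senior_week(hours_available, prs_to_review, minutes_per_review):
    """Return (review_hours_needed, own_work_hours_left, prs_not_properly_reviewed)."""
    review_hours_needed = prs_to_review * minutes_per_review / 60
    own_work_hours_left = max(0.0, hours_available - review_hours_needed)
    # How many PRs fit if the senior did nothing but review all week.
    max_reviewable = int(hours_available * 60 // minutes_per_review)
    prs_not_properly_reviewed = max(0, prs_to_review - max_reviewable)
    return review_hours_needed, own_work_hours_left, prs_not_properly_reviewed


# Assumed: a 40-hour week and 30 minutes per careful review.
light = senior_week(40, 30, 30)   # 30 PRs per week from Sr- engineers
heavy = senior_week(40, 120, 30)  # 120 PRs per week once output is LLM-assisted

print(light)
print(heavy)
```

Under the light load the senior loses 15 hours of their own output, which is (a). Under the heavy load there are 60 hours of review for a 40-hour week, so 40 PRs either wait or get waved through, which is (b).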
daxfohl 14 hours ago [-]
You could create an agent template for each incident you've ever had, with context pre-cached with the postmortem report, full code change, and any other information about the incident. Then for every new PR you could clone agents from all those templates and ask whether the PR could cause something similar to the pre-loaded incident. If any of them say yes, reject the PR unless there's a manual override. You'd never have a repeat incident.
Obviously it's probably cost-prohibitive to do an all to all analysis for every PR, but I imagine with some intelligent optimizations around likelihood and similarity analysis something along those lines would be possible and practical.
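A rough sketch of the shape of that gate, with loud caveats: `Incident`, `gate_pr` and `keyword_judge` are hypothetical names, and `keyword_judge` is a crude word-overlap stand-in for the agent call, not a real model API. The gate is only as good as whatever judge is plugged in, so this illustrates the control flow rather than any guarantee about repeat incidents.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Incident:
    incident_id: str
    postmortem: str  # in the real idea this context would be pre-cached per agent


# A judge answers: "could this diff cause something like this incident?"
Judge = Callable[[Incident, str], bool]


def keyword_judge(incident: Incident, diff: str) -> bool:
    """Stand-in for an agent: flag if any long postmortem word appears in the diff."""
    words = {w for w in incident.postmortem.lower().split() if len(w) > 6}
    return any(w in diff.lower() for w in words)


def gate_pr(
    diff: str,
    incidents: List[Incident],
    judge: Judge,
    manual_override: bool = False,
) -> Tuple[bool, List[str]]:
    """Ask one judge per past incident; reject if any says yes, unless overridden."""
    flagged = [i.incident_id for i in incidents if judge(i, diff)]
    accepted = manual_override or not flagged
    return accepted, flagged


incidents = [Incident("INC-1", "retry storm after timeout config change")]
risky = gate_pr("set timeout = 0 in client", incidents, keyword_judge)
harmless = gate_pr("fix typo in README", incidents, keyword_judge)
print(risky, harmless)
```

The cost concern shows up directly in `gate_pr`: it makes one judge call per incident per PR, which is where the similarity pre-filtering would have to go.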
8note 12 hours ago [-]
Code review is too late to give some of that feedback, and design/requirements documents don't have nearly the standardization of presentation and feedback tools for that to be usable.
Amazon does have those things, and has fine tuning on models based on those postmortems.
Noisy reviews also cause problems. The PR doesn't know what scale a chunk of code is running at without access to 20 more packages and other details.
iLoveOncall 13 hours ago [-]
You vastly underestimate the complexity of systems in a company like Amazon.
COEs and Operation Readiness Reviews are already the documents that you mention, but they are largely useless in preventing incidents.
senderista 6 hours ago [-]
Actually CoEs would make an amazing training corpus for code reviews.
LogicFailsMe 19 hours ago [-]
For the good of the company's future, all code should be reviewed by L10s going forward before they are accepted. They're the only ones with enough skin in the game to know what really matters after all.
And from their sagely reviews, we shall train a large language model to ultimately replace them because the most fungible thing at Amazon is the leadership.
tracerbulletx 9 hours ago [-]
The way we used to build confidence in what we shipped was by beating our head against it for a week figuring it all out. You really can't have the same confidence with code reviews unless you basically do the same work you'd do to write it by hand for a lot of these things.
Insanity 19 hours ago [-]
It's only going to get worse with the brain drain as a result of the layoffs. Which will increase the use of AI assisted coding and increase the number of outages related to this.
Imagine having to debug code that caused an outage when 80% is written by an LLM and you now have to start actually figuring out the codebase at 2am.. :)
8note 12 hours ago [-]
But that's what it was like when I started at Amazon in 2016?
I think the team I was on was a bit of an outlier in terms of owning 40 dumpster fires at once, and the first time reading any one of them was at 2AM because it was down.
Having an LLM give early passes on reading the godawful C++ code, where you can tell at a glance that it's not gonna work as expected but you can't tell why, or what "expected" actually is, would have been phenomenal, and would have gotten me back to sleep at 3 on those codebases rather than 5.
PessimalDecimal 6 hours ago [-]
That's what it was like when you started out, but did you eventually learn that code? Imagine constantly getting put back to square one on understanding a legacy code base you just inherited, forever. This is what it'd be like with constant LLM-induced churn on code repositories.
tcbrah 18 hours ago [-]
The funniest part is Amazon literally started tying AI usage to performance reviews like 6 months ago and now they're doing damage control. You can't simultaneously pressure every engineer to use more AI AND be shocked when AI-assisted code breaks prod. Pick one lol
PessimalDecimal 6 hours ago [-]
Why can't they?
vetrom 8 hours ago [-]
AI seems to be the whipping boy, but to me, it really seems more of a symptom than a cause. At its root, isn't this an issue of a decline in critical thinking?
I do think AI adoption exacerbates said falloff.
mhogers 18 hours ago [-]
.agentignore/.agentnotallowed file
force agents to not touch mission critical things, fail in CI otherwise
let it work on frontends and things at the frontier of the dependency tree, where it is worth the risk
readthemanual 18 hours ago [-]
a) what happens if there is change that hasn't been encountered yet so it's not in .agentnotallowed?
b) is there a guarantee that something described in these files won't be touched? I've seen examples when agents directly violate these rules, profusely apologising after they get caught on it.
mhogers 15 hours ago [-]
allowlist instead of denylist, depending on your risk profile :)
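For what the CI side of this could look like, here is a minimal sketch covering both the denylist and the allowlist variant. The glob patterns and file paths are invented for illustration, and no real tool is known to define an `.agentignore` format; the patterns are assumed to be plain globs, one per line, matched with Python's `fnmatch`. Because the check runs in CI against the diff, it does not depend on the agent choosing to obey the file.

```python
from fnmatch import fnmatch
from typing import List


def violations(changed_paths: List[str], patterns: List[str], mode: str = "deny") -> List[str]:
    """Return the changed paths that break the policy.

    mode="deny":  any path matching a pattern is a violation (denylist).
    mode="allow": any path matching no pattern is a violation (allowlist).
    """
    def matched(path: str) -> bool:
        return any(fnmatch(path, pattern) for pattern in patterns)

    if mode == "deny":
        return [p for p in changed_paths if matched(p)]
    if mode == "allow":
        return [p for p in changed_paths if not matched(p)]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical paths touched by an agent-authored change.
changed = ["billing/ledger.py", "web/ui/button.tsx"]

denied = violations(changed, ["billing/*"], mode="deny")
outside_allowlist = violations(changed, ["web/*"], mode="allow")
print(denied, outside_allowlist)

# In CI, a non-empty result would fail the build, e.g. by exiting non-zero.
```

The allowlist mode also answers the "what about paths nobody thought to list" question: anything unlisted is rejected by default.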
dragonelite 20 hours ago [-]
Expect a shitload of AI powered code review products the next 18 months.
daheza 19 hours ago [-]
Create the problem and then create the solution.
0x500x79 18 hours ago [-]
Sell the solution. The Claude code review system is 15-25 dollars per-review!
recursive 16 hours ago [-]
The TSA Pre-check monetization model
gdulli 19 hours ago [-]
"Why don't they just make the plane out of the black box?"
wiseowise 4 hours ago [-]
“You wouldn’t write a code by hand, would you?”
booleandilemma 12 hours ago [-]
Emphasis on shit.
hard24 19 hours ago [-]
This is incredibly circular lol...
AlexeyBrin 20 hours ago [-]
You mean like what Anthropic announced yesterday ? Code Review can review your code for $15 - $25 per review.
/s
So now, you can speed up using Claude Code and use Code Review to keep it in check.
sailfast 13 hours ago [-]
I anticipate they will fix this by adding better AI evaluation tools that work better to test their infra and changes.
In the meantime they will be quite a bit slower I’d imagine.
Also wonder if those seniors will ever get to actually do any engineering themselves now that they’re the bottleneck. :)
nickvec 9 hours ago [-]
So Amazon senior SWEs now have to review every single PR for all intents and purposes? I didn't think Amazon could get worse.
butILoveLife 19 hours ago [-]
Maybe it's my 1 buddy that works at Amazon, but they seemed extremely slow to adopt LLMs. Big ships take a long time to turn, but this seemed hostile.
I am seeing this mindset still, with AI Agents. I imagine they will slowly realize they need to use this stuff to be competitive, but being slow to adopt AI seems like it could have been the source of this.
lmc 18 hours ago [-]
LLMs have been garbage for real work until very recently. Doesn't this show they were adopted too soon at amazon?
bigstrat2003 13 hours ago [-]
They're still garbage for real work.
butILoveLife 16 hours ago [-]
Disagree, I've been using it for at least a year to write functions.
newobj 16 hours ago [-]
Speed of code-writing was never the issue at Amazon or AWS. It was always wrong-headed strategic directions, out to lunch PMs, dogshit testing environment, stakeholder soup, high turnover, bureaucracy, a pantheon of legacy systems, insane operational burdens, garbage tooling, and last but not least -- designing for inter-system failure modes, which let's be real, AI has no chance of having context for -- and so on...
Imagine if the #1 problem of your woodworking shop is staff injuries, and the solution that management foists on you is higher RPM lathes.
joeyguerra 10 hours ago [-]
I wonder how many sr engineers are going to quit because they don’t want to read a bunch of code?
kmg_finfolio 22 hours ago [-]
The accountability problem is real but I think it's slightly different from what's being described. The issue isn't just "who signs off"; it's that the reasoning behind a change becomes invisible when AI generates it. A senior engineer can approve output they don't fully understand, and six months later when something breaks, nobody can reconstruct why that decision was made.
Human review works when the reviewer can actually interrogate the logic. At LLM-assisted velocity, that bar gets harder to clear every month.
smy20011 19 hours ago [-]
An outage could cost Amazon ~millions to tens of millions. Most of the time, we want the junior to learn from the outage and fix the process. With AI agent, we can only update the agent.md and hope it will never happen again.
znpy 14 hours ago [-]
"Make senior engineer sign off ai-assisted changes" sounds incredibly weird.
First thing that comes to mind is: reminds me of those movies where some dictatorship starts to crumble and the dictator starts being tougher and tougher on generals, not realizing the whole endeavor is doomed, not just the current implementation.
Then again, as a former amazon (aws) engineer: this is just not going to work. Depending how you define "senior engineer" (L5? L6? L7?) this is less and less feasible.
L5 engineers are already supposed to work pretty much autonomously, maybe with L6 sign-off when changes are a bit large in scope.
L6 engineers already have their own load of work, and a fairly large amount of engineers "under" them (anywhere from 5 to 8). Properly reviewing changes from all them, and taking responsibility for that, is going to be very taxing on such people.
L7 engineers work across teams and they might have anywhere from 12 to 30 engineers (L4/5/6) "under" them (or more). They are already scarce in number and they already pretty much mostly do reviews (which is proving not sufficient, it seems). Mandating sign-off and mandating assumption of responsibility for breaking changes means these people basically only do reviews and will be stricter and stricter[1] with engineers under them.
L8 engineers, they barely do any engineering at all, from what I remember. They mostly review design documents, in my experience not always expressing sound opinions or having proper understanding of the issues being handled.
In all this, considering the low morale (layoffs), the reduced headcount (layoffs) and the rise in expectations (engineers trying harder to stay afloat[2] due to... layoffs)... It's a dire situation.
I'm going to tell you, this stinks A LOT like rotting day 2 mindset.
----
1. keep in mind you can't, in general, determine the absence of bugs
2. Also cranking out WAY MUCH MORE code due to having gen-ai tools at their fingertips...
andai 13 hours ago [-]
So the take-away here is maybe we should read the code that "we" wrote? :)
(Before injecting it into global infra...)
dedoussis 18 hours ago [-]
How do they determine whether a PR is AI-assisted and therefore requires senior review? A junior engineer could still copy-paste AI-generated code and claim it as their own.
emotiveengine 18 hours ago [-]
Right? If they're using some sort of tool, there's always another tool to fool the tool.
monster_truck 14 hours ago [-]
Have they tried simply not writing bugs? I've found that works best for me personally
zouhair 11 hours ago [-]
Some years of AI Technical Debt will be something to behold.
varenc 13 hours ago [-]
digression: the long twitter urls make this entire page wider and the text smaller on iOS for me. Feels like a minor bug. Maybe an `overflow-wrap: anywhere` CSS rule needs to be added to URLs.
mattschaller 20 hours ago [-]
Anyone work with Kiro before? As I understood, it was held as an INTERNAL USE ONLY tool for much longer than expected.
daheza 19 hours ago [-]
I used Kiro IDE and really liked it. The all you can eat model of LLM usage is very tempting compared to say Cursor. The features in the editor are basically the same.
Haven't tried Kiro CLI.
riknos314 15 hours ago [-]
The technique of creating specs before implementation that Kiro embodies was used widely internally before Kiro's release, but as a (now former) employee I gained access to the Kiro tool at the same time as the public. Others may have had internal access earlier but I'm not aware of them.
wenc 11 hours ago [-]
I use Kiro IDE (≠ Kiro CLI) primarily as a spec generator.
In my experience, it's high-quality for creating and iterating on specs. Tools like Cursor are optimized for human-driven vibing -- they have great autocomplete, etc. Kiro, by contrast, is optimized around spec, which ironically has been the most effective approach I've found for driving agents.
I'd argue that Cursor, Antigravity, and similar tools are optimized for human steering, which explains their popularity, while Kiro is optimized for agent harnesses. That's also why it’s underused: it's quite opinionated, but very effective. Vibe-coding culture isn't sold on spec driven development (they think it's waterfall and summarily dismiss it -- even Yegge has this bias), so people tend to underrate it.
Kiro writes specs using structured formats like EARS and INCOSE. It performs automated reasoning to check for consistency, then generates a design document and task list from the spec -- similar to what Beads does. I usually spend a significant amount of time pressure-testing the spec before implementing (often hours to days), and it pays off. Writing a good, consistent spec is essentially the computer equivalent of "writing as a tool of thought" in practice.
Once the spec is tight, implementation tends to follow it closely. Kiro also generates property-based tests (PBTs) using Hypothesis in Python, inspired by Haskell's QuickCheck. These tests sweep the input domain and, when combined with traditional scenario-based unit tests, tend to produce code that adheres closely to the spec. I also add a small instruction "do red/green TDD" (I learned this from Simon Willison) and that one line alone improved the quality of all my tests.
Kiro can technically implement the task list itself, but this is where agents come in. With the spec in hand, I use multiple headless CLI agents in tmux (e.g., Kiro CLI, Claude Code) for implementation. The results have been very good. With a solid Kiro spec and task list, agents usually implement everything end-to-end without stopping -- I haven’t found a need for Ralph loops. (agents sometimes tend to stop mid way on Claude plans, but I've never had that happen with Kiro, not sure why, maybe it's the checklist, which includes PBT tests as gates).
Kiro didn't have the strongest start, but the Kiro IDE is one of the best spec generators I've used, and it integrates extremely well with agent-driven workflows.
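For readers unfamiliar with the property-based tests mentioned above, here is a minimal sketch of the idea. It is not Kiro output and it does not use Hypothesis: to stay dependency-free it hand-rolls the input sweep with the standard library's `random` module. The function `dedupe_sorted` and the properties checked are hypothetical, chosen only for illustration.

```python
import random


def dedupe_sorted(items):
    """Hypothetical function under test: unique items in ascending order."""
    return sorted(set(items))


def check_properties(trials=200, seed=0):
    """Sweep randomly generated inputs and assert properties that must hold for all of them."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        out = dedupe_sorted(data)
        # Property 1: output is strictly increasing (sorted, no duplicates).
        assert all(a < b for a, b in zip(out, out[1:]))
        # Property 2: output holds exactly the distinct input values.
        assert set(out) == set(data)
        # Property 3: applying the function again changes nothing (idempotence).
        assert dedupe_sorted(out) == out
    return trials


if __name__ == "__main__":
    print(f"{check_properties()} random cases passed")
```

A real Hypothesis test would state the same properties under a `@given(...)` decorator and would also shrink any failing input to a minimal counterexample, which this sketch does not attempt.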
jwpapi 11 hours ago [-]
How much of damage is 6 hours offline for Amazon?
bigbuppo 20 hours ago [-]
Ugh. The Great Oops has never been closer.
dude250711 20 hours ago [-]
I knew this would happen.
Take a perfectly productive senior developer and instead make him be responsible for output of a bunch of AI juniors with the expectation of 10x output.
frogperson 20 hours ago [-]
makes me want to vomit. I am not spending more time reviewing code than the "author" spent creating it. I'll just leave the industry if that happens.
hard24 20 hours ago [-]
I think as long as having to review code stays around, the 'artistry' of writing code isn't going away.
Think about it - how do you increase the speed at which one can review code? Well first it must be attractive to look at - the more attractive the faster you review/understand and move through the review. Now this won't be the case everywhere - e.g. in outsourced regions the conditions will force people to operate a certain way.
Im not a SWE by trade, I just try to look at things from a pragmatic stand-point of how org's actually make incremental progress faster.
bombdailer 13 hours ago [-]
The better looking the code, the less effort people will put into reviewing it due to the ease of reading it - the assumption being that what is beautiful is good. Just as a beautiful facade of a building can hide a cheap structure behind it, the same is true with code. Beauty itself is not a good signal for goodness as in excess it is in effect a rhetoric device that aims to mislead and draw ones eyes towards itself and away from what lies beneath it.
A beautiful building is only as good as the correctness of its foundation, framework, materials, and construction. Those qualities can only be assessed by those with expertise enough to understand their importance. Beauty in its proper place is the output of the intersection between a craftsman and a engineer. Beauty is optional, but it makes life more worth living. The same is true for code - attractive code is optional, but it makes being a SWE more rewarding.
jmspring 12 hours ago [-]
"After outages due to outsourcing to economically convenient developers with no skin in what you're building or care, company X requires all senior engineers to review all code from outsourcing company".
dlev_pika 18 hours ago [-]
A few days ago, after some very weird failed purchase attempts I made (payment couldn’t be validated or smth) I received an even weirder mail from Amazon saying they had detected suspicious activity, all my devices got logged out and I was forced to change my password. I did it, after verifying it was a legit email (even if it looked sketchy af, pure text, unstyled, but sender verified and confirmed with in-app behavior), and next I know all my orders and browsing history had disappeared - +15 yrs of history, done.
Over the next few days my account history came back, except purchases made Q1 2026. Those are still missing. There are a few substantial purchases I made that are nowhere to be found anymore.
I attributed this to Iranian missiles hitting some of their infrastructure in EU, as it had been reported.
Now I am not sure if it was blast radius from missiles or AI mishaps. Lmao - couldn’t happen to a worse company…
qxxx 6 hours ago [-]
so, seniors will now review the AI slop code.. I am also doing this task and reviewing this kind of code takes time as the code is often overengineered. Code works but will have potential bugs. I am not able to find every bug or implication quickly. But I am also using ai to review the ai slop lol, because why not. After that I am also quickly reviewing by myself.
skeledrew 20 hours ago [-]
> the affected tool served customers in mainland China
Thought this blurb most interesting. What's the between-lines subtext here? Are they deliberately serving something they know to be faulty to the Chinese? Or is it the case that the Chinese use it with little to no issue/complaint? Or...?
testbjjl 13 hours ago [-]
Then what’s the point of AI? Pay for the code gen, pay a human to review the code gen, when the senior can train a junior and coordinate output with their incentives and performance reviews, problems largely solved.
Seems to me too low level in everyone’s stack to not have humans doing the work, especially at this stage. But what do I know, I certainly am not at the helm of a multibillion dollar operation.
booleandilemma 12 hours ago [-]
To make the AI companies money and prop up the American economy, obviously.
m3kw9 14 hours ago [-]
A year later, they will require AI to sign off engineer changes.
xodn348 13 hours ago [-]
very expected outcomes.
oxqbldpxo 20 hours ago [-]
Not fun to work at amazon.com it seems.
rvz 16 hours ago [-]
Hope this happens at GitHub since there are constant outages on the entire platform.
CodingJeebus 22 hours ago [-]
I'm at a small company struggling with this problem. Fundamentally, we have a limited context and AI is capable of generating tremendous amounts of output that exceed our ability to deeply process.
I find myself context-switching all the time and it's pretty exhausting, while also finding that I'm not retaining as much deep application domain knowledge as I used to.
On the surface, it's nice that I can give my LLM a well-written bug ticket and let it loose since it does a good job most of the time. But when it doesn't do a good job or it's making a change in an area of the codebase I'm not familiar with, auditing the change gets tiring really fast.
10xDev 18 hours ago [-]
With AI it makes sense to have leaner teams. Being able to go faster requires greater responsibility.
letitgo12345 19 hours ago [-]
Worth noting that this is when they used Amazon's own AI product, not when using Claude Code or Codex.
th2o34i3432897 19 hours ago [-]
First Microsoft and now Amazon (eg. their RufusAI is useless compared to the old comment search!)
Has Seattle now become the code-slop capital ? Or is SFO still on top ?
fud101 10 hours ago [-]
This is what humans will become, on call to take the blame for AI. It will be less about skill and confidence and more about being on the hook to take the fall for when things go wrong.
MDGeist 20 hours ago [-]
A former colleague of mine recently took a role that has largely turned out to be "greybeard that reviews the AI slop of the junior engineers". In theory it sounds workable, but the volume of slop makes thoughtful review impossible. Seems like most orgs will just put pressure on the slop generators to do more and put pressure on the approvers and then scape goat the slop approvers if necessary?
dboreham 15 hours ago [-]
This was always the case in the before times with humans. AI just pulls back the curtain on the delusion that code can be and is being meaningfully reviewed.
locopati 11 hours ago [-]
use AI!
no! not that way!
moomoo11 12 hours ago [-]
If you don't use ~crypto~ AI you will go broke!
teeray 15 hours ago [-]
> Junior and mid-level engineers can no longer push AI-assisted code without a senior signing off
So what incentive is there for juniors to look at the code at all? Seniors are now just another CI stage for their slop to pass.
luxuryballs 13 hours ago [-]
They weren’t already signing off on them? o.O
softwaredoug 14 hours ago [-]
Getting junior / mid-level people to slop cannon PRs at seniors will just burn out seniors. The team might be better having fewer developers using AI more thoughtfully.
secondcoming 14 hours ago [-]
Has Amazon's advertising TAM product been affected by AI?
rushabh 13 hours ago [-]
Unfortunately you can’t just yell at the AI so it learns never to do this again. Humans take such a large range of feedback that LLMs can’t.
I'm sure they are going to have a ball reading through thousands of lines of AI slop.
AlexandrB 19 hours ago [-]
"We want you to use AI for everything!"
"No, not like that though!"
fredgrott 19 hours ago [-]
Curious question, how many Amazon Engineers flunk basic CS?
If you know CS you know two things:
1. AI can not judge code either noise or signal, AI cannot tell.
2. CS-wise we use statistic analysis to judge good code from bad.
How much time does it take to take AI output and run the basic statistic tools for most computer languages?
Some juniors need firing outright
8note 12 hours ago [-]
that isnt computer science at all though? if it is, that would be phd research topics, rather than basics.
maybe as software engineering topics, but thats a different discipline
wiseowise 4 hours ago [-]
“Haha, sure man. CSI, great tv series. Look, just review Jeff’s PR and move that button down, okay? Have a tennis lesson, kbye”
camillomiller 9 hours ago [-]
Such fun.
On top of your already strapped schedule, now you have to bet your career on vibe code that you will now have to spend time reading and debugging. All that instead of a chain of accountability that has people in place instead of stupid bots with fake agency.
This is beyond corporate satire.
There was never before a technology capable of convincing leadership of its usefulness despite its constant blunders and despite the low quality of its output.
This feels like a corporate mass delusion of unprecedented scale.
recallingmemory 13 hours ago [-]
.. So our jobs aren't going away?
jacknews 11 hours ago [-]
This looks like a blame allocation exercise to me.
The seniors will now be directly responsible for all the AI slop that goes in. But how can they possibly properly review reams of code to a sufficient degree they can personally vouch for it?
desireco42 13 hours ago [-]
So essentially they will be blamed, everything will stay the same.
I do consulting and use AI a lot. You just have to take responsibility for the code. We are delivering like never before, but have a lot of experience into how to do it as safe as possible. And we are learning along the way. They say you need a year to build up experience fyi.
I feel bad for those engineers who will have to sign off for things they will most likely not have enough time to review. Kiro is nice and all.
throw_m239339 19 hours ago [-]
Yet another example of vibe coding at scale. You'll have to hire a lot of seniors out of retirement to fix that mess of gigantic proportions... and don't blame "the juniors" for that, they didn't make the decision to allow those tools at first place.
10xDev 18 hours ago [-]
A lot of juniors only graduated using these tools. Good luck taking it away from them.
throw_m239339 13 hours ago [-]
Juniors don't set up these policies or even choose the tools they have to use professionally. If the higher ups are panicking it's fully of their own doing.
oliver_dr 11 hours ago [-]
[dead]
adrien_dev 16 hours ago [-]
[dead]
aplomb1026 12 hours ago [-]
[dead]
ihsw 13 hours ago [-]
[dead]
throwaway613746 15 hours ago [-]
[dead]
josefritzishere 20 hours ago [-]
The excessive exuberance of AI adoption is all part of the bubble.
19 hours ago [-]
andsoitis 22 hours ago [-]
> Amazon’s website and shopping app went down for nearly six hours this month in an incident the company said involved an erroneous “software code deployment.” The outage left customers unable to complete transactions or access functions such as checking account details and product prices.
The environment breathed a little.
8note 12 hours ago [-]
hardly.
as an alternative, a bunch of people got into their one-person trucks and drove to the store to buy whatever thing would have been efficiently delivered
rubyrfranklin2 11 hours ago [-]
We ran into something similar at heyvid.ai — shipped AI-generated code without a proper review gate and ended up with a subtle bug in our rendering pipeline that took the team a week to trace. Not catastrophic, but it seriously eroded trust in the tooling for a while. Amazon's approach makes total sense at their scale. The honest reality is that LLMs are great at producing plausible-looking code and genuinely bad at knowing when they're wrong. Senior sign-off isn't overhead — it's what makes AI-assisted development actually sustainable.
Very different from the typical weekly/monthly outage meeting, where discussion is actually expected, instead of being a ritual.
They have webinar/event support for 5000+ participants, viewers can raise hands/use chat feedback for questions etc. and the meeting host can invite people to be visible.
Scale cuts both ways.
What matters isn't how big the meeting is, it's how important the material is, and how well presented it is.
If I ever attend I just put it on mute and look at the slides while I do some real work. That way my attendance gets registered and it doesn't stress me out later with too much stuff left hanging.
That percolation is also translation of what they say to things that are relevant at my level. Like what we will be working on next year, if there's going to be bonus or job losses.
I couldn't give a crap about the company's strategy as a whole and that's not my job anyway. Why should I. I'm not here because I believe in some holy mission. I just wanna do something I like and get paid.
But this meeting is a course correction for how they're using AI, which is a huge initiative. He'll be trying to sell the right balance of "keep using the technology, but don't fuck anything up."
Too cautious, everyone freezes and there's a slowdown[0]. Too soft, everyone thinks it's "another empty warning not to fuck up" and they go right back to fucking everything up because the real message was "don't you dare slow down." After the talk, people will have conversations about "what did they really mean?"
[0] If you hate AI, feel free to flip the direction of the effect.
How are they expecting some juniors to do this when the industry as a whole doesn't know where to begin yet?
Like that Meta AI expert who wiped her whole mailbox with openclaw. These are the people who should come up with the answers.
Ps I mostly hate AI but I do see some potential. Right now it feels like we're entering a fireworks bunker looking for a pot of gold and having only a box of matches for illumination.
What we need to know from management is exactly what you mention. Do we go all out and accept that shit will hit the fan once in a while (the old move fast and break things) or do we micromanage and basically work manually like old. And that they accept the risk either way. That kind of strategy is really business leader kind of work. Blaming it on your techs when it inevitably goes wrong is not.
Because the tech as it is right now is very non-deterministic. One day it works magic and the next day it blows up.
And yes that SMILE thing was a good example. Been in too many of those time wasters.
Sorry, I got flashbacks...
It’s not really possible to measure how much it would cost to not have a meeting, and I think it’s pretty obvious that if there were no meetings ever, it would hurt a company a lot
"This could have been an e-mail" should never need to be said.
Why is an SVP doing this if it's just gonna be ignored?
If I get a note from my boss like that, I consider it mandatory.
> He asked staff to attend the meeting, which is normally optional.
Clearly means that while normally the meeting would be optional, this time it’s not
Judging from the comment above, no, the meeting happens every week, and this week they were asked to attend.
Note that the article doesn’t say that he told staff they have to attend the meeting. It says he “asked” staff to attend the meeting. Which again, it’s really really normal for there to be an encouragement of “hey, since we just had an operational event, it would be good to prioritize attending this meeting where we discuss how to avoid operational events”.
As for the second quote: senior engineers have always been required to sign off on changes from junior engineers. There’s nothing new there. And there is nothing specific to AI that was announced.
This entire meeting and message is basically just saying “hey we’ve been getting a little sloppy at following our operational best practices, this is a reminder to be less sloppy”. It’s a massive nothingburger.
Being "asked" by your boss to attend an optional meeting is pretty close to being required, it's just got a little anti-friction coating on it.
Different companies have different cultures. Weird that people can’t grok this.
"Did ya get the memo... about that meeting? I'll just have my secretary forward you another copy of that memo, OK? Yeaaaaaaah..."
> Under “contributing factors” the note included “novel GenAI usage for which best practices and safeguards are not yet fully established”.
definitely a team by team question. if it was required it would be a crux rule that the code review isnt approved without an l6 approver.
Items weren't displaying prices and it was impossible to add anything to your cart. It lasted from about 2pm to 5pm.
It's especially strange because if a computer glitch brought down a large retail competitor like Walmart I probably would have seen something even though their sales volume is lower.
"get a person to look at it" is a cop-out action item, and best intentions only. nothing that you could actually apply to make development better across the whole company
That's been their job ever since cable news was invented.
https://en.wikipedia.org/wiki/Yellow_journalism
It probably goes back as long as they have been shouting news in the town square in Rome or before that even.
But good journalism is still something else.
Must have as the comments are hours older than OP.
Are you completely missing the point of the submission? It's not about "Amazon has a mandatory weekly meeting" but about the contents of that specific meeting, about AI-assisted tooling leading to "trends of incidents", having a "large blast radius" and "best practices and safeguards are not yet fully established".
No one cares how often the meeting in general is held, or if it's mandatory or not.
no, and that's what people are noting: the headline deliberately tries to blow this up into a big deal. When did you last see the HN post about Amazon's mandatory meeting to discuss a human-caused outage, or a post mortem? It's not because they don't happen...
I do not understand how “company that runs half the internet has had major recent outages and now explicitly names lax/non-existent LLM usage guidelines as a major reason” can possibly not be a big deal in the midst of an industry-wide hype wave over how the world’s biggest companies now run agent teams shipping 150 pull requests an hour.
The chain of events is “AWS has been having a pretty awful time as far as outages go”, and now “result of an operational meeting is that the company will cut down on the use of autonomous AI.” You don’t need CoT-level reasoning to come to the natural conclusion here.
If we could, as a species, collectively, stop measuring the relevance of a piece of news proportionally by how much we like hearing it, please?
I’m a massive AI skeptic. If anyone were to be jumping up and down on the corpse of AI and this incessant drive to use it everywhere, it’d be me. But I also work at Amazon. I got the email. I attended the meeting. I can personally attest that there are no new requirements for AI-generated code. The articles about this meeting are extremely misleading, if not outright wrong. But instead of believing the person that was actually there in the room, this thread is full of people dismissing my first-hand account of the situation because it doesn’t align with the “haha AI failed” viewpoint.
Maybe your CoT-level reasoning isn’t so robust.
Even if it weren't a finance publication, I have trouble imagining you making this argument if a headline said something like "Google deals with outages in the cloud" because of the idea that it's misleading to refer to it as anything other than GCP. I think you're fundamentally not understanding how people communicate about this sort of thing if you actually think that someone saying "Amazon" is misleading in any meaningful way.
I don’t blame you, because this is just bad reporting (and potentially intentionally malicious to make you think it’s about AWS). But the meeting and discussion was with the Amazon retail teams, talking about Amazon retail processes, and Amazon retail services. The teams and processes that handle this are entirely separate from any AWS outages you are thinking of.
The outages that Amazon retail has faced also have nothing to do with AI, and there was no “explicit call out” about AI causing anything.
https://www.theguardian.com/us-news/ng-interactive/2026/jan/...
What is worth being pointed out is how quickly people blame "The Media" for how people use, consume and spread information on social networks.
Review by a senior is one of the biggest "silver bullet" illusions managers suffer from. For a person (senior or otherwise) to examine code or configuration with the granularity required to verify that it even approximates the result of their own level of experience, even only in terms of security/stability/correctness, requires an amount of time approaching the time spent if they had just done it themselves.
I.e. senior review is valuable, but it does not make bad code good.
This is one major facet of probably the single biggest problem of the last couple decades in system management: The misunderstanding by management that making something idiot proof means you can now hire idiots (not intended as an insult, just using the terminology of the phrase "idiot proof").
The more expensive and less sexy option is to actually make testing easier (both programmatically and manually), write more tests and more levels of tests, and spend time reducing code complexity. The problem, I think, is people don't get promoted for preventing issues.
The key to making this scalable is to make as few parts as possible critical, and make the potential bad outcomes as benign as possible. (This lets you go to a lower rating in whatever safety standard applies to your industry.) You still need tests for the less critical parts though, while downtime is better than injury, if you want to sell future machines to your customers you need to have a good track record. At least if you don't want to compete on cost.
This is a good lesson for anyone I think. Definitely something I’m going to think more about. Thanks for sharing!
If you told someone "I don't trust you, run all code by me first" it wouldn't go well. If you tell them "everyone's code gets reviewed" they're ok with it.
You don't get paid for features or code shipped. People don't pay $200 a head for fine dining based on the number of carrot chops or garlic crushes. The chops and crushes are necessary but not what you should be optimizing for.
they do - but only after a company has been burned hard. They also can be promoted for their area being enough better that everyone notices.
still the best way to a promotion is write a major bug that you can come in at the last moment and be the hero for fixing.
Two years afterward, we got hit with ransomware. And obviously "I told you so" isn't a productive discussion topic at that point.
cleaning up structural issues across a couple orgs is a senior => principal promo ive seen a couple of times
This bs is what I say to my juniors when I want them to fuck off with their reviews and focus on my actual work.
Sounds very insightful though.
Unchecked, AI models output code that is as buggy as it is inefficient. In smaller green field contexts, it's not so bad, but in a large code base, it performs much worse as it will not have access to the bigger picture.
In my experience, you should be spending something like 5-15X the time the model takes to implement a feature on reviewing and making it fix its errors and inefficiencies. If you do that (with an expert's eye), the changes will usually have a high quality and will be correct and good.
If you do not do that due diligence, the model will produce a staggering amount of low quality code, at a rate that is probably something like 100x what a human could output in a similar timespan. Unchecked, it's like having a small army of the most eager junior devs you can find going completely fucking ape in the codebase.
What do the relatively hands-off "it can do whole features at a time" coding systems need to function without taking up a shitload of time in reviews? Great automated test coverage, and extensive specs.
I think we're going to find there's very little time-savings to be had for most real-world software projects from heavy application of LLMs, because the time will just go into tests that wouldn't otherwise have been written, and much more detailed specs that otherwise never would have been generated. I guess the bright-side take of this is that we may end up with better-tested and better-specified software? Though so very much of the industry is used to skipping those parts, and especially the less-capable (so far as software goes) orgs that really need the help and the relative amateurs and non-software-professionals that some hope will be able to become extremely productive with these tools, that I'm not sure we'll manage to drag processes & practices to where they need to be to get the most out of LLM coding tools anyway. Especially if the benefit to companies is "you will have better tests for... about the same amount of software as you'd have written without LLMs".
We may end up stuck at "it's very-aggressive autocomplete" as far as LLMs' useful role in them, for most projects, indefinitely.
On the plus side for "AI" companies, low-code solutions are still big business even though they usually fail to deliver the benefits the buyer hopes for, so there's likely a good deal of money to be made selling companies LLM solutions that end up not really being all that great.
Code is the most precise specification we have for interfacing with computers.
Incidentally, I think in many scenarios, LLMs are pretty great at converting code to a spec and indeed spec to code (of equal quality to that of the input spec).
So I expect over time we will see genuine performance improvements, but Amdahl's law dictates it won't be as much as some people and CEOs are expecting.
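To put rough numbers on the Amdahl's law point, here is a minimal sketch. The 30% share of delivery time spent writing code and the 10x factor are made-up illustration values, not measurements from anyone in this thread.

```python
def amdahl_speedup(fraction_accelerated: float, factor: float) -> float:
    """Overall speedup when only part of the work gets faster (Amdahl's law)."""
    if not 0.0 <= fraction_accelerated <= 1.0:
        raise ValueError("fraction_accelerated must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    # The untouched share of the work stays the same; the rest shrinks by `factor`.
    return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / factor)


# Hypothetical split: writing code is 30% of delivery time, and an LLM makes it 10x faster.
print(round(amdahl_speedup(0.30, 10), 2))    # 1.37
# Even a near-infinite coding speedup is capped at 1 / (1 - 0.30).
print(round(amdahl_speedup(0.30, 1e12), 2))  # 1.43
```

Under those assumed numbers a 10x faster coder only makes the whole project about 1.4x faster, because requirements, review, and testing are untouched.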
Writing tests to ensure a program is correct is the same problem as writing a correct program.
Evaluating conformance is a different category of concern from ensuring correctness. Tests are about conformance not correctness.
Ensuring correct programs is like cleaning in the sense that you can only push dirt around, you can't get rid of it.
You can push uncertainty around but you can't eliminate it.
This is the point of Gödel's theorem. Shannon's information theory observes similar aspects for fidelity in communication.
As Douglas Adams noted: ultimately you've got to know where your towel is.
One thing I hope we'll all collectively learn from this is how grossly incompetent the elite managerial class has become. They're destroying society because they don't know what to do outside of copying each other.
It has to end.
For fairly straightforward changes it's probably a wash, but ironically enough it's often the trickier jobs where they can be beneficial as it will provide an ansatz that can be refined. It's also very good at tedious chores.
People seem to gloss over this... As a CEO if people don't function like this I'd be awake at night sweating.
Which results in the software engineering issue I’m not seeing addressed by the hype: bugs cost tens to hundreds of times their coding cost to resolve if they require internal or external communication to address. Even if everyone has been 10x’ed, the math still strongly favours not making mistakes in the first place.
An LLM workflow that yields 10x an engineer but psychopathically lies and sabotages client facing processes/resources once a quarter is likely a NNPP (net negative producing programmer), once opportunity and volatility costs are factored in.
The math depends on importance of the software. A mistake in a typical CRUD enterprise app with 100 users has zero impact on anything. You will fix it when you have time, the important thing is that the app was delivered in a week a year ago and was solving some problem ever since. It has already made enormous profit if you compare it with today’s (yesterday’s ?) manual development that would take half a year and cost millions.
A mistake in a nuclear reactor control code would be a totally different thing. Whatever time savings you made on coding are irrelevant if it allowed for a critical bug to slip through.
Between the two extremes you thus have a whole spectrum of tasks that either benefit or lose from applying coding with LLMs. And there are also more axes than this low to high failure cost, which also affect the math. For example, even a non-important but large app will likely soon degrade into an unmanageable state if developed with too little human intervention, and you will be forced to start from scratch, losing a lot of time.
We as an industry have been able to offload a lot of “how” via deterministic systems built by humans with expert understanding. LLMs give you the illusion of this.
1. I spoke to sales to find out about the customer
2. I read every line of the contract (SOW)
3. I did the initial requirements gathering over a couple of days with the client - or maybe up to 3 weeks
4. I designed every single bit of AWS architecture and code
5. I did the design review with the client
6. I led the customer acceptance testing
> We as an industry have been able to offload a lot of “how” via deterministic systems built by humans with expert understanding. LLMs
I assure you the mid level developers or god forbid foreign contractors were not “experts” with 30 years of coding experience and at the time 8 years of pre LLM AWS experience. It’s been well over a decade - ironically before LLMs - that my responsibility was only for code I wrote with my own two hands
I’m not saying trusting cheap devs is a good idea either. I do think cheap devs are actually at risk here.
I didn’t blindly trust the Salesforce consultants either. I also didn’t verify every line of oSql (not a typo) they wrote.
I disagree, in the sense that an engineer who knows how to work with LLMs can produce code which only needs light review.
* Work in small increments
* Explicitly instruct the LLM to make minimal changes
* Think through possible failure modes
* Build in error-checking and validation for those failure modes
* Write tests which exercise all paths
This is a means to produce "viable" code using an LLM without close review. However, to your point, engineers able to execute this plan are likely to be pretty experienced, so it may not be economically viable.
The gains are especially notable when working in unfamiliar domains. I can glance over code and know "if this compiles and the tests succeed, it will work", even if I didn't have the knowledge to write it myself.
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
>When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.
If we're being honest with ourselves, it's not making devs work faster. It at best frees their time up so they feel more productive.
I'd like to think that I have this under control because the methodology of working in small increments helps me to recognize when I've gotten stuck in an eddy, but I'll have to watch out for it.
I still maintain that the LLM is saving me time overall. Besides helping in unfamiliar domains, it's also faster than me at leaf-node tasks like writing unit tests.
AI doesn't make you code faster, it just makes the boring stretches somewhat more exciting.
... Errr... Yeah, that's not a great approach, unless you are defining 'work' extremely vaguely.
I still make an effort to understand the generated code. If there’s a section I don’t get, I ask the LLM to explain it.
Most of the time it’s just API conventions and idioms I’m not yet familiar with. I have strong enough fundamentals that I generally know what I’m trying to accomplish and how it’s supposed to work and how to achieve it securely.
For example, I was writing some backend code that I knew needed a nonce check but I didn’t know what the conventions were for the framework. So I asked the LLM to add a nonce check, then scanned the docs for the code it generated.
Yes, code produced this way will have bugs, especially of the "unknown unknown" variety — but so would the code that I would have written by hand.
I think a bigger factor contributing to unforeseen bugs is whether the LLM's code is statistically likely to be correct:
* Is this a domain that the LLM has trained on a lot? (i.e. lots of React code out there, not much in your home-grown DSL)
* Is the codebase itself easy to understand, written with best practices, and adhering to popular conventions? Code which is hard for humans to understand is also hard for an LLM to understand.
It introduces unnecessary indirection, additional abstractions, fails to re-use code. Humans do this too, but AI models can introduce this type of architectural rot much faster (because it's so fast), and humans usually notice when things start to go off the rails, whereas an AI model will just keep piling on bad code.
But I would never do the same for Azure.
"Seniors will do expert review" will slowly collapse.
No one cares about handcrafted artisanal code as long as it meets both functional and non functional requirements. The minute geeks get over themselves thinking they are some type of artists, the happier they will be.
I’ve had a job that requires coding for 30 years, and before that I was a hobbyist. I’ve worked for everything from 60-person startups to BigTech.
For my last two projects (consulting) and my current project, I led the project, got the requirements, designed the architecture from an empty AWS account (yes using IAC) and delivered it. I didn’t look at a line of code. I verified the functional and non functional requirements, wrote the hand off documentation etc.
The customer is happy, my company is happy, and I bet you not a single person will ever look at a line of code I wrote. If they do get a developer to take it over, the developer will be grateful for my detailed AGENTS.md file.
We know from experimentation that agents will change anything that isn’t nailed down. No natural language spec or test suite has ever come close to fully describing all observable behaviors of a non-trivial system.
This means that if no one is reviewing the code, agents adding features will change observable behaviors.
This gets exposed to users as churn, jank, and broken work flows.
2. Assuming that techniques that work with human developers will also work with agents that have severely impaired judgement but are massively faster at producing code is a bad idea.
3. There’s no way you have enough experience with maintaining code written in this way to confidently hand wave away concerns.
One task is usually composed of 2 input files, a specification and a header file, and the task is to output the implementation and nothing more. Agent user has no other permissions in the file system, has no tools, just outputs the code that's directed into a file. I run 'make' whenever I update a specification. Token count is minimal.
Do I save time? Not much, but having to specify and argue about everything is interesting, and I trust myself that I'm not losing any knowledge this way; be it the why or the how.
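For anyone who hasn't used make this way, the rule it applies can be sketched in a few lines. This is only an illustration of the "regenerate when the spec changed" check, assuming the spec, header, and implementation are plain files; the function name is hypothetical and the actual agent invocation is left out.

```python
import os


def needs_regeneration(inputs, output):
    """Return True when `output` is missing or older than any input, like a make rule."""
    if not os.path.exists(output):
        return True
    output_mtime = os.path.getmtime(output)
    # Any input modified after the output was written makes the output stale.
    return any(os.path.getmtime(path) > output_mtime for path in inputs)
```

A wrapper would call this with the specification and header as `inputs` and the implementation file as `output`, and only spend tokens when it returns True.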
So many people on HN are so insulted by the idea that the people who put money in our bank accounts, and in some cases stock in our brokerage accounts, never cared about their bespoke clean code or GOF patterns. They never did. LLMs just made it more apparent.
It’s always been dumb for PRs to be focused on for loops vs while loops instead of focusing on whether functional and non functional requirements are met.
Speak for yourself. I don't hire people like you.
Even in late 2023 with the shit show of the current market, I had no issues having multiple offers within three weeks just by reaching out to my network and companies looking for people with my set of skills.
You sound like a bozo, I can sniff it through my screen.
Guess what? I also stopped caring how registers are used and counting clock cycles in my assembly language code like it’s the 80s and I’m still programming on a 1Mhz 65C02
But do you look at any of the AI output? Or is it just "it works, ship it"?
What I checked.
1. The bash shell scripts I had it write as my integration test suite
2. To make sure it wasn’t loading the files into Postgres the naive way - loading the file from S3 and doing bulk inserts instead of using the AWS extension that lets it load directly from S3. It’s the difference between taking 20 minutes and 20 seconds.
3. I had strict concurrency and failure recovery requirements. I made sure it was done the right way.
4. Various security, logging, log retention requirements
What I didn’t look at - a line of the code for the web admin site. I used AWS Cognito for authentication and checked to make sure that unauthorized users couldn’t use the website. Even that didn’t require looking at the code - I had automated tests that tested all of the endpoints.
I've witnessed human developers produce incredibly convoluted, slow "ETL pipelines" that took 10+ minutes to load single digit megabytes of data. It could've been reduced to a shell script that called psql \copy.
Hell, often it feels slower/worse. Foreign code is easily confusing at first, which slows you down - and bad code quickly gets bewildering and sends you down paths of clarifications that waste time.
Then often it blows up in production. Makes me almost want to blanket reject PRs for being too difficult to understand. Hand written code almost has an aversion to complexity, you'd search around for existing examples, libraries, reusable components, or just a simpler idea before building something crazy complex. While with AI you can spit out your first idea quickly no matter how complex or flawed the original concept was.
It's actually often harder to fix something sloppy than to write it from scratch. To fix it, you need to hold in your head both the original, the new solution, and calculate the difference, which can be very confusing. The original solution can also anchor your thinking to some approach to the problem, which you wouldn't have if you solve it from scratch.
If AI is a productivity boost and juniors are going to generate 10x the PRs, do you need 10x the seniors (expensive) or 1/10th the juniors (cost save).
A reminder that in many situations, pure code velocity was never the limiting factor.
Re: idiot proofing, I think this is a natural evolution: as companies get larger they try to limit their downside & manage for the median rather than having a growth mindset in hiring/firing/performance.
I suspect that isn't the goal.
Review by more senior people shifts accountability from the Junior to a Senior, and reframes the problem from "Oh dear, the junior broke everything because they didn't know any better" to "Ah, that Senior is underperforming because they approved code that broke everything."
Maybe I don't have the correct mental model for how the typical junior engineer thinks though. I never wanted to bug senior people and make demands on their time if I could help it.
With a layout of 4 juniors, 5 intermediates, and 0-1 senior per team, putting all the changes through senior engineer review means you mostly won't be able to get CRs approved.
I guess it could result in forcing everyone who's sandbagging as intermediate instead of going to senior to have to get promoted?
Whether or not these productivity gains are realized is another question, but spreadsheet based decision makers are going to try.
Also - the definition of Senior will change, and a lot of current Seniors will not transition, while plenty of Juniors that put in a lot of time using code agents will transition.
But will they? I'm not at all convinced that babysitting an AI churning out volumes of code you don't understand will help you acquire the knowledge to understand and debug it.
American corporate culture has decided that training costs are someone else’s problem. Since every corporation acts this way it means all training costs have been pushed onto the labor market. Combine that with the past few decades of “oops, looks like you picked the wrong career that took years of learning and/or 10 to 100s of thousands of dollars to acquire but we’ve obsoleted that field” and new entrants into the labor market are just choosing not to join.
Take trucking for example. For the past decade I’ve heard logistics companies bemoan the lack of CDL holders, while simultaneously gleefully talk about how the moment self driving is figured out they are going to replace all of them.
We’re going to be outpaced by countries like China at some point because we’re doing the industrial equivalent of eating our seed corn and there is seemingly no will to slow that trend down, much less reverse it.
I know I'm probably coming across as a lunatic lately on HN but I really do think we're on the path towards violence thanks to AI
You just cannot destroy this many people's livelihoods without backlash. It's leading nowhere good
But a handful of people are getting stupidly rich/richer so they'll never stop
If you look at the luddite rebellion they weren't actually against industrial technology like looms. They were against being told they weren't needed anymore and thrown to the wolves because of the machines.
The rich have forgotten they are made of meat and/or are planning on returning to feudalism ala Yarvin, Thiel, Musk, and co's politics.
What are you realistically willing to do? How far can you go?
I guess that makes me a modern luddite then
A software engineer luddite
A techno-luddite if you will
Maybe I have a new username
Especially in a big co like Amazon, most senior engineers are box drawers, meeting goers, gatekeepers, vision setters, org lubricants, VP's trustees, glorified product managers, and etc. They don't necessarily know more context than the more junior engineers, and they most likely will review slowly while uncovering fewer issues.
When reviewing, you need to go through every step of implementing it yourself (understand the problem, solve the problem, etc.), but you additionally need to 1) understand someone else’s solution and 2) diff your solution against theirs to provide meaningful feedback.
Review could take roughly equivalent time, but only if I am allowed to reject/approve in a binary way (“my solution would not be the same, therefore denied”) which is not considered appropriate in most places.
This is why I am not a fan of being the reviewer.
I’m probably not going to review a random website built by someone except for usability, requirements and security.
I also said senior review is valuable, but I'm not 100% sure if you're implying I didn't.
Yes, but with the caveat that the junior learns and eventually can become the senior.
My manager has been urging us to truly vibe code, just yesterday saying that "language is irrelevant because we've reached the point where it works - so you don't need to see it." This article is a godsend; I'll take this flawed silver bullet any day of the week.
The other problem is that the types of errors LLMs make are different from the ones juniors make. There are huge sections of genuinely good code. So the senior gets "review fatigue" because so much looks good they just start rubber stamping.
I use an automated pipeline to generate code (including terraform, risking infrastructure nukes), and I am the senior reviewer. But I have gates that do a whole range of checks, both deterministic and stochastic, before it ever gets to me. Easy things are pushed back to the LLM for it to autofix. I only see things where my eyes can actually make a difference.
Amazon's instinct is right (add a gate), but the implementation is wrong (make it human). Automated checks first, humans for what's left.
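Roughly what "automated checks first, humans for what's left" looks like as code. This is a simplified sketch and not my actual pipeline: the two checks and the 400-line limit are hypothetical placeholders, and it only models the deterministic gates, not the stochastic ones.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A check inspects a diff and returns a problem description, or None if it passes.
Check = Callable[[str], Optional[str]]


@dataclass
class GateResult:
    status: str  # "auto_fix" (send back to the LLM) or "human_review"
    problems: List[str] = field(default_factory=list)


def no_debug_prints(diff: str) -> Optional[str]:
    # Hypothetical deterministic check.
    return "debug print left in" if "print(" in diff else None


def size_limit(diff: str) -> Optional[str]:
    # Hypothetical limit; oversized changes get pushed back to be split up.
    return "change too large" if len(diff.splitlines()) > 400 else None


def run_gates(diff: str, checks: List[Check]) -> GateResult:
    """Run every automated check; only clean changes reach a human."""
    problems = []
    for check in checks:
        message = check(diff)
        if message is not None:
            problems.append(message)
    if problems:
        return GateResult("auto_fix", problems)
    return GateResult("human_review")


CHECKS = [no_debug_prints, size_limit]
print(run_gates("x = compute()\n", CHECKS).status)     # human_review
print(run_gates("print(x)\n", CHECKS).problems)        # ['debug print left in']
```

The point of the design is that the reviewer's queue only contains changes where the cheap, mechanical objections have already been dealt with.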
1. They can assess whether the use of AI is appropriate without looking in detail. E.g. if the AI changed 1000 lines of code to fix a minor bug, or changed code that is essential for security.
2. To discourage AI use, because of the added friction.
I hear “x tool doesn’t really work well” and then I immediately ask: “does someone know how to use it well?” The answer “yes” is infrequent. Even a yes is often a maybe.
The problem is pervasive in my world (insurance). Number-producing features need to work in a UX and product sense but also produce the right numbers, and within range of expectations. Just checking the UX does what it’s supposed to do is one job, and checking the numbers an entirely separate task.
I don’t know many folks that do both well.
I would actually say having at least 2 people on any given work item should probably be the norm at Amazon's size if you also want to churn through people as Amazon does and also want quality.
Doing code reviews is not as highly valued in terms of incentives for the employees, and it blocks them from working on things they would get more compensation for.
We need smart people at every layer. If leadership isn't in that category, it spreads to all layers.
I don't know how we defeat capitalism to incentivize smart leadership. It's fundamentally opposed to market forces.
So you're saying that peer reviews are a waste of time and only idiots would use/propose them?
To partially clarify: "Idiot proof" is a broad concept that here refers specifically to abstraction layers, more or less (e.g. a UI framework is a little "idiot proof"; a WYSIWYG builder is more "idiot proof"). With AI, it's complicated, but bad leadership is over-interpreting the "idiot proof" aspects of it. It's a phrase, not an insult to users of these tools.
Also while this is happening most developers are getting constantly hammered by operational issues and critical security tasks because 1) the legacy toolchain imports 6 different language package ecosystems and 2) no one ever pays down tech debt in legacy code until it's a high severity ticket count in a KPI dashboard visible to the senior management.
But now with AI, they are getting disrupted. Most AWS services might become obsolete; why does an AI need these janky higher-level abstractions AWS piles on?
So now they need innovation, but the company isn’t set up for it. They are forcing short deadlines for product launches that don’t matter.
1. Shipping: deliver tickets or be pipped.
2. Having fewer comments on their PRs: for some drastically dumb reason, having a PR thoroughly reviewed is a sign of bad quality. L7 and above use this metric to pip folks.
3. Docs: write docs, get them reviewed to show you're high level.
Without AI, an employee is worse off in all of the above compared to folks who will cheat to get ahead.
I can't see how "requesting" folks to forgo their own self-preservation will work, especially when you've spent years pitting people against each other.
I'm very far away from liking Amazon's engineering culture and general work culture, but having PRs with countless of discussions and feedback on it does signal that you've done a lot of work without collaborating with others before doing the work. Generally in teams that work well together and build great software, the PRs tend to have very little on them, as most of the issues were resolved while designing together with others.
(And/but yes/no, I have never worked at NAGFAM...)
I agree, but those are separate tasks completely (in my view) compared to "Someone writes code that goes into production", usually called "spikes" or something else to differentiate them from "normal" tasks. They're quite literally just about exploration and figuring out the design, before the "real" work starts.
I missed my FAANG chance during the good years. No retirement for me!
People push AI-reviewed code like they wrote it. In the past, "wrote it" implies "reviewed it." With AI, that's no longer true.
I advocate for GitHub and other code review systems to add a "Require self-review" option, where people must attest that they reviewed and approved their own code. This change might seem symbolic, but it clearly sets workflows and expectations.
It also makes me more comfortable figuring out what a project's pull acceptance is like (maybe due to how fast local UI is compared to web-based git). On the other hand, I can only run some basic git CLI commands and can't quickly comprehend raw text-based diffs, especially when encountering some Linux patches from time to time.
Working at Amazon, when I wanted to review code myself through the CR tool, I'd still end up publishing it to the whole team and have to add some title shenanigans saying it was a self review or WIP and for others to not look at it yet.
There’s also this implicit imbalance engineers typically don’t like: it takes me 10 min to submit a complete feature thanks to Claude… but for the human reviewing my PR in a manual way it will take them 10-20 times that.
Edit: at the end real engineers know that what takes effort is a) to know what to build and why, b) to verify that what was built is correct. Currently AI doesn’t help much with any of these 2 points.
The inbetweens are needed but they are a byproduct. Senior leadership doesn’t know this, though.
I'd prefer people wrote good quality code and checked it as they went along... whilst allowing room for other stuff they didn't think of to come to the front. The production process of using LLMs is entirely different, in its current state I don't see the net benefit.
E.g. if you have a very crystalised vision of what you want, why would I want an engineer to use an LLM to write it, when the LLM can't do both raw production and review? Could this change? Sure. But there's no benefit for me personally to shift toward working that way now - I'd rather it came into existence first before I expose myself to incremental risk that affects business operations. I want a comprehensive solution.
It sounds like a piss poor deal for seniors unless senior engineer now means professional code reviewer.
This resonates with my experience.
The only thing you forgot is that you can also use the 12^H^H 14 leadership principles to argue whatever you want (and then the opposite of what you argued last month, still using the same leadership principles).
Were you a knowledge source for the entire team? Well, you weren't learning and being curious. Did you ask a lot of questions to learn everything? Well, then you weren't "are right a lot".
Did you think big and come up with an architecture that saved Amazon a lot of money? Then you weren't inventing and simplifying. Build something simple to get it out the door quick? Well, you weren't thinking big.
Did you act quickly without consulting others to fix an issue? Well you weren't earning trust. Did you consult people to make sure they were happy with the solution? Well you weren't biased for action.
That's just a few examples, there's so many more.
1. Choose from a set of challenge types (e.g. meeting a deadline, reliability)
2. Choose whether the challenge was "met" or "failed".
3. Choose whether you want to make the person look good or bad, by following/ignoring a principle.
4. Results: A list of relevant principles with short rationalizations.
I'm almost tempted to try, except perhaps I should treasure my ignorance.
If a tool like that gets popular enough that most employees are using it for office-politics, it might even start to deflate the whole Leadership Principles thing.
Well, you'd think senior leadership should know how their business and their people work.
Despite the name not a lot of seniority, leadership or engineering going around
News from the inside makes it sound like things are getting pretty bad.
You mean senior programmers that have been there for ages don't want to spend their time reviewing AI slop? Who'd a thunk it!
Mentoring juniors is an important part of the job and a crucial service to the industry, but juniors equipped with LLMs make the deal a bit more sour. Anecdotally, they don't really remember the feedback as well, because they weren't involved in writing the code. It's burnout-inducing to see your hard work and feedback go in one ear and out the other.
I personally know people looking to jump ship because they waste too much time at their current employer on this.
Not really.
They're torn between "we want to fire 80% of you" and "... but if we don't give up quality/reliability, LLMs only save a little time, not a ton, so we can only fire like 5% of you max".
(It's the same in writing, these things are only a huge speed-up if it's OK for the output to be low-quality, but good output using LLMs only saves a little time versus writing entirely by-hand—so far, anyway, of course these systems are changing by the day, but this specific limitation has remained true for about four years now, without much improvement)
That has always been my feeling. Once I really understand what I need to implement, the code is the easy part. Sure it takes some time, but it's not the majority. And for me, actually writing the code will often trigger some additional insight or awareness of edge cases that I hadn't considered.
if i wanted, i could queue up weeks worth of review in a couple days, but that's not getting the whole team more productive.
Spending more time on documents and chatting proved much more useful for getting more output overall.
Even without LLMs I've been nearby and on teams where review burden from developers building away team code was already so high that you'd need to bake an extra month into your estimates for getting somebody to actually look.
Of course it wasn't! Do you think people can envision the right objects to produce all the time? Yeah.. we have a lot of Steve Jobs walking around lol.
As you say, there's 'other stuff' that happens naturally during the production process that add value.
Thinking through making.
Essentially something big has to happen that affects the revenue/trust of a large provider of goods, stemming from LLM-use.
They won't go away entirely. But this idea that they can displace engineers at a high rate will.
I feel the current proliferation of LLMs is going to resemble the asbestos problem: Cheap miracle thingy, overused in several places, with slow gradual regret and chronic harms/costs. Although I suppose the "undocumented nasty surprise" aspect would depend on adoption of local LLMs. If it's a monthly subscription to cloud-stuff, people are far less likely to lose track of where the systems are and what they're doing.
[0] https://en.wikipedia.org/wiki/Air_France_Flight_4590
We love this for Amazon, they're a very strong company making bold decisions.
Code review should not be (primarily) about catching serious errors. If there are always a lot of errors, you can’t catch most of them with review. If there are few it’s not the best use of time.
The goal is to ensure the team is in sync on design, standards, etc. To train and educate Jr engineers, to spread understanding of the system. To bring more points of view to complex and important decisions.
These goals help you reduce the number of errors going into the review process, this should be the actual goal.
The fact that software is "soft" makes it seem like this doesn't apply, but it does, not least because of the fact that once you have gone down the wrong path with software design, it is very difficult to pull back and realize you need to go down an entirely different one.
The analogy to manufacturing would be something like if the parts coming out of a machine are all bad, just sending them to re-work is not a solution, you need to re-calibrate the machine.
So basically, kill the productivity of senior engineers, kill the ability for junior engineers to learn anything, and ensure those senior engineers hate their jobs.
Bold move, we'll see how that goes.
It's basically an even-more-ridiculous version of ranking programmers by lines-of-code/week.
What's especially comical is I've seen enormous gains in my (longish, at this point) career from learning other tools (e.g. expanding my familiarity with Unix or otherwise fairly common command line tools) and never, ever has anyone measured how much I'm using them, and never, ever has management become in any way involved in pushing them on me. It's like the CEO coming down to tell everyone they'll be making sure all the programmers are using regular expressions enough, and tracking time spent engaging with regular expressions, or they'll be counting how many breakpoints they're setting in their debuggers per week. WTF? That kind of thing should be leads' and seniors' business, to spread and encourage knowledge and appropriate tool use among themselves and with juniors, to the degree it should be anyone's business. Seems like yet another smell indicating that this whole LLM boom is built on shaky ground.
That's because they weren't sold regex as a service by a massive company, while also being reassured by everyone that any person not using at least one regular expression per line of code is effectively worthless and exposes their business to a threat of immediate obsolescence and destruction. They finally found a way to sell the same kind of FOMO to a majority of execs in the software industry.
Gotta be careful if you do that tho; e.g. Copilot can monitor 'accept' rate, so at bare minimum you'd have to accept the changes then immediately back them out...
Did industrial psychology die out as a field? Why do we keep reinventing the wheel when it comes to perverse incentives? It’s like being on a scrum team where the big bosses expect the average velocity to go up every sprint, forever, but the engineers are the ones deciding the point totals on tickets.
I mean… throw some docs into the context window, see it explode. Repeat that a few times with some multi-step workflows. Presto, hundreds of dollars in “AI” spending accomplishing nothing. In olden days we’d just burn the cash in a waste paper basket.
In my case it's morality.
edit: Peer said it well, IMO. The consequences aren't really yours. Also: something, something, Goodhart's Law.
I am saying in general; I've never worked at Amazon.
There's a lot of learning opportunity in failing, but if failure just means spam the AI button with a new prompt, there's not much learning to be had.
Jesus, yes. Maybe I'm an oddball but there's a limit to how much PR reviewing I could do per week and stay sane. It's not terribly high, either. I'd say like 5 hours per week max, and no more than one hour per half-workday, before my eyes glaze over and my reviews become useless.
Reviewing code is important and is part of the job but if you're asking me to spend far more of my time on it, and across (presumably) a wider set of projects or sections of projects so I've got more context-switching to figure out WTF I'm even looking at, yes, I would hate my job by the end of day 1 of that.
I don't disagree, I think reviewing is laborious, I just don't see how this causes any unintended consequences that aren't effectively baked into using an AI assistant.
Code Review is hard and tiring, much moreso than writing it
I've never met anyone who would be okay reviewing code for their full time job
Feels inevitable that code for aviation will slowly rot from the same forces at play but with lethal results.
Just because nearly all software is going to be written by AI, does not mean critical infrastructure will be.
I wonder if it's an early step towards an apprenticeship system.
How else would they train the LLM PR reviewers to their standards?
I've never personally been in the position, because my entire career has been in startups, but I've had many friends be in the unenviable position of training their replacements. Here's the thing though, at least they knew they were training their replacements. We could be looking at a potential future where an employee or contractor doesn't realize s/he is actually just hired to generate training data for an LLM to replace them, and then be cut.
The presumably human mid-level or junior engineer has their own issues with this, but the point of the LLM is that you don't need that engineer. For productivity purposes, the dev org only needs the seniors to wrangle all the LLMs they can. That doesn't sustain, so a couple of more-junior engineers can do similar work to mature.
LOL, it's the age old "responsibility without authority". The pressure to use AI will increase and basically you'll be fired for not using it. Simultaneously with the pressure to take the blame when AI fucks up and you can't keep up with the bullshit, leading you to get fired. One way or the other, get some training on how to stack shelves at the supermarket because that's how our future looks, one way or the other.
In the pre-gen-AI days, if an engineer put up a PR, it implied (somewhat) they wrote their code, reviewed it implicitly as they wrote it, and made choices (ie: why is this the best approach).
If Claude is just the new high level programming language, in terms of prompting in natural language, the challenge is that we're not reviewing the natural language, we're reviewing the machine code without knowing what the inputs were. I'm not sure of a solution to this, but something along the lines of knowing the history of the prompting that ultimately led to the PR, the time/tokens involved, etc. may inform the "quality" or "effort" spent in producing the PR. A one-shotted feature vs. a multi-iteration feature may produce the same lines of code and general shape, but one is likely to be higher "quality" in terms of minimal defects.
Along the same lines, when I review some gen-AI produced PR, it feels like I'm reading assembly and having to reverse how we got here. It may be code that runs and is perfectly fine, but I can't tell what the higher level inputs were that produced it, and if they were sufficient.
The impression I get from SWEs I’ve met throughout my life is that most of them don’t actually care about their job. They got in because it paid well and demand was plentiful.
still within the engineering IC role, but on a different track
The way I am working with AI agents (codex) these days is have the AI generate a spec in a series of MD documents where the AI implementation of each document is a bite sized chunk that can be tested and evaluated by the human before moving to the next step and roughly matches a commit in version control. The version control history reflects the logical progression of the code. In this manner, I have a decent knowledge of the code, and one that I am more comfortable with than one-shotting.
Prior to each step, I prompt the AI to review the step and ask clarifying questions to fill any missing details. Then implement. Then prompt the AI after to review the changes for any fixes before moving on to the next step. Rinse, repeat.
The specs and plans are actually better for sharing context with the rest of the team than a traditional review process.
I find the code generated by this process to be better in general than the code I've generated over my previous 35+ years of coding. More robust, more complete, better tested. I used to "rush" through this process before, with less upfront planning, and more of a focus on getting a working scaffold up and running as fast as possible, with each step along the way implemented a bit quicker and less robustly, with the assumption I'd return to fix up the corner cases later.
And what are they going to do when they've fired all the senior engineers because they make too much money, leaving just juniors and AI?
When they fire everyone, juniors will fix it with AI.
This is in general. I wouldn’t recommend this at critical services like AWS.
But yes agree with the rest, which probably makes up a tiny tiny fraction of the software created today, and will be orders of magnitude smaller as a fraction in the future.
So you have 2 systems of engineers: Sr- and Sr+
1. Both should write code to justify their work and impact
2. Sr- code must be reviewed by Sr+
What happens:
a. Sr+ output drops because review takes their time more and more
b. Sr+ just blindly accepts because the volume is too high, and they also have their own work to do
c. Sr+ asks Sr- to slow-down, then Sr- can get bad reviews for the output, because on average Sr+ will produce more code
I think (b) will happen
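To put rough numbers on why (b) looks likely, here is a minimal sketch of the review load. Every input is made up for illustration (team size, PRs per week, minutes per review, review hours a Sr+ can sustain), and `review_backlog` is a hypothetical helper, not anything Amazon uses.

```python
def review_backlog(juniors, prs_per_junior, hours_per_review, review_hours_available):
    """Return (weekly review hours demanded, hours of review that cannot be done)."""
    demanded = juniors * prs_per_junior * hours_per_review
    shortfall = max(0.0, demanded - review_hours_available)
    return demanded, shortfall


# Hypothetical inputs: 6 Sr- engineers, 5 PRs each per week,
# 0.5 h per careful review, 5 sustainable review hours per Sr+ per week.
demanded, shortfall = review_backlog(6, 5, 0.5, 5.0)
print(f"demanded={demanded}h shortfall={shortfall}h")
```

With these assumed inputs the Sr+ is asked for three times the review hours they can sustain, so the gap closes either by dropping their own work (a), rubber-stamping (b), or throttling Sr- output (c).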
Obviously it's probably cost-prohibitive to do an all to all analysis for every PR, but I imagine with some intelligent optimizations around likelihood and similarity analysis something along those lines would be possible and practical.
Amazon does have those things, and has fine tuning on models based on those postmortems.
Noisy reviews also cause problems. The PR doesn't know what scale a chunk of code is running at without having access to 20 more packages and other details.
COEs and Operation Readiness Reviews are already the documents that you mention, but they are largely useless in preventing incidents.
And from their sagely reviews, we shall train a large language model to ultimately replace them because the most fungible thing at Amazon is the leadership.
Imagine having to debug code that caused an outage when 80% is written by an LLM and you now have to start actually figuring out the codebase at 2am.. :)
i think the team i was on was a bit of an outlier in terms of owning 40 dumpster fires at once, and the first time reading any one of them was at 2AM because it was down.
having an LLM give early passes on reading the godawful c++ code, where you can tell at a glance that it's not gonna work as expected but you can't tell why, or what "expected" actually is, would have been phenomenal, and gotten me back to sleep at 3 on those codebases rather than 5.
I do think AI adoption exacerbates said falloff.
force agents to not touch mission critical things, fail in CI otherwise
let it work on frontends and things at the frontier of the dependency tree, where it is worth the risk
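A minimal sketch of what "fail in CI otherwise" could look like. The glob patterns and function names are hypothetical, and it assumes the CI job already knows the list of files an agent-authored change touched (for example from the diff against the target branch).

```python
import fnmatch

# Hypothetical protected globs; a real repo would define its own list.
PROTECTED = ["infra/*", "billing/*", "*.tf"]


def protected_changes(changed_files, patterns=PROTECTED):
    """Return the changed paths that match any protected glob."""
    return [
        path
        for path in changed_files
        if any(fnmatch.fnmatch(path, pattern) for pattern in patterns)
    ]


def main(argv):
    """Treat argv[1:] as changed paths; return a CI exit code (0 = pass, 1 = fail)."""
    hits = protected_changes(argv[1:])
    for path in hits:
        print(f"protected path touched: {path}")
    return 1 if hits else 0
```

In a real pipeline the return value of `main` would be passed to `sys.exit`, so any agent change under a protected path fails the build and needs a human-authored change instead.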
/s
So now, you can speed up using Claude Code and use Code Review to keep it in check.
In the meantime they will be quite a bit slower I’d imagine.
Also wonder if those seniors will ever get to actually do any engineering themselves now that they’re the bottleneck. :)
I am seeing this mindset still, with AI Agents. I imagine they will slowly realize they need to use this stuff to be competitive, but being slow to adopt AI seems like it could have been the source of this.
Imagine if the #1 problem of your woodworking shop is staff injuries, and the solution that management foists on you is higher RPM lathes.
First thing that comes to mind: this reminds me of those movies where some dictatorship starts to crumble and the dictator starts being tougher and tougher on generals, not realizing the whole endeavor is doomed, not just the current implementation.
Then again, as a former Amazon (AWS) engineer: this is just not going to work. Depending on how you define "senior engineer" (L5? L6? L7?), this is less and less feasible.
L5 engineers are already supposed to work pretty much autonomously, maybe with L6 sign-off when changes are a bit large in scope.
L6 engineers already have their own load of work, and a fairly large amount of engineers "under" them (anywhere from 5 to 8). Properly reviewing changes from all them, and taking responsibility for that, is going to be very taxing on such people.
L7 engineers work across teams and they might have anywhere from 12 to 30 engineers (L4/5/6) "under" them (or more). They are already scarce in number and they already pretty much mostly do reviews (which is proving not sufficient, it seems). Mandating sign-off and mandating assumption of responsibility for breaking changes means these people basically only do reviews and will be stricter and stricter[1] with engineers under them.
L8 engineers, they barely do any engineering at all, from what I remember. They mostly review design documents, in my experience not always expressing sound opinions or having proper understanding of the issues being handled.
In all this, considering the low morale (layoffs), the reduced headcount (layoffs) and the rise in expectations (engineers trying harder to stay afloat[2] due to... layoffs)... It's a dire situation.
I'm going to tell you, this stinks A LOT like rotting day 2 mindset.
----
1. keep in mind you can't, in general, determine the absence of bugs
2. Also cranking out WAY MUCH MORE code due to having gen-ai tools at their fingertips...
(Before injecting it into global infra...)
Haven't tried Kiro CLI.
In my experience, it's high-quality for creating and iterating on specs. Tools like Cursor are optimized for human-driven vibing -- they have great autocomplete, etc. Kiro, by contrast, is optimized around spec, which ironically has been the most effective approach I've found for driving agents.
I'd argue that Cursor, Antigravity, and similar tools are optimized for human steering, which explains their popularity, while Kiro is optimized for agent harnesses. That's also why it’s underused: it's quite opinionated, but very effective. Vibe-coding culture isn't sold on spec driven development (they think it's waterfall and summarily dismiss it -- even Yegge has this bias), so people tend to underrate it.
Kiro writes specs using structured formats like EARS and INCOSE. It performs automated reasoning to check for consistency, then generates a design document and task list from the spec -- similar to what Beads does. I usually spend a significant amount of time pressure-testing the spec before implementing (often hours to days), and it pays off. Writing a good, consistent spec is essentially the computer equivalent of "writing as a tool of thought" in practice.
Once the spec is tight, implementation tends to follow it closely. Kiro also generates property-based tests (PBTs) using Hypothesis in Python, inspired by Haskell's QuickCheck. These tests sweep the input domain and, when combined with traditional scenario-based unit tests, tend to produce code that adheres closely to the spec. I also add a small instruction "do red/green TDD" (I learned this from Simon Willison) and that one line alone improved the quality of all my tests.
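For anyone who hasn't seen property-based testing, here is a stdlib-only sketch of the idea. It is hand-rolled with `random` rather than Hypothesis (which also shrinks failing inputs and has much richer generators), and `dedupe_keep_order` is a made-up function under test, not something Kiro produced.

```python
import random


def dedupe_keep_order(items):
    """Hypothetical function under test: drop duplicates, keep first occurrences."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def check_properties(trials=200, seed=0):
    """Sweep random inputs and assert spec-level properties instead of fixed cases."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-5, 5) for _ in range(rng.randint(0, 20))]
        result = dedupe_keep_order(data)
        assert len(result) == len(set(result))  # no duplicates remain
        assert set(result) == set(data)  # nothing lost, nothing invented
        assert dedupe_keep_order(result) == result  # idempotent
        assert result == sorted(set(data), key=data.index)  # first-occurrence order
    return trials
```

The properties read like lines from a spec ("no duplicates", "order preserved"), which is why this style pairs well with scenario-based unit tests rather than replacing them.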
Kiro can technically implement the task list itself, but this is where agents come in. With the spec in hand, I use multiple headless CLI agents in tmux (e.g., Kiro CLI, Claude Code) for implementation. The results have been very good. With a solid Kiro spec and task list, agents usually implement everything end-to-end without stopping -- I haven’t found a need for Ralph loops. (agents sometimes tend to stop mid way on Claude plans, but I've never had that happen with Kiro, not sure why, maybe it's the checklist, which includes PBT tests as gates).
Kiro didn't have the strongest start, but the Kiro IDE is one of the best spec generators I've used, and it integrates extremely well with agent-driven workflows.
Take a perfectly productive senior developer and instead make him be responsible for output of a bunch of AI juniors with the expectation of 10x output.
Think about it - how do you increase the speed at which one can review code? Well first it must be attractive to look at - the more attractive the faster you review/understand and move through the review. Now this won't be the case everywhere - e.g. in outsourced regions the conditions will force people to operate a certain way.
I'm not a SWE by trade; I just try to look at things from a pragmatic standpoint of how orgs actually make incremental progress faster.
A beautiful building is only as good as the correctness of its foundation, framework, materials, and construction. Those qualities can only be assessed by those with expertise enough to understand their importance. Beauty in its proper place is the output of the intersection between a craftsman and an engineer. Beauty is optional, but it makes life more worth living. The same is true for code - attractive code is optional, but it makes being a SWE more rewarding.
Over the next few days my account history came back, except purchases made Q1 2026. Those are still missing. There are a few substantial purchases I made that are nowhere to be found anymore.
I attributed this to Iranian missiles hitting some of their infrastructure in the EU, as had been reported.
Now I am not sure if it was blast radius from missiles or AI mishaps. Lmao - couldn’t happen to a worse company…
Thought this blurb most interesting. What's the between-lines subtext here? Are they deliberately serving something they know to be faulty to the Chinese? Or is it the case that the Chinese use it with little to no issue/complaint? Or...?
Seems to me too low level in everyone’s stack to not have humans doing the work, especially at this stage. But what do I know, I certainly am not at the helm of a multibillion dollar operation.
I find myself context-switching all the time and it's pretty exhausting, while also finding that I'm not retaining as much deep application domain knowledge as I used to.
On the surface, it's nice that I can give my LLM a well-written bug ticket and let it loose since it does a good job most of the time. But when it doesn't do a good job or it's making a change in an area of the codebase I'm not familiar with, auditing the change gets tiring really fast.
Has Seattle now become the code-slop capital? Or is SFO still on top?
no! not that way!
So what incentive is there for juniors to look at the code at all? Seniors are now just another CI stage for their slop to pass.
"No, not like that though!"
If you know CS you know two things:
1. AI cannot judge whether code is noise or signal; it cannot tell.
2. CS-wise, we use statistical analysis to judge good code from bad.
How much time does it take to run AI output through the basic statistical tools available for most programming languages?
Some juniors need firing outright
maybe as software engineering topics, but that's a different discipline
There was never before a technology capable of convincing leadership of its usefulness despite its constant blunders and despite the low quality of its output. This feels like a corporate mass delusion of unprecedented scale.
The seniors will now be directly responsible for all the AI slop that goes in. But how can they possibly properly review reams of code to a sufficient degree they can personally vouch for it?
I do consulting and use AI a lot. You just have to take responsibility for the code. We are delivering like never before, but we have a lot of experience in how to do it as safely as possible, and we are learning along the way. They say you need a year to build up experience, fyi.
I feel bad for those engineers who will have to sign off for things they will most likely not have enough time to review. Kiro is nice and all.
The environment breathed a little.
as an alternative, a bunch of people got into their one-person trucks and drove to the store to buy whatever thing would have been efficiently delivered