Design Systems Are No Longer Optional

I've been thinking a lot about where design systems fit now that AI is writing so much of our UI, and what their job actually is when a meaningful share of the work is generated rather than written by hand.

For a long time design systems (to put it simply) have been sold as a consistency tool and a way to move faster. They have been useful, but they were also the kind of thing that companies deprioritised when budgets got tight. The thinking was that a button slightly off here or some spacing that did not quite match there didn't matter in the grand scheme of things (a stance I disagree with, but I digress).

However, I believe treating design systems as some kind of optional consistency work doesn't make sense anymore, and this is what I've been muddling through in my brain.

What happens now that work is generated

When most of our UI was written by people, the design system lived partly in the codebase and partly in everyone's heads. Engineers and designers absorbed the conventions over a long period of time (hopefully). The system was a shared reference, but it was not the only thing holding the line because people remembered how things were supposed to look.

However, AI does not remember any of your guidelines. It's not ignorant of design system concepts; it just does not know yours unless you put them in context. Every generation starts fresh, with no sense of your conventions and no loyalty to them. I recently read an article by Christoph Nakazawa that made the point that working with coding agents is structurally the same as running a large organisation, because you are constantly dropping new people with no context into the codebase. His conclusion was that the more constraints you put on the code (things like lint rules, automated tests, etc.) the faster people can actually move, because the guardrails catch mistakes before they spread.

A design system is meant to be the guardrail for the visual and interaction layer of a product. The cost of a weak design system isn't just inconsistency anymore; it's bad automation at scale. You should be constantly assessing whether it's accurate enough to be trusted, because something is now going to generate against it thousands of times a month.

Managing drift

The thing I worry about most is drift. I've seen this happen in a lot of tiny ways, like filling a label with empty space to get around a required-label restriction, using the component but adding custom spacing, or replacing a paragraph with a div to avoid dealing with the margins. To be fair, people do this all the time. I often hear "oh, I'll just do this for now and I'll fix it later", but later never comes. None of that looks hugely problematic in review, and that is exactly why it spreads. The difference now is the speed and volume that AI works at. When the drift came from people, it was slow and constrained by the limits of how much anyone could actually build. But AI removes both of those constraints, and now we have the additional risk that its output feeds straight back into the codebase or system that the next generation reads from.

So the concern is not that AI is uniquely bad at following conventions. It's that the consequences of not following them compound a lot faster than ever before.

The tempting way to put it is that if AI can get eighty-five percent fidelity, then compounded over five generations you would be left with around forty-four percent, or something like that. But this assumes each pass will corrupt a fresh slice of everything, and that is not how things actually work. A generation that touches a form does not have some random chance of breaking an unrelated button elsewhere. You can't even really claim one perfect number either: colour tokens, spacing, component choice, accessibility, and naming are all different things that do not collapse into a single neat percentage to point at.
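For reference, the naive arithmetic above works out as follows. This is only the oversimplified model the paragraph pushes back on, written out so the number is checkable. The constant names are mine, and the eighty-five percent figure is the illustrative one from the text, not a measurement.

```python
# Naive model: every generation independently keeps 85% of what came before.
# Both numbers are the illustrative figures from the text, not measured data.
FIDELITY_PER_GENERATION = 0.85
GENERATIONS = 5

remaining = FIDELITY_PER_GENERATION ** GENERATIONS
print(f"{remaining:.1%}")  # prints 44.4%
```

The calculation treats fidelity as one number that decays uniformly across the whole product, which is the assumption that does not hold in practice.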

So really it's a lot simpler than that: AI generates against the current state of the codebase, not the state you intended. When a model needs to build a new card, it looks at how cards are built right now, including the slightly-off one a previous generation produced that nobody corrected. So any deviation is no longer a one-off mistake; it is now part of what the next generation is going to treat as normal. Arguably this was always the case, but now it's magnified by the speed at which AI works.

This is what I think is actually compounding: each deviation raises the chance that the next generation is going to copy it. The system begins to fail toward whatever sloppy local pattern happens to accumulate, and then it accelerates in that direction, because AI is very good at confidently extending any pattern it can see.

The risk isn't just one bad generation; it's that bad generations become the reference for the good ones.

Ownership of the design system

It's easy to look at an existing design system and assume that because it already exists it no longer needs dedicated people looking after it, especially now that AI can interpret and apply its patterns. In my opinion that way of thinking was already short-sighted before AI, but with AI in the loop I think it becomes reckless. It's no longer just a trade-off in quality; you're deciding to actively weaken a source of truth that AI is going to generate against thousands of times a month.

A design system is not a static artifact; it is a continuously maintained claim about how a product should look and behave. Tokens get added, components get deprecated, patterns diverge and someone has to decide what is canonical, and documentation has to be rewritten when the intent changes. None of this maintains itself.

An unowned design system does not stay frozen in a good state. Things change all the time, and the gap between the system and the codebase widens until the system is no longer trustworthy and people just copy whatever is nearby instead.

This connects to something Nakazawa describes in his article as strong ownership. Agents amplify ownership: someone who deeply understands the system can now direct an enormous amount of output against it, while an unowned system produces an enormous amount of output against nothing in particular. So without a central team you don't end up with a neglected but stable design system. Instead you end up with an inaccurate reference propagated across your product at machine speed, which I think is genuinely worse than having no system at all.

One thing I want to call out here though is what central ownership actually means. Nathan Curtis wrote about the fallacy of federated design systems and it's worth reading alongside this. A federated model distributes responsibility across designers and/or engineers working on features, or across feature teams, each taking a slice of the system as part of a broader product role. As a result the system gets maintained as a side effect of shipping features. It gives you broad coverage, but this is not the same thing as accountability, because whenever a feature deadline and a system quality decision come into conflict, the feature wins. It wins every time regardless of your good intentions. We all already know this, so let's stop pretending it's not true.

Central ownership means someone is paid to say no. It's their job to protect the system when the pressure of building features makes the wrong thing look like the easy path forward. A central team's success metric is the accuracy and usability of the system itself, not feature velocity, not a sprint deadline. This separation is intentional. A feature team will make pragmatic trade-offs under pressure. A central team's job is precisely to not make that trade-off, because the health of the system is what they are measured against. That means ensuring consistent output for people and for AI, tracking how much of the surrounding codebase actually uses the system, and stepping in where things deviate. It means holding the line when components get built so generically they stop being usable, or when people reach for the easy implementation rather than the right one. The system does not stay accurate by accident. Someone has to care more about getting it right than about getting it done.

This does not mean slowing feature teams down. A well-run central team works alongside people working on features. It also doesn't mean there won't be any tension. There will always be a healthy tension between feature teams trying to deliver fast and a systems team aiming for good solid foundations. But the guardrails that the team works on are what help everyone else move faster, because the questions about what is correct have already been answered or are being answered by the people focused on them.

Companies shouldn't be asking "can we afford a design systems team?" They should be asking "can we afford to point AI at a reference that nobody is keeping true or accurate?"

Constraints and feedback

I think the way through this has two parts, and in my experience teams often focus primarily on the first one.

The first is constraints, which make the system legible to the model. This is the part most teams already understand: a real token system, components with tight and well-defined APIs, clear usage documentation. The piece that often gets missed is putting that context where the model actually reads it, which is in the repo, not on a separate documentation site, in a Figma file, or in the heads of a few senior people. If each agent session is a new hire without access to your scattered context, then the most useful thing you can do is keep the context next to the work. Usage guidelines, do and don't examples, and the intent behind a component belong alongside the code, because that is the context the model is guaranteed to see.

The part that matters just as much, and I think gets less attention, is the feedback loop, which helps the system be self-correcting. Constraints tell the model what good looks like, and feedback loops catch it when it inevitably misses the mark anyway. This might mean linting that rejects off-token values, component checks that fail when something escapes the system, and visual regression tests that flag a card that no longer matches the canonical one. The point is to move correction from something a human might notice in review to something the pipeline will reject. These tight feedback loops are what let the model finish work quickly rather than slowly.
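To make the linting idea a bit more concrete, here is a minimal sketch of a check that rejects off-token colour values. Everything in it is an assumption for illustration: the palette in ALLOWED_COLOURS is made up, the function name is hypothetical, and it only looks at three- and six-digit hex colours in plain text. A real setup would more likely use an existing linter and load the palette from the actual token source.

```python
import re

# Hypothetical palette for illustration; a real system would load its tokens
# from wherever they are defined rather than hard-coding them here.
ALLOWED_COLOURS = {"#1a1a1a", "#ffffff", "#0055ff"}

# Matches 6-digit or 3-digit hex colours. Shorthand values are not expanded,
# so "#fff" would be reported even though "#ffffff" is allowed.
HEX_COLOUR = re.compile(r"#[0-9a-fA-F]{6}\b|#[0-9a-fA-F]{3}\b")


def find_off_token_colours(source: str) -> list[tuple[int, str]]:
    """Return (line_number, value) for every hex colour not in the palette."""
    violations = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in HEX_COLOUR.finditer(line):
            value = match.group(0).lower()
            if value not in ALLOWED_COLOURS:
                violations.append((line_number, value))
    return violations


sample = """\
.card { color: #1A1A1A; background: #ffffff; }
.card--promo { border-color: #0054fe; }
"""

violations = find_off_token_colours(sample)
for line_number, value in violations:
    print(f"line {line_number}: {value} is not a design token")
```

In a pipeline, a script like this would exit with a non-zero status whenever it finds violations, so a near-miss like the second rule above gets rejected automatically instead of relying on someone spotting it in review.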

That being said, both extremes here can cause you a world of grief. If you are too loose, the model might invent inconsistent patterns because nothing stops it. But if you are too rigid, the model cannot compose anything genuinely new, so it generates workarounds that escape the system entirely, which is the worst case, because now the drift is invisible to the checks you built. Tuning that boundary to be strict enough to hold the line but flexible enough that the system stays a tool rather than an obstacle is a very important part of the job, and it's not something you can just set once.

What is left for people

If we have a pipeline that rejects off-system output and the documentation is accurate and in the repo, it's fair to ask what is left for people to do. The answer most often thrown around on LinkedIn or in "AI thought pieces" is judgement. This is essentially a shift from being bottlenecked on producing code and design to being bottlenecked on exercising judgement. But this kind of feels the same to me? The hard questions haven't disappeared, and they were always what design system teams focused on. However, they are way more important now. What is canonical when two good patterns conflict? When is a new pattern worth adding to the system rather than rejecting as a one-off? What is the product actually meant to feel like? Is the API too strict or too flexible?

I've heard people say we should offload the work we don't enjoy to AI, but if you are handing off work you don't understand, you lose the ability to know whether the output is any good. In his article Nakazawa puts it pretty plainly: with agents, everyone can generate a lot of bullshit all day long. The thing that separates useful output from noise is whether the person directing the model knows what good looks like. You cannot guide something to the right outcome if you cannot recognise the right outcome. The goal shouldn't be to avoid the parts that feel difficult; it should be to know them well enough that you can direct, review, and push back when the model gets it wrong.

Those are decisions, not generations, and they are where the design system team of designers and engineers (you do need both) earns its place. The model executes against the system, the system encodes the team's judgement, and a central team keeps that judgement accurate so it can be executed against at scale.

Overall thoughts

I may be biased, but for me nothing changes. The design system is still important. We may need to adjust some things, sure, but AI certainly doesn't remove the need for a strong, well-maintained design system. It makes it matter more!

Going fast against a reference that is drifting in the background without you noticing might get you somewhere faster, but it will also get you lost a lot sooner. Keeping the reference accurate is what will make the speed you're aiming for consistent and successful in the long term.

I wonder if the teams that do well in the long run will not be the ones that adopted AI the fastest, but instead the ones where a central team held the line, so that when they moved fast, they moved in the right direction.