Every line of code you write today will be deleted.
Not might be. Will be. Refactored, replaced, or just abandoned when the next rewrite comes along. The question isn’t whether your code survives. It won’t. The question is whether your data makes it across.
Code is more temporary than we like to admit
I’ve been thinking about this a lot lately.
That service you spent months architecting? Give it five years. It’ll be “legacy.” That framework you picked after all those comparison spreadsheets? The community’s already eyeing the next thing. And that clever abstraction you’re proud of? The next team will rip it out because they don’t get it. And honestly, they’ll probably be right to.
This isn’t a failure. It’s just… how software works.
Look at how fast our “best practices” rotate. jQuery to React to whatever’s next. Monoliths to microservices, and now some teams are quietly consolidating back. REST was the answer, then GraphQL, then gRPC, and now it’s “it depends” again.
Code changes because it should. It reflects what we know now, what tools we have now, what problems we’re solving now. When those shift, the code has to shift too.
But data? Data sticks around.
Somewhere right now, some enterprise is migrating customer records that started life in a COBOL system from the ’80s. The code that created those records? Gone. The developers who wrote it? Retired. But the data (names, addresses, transactions) is still there, still matters.
Here’s the thing: data represents what happened. Code represents how we chose to handle it. Facts don’t go out of style. Methods do.
And if you think about what actually matters to a business? Customer relationships, transactions, user behavior, operational history. It’s all data. Your competitors can study your architecture. They can copy your patterns. They can poach your engineers. But they can’t copy your data. That’s the one thing that gets more valuable over time instead of less.
We invest backwards
Here’s what bothers me.
Most teams put maybe 90% of their energy into code. Architecture debates. Framework decisions. Design patterns. Code reviews. Whether to use the Repository pattern or not. Tabs vs spaces. Folder structure arguments.
And data? Data gets maybe 10%. Schema design happens in an afternoon. Column names get abbreviated because who wants to type that much. Business context lives in someone’s head. Migration strategy is “we’ll figure it out later.”
But “later” shows up faster than you think.
What it actually looks like to invest in data
This doesn’t mean writing less code. It means treating data as the thing that matters, and code as the thing that serves it.
On naming things:
A column called status with values 1, 2, 3? That’s a liability waiting to happen. A column called order_fulfillment_status with pending, processing, shipped? That explains itself. The first one needs “that engineer from 2019 who knows what the numbers mean.” The second one just… works.
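The contrast is easy to make concrete in a schema. Here’s a minimal sketch using Python’s built-in sqlite3 module; the table names and status values are illustrative, not from any real system. The second table doesn’t just name the column better, it lets the schema itself document the legal values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The liability: a bare integer whose meaning lives in someone's head.
conn.execute("CREATE TABLE orders_v1 (id INTEGER PRIMARY KEY, status INTEGER)")

# The self-explaining version: the column name carries the domain,
# and a CHECK constraint records the legal values in the schema itself.
conn.execute("""
    CREATE TABLE orders_v2 (
        id INTEGER PRIMARY KEY,
        order_fulfillment_status TEXT NOT NULL
            CHECK (order_fulfillment_status IN ('pending', 'processing', 'shipped'))
    )
""")

conn.execute("INSERT INTO orders_v2 VALUES (1, 'pending')")

# A meaningless value is rejected by the schema, not by tribal knowledge.
try:
    conn.execute("INSERT INTO orders_v2 VALUES (2, '7')")
except sqlite3.IntegrityError:
    print("rejected")  # → rejected
```

The CHECK constraint is doing the job that “the engineer from 2019” used to do: anyone, or any tool, reading the schema five years from now can see exactly what the column means and what it may contain.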
On format choices:
Boring and portable beats clever and proprietary. JSON, Parquet, Avro. You’ll still be able to read these in twenty years. That custom binary format you optimized for this quarter’s performance numbers? Good luck in five.
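“Boring and portable” can be as simple as JSON Lines: one self-describing record per line, readable by anything from grep to whatever parser exists in 2045. A quick sketch with only Python’s standard library (the records here are made up for illustration; Parquet and Avro would need third-party libraries, so JSON stands in for the whole family):

```python
import io
import json

# Hypothetical records; the point is the container, not the content.
orders = [
    {"order_id": 1, "order_fulfillment_status": "pending", "total_usd": "19.99"},
    {"order_id": 2, "order_fulfillment_status": "shipped", "total_usd": "42.50"},
]

# Write JSON Lines: field names travel with every record,
# so no external schema registry is needed to decode the file.
buf = io.StringIO()
for order in orders:
    buf.write(json.dumps(order, sort_keys=True) + "\n")

# Round-trip: a future reader recovers the records, names intact.
recovered = [json.loads(line) for line in buf.getvalue().splitlines()]
assert recovered == orders
```

Nothing clever happens here, and that’s the feature: the format carries its own field names, survives partial corruption line by line, and needs zero custom tooling to read back.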
On documentation:
Write down what the data means, not just what type it is. Nobody cares that cust_ltv is a DECIMAL(10,2). They care that it’s “customer lifetime value, calculated as total purchases minus refunds, updated every night.”
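One way to keep that meaning from living in someone’s head is a machine-readable data dictionary checked in next to the schema. A sketch, with invented column entries, showing the idea; the “type” field is the least interesting line in each entry:

```python
# A minimal data dictionary: meaning, refresh cadence, and units,
# not just types. Entries here are illustrative.
DATA_DICTIONARY = {
    "cust_ltv": {
        "type": "DECIMAL(10,2)",
        "meaning": "Customer lifetime value: total purchases minus refunds.",
        "refresh": "Recomputed nightly by the billing batch job.",
        "unit": "USD",
    },
    "order_fulfillment_status": {
        "type": "TEXT",
        "meaning": "Where the order currently is in fulfillment.",
        "allowed_values": ["pending", "processing", "shipped"],
        "refresh": "Updated on every fulfillment event.",
    },
}

def describe(column: str) -> str:
    """Render a description a future human (or LLM) can actually use."""
    entry = DATA_DICTIONARY[column]
    return f"{column} ({entry['type']}): {entry['meaning']}"

print(describe("cust_ltv"))
# → cust_ltv (DECIMAL(10,2)): Customer lifetime value: total purchases minus refunds.
```

Because the dictionary is plain data, it can be diffed in code review, validated in CI against the real schema, and handed verbatim to any tool that needs context.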
On planning for the inevitable:
Before you write the first line of code, ask yourself: “How will we get this data out when we replace this system?” If you don’t have an answer, you’re building a trap.
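Having an answer can be as unglamorous as a full-table export to CSV that exists from day one. A sketch of that exit hatch, again with stdlib-only Python and a throwaway in-memory database (the table and columns are invented for the demo):

```python
import csv
import io
import sqlite3

def export_table(conn: sqlite3.Connection, table: str) -> str:
    """Dump an entire table to CSV, header row first: the minimal
    escape hatch every system should have before the first feature ships."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table name is trusted here
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])  # column names first
    writer.writerows(cur.fetchall())
    return out.getvalue()

# Demo with a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_fulfillment_status TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'pending'), (2, 'shipped')")

dump = export_table(conn, "orders")
print(dump.splitlines()[0])  # → order_id,order_fulfillment_status
```

If writing this function feels hard for your system, that’s the warning sign the question is designed to surface.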
And now there’s another reason this matters
Here’s something I didn’t anticipate a few years ago: your data isn’t just for future engineers anymore. It’s for future AI tools too.
LLMs are getting remarkably good at understanding codebases, writing migrations, even rebuilding systems from documentation. But they need something to work with. They need context.
If your data is clean (well-named columns, documented schemas, standard formats), an LLM can help you analyze it, migrate it, transform it. It can look at your data dictionary and actually understand what order_fulfillment_status means. It can generate migration scripts that make sense.
But if your data is a mess? Cryptic abbreviations, proprietary formats, business logic that only exists in someone’s memory? The AI is just as lost as a new engineer would be. Maybe more so.
The extractability question used to be: “Can a future human make sense of this?”
Now it’s also: “Can a future AI make sense of this?”
And the answer to both is the same: only if you invested in making your data self-explanatory. Only if you treated it like the asset it is.
This is why I’m excited about tools like MCPs and CLI integrations that let LLMs connect directly to your databases. The storage layer isn’t going away, but the interface to it is evolving fast. And clean, well-documented data is what makes those connections actually useful.
Two teams, same problem
Picture two teams building order management systems.
Team A goes deep on the code. Beautiful domain model. Elegant patterns. 95% test coverage. Their data schema mirrors their object hierarchy perfectly: custom types, internal conventions, the works. The code is genuinely impressive.
Team B keeps the code simpler but obsesses over the data layer. Clear schemas. Business-meaningful names. Standard formats. Documented contracts. Versioning from day one. The code is… fine. Nothing special.
Five years go by. Technology moves on. Both systems need replacing.
Team A’s original engineers are gone. The new folks stare at tables full of abbreviations and undocumented business rules. That “elegant” code doesn’t help. It just enforces assumptions nobody remembers making. Migrating the data becomes archaeology. And when they try to use AI tools to help? The AI hallucinates meanings for columns it can’t interpret.
Team B’s system is just as outdated. But the data tells its own story. The schemas make sense. The formats are standard. Migration is still work, but it’s planned work. Not detective work. And when they feed their docs to an LLM to help with the migration? It actually understands what it’s looking at.
The code disappears either way. The difference is whether the data makes it to the other side.
Here’s how I think about it now
Code is scaffolding. You need it to build. You can’t do anything without it. But scaffolding comes down. That’s the point.
Data is the building. It’s what’s still there when the scaffolding’s gone. It’s what people actually use. And increasingly, it’s what your AI tools need to actually help you.
When you’re writing code, you’re writing a letter to the present: “Here’s how we solve this problem with today’s tools.”
When you’re designing data, you’re writing a letter to the future: “Here’s what happened, what it meant, and everything you’ll need to make sense of it without us.”
Write accordingly.
This is part of a series on what lasts. Next: Build vs Buy Is the Wrong Question explores how to build systems that can die gracefully. And Context Outlives Prompts looks at what this means in the AI era.