The difference between vibe coding and software engineering with LLMs

Nick Antonaccio (Admin)
May 03, 2026 at 06:03 (edited, 11 revisions)
#1

Building big software projects has become dramatically easier over the past half year, because the most recent improvements in LLM technology and tooling have enabled truly meaningful increases in capability - but large projects are still entirely different from 'vibe coded' apps.

When I think of the term 'vibe coding', I think of quick one-off demos, often heavy in front-end sizzle, but not always deeply capable in terms of integration with complex organizational workflows.

Building real world production software may no longer require lots of hand written code, but it's still hard work. Professional software development typically involves dealing with complex natural situations, in which humans encounter endless edge case workflow exceptions. It involves satisfying groups of human users who all have their own divergent preferences about what constitutes a 'good' user experience. Big software projects typically live in an environment where existing legacy systems need to be connected, and new software and data needs to be regularly integrated. Software exists in a world where bad actors are constantly trying to break into servers to scam a few bucks, despite the outsized havoc those breaches can wreak on an attacked organization.

Large software systems need to constantly evolve to satisfy the unfolding usability requirements that develop as new features are conceived from the existing state of a production system. Every solution to a problem may add some complexity to the system, and that tends to breed entirely new classes of problems. And every new software feature encourages users & stakeholders to conceive even more new features which they'd like conveniently tacked onto an ever evolving foundation of existing functionalities that have been wired together from previous rounds of satisfied requirements.

That constantly evolving situation typically means that developers need to refactor existing code to weave in new capabilities, and/or completely rewrite existing database schemas, logic, UI code, and infrastructure to make room for future growth and for the new directions the software will be pushed into.

Those sorts of real-world software challenges obviously go far beyond the simple routines experienced when building one-off 'vibe-coded' demos.

Real software evolves over months, years, and decades. It rarely gets conceived once and then just continues to exist in that state forever. It grows and becomes more complex with new logic and processes. The security threats which any of those new processes could expose need to be regularly evaluated and tested.

Developers need to be able to adjust existing applications without breaking existing functionality, and avoid regressions that derive from integrating newly imagined logic and schema.

Developers need to not only test code for technical correctness, but also take into account feedback about user experience.

Developers need to support existing software when a user inevitably finds a way to break a workflow by forcing a procedure to accomplish a task which was never intended or imagined by the developers or by the management who conceived the business logic. Developers need to experience and constantly respond to those unexpected new real-world edge cases.

Developers need to train users who don't know how to do the most basic things with a computer to perform complex interactive steps with software that handles complicated processes, often involving financial data and other critically important info. And those operations are often critically time-sensitive. There's always a handful of users in an organization who don't even know what a web browser is, and who couldn't complete the common practice of 2-factor authentication if their life depended on it - and we're expected to build apps which those users can operate intuitively.

Real-world software typically can't be out of commission for even a few minutes. Developers need perfectly synced development and production environments established to test new functionality exactly the way users will experience it - and then the new versions of those fully tested code bases need to be pushed instantly and seamlessly to a working environment, where if one little function doesn't work for 2 minutes, calls flood in from hundreds of users who need to finish a pressing workflow immediately.

When software updates aren't 100% perfect, developers need to be able to roll back to a previously working production version, instantly, without losing a single byte of data. Database schemas, and the processes that fill them, need to be resilient and well-conceived enough to ensure that data entered into a new schema doesn't get lost in a rollback.
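One common way to make schemas rollback-safe is to keep migrations additive, so old application code keeps working against the new schema. Here's a minimal sketch of that idea using Python's stdlib sqlite3 module; the `invoices` table and `status` column are hypothetical names chosen for illustration, not from any particular project:

```python
import sqlite3

# Hypothetical additive (backward-compatible) schema change: the old app
# version writes (id, amount); the new version also records a status.
# Because the new column is nullable with a default - no DROP, no RENAME,
# no NOT NULL - the old code still reads and writes the same table after
# a rollback, and rows entered under the new schema are not lost.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("INSERT INTO invoices (amount) VALUES (100.0)")  # old version writes

# Forward migration: strictly additive.
conn.execute("ALTER TABLE invoices ADD COLUMN status TEXT DEFAULT 'posted'")
conn.execute(
    "INSERT INTO invoices (amount, status) VALUES (250.0, 'draft')"
)  # new version writes

# After rolling the application back, the old code's queries still succeed,
# and every row entered under the new schema is still present.
rows = conn.execute("SELECT id, amount FROM invoices ORDER BY id").fetchall()
print(rows)  # both rows survive: [(1, 100.0), (2, 250.0)]
```

Destructive changes (dropping or renaming columns) are then deferred to a later migration, once no deployed version still depends on the old shape.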

And those are just some basics.

Developers need to have their environments and professional practices established so well that multiple change requests can be handled multiple times a day, for weeks/months at a time, day or night, especially when rolling out any new application. And we need to be able to anticipate and juggle the needs of multiple clients at the same time, without missing a beat. Everyone needs a call back immediately, even if you've only slept 5 of the last 72 hours because some downstream provider your client relies on experienced a catastrophic failure with their API server when their OS automatically updated at 2am Sunday morning.

Developers need the experience and the foresight to know that VPS hosts and other service providers will experience unexpected outages, inevitably at the absolute worst possible times (during an initial launch of a new application/feature, of course), and we need to have a fully working emergency workflow in place to handle those outages seamlessly. Users need zero effective down time, and perfect functionality, basically 100% of the time they're working.

That's all just the beginning - the responsibilities involved in building critical business systems go so far beyond that.

Developers need to communicate with clients to understand the intent, purpose, and position of any piece of software within the full scope of how a business operates. We need to understand the meaning of the data in the database, and how it's used by stakeholders, users, technicians, and operators to make decisions. We need to infer how workflows and features can be better conceived to reach an end goal, by understanding how that data is used together to serve some purpose which isn't digital.

We build technical solutions to improve the quality of people's lives, and the functionality of the activities which make up the lifeblood of people's livelihoods. That responsibility can get overwhelming at times, when the work of the clients who use our creations for their daily operations gets disrupted by third-party failures and challenges that we have no control over (but still have to find solutions for).

Very often, clients have only a fuzzy conception of how a piece of software may actually need to work to accomplish a business goal they have in mind - heck, they may not even fully understand their own business logic and the processes they use regularly. They've just gotten used to following habitual operational guidelines which were established 20 years ago, when their existing, very complex infrastructure was implemented - and the new developers are expected to re-implement that proprietary black-box foundation, as if by magic, because the client takes all that existing infrastructure for granted when commissioning a new system to replace it.

It's an experienced developer's role to help root out those issues, identify what the client doesn't understand about how to even conceive a usable system, and then build and iterate on versions of that system, with requirements that may have never been fully disclosed by the client, or which the client thought would simply emerge magically, or which they inferred would just be automatically understood, because the context is so clear in their own daily work, but is totally invisible to a developer who only understands the mechanics involved in saving values to a database table.

Developers therefore need to have lots of domain knowledge about the business workflows their software will be integrated with, and what the daily operations of the users are like, in order to build effective solutions.

Developers need to understand the IT infrastructure, network mechanics, operating systems of both servers and client devices, security machinery, and the full stack of prerequisite software foundations that will be employed - not just the language(s) and framework(s) an application is built upon directly, but the imported libraries, installation tools, and every layer of the underlying stack, all the way down to the ones and zeros that CPUs process and transfer between RAM, storage devices, and network interfaces. That knowledge makes optimization possible where it's needed, lets architecture be built at the layer where it's best established, and lets troubleshooting be understood and applied at the layer where it's required.

Interestingly, LLMs are becoming more and more capable of helping not just with managing that entire tech stack infrastructure, but also with communications between developers and clients, users, stakeholders, IT support and service providers, etc.

I commonly paste emails and texts from clients and users into ChatGPT, to get a clearer understanding of what they mean to convey, not only about some intended software functionality, but also about how that requirement fits into the likely experience of the users' day-to-day work.

Clients may talk about their vendors and the APIs a developer may need to use, as if those companies and products are common knowledge. Industry-specific services which are well known in one small sector, and which may be used daily within a small ecosystem, are often completely off the radar of even the rest of the nearby industry.

Developers typically need to have deep knowledge about the intricate details of not just such niche services, but also the particular operational routines that each client business has built up from their own unique operational experiences, their local markets, and their particular clients.

For example, an expert in some sector of medical billing, with specialized knowledge about how insurance payers and clearinghouses integrate using ANSI X12 837 EDI standards, may have zero understanding about how a home health agency interacts with mobile clinicians, to efficiently manage care teams for geographically distributed patients. And those disciplines are entirely different from the challenges experienced by a government office which must update a 40 year old database used to organize daily diabetes registry information for more than 600 providers in a state. And that work is very different from the challenges a small vision therapy business may need to solve, in order for eval/re-eval workflows, scheduling, and reporting obligations to be handled smoothly in a busy office, staffed by 1-2 people who run back-to-back daily appointments, intake paperwork, etc.

Those are all actual cases in my relatively recent project history, which are just a few specialized niches in a massively complex 'health care' field - all of which have stringent compliance laws to satisfy.

The process of building software systems to handle critical processes in those environments is absolutely nothing like the process of writing personal utility applications, games, etc. 'Vibe coded' applications are not a part of any software development practice in such niches, but LLMs can certainly be useful in helping a developer understand not only the detailed, specific requirements explained by a client, but also how to reason about solving their particular problems. Because LLMs have such broad and deep knowledge, they can be an extraordinarily useful tool in researching how to approach solving the needs of clients who face even the most obscure challenges. And of course they can be used to write application code, once all the requirements are defined.

And LLMs are useful in handling so much more of the drudgery involved in development-adjacent work. I use them to help compose requests of IT managers, security teams, project managers, etc. I use LLMs to walk through the details of infrastructure installation, and to understand the documentation of systems that my software needs to connect with, APIs I need to interact with, etc. I use LLMs to build documentation for users, and to write invoices. And all of that generated communication gets saved in my chat histories, so every detail of the surrounding context, including how the final clear thoughts evolved from fuzzier understandings and logical revisions, can be instantly retrieved, reviewed, evaluated, and worked into future conversations...

So, the ability of LLMs to write perfect code first-shot is just a small piece of the puzzle. Even as LLMs become smart enough to engineer very large and specialized systems, that's still just a small part of the whole picture.

Software needs to satisfy human needs, and that goal requires the involvement of human experience, communication, and iteration, to build systems which fit human users' preferences and situations like a glove.

Doing all that work is a very different experience than vibe-coding a flashy little one-off generic 3D game, or building an aesthetically pleasing/entertaining novelty app, or animating a shiny informational web page. Vibe coded demos, and even functionally useful vibe coded applications, can certainly be impressive to look at, but they rarely encompass the complexity and endless specificity involved in real-world production software that must scale to handle real world challenges in a relentlessly busy environment.

Building real-world systems goes beyond even all the technical engineering challenges. It involves human interaction and preferences. Handling that human interaction well is about orchestrating the technologies which are so captivatingly capable of automatically generating cool little generic vibe coded apps, to produce far more specialized solutions that satisfy extremely precise requirements, to improve complex, layered, unique human experiences, in the end.

So not only is software development different from 'vibe coding' in all those ways, I believe there will likely always be some place for human work in all the various sorts of systems engineering disciplines that exist, to align all the constantly improving technological capabilities we're seeing evolve so quickly, with human needs, desires, and feelings.

To be clear, I think of 'vibe coding' as being mostly about letting an LLM make a cool little app on its own, typically without much real-world context or many edge case requirements to satisfy.

Software engineering is about orchestrating many more complex pieces, and integrating those pieces to fit the exact requirements of a large system, which works perfectly as intended, within a messy surrounding ecosystem, to accomplish extremely specific goals, along a constantly changing and evolving development workflow, where no functionality can ever break from a user's perspective, where all security and compliance obligations are met, and the human users are satisfied with a piece of digital machinery that fits perfectly into their existing complex workflow.

Nick Antonaccio (Admin)
May 01, 2026 at 21:09
#2

I've been writing code with LLMs as part of my daily professional life for about 3 1/2 years. The journey has been amazing, and after more than 3 decades of professional software development experience, not only the way I approach software development, but the way I interact with clients, their staff, IT support teams, end users of applications, server infrastructure, etc., has been utterly transformed.

I've gained a lot of perspective about how people from all walks of life perceive the benefits, drawbacks, and effectiveness/usefulness of 'AI' in general. My girlfriend is a vision therapist, and she's still asking when AI is going to impact her work in any noticeable way whatsoever (I expect robots will eventually replace even the work she does, but that's another topic...).

I remember watching when people with zero software development experience first began asking ChatGPT 3.5 to build simple web pages, and they were typically utterly disappointed by the output. Back then, GPT would output some code, limited to a total context of 4,000 tokens, of course without any images displayed, often without any style sheets attached - and the result often needed to be deployed to a server, with supporting libraries installed, in order to work at all.

At that time, generated code of any kind would often need some tweaks to work well, and it certainly needed a prepared environment to run at all. When a person with no experience looked at the code result, it appeared to be totally dysfunctional garbage. Even though much of that early generated code included remarkably useful content, it typically needed to be groomed and properly integrated into a larger context where it could be deployed and usefully employed. To non-technical users who didn't have enough knowledge to connect any of those dots, the output was basically useless.

And of course, some extrapolation of that sort of challenge was true for developers with any level of experience.

With a context limit of only 4,000 tokens, it was hard for developers to define more than a simple function, a limited number of input parameters, some basic logic, and the intended output values before the capacity of early LLM systems was exceeded. Users who didn't understand how to confine the entire scope of a chat interaction to that context limit would often be surprised that the LLM's response didn't make any sense at all - the LLM would lose the initial part of a conversation before the current piece of the conversation even had a chance to be attended to.

When GPT's context limit expanded to 12,000 tokens, all of a sudden we could paste in larger existing functions, and lots more description/explanation about logic requirements, for example. But it was also immediately clear that LLMs were far better trained on certain code ecosystems than others.

Ask ChatGPT to produce some working Python code, and it appeared to be an experienced engineer at times. Ask it to produce working code in an obscure language like Rebol, and it seemed to be an utterly incapable moron. In GPT's training corpora, the parameters had been adjusted against billions of tokens of existing Python code and documentation, so of course it had a better working knowledge of Python.

As new models were released with far greater context capabilities, much larger training corpora, instilled reasoning capabilities, and generally more emergent intelligence, thoughtful long task ability, reduced tendency to hallucinate, and better overall precision, understanding, knowledge, etc., they were just able to inherently accomplish more useful goals, without as much hand-holding, without precise prompts, etc.

The most popular public facing chat systems like ChatGPT, Claude, Gemini, and Grok built in workspaces and tool use features which gave the LLMs the ability to actually run code, for example, so users didn't need to stand up their own environments to see that code work. The LLM could, for example, run the server code in a Flask app it created, so that front ends could be presented in the complete context required to show full-stack interactions with server code, as well as display automatically generated images, styling, etc.

Even more important than that, chat systems were built to connect with tools that provided specialized functional capabilities.

Remember when ChatGPT would respond with an answer such as 'this question refers to an event after my last training date in February 2022'? They added a search tool, so that the LLM could look up new information online, reason about the meaning of that info in context, and return an intelligent, useful result to the user.

Similarly, LLM engineers also realized that GPT could complete only a few levels of the Towers of Hanoi game, but when they gave it access to a Python interpreter, the LLM could reason about how to write code to complete many more levels. So 'tool use' became a mantra.
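The gap between enumerating moves in text and writing a program to generate them is easy to see in code. This is a minimal sketch of the kind of solution an LLM can produce once it has a Python interpreter tool - the classic recursive Tower of Hanoi algorithm, which handles arbitrarily deep instances that are tedious to spell out move-by-move:

```python
# Recursive Tower of Hanoi: to move n disks from source to target,
# move n-1 disks out of the way, move the largest disk, then move
# the n-1 disks back on top of it.
def hanoi(n, source, target, spare, moves):
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)
    moves.append((source, target))  # move the largest remaining disk
    hanoi(n - 1, spare, target, source, moves)

moves = []
hanoi(10, "A", "C", "B", moves)
print(len(moves))  # a 10-disk tower takes 2**10 - 1 = 1023 moves
```

A model asked to list all 1,023 moves directly has many chances to slip; a model asked to write and run this 10-line function gets them all right at once - which is exactly why tool use became the mantra.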

Companies improved pre-training and post-training methodologies, added exponentially more data and hardware to build bigger LLMs which could handle greater context, trained in reasoning loops, added tools and agentic loops which enabled LLMs to iterate through debug sessions, install internally usable infrastructure, spawn separate worker contexts, etc., as needed to run and test generated code, and they generally improved the performance of those integrated systems, until Andrej Karpathy coined the term 'Vibe Coding' in 2025.

The LLM/chat/tool/agent ecosystems had evolved to a point where they could produce interesting working applications all on their own. And in the last few months, it's become much more generally accepted that the best frontier models will spit out more reliable code, more quickly, than even the most experienced developers, and that a properly engineered workflow will typically make use of an LLM to test and evaluate its own code revisions (and multiple LLMs can be employed to evaluate each other's work...). Anthropic's unreleased Mythos model has apparently beaten 30+ years of human tech work uncovering security threats in the most popular and largest applications ever created by humanity. LLMs are getting better than us at many things.

The current frontier models are capable of generating bug-free working code, first-shot, for many complicated tasks, especially when the programming languages and frameworks the LLMs are most deeply trained on are used. LLMs and agent harnesses keep getting better, but that capability still needs to be directed.


© 2026 AI By Nick.