Vibe coding wins the sprint and loses the codebase

Quick vibe coding is undeniably fun. You describe what you want, the model fills in the blanks, and a rough product appears before the coffee gets cold. That speed is real, and it matters. Even official guidance around vibe coding treats it as a strong fit for proof-of-concept work, draft applications, and personal tools, where the goal is momentum rather than long-term stewardship.

The catch is simple: software is not judged only by how quickly it appears, but by how safely it changes. And that is where the mood changes. Google’s 2025 DORA research found very broad AI adoption in software teams, with reported productivity gains and many respondents saying AI helps code quality, but the same research also describes a trust gap and argues that successful adoption is a systems problem, not just a tooling problem.

Why the first version is the easy part

The first version of an app has one job: exist. It only needs to prove that an idea can become a screen, a workflow, a dashboard, a small utility, an internal demo. That is exactly why vibe coding feels so powerful. It collapses the distance between “I have an idea” and “I can click it.” Google Cloud’s definition of vibe coding is blunt about that trade-off: in its purest form, it is close to “forgetting that the code even exists,” and it is best suited to rapid ideation and throwaway weekend projects. GitHub’s own tutorial frames the same approach as useful for proofs of concept, drafts you will later harden, or software for personal use.

That framing matters because it quietly contains the answer to the maintenance question. A prototype and a product are not the same asset. A prototype optimizes for speed of birth. A product optimizes for speed of change. The first one rewards improvisation. The second one punishes it.

Maintenance is where software reveals its true structure

Maintenance is not just bug fixing. It is understanding intent six weeks later. It is changing one function without breaking three unrelated flows. It is handing the code to another developer without also handing them a detective novel. It is figuring out whether a scary-looking conditional is dead code, business logic, or a workaround for a production incident no one documented.

This is where AI-generated software can become expensive in a very old-fashioned way. A large 2025 GitClear analysis of 211 million changed lines of code found that the share of changed lines associated with refactoring fell sharply from 2021 to 2024, while copy-pasted or cloned lines increased. That does not prove every AI-assisted workflow produces messy code, but it does reinforce a familiar risk: speed can raise output faster than it raises coherence.

When coherence drops, maintenance slows down first. The symptoms are easy to recognize. The naming drifts. Similar features use different patterns. Business rules live half in the UI and half in the backend. Error handling is inconsistent. The code works, but only in the nervous way that makes every future edit feel more dangerous than it should.

GitHub’s own Copilot guidance quietly points to the same problem. It tells users not only to check functionality and security, but also readability and maintainability, and to rely on automated tests and tooling rather than trust generated output at face value. That is not a warning against AI coding. It is a reminder that generated code still has to join a living system.

Upgrades are where the vibes run out

Nothing exposes weak software faster than an upgrade. A major framework release, a runtime bump, a database driver change, a new authentication library, a package deprecation, a stricter type checker, a build pipeline migration, a breaking API contract. These are the moments when software stops being a demo and starts revealing whether it was assembled or engineered.

That is also why upgrades are such a sharp test for vibe-coded projects. Upgrades force exactness. They care about version boundaries, hidden dependencies, undocumented assumptions, obsolete config, and behavior that “worked fine before.” A human may tolerate fuzzy architecture during prototyping because the system is still small. Upgrade work does not. It drags every implicit decision into daylight.

GitHub’s own Copilot review documentation contains a revealing detail here: certain file types are excluded from Copilot code review, including dependency-management files and many config-adjacent files such as package-lock.json, requirements.txt, build.gradle, tsconfig.json, and others. In other words, some of the files most likely to matter during upgrades are exactly the files where you cannot assume AI review is covering the full risk surface.

That does not make AI useless for upgrades. It does mean upgrades demand a different posture. You are no longer asking, “Can the model make this work?” You are asking, “Can this system be changed predictably, with rollback paths, test coverage, and enough internal consistency that the next change is cheaper instead of harder?”

AI is still valuable after the prototype phase

The good news is that maintenance and modernization are not outside AI’s reach. In fact, the strongest vendor documentation now assumes the opposite. GitHub has dedicated guides for upgrading projects and modernizing legacy code. Its upgrade workflow explicitly says Copilot can analyze a project, generate a plan, fix issues it encounters while executing that plan, and produce a summary. Its modernization workflow emphasizes refactors, documentation, generated tests, running those tests, and refining the code after verification.

OpenAI’s Codex guidance pushes in the same direction. It recommends that teams do not stop at generation, but ask the agent to create tests, run checks, confirm results, and review the work before acceptance. OpenAI also notes that Codex can be guided by AGENTS.md files and says that, at OpenAI, Codex reviews 100% of pull requests. That is the important shift in mindset: the mature use of coding agents is not “generate and hope.” It is generate, verify, review, and encode standards.

Anthropic’s Claude Code best practices make the same point even more sharply. The company calls giving the model a way to verify its own work the single highest-leverage step, recommends tests or expected outputs, and promotes a workflow of explore first, then plan, then code. That is almost the opposite of pure vibe coding, and for good reason. Production code needs explicit success criteria. Otherwise the human becomes the only feedback loop, which is exactly what teams are trying to escape.

So yes, AI can help with maintenance and upgrades. But it helps most once the work stops being purely improvisational.

The missing layer is engineering memory

The real problem with raw vibe coding is not that the model writes code. It is that the code often arrives without durable memory around it. No stable conventions. No architectural explanation. No “why.” No repository-specific rules. No house style for tests. No note on what must never change. No map of where the sharp edges are.

That is why instruction files are becoming central to serious AI-assisted development. Anthropic recommends a concise CLAUDE.md that captures code style, workflow rules, testing instructions, repository etiquette, architectural decisions, and non-obvious gotchas. GitHub supports repository instructions and AGENTS.md files so agents can inherit project-specific behavior. OpenAI makes the same case with AGENTS.md, explicitly tying better agent performance to clear documentation, configured environments, and reliable testing setups.

This is not administrative busywork. It is how you turn one-off prompting into maintainable collaboration. A repository with durable instructions gives the model continuity. A repository without them forces the model to rediscover the project every time, which is just another way of saying it forces the team to pay the context tax over and over.

Documentation matters for the same reason. GitHub’s legacy-code guidance is refreshingly practical on this point: unclear legacy systems become harder to maintain because documentation decays, context leaves with people, and hurried patches turn code into a tangle. The company’s advice is not merely to generate new code faster, but to use AI to explain old code, document assumptions, generate docstrings, clarify edge cases, and leave the codebase cleaner for the next person.

What a production-ready vibe workflow actually looks like

The smart move is not to reject vibe coding. It is to put it in its place.

Use pure vibe coding for disposable work. If the app is a mockup, a design probe, an internal toy, or a one-week experiment, optimize for speed and learning. Do not pretend it has already earned the right to become the production codebase.

Add verification before you add more generation. Tests, linters, screenshots, type checks, expected outputs, and reproducible commands are what convert AI speed into safe iteration. Anthropic and OpenAI both treat this as core practice, not optional polish.
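
A minimal sketch of what "expected outputs" can mean in practice. The `slugify` helper and its expected cases are hypothetical, not from any of the cited guides; the point is that the success criteria are written down as runnable checks, so an agent (or a human) can rerun them after every change instead of eyeballing the result.

```python
# Minimal sketch of machine-checkable success criteria.
# slugify() stands in for whatever the agent has just generated.
def slugify(title: str) -> str:
    """Lowercase, trim, and join words with hyphens."""
    return "-".join(title.strip().lower().split())

# Expected outputs written down before (or alongside) generation.
EXPECTED = {
    "Hello World": "hello-world",
    "  Vibe  Coding  ": "vibe-coding",
}

if __name__ == "__main__":
    for raw, want in EXPECTED.items():
        got = slugify(raw)
        assert got == want, f"slugify({raw!r}) -> {got!r}, expected {want!r}"
    print("all checks passed")
```

Once checks like these exist, "generate and hope" becomes "generate, run, and only then accept."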

Freeze conventions into the repo. Put coding rules, architecture notes, test commands, migration rules, and review expectations where the agent can read them every time. The goal is not more prompting. The goal is fewer repeated prompts and fewer silent assumptions.
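
As a concrete illustration, here is what such an instruction file might look like. The filenames, commands, and rules below are invented for the example, not taken from any of the cited repositories; the shape follows the AGENTS.md and CLAUDE.md conventions the vendors describe.

```markdown
# AGENTS.md (illustrative example)

## Commands
- Run tests: `pytest -q`
- Type check: `mypy src/`

## Conventions
- All database access goes through `src/db/repository.py`; never query from views.
- New endpoints need a test in `tests/api/` before review.

## Do not change
- The `v1` API response shapes are frozen for backward compatibility.
```

A file like this is read by the agent on every session, which is exactly how it replaces repeated prompting.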

Treat upgrades as a separate discipline. Dependency updates, framework migrations, runtime changes, and config rewrites deserve a narrower, more deliberate workflow. They should be chunked into small, reversible changes with clear validation after each step. GitHub’s own upgrade documentation is plan-driven for a reason.
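
The "small, reversible changes with clear validation" pattern can be sketched generically. Here each step is a (apply, validate, rollback) triple; in a real repo these would be a single dependency bump, the project's test suite, and a `git revert` of that one commit. The types and names are assumptions for illustration, not an API from any cited tool.

```python
# Sketch of plan-driven, reversible upgrade steps.
from dataclasses import dataclass
from typing import Callable

@dataclass
class UpgradeStep:
    name: str
    apply: Callable[[], None]      # e.g. bump one dependency
    validate: Callable[[], bool]   # e.g. run the test suite
    rollback: Callable[[], None]   # e.g. revert that one commit

def run_upgrade(steps: list[UpgradeStep]) -> list[str]:
    """Apply steps one at a time; roll back and stop on the first failure."""
    completed: list[str] = []
    for step in steps:
        step.apply()
        if not step.validate():
            step.rollback()   # only this step is undone
            break             # stop before compounding the damage
        completed.append(step.name)
    return completed
```

The discipline is in the loop, not the tooling: every step is validated before the next one starts, so a failed migration leaves the repo one revert away from green.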

Keep human review focused where the risk really lives. AI review is useful. Automated checks are necessary. But dependency files, configs, generated artifacts, architectural trade-offs, and backward compatibility still need deliberate human attention. GitHub’s exclusions make that impossible to ignore.

The real divide is not human coding versus AI coding

The strongest teams will not be the ones that refuse vibe coding, and they will not be the ones that worship it. They will be the ones that understand where it belongs in the lifecycle.

The prototype phase rewards speed, curiosity, and low-friction experimentation. The maintenance phase rewards clarity, repeatability, and controlled change. The upgrade phase rewards system knowledge, test discipline, and respect for hidden complexity. AI can contribute to all three, but not in the same way.

That is also the deeper message in the DORA research. AI behaves like a multiplier. In teams with healthy systems, it compounds good habits. In teams with weak standards, fragmented ownership, or legacy bottlenecks, it accelerates disorder just as efficiently.

So yes, quick vibe coding is genuinely impressive. It is one of the fastest ways the software industry has ever found to turn intent into working output. But the real product does not begin when the code first runs. It begins when the code must survive change. Maintenance and upgrades are not side quests after the fun part. They are the point at which software either becomes an asset or reveals itself as a temporary performance.

Author:
Jan Bielik
CEO & Founder of Webiano Digital & Marketing Agency

Sources

What is vibe coding?
Google Cloud’s overview of vibe coding, including its definition, practical use cases, and why it fits rapid ideation better than long-lived systems.
https://cloud.google.com/discover/what-is-vibe-coding

Vibe coding with GitHub Copilot
GitHub’s official tutorial presenting vibe coding as suitable for proofs of concept, draft applications, and personal-use software.
https://docs.github.com/en/copilot/tutorials/vibe-coding

How are developers using AI? Inside our 2025 DORA report
Google’s summary of the 2025 DORA findings on AI adoption, productivity, trust, and the role of systems and workflows in successful implementation.
https://blog.google/innovation-and-ai/technology/developers-tools/dora-report-2025/

Best practices for using GitHub Copilot
GitHub’s official guidance on reviewing generated code for maintainability, readability, security, and correctness, with emphasis on tests and tooling.
https://docs.github.com/en/copilot/get-started/best-practices

Files excluded from GitHub Copilot code review
GitHub’s documentation listing file types Copilot review does not cover, including many dependency and configuration files relevant during upgrades.
https://docs.github.com/en/copilot/reference/review-excluded-files

Upgrading projects with GitHub Copilot
GitHub’s official workflow for AI-assisted project upgrades, including analysis, planning, issue fixing, and upgrade summaries.
https://docs.github.com/en/copilot/tutorials/upgrade-projects

Modernizing legacy code with GitHub Copilot
GitHub’s guide to refactoring, explaining legacy code, generating tests, and improving modernization work through verification.
https://docs.github.com/en/copilot/tutorials/modernize-legacy-code

Best Practices for Claude Code
Anthropic’s official guide covering verification, planning before implementation, repository instruction files, and safer agentic coding workflows.
https://code.claude.com/docs/en/best-practices

Best practices for Codex
OpenAI’s official guidance on using coding agents with tests, reviews, repository instructions, and repeatable workflows.
https://developers.openai.com/codex/learn/best-practices/

Introducing Codex
OpenAI’s launch post describing how Codex works in sandboxed environments, runs tests and checks, and follows repository standards through AGENTS.md.
https://openai.com/index/introducing-codex/

AI Copilot Code Quality 2025 Research
GitClear’s large-scale code-change analysis focused on cloned code, refactoring decline, and maintainability concerns in AI-assisted development.
https://www.gitclear.com/ai_assistant_code_quality_2025_research

Documenting and explaining legacy code with GitHub Copilot
GitHub’s practical article on using AI to explain older systems, generate documentation, clarify edge cases, and improve maintainability.
https://github.blog/ai-and-ml/github-copilot/documenting-and-explaining-legacy-code-with-github-copilot-tips-and-examples/