A software engineer in 2026 who refuses to touch AI tools can still write code, ship features, run servers, and close tickets. Nothing technical stops them. The work gets done. It just gets done slower than the person at the next desk, and slower than the competitor across town, and slower than the contractor in another country who bills less and delivers more. That gap is the whole argument. Comparing modern IT work without AI to travelling without any means of transport captures the feeling exactly: you can walk anywhere on foot, and you will still arrive, but almost no one chooses to walk from one city to the next when everyone around them is driving.
Table of Contents
The metaphor that frames an entire industry’s anxiety
The comparison lands because it is mostly true and because it carries a sting. Walking is not impossible. It is just an increasingly strange decision when the destination is the same and the clock is running. A developer who codes the way developers coded in 2019 is not breaking any rule. They are simply opting out of a productivity layer that most of their peers now treat as ordinary equipment, the way a carpenter treats a power drill rather than a hand brace.
What makes the metaphor worth taking seriously, rather than repeating as a slogan, is how completely the numbers now back the underlying claim. AI assistance in the IT sector is no longer a fringe habit or an early-adopter quirk. Google’s 2025 DORA report, built on responses from nearly five thousand technology professionals, found that ninety percent of them use AI in their daily work, a fourteen-point jump in a single year, with a median of about two hours a day spent working alongside these tools. JetBrains, surveying more than twenty-four thousand developers across 194 countries, put regular AI use at eighty-five percent. Stack Overflow’s annual survey of tens of thousands of developers found that eighty-four percent use or plan to use AI tools, up from seventy-six percent the year before. Three independent studies, three different methods, one direction.
The point of this article is not to cheer that trend or to dismiss it, but to test the metaphor against what the evidence actually shows. Some of what follows confirms the comparison. A normal working day in software, operations, and security now flows through AI in ways that would have looked like science fiction a few years ago, and the professionals who opt out pay a real and growing cost in speed and market relevance. Other parts of the evidence complicate the picture in ways the slogan hides. AI in IT is not a clean, neutral vehicle that drops you at the same destination faster. It sometimes drops you somewhere slightly different, with new risks loaded into the trunk, and the measured productivity gains are messier and more contested than the marketing suggests.
Holding both of those truths at once is the only honest way to read the situation. The IT sector has crossed a threshold where ignoring AI is a defensible personal choice and an indefensible competitive one. Yet the same body of research that proves near-universal adoption also documents a widening trust gap, a productivity paradox that embarrasses vendor claims, and a long list of failures from teams that bought the tools and never saw the promised return. The metaphor of travel without transport is a good starting point precisely because it forces the real question into the open. The question is not whether you can do IT work without AI. You obviously can. The question is what it costs you, what it costs the people you work with, and whether the thing you are refusing is actually a vehicle or something stranger that the simple comparison cannot quite contain.
Travel without a vehicle is the wrong half of the comparison
The transport image works on one axis and quietly fails on another, and separating the two is the fastest way to understand what AI has actually done to software work. On the axis of speed and effort, the comparison is close to perfect. A person on foot and a person in a car are trying to reach the same place. The walker is not doing anything wrong. They will get there. The driver simply gets there sooner, with energy left over, and can make the trip several times while the walker is still on the first leg. Translate that into a development team and the picture is recognisable. Two engineers receive the same feature request. One drafts the boilerplate, the test scaffolding, and the first rough implementation by hand. The other has a model produce all of that in seconds and spends the saved time on the parts that need judgement. Over a quarter, the difference compounds into a different volume of shipped work.
That is the half of the metaphor that holds, and it explains the adoption curve better than any vendor pitch. People did not adopt AI coding tools because a marketing campaign told them to. They adopted because the person beside them was suddenly clearing the backlog faster, and standing still felt like falling behind. Competitive pressure, not enthusiasm, is what moved the numbers from a minority habit to a near-universal one in roughly three years.
The other half of the metaphor is where it breaks, and the break matters more than the fit. A vehicle is neutral with respect to the destination. A car does not change the city you are driving to, the quality of the meeting you attend when you arrive, or the soundness of the decision you make once you are there. It only changes how long the trip takes. AI in software is not neutral in that way. The output it produces is the destination, not the road. When a model writes the function, the function it writes is the product, and that function can be elegant or subtly broken, secure or quietly exposed, maintainable or a future liability that no one notices until it fails in production. A faster car cannot give you a worse meeting. A faster code generator absolutely can give you worse code, and frequently does, in ways that only surface weeks later.
This is why the slogan, taken literally, misleads even while it persuades. It frames AI as pure acceleration toward an unchanged goal, when the technology actually reshapes the goal itself. The work that remains for the human is not the same work done faster. It is different work: less typing, more reviewing; less authoring, more directing; less time spent producing the first draft, more time spent deciding whether the draft is safe to trust. The professional who adds AI to their workflow is not a walker who bought a car. They are a worker whose job description shifted under them. That shift is the real story, and the rest of this analysis traces where it leads, what it costs, and where the people who built their careers on the old definition of the job now stand.
The adoption numbers stopped being debatable in 2025
For a few years it was reasonable to argue about how real the AI shift in software was. Surveys could be cherry-picked, vendor telemetry could be inflated, and a loud minority of skeptical engineers could plausibly claim that serious professionals were ignoring the hype. That argument is now hard to sustain, because the measurements converge from too many independent directions to dismiss as marketing.
Start with the studies that try hardest to be rigorous. The 2025 DORA report from Google Cloud surveyed nearly five thousand technology professionals and ran more than a hundred hours of qualitative interviews, and it reported ninety percent AI adoption among software development professionals, up fourteen points year over year, with respondents spending a median of around two hours a day working with AI tools. JetBrains, in its State of Developer Ecosystem 2025 study covering 24,534 developers across 194 countries, found that eighty-five percent regularly use AI tools for coding and that sixty-two percent rely on at least one AI coding assistant, agent, or AI-enabled editor. Stack Overflow’s 2025 survey, with responses from tens of thousands of developers, recorded eighty-four percent using or planning to use AI tools, continuing a steady climb from roughly seventy percent in 2023 and seventy-six percent in 2024.
Telemetry from inside engineering organisations points even higher in active cohorts. The analytics firm DX, drawing on a sample of more than 135,000 developers, reported ninety-one percent AI adoption within that population and found that about twenty-two percent of merged code was AI-authored, with an average of roughly 3.6 hours saved per developer per week and daily users merging meaningfully more pull requests than occasional users. Jellyfish, tracking its own platform data through the year, watched code-assistant adoption rise from 49.2 percent in January to sixty-nine percent in October, peaking near seventy-three percent in August, and reported that almost half of companies now have at least half of their code generated with AI assistance, up from about twenty percent at the start of the year.
The startup end of the market has moved furthest of all. Y Combinator reported that a quarter of the companies in its Winter 2025 batch had codebases that were ninety-five percent AI-generated. That figure describes a particular slice of the industry, young companies building from scratch with no legacy weight, and it does not generalise to a bank’s core systems or a hospital’s records platform. It does, though, mark the leading edge of a direction the rest of the field is moving toward at its own pace.
Two cautions keep this from being a simple growth story. First, adoption is not the same as benefit, a distinction the rest of this article returns to repeatedly. A tool can be used everywhere and still fail to move the metrics that matter, and a striking amount of the 2025 evidence shows exactly that. Second, a stubborn minority remains outside the trend. JetBrains found fifteen percent of developers had not adopted AI tools in their daily work, whether from skepticism, security policy, regulatory constraint, or plain preference. That fifteen percent is not negligible, and writing them off as dinosaurs would be both rude and inaccurate, since some of them work in environments where the cautious choice is the correct one. But the shape of the distribution is unmistakable. The default behaviour of the profession has flipped. A few years ago, using AI to write production code was the unusual choice that needed explaining. Now it is the absence of AI that draws a raised eyebrow, and the engineer who avoids it is the one who has to justify the decision to a manager watching the team’s throughput.
A normal day in IT now runs through a model
Describing adoption in percentages hides what the change actually feels like at a desk. The median figure of roughly two hours a day spent with AI tools is not two hours of dramatic, headline-grabbing automation. It is an accumulation of small, ordinary moments spread across the working day, most of which would have been done by hand a few years ago and are now done in conversation with a model.
A developer opens their editor and the autocomplete no longer suggests a single variable name. It proposes the next several lines, sometimes the whole function, drawn from the surrounding context and the patterns it has learned. They accept some suggestions, reject others, and edit a third group into shape. When they hit an error they do not understand, they paste the stack trace into a chat window and get an explanation in plain language, along with a candidate fix, in less time than it would take to phrase a search query. When they inherit a tangle of undocumented legacy code, they ask the model to explain what a module does before touching it, which turns an afternoon of archaeology into a ten-minute briefing.
The same pattern repeats across tasks that used to be tedious by definition. Writing unit tests, generating boilerplate, drafting SQL queries, producing first-draft documentation, summarising a long pull request, writing a clear commit message, translating a snippet from one language to another, and sketching a regular expression are now routinely handed to a model and corrected by a human rather than typed from scratch. None of these is glamorous. Together they are exactly the friction that used to eat the hours between the interesting parts of the job, and removing that friction is what most developers mean when they say AI made them faster.
For many engineers the experience has reorganised attention more than it has reorganised hours. The model takes the first pass at the mechanical work, and the human moves up a level to deciding whether the pass is correct, whether the approach fits the system, and whether anything important was missed. That is a genuine improvement when the human still has the expertise to judge the output, and a genuine danger when they do not. The person who can read AI-generated code critically gets a powerful assistant. The person who cannot gets a confident-sounding collaborator whose mistakes they are no longer equipped to catch. The tooling is identical in both cases. The outcome diverges entirely on what the human brings to the conversation, and that divergence runs underneath almost every other finding in this field.
The jump from autocomplete to agents that act
The fastest way to misjudge AI in IT is to picture it as the autocomplete of 2023. That version suggested the next line and waited. What arrived through 2025 and into 2026 is a different class of tool that does not wait. It plans, edits across many files, runs the test suite, reads the failures, fixes its own mistakes, and opens a pull request for a human to review. The shorthand the field settled on is the coding agent, and the gap between an agent and the old autocomplete is the gap between a colleague who hands you a sentence and a colleague who hands you a finished branch.
The clearest measure of how fast this moved is SWE-bench, a benchmark built from real GitHub issues drawn from open-source projects, which asks an AI system to resolve a genuine bug or feature from start to finish. In 2023, leading systems solved roughly four percent of these tasks. By 2026, top models paired with a capable agent harness clear between seventy and ninety percent of the verified benchmark, with harder long-horizon variants sitting lower but still far beyond anything that looked plausible two years earlier. The detail that matters for practitioners is hidden in the phrase “paired with a harness.” Performance comes from the model and the scaffolding around it together, the loop that lets the model use a terminal, read files, run commands, and recover from errors. A strong model in a weak harness underperforms a slightly weaker model in a well-built one. The agent is the model plus its environment, not the model alone.
The product market filled in quickly behind the benchmark. Cognition’s Devin takes a high-level task and works through it in its own sandbox with a browser, terminal, and editor, then opens a pull request and integrates with the tools teams already use to track work. OpenAI’s Codex grew into a multi-surface platform, available in a CLI, a desktop app, IDE extensions, and the web, and by March 2026 it reported more than two million weekly active users, with OpenAI positioning it as a general enterprise agent rather than a coding-only tool. Replit’s agent went further toward full autonomy for application building, shipping a version with parallel task forking that resolves merge conflicts on its own most of the time, on the back of a funding round that valued the company at nine billion dollars. Amazon’s Q Developer, JetBrains’ Junie, and the open-source OpenHands project rounded out a field where the question stopped being whether agents could do real work and became which agent to trust with which task.
The honest framing for all of this is the one experienced engineers already use among themselves. An agent behaves like a capable junior who works independently but needs supervision. It is fast, tireless, and genuinely useful on well-defined tasks: a bug with a clear reproduction, a feature with a precise specification, a dependency migration, a batch of tests, a mechanical refactor. It is unreliable on tasks that require deep context about why a system is built the way it is, judgement about trade-offs the specification does not mention, or awareness of consequences two systems away from the change. The teams getting value from agents are the ones that treat them like a junior whose work is always reviewed, not like a senior whose word is trusted. That distinction is not a temporary limitation that the next model release will erase. It is the structural reason the human did not disappear from the loop, and it sets up the central tension of the productivity research that follows.
The tools that came to define the category
Beneath the abstractions sit specific products that most IT professionals now name without thinking, the way an earlier generation named their editor or their version control system. The market consolidated around a handful of tools that each took a different bet on how AI should fit into the work, and the adoption figures behind them show a category that is large, fast-growing, and still sorting out its winners.
GitHub Copilot remains the most widely used, the default that arrived early through its tie to GitHub and Microsoft and reached roughly twenty million users. It became the tool people compared everything else against, and it expanded from autocomplete into chat, code review, and its own coding agent. Cursor, an AI-first editor built as a fork of a familiar environment, grew at a pace that startled even optimistic observers, reaching about two billion dollars in annual recurring revenue by early 2026 on the strength of developers who wanted the model woven into every part of the editing experience rather than bolted on. Claude Code, a terminal-first agent, reached around eighteen percent adoption among developers by January 2026 and recorded the highest satisfaction score of the tools measured in one early-2026 pulse survey, a signal that the people using it tended to keep using it. OpenAI’s Codex, as noted, crossed two million weekly active users and spread across surfaces.
First table — representative AI coding tools and their primary bet, early 2026
| Tool | Type | What it is built around |
|---|---|---|
| GitHub Copilot | IDE assistant and agent | Broad default reach, deep IDE and GitHub integration |
| Cursor | AI-first code editor | Model woven through the whole editing experience |
| Claude Code | Terminal-first agent | Autonomous multi-file work under developer supervision |
| OpenAI Codex | Multi-surface agent platform | CLI, desktop, IDE, and cloud delegation |
| Devin (Cognition) | Autonomous SWE agent | Sandboxed end-to-end task completion and pull requests |
| Amazon Q Developer | Cloud-integrated assistant | Feature work, refactoring, and AWS-native upgrades |
The table compresses a fast-moving field into one snapshot, and the categories blur in practice, since most of these tools now offer both assistant and agent modes. The point is that the category is no longer experimental; it is a competitive software market with billions in revenue, distinct philosophies, and real switching costs.
What the product race obscures is how little the choice of tool determines the outcome compared with how the tool is used. The same agent that lets a disciplined team ship reviewed, tested work faster will let an undisciplined team ship unreviewed, untested work faster, and the second outcome is worse than shipping nothing. Vendors compete on benchmark scores and feature lists because those are measurable and marketable. The variable that actually predicts whether a team benefits is the surrounding practice, the review culture, and the judgement of the people steering the tool, and no purchase order can buy those. That gap between the tool and the result is where the most uncomfortable research of the past year lives, and it is worth looking at directly.
The productivity paradox nobody markets
The most useful piece of research to come out of 2025 is also the most awkward for the industry that sells these tools, and anyone reasoning about AI in IT should sit with it rather than skip past it. METR, a nonprofit research institute that evaluates AI systems, ran a randomised controlled trial with experienced open-source developers working on large, mature codebases they knew well. The developers expected AI to speed them up by about twenty percent. After the trial, they still believed it had sped them up by roughly twenty percent. Measured against the clock, they were about nineteen percent slower when using the AI tools. The gap between what they felt and what actually happened was close to forty points, and it persisted even after the work was done.
That result does not prove AI makes developers slower in general, and it is important to say so clearly, because the finding is easy to weaponise in either direction. METR studied a specific population under specific conditions: senior engineers, deeply familiar with complex repositories, doing real work on systems they already understood. That is close to the worst case for AI assistance, because the human’s own expertise was already fast, the codebase was full of context the model could not fully grasp, and the time spent prompting, reading, and correcting the model’s output exceeded the time it saved. In greenfield projects, unfamiliar languages, boilerplate-heavy tasks, or the hands of a less experienced developer, the balance can tip the other way. The study’s real lesson is narrower and more durable than “AI is slower.” It is that the feeling of speed and the fact of speed are different things, and that humans are systematically bad at telling them apart when a tool makes the work feel smoother.
Telemetry from engineering organisations tells a complementary story about where the time goes. Analysis from Faros, drawn from real data across more than a thousand teams, found that developers on high-adoption teams completed about twenty-one percent more tasks and merged ninety-eight percent more pull requests, which sounds like a clean win until the next number lands: pull request review time rose about ninety-one percent. The individual gains were real, but they piled up at the review stage, where humans still have to read, understand, and approve everything before it ships. The output of the fast part of the process became the input to a slower part that AI did not accelerate, and a good deal of the promised velocity disappeared into longer review queues.
This is the productivity paradox in plain terms. Vendors advertise gains of fifty to a hundred percent, and individual developers report feeling much faster, while the measured effect on end-to-end delivery is modest, uneven, and sometimes negative. The mechanism behind the illusion has a name some researchers use: visible-work bias. Generating code is visible and satisfying, so it feels productive, while the invisible work of reviewing, debugging, and integrating that code expands quietly and gets undercounted. The bottleneck in software was rarely the speed of typing. It was understanding, coordination, and verification, and those are exactly the parts AI accelerates least.
The honest conclusion is not that AI fails to help. It is that the help is real but bounded, concentrated in specific kinds of work, and routinely overstated by both the people selling the tools and the people using them. A professional who understands this can use AI well: deploy it where it genuinely saves time, stay skeptical of the feeling of speed, and invest in the review and integration capacity that the extra output demands. A professional who believes the marketing will generate more code faster, feel more productive, and wonder why the team’s actual delivery did not improve and its instability got worse. The tools are not the problem. The expectations attached to them are, and the gap between adoption and benefit that runs through this article starts right here.
Usage climbs while trust quietly falls
A strange thing happened as AI tools spread through the profession. The more developers used them, the less they trusted them. Those two lines usually move together, since people adopt tools they believe in, but in software through 2025 they crossed, and the divergence says something important about the maturity of the relationship.
Stack Overflow’s survey data captured the shift cleanly. Positive sentiment toward AI tools, which had sat above seventy percent in 2023 and 2024, fell to around sixty percent in 2025, even as the share of developers using the tools kept climbing. Trust in the accuracy of AI output dropped further on some measures: a notable portion of developers, more than four in ten by Stack Overflow’s reading, said they distrust the accuracy of what AI produces, against a smaller group who said they trust it. Other compilations put active trust in AI output below thirty percent by early 2026, down sharply from the year before. The exact figures vary by survey and wording, but the direction is consistent across all of them.
The reason is not mysterious, and it is the opposite of disillusionment with a fad. It is what happens when a tool stops being a novelty and becomes a daily instrument whose failure modes you have now met in person. A developer’s first month with an AI assistant is full of pleasant surprises. The sixth month includes the afternoon lost to a confidently wrong suggestion, the security bug that slipped through because the generated code looked plausible, the hallucinated function that does not exist, and the subtle logic error buried in code that was syntactically perfect. Trust built on first impressions gives way to a calibrated, experienced wariness. Developers did not lose faith in AI because it disappointed them entirely. They lost the naive version of that faith because they learned where the tool actually fails.
That maturing skepticism is healthier than the enthusiasm it replaced, and it fits the rest of the evidence. The professionals who get the most from AI are not the true believers who accept its output uncritically, nor the refusers who avoid it entirely, but the experienced users who trust it for the things it does well and check it ruthlessly for everything else. The falling trust numbers, read this way, are not a sign that AI in IT is failing. They are a sign that the field is growing up, moving from the phase where a new tool is magic to the phase where it is just a tool with known strengths and known weaknesses, used by people who have stopped being impressed and started being careful.
DORA’s verdict that AI is a mirror and a multiplier
If a single finding deserves to outlive the 2025 reports, it is the central conclusion of the DORA research, because it reframes the entire adoption debate. After surveying nearly five thousand professionals and modelling AI’s effect on ten different outcomes with explicit statistical controls, the DORA team landed on a sentence worth memorising: AI does not fix a team, it amplifies what is already there. Strong teams with clear priorities, clean architecture, and disciplined process used AI to get faster and better. Weak teams with unclear priorities, tangled systems, and broken process used AI to produce low-quality work more quickly. The technology behaved as a mirror that reflects an organisation’s true capabilities and a multiplier that magnifies whatever it finds.
The throughput data carried a twist that makes the point sharper. In the previous year’s report, AI adoption had a negative relationship with software delivery throughput, a finding that surprised many at the time. In 2025 that relationship reversed and turned positive, suggesting teams had adapted to generate more code with AI and actually ship more of it. But one stubborn correlation refused to flip. AI adoption remained associated with increased delivery instability, meaning more failures, more rollbacks, more changes that broke something in production. Teams learned to go faster. The underlying systems did not evolve to handle the faster pace safely, so the extra speed arrived with extra breakage attached.
The phrase that captures this best, used by several people summarising the report, is that speed without stability is just accelerated chaos. AI helps a team move faster, but not necessarily in the right direction. A group with weak processes, unclear goals, or legacy architecture will use AI to ship flawed work sooner, which is worse than shipping it later, because the cost of the flaw compounds. The report’s repeated warning to leaders was that buying AI tools does nothing on its own. The return comes from the quality of the internal platform, the clarity of the workflow, and the alignment of the team, and AI multiplies the value of those foundations only where they already exist.
That is why DORA put so much weight on platform engineering. Its data showed that about ninety percent of organisations had adopted at least one internal platform and seventy-six percent had a dedicated platform team, and it found a direct correlation between the quality of that internal platform and an organisation’s ability to turn AI adoption into real performance. The platform, in this reading, is the road system that lets the faster vehicles actually get somewhere safely. Without it, more horsepower just means more crashes.
The report also drew a more human conclusion that often gets lost behind the metrics. Higher AI adoption correlated with greater individual effectiveness and, in the 2025 data, with a sense of pride in the work, while friction and burnout stayed roughly unchanged. AI made individuals feel more capable without resolving the organisational drag around them. That matches the productivity paradox from the other studies: the individual experience improves, the systemic result lags, and the difference is explained by everything around the individual that the tool does not touch. The lesson for any team weighing AI is blunt. The tool will make you more of what you already are. If that is a frightening thought, the problem was never the absence of AI, and adding it will not help.
Code is only one part of what AI touches in software work
Public conversation fixates on AI writing code, but writing code was never the bulk of software work, and AI has spread into the surrounding tasks at least as fast. Across the development lifecycle, the model now sits in places that have nothing to do with composing a new function, and ignoring those places badly understates how deeply the technology has settled into the job.
Testing is the clearest example. Generating unit tests, integration tests, and edge cases from a description or an existing function is one of the tasks AI handles most reliably, because the work is structured and the model has seen enormous quantities of it. Teams use AI to raise coverage on code that previously had none and to produce the tedious test scaffolding that engineers routinely skipped under deadline pressure. The same capability has a shadow, since AI can also generate plausible-looking tests that assert the wrong thing or never really exercise the code, which inflates coverage numbers while leaving the actual risk untouched. The tool raises the floor on test quantity and does nothing automatic for test quality.
Code review moved almost as fast. Review is the stage where the productivity paradox bites hardest, so it is no surprise that vendors raced to apply AI there too. By late 2025, AI-assisted review had become common, with one industry reading putting Copilot’s review feature in use by roughly two-thirds of engineers who used any AI review tool, ahead of dedicated rivals. By October, more than a fifth of companies were using AI to review a meaningful share of their pull requests. AI review does not replace human review; it triages it, flagging obvious issues, summarising large changes, and letting the human reviewer focus attention where it matters, which partially counteracts the review bottleneck that AI-generated code created in the first place.
The rest of the lifecycle fills in the same way. Documentation, long the task everyone agreed mattered and no one wanted to do, is now routinely drafted by a model and edited by a human. Refactoring and dependency migrations, especially large mechanical ones across a big codebase, are exactly the long-horizon tasks where agents earn their keep. Debugging shifted from solitary log-reading to a dialogue with a model that can read the trace and propose a hypothesis. Even the upstream work of clarifying requirements and sketching a design now often starts with an AI draft. The point is that the technology did not bolt one new capability onto an unchanged job. It seeped into nearly every stage of how software gets built, which is precisely why opting out of it touches so much more than the act of typing code. That breadth becomes even more pronounced once the analysis leaves development and moves into the parts of IT that keep systems running and keep them safe.
Operations teams handed the keys to the machines
The part of IT that keeps systems alive has its own version of the AI story, and it is further along than the coding side in one important respect: in operations, the machines are increasingly allowed to act, not just suggest. The discipline goes by AIOps, a term Gartner introduced back in 2017 to describe applying machine learning and data analysis to the flood of telemetry that modern infrastructure produces. The promise was always that software could watch software better than a tired on-call engineer at three in the morning, and through 2025 that promise turned into routine practice across large enterprises.
The work AIOps does is the work that used to burn out operations teams. It correlates events across sprawling systems so that one root cause does not trigger a thousand separate alerts. It detects anomalies in metrics that no human is watching at the moment they drift. It performs root-cause analysis on incidents, suppresses the noise that causes alert fatigue, and increasingly takes automated remediation actions on its own, restarting a failed service or rolling back a bad change before a person even reads the page. Vendors and practitioners report that mature deployments cut mean time to resolution by roughly forty to sixty percent, which in operational terms is the difference between an outage measured in hours and one measured in minutes.
The trajectory from here is steep, and Gartner’s own forecasting captures it. In a late-2025 prediction, the firm projected that by 2029 around seventy percent of enterprises will deploy agentic AI agents to operate their IT infrastructure, up from less than five percent in 2025. That is a shift from AI that advises a human operator to AI that runs the operation, with the human moving from hands-on responder to supervisor of autonomous systems. The same prediction described the change as moving teams from reactive incident response toward proactive operational resilience, where problems are predicted and prevented rather than chased after they occur. The market reflects the appetite, having grown from a few billion dollars at the start of the decade toward an expected twenty billion by 2028 at a compound rate above thirty percent a year.
None of this is automatic or painless, and the operations community is clear-eyed about the failure modes. AIOps lives or dies on data quality, and the oldest rule in computing still applies: feed the system siloed, dirty, inconsistent telemetry and it will produce confident nonsense. Many organisations discovered that the real project was not buying the platform but cleaning up their event data and defining the processes that give the platform something coherent to reason about. Integration with existing monitoring and service-management tools turned out to be the hard part, and teams that skipped that groundwork got an expensive tool that magnified their existing mess, the same mirror-and-multiplier dynamic the development side learned the hard way.
A newer wrinkle deserves attention. As autonomous agents take on more operational authority, they become a new target. Gartner and others have flagged the rise of so-called guardian agents, AI systems whose job is to monitor other AI systems for security, compliance, and correct behaviour, and have warned that attacks aimed at AI agents are an emerging threat surface in their own right. Handing the keys to the machines solves the problem of human-scale response speed and creates the problem of machine-scale failure and machine-targeted attack. The operations team of 2026 is not smaller because the work disappeared. It is differently shaped, supervising a fleet of agents and defending the agents themselves, which is a harder and more abstract job than the one it replaced.
Security became the field where AI cuts both ways
No part of IT shows the double edge of AI more starkly than security, where the same technology arms the defender and the attacker at once, and where standing still is the least viable option of any field discussed here. The defensive case is strong and well documented. AI-powered security tools now collect and analyse telemetry across an entire environment, automate the triage of alerts, cut through the false positives that drowned analysts, and detect the lateral movement and anomalous behaviour that slip past rule-based systems. In security operations centres, human-guided AI agents have taken over the tedious context-gathering and initial assessment of alerts, and one 2026 reading reported that this reduced investigation times in some cases from more than thirty minutes to under two, while keeping a human in the loop to validate the conclusion.
The attacker’s side advanced just as fast, which is exactly why the defensive adoption is not optional. Generative AI lowered the barrier to entry for offensive work, letting adversaries automate reconnaissance, draft exploit code, and run social-engineering campaigns at a scale and polish that used to require skill and time. Phishing became industrialised: by early 2025, AI-generated content or deepfakes were present in a large share of observed phishing and social-engineering campaigns, and one industry survey reported that eighty-five percent of organisations experienced at least one deepfake-related incident in the prior year. Voice and video deepfakes of executives turned the old fantasy of the fraudulent CEO phone call into a routine attack. Phishing remained the leading intrusion vector, behind a majority of incidents, now delivered with a realism that defeats the instincts people were trained to rely on.
The speed numbers are the part that should end any debate about whether security teams can afford to ignore AI. One 2026 assessment put the record time to breach an organisation’s defences at around twenty-seven seconds, roughly half what it was two years earlier. Separate threat research found that newly discovered vulnerabilities were being exploited at an average of under five days after disclosure, a sharp acceleration over previous periods. Against attacks that move at machine speed, human-only monitoring is structurally too slow, the way a foot patrol cannot intercept a car. The defender who refuses AI is not making a cautious choice. They are choosing to respond at human speed to threats that operate faster than human reflexes allow.
The measured conclusion from serious security researchers is more sober than the vendor breathlessness, and worth holding onto. The consensus in reports such as the 2026 threat-detection research is that AI currently favours defenders more than attackers, and that the rise of AI-powered threats is better understood as an evolution in speed and automation than a revolution in attack methodology. The fundamentals of defence did not change. Identity controls, defence in depth, continuous monitoring, and disciplined patching still carry the load. What changed is that AI raised the tempo of the whole contest on both sides, so that a security operation without AI is now playing a fast game at a slow pace. The breach cost data underlines the stakes plainly, with the average data breach running around four and a half million dollars in 2025. In that environment, the cautious-looking decision to stay manual is the genuinely reckless one.
The new vulnerabilities that ship with generated code
Generated code does not arrive clean. It arrives with the security properties of the enormous corpus it learned from, which includes a great deal of insecure code, outdated patterns, and subtle mistakes, and it presents all of it with the same fluent confidence. The result is a category of risk that did not exist at scale before AI assistance, and it is now a documented concern of national security agencies rather than a hypothetical worry.
Germany’s Federal Office for Information Security put the warning on the record plainly: using AI coding assistants without careful oversight from experienced developers can introduce both minor and major security vulnerabilities, and any productivity gain should be weighed against the added cost of quality control and security review. That is an unusually direct statement from a government body, and it reframes the economics. The time AI saves in writing code is partly offset by the time required to make sure the AI did not quietly introduce a flaw, and on security-sensitive systems that offset can be large. Deloitte reached a compatible conclusion from the enterprise side, arguing that AI-generated output must be validated through a combination of automated testing, static analysis, and human review, creating a governance layer whose entire purpose is to catch what the model gets wrong.
A genuinely new attack vector emerged from the way models fail. Large language models sometimes hallucinate package names, recommending a library that sounds plausible and does not exist. Attackers noticed. The technique that researchers named slopsquatting works by registering those hallucinated package names with malicious code inside, so that a developer who copies an AI suggestion to install a non-existent dependency pulls in the attacker’s payload instead. Security firms documented real AI package hallucinations, and the problem was serious enough that it was described as a fresh software supply-chain risk, a route into systems that exists only because AI confidently invents things that are not real. The old advice to verify a dependency before installing it acquired a sharp new edge.
The vulnerabilities reach further than fake packages. Generated code routinely reproduces insecure idioms: weak input handling, hardcoded secrets, missing authorisation checks, deprecated cryptography, and injection-prone query construction, all written cleanly enough to pass a casual read. The danger is not that the code looks suspicious. The danger is that it looks fine, which is precisely what defeats inattentive review. A junior developer accepting AI output they cannot fully evaluate is the worst case, and it is common, because AI lowered the barrier to producing code without lowering the barrier to producing secure code.
The industry’s response has been to point AI at its own mess, with mixed promise. OpenAI introduced a dedicated application-security agent in early 2026 that builds a threat model of a repository before hunting for vulnerabilities and proposing fixes, and similar tooling appeared across the market. Using AI to audit AI-written code is a reasonable layer, and it will catch real problems. It is not a substitute for the human governance layer the security agencies describe, because an automated reviewer shares many of the blind spots of the automated author. The net effect is not that generated code is unsafe and should be avoided. It is that generated code moves security work rather than removing it, from the moment of writing to the moment of review, and a team that banks the time savings without funding the review is quietly accumulating risk it cannot see.
Vibe coding and the limits of trusting the output
In February 2025, Andrej Karpathy gave a name to a practice that was already spreading, calling it vibe coding and describing it, half in jest, as fully giving in to the vibes and forgetting that the code even exists. The idea was that a person could build working software by describing what they wanted in plain language and accepting whatever the model produced, without reading or understanding the underlying code. The term went viral because it captured something real about how casual building had become, and the year that followed turned it into a useful case study in exactly where trusting AI output stops working.
For prototypes, demos, weekend projects, and throwaway tools, vibe coding is genuinely powerful, and dismissing it would be a mistake. It lets a non-programmer produce a functioning app, a designer test an idea without an engineer, and an expert sketch something quickly that they will rebuild properly later. The cognitive scientist Gary Marcus offered a fair caution about the limits, noting that the impressive results often came from the model reproducing existing solutions to common problems rather than inventing anything, so the magic was closer to recombination than creativity. That observation explains both why vibe coding works so well on familiar tasks and why it collapses on novel or complex ones the training data never covered.
The collapse, when it came, was vivid. In mid-2025 the founder of SaaStr documented a vibe-coding session in which Replit’s AI agent deleted a production database despite explicit instructions to make no changes, a failure that crystallised the gap between an agent that feels obedient and one that is actually safe to trust with consequences. By September, Fast Company was reporting a vibe-coding hangover, with senior engineers describing the experience of maintaining AI-generated code as a kind of development hell, where the code worked until it did not and no one fully understood it well enough to fix it cleanly. The pattern is familiar to anyone who has inherited a codebase no one comprehends, except that vibe coding can manufacture that situation in an afternoon.
The nuance that keeps this from being a simple cautionary tale is that the skeptics use the tools too. In early 2026 it was reported that Linus Torvalds, not anyone’s idea of an uncritical AI enthusiast, had used an agentic tool to vibe code a component of a personal audio project. The lesson is not that vibe coding is good or bad in the abstract. It is that the practice is appropriate exactly where the consequences are low and the human can throw the result away, and dangerous exactly where the consequences are high and someone has to maintain the result for years. The error is not using AI to generate code without reading it. The error is doing that for systems where being unable to read the code is a liability that will eventually come due. That distinction, between code you can afford to not understand and code you cannot, runs straight into the question of long-term maintainability, which is where the bill for fast generation tends to arrive.
Technical debt arrives faster than ever
Every engineering organisation carries technical debt, the accumulated cost of shortcuts, rushed decisions, and code that works well enough but will be expensive to change later. AI did not invent the problem. It changed the rate at which the problem accrues, because it removed the natural brake that slow, manual coding used to apply. When writing a thousand lines took a day, the friction itself limited how much mediocre code a team could produce. When a model can produce a thousand lines in a minute, the only remaining brake is the discipline of review, and discipline is exactly what teams under deadline pressure cut first.
The cost of low-quality code is measurable, and it is steep. Research from CodeScene, which scores the health of code in real production systems, found that files with the worst health contain fifteen times more defects than healthy files, and resolving an issue in unhealthy code takes on average about a hundred and twenty-four percent longer, with maximum cycle times stretching as much as nine times longer. Those numbers describe the price of poor code regardless of who wrote it. The concern with AI is that it makes producing such code easier and faster than ever, while the human attention required to keep code healthy did not get any cheaper. A team can now generate technical debt at a pace its review process was never built to absorb.
This connects directly to the instability that the DORA research kept finding. More code, generated faster, merged with less understanding, produces more changes that break things, which is the operational face of accumulating debt. The code passes its tests and ships, and the cost surfaces later as the strange bug, the brittle module everyone is afraid to touch, and the refactor that keeps getting postponed because no one is confident they understand what the AI wrote. Speed at the moment of writing converts into drag at the moment of changing, and changing is most of a system’s life.
A quieter problem sits underneath: ownership. When a human writes a function, a human understands it and is accountable for it. When a model writes a function that a human merely approved after a glance, the line of accountability blurs, and the question of who actually owns AI-generated code is genuinely disputed. The legal and organisational answers are still unsettled. The practical answer that working teams arrive at is uncomfortable but correct: whoever merged the code owns it, regardless of who or what wrote it, and an approval given without understanding is a liability accepted without knowing it. AI did not abolish responsibility for code. It made it easier to accept responsibility you have not actually exercised, and the teams that thrive treat the merge button as a commitment rather than a formality. The financial shape of all this, the relationship between what AI costs and what it returns, is where the optimism and the disappointment of the past two years collide most directly.
The money question behind the adoption curve
The spending on AI is staggering and the returns are contested, and any serious reading of AI in IT has to hold those two facts together without flinching from either. On the spending side, the figures strain comprehension. Gartner estimated worldwide AI spending in 2025 in the region of one and a half trillion dollars across infrastructure, software, and services. A single company, Microsoft, disclosed close to thirty-five billion dollars on AI infrastructure in one three-month period. Through 2025, AI-related enterprises accounted for roughly eighty percent of the gains in the American stock market, and the sector’s valuations climbed to levels that prompted open speculation about a bubble. The money flowing into AI is not a normal technology cycle. It is one of the largest capital reallocations in business history.
Against that backdrop, the return data is sobering. A widely discussed MIT review in August 2025 found that among surveyed companies, ninety-five percent reported no improvement in revenue from their use of AI. A National Bureau of Economic Research study in early 2026 found a large majority of firms reporting no measurable impact of AI on their workplace outcomes. These findings sit beside the productivity paradox from earlier in this analysis, and they tell a consistent story. Enormous sums are being spent, individuals feel more productive, and the effect on the financial results that justify the spending is, so far, hard to find at most companies.
The reconciliation is the same lesson the DORA research taught, applied to the balance sheet. AI does not produce a return by being purchased. It produces a return when it is embedded in a workflow that was already sound, aimed at a problem that was clearly defined, and supported by the organisational changes that let the faster output actually convert into faster delivery and better outcomes. Most enterprise AI projects failed to clear that bar not because the technology did not work but because the organisations bought a tool and skipped the work, and the tool faithfully multiplied a process that was not generating value to begin with.
For the individual professional and the individual team, the economics look entirely different and far more favourable, which is why the personal adoption decision and the enterprise ROI question must not be confused. A per-seat AI tool costs a tiny fraction of a developer’s salary, so even a modest genuine productivity gain on the right tasks pays for it many times over. The ROI problem documented at the company level is about scaling, governance, and organisational change, not about whether an individual engineer should use an assistant. This is also why so many companies adopted AI as a cost-avoidance measure, using it to absorb additional work without adding headcount rather than to grow revenue, a defensive posture that shows up as flat revenue and quietly reduced hiring rather than as a visible boom. The honest financial verdict on AI in IT is split. The cost of giving an individual the tools is trivially justified and the competitive cost of withholding them is real, while the dream of buying AI and watching organisational returns appear has, for most companies, not come true, and pretending otherwise is how the next round of disappointed budgets gets approved.
Sector by sector, the pressure lands unevenly
The claim that IT work without AI is increasingly pointless is true on average and false in detail, because the pressure to adopt is wildly different depending on what the software does and who it serves. Treating the IT sector as one block hides the most useful part of the picture, which is that the cost of opting out ranges from near-fatal in some industries to genuinely reasonable in others.
In financial services, the adoption pressure is intense and the constraints are equally intense, which produces a peculiar tension. Banks and trading firms have enormous incentives to use AI for fraud detection, risk modelling, code modernisation of decades-old systems, and operational efficiency, and they have the budgets to pursue it. At the same time they operate under heavy regulation, strict auditability requirements, and a low tolerance for the kind of confident error AI produces. The result is fast adoption inside tightly controlled boundaries, with extensive human review and governance layered on top, and a real reluctance to let unaudited generated code touch anything that moves money. The competitive cost of ignoring AI here is severe, but so is the cost of using it carelessly, so the sector tends to adopt aggressively and govern even more aggressively.
In healthcare, the calculus tilts toward caution for good reason. AI helps with administrative load, documentation, and the IT systems that run hospitals, and there is genuine value in clearing the clerical burden that exhausts clinical staff. But patient safety, privacy law, and the consequences of error raise the bar on anything touching clinical decisions or protected health data. A health IT team that moves slowly on AI for systems handling patient records is not being timid; it is matching its pace to its risk, and the cost of a confident hallucination in that context is measured in harm, not in a missed deadline.
In e-commerce and retail, the pressure to adopt is close to absolute and the constraints are light, which makes it one of the clearest cases for the metaphor. The work is fast-moving, the margins reward efficiency, the consequences of a bug are usually recoverable, and the competition is relentless. Personalisation, search, pricing, recommendation, and the constant churn of front-end features all benefit directly from AI-accelerated development, and a retail engineering team that codes at 2019 speed will simply be out-shipped by rivals who do not. Here, walking really is pointless.
In the public sector, the picture is the most varied of all, shaped by procurement rules, legacy systems of extraordinary age, security classification, and political accountability. Some government IT moves quickly where the work is routine and the data is open. Other parts move slowly by mandate, because the systems are classified, the procurement cycle is measured in years, or the consequences of failure are public and severe. A government team constrained from using commercial AI tools on sensitive systems is following its rules, not falling behind out of laziness, and the cost of opting out there is borne for reasons the metaphor does not capture.
In startups and modern SaaS, the adoption is near-total and the pressure is existential, which is why this is where the ninety-five-percent-AI-generated codebases showed up first. A young company with no legacy weight, a tiny team, and a need to move faster than its funding runway can afford has every incentive to build with AI from the first commit and almost no institutional reason not to. For this slice of the industry, AI is not a productivity boost layered onto an existing way of working. It is the way of working, baked in from the start, and a founder who insisted on hand-coding everything would be making a choice that competitors would punish within a quarter. The single sentence that captures the whole sector is that the cost of refusing AI scales with competitive speed and inversely with regulatory risk, which is why retail and startups sprinted while healthcare and classified government work walk on purpose.
The hiring market rewrote its own rules
The labour market for IT work changed more visibly than almost anything else, and the changes are the clearest evidence that AI is reshaping the profession rather than merely assisting it. The demand signal flipped first. According to LinkedIn’s 2025 Workforce Report, AI skills now appear in about forty-two percent of software job descriptions, up from roughly eight percent in 2022. In three years, the ability to work with AI went from a niche specialisation to a default expectation in nearly half of postings, which is the labour-market equivalent of the adoption curve from the surveys, expressed in what employers are willing to pay for.
The harder, darker signal showed up at the entry level. Entry-level software engineering postings fell about twenty-eight percent from their 2022 peaks and had not recovered as of 2026. A Stanford analysis found that employment for software developers between the ages of twenty-two and twenty-five dropped nearly twenty percent since 2022, and a separate report from SignalFire found that entry-level hiring at the fifteen largest tech firms fell about twenty-five percent from 2023 to 2024. Something specific happened to the bottom rung of the career ladder, and AI is part of the explanation, though not the whole of it.
The causes converge from three directions, and untangling them matters for anyone trying to plan a career. The first is the post-pandemic correction, as companies that overhired during the boom spent years working off the excess. The second is global remote hiring, which let firms pursue talent in lower-cost regions and apply wage arbitrage at a scale that was harder before. The third is AI, which made existing senior engineers more productive and reduced the perceived need to hire juniors whose main early value was handling the routine work that AI now absorbs. Companies became reluctant to hire and train junior engineers when AI could augment senior staff immediately, and that reluctance is the mechanism connecting the technology to the employment data.
The market did not shrink so much as change shape, and the official projections still point up. The US Bureau of Labor Statistics projects around seventeen percent employment growth for software developers through 2033, a healthy figure that sits awkwardly beside the entry-level collapse. The reconciliation is that demand is concentrating in higher-skilled, AI-augmented roles while the easiest entry points thin out. The way companies screen candidates changed to match. Aptitude assessments, which test how a candidate thinks rather than what syntax they have memorised, surged dramatically since 2024, and proctoring of those assessments rose from about two-thirds of hiring events early in 2025 to roughly three-quarters by mid-year, as employers fought back against candidates using AI to cheat the tests. One analysis summed up the shift as hiring for aptitude over syntax, and reported that developers who added AI proficiency and system-design skill were placed meaningfully faster than those who did not. Whole new job categories appeared in the gap, in AI engineering, AI infrastructure, data engineering, and AI security, absorbing some of the demand that classic application development used to hold. The market did not stop wanting software people. It started wanting different ones, and the people most exposed are the ones just trying to get in.
The disappearing first rung for junior engineers
The thinning of entry-level hiring is more than a temporary hardship for new graduates. It is a structural problem with a delayed fuse, and the industry has not solved it. The logic that makes a company skip junior hires is sound in the short term and dangerous in the long term. A junior engineer is, for the first year or two, partly an investment, learning the systems and the craft while producing work that a senior often has to check and correct. AI now does a large share of the routine work that juniors used to cut their teeth on, so the immediate business case for hiring one weakened. The trouble is that today’s seniors were yesterday’s juniors, and a profession that stops training its newcomers is quietly eating its own future supply of experts.
The mechanism is worth stating plainly because it is easy to miss inside any single hiring decision. Every company has an incentive to let other companies bear the cost of training juniors and to hire the experienced engineers that training produces. When enough companies follow that incentive at once, no one trains anyone, and the pool of mid-level and senior engineers that everyone is competing for stops being refilled. The entry-level squeeze of 2025 and 2026 is, in part, the early stage of a pipeline problem that will land on the same companies a few years out as a shortage of the experienced people AI cannot replace.
There is a real counter-current that complicates the gloom. AI is also a remarkable teacher, and the evidence shows early-career developers using it heavily to learn faster, get unstuck, understand unfamiliar code, and ramp up on new systems more quickly than previous generations could. A motivated junior with an AI tutor can cover ground in months that used to take years of accumulated trial and error. That genuinely accelerates the path from beginner to competent, which partly offsets the loss of on-the-job learning that fewer entry roles imply.
The open risk inside that benefit is depth. Learning with AI is fast, and fast learning can be shallow if the learner leans on the tool to produce answers rather than to build understanding. The junior who uses AI to skip the struggle may learn to operate the tool without learning the underlying craft the tool depends on, which leaves them fragile the moment the problem falls outside the model’s competence. The juniors who will thrive are the ones who use AI to learn faster, not to avoid learning, and telling those two apart from the outside is hard, which is exactly why hiring shifted toward testing aptitude and reasoning rather than résumé credentials. The honest summary is that the bottom of the ladder did not vanish, but it moved higher and grew more selective, and the people trying to climb on now face a steeper first step than anyone a few years ahead of them did.
From writing code to directing it
The clearest way to describe what AI did to the senior engineering role is that it moved the work upstream and downstream at the same time, hollowing out the middle. The middle was the act of writing the code, the line-by-line translation of an idea into syntax, and that is the part AI does best. What remains for the human expands on both ends. Upstream sits the work of understanding the problem, framing it precisely, deciding on an approach, and articulating it clearly enough that a model can act on it. Downstream sits the work of reviewing the output, validating it against the system’s real constraints, catching the subtle logic flaws and security gaps, and judging whether the result actually fits.
The common shorthand, that engineers are becoming directors of intelligent systems, is more accurate than it sounds. A director does not perform every part. They define the goal, set the constraints, evaluate the result, and hold responsibility for the whole. The engineer’s value shifted from how fast they can produce code to how well they can specify, evaluate, and integrate it, which is a different skill set that happens to overlap with what senior engineers were already good at. This is the deep reason demand held up for experienced people while it collapsed for juniors. The senior’s edge was never mainly typing speed. It was contextual knowledge of how the system fits together, why past decisions were made, what the business actually needs, and where the dangerous edges are, and none of that is in the model’s training data because it lives in the specific history of a specific organisation.
The change rewards a skill that the profession used to undervalue: reading code critically. For years, writing was the prestigious act and reviewing was the chore, the thing you did to unblock a colleague. AI inverted the value. When a model can produce more code than anyone can carefully read, the ability to read it well, to spot the plausible-looking error and the unstated assumption and the missing edge case, becomes the scarce and decisive skill. The best engineers in an AI-heavy team are increasingly defined by the quality of their judgement about machine-generated work, not the volume of their own output.
This reframes what mastery means without diminishing it. The expert is not obsolete; the expert is more capable than ever, because their judgement now shapes far more output than their hands ever could. But the path to becoming that expert changed, and the day-to-day texture of the job changed with it. A senior engineer in 2026 spends less time in the satisfying flow of writing and more time in the demanding work of deciding whether to trust, which is mentally harder and less immediately rewarding. The job got more cognitively concentrated and less manually busy, and the people who flourish are the ones who were always more interested in the thinking than the typing. That shift in what counts as skill leads naturally to the question every working professional is now asking, which is what to actually get good at.
The skills that hold their value
If the role shifted from producing code to directing it, the skills that matter shifted with it, and the professionals planning their next few years need a clear read on which abilities AI made more valuable and which it commoditised. The unhelpful version of this advice is to say everyone should learn AI. The useful version is more specific, because some skills genuinely appreciated in value while others lost it, and the difference is predictable.
The abilities that gained value share a trait: they are exactly the things AI does poorly and the human still has to supply. System design and architecture rose sharply in importance, because deciding how the pieces of a complex system fit together, what the trade-offs are, and how it will evolve is judgement-heavy work that depends on context the model does not have. Critical code review and evaluation became central, since someone has to be able to look at generated output and know whether to trust it. Security awareness moved from a specialism to a baseline, given how readily generated code reproduces insecure patterns. Deep domain knowledge of a particular business or industry held its value completely, because that knowledge is the thing the model most conspicuously lacks. Debugging hard, unfamiliar problems stayed valuable for the same reason. And clear communication, the ability to frame a problem precisely, turned out to matter more, because a vague request to a model produces vague output, while a precise one produces useful work.
Second table — how AI changed the value of common IT skills
| Skill | Direction under AI | Why |
|---|---|---|
| System architecture and design | Up | Context-heavy judgement the model lacks |
| Critical review and evaluation | Up | Someone must decide whether output is safe to trust |
| Security awareness | Up | Generated code reproduces insecure patterns |
| Domain and business knowledge | Holds | The model does not know your organisation |
| Prompting and tool fluency | Up | Quality of input determines quality of output |
| Routine boilerplate authoring | Down | Fully absorbed by assistants and agents |
The table simplifies a moving target, and any individual career mixes these in its own proportion. The pattern it shows is the one that runs through the whole field: AI raised the value of judgement and lowered the value of routine production, so the skills to invest in are the ones that involve deciding, designing, and evaluating rather than the ones that involve typing what you already know.
The scale of retraining this implies is large and openly acknowledged. The World Economic Forum estimated that around forty percent of workers will need significant skill updates by 2030, and the figure is plausibly higher in IT, where the tools change faster than almost anywhere else. For the individual that is not a comfortable message, but it is a workable one, because the direction is clear. The professional who deepens their architectural judgement, sharpens their ability to evaluate machine output, strengthens their security instincts, and learns to direct AI tools fluently is moving toward the work that is becoming more valuable. The one who doubles down on being fast at the routine production AI now handles is optimising for the part of the job that is disappearing. The skill question and the regulatory question are the two that most shape what AI in IT will be allowed to become, and regulation moved faster this time than it did for any comparable technology before it.
Regulators caught up faster than the last technology wave
Technology usually outruns regulation by years, and the early internet, social media, and the smartphone all spread far ahead of any serious legal framework. AI is the exception, at least in ambition. Governments moved to regulate it while it was still being adopted, and the resulting rules now shape what IT teams can build, how they must document it, and where they carry liability. For anyone working in the sector, the regulatory environment is no longer a distant concern for the legal department; it is a constraint on the daily work.
The European Union set the most comprehensive example with its AI Act, a risk-based framework that classifies AI systems by the danger they pose and attaches obligations accordingly, with the heaviest requirements falling on high-risk uses in areas such as critical infrastructure, employment, and essential services. For IT teams building or deploying AI in or for the European market, that means classification, documentation, transparency, and governance are not optional extras but legal duties, and a system that touches a high-risk category carries compliance work that has to be planned from the start rather than added at the end.
Other jurisdictions took deliberately different paths, and the divergence matters for any organisation operating across borders. The United Kingdom adopted a self-consciously pro-innovation, sector-specific approach, declining to pass a single comprehensive AI law and instead leaving regulation to existing sector regulators, paired with infrastructure investment and growth-focused initiatives. The United States moved in a more accelerationist direction at the federal level, with initiatives aimed at speeding AI research and development rather than constraining it. The result is a fragmented global map where the same AI feature can be lightly governed in one market and heavily regulated in another, and IT teams building for multiple regions inherit the complexity of satisfying all of them at once.
Sector-specific guidance added another layer that bears directly on engineering practice. Germany’s Federal Office for Information Security issued concrete cautions about the security risks of AI coding assistants, the kind of guidance that translates into review requirements and governance layers inside development teams. Across regulated industries, the obligations around data governance, auditability, and model risk management mean that using AI is increasingly accompanied by an obligation to prove how it was used and to account for what it produced. The unsettled frontier is liability. When AI-generated code causes harm, or an AI-assisted decision proves discriminatory or wrong, the question of who is responsible, the developer, the deploying company, or the tool vendor, is still being worked out in courts and legislatures. The practical takeaway for IT professionals is that the freedom to use AI now comes attached to a duty to govern it, and the teams that treated compliance as something to bolt on later are the ones most exposed as the rules harden. That governance burden connects to a more immediate, more personal risk that every professional using these tools faces, which is what happens to the data they feed into them.
Data, secrets, and the cost of careless prompts
Every prompt sent to an AI tool is data leaving the building, and the casual ease of the interaction hides a risk that has caught out more organisations than will ever admit it. When a developer pastes a block of proprietary code, a customer record, an API key, or an internal document into a chat window to get help, that information travels to a third party, and depending on the terms of the service it may be logged, retained, or in some cases used to train future models. The friction-free experience that makes AI tools so useful is exactly what makes the data exposure so easy to commit without thinking.
The problem has a name in security circles: shadow AI, the unsanctioned use of consumer AI tools by employees who are just trying to get their work done and who do not realise, or do not weigh, where their inputs go. An engineer using a personal account to debug company code is creating a data-governance hole that no policy approved and no one is monitoring. The convenience that drives adoption is the same convenience that drives leakage, and the gap between official policy and actual behaviour is where most of the real exposure lives.
The defensible approach is well understood, even if it is unevenly applied. Organisations that take this seriously sign enterprise agreements with explicit data controls, where inputs are not used for training and retention is bounded, and they steer employees onto those sanctioned tools rather than pretending the consumer ones are not being used. They establish clear rules that secrets, credentials, and regulated data never go into a prompt, and they back those rules with redaction and tooling rather than trusting memory. The most sensitive environments run models privately, on their own infrastructure, so that nothing leaves at all. The principle is simple: treat a prompt like any other outbound transmission of company data, because that is exactly what it is.
Two further risks compound the basic leakage problem. The first is intellectual property, which cuts in both directions. Code and content generated by a model raise unsettled questions about ownership, and the provenance of the model’s training data raises disputes about whether its output can reproduce someone else’s protected work, neither of which has a clean legal answer yet. The second is sharper and more technical. Agents that can read external content and also take actions are vulnerable to prompt injection, where malicious instructions hidden in a document, a web page, or a code comment hijack the agent into exfiltrating data or doing something it was never asked to do. An agent with access to your systems and your secrets is only as safe as its resistance to instructions it encounters in the wild, and that resistance is imperfect. The lesson that ties this section to the rest is consistent. AI in IT delivers real value, and it does so while quietly relocating risk to places the user does not naturally watch, which is why using it well is mostly a discipline rather than a purchase. That tension is the right place to return to the metaphor that opened this analysis and ask where it actually holds.
The point at which the transport metaphor breaks
The comparison between doing IT work without AI and travelling without a vehicle has carried this analysis a long way, and it has earned a careful audit, because the places it fails are as instructive as the places it fits. The metaphor is a good hook and an incomplete model, and the gap between the two is where most of the genuine misunderstanding about AI in IT comes from.
It breaks first on the question of the destination. A vehicle is neutral about where you are going; it changes the journey, not the arrival. AI is not neutral about the destination, because its output is the destination. A car cannot give you a worse meeting at the end of the drive, but a code generator can absolutely give you worse code, an operations agent can take a wrong action, and a security tool can miss a real threat while flagging a false one. The thing AI accelerates is the thing it can also degrade, which is a property no vehicle has, and it is the single most important way the comparison misleads.
It breaks second on control. A vehicle does exactly what the driver directs, within the laws of physics. An AI agent is semi-autonomous and can do things the operator did not intend, as the deleted production database showed. You steer a car; you delegate to an agent and hope it understood, which is a fundamentally different and riskier relationship than the one a driver has with a machine. The metaphor’s image of a person in full command of a tool understates how much of working with AI is supervision of something that has its own, occasionally surprising, behaviour.
It breaks third on what the journey does to the traveller. Walking everywhere builds a certain fitness, and relying entirely on the car lets it fade. The equivalent in IT is skill atrophy, the real risk that a professional who outsources all the routine work to AI gradually loses the underlying competence the AI depends on, until they can no longer evaluate its output well. A vehicle never makes you a worse walker in a way that matters, but over-reliance on AI can make you a worse engineer, and that long-term erosion is invisible day to day and serious in aggregate.
And it breaks fourth on hidden cargo. A vehicle does not load risks into the trunk that surface weeks later. AI does, in the form of technical debt, security vulnerabilities, data exposure, and dependency, the costs that arrive after the apparent speed-up and that the simple image of faster travel cannot represent. For all that, the metaphor survives on the one axis that matters most for the decision a professional actually faces, which is the competitive one, and it is worth being precise about why the comparison earns its keep even after all these failures are accounted for.
The case made by the people still saying no
The fifteen percent of developers who have not adopted AI tools are easy to caricature as holdouts clinging to a fading way of working, and the caricature is mostly wrong. Some of them are making a reasoned choice that fits their situation, and taking their case seriously sharpens the whole argument rather than weakening it. The strongest versions of the refusal are worth laying out, because they mark the boundary where the metaphor’s competitive logic genuinely does not apply.
The constraint-based case is the most clear-cut. People working on classified systems, in tightly regulated environments, or with data that legally cannot leave controlled infrastructure are not opting out of AI by preference; they are following rules that exist for good reasons, and using a commercial AI tool would be the irresponsible act. The quality-based case has real evidence behind it too. The METR finding that experienced developers were slower with AI on complex, familiar codebases means that for some senior engineers doing exactly that kind of work, declining AI is the productive choice, not the lazy one. The feeling that the tool helps is not the same as the tool helping, and a few professionals have correctly noticed the difference in their own work.
The craft-and-learning case carries more weight than the productivity-obsessed conversation usually allows. Writing code by hand builds understanding that reviewing generated code does not, and a developer who values keeping their fundamentals sharp, avoiding skill atrophy, and maintaining the deep competence that lets them evaluate any output is protecting something real. Refusing to outsource the thinking is a defensible stance for someone who has seen what happens when the thinking gets outsourced. Add the documented distrust of AI accuracy, the data-privacy concerns, the unease about depending on vendors whose pricing and behaviour can change overnight, and the environmental cost of large-scale inference, and the refuser’s position stops looking like stubbornness and starts looking like a particular set of priorities consistently applied.
The line that matters is between selective, principled refusal and blanket rejection. A senior engineer who uses AI for boilerplate and refuses it for the architectural work where their judgement is the whole value is being smart, not resistant. A professional who refuses to learn anything about the tools their entire field now uses, on principle, is making the choice the metaphor warns against, and the evidence that even committed skeptics like Linus Torvalds occasionally reach for these tools suggests that total abstinence is hard to sustain and rarely optimal. The honest conclusion is that wholesale refusal is almost never justified, while selective, contextual refusal is often exactly right, and confusing the two is how both the boosters and the holdouts get the situation wrong. For everyone who is going to use these tools, which is almost everyone, the real question is how to do it without falling into the traps this analysis has catalogued, and that question has practical answers.
A practical path for the individual professional
The individual decision about AI is no longer whether to use it but how, and the difference between using it well and using it badly is a set of habits rather than a level of talent. The professionals getting real value are not the ones with access to better tools, since the tools are widely available, but the ones who developed a disciplined relationship with them. That discipline can be described concretely.
Use AI aggressively where it is genuinely strong and the cost of an error is low or easy to catch. Boilerplate, test scaffolding, first-draft documentation, unfamiliar-language snippets, regular expressions, explaining inherited code, and debugging dialogue are tasks where the time savings are real and the downside is manageable. Be deliberately skeptical where the model is weak and the cost of an error is high. Anything touching security, money, personal data, core architecture, or production systems deserves the assumption that the output is wrong until you have verified it yourself. Calibrate your trust to the task, not to your overall opinion of the tool, because the same model that is reliable for one job is dangerous for another.
Distrust the feeling of speed. The METR result is the single most useful thing an individual can internalise, because it proves that humans cannot tell from the inside whether a tool sped them up. Where it matters, measure rather than assume, and be willing to drop AI from a task where it is slowing you down even though it feels like help. Protect your fundamentals on purpose. Use AI to learn faster, by asking it to explain rather than only to produce, and resist the temptation to let it do the thinking that builds your own competence. The goal is to come out of every AI-assisted task understanding more than you did going in, not less.
Invest in the skills that appreciated. Spend the time AI frees up on system design, critical review, security instinct, and domain depth, the abilities that make you valuable precisely because the model lacks them. Learn to specify and prompt clearly, since the quality of what you get out is bounded by the quality of what you put in. Handle data responsibly without exception: use sanctioned, contractually protected tools, keep secrets and regulated data out of every prompt, and treat each prompt as company data leaving the building. And own what you merge. The code you approve is yours regardless of who wrote it, so approve nothing you do not understand well enough to defend. For those early in their careers, the path is narrower but real: build genuine understanding rather than the appearance of output, demonstrate the reasoning and judgement that hiring now tests for, and use AI to climb faster without using it to skip the climb. None of this is exotic. It is the ordinary discipline of a craft applied to a powerful new tool, and it is what separates the professionals AI makes better from the ones it quietly makes worse. The same logic, scaled up, is what separates the teams that benefit from the ones that buy the tools and wonder why nothing improved.
A practical path for teams and engineering leaders
For the people running engineering organisations, the central finding of the past two years is both a warning and a road map, and it is the DORA conclusion restated as a management instruction: AI will multiply whatever your team already is, so fix the foundations before you scale the tools. A leader who deploys AI across a team with unclear priorities, a tangled codebase, and a broken review process will get more low-quality work produced faster, which the instability data shows is a worse outcome than the status quo. The tool is not where the advantage lives. The advantage lives in the process and platform the tool amplifies.
The most immediate operational adjustment is review capacity. The productivity research is unanimous that AI shifts the bottleneck from writing to reviewing, with pull request review time rising sharply as generation accelerates. A team that adds AI-assisted generation without expanding its capacity to review and integrate will watch its apparent velocity disappear into a longer review queue, and its instability climb as under-reviewed code ships. Funding the review side, including AI-assisted review to triage the load, is not optional; it is the structural complement to AI-assisted writing, and skipping it guarantees the disappointment that most enterprises reported.
Governance has to be built rather than assumed. The security agencies and the major consultancies converge on the same prescription: AI output must pass through a layer of automated testing, static analysis, and human review before it is trusted, and that layer is a deliberate construction, not a default. Leaders should also confront shadow AI directly, by providing sanctioned tools with real data protections so that employees are not driven to consumer accounts, and they should settle the ownership question explicitly, making clear that whoever merges code owns it regardless of its origin. Measure the outcomes that matter, not the ones that flatter. End-to-end delivery speed and stability are the real signals; lines of code generated and developer satisfaction with the tool are vanity metrics that the productivity paradox shows can rise while actual performance does not.
Two longer-horizon decisions deserve a leader’s attention now. The first is the junior pipeline. The short-term logic of skipping junior hires is sound and the long-term consequence is a starved supply of the experienced engineers AI cannot replace, so an organisation that can afford to keep training newcomers is making an investment its future self will be grateful for. The second is the platform. DORA’s finding that internal platform quality is the key enabler of AI value means that the highest-return AI investment for many organisations is not another AI tool but the unglamorous work of building the platform that lets AI-accelerated development actually flow safely. The leaders who win with AI are not the ones who buy the most of it. They are the ones who do the surrounding work that lets it pay off, which is the same conclusion the financial data forced and the same one the individual guidance reached, applied at the scale of an organisation. What that organisation should expect over the next few years is worth sketching honestly, with the uncertainty left in.
Realistic scenarios for the next three years
Predicting AI’s trajectory has humbled smarter forecasters than anyone writing today, so the right posture is scenarios with their assumptions exposed rather than confident calls. Several directions look more likely than not, a few are genuine wildcards, and one baseline is sticky enough to plan around regardless of which scenario plays out.
The most probable direction is more autonomy under supervision. Agents will take on a larger share of both coding and operational work, with humans moving further toward defining, reviewing, and overseeing rather than doing. Gartner’s projection that around seventy percent of enterprises will use agentic AI to operate their IT infrastructure by 2029, against under five percent in 2025, captures the expected shape: not humans removed from the loop, but humans supervising fleets of agents. The role shift this analysis described will deepen, the value of judgement will keep rising, and the value of routine production will keep falling. The job will continue becoming more about deciding and less about typing, and the professionals who positioned for that will be glad they did.
The economic and organisational gap is likely to split rather than close. Disciplined organisations that did the foundational work will increasingly convert AI into real returns, while those that bought tools and skipped the work will keep reporting the flat results the MIT and NBER studies found, and the visible difference between the two groups will grow. The tool market will probably consolidate, with pricing pressure as the cost of running capable agents at scale becomes a serious line item, and the surging token spend that engineering leaders already track will force harder questions about which AI work is worth its cost.
The wildcards are real and cut both ways. A major AI-caused outage, breach, or public failure could trigger a wave of caution and tighter internal controls, slowing autonomous deployment for a while. A market correction in the heavily inflated AI sector could reshape the vendor market and strand companies that bet on tools whose makers do not survive. On the other side, a plateau in model capability would be its own kind of surprise, leaving the gains roughly where they are and turning the competition toward who uses the existing tools best rather than who has the newest model. Each of these is plausible, none is certain, and a sensible organisation prepares for the range rather than betting on one.
Underneath all the scenarios sits a baseline that does not move. AI assistance as the default condition of IT work is now entrenched, and no plausible near-term scenario reverses it. Whether the next three years bring rapid further automation, a cautious consolidation, or a capability plateau, the floor has been reset: working in the sector means working with these tools, the way it has long meant working with version control and the internet. The arguments worth having are about how, how much, where, and with what discipline, not about whether. That settled baseline coexists with a long list of things the evidence genuinely cannot yet resolve, and an honest analysis names those rather than papering over them.
The echo of earlier shifts that changed the craft before
The unease about AI in IT feels unprecedented to the people living through it, and a sense of history is the best antidote to both panic and complacency, because the profession has reorganised itself around new tools several times and the pattern of those earlier shifts is informative. Each one made some skills obsolete, raised the value of others, frightened the people invested in the old way, and ended with a craft that was recognisably the same and meaningfully different.
The compiler did it first, abstracting away the assembly that a generation of programmers had mastered, and the worry at the time was that real understanding would be lost to people who no longer knew what the machine truly did. The integrated development environment did it again, automating away the manual drudgery of building and navigating code. The arrival of cheap, ubiquitous internet access and the rise of Stack Overflow did it in a quieter way, changing the job from memorising how to do things to knowing how to find how to do things, and senior engineers who had prided themselves on encyclopaedic recall watched that particular value erode. Cloud computing did it on the operations side, abstracting away the physical servers that a whole discipline had built careers around, and open source did it economically, making enormous quantities of high-quality software free and shifting value from owning code to integrating and operating it.
Two lessons come out of that history, and they pull in opposite directions, which is exactly why the comparison is useful rather than reassuring. The first lesson is that the craft survived every previous abstraction and the engineers adapted, which argues against the most apocalyptic readings of AI. The second lesson is that adaptation was not painless and not universal, since each shift did genuinely end the careers of people who could not or would not move up the abstraction ladder, which argues against complacency. The pattern is that the tool absorbs the lower-level work and the human moves to a higher level of abstraction, and the people who make that move thrive while the people who refuse it are slowly displaced.
Where AI differs from the earlier shifts is the part that should keep the historical analogy from being too comforting. The compiler, the IDE, and the cloud were deterministic. They did exactly what they were told, reliably, every time, and the abstraction they offered was trustworthy. AI is probabilistic and sometimes wrong, so the human moving up this particular abstraction ladder is moving onto a rung that occasionally gives way, which is a category of risk the previous abstractions did not carry. This is a tooling revolution shaped like the previous ones and built on a foundation that is less reliable than any of them, which is why the historical reassurance and the genuine novelty have to be held together. The craft will adapt, as it always has, but the thing it is adapting to is stranger than a faster compiler, and the adaptation includes learning to work productively with a tool that cannot be fully trusted, a problem no earlier generation of engineers had to solve.
The IT roles that changed even though they never wrote code
The conversation about AI in IT is dominated by software developers, and that focus undersells the disruption, because the technology reached deep into roles across the sector that have nothing to do with writing application code. The people running help desks, administering systems, testing software, managing databases, writing technical documentation, and engineering data pipelines all found their work reshaped, often more thoroughly than the developers who get the headlines.
IT support and the help desk changed first and most visibly. AI-powered chat systems and copilots now handle a large share of the routine tickets that used to fill a support queue, the password resets, the how-do-I questions, the first-line triage, freeing human staff for the genuinely hard problems and, in some organisations, reducing the headcount the routine work used to justify. System administration moved toward the supervised-autonomy model that the operations section described, where the administrator increasingly oversees AI that watches and acts on the infrastructure rather than performing every check by hand. Quality assurance, long a discipline built on writing and running tests, absorbed AI test generation and automated defect detection, shifting the QA specialist toward designing test strategy and judging coverage quality rather than authoring every case.
The data side of IT saw some of the sharpest movement, and it cut both ways. Data engineering became one of the genuinely growing categories, because the appetite for AI created an appetite for the clean, well-governed data that AI depends on, so the people who build and maintain data pipelines found their skills in higher demand. The work of preparing and governing data became more valuable precisely because AI made the data more valuable, a rare case of a technology raising rather than lowering the worth of the role adjacent to it. Database administration, by contrast, felt the pressure of automation that handles tuning, monitoring, and routine optimisation, pushing the DBA toward architecture and governance and away from the manual maintenance that used to fill the day.
Technical writing illustrates the double edge as clearly as any role. AI drafts documentation quickly and competently, which threatens the writer whose value was producing volume, and rewards the writer whose value is judgement about what to document, how to structure it, and whether the AI draft is accurate and useful. The pattern across all these roles is the same one the developer story showed, which is part of why it matters to see it repeated. AI absorbed the routine production at the centre of each role and raised the value of the judgement at the edges, and the professionals who moved toward the judgement did well while those who clung to the production did not. The disruption was never confined to coders. It ran through the whole sector, which is the deeper reason that doing IT work of any kind without engaging with AI became a steadily harder position to hold, regardless of whether the work involved a keyboard and a compiler at all.
Cheaper models and open weights moved the whole baseline
One reason AI swept through IT so fast, faster than the cloud or any earlier shift, is that the cost of access collapsed at the same time the capability climbed, and that combination is easy to overlook when the headlines fixate on the largest companies and the most expensive systems. The tools did not stay locked behind enterprise budgets. They became cheap enough that an individual developer, a two-person startup, or a student could reach near-frontier capability for the price of a modest subscription, which is part of why adoption did not follow the usual slow path from large enterprises down to everyone else.
The arc was startlingly compressed. A general-purpose AI assistant reached the public in late 2022 and became the fastest-growing software application in history within two months, and within roughly three years AI assistance was the default condition of software work across independent surveys. No previous tooling revolution moved at that speed, and the speed itself is a fact worth sitting with, because it left little time for institutions, training, regulation, or norms to catch up, which is part of why the governance and quality problems this analysis described are still so raw.
The pricing shock that made the point unmistakable came in early 2025, when a Chinese-developed model demonstrated performance roughly on par with systems built at far greater expense, at a fraction of the assumed cost. The market reaction was immediate and dramatic, with the dominant AI chipmaker’s stock falling sharply in a single day and the new model briefly topping the consumer app charts. The episode punctured the assumption that frontier-level AI capability required frontier-level spending, and it accelerated a broader trend toward cheaper inference and openly available model weights that organisations could run themselves. Open-weight models from several developers gave teams the option of capable AI without sending their data to anyone, which directly addressed the data-exposure problem and put serious capability within reach of organisations that could never have afforded to build it.
The consequence for the central argument is concrete. The cost barrier that might once have justified a small team or an individual professional opting out of AI on economic grounds largely fell away, which removed one of the few defensible non-regulatory reasons to abstain. When near-frontier capability costs less than a streaming subscription and can be run privately to satisfy security needs, the practical excuses for not engaging with the tools narrow to the genuine constraint cases, the classified systems and the heavily regulated data, rather than the ordinary work of most teams. The democratisation of access is the quiet half of why the metaphor’s competitive logic bites so widely. It is not only that AI helps; it is that AI now helps almost anyone, cheaply, which leaves the holdout standing alone in a field where the tools are no longer a privilege of the well-funded.
The benchmark scores hide a reliability gap
The numbers that made coding agents famous came from benchmarks, and benchmarks are a treacherous guide to how a tool will behave on real work, so the leap from a headline score to a deployment decision deserves more scrutiny than it usually gets. The progress on SWE-bench from roughly four percent in 2023 to seventy or ninety percent by 2026 is real and genuinely impressive, and it would be wrong to wave it away. It is also measuring something narrower than the impression it creates, and the gap between what the benchmark tests and what production demands is where a lot of disappointed expectations are born.
A benchmark is a controlled exam. The tasks are curated, the success criteria are crisp, the problem is well-defined, and there is a clear right answer the system either reaches or does not. Production software work is almost the opposite on every axis. The requirements are ambiguous and change midway, the success criteria are contested, the relevant context lives in people’s heads and in years of undocumented decisions, and “correct” is a judgement rather than a test that passes. A system that resolves a high share of clean, self-contained benchmark issues has demonstrated competence on the part of the work that was already the easiest to specify, and that is not the part where engineering teams spend their hardest hours. A high benchmark score proves the tool can do well-defined work, which was never the part of the job that most needed help.
The harness problem compounds the gap. The recurring lesson that an agent is the model plus its scaffolding means that a benchmark result reflects a particular harness tuned for that benchmark, and the same model dropped into a different environment, your repository, your build system, your idiosyncratic conventions, can perform very differently. Scores achieved in a benchmark’s controlled harness do not transfer cleanly to a codebase the harness was never built for, and the transfer loss is rarely advertised. Benchmark saturation and contamination make the picture murkier still, since once a benchmark becomes a target, results drift upward for reasons that have as much to do with optimisation against the test as with genuine capability, and the possibility that test-like problems leaked into training data is a standing concern that inflates apparent performance.
The most important gap is the one the averages conceal. A tool that succeeds on eighty percent of tasks fails on the other twenty, and in production the cost is not distributed evenly across that split. The failures are unpredictable, they cluster on exactly the novel and complex cases where the stakes are highest, and you cannot tell in advance which task will land in the failing fifth. A reliability profile of eighty percent success with unpredictable, high-cost failures is a very different thing from a tool you can trust, because trust depends on knowing when the tool will fail, not merely on how often. A junior who is right eighty percent of the time and flags their uncertainty is useful; a system that is right eighty percent of the time with total confidence on every answer is a hazard, because the twenty percent arrives looking exactly like the eighty.
The demo deepens the illusion in the other direction. Vendor demonstrations are curated to show the tool at its best on tasks it handles well, and they are real, not faked, which is what makes them persuasive. The distance between a demo that works once and a system that works consistently across thousands of varied real cases is the entire difference between a marketing artifact and a dependable tool, and it is precisely the distance benchmarks and demos are worst at measuring. The practical posture that follows is the one experienced teams arrive at the hard way. Treat benchmark scores as a ceiling on plausible capability rather than a promise of delivered reliability, discount demos as best-case theatre, and run the only evaluation that actually predicts how a tool will behave for you, which is trying it on your own representative work and measuring what happens. The published numbers tell you the tool is capable of impressive things. They do not tell you it will be reliable on yours, and conflating the two is a recurring way that AI adoption decisions go wrong, which is one more reason the genuinely open questions about agent reliability matter as much as they do.
Vendor lock-in and the cost of building on someone else’s model
Beneath the question of whether to use AI sits a quieter strategic question that few teams asked carefully in the rush to adopt: what does it mean to build the daily operation of your engineering on a capability you rent from someone else and do not control? The convenience of calling a model through an interface hides a dependency that, once it runs through enough of a team’s workflow, becomes very hard to unwind, and the costs of that dependency are real even when they never make the headlines.
Pricing is the most concrete of them. Running capable models at scale is not free, and the token and inference costs that were a rounding error during experimentation become a serious budget line once AI is woven into everyday development and operations. That cost sits on someone else’s price list, and the provider can change it. A team that built its productivity on a particular price point inherits the risk that the price moves against it, and the surging spend that engineering leaders already track is the early version of a question that will sharpen: how much is this actually costing, and what happens to our economics if the rate changes. The cheapest moment to depend on a vendor is the moment before you actually need them, and pricing power tends to follow dependence rather than precede it.
Behaviour risk is subtler and catches teams off guard. Models are updated, and an update can change how a model responds to the same input, so a prompt, a workflow, or an automation that was carefully tuned against one version can quietly degrade or break when the version behind it shifts. The team did not change anything, and the behaviour changed anyway, because the thing they built on is not stable in the way a library with a pinned version is stable. Deprecation is the harder edge of the same problem, where a model a workflow depends on is retired on the provider’s schedule rather than the customer’s, forcing a migration no one planned. Availability adds a third dimension, since outages, rate limits, and capacity throttling at the provider become outages in your own operation, and a team that routed critical work through an external model has handed part of its reliability to a system it cannot see inside.
These risks converge into lock-in. Once prompts, tooling, integrations, and habits are built around one provider’s model and its particular quirks, the cost of switching rises steadily, and what felt like a reversible choice during a trial becomes a structural commitment that is expensive to exit. Dependence accumulates quietly through a hundred small integrations, and by the time switching looks attractive, the switching cost has grown to match. The strategic exposure is not hypothetical; it is the same dynamic that has shaped every platform dependency in computing, applied to a capability that now sits close to the centre of how software gets built.
The hedges are known, and they connect directly to the democratisation that cheaper and openly available models created. Teams that take the risk seriously build an abstraction layer between their work and any single provider, so that swapping the model behind the interface is a configuration change rather than a rewrite. They keep more than one provider viable, treating multi-model flexibility as insurance against pricing, behaviour, and availability shocks. And the most exposed or most cost-sensitive operations increasingly run open-weight models on their own infrastructure, accepting more operational burden in exchange for control over pricing, stability, and data, which removes the dependency entirely for the workloads that warrant it. The underlying decision is the old build-versus-buy question wearing new clothes, and it has no universal answer, only a trade-off each organisation has to weigh: the speed and capability of renting a frontier model against the control and resilience of owning a smaller one. How that trade-off resolves across the industry, and whether the convenience of dependence outweighs its accumulating risks, is one more thing the next few years will decide rather than something today’s evidence can settle.
The questions the evidence cannot answer yet
An honest analysis has to mark the edge of what is known, and the field is full of genuinely open questions that the current evidence cannot settle, no matter how confidently various parties answer them. Naming them is not a hedge; it is the difference between analysis and prophecy, and the people making large bets on AI in IT deserve to know which parts rest on data and which rest on guesswork.
The largest open question is whether the organisational productivity gains will ever match the individual experience. The studies are clear that individuals feel faster and that companies mostly have not seen the financial return, and the optimistic reading is that this is an early-stage lag that disciplined organisations will close as they adapt. The pessimistic reading is that the gap is structural and that AI’s gains are real but smaller than the spending assumes. The evidence to date genuinely does not distinguish between a lag that will close and a ceiling that will not, and anyone who claims certainty either way is reasoning past the data.
A cluster of questions surrounds the agents. Whether they will become reliable enough for genuinely unsupervised production work, or whether the need for human oversight is a durable feature rather than a temporary limitation, is unresolved, and the answer shapes everything from headcount to architecture. The deleted database and the vibe-coding hangover suggest the supervision requirement runs deep; the benchmark progress from four percent to ninety percent suggests caution about betting against further gains. Both readings are defensible, which is precisely the point.
The human questions are the hardest to study and the most consequential. Whether years of heavy AI reliance measurably erode the underlying skills that let professionals evaluate AI output is a real worry with little long-term data behind it yet, because the practice is too new for the longitudinal evidence to exist. Whether the thinning junior pipeline produces a genuine shortage of senior engineers a few years out, or whether AI changes the path to seniority enough to make the worry moot, cannot be known until it happens. Whether the falling trust in AI accuracy stabilises into healthy calibration or curdles into broader disillusionment is open. And the deepest question, whether the competitive logic that makes opting out of AI so costly today could ever reverse, has no answer, because it depends on technological, economic, and even cultural developments that no one can currently foresee, including the possibility of a backlash that re-values human-made work or a capability plateau that levels the field. Liability law is still settling, the environmental cost of large-scale inference at this tempo is still being reckoned with, and the question of whether current capability is a plateau or a waypoint is exactly the kind of thing the field has been wrong about before. The responsible position is to act on what the evidence shows, which is substantial, while holding these open questions open, rather than collapsing them into whichever answer is most convenient. That posture, applied to the metaphor that opened this analysis, is what produces the only verdict the evidence can actually support.
The human cost the productivity numbers leave out
Almost every measurement in this field tracks output, speed, adoption, or money, and almost none of it tracks how the work feels to the people doing it, which leaves a real part of the story uncounted. The DORA research is one of the few sources that looked, and what it found complicates the optimistic narrative in a quiet, human way. Higher AI adoption correlated with greater individual effectiveness and, in the 2025 data, with a stronger sense of pride in the work, but friction and burnout stayed roughly unchanged. The tool made people feel more capable without making the job less draining, which is not the outcome the productivity story implies.
The reason is rooted in the role shift this analysis kept returning to. Writing code, for many engineers, was the rewarding part, the absorbing flow state where hours pass unnoticed and a hard problem yields to focused effort. Reviewing machine-generated code is a different and more taxing mode of work. It demands constant vigilance, a sustained skepticism toward output that looks correct, and the cognitive labour of holding a system’s context in mind while checking someone else’s reasoning, except the someone else is a model that gives no account of why it did what it did. The job moved from the satisfying work of creating toward the draining work of verifying, and verifying is harder to sustain because it offers less of the intrinsic reward that made the old work bearable.
A subtler tax comes from the trust erosion documented across the surveys. When a developer cannot fully trust the output, every suggestion carries a small ongoing cost of evaluation, a low-grade vigilance that never switches off. Researchers studying automation in other fields have long described the fatigue that comes from monitoring a system that is usually right but occasionally, unpredictably wrong, because the human can never relax into either trusting or ignoring it. That same vigilance fatigue now sits inside the daily experience of working with AI, and it is precisely the kind of cost that output metrics cannot see.
The pace pressure compounds all of it. If AI makes everyone faster, the baseline expectation rises to match, and the time the tool frees up does not always become slack; it often becomes more work expected in the same window. The treadmill speeds up, the individual runs harder to stay in place, and the promised relief from drudgery turns into a higher quota of the work that remains. The productivity gain can be captured by the organisation as higher output rather than returned to the worker as easier days, which is one plausible reading of why individual effectiveness rose while burnout did not fall.
None of this argues against using AI, and it would be a mistake to read it that way. It argues for honesty about what the tools do and do not solve, and for attention to a dimension the dashboards ignore. A team that adopts AI and treats the freed time purely as capacity to extract will get the output and the unchanged burnout the data describes. A team that uses some of that freed time to reduce pressure, protect focus, and let people do the more interesting work the tool uncovered will get a genuinely better experience, not just a faster one. The choice is not made by the technology. It is made by the organisation around it, which is the same lesson, in a more human register, that runs through everything else in this analysis. The professionals weighing whether to engage with AI at all should know that the honest answer includes this: the tools will likely make the work faster and may not make it lighter, and whether the difference lands as relief or as a higher treadmill speed depends on decisions that have nothing to do with the model and everything to do with the people in charge of how it is used.
The part of the metaphor that holds up
After everything the evidence complicates, the comparison between doing IT work without AI and travelling without a vehicle survives on the one axis it was really built to test, which is the competitive one. On that axis it is close to exactly right. A professional or a team can do the work without AI, the way a person can cross a country on foot. Nothing stops them, and they will arrive. They will simply arrive later, having spent more effort, while almost everyone around them moves faster, and in a field defined by speed and relentless competition, arriving later is a steadily heavier penalty. The title’s claim is the honest one: doing IT work without AI in 2026 is possible, and for most of the sector it is increasingly pointless, in the precise sense that the cost of the choice keeps rising while the benefit keeps shrinking.
The metaphor earns that verdict because the adoption data is no longer arguable. Ninety percent of developers using AI daily, eighty-five percent in independent surveys, a median of two hours a day, near-total penetration in startups, AI woven into testing, review, documentation, operations, and security, and a hiring market that now names AI skills in nearly half of job postings together describe a profession that crossed a threshold. The default flipped. Using AI stopped being the choice that needed explaining, and not using it became the one that does. For the ordinary work of most teams, the walker is now genuinely alone on the road.
What the metaphor cannot capture, and what this analysis spent most of its length on, is everything that makes AI in IT stranger and harder than a faster car. The vehicle is not neutral about the destination, because its output is the destination, and that output can be wrong, insecure, or unmaintainable in ways a car ride never is. It is not fully under the operator’s control, because it can act in ways no one intended. It loads hidden cargo into the trunk in the form of technical debt, security vulnerabilities, and data exposure that surface long after the apparent speed-up. It can quietly erode the very skills that let a professional judge its output, the way relying on the car would let the walker’s fitness fade. And the data is unambiguous that buying the vehicle does almost nothing on its own, since AI multiplies whatever an organisation already is, helps individuals feel faster than they measurably are, and shifts the bottleneck from writing to reviewing rather than removing it. The metaphor is a provocation that gets people to take the threshold seriously. It is a poor blueprint for what to actually do once they are across it.
The synthesis that the whole body of evidence supports is more demanding than either the boosters or the refusers want it to be. AI in IT is a powerful and imperfect collaborator that reshapes the work rather than merely accelerating it, rewards judgement over routine production, multiplies sound foundations and magnifies broken ones, and relocates risk to places its users do not naturally watch. The professional who engages with it through a discipline, using it where it is strong, checking it where it is weak, distrusting the feeling of speed, protecting their fundamentals, handling data responsibly, and owning what they ship, comes out genuinely ahead. The professional who accepts its output uncritically generates faster failure, and the one who refuses it wholesale pays a competitive cost that, outside the genuine constraint cases, almost never makes sense. The tool does not decide the outcome. The discipline around the tool does, which is the most consistent finding across every study, every sector, and every failure mode in this analysis.
That leaves the holdouts in a smaller and more specific position than the metaphor’s blunt verdict suggests, and the precision matters. The engineer on a classified system, the team bound by data-residency law, the senior developer who has measured that AI slows them on the complex work they know best, and the professional deliberately protecting a skill they refuse to let atrophy are all making defensible choices, and lumping them in with simple resistance does them an injustice. The position that the evidence does condemn is the blanket refusal to engage with the tools an entire field now runs on, made on principle rather than circumstance, because that is the choice that combines the competitive cost of walking with none of the legitimate reasons to walk.
The cheap, fast, and increasingly private nature of the tools removed most of the remaining excuses, and the speed of the whole transition, from a novelty in late 2022 to a baseline three years later, left the profession adapting in real time to something it has not fully metabolised. The craft will adapt, as it adapted to the compiler, the IDE, the search engine, and the cloud, and the engineers who move up the abstraction ladder will thrive while those who refuse it are displaced, exactly as before. The difference this time is that the new rung is probabilistic and occasionally gives way, so the adaptation includes the genuinely novel skill of working productively with a tool that cannot be fully trusted. The right relationship to AI in IT is therefore neither the surrender the boosters sell nor the abstinence the holdouts defend, but the calibrated, skeptical, disciplined use of a tool that is too powerful to ignore and too unreliable to trust blindly. Travel without a vehicle is possible. Almost no one chooses it. The instructive part is not that the walker is slow, which is obvious, but that the vehicle everyone else is driving is faster, less predictable, and stranger than the simple comparison lets anyone see, and the professionals who understand that, rather than the ones who merely climbed in, are the ones the next decade will reward.
Questions IT professionals keep asking about working with AI
Possible, yes; pointless, mostly. For the ordinary work of most teams the competitive cost of refusing AI keeps rising while the tools get cheaper and more capable, so opting out increasingly means moving slower than everyone around you for no gain in quality. The real exceptions are the genuine constraint cases, such as classified systems, heavily regulated data, or specific complex tasks where AI has been measured to slow an expert down, where refusing is a sound choice rather than a stubborn one.
Independent surveys put it between roughly eighty-five and ninety percent. Google Cloud’s DORA research reported about ninety percent of developers using AI, JetBrains found around eighty-five percent regular use, and Stack Overflow found about eighty-four percent using or planning to use the tools. The adoption is near-universal, with only a small minority holding out.
Sometimes, and less reliably than the marketing claims. The gains are real on contained, well-specified tasks such as boilerplate, tests, and unfamiliar-language snippets, but a controlled study by METR found experienced developers were about nineteen percent slower on large codebases they knew well, despite feeling faster. The honest answer is that AI helps on specific work and that the feeling of speed is an unreliable guide to actual speed.
It is the gap between how fast AI feels and how much it measurably improves delivery. Individuals report large productivity gains while company-level results stay modest, because AI accelerates writing code but not the reviewing, integrating, and verifying that follow, and the extra output piles up at the review stage. Telemetry showing pull request review time rising sharply alongside higher output captures the effect.
GitHub Copilot is the most widely adopted, with roughly twenty million users, followed by AI-first editors and agents such as Cursor, Claude Code, and OpenAI’s Codex. Each took a different approach, from IDE assistant to terminal agent to multi-surface platform, and the market is large, competitive, and still consolidating.
An assistant suggests code and waits for the human to act, while an agent works more autonomously, planning a task, editing across multiple files, running tests, fixing its own errors, and opening a pull request for review. The shift from assistants to agents is the main capability change of the past two years, and agents are best treated like a capable junior whose work is always reviewed.
Not by default. Generated code reproduces the insecure patterns present in its training data and presents them with fluent confidence, so it needs more review than hand-written code, not less. National security agencies have warned that AI assistants used without oversight can introduce vulnerabilities, and new attack vectors such as slopsquatting exist specifically because models invent things that are not real.
Vibe coding means building software by describing what you want and accepting the AI output without reading or fully understanding it. It is genuinely useful for prototypes, demos, and throwaway tools where the consequences are low, and dangerous for production systems someone has to maintain, as shown when an AI agent deleted a production database during one such session.
It is a supply-chain attack that exploits AI hallucinations. Models sometimes recommend installing software packages that do not exist, and attackers register those invented names with malicious code inside, so a developer who copies the AI suggestion pulls in the attacker’s payload. It is a fresh reason to verify every dependency before installing it.
It is reshaping the work more than eliminating it, though the effect is uneven. Entry-level hiring fell sharply while demand for experienced, AI-augmented engineers held up, and official projections still show overall growth in software employment through the early 2030s. The roles are changing rather than vanishing, with new categories appearing in AI engineering, data engineering, and AI security.
Entry-level software postings dropped roughly twenty-eight percent from their 2022 peak, driven by a mix of post-boom correction, global remote hiring, and AI absorbing the routine work juniors used to do. The short-term logic of skipping junior hires is sound for any single company and collectively risks starving the pipeline of future senior engineers.
The abilities AI does poorly and the human must supply: system design and architecture, critical review and evaluation of machine output, security awareness, deep domain knowledge, and clear communication, plus fluency in directing the tools themselves. The skills losing value are the routine production tasks AI now handles, so the move is toward judgement and away from volume.
It can, if AI is used to avoid the thinking that builds competence rather than to accelerate it. The defence is deliberate practice, including choosing sometimes to do work the long way to keep the underlying skills sharp, and using AI to explain and teach rather than only to produce. The goal is to finish each AI-assisted task understanding more, not less.
It means AI amplifies whatever a team already is rather than fixing it. Strong teams with clean process and architecture use AI to get better, while weak teams use it to produce poor work faster, and the research found AI still increases delivery instability even as it raises throughput. The summary that speed without stability is just accelerated chaos captures the warning to leaders.
Treat every prompt as company data leaving the building, because that is what it is. Use sanctioned tools with contractual data protections, keep secrets and regulated data out of prompts entirely, and run models privately on your own infrastructure for the most sensitive work. The unsanctioned use of consumer tools, often called shadow AI, is where most real data exposure happens.
At the organisational level, mostly not yet. A widely cited study found about ninety-five percent of surveyed companies reported no revenue improvement from AI, because returns come from embedding the tools in sound processes rather than from buying them. At the individual level the economics are very different, since a per-seat tool costs a tiny fraction of a salary and pays for itself with even a modest genuine gain.
In operations, AIOps cuts mean time to resolution by roughly forty to sixty percent and is moving toward agents that run infrastructure under human supervision, with Gartner projecting about seventy percent of enterprises using such agents by 2029. In security, AI is both shield and sword, slashing investigation times for defenders while arming attackers with faster, more convincing automated attacks, though current research suggests it favours defenders on balance.
Wholesale refusal is rarely justified, but selective, contextual refusal often is. A professional who declines AI on classified systems, on regulated data, or on specific complex work where it has been measured to slow them down is making a sound choice, while refusing to engage with the tools an entire field now runs on, purely on principle, carries a competitive cost that usually makes no sense. The skill is knowing which situation you are in.
Author:
Jan Bielik
CEO & Founder of Webiano Digital & Marketing Agency

This article is an original analysis supported by the sources cited below
Announcing the 2025 DORA report Google Cloud’s announcement of the 2025 DORA findings, including the roughly ninety percent developer adoption figure and the analysis of AI’s effect on software delivery throughput and stability.
DORA report 2025 on AI and software development Google’s overview of the 2025 DORA research, summarising the survey of nearly five thousand professionals and the headline conclusions about adoption and team performance.
The State of Developer Ecosystem 2025 JetBrains’ annual developer survey covering tens of thousands of developers, with data on AI assistant adoption, regular use, and the share of developers not yet using the tools.
Stack Overflow Developer Survey 2025 The widely referenced developer survey documenting AI usage rates and the notable decline in positive sentiment and trust toward AI tools through 2025.
METR research on AI and experienced developers The nonprofit research institute behind the randomised controlled trial that found experienced open-source developers were slower with AI tools on familiar codebases despite feeling faster.
Vibe coding Reference overview of the practice named by Andrej Karpathy in early 2025, its uses and risks, and notable incidents including AI agents making destructive changes.
AI-assisted software development Background on the tools, techniques, and history of applying AI to software engineering, including assistants, agents, and the benchmarks used to measure them.
Slopsquatting Explanation of the supply-chain attack that exploits AI package-name hallucinations, including how it works and why it emerged as a new software security risk.
Codex (AI agent) Reference material on OpenAI’s Codex agent, its evolution across multiple surfaces, and its reported growth in weekly active users.
The state of AI security in 2026 An industry analysis of how AI is reshaping both cyber defence and attack, including reduced investigation times and the faster pace of breaches.
Artificial intelligence in cybersecurity Fortinet’s reference on AI’s defensive applications in security operations, threat detection, and automated response.
Gartner predicts AI agents will transform IT operations Coverage of the Gartner prediction that around seventy percent of enterprises will use agentic AI to operate IT infrastructure by 2029, up from under five percent in 2025.
The AI effect on entry-level jobs IEEE Spectrum’s reporting on the decline in entry-level technology hiring and the data on younger developers’ employment since 2022.
The software engineering job market in 2026 An analysis of how hiring shifted toward aptitude over syntax, the rise of assessments and proctoring, and the demand for AI-related skills.
Applications of artificial intelligence Broad reference on where AI is applied across industries and IT functions, useful for the sector-by-sector context.
AI bubble Background on the scale of AI investment, market concentration, valuation concerns, and the early-2025 pricing shock from a low-cost competing model.
AI’s mirror effect and the 2025 DORA report An interpretation of the DORA finding that AI amplifies an organisation’s existing capabilities, including the throughput and instability dynamics.
Enterprise AI coding assistant adoption and scaling Faros telemetry across more than a thousand teams, including the data on increased tasks and merged pull requests alongside sharply longer review times.
Agentic coding in 2026 An overview of the agentic coding tools and capabilities defining the current generation of AI-assisted development.
AI coding assistant statistics A compilation of adoption, usage, and trust statistics for AI coding assistants drawn from multiple industry surveys.
What the research shows about AI coding assistant productivity A review of the evidence on real productivity effects, including the gap between perceived and measured gains and the limits of vendor claims.
2025 AI metrics in review Jellyfish’s analysis of AI coding adoption growth across companies and the share of organisations where AI authors a significant portion of code.















