Claude Science bets on the workflow, not a smarter model

Claude Science bets on the workflow, not a smarter model

On June 30, 2026, Anthropic put a new application in front of scientists and called it Claude Science, an AI workbench built for research rather than for chat. The reveal came at an invitation event in San Francisco called The Briefing: AI for Science, staged at the Yerba Buena Center, where CEO Dario Amodei shared the room with pharmaceutical leaders including Novartis chief executive and Anthropic board member Vas Narasimhan, Bristol Myers Squibb chief executive Chris Boerner, GLP-1 drug pioneer Lotte Knudsen, and Genentech executive vice president Aviv Regev. The product lead, Alexander Tarashansky, ran an extended live demo on stage. The framing Amodei chose was blunt and repeated by almost every outlet in the room: do for life science what Claude Code did for software engineering — turn a general-purpose model into a craft tool that a researcher opens every working day.

Table of Contents

The launch that turned Claude into a lab instrument

Claude Science is available immediately in beta to anyone on a Claude Pro, Max, Team, or Enterprise plan. It runs on macOS and Linux, and it uses Anthropic’s existing models, including Claude Opus 4.8 and the newly released Sonnet 5. That last point matters more than it sounds, and it is where most of the analysis in this piece will keep returning: Claude Science is not a new model. It is not a version of Claude that is secretly better at biology, and it does not sit behind a special access gate. It is a work environment wrapped around the same models the company already sells, plus a set of connectors, agents, and reproducibility features designed for the specific texture of scientific work.

The description Anthropic gave is that Claude Science integrates the tools and packages researchers most commonly use, produces auditable artifacts, and provides flexible access to computing resources. Underneath that sentence is a coordinating agent that a scientist talks to in plain language, with access to more than 60 curated skills and connectors already configured for genomics, single-cell analysis, proteomics, structural biology, and cheminformatics. That coordinating agent can spin up specialist sub-agents, hand tasks to specialist agents a user has built, and run a separate reviewer agent whose only job is to check the work — verifying citations and calculations, flagging errors, and correcting them as the pipeline moves.

What the app is trying to remove is friction, not thinking. Amodei described the ambition to STAT and Reuters in general-purpose terms, saying the tool is meant to help researchers make sense of that complexity, in its full complexity, better. The company is careful about the size of the claim. In his October 2024 essay Machines of Loving Grace, Amodei predicted that AI-enabled biology could compress fifty to a hundred years of progress into five to ten. At the June event he did not declare that this compression had arrived. He said he hopes to see some success over the next year in using AI to find new targets for drug discovery, and framed the launch as a step toward a longer goal rather than the goal itself.

The reception was a mix of genuine interest and healthy skepticism, and the market reaction was immediate. Shares of AI-adjacent drug-discovery companies fell on the news, with Schrödinger dropping as much as 8.3 percent intraday, Recursion Pharmaceuticals off about 3.3 percent, and IQVIA down more than 2.3 percent at points during the session. That is the shape of the story: a working tool that arrives cheap and wide, aimed at the daily grind of computational research, launched by a company weeks away from one of the largest technology IPOs ever attempted, into a field where three of the most powerful AI labs on earth are already fighting over the same scientists.

Anthropic’s route from chatbot to research bench

Claude Science did not appear from nowhere. It is the visible endpoint of roughly eight months of deliberate building that moved Anthropic from a horizontal model provider into a company with a dedicated life-sciences division and its own products for the sector. The first clear marker was Claude for Life Sciences, launched on October 20, 2025. That release was not a separate application. It was the Claude Enterprise stack augmented with MCP connectors and biomedical skills that let the model reach into the systems pharmaceutical teams already run — Benchling, 10x Genomics, PubMed, BioRender, Synapse.org, and others — so the assistant could help with statistical code, paper summaries, and hypothesis generation without the researcher leaving those systems.

Anthropic signed anchor customers at that launch that read like a who’s-who of large pharma: Novo Nordisk, Sanofi, AbbVie, AstraZeneca, and Genmab. Over the following months the platform expanded past preclinical discovery into clinical-trial operations and regulatory work, adding connections to Medidata, ClinicalTrials.gov, and other resources, in a stated push toward an end-to-end toolkit that covers early discovery through translation and commercialization. In January 2026 the company introduced Claude for Healthcare, a HIPAA-compliant environment with features for drafting clinical-trial protocols and preparing regulatory submissions.

The next move was capital, not code. In April 2026 Anthropic acquired Coefficient Bio, a stealth New York startup with fewer than ten employees, in a stock deal reported at around $400 million. Coefficient’s founders came out of serious machine-learning-for-biology work: chief executive Aris Theologis, previously chief business officer at Evozyne, alongside co-founders Nathan Frey and Samuel Stanton, both formerly at Roche’s Genentech drug-discovery unit Prescient Design. The team folded into Anthropic’s healthcare and life-sciences group under Eric Kauderer-Abrams, the company’s head of biology and life sciences, who has said the goal is for a meaningful share of the world’s life-science work to run on Claude. The purchase functioned as an acqui-hire that bought domain expertise in protein design, biomolecule modeling, and drug-development planning.

Around the same time, the governance layer tightened its ties to the industry. Novartis chief executive Vas Narasimhan joined Anthropic’s board of directors in April 2026, selected by the company’s independent Long-Term Benefit Trust, which added a sitting big-pharma leader to the group overseeing a company whose products increasingly serve big pharma. Earlier in February, Anthropic announced founding life-sciences research partnerships with the Allen Institute and the Howard Hughes Medical Institute, both of which show up later as testing grounds for the workbench.

Read as a sequence, the pattern is consistent. A vertical product in October, a healthcare environment in January, a partnership announcement in February, an acquisition in April, a board seat in April, and a flagship product launch at the end of June, all pointing at the same market. Claude Science is the piece that turns a set of connectors and skills into a place to actually do the work, and it is the piece Anthropic clearly intends to be the operating layer for computational biology the way Claude Code became the operating layer for software teams.

The fragmentation problem the product is built to attack

To understand why a workbench is even a product, it helps to look at how a computational biologist actually spends a day. Very little of that day is discovery. Most of it is plumbing. A single question — does this gene’s expression change across these cell types, and is the pattern consistent with a known pathway — can require a researcher to pull sequences from one database, structures from a second, variant annotations from a third, and expression matrices from a fourth, each with its own schema and query language. The files arrive in formats that need bespoke pipelines and specialized viewers. The analysis runs in a Jupyter notebook or in R. The heavy computation gets shipped to a cluster over a terminal. The literature lives in PubMed and on preprint servers. Nothing talks to anything else without glue code that the scientist writes, debugs, and forgets.

Anthropic’s own framing names the tools by hand: PubMed, Jupyter, R, a cluster terminal, and more. The company’s argument is that the switching cost between these tools is a tax on research, and that the tax is largely invisible because everyone has always paid it. A scientist who spends three days wrangling file formats and cluster job scripts to run an analysis that takes twenty minutes of actual compute has lost three days that had nothing to do with biology. Multiply that across a lab, a year, and a field, and the waste is enormous. The bottleneck is not intelligence. It is coordination.

This is the specific claim that separates Claude Science from the model-first strategies of its rivals. Anthropic is not betting that the field needs a smarter reasoner about proteins. It is betting that the field needs a single environment where the reasoner it already has can reach every tool, hold the context of a long analysis in memory, and carry the work from a literature question through to a figure ready for a manuscript without the researcher ever assembling the pipeline by hand. Claude Science is pitched as the place where all stages of a project happen in one conversation — literature analysis, multi-step research, data processing, visualization, and manuscript drafting — with an auditable history attached to every output.

The comparison Anthropic keeps drawing is to a Jupyter Notebook, and it is a useful one. A notebook is not a model. It is an interface that meets scientists where they already work, mixing code, results, and prose in one document. Claude Science borrows that idea and extends it: you can run it locally on your own machine, or on a remote box over SSH, or on the login node of a high-performance computing cluster, so it sits inside the infrastructure a lab already trusts rather than pulling everything out to a vendor’s servers. The design decision is telling. Anthropic is trying to be adopted the way a notebook is adopted — quietly, by individual researchers, on their own hardware — rather than sold top-down as a platform migration.

There is a limit to how far the analogy carries. A notebook does exactly what you type. Claude Science acts. It plans, reaches for resources, writes and submits jobs, and edits its own code in response to plain-language instructions. That shift from a passive document to an active agent is the source of both the productivity claim and the reliability risk, and it is the tension that runs through the rest of this analysis.

A coordinating agent, specialists, and a built-in critic

The architecture under Claude Science is a multi-agent system, and the shape of it is worth describing precisely because the shape is the product. A researcher interacts with one generalist coordinating agent. That agent behaves less like a chatbot and more like a project manager: it interprets a high-level request, decides which resources and skills the task needs, and delegates. It has access to the 60-plus curated skills and connectors, and it can spin up specialist sub-agents for narrower jobs — one to query a genomics database, another to run a structure prediction, another to assemble a figure. It can also hand work to specialist agents that a user has built and saved, which means the system is extensible by the people using it rather than fixed at launch.

The most consequential piece is the reviewer agent. As a pipeline runs, a separate agent inspects the outputs. It checks citations against their sources, flags numbers it cannot trace back to the data, and catches figures that do not match the code that supposedly produced them, correcting itself as it goes. Anthropic describes this as an actor-critic arrangement, borrowing a term from reinforcement learning: one agent generates content, a second evaluates it for accuracy and citation fidelity. In the Allen Institute case study, this pattern is explicit — one agent writes a section of a review, a separate reviewer agent evaluates that section before it is accepted.

The reason to build a critic into the system is not academic. AI-assisted writing has pushed fabricated citations and untraceable statistics into a growing share of scientific and technical documents, and a research tool that produced confident, well-formatted, wrong output would be worse than useless in a field where a single incorrect number can invalidate months of work. The reviewer agent is Anthropic’s technical answer to that problem, and it is fair to say the credibility of the whole product rests on how well it works. A coordinating agent that orchestrates scientific computing and touches genomic data has to be reliable at the level a peer-reviewed publication demands. The critic is the mechanism meant to get it there.

There are two structural advantages to keeping context in a running session. First, because the agents work inside a live session that holds state in memory, even massive datasets only need to be loaded once rather than re-read at every step. A genomics matrix that takes real time to load stays loaded while the agent works through a chain of analyses on it. Second, the researcher can fork the session at any point — branching to try a second approach to the same problem without destroying the original thread, then comparing the two. That is a workflow scientists understand from version control, applied to an interactive analysis.

Delegation and self-checking are also where the honest caveats live. A system that spins up sub-agents and lets them call tools autonomously is harder to audit than a single deterministic script. The reviewer agent reduces that risk but does not eliminate it, and Anthropic has released the product as a beta precisely so that researchers stress-test the delegation and the checking on real problems. The company’s own message is that it is sharing the tool early so scientists can use it on real work and tell it how to refine the system. Treating the multi-agent orchestration as a hypothesis to be validated, rather than a finished guarantee, is the correct posture for anyone adopting it in the first months.

Reproducibility carried inside every artifact

The feature Anthropic leans on hardest, and the one most likely to matter for adoption in serious labs, is reproducibility. Scientific research is judged not only on results but on whether those results can be reconstructed, and the reproducibility crisis in several fields has made provenance a live concern rather than a bureaucratic one. Claude Science attaches an auditable history to every output. When it generates a figure, it does not hand back an image alone. It packages the figure with the exact code and the computing environment that produced it, a plain-language description of how it was made, and the full message history of the conversation that led there.

The practical value of this is easy to state. A collaborator, a reviewer, or the original author six months later can open the artifact and see precisely how the number on the axis was computed, which version of which package ran, and what the researcher asked for at each step. The work is easier to validate and reproduce because the recipe travels with the result. In a field where a wrong figure can cost months and, in a drug program, millions, that provenance is not a nicety. It is the difference between an output a lab can defend and one it cannot.

The editing model reinforces the same principle. A scientist can annotate a figure or a manuscript in-line and talk to the agent about any detail, asking for changes in plain language — remove the gridlines, switch an axis to a log scale, drop an outlier. The agent does not paint over the image. It reads the code that generated the figure and edits that code, then regenerates. The output stays consistent with its source because the source is what actually changed. This is a meaningful distinction from a general image tool that would edit pixels and leave the underlying analysis untouched and now inconsistent with the picture.

The reproducibility story extends to how compute is handled, which the product treats as part of the audit trail rather than a separate concern. Anthropic’s framing is that Claude Science generates figures and manuscripts alongside the code that created them, and does the same for the environment and the data path. The reviewer agent then polices consistency across all of it, flagging figures that do not match their underlying code as one of its explicit checks. The intent is a closed loop: every visible artifact points back to a reproducible chain, and an automated critic guards the chain against drift.

For teams that already invest heavily in reproducibility infrastructure — pinned environments, workflow managers, data versioning — the honest question is whether Claude Science’s built-in provenance is rigorous enough to satisfy their standards, or whether it becomes one more layer to reconcile with the systems they already run. For teams that have no such infrastructure, and there are many, the built-in provenance is a genuine upgrade over the status quo of undocumented notebooks and forgotten parameters. The value of the reproducibility features scales inversely with how disciplined a lab already is, which means the labs most likely to feel the benefit are exactly the ones that most need it.

Rendering biology natively instead of describing it

Scientific work is visual in a way that most knowledge work is not, and a tool that could only describe a protein in words would be close to useless at the bench. Claude Science renders rich scientific artifacts natively: three-dimensional protein structures, genome browser tracks, chemical structures, and more, displayed inside the workbench rather than exported to a separate viewer. A researcher asking about a protein sees the protein. A question about a genomic region produces a browser track. A cheminformatics query returns a chemical structure drawn in the interface.

This matters for two reasons that are easy to underrate. The first is speed of judgment. A scientist evaluating a structure prediction or an expression pattern makes a fast visual assessment that is hard to replicate from a table of numbers. Putting the render in the same place as the analysis collapses the loop between asking and seeing. The second is trust. A native render tied to the code that produced it is harder to fake or misread than a screenshot pasted from somewhere else, and it keeps the visual and the underlying data in the same provenance chain that the reviewer agent polices.

The rendering also drives the plain-language editing described earlier. Because the figure is generated from code the agent controls, a visual change and a code change are the same action. Ask for a cleaner axis and the agent rewrites the plotting code; the new figure is not a retouched version of the old one but a fresh output of corrected code. For anyone who has spent an afternoon manually adjusting a matplotlib figure for a manuscript, the appeal is obvious, and the reproducibility benefit is real because the figure never drifts away from its source.

There is a domain-specific point here that separates this from generic data visualization. Rendering a 3D protein structure or a genome track is not the same as drawing a bar chart. It requires the tool to understand structural biology and genomics file formats, coordinate systems, and conventions, and to connect to the specialized libraries and models that produce those objects. Claude Science reaches that capability through its connections to domain resources and model libraries rather than reinventing them, which is consistent with the whole design philosophy: connect to the validated tools scientists already trust rather than replace them with something new and unproven.

The limitation worth naming is that native rendering is only as good as the underlying models and the data feeding them. A beautifully rendered protein structure that came from a low-confidence prediction is still a low-confidence prediction, and the visual polish can create false confidence if a researcher forgets to check the provenance the tool so carefully preserves. The render is a window onto the analysis, not a verdict on it, and the discipline of reading the attached code and confidence estimates remains the scientist’s responsibility, not the tool’s.

Compute that scales from a laptop to hundreds of GPUs

The hardest part of large-scale computational research is often not the analysis but the logistics of running it. Folding a protein or pushing a genomics pipeline over a massive dataset means a researcher has to stop thinking about the science, set up a computing job, submit it to a cluster, wait, check whether it succeeded or failed, and pull the results back — a context switch that can eat a day and break concentration. Claude Science is built to handle that process end to end. It drafts a plan for the computation, and then, before it touches anything, it asks.

The permission model is deliberate and worth spelling out because it is where the tool’s autonomy meets a scientist’s control. Claude Science asks before reaching new resources, and it lets the researcher review or revoke any decision before it writes and submits a job. Only after that check does it write the batch script, submit it, and manage the pipeline. The compute itself runs on infrastructure the lab already uses: its own HPC cluster over SSH, or a Modal account for on-demand GPUs, scaling an analysis from a single GPU up to hundreds as the work requires. The researcher describes the job in plain language; the agent handles the engineering stack — batch scripts, environment containers, pipeline management — that normally requires a separate skill set.

The design choice that makes this credible for regulated and sensitive work is that the heavy data never has to move. Claude Science runs on the lab’s own machines — a laptop, a Linux box, or an HPC login node — so large or sensitive datasets stay on the systems they already live on, and only the context each step of the analysis actually needs is sent to Claude. For a lab handling clinical or patient genomic data, that is not a convenience feature. It is a precondition for using the tool at all, because moving protected datasets off-premises is often legally or contractually impossible. The architecture treats data locality as a requirement rather than an option.

Holding context in a running session pays off doubly on compute. Because the agents operate inside a live session that keeps state in memory, a massive dataset loads once and stays loaded across a chain of analyses, rather than being re-read each time a step runs. That reduces both wall-clock time and the amount of data shuffling a large workflow needs. Combined with the ability to fork a session, a researcher can load an expensive dataset once, branch to try two competing analyses on it, and compare the results without paying the loading cost twice or losing the original thread.

Modal’s role deserves a specific mention because it is part of the launch, not an afterthought. Modal provides the on-demand compute layer that lets Claude Science reach GPUs a lab does not own, and the two companies have tied the integration to the launch program directly. For labs without their own large clusters — which describes most academic groups and many smaller biotechs — the ability to burst to rented GPUs on demand, orchestrated by the agent, lowers a real barrier. The compute story is what turns Claude Science from a smart notebook into something that can actually run production-scale scientific workloads, and it is a large part of why the product is being taken seriously by teams with heavy computational needs.

Sixty databases and model libraries wired in on day one

A research tool is only as useful as the knowledge it can reach, and scientific knowledge is scattered across hundreds of specialized sources that do not share a schema, a query language, or a viewer. Claude Science arrives pre-wired to more than 60 sources through its skills and connectors, so a researcher can ask a question in plain language and let specialist agents query and synthesize across those sources rather than navigating each one by hand. In biology alone, the named resources include UniProt for proteins, the Protein Data Bank for three-dimensional structures, Ensembl and ClinVar for genomics, Reactome for biological pathways, ChEMBL for pharmaceutical chemistry, and GEO for gene-expression data, alongside journals, preprint servers, and domain-specific open models.

Each of those databases is its own small world. UniProt and PDB use different identifiers and different query conventions; Ensembl and ClinVar answer different genomic questions; ChEMBL speaks the language of medicinal chemistry. A working scientist normally holds the quirks of a handful of these in their head and consults documentation for the rest. Claude Science’s coordinating agent farms the query out to a specialist agent that knows the source, then synthesizes the answers across sources into a single response. The value is not access to any one database — those are mostly public — but the removal of the translation layer between them.

On the predictive side, Claude Science connects natively to NVIDIA’s BioNeMo Agent Toolkit, which gives the agent tools to reach the life-science models and libraries in BioNeMo. The named models include Evo 2, Boltz-2, and OpenFold3 — a genomic language model, a structure-and-interaction predictor, and an open protein-structure model respectively. This is a partnership rather than a build. Anthropic is not training its own protein-structure model to compete with these; it is wiring them in so a researcher can call them from the same conversation where they query databases and write figures. The company frames this as benefiting from its partners’ specialized platforms while more scientists reach those tools through Claude.

The extensibility is the part that turns a fixed product into a durable one. Scientists already have models, datasets, and pipelines they trust, and Claude Science can connect to those as well — saving any pipeline as a reusable skill, or reaching a lab’s preferred tool through a connector, with future sessions inheriting them automatically. That means a lab’s proprietary methods and validated internal tools become first-class citizens in the workbench, accessible in the same conversation as the public databases and the partner models. Over time, a group builds up a private library of skills that encodes its own way of working.

This is the clearest expression of the workflow bet. Anthropic is not claiming to know biology better than the field’s own tools. It is claiming to be the connective tissue that lets a researcher use Claude, their proprietary data, the public databases, and the validated open models together in one place. Whether 60 connectors is enough, whether the synthesis across sources is accurate, and whether the connectors stay maintained as the underlying databases change are open questions. But the strategy is coherent: own the coordination layer, partner for the specialized capability, and let users extend the system with the tools they already trust.

The decision not to ship a new science model

The single most important design decision behind Claude Science is what Anthropic chose not to build. There is no new biology model, no science-tuned variant of Claude, and no gated access to a special reasoner. The workbench runs on the models already available to every subscriber, including Claude Opus 4.8, released in May, and Sonnet 5. Anthropic characterized the product plainly as not a new AI model. What is new is everything around the model: the specialized agents, the connected databases, the compute orchestration, and the automated quality control.

That decision separates Anthropic from both of its main rivals, and it is a genuine strategic fork rather than a marketing distinction. OpenAI’s answer to life sciences, GPT-Rosalind, is a purpose-built, domain-optimized reasoning model. Google DeepMind’s contribution runs through owned foundational models like AlphaFold. Both companies are, in different ways, betting that the field needs a better brain for biology. Anthropic is betting that the field needs a better bench — that the models are already capable enough for most computational research, and that the returns now come from integration, provenance, and workflow rather than from another increment of raw model capability.

There is a defensible logic to this. A great deal of scientific research already involves coding, and tools like Claude Code have shown that a capable general model plus the right environment can meaningfully lift the productivity of scientists who are not expert software engineers. Anthropic’s head of life sciences has framed the aim as the tedious intermediate work of science — data analysis, annotation, coordination — rather than one-shot, AlphaFold-style discoveries. If the bottleneck really is coordination rather than intelligence, then building a workbench around existing models is the faster, cheaper, and more broadly useful move.

The decision also has commercial and safety consequences that cut in Anthropic’s favor. Commercially, using existing models means Claude Science ships inside existing plans at no extra model cost, which is how it can be offered wide and cheap rather than gated and expensive. On safety, it means the models powering the workbench have already passed the company’s biological-capability evaluations and carry its existing safeguards, rather than introducing a new, more capable biology reasoner that would need its own risk review. A workflow product layered on already-cleared models is easier to deploy responsibly than a new frontier model tuned for the exact domain where dual-use risk is highest.

The risk in the strategy is symmetric to its logic. If a rival’s specialized model turns out to be meaningfully better at the reasoning that matters — proposing targets, interpreting assays, designing experiments — then no amount of workflow polish closes that gap, and the connective tissue becomes a commodity layer that any lab can rebuild on top of the better brain. Anthropic is wagering that integration is the durable moat and model capability is the fast-commoditizing part. The next several quarters, as scientists actually compare outputs across the three approaches on real problems, will show whether that wager is right. For now, the workflow-not-model bet is the defining characteristic of the product and the clearest place where Anthropic’s strategy diverges from the field.

Manifold Bio and end-to-end target nomination

Anthropic named three beta users as evidence that the workbench does real work, and the first is Manifold Bio, a company that designs tissue-targeting medicines — drugs engineered to home to a specific organ or cell type so they act where they are needed and spare the rest of the body. Manifold’s method involves testing how millions of candidate binders, corresponding to hundreds of targets, distribute through a living body at once, which is a genuinely large-scale computational and experimental problem.

Manifold used Claude Science to nominate the targets for its latest experiments. For each tissue and target combination, the workbench assessed surface expression, trafficking, and safety, then ranked candidates against criteria the company had learned from its own internal proprietary data. That last detail is the important one. This was not a generic literature summary. The tool applied Manifold’s private, hard-won selection criteria — the judgment the company had built up across past programs — to a fresh problem, and it did so end to end: gathering the right data, running the right analysis, and applying the right judgment in one pass.

What Manifold said set Claude Science apart from a general coding assistant is precisely that end-to-end quality with context of past programs built in. A coding assistant can help write a script. It cannot, on its own, know which surface markers matter for a given tissue, which trafficking behaviors signal a problem, or how this company weighs safety against potency, because that knowledge lives in the company’s accumulated experience rather than in any public database. The distinction Manifold is drawing is between a tool that helps you code and a tool that carries your domain context through a full analysis — and it maps directly onto the extensibility feature that lets a lab encode its proprietary methods as reusable skills.

The Manifold case is a clean illustration of the workflow bet in practice. The intelligence doing the target assessment is the same Claude model any subscriber can use. The value came from wiring that model to the right data sources, letting it apply the company’s own criteria, and running the whole pipeline in one place. It is exactly the kind of tedious, judgment-laden, multi-step coordination work that Anthropic argues has been the real bottleneck — and exactly the kind of work a general model plus the right environment can plausibly accelerate.

The honest caveat is that a target-nomination pipeline produces hypotheses, not validated drugs. Ranking candidates well is useful and time-saving, but the ranking still has to be confirmed at the bench, and a confidently wrong ranking that sends a team down a dead-end target is a real cost. Manifold’s use of the tool to nominate rather than to decide is the appropriate scope, and it is a reminder that the current generation of these tools accelerates the front of the pipeline — where being fast and roughly right is valuable — rather than replacing the experimental validation that remains the expensive, slow, decisive part of drug discovery.

The Allen Institute review pipeline

The second named case is the most striking on time savings, and it comes from Jérôme Lecoq, a neuroscientist at the Allen Institute. Lecoq used Claude Science to build a multi-agent computational review template — a pipeline of roughly 20 custom skills aimed at writing long-form scientific reviews, the dense synthesis papers that map the state of a field by pulling together thousands of prior studies. Writing one of these by hand is a major undertaking, and Lecoq’s team had experienced how major: before Claude Science, producing such a review could take as long as two years.

The pipeline he built is a good picture of what the workbench enables when a researcher invests in customizing it. Sub-agents read through thousands of papers, each pulling out the central claim and the key quantitative finding and storing them in an evidence-state database. The pipeline then constructs a narrative arc for the review, writes it section by section, and delegates each section to its own specialized sub-agent. Within a section, dedicated agents generate quantitative cross-study figures directly from the evidence database, so the figures are grounded in the extracted data rather than assembled by hand. The whole thing runs on the actor-critic pattern: one agent creates content while a separate reviewer agent evaluates it for accuracy and citation fidelity.

The result Lecoq reported is the kind of number that gets attention. He now has about 10 reviews, many more than 100 pages each, with citations that were checked over by the reviewer agents. A process that took up to two years for a single review produced ten long reviews. Even discounting heavily for the difference between a first draft and a publication-ready paper, that is a change in kind rather than degree, and it is the concrete version of Anthropic’s claim that the tool attacks the tedious intermediate work rather than the flash of insight.

The critic is doing real work in this case, and Lecoq’s team treats it as a component to be refined rather than trusted blindly. The group is working with domain experts to further refine the AI-based critic agents, which is the right instinct. A review paper lives or dies on citation fidelity and the accuracy of its quantitative claims, and those are exactly what the reviewer agent is meant to guard. Refining the critic with domain experts is how you turn a plausible-looking review into a defensible one, and it signals that the team understands the automated check is a starting point, not a guarantee.

This case also quietly demonstrates the extensibility story that the launch materials describe in the abstract. Lecoq did not use Claude Science out of the box. He built roughly 20 custom skills and assembled them into a bespoke pipeline tuned to his team’s way of writing reviews. That is the workbench working as intended — a platform a researcher shapes into their own instrument — and it suggests the ceiling on the tool’s usefulness is set less by the product and more by how much a lab is willing to invest in customizing it. The labs that get the most out of Claude Science will be the ones that treat it as a system to build on, not a chatbot to query. The two-year-to-ten-reviews figure is a customization result, not an out-of-the-box one, and reading it that way keeps expectations honest.

Glioma epidemiology at one-tenth the time

The third case is the one with the clearest independent check on the results. Stephen Francis, an associate professor and epidemiologist at the UCSF Brain Tumor Center, used Claude Science to support studies on the molecular epidemiology of glioma, a primary brain tumor that begins in the glial cells. His lab studies the genetic basis of susceptibility — specifically how thousands of small-effect germline variants combine to shape an individual’s risk — which is statistically demanding work that involves large genomic datasets and careful analysis across multiple methods.

The work predated the tool, which makes the comparison meaningful rather than hypothetical. Francis had a research program already underway, and he applied Claude Science to it. His report is that the app dramatically accelerated the analysis, enabling comprehensive germline workups across multiple approaches in roughly one-tenth the time it previously took. A tenfold speedup on the analytical portion of an ongoing study is a large claim, and it is the kind of number that would reshape how a lab allocates its time if it holds up across projects.

What makes this case more persuasive than a raw speed figure is what the group did next. Francis’s team independently validated Claude Science’s results, confirming that the tool can produce analyses that are both fast and sound. That validation step is the crux. A tool that is fast but unreliable is a liability in epidemiology, where a subtle error in a germline analysis can propagate into a wrong conclusion about disease risk. The value of Francis’s account is not the tenfold number on its own; it is that an experienced epidemiology group checked the fast output against its own standards and found it held.

The germline-variant problem is a good fit for what the workbench does well. It requires pulling and harmonizing large genomic datasets, running established statistical approaches across them, and producing figures that summarize the combined effect of many small signals — a coordination-heavy, code-heavy, judgment-laden task where the analysis is well understood but the plumbing is laborious. That is precisely the territory Anthropic argues has been the real bottleneck, and Francis’s experience is a data point in favor of that argument: the intelligence needed for the analysis was available; the acceleration came from removing the friction around it.

The measured reading is that this is one experienced group, one class of analysis, and a self-reported figure that the group itself validated rather than an external audit. That is stronger evidence than a vendor demo and weaker evidence than a published, peer-reviewed methods comparison, which the field does not yet have. The most useful thing about the UCSF case is the validation, not the speedup — it shows a serious lab treating the tool’s output as something to be checked, and finding that on this problem it survived the check. Whether that generalizes to other analyses and other labs is exactly the question the beta period is meant to answer.

The reviewer agent against the hallucinated-citation problem

The reviewer agent deserves its own treatment because it addresses a problem that has grown into a genuine threat to the scientific literature. As AI-assisted writing has spread through research, so have fabricated citations, references to papers that do not exist, and statistics that cannot be traced to any source. A well-formatted, confident, and wrong document is more dangerous than an obviously sloppy one, because it passes casual inspection. In a field where the citation is the unit of trust, a tool that quietly invents them would corrode the very thing it claims to support.

Anthropic’s design response is to make checking a first-class, always-on part of the system rather than a step a user might remember to run. As a pipeline executes, the reviewer agent inspects the outputs, flags incorrect citations and untraceable numbers, and catches figures that do not match their underlying code, self-correcting as it goes. The three checks are well chosen because they map onto the three most common failure modes of AI-assisted scientific writing: a citation that does not support the claim, a number with no derivation, and a figure that has drifted from the analysis it supposedly depicts.

A compact comparison of the checks the reviewer agent performs

Check performedFailure mode it targetsWhy it matters in research
Citation verificationReferences to nonexistent or unsupported sourcesA fabricated citation invalidates a claim and can survive peer review
Number traceabilityStatistics with no derivable sourceAn untraceable figure cannot be defended or reproduced
Figure-to-code consistencyA figure that no longer matches its analysisA drifted figure silently misrepresents the underlying data

The table above summarizes the reviewer agent’s three explicit checks and the specific integrity problem each one is meant to catch, which together define the standard Anthropic is asking the scientific community to hold the product to.

The actor-critic structure is what makes this more than a spell-check. In the Allen Institute pipeline, the separation is explicit: one agent writes, a separate agent evaluates, and the two operate as a pair. The value of separating the roles is that the critic is not invested in the content it reviews. A single agent asked to both write and check its own work tends to rationalize its output; a dedicated critic with a narrow evaluation task is more likely to catch the error. Borrowing the actor-critic idea from reinforcement learning is a reasonable engineering bet on this being true in practice.

The limits are real and Anthropic does not pretend otherwise by shipping in beta. An automated critic can only check what it can verify, and citation verification depends on the reviewer being able to reach and correctly parse the cited source. Number traceability depends on the derivation being present in the session’s context. A critic can miss a subtle statistical error that is internally consistent but conceptually wrong, because such an error violates no traceability rule. The reviewer agent reduces the rate of the most common and most embarrassing failures; it does not turn an AI-assisted analysis into a peer-reviewed one.

This is why the beta users who take the tool most seriously treat the critic as something to refine rather than trust. Lecoq’s team is working with domain experts to sharpen the critic agents; Francis’s team independently validated the outputs. Both are doing what any careful lab should: using the automated check as a first line of defense and keeping human expert judgment as the last line. The reviewer agent is the proving ground on which the scientific community will judge Claude Science over the coming months, and its real-world catch rate — how many fabricated citations and untraceable numbers it stops versus how many slip through — is the single most important number the beta period will produce.

Pricing, plans, and who can open the app today

The access model is unusually broad for a product aimed at a specialized professional market, and the breadth is itself a strategic statement. Claude Science is available in beta on macOS and Linux to anyone on a Claude Pro, Max, Team, or Enterprise plan. There is no separate license, no per-seat science fee on top of the subscription, and no qualification gate for individual paying users. A graduate student with a personal Pro subscription can open it on the same terms as a pharmaceutical R&D team. That is a deliberate contrast with the gated, enterprise-only rollouts favored by some competitors.

For organizational plans, there is one administrative step. Team and Enterprise users need their admin to enable Claude Science before it becomes available, which gives institutions control over rollout and lets IT and security teams review the tool before staff use it on sensitive work. That control is a feature rather than a friction for regulated environments, where uncontrolled adoption of a new tool that touches proprietary data would be a governance problem.

Anthropic also introduced a plan aimed squarely at the academic and nonprofit world. There is now a Team plan offering discounted seats for active scientific labs at academic institutions and nonprofit research organizations. The pricing detail beyond “discounted” was not spelled out at launch, but the intent is clear: lower the cost barrier for exactly the population — university labs and nonprofit research groups — that has the least budget and, arguably, the most to gain from removing computational friction. It is also the population most likely to publish, which builds the kind of visible, citable track record a new scientific tool needs to earn credibility.

The requirement that the models are the standard Claude models, with no special access, has a pricing consequence worth making explicit. Because Claude Science does not run a new or more expensive model, it can be bundled into existing subscription tiers rather than sold as a premium add-on. The cost a user pays is the cost of their existing plan plus whatever compute they consume — on their own hardware for free, or through a Modal account for on-demand GPUs at Modal’s rates. The workbench itself is included; the compute is what scales the bill, and for a researcher running on a local machine or an institutional cluster, that marginal cost can be low.

The strategic read on the access model connects to everything else. OpenAI gated GPT-Rosalind to qualified US enterprise customers behind a safety and qualification review. Anthropic went the opposite direction, putting Claude Science in front of every paying subscriber on day one. That choice trades some control for reach, and it is bet on land-and-expand adoption — get individual scientists using it on real problems, build a base of published results and word-of-mouth credibility, and convert that into institutional and pharmaceutical contracts. Whether wide beats narrow in a market where reliability and trust matter as much as availability is one of the genuine strategic questions the launch raises, and it will not be settled quickly.

The AI for Science credits program

Alongside the product, Anthropic opened a funding program aimed at academic and exploratory research, and its structure reveals what kind of use the company most wants to encourage. Anthropic will support up to 50 Claude Science AI for Science projects, providing up to $30,000 in credits each. Modal, the on-demand compute partner, will add up to $2,000 in compute for select projects, which covers the GPU costs for work that a lab cannot run on its own hardware. Together the credits and compute are meant to let a research group run a real project on the workbench without the cost being a barrier.

The selection criteria point at breadth and ambition rather than safe, incremental work. Anthropic said it is looking for projects that span domains and explore the boundaries of science, with an early focus on biology and biomedical research. TechCrunch’s account added that the company framed the target population as postdoctoral and graduate projects, which fits the discounted academic Team plan and the broader strategy of seeding adoption among early-career researchers who will carry tool habits through their careers. The emphasis on cross-domain, boundary-pushing work also serves Anthropic’s own learning: novel projects stress the tool in ways routine ones do not, and the company gets to see where it breaks.

The timeline is concrete and short, which suggests Anthropic wants results and feedback quickly. Applications are open through July 15, 2026. Award notifications go out by July 31. The funded projects run from September 1 to December 1, 2026 — a three-month window that will produce a first wave of real-world usage and, presumably, a set of case studies and published results the company can point to. Interested researchers apply through a form linked from the announcement, and the company pointed applicants to an AI for Science community forum for product updates, feedback, and shared learning.

The program is a familiar move for a platform company, and it is effective for reasons that go beyond generosity. Credits lower the barrier to trial; a three-month structured project produces concentrated feedback; funded academic work generates citations and published results that build the tool’s scientific credibility far more durably than any marketing could. It also creates a cohort of researchers with hands-on experience who become advocates, reviewers, and, in the case of graduate students and postdocs, future decision-makers about which tools their labs adopt. A modest credit budget spread across 50 projects buys Anthropic a great deal of real-world validation and goodwill in the exact community whose trust it most needs.

There is a fair critique that the program’s economics are small relative to the stakes. Up to $30,000 in credits plus $2,000 in compute per project is meaningful for a graduate student but trivial next to the pharmaceutical contracts the launch is ultimately chasing. The generosity is aimed at the academic base that builds credibility, not at the pharma customers that build revenue. That is a reasonable allocation, but it is worth naming clearly: the credits program is a credibility-and-adoption engine for the scientific community, and the commercial payoff Anthropic is actually pursuing sits with the deep-pocketed pharmaceutical companies that were on stage at the launch, not with the funded postdocs.

Anthropic’s own drug programs for neglected diseases

One of the more consequential announcements at the event was easy to miss because it was not about the product at all. Anthropic said it will use Claude Science to pursue its own research into drug candidates, with a focus on rare and neglected diseases. This is Anthropic stepping across the line from tool provider to drug researcher, at least in a preliminary way, and it changes the company’s relationship to the industry it is selling into.

The stated rationale is twofold. The humanitarian half is straightforward: neglected diseases are, by definition, the ones the traditional pharmaceutical market underserves because the patient populations are small or poor, and an AI tool that lowers the cost of early discovery could make some of those programs viable. Anthropic’s head of life sciences, Eric Kauderer-Abrams, said the company will target areas outside the scope of what the traditional pharma and biotech landscape might consider attractive targets — blue-ocean territory that the industry’s economics normally ignore. The practical half is that running its own programs gives Anthropic a clearer view of how Claude Science works on real drug-discovery problems, feedback it cannot fully get from watching customers.

The strategic tension in this move is obvious and worth stating directly. Anthropic is selling Claude Science to pharmaceutical companies while simultaneously using it to pursue drug candidates of its own. For now, the company’s targets are the neglected diseases the industry has passed over, which keeps it out of direct competition with its customers’ commercial priorities. But the capability it is building — the ability to run its own preclinical discovery — is the same capability its customers pay for, and the boundary between complementary and competitive could blur if Anthropic’s programs ever moved toward commercially attractive targets. The neglected-disease framing is both genuinely humanitarian and strategically convenient, because it lets the company build drug-discovery muscle without threatening the pharma relationships that matter to its IPO.

Amodei was careful about expectations here in a way that reads as credible rather than promotional. Over the next year he said he hopes to see some success in using AI to come up with new targets for drug discovery — a deliberately modest bar. New target identification is the early, cheap, hypothesis-generating end of the pipeline, not the expensive clinical validation that actually produces approved drugs. Setting the near-term goal at “some success” on targets, rather than a drug in the clinic, is the right calibration for a technology that has repeatedly disappointed when oversold, and it stands in contrast to the industry’s history of grand AI-for-drug-discovery promises.

The move also has to be read against the medium-term ambition Anthropic has stated elsewhere: expanding the platform into clinical research, regulatory documentation, and eventually AI-powered laboratory robotics. Running its own programs is a way to develop and pressure-test those later-stage capabilities on real work before selling them. It is a slower, more grounded strategy than promising a cure, and it fits the overall posture of the launch — big long-term ambition, deliberately modest near-term claims, and a product shipped in beta so the reality can be checked. Whether Anthropic’s own programs produce a viable drug-discovery track record within a few years is one of the clearest tests of whether the compression Amodei has predicted is real or rhetorical.

The hires that bought Anthropic scientific credibility

A product launch is a claim, and in science, claims need credentials. Anthropic spent the first half of 2026 assembling exactly the kind of credibility that a language-model benchmark cannot buy, and the most visible piece landed eleven days before the launch. On June 19, 2026, John Jumper — who shared the 2024 Nobel Prize in Chemistry for creating AlphaFold — announced on X that he was leaving Google DeepMind after nearly nine years to join Anthropic. He said he would take time to recharge before starting, and neither he nor the company disclosed his role. The timing, days before a flagship AI-for-science event, was not accidental.

Jumper is not a symbolic hire. AlphaFold solved a problem that had been open in structural biology for over fifty years — predicting a protein’s three-dimensional structure from its amino-acid sequence — and its released database of predicted structures has been used by more than two million scientists across 190 countries, accelerating work on malaria vaccines, cancer treatments, and drug-resistant bacteria. If Amodei’s thesis is that AI can compress decades of biological progress into years, Jumper is the person who has most concretely demonstrated that such compression is possible. Hiring him narrows the perceived credibility gap between Anthropic and every rival for life-science work in a way no product feature could.

Jumper was also not the first decorated arrival. In May 2026, Andrej Karpathy — an OpenAI founding member and one of the most influential AI researchers of the past decade — joined Anthropic’s pre-training team. The two hires, read alongside the founding partnerships with the Allen Institute and Howard Hughes Medical Institute announced in February and the Coefficient Bio acquisition in April, form a deliberate sequence: partnerships for reach, an acquisition for domain depth, and marquee hires for credibility, all culminating in the product launch. The pattern is what a company builds when it intends to lead a field rather than dabble in it.

The talent flow tells a structural story that goes beyond any single hire. According to SignalFire’s 2025 State of Talent Report, engineers at Google DeepMind were nearly 11 times more likely to leave for Anthropic than the reverse, and Anthropic led two-year retention at 80 percent. Jumper’s departure came days after Noam Shazeer, a Google vice president and co-lead of its Gemini models, left for OpenAI, and Alphabet’s stock reportedly fell around 7 percent after the pair of star departures. The proof of concept for AI-for-science that DeepMind generated with AlphaFold is now, awkwardly, the primary credential its most important departing hire brings to a competitor.

The governance layer added credibility of a different kind. Vas Narasimhan, chief executive of Novartis, joined Anthropic’s board in April 2026, selected by the independent Long-Term Benefit Trust. At the launch event he was pointed rather than promotional, saying the industry has made a lot of bold proclamations and now needs to actually show results for patients. Having a sitting big-pharma CEO on the board — and on stage tempering the hype — signals that Anthropic’s life-sciences push is wired into the industry it serves at the highest level. The honest caveat on all of this is that credibility is not capability: a Nobel laureate on the payroll and a pharma CEO on the board make the company’s ambitions believable, but the deployable product is still the beta workbench and the multi-agent system, not the reputations attached to it. The hires make the bet credible; they do not make it won.

Where GPT-Rosalind fits and how it differs

Anthropic is not entering an empty field, and the sharpest contrast is with OpenAI’s answer to life sciences. On April 16, 2026, OpenAI announced GPT-Rosalind, named for Rosalind Franklin, whose X-ray diffraction work was central to discovering the structure of DNA. Rosalind is a frontier reasoning model purpose-built for life sciences — biochemistry, genomics, protein engineering — rather than a general conversationalist. It is, in the most direct sense, the opposite of Anthropic’s approach: OpenAI built a specialized brain, where Anthropic built a bench around existing ones.

Rosalind’s credentials are model benchmarks. On BixBench, a test that hands an agent an empty Jupyter notebook, raw data files, and freedom to plan its own bioinformatics analysis, GPT-Rosalind reportedly hit 0.751 pass@1, ahead of every frontier model tested. A June 3, 2026 update folded in GPT-5.5’s agentic coding and tool use alongside deeper medicinal-chemistry and genomics reasoning, with OpenAI reporting gains over GPT-5.5 on new in-house benchmarks while using fewer tokens. The framing throughout is capability-first: Rosalind is meant to be a better reasoner about biology, and the benchmarks are the argument.

The distribution strategy is where the two companies diverge most visibly, and it matters as much as the technical approach. Rosalind launched as a research preview limited to qualified enterprise customers in the United States, gated behind a qualification and safety review, with early access for partners including Amgen, Moderna, the Allen Institute, Thermo Fisher, and Novo Nordisk. That is narrow and controlled — a small number of vetted, well-resourced organizations, chosen partly because a more capable biology model raises the dual-use stakes and warrants tighter access. OpenAI paired it with a biodefense effort, Rosalind Biodefense, detailed in late May, which underscores that the company treats a specialized biology model as something requiring extra safeguards.

The overlap in customers is telling. Novo Nordisk and the Allen Institute appear on both companies’ rosters — Novo Nordisk as an early Rosalind partner and a longtime Claude for Life Sciences customer, the Allen Institute as a Rosalind partner and the home of the Lecoq review pipeline built on Claude Science. That is a clear signal that large research organizations are not choosing one AI vendor; they are working with several at once, comparing them on real problems. For the labs themselves that is rational hedging. For the vendors it means the fight is not for exclusive relationships but for share of a research team’s daily work.

The clean way to state the difference is that OpenAI is betting the field needs a smarter, specialized reasoner and is willing to gate access to it tightly, while Anthropic is betting the field needs a better-integrated workbench around already-capable models and is willing to offer it wide. Rosalind’s BixBench score is evidence for the model-capability thesis; Claude Science’s connectors, provenance, and compute orchestration are evidence for the workflow thesis. Neither has been settled by a published, head-to-head comparison on real research yet, and the most useful thing a researcher can do this year is treat both as hypotheses and test them on their own problems rather than trusting either company’s framing.

Google DeepMind’s owned-model advantage

The third player is the one with the deepest scientific pedigree and a genuinely different asset. Google DeepMind does not merely call scientific models as tools — it owns foundational ones. AlphaFold, the protein-structure predictor that won Jumper and Demis Hassabis a Nobel Prize, is DeepMind’s. So is AlphaGenome, a model for genomics. These are not integrations bought from a partner; they are proprietary models that DeepMind built and controls, which Anthropic and OpenAI can, at best, reach as external tools rather than own. In a field where the foundational models are the crown jewels, that ownership is a structural advantage nobody else has.

DeepMind wraps those models in Gemini for Science, a platform that bundles AlphaFold, AlphaGenome, and more than 30 life-science databases into a single skill set for researchers. It also runs Co-Scientist, a system made available to individual researchers through Gemini for Science, which has been credited with helping identify new targets for liver fibrosis and surfacing fresh approaches to ALS. And through Isomorphic Labs, the drug-discovery company spun out of DeepMind and built on AlphaFold, the group has a commercial vehicle that raised roughly $2.1 billion in a Series B in May 2026 and holds reported deals worth around $3 billion with Eli Lilly and Novartis to co-design drug candidates. DeepMind is simultaneously the deepest researcher, a platform provider, and a drug-discovery company.

The comparison exposes what each company is really selling. DeepMind sells proprietary scientific capability — models nobody else has, wrapped in a platform and monetized through a drug-discovery spinout. Anthropic sells integration and workflow — a bench that connects existing models, public databases, partner libraries, and a lab’s own tools, offered wide and cheap. These are not the same product competing on price; they are different theories of where the value in AI-for-science actually sits. If the value is in the foundational models, DeepMind is best positioned. If the value is in removing the coordination friction around models that are already good enough, Anthropic is.

There is a distribution contrast here too, and it completes the three-way picture. DeepMind leans on owned, proprietary models that others can only call into. Anthropic goes wide with broad subscription access. OpenAI goes narrow and enterprise-gated. Three of the most capable AI organizations on earth are pursuing three genuinely different strategies for the same scientific market, which is unusual — more often, rivals converge on a similar approach. The divergence here reflects real disagreement about what scientists need and where the durable advantage lies.

The irony that hangs over the comparison is the Jumper hire. The person who built DeepMind’s most celebrated scientific achievement now works for Anthropic, a company whose strategy explicitly does not rely on owning foundational models like the one he built. Whether Jumper’s arrival signals that Anthropic intends to move toward owned scientific models over time, or simply that it wanted the credibility and judgment of the field’s most decorated practitioner, is not something the company has said. But it complicates the clean story that Anthropic is purely a workflow company and DeepMind purely a model company. The strategies are distinct today, but the talent is flowing in a direction that could blur them tomorrow.

Three bets on the same set of customers

Stepping back from the individual companies, the AI-for-science market as of mid-2026 is defined by three coherent and incompatible strategies, and it is worth laying them side by side because the choice between them is the choice a research organization is actually being asked to make.

How the three leading AI labs are approaching scientific research

CompanyCore betDistributionSignature asset
AnthropicWorkflow and integration around existing modelsWide, all paid subscribers, beta on macOS and LinuxClaude Science workbench, 60+ connectors, reviewer agent
OpenAIA specialized, domain-tuned reasoning modelNarrow, qualified US enterprise customers, gatedGPT-Rosalind, strong bioinformatics benchmarks
Google DeepMindOwned foundational scientific modelsPlatform plus a drug-discovery spinoutAlphaFold, AlphaGenome, Isomorphic Labs

The table above sets the three approaches against each other on the dimensions that most shape a research organization’s decision — what each company is betting the field needs, how it lets customers reach the capability, and the asset that anchors its position.

The practical consequence of this divergence is that the three companies are, for now, hard to compare directly. A benchmark that flatters Rosalind’s reasoning says little about whether Claude Science removes more friction from a real workflow, and neither speaks to whether AlphaFold’s owned models produce better structures than the partner models Claude Science calls. The dimensions of comparison are different because the products are answers to different questions. That is genuinely confusing for a research leader trying to choose, and it is why the sensible near-term posture is to run pilots on all three rather than pick from marketing.

There is a real risk of fragmentation that cuts against all three vendors. Pharmaceutical and academic research leaders do not want to manage a dozen gated AI relationships, each with its own access rules, data terms, and interface. The early signal — the same organizations appearing on multiple vendors’ rosters — suggests labs are hedging rather than committing, which is rational but unsustainable at scale. Over time, research organizations will likely consolidate toward whichever approach proves most reliable and least burdensome, and the vendor that wins will be the one that reduces total friction, not the one with the best single benchmark.

The deeper point is that this is a bet on where AI value accrues in specialized domains. The leading labs would clearly rather own the high-value domain stacks — science, law, finance, medicine — than cede them to vertical-only startups, which is why all three are pushing into science at once rather than leaving it to specialists. Anthropic’s Claude Science, OpenAI’s expected sequence of domain models, and DeepMind’s steady stream of science models are three moves in the same larger game: capturing the operating layer of entire professions. Science is simply the arena where the contest is currently most visible, and the outcome there will shape how the same fight plays out in every other high-value field.

The pharma stakes and the size of the prize

To understand why three of the world’s most valuable AI companies are fighting over scientists, follow the money in drug development. Bringing a new medicine to market typically takes more than ten years and costs billions of dollars, and a large share of that time and money is spent in the early stages — identifying and testing potential drug candidates long before anything reaches a clinical trial. Most candidates fail. The economics of the industry are dominated by the cost of those failures and the length of the path, which means even a modest acceleration or a modest improvement in early candidate selection is worth an enormous amount.

Claude Science is aimed directly at that early, expensive, uncertain stage. The tasks Anthropic highlights — analyzing molecular and biological data, comparing research findings, identifying promising compounds, prioritizing experiments — are exactly the front-of-pipeline work where being faster and better at triage compounds into large savings downstream. If a tool helps a team spend its expensive wet-lab capacity on better-chosen targets, or reach a go/no-go decision months sooner, the value is not measured in the tool’s subscription cost. It is measured against the multi-year, multi-billion-dollar cost of the programs it shapes. That asymmetry is why pharmaceutical budgets, not academic ones, are the real prize.

The pharmaceutical companies on stage at the launch made the demand side concrete. Novartis, Bristol Myers Squibb, Novo Nordisk, and Genentech are not curious bystanders; they are potential customers with R&D budgets in the billions and a direct financial interest in compressing discovery timelines. The presence of GLP-1 pioneer Lotte Knudsen — the inventor behind the class of drugs that reshaped the obesity and diabetes markets — signaled that the conversation was about serious therapeutics, not demos. When Narasimhan tempered the hype by insisting the industry needs to show real results for patients, he was speaking as someone whose company would pay for the tool if it delivers and walk if it does not.

There is a strategic reason Anthropic and its rivals emphasize drug discovery so heavily, and it is worth being clear-eyed about it. The observation that general models mostly help people write better emails, while life sciences and healthcare is where AI can actually make money and affect human life at scale, has been made bluntly by industry figures, and it captures the commercial logic. Drug discovery is a domain where the value of AI is legible, the customers are wealthy, and the humanitarian framing is genuine. For an AI company that needs to justify an enormous valuation with real revenue, pharma is close to an ideal target market — deep pockets, clear use cases, and a story about curing disease.

The counterweight is the field’s history of disappointment, which every serious participant knows. No AI-designed small-molecule drug had won approval by the end of 2025, and grand promises about AI transforming drug discovery have been made and missed before. Prior corporate bets on healthcare AI, from Google’s Verily to IBM’s Watson Health, are cautionary tales about the gap between a compelling demo and a delivered therapy. The prize is real and enormous, but so is the graveyard of companies that chased it and fell short. Anthropic’s deliberately modest near-term goal — some success on new targets within a year — reads as a company that has studied that history and is trying not to repeat it.

Market tremors in drug-discovery stocks

The clearest evidence that investors take Claude Science seriously came from the stock market’s reaction on launch day. Shares of several publicly traded AI-drug-discovery and research-services companies fell as the news landed. Schrödinger dropped as much as 8.3 percent intraday, Recursion Pharmaceuticals fell about 3.3 percent, and IQVIA declined more than 2.3 percent at points during the session. Some of those losses were pared back later, but the direction of the move was unambiguous: the market read a broadly available AI research workbench from a company of Anthropic’s scale as a threat to incumbents.

The logic behind the sell-off is a question about business models rather than any single product feature. Companies like Schrödinger and Recursion have built their value on proprietary computational platforms for drug discovery — the very capability that a general-purpose, widely available workbench threatens to commoditize. If a pharmaceutical team can assemble much of that capability inside Claude Science, using public databases, partner models, and its own data, the differentiated platform that a specialist company sells looks less unique. Contract research organizations and research-services firms like IQVIA face a related worry: if AI automates more of the analytical work they are paid to perform, the demand for their services could shrink.

The fear is real but almost certainly overstated in the short term, and it is worth separating the signal from the reflex. Claude Science is a beta tool that accelerates computational analysis; it does not run wet-lab experiments, it does not replace the deep domain platforms specialist companies have spent years building, and its reliability at publication and regulatory standards is unproven. The gap between “a scientist can do more analysis faster” and “the specialist drug-discovery company is obsolete” is enormous, and it is filled with exactly the experimental validation, regulatory expertise, and accumulated data that incumbents own. The market’s move reflects a directional concern about the long run more than a considered judgment about the next year.

There is also a plausible reading in which the incumbents benefit rather than suffer. A tool that lowers the cost of computational research could expand the total amount of drug-discovery work being done, and specialist platforms that integrate well with tools like Claude Science — offering the validated models, proprietary datasets, and wet-lab capabilities the workbench cannot replicate — could find themselves more valuable as complements rather than casualties. The companies most at risk are those whose entire value proposition is the general computational capability a workbench now provides; the companies best positioned are those with defensible assets a workbench must call out to. The sell-off treated the whole category as threatened, when the real effect is a sorting between commoditized capability and defensible assets.

The broader pattern is one the market has seen repeatedly in 2026, and it is becoming a recognizable reflex. When Anthropic introduced a tool for automating certain legal work earlier in the year, it helped trigger a sharp sell-off in legal-services stocks. The launch of Claude Science produced the same shape in drug-discovery names. Each time, a major AI company enters a professional vertical and the incumbents in that vertical lose value on the fear of disruption. The reaction is now almost automatic, which means it carries less information than it appears to. Whether the fear is justified in any given case depends on details the initial sell-off does not pause to examine — and in drug discovery, those details still favor the incumbents more than the launch-day panic suggested.

Biosecurity and the dual-use question

No serious discussion of an AI tool for biology can avoid the dual-use problem, and Claude Science sits closer to it than almost any product Anthropic has shipped. The same capabilities that help a legitimate researcher analyze molecular data, query genomic databases, and reason about biological mechanisms could, in principle, aid a bad actor trying to engineer a pathogen. Frontier AI models carry structural biological risks, and a workbench that wires those models to specialized biological databases and predictive models is exactly the kind of tool that concentrates the concern. Anthropic knows this, which is part of why the company has built its safety framework the way it has.

Anthropic’s governing framework is its Responsible Scaling Policy, first published in September 2023 and revised several times since. The RSP defines AI Safety Levels that tie a model’s evaluated capabilities to the safeguards required before it ships. In May 2025, Anthropic activated ASL-3 safeguards — the tier aimed at chemical and biological weapons risk from actors with modest resources and expertise — for its most capable models, after evaluations of improving CBRN-relevant capabilities. The ASL-3 deployment standard relies on input and output classifiers designed to block content of concern, a defense-in-depth approach with multiple layers meant to catch misuse that slips past earlier barriers.

The relevant point for Claude Science is that it runs on models that have already been through those evaluations and carry those safeguards. Because the workbench uses standard Claude models rather than a new, more biologically capable one, it does not introduce a fresh frontier of biological capability that would demand its own risk review. This is one of the underappreciated safety advantages of the workflow-not-model strategy: a tool built on already-cleared models inherits their safeguards, whereas a new domain-tuned biology model — the path OpenAI took with Rosalind — raises the capability ceiling and, with it, the stakes of any failure in the safeguards. OpenAI’s decision to gate Rosalind tightly and pair it with a biodefense effort reflects that higher-stakes calculus.

The RSP itself has become more contested, and honesty requires naming the criticism rather than only the framework. In February 2026, Anthropic released version 3.0 of the policy, which critics argued weakened its most important commitment: the earlier categorical promise to pause or restrict deployment when a model’s capabilities outran the safety measures that could contain them. The revised policy replaced that categorical pause with a conditional promise to delay development only if Anthropic judges itself the industry leader and considers catastrophic risk significant, citing competitive pressure and the absence of binding federal regulation. Later updates refined the chemical-biological weapons threshold further. Observers in the biosecurity community read the change as loosening the enforcement mechanism that had made the ASL thresholds binding, even as the level definitions stayed in place.

That tension — between shipping a powerful biology tool wide and cheap, and holding a safety line that the company has itself relaxed — is the central unresolved question hanging over the launch. Anthropic’s defenders point to the data-locality design, the reliance on already-cleared models, the classifier-based content blocking, and the reviewer agent as concrete safeguards. Critics point to the wide distribution, the weakened pause commitment, and the general difficulty of screening biological misuse in an agentic tool that can query databases and reason about mechanisms. Both are describing the same product. The safety story is neither reassuringly airtight nor obviously reckless; it is a set of real safeguards layered on a genuine risk, shipped by a company that has recently made its own rules more permissive.

The industry-level response Anthropic has proposed is worth noting because it acknowledges that no single company can solve this. Around the launch window, the company has proposed an industry-wide framework for scoring jailbreak severity, developed with other major AI companies, and Narasimhan used the stage to call for appropriate AI regulation before a crisis forces it — saying it would be a shame if a crisis is what pushes the industry to act. That is the right instinct: the dual-use risk of AI-for-biology is a collective problem that voluntary company policies and reactive regulation both handle poorly. Whether the industry and regulators can build a durable framework before an incident tests one is, at this point, an open and uncomfortable question.

The regulatory picture around a working lab tool

Beyond the biosecurity framework, Claude Science lands in a messy and consequential regulatory environment, and the mess is partly about Anthropic’s own fraught relationship with the US government. The company has had a public falling-out with the Trump administration. In March 2026, the Department of Health and Human Services told employees they could no longer use Anthropic’s Claude tool, part of a broader effort by the administration to blacklist the company from federal government use — a dispute reported to stem in part from Anthropic’s refusal to allow Claude to be used in certain military applications. That friction matters for a science tool because it complicates adoption at federally funded institutions and agencies.

The complication is sharpened by how deeply Claude is already embedded in the federal scientific apparatus. Anthropic’s technology underpins the FDA’s Elsa tool for drug reviews, which means the same company facing a government blacklisting effort also provides the AI behind a core regulatory function. A tool used to review drugs and a company the administration has tried to bar from government use are, in this case, the same company. How that contradiction resolves — whether the blacklisting effort expands, narrows, or is reversed — will shape whether federally funded labs and agencies can freely adopt Claude Science.

For the researchers who will actually use the workbench, the more immediate regulatory questions are about compliance in their own domains. A tool that touches clinical or patient genomic data has to satisfy HIPAA in the United States and equivalent regimes elsewhere, and it has to fit into the validated, auditable workflows that regulated research demands. Anthropic’s design answers part of this directly: because Claude Science runs on the lab’s own infrastructure and keeps sensitive data local, the highest-risk data does not leave the systems already cleared to hold it. The reproducibility and audit-trail features also map onto the documentation that regulated research requires. But institutional review, data-governance sign-off, and validation for regulated use remain the adopting organization’s responsibility, and the beta status of the tool means those reviews are happening in real time.

Narasimhan’s call for regulation was aimed at a genuine gap. There is no comprehensive, binding framework governing the use of frontier AI in scientific research and drug development, and the existing instruments — the EU AI Act, sector-specific rules, and voluntary company policies like the RSP — operate on different terms and leave real seams. The absence of clear rules cuts both ways: it lets tools like Claude Science reach researchers quickly, and it leaves both the companies and their customers exposed to the possibility that rules written after a problem could be stricter and more disruptive than rules written before one. For a research leader, the regulatory uncertainty is a reason for caution in high-stakes, regulated applications, and a smaller concern for exploratory computational work that does not touch protected data or feed regulatory submissions.

The pragmatic reading is that Claude Science is safest to adopt, from a regulatory standpoint, exactly where it is also most clearly useful in the near term: internal, exploratory, computational research that accelerates analysis without yet feeding directly into regulated decisions. As the tool moves toward the clinical, regulatory, and eventually laboratory-robotics applications Anthropic has signaled, the regulatory burden rises sharply, and the current uncertainty becomes a much larger factor. The company’s own staged ambition — start with research acceleration, expand toward regulated stages later — tracks the regulatory risk gradient, whether by design or necessity.

Data handling and where sensitive datasets stay

For any lab working with proprietary, clinical, or patient data, the first question about a new AI tool is not what it can do but where the data goes. Claude Science’s answer is the feature that makes it usable in serious settings at all: it runs on the lab’s own infrastructure, and sensitive datasets never have to leave the systems they already live on. The workbench operates locally on macOS or Linux, or on a remote machine over SSH, or on an HPC login node, which means the data stays put and only the specific context each step of an analysis needs is sent to Claude.

This is a meaningfully different posture from a cloud service that ingests your data to process it. A pharmaceutical company’s proprietary compound library, a hospital’s patient genomic data, or an unpublished dataset that represents years of work are all things an organization is often legally or contractually forbidden to move off-premises, and certainly reluctant to hand to a third-party AI vendor. By keeping the heavy, sensitive data on local infrastructure and transmitting only the minimal context per step, Claude Science is designed to fit inside those constraints rather than ask organizations to relax them. Data locality is not a privacy feature bolted on; it is the architectural precondition for the tool being adoptable in regulated research at all.

The permission model reinforces the data-governance story. Because Claude Science asks before reaching new resources and lets a researcher review or revoke any decision before a job runs, an organization retains control over what the tool touches and when. Combined with the Team and Enterprise requirement that an admin enable the product before staff can use it, this gives institutions the control points they need — admin-level enablement, per-action permission, and local data residency — to bring the tool through a security and data-governance review rather than having it appear uncontrolled on researchers’ machines.

The honest limits are worth stating precisely, because “data stays local” is a claim that deserves scrutiny rather than trust. Only the sensitive data stays local by default; the context needed for each analysis step is still sent to Claude, which means an organization has to understand and be comfortable with what that per-step context contains for their particular workflow. The provenance features that record the full message history also mean a detailed record of the analysis exists, which is good for reproducibility and something a data-governance review will want to examine for what it captures. And as with any new tool, the security of the local deployment, the SSH connections, and the Modal integration are things an organization’s security team should assess rather than assume.

The comparison to rivals sharpens the point. Rosalind’s gated, enterprise-controlled access is one way to manage data sensitivity — tight control over who can use the model at all. Claude Science’s approach is different: wide access, but with the sensitive data kept on the customer’s own infrastructure. For an organization whose main concern is data residency and control over its proprietary datasets, the local-execution model is a strong fit. For one whose main concern is limiting who inside the organization can use a powerful biology tool, the admin-enablement and permission controls do part of the job, but the wide availability to any individual subscriber is a factor to weigh. The data architecture is one of Claude Science’s genuine strengths, and it is the feature most likely to get the tool past a serious institution’s security review.

Limits the launch does not try to hide

A fair assessment has to give as much weight to the constraints as to the capabilities, and Anthropic’s own framing — shipping in beta, calling it not a new model, setting modest near-term goals — invites that scrutiny rather than deflecting it. The most basic limit is maturity. Claude Science is a beta product, released early so that scientists use it on real problems and report back. Beta is not a formality here; it means the reliability, the connector coverage, the reviewer agent’s catch rate, and the compute orchestration are all still being refined based on real-world use, and early adopters are, in effect, participating in that refinement.

The platform constraints are concrete. Claude Science runs on macOS and Linux only at launch, which leaves Windows-based researchers out for now — a real gap in some fields and institutions where Windows is standard. The reliance on the lab’s own infrastructure, while a strength for data control, also means a researcher without access to suitable local hardware or a cluster depends on the Modal integration for anything compute-heavy, with the associated cost. The tool meets scientists where many of them already work, but not all of them.

The deepest limit is the one the whole product is built to manage but cannot fully solve: reliability at the standard science demands. An agent that orchestrates computing, queries genomic data, and drafts manuscripts has to be right at a level where a single wrong number can waste months of work or, in a drug program, millions of dollars. The reviewer agent is the answer to this, but an automated critic catches the failures it can verify and misses the ones it cannot — a subtle statistical error that is internally consistent, a conceptual mistake that violates no traceability rule, a citation that exists and is real but does not actually support the claim. The beta users who take the tool most seriously validate its outputs independently, which is precisely the discipline the tool’s limits require.

There is also a scope limit that the excitement around drug discovery can obscure. Claude Science accelerates computational and analytical work. It does not run wet-lab experiments, it does not validate a hypothesis at the bench, and it does not replace the experimental and clinical work that turns a promising analysis into a real result. Its named successes — target nomination, review writing, germline analysis — are all front-of-pipeline, analysis-heavy tasks. That is genuinely valuable, but it is a narrower claim than “AI is doing science,” and conflating the two overstates what the current tool does.

Finally, there is the interpretive risk that the tool’s polish can mask. Native rendering, plain-language editing, and clean provenance make outputs look authoritative, and a beautifully rendered figure from a low-confidence prediction is still low-confidence. The tool preserves the information a researcher needs to check — the code, the environment, the confidence estimates, the message history — but it cannot force anyone to read it. The largest risk with a tool this smooth is not that it fails loudly but that it succeeds convincingly on work that was subtly wrong, and guarding against that remains a human responsibility that no reviewer agent fully discharges. The measured conclusion is that Claude Science is a serious, useful tool with real limits, best treated as a powerful assistant whose work is checked rather than an oracle whose output is trusted.

What it changes for an academic lab

For an academic research group, the appeal of Claude Science is specific and, in the near term, probably larger than for a well-resourced pharmaceutical company. Academic labs are chronically short on two things the tool addresses directly: software-engineering capacity and time. A biology or chemistry PhD student is often expected to write and debug their own analysis pipelines despite having no formal training in software engineering, and the days lost to that work are days not spent on the science that advances a dissertation. A tool that lets a scientist describe an analysis in plain language and have the pipeline built, run, and documented removes a tax that falls hardest on exactly this group.

The economics have been arranged to fit academic budgets, which is not incidental. The discounted Team plan for active academic and nonprofit labs, the AI for Science credits program offering up to $30,000 in credits and up to $2,000 in Modal compute for 50 projects, and the availability of the workbench to any individual Pro subscriber all lower the cost barrier for a population that has little to spare. A graduate student can run a funded three-month project, or a lab can equip its members with discounted seats, without the capital outlay that a proprietary computational platform would require. For a field where the alternative is often unfunded evenings spent wrestling with cluster job scripts, that is a real change.

The Lecoq case at the Allen Institute is the clearest picture of the upside for research-heavy academic work. A review that took up to two years became ten reviews, because a researcher invested in building a customized multi-agent pipeline. The lesson for academic groups is that the largest gains come from treating the tool as a platform to build on — encoding a lab’s methods as reusable skills, assembling specialist agents for recurring tasks — rather than as a chatbot to query occasionally. The labs that invest in customization will pull far ahead of those that use it casually, which creates both an opportunity and a new form of divide.

The Francis case points at the discipline academic use requires. His group got a tenfold speedup and then independently validated the results before trusting them. That validation step is the academic norm the tool should slot into, not replace. Peer review, replication, and skeptical scrutiny of methods are how science guards against error, and an AI tool that accelerates analysis has to be held to those same standards. Academic culture, at its best, is well suited to using such a tool responsibly, precisely because checking each other’s work is what the culture is built to do.

There are real concerns specific to academia that the tool does not resolve. Training the next generation of scientists matters, and there is a legitimate worry that students who lean on an agent to build their pipelines may not develop the underlying computational skills they will need to judge when the agent is wrong. There is a reproducibility upside — the built-in provenance is better than the undocumented notebooks that plague the field — but there is also a risk of a two-tier system where labs that can afford heavy compute and customization outpace those that cannot. For academic science, Claude Science is a genuine accelerant and a genuine pedagogical and equity question at the same time, and how departments handle the training and access dimensions will matter as much as the tool itself.

What it changes for biotech and pharma research

For biotech and pharmaceutical R&D, the calculation is different, because these organizations have the money, the compute, and often the software talent that academic labs lack. What they are buying is not primarily capacity — it is speed and integration. The value proposition is compressing the time from question to answer in the early, expensive stages of discovery, and doing it inside workflows that touch proprietary data and validated internal tools. The Manifold Bio case is the template: a company using the workbench to nominate targets end to end, applying its own proprietary selection criteria, in a single pipeline that a general coding assistant could not match.

The extensibility features are what make Claude Science plausible for these organizations rather than a toy. A biotech’s competitive advantage lives in its proprietary data, its validated models, and its accumulated judgment about what works, and a tool that could not incorporate those would be useless. Claude Science’s ability to save proprietary pipelines as reusable skills and connect to a lab’s preferred internal tools, with sessions inheriting them automatically, means a company can bring its private methods into the workbench rather than leaving them outside it. Combined with local data residency, this lets a pharmaceutical organization use the tool on sensitive programs without exposing the crown jewels.

The reproducibility and audit features carry particular weight in a regulated commercial setting. Drug development generates documentation that must withstand regulatory scrutiny and internal review, and a tool that attaches an auditable history — code, environment, plain-language description, message history — to every artifact fits the documentation burden of the industry. The reviewer agent’s citation and calculation checks matter more here than almost anywhere, because in a drug program a wrong number is not an academic embarrassment but a financial and sometimes safety consequence. For pharma, the provenance and checking features are not conveniences; they are the difference between a tool that can touch a regulated program and one that cannot.

The competitive reality is that these organizations are not committing to one vendor. Novo Nordisk works with both Anthropic and OpenAI; the Allen Institute appears on multiple rosters; large pharma companies are running pilots across tools and comparing them on their own problems. That is rational, and it means Anthropic is competing for share of a research team’s daily work rather than for exclusive contracts. The winner in any given organization will be the tool that proves most reliable and least burdensome on the problems that organization actually faces, and that is being decided in pilots right now, not in launch announcements.

The strategic wrinkle for pharma leaders is Anthropic’s decision to pursue its own drug programs. A vendor that is also, in some domains, a competitor is a more complicated partner than a pure tool provider. For now the neglected-disease focus keeps Anthropic away from commercial priorities, but a cautious pharma R&D leader will watch where the company’s own programs go and will weigh how much proprietary context to encode in a tool built by a company with its own discovery ambitions. The right posture for biotech and pharma is to adopt Claude Science where it clearly accelerates internal computational work, to pilot it against rivals on real problems, and to keep the most sensitive strategic context governed carefully — using the tool’s genuine strengths in provenance, integration, and data locality while staying clear-eyed about the vendor’s dual role.

Adjacent fields that could be next

Claude Science launched with a heavy focus on biology and biomedical research, but the architecture is not intrinsically biological. A coordinating agent, specialist sub-agents, a reviewer, connectors to databases, compute orchestration, and native rendering of domain artifacts are all general patterns. The biology focus reflects where Anthropic sees the clearest near-term value and the most eager, well-funded customers, not a limit on where the workbench could go. The credits program’s stated interest in projects that span domains and explore the boundaries of science is a hint that the company wants to learn where else the pattern applies.

The most natural adjacent fields are the other data-and-compute-heavy sciences. Chemistry and materials science share the shape of biology’s problems: large specialized databases, predictive models, structures to render, and analysis pipelines to build and run. Cheminformatics is already among the launch domains, and materials discovery — searching for compounds with desired properties — maps closely onto the drug-discovery workflow. Physics, climate science, and astronomy are similarly compute-heavy and involve large datasets, simulation, and visualization, all of which the workbench pattern could support with the right connectors and rendering.

Within biology and medicine, Anthropic has signaled a staged expansion along the drug-development pipeline itself. The company has indicated plans to move the platform into clinical research, regulatory documentation, and eventually AI-powered laboratory robotics. Each step raises the stakes and the regulatory burden — clinical research touches patients, regulatory documentation feeds legal submissions, and laboratory robotics moves AI from advising on experiments to physically running them. That last step, connecting an agent to instruments that manipulate physical samples, is where the vision of an autonomous research loop becomes concrete and where the safety and reliability questions become sharpest.

The pattern Anthropic is following is bigger than science, and the science launch is best understood as one front in it. The company has pushed into legal work with tools for contract review and legal briefings, and it has stated ambitions across other professional verticals. The strategy is to own the operating layer of high-value professions — the daily working environment where the expensive, credentialed work of a field actually happens. Claude Code did this for software; Claude Science is the attempt in research; legal and other domains are following. Science is the most visible current front because the contest with OpenAI and DeepMind is most direct there, but it is a template the company intends to reuse.

The realistic caveat is that expansion is a plan, not an achievement, and each new domain brings its own connectors to build, its own reliability bar to clear, and its own regulatory and safety questions to answer. A workbench that works well for computational biology does not automatically work for clinical trials or materials discovery; the general architecture has to be specialized for each field’s tools and standards, and the reviewer agent has to be taught what “wrong” looks like in each domain. The adjacency is real and the ambition is clear, but the near-term reality is a biology-and-biomedicine tool in beta, and the broader vision will be judged one hard-won domain at a time rather than granted on the strength of the pattern.

A measured way to start without overcommitting

For a researcher or a research leader deciding what to do about Claude Science, the sensible path is neither to ignore it nor to reorganize around it, but to run a disciplined trial that produces real information. The first practical step is to confirm eligibility and access. Claude Science is available in beta on macOS and Linux to Pro, Max, Team, and Enterprise subscribers; Team and Enterprise users need an admin to enable it first. Academic and nonprofit labs should look at the discounted Team plan and, if the timing fits, the AI for Science credits program with its July 15, 2026 application deadline and September-to-December project window.

The second step is to choose a pilot that is real but low-stakes. The best first project is a piece of genuine computational work — a literature review, a data-analysis task, a visualization pipeline — that the group already understands well enough to judge the output, but where an error would not be catastrophic. Using a problem where the right answer is roughly known lets a lab assess the tool’s reliability against a standard it can verify, which is far more informative than a novel problem where the tool’s output cannot be checked. The point of the pilot is to learn how much you can trust the tool, and you can only learn that on work you can independently validate.

The third step is to lean into what the tool does distinctively rather than testing it as a generic assistant. Its differentiated features are the connectors to scientific databases, the compute orchestration, the reproducibility artifacts, and the reviewer agent. A pilot that exercises those — querying across several databases, running a compute-heavy analysis on the lab’s own infrastructure, generating a figure with its full provenance, and checking whether the reviewer agent catches deliberately introduced errors — produces a far better picture of the tool’s value than treating it as a chat window. The Lecoq case shows the ceiling comes from customization, so a serious pilot should also test how hard it is to encode one of the lab’s own methods as a reusable skill.

The fourth step is to validate outputs against the lab’s normal standards, exactly as the beta users did. Francis’s group independently validated its results before trusting them; Lecoq’s group is refining the critic agents with domain experts. A pilot that skips validation learns nothing useful, because the whole open question about the tool is reliability. Treating the reviewer agent as a first line of defense and expert human judgment as the last is the discipline the tool’s maturity requires, and it is the discipline that separates a useful adoption from a risky one.

The final consideration is governance, especially for organizations handling sensitive data. Before a pilot touches proprietary or regulated data, the security and data-governance review should examine what per-step context is sent to Claude, how the local deployment and SSH and Modal connections are secured, and whether the tool’s provenance records fit the organization’s compliance requirements. The local-data-residency design makes this review passable for many organizations, but it does not make it unnecessary. The measured approach is to pilot on real-but-safe work, use the distinctive features, validate everything, and clear governance before scaling — an approach that captures the tool’s genuine value while respecting that it is a powerful, early-stage system rather than a proven one. None of this is investment or legal advice; each organization should weigh its own compliance obligations and risk tolerance with its own advisors.

The IPO backdrop and the revenue logic

Claude Science cannot be read purely as a product; it is also a move in Anthropic’s approach to the public markets. The company confidentially filed draft IPO paperwork on June 1, 2026, at a reported valuation around $965 billion, with an initial public offering possible as soon as the fall. That timing sits directly behind the launch. A company weeks from one of the largest technology IPOs ever attempted needs to show public-market investors not just impressive models but growing, durable, high-value revenue, and vertical products aimed at wealthy industries are how it makes that case.

The revenue trajectory is the backdrop that makes the strategy legible. Anthropic’s annualized revenue has been reported to have grown from roughly $9 billion at the end of 2025 to a run rate reported around $47 billion in mid-2026, and the company has said it is set to see its first profitable quarter. Those are reported and run-rate figures rather than audited numbers, and they should be read with that caveat, but the direction is the point. Much of the recent growth has come from Claude Code becoming the default tool for many software teams, which is the exact template Claude Science is trying to repeat in a new, wealthier vertical.

The logic of vertical products is straightforward for a company chasing revenue that justifies its valuation. General-purpose models are valuable but face intense price competition and the persistent observation that a lot of their use amounts to writing better emails. Vertical, workflow-level products aimed at high-value professions — software, science, law, finance — let a company capture more of the value it creates and charge for it more durably. Drug discovery in particular offers deep-pocketed customers, legible value, and a humanitarian story, which is close to ideal for a company that needs to convince investors its revenue can keep growing. Claude Science is the science entry in a deliberate portfolio of vertical bets, and its commercial job is to open pharmaceutical revenue ahead of the IPO.

The market’s reaction to Anthropic’s vertical moves shows both the promise and the risk of this strategy. Each time the company enters a professional vertical — legal work earlier in the year, drug discovery now — incumbents in that vertical lose value on disruption fears, and the moves have been described as rattling markets and reflecting broader concern about which businesses AI will render obsolete. That reaction is a sign investors take the strategy seriously, but it also raises the stakes: a company whose product launches move markets is a company whose products are expected to deliver, and the gap between disruption priced in and disruption delivered is where a high valuation becomes fragile.

The uncomfortable question the IPO backdrop raises is whether the science framing is primarily about science or primarily about revenue, and the honest answer is that it is genuinely both, with the balance impossible to disentangle from outside. The humanitarian case for accelerating drug discovery is real, and Amodei’s long-stated conviction about compressing biological progress is not obviously cynical. But pharmaceutical companies have far deeper pockets than academic labs, and new pharma contracts would help keep Anthropic profitable as an IPO approaches. Both things are true at once. The most accurate reading is that Claude Science serves a sincere scientific ambition and a hard commercial need simultaneously, and that the commercial need is why it shipped now, wide, and aimed at pharma, rather than as a quiet academic tool released whenever it was ready.

Open questions the beta cannot yet settle

The most useful way to close an assessment of a beta product is to be precise about what is not yet known, because the answers to these questions, not the launch-day coverage, will determine whether Claude Science matters. The first and largest open question is reliability at scientific standards. Can the reviewer agent catch fabricated citations, untraceable numbers, and drifted figures at a rate that makes the tool trustworthy for publication-grade work, and how often do subtle, internally consistent errors slip through? The beta period will produce the real-world catch rate, and that single measure will do more to determine the tool’s fate than any feature.

The second question is validation and evidence. The named case studies are self-reported and, in the UCSF instance, internally validated, which is stronger than a demo and weaker than a published, peer-reviewed methods comparison. The field does not yet have an independent head-to-head evaluation of Claude Science against GPT-Rosalind and DeepMind’s tools on real research problems. Until funded projects and independent labs publish results — some of which the credits program is designed to produce by year-end — the evidence for the productivity claims remains promising rather than proven.

The third question is adoption and durability. Will scientists actually integrate the tool into their daily work, or try it and drift back to familiar workflows? The Claude Code template suggests deep vertical adoption is possible, but science is not software, and the reliability bar is higher. Whether the wide-and-cheap distribution strategy beats OpenAI’s narrow-and-gated approach and DeepMind’s owned-model strategy is genuinely undecided, and the answer depends on which theory of value — integration, model capability, or proprietary models — proves right in practice.

The fourth cluster of questions is safety and regulation. Can the dual-use risks of a widely available biology tool be managed by classifier-based content blocking and reliance on already-cleared models, especially after the RSP’s pause commitment was weakened? Will the industry and regulators build a durable framework before an incident forces a reactive one, as Narasimhan urged? And how will Anthropic’s fraught relationship with the US government affect adoption at federally funded institutions? These are not resolvable by the product’s design alone; they depend on collective choices the company does not control.

The final question is the one Amodei himself keeps modest, and it is the one that matters most in the long run. Will AI-accelerated research actually deliver — new drug targets, then candidates, then approved therapies for real patients — or will it join the history of healthcare-AI efforts that promised transformation and delivered less? Amodei’s near-term bar is deliberately low: some success on new targets within a year. Narasimhan’s challenge is the right one for the whole endeavor — the industry has made bold proclamations and now has to show results for patients. Claude Science is a serious, well-designed attempt to remove the friction that has slowed computational research, built on a coherent bet that integration matters more than raw model capability. Whether that bet pays off in real scientific results, and how safely it does so, are questions the next several years will answer — and the honest state of things today is that the tool is promising, the ambition is credible, and the proof does not yet exist.

The compressed 21st century behind the bet

The intellectual foundation for everything Anthropic is doing in science sits in a single essay. In October 2024, Amodei published Machines of Loving Grace, in which he argued that AI-enabled biology and medicine could compress the progress human biologists would have achieved over the next fifty to a hundred years into five to ten. He called the effect the compressed 21st century. That sentence is, in a real sense, the reason every item in Anthropic’s science timeline exists — the partnerships, the acquisition, the hires, and now the workbench are the operational expression of a thesis the company’s CEO put in writing two years earlier.

The thesis rests on a specific view of what limits biological progress. Amodei’s argument is not that biology lacks smart people or good ideas, but that the pace of discovery is throttled by the sheer difficulty of working through biological complexity — the number of experiments, the volume of data, the coordination across disparate tools and fields. If AI can dramatically raise the throughput of that work, the reasoning goes, the field could move through decades of expected progress far faster. It is a claim about acceleration through removing bottlenecks, which maps precisely onto Claude Science’s design philosophy of attacking coordination friction rather than trying to be smarter about biology than biologists.

What is notable about the launch is how carefully Amodei declined to claim the compression had arrived. At the June event he did not say the compressed 21st century was underway or imminent. He emphasized that he does not expect the full effect to transpire in the next couple of years, and set a deliberately modest near-term goal of some success on new drug-discovery targets within a year. That restraint is a meaningful signal. A CEO known for dramatic predictions, launching the product meant to realize his most famous prediction, chose to lower rather than raise expectations at the moment of maximum attention. The gap between the ten-year thesis and the one-year goal is where the honesty of the whole endeavor lives.

The essay’s framing also explains why the company treats science as more than a revenue vertical, even as revenue is clearly part of the motivation. If you believe AI could compress a century of biological progress into a decade, then building the tools to do it is not merely good business; it is close to the central justification for building powerful AI at all. Curing disease is the upside AI leaders most often cite when defending the technology’s risks, and it is the upside Amodei has staked his intellectual reputation on. That sincerity and the commercial imperative are not in conflict — they reinforce each other — but the sincerity is why the science push reads as conviction rather than opportunism.

The measured view is that the thesis is a hypothesis about the future, not a description of the present, and Claude Science is a test of it rather than proof. The compression Amodei predicts would require not just faster analysis but faster validated discovery — targets that become candidates that become approved therapies — and the current tool accelerates only the computational front of that pipeline. Whether removing coordination friction actually compounds into the dramatic acceleration the essay imagines, or whether the real bottlenecks lie in the wet-lab and clinical work the tool does not touch, is the deepest open question of all. Claude Science is the most concrete bet yet on the compressed-21st-century thesis, and the coming years will show whether the bottleneck was ever really coordination.

What the connected databases actually are

The claim that Claude Science reaches more than sixty scientific sources means little without understanding what those sources are and why stitching them together is hard. The named biological databases each answer a different question and speak a different language, and a working researcher normally has to know the quirks of each. Laying them out shows why the synthesis layer is the actual product.

UniProt is the central reference for protein sequence and function — what a protein is, what it does, and what is known about it. The Protein Data Bank, or PDB, holds experimentally determined three-dimensional structures of proteins and other large biological molecules, the atomic-level coordinates that let researchers see how a molecule is shaped. These two answer complementary questions: UniProt tells you about the protein as information, PDB shows you its physical structure, and connecting them requires mapping between their different identifier systems.

On the genomics side, Ensembl is a genome browser and annotation resource that maps genes, transcripts, and regulatory features across the genome, while ClinVar catalogs the relationships between genetic variants and human health — which mutations are associated with which conditions. A question about whether a variant matters clinically often requires pulling gene structure from one and clinical significance from the other. GEO, the Gene Expression Omnibus, stores gene-expression data from thousands of experiments, the raw material for understanding which genes are active in which conditions. Each uses its own schema, and harmonizing them is exactly the tedious integration work that eats research time.

Reactome is a curated database of biological pathways — the networks of molecular interactions that carry out cellular processes — which lets a researcher move from individual genes or proteins to the systems they participate in. ChEMBL is the medicinal-chemistry counterpart, a database of bioactive molecules with drug-like properties and their measured effects, essential for anyone reasoning about compounds and drug candidates. A drug-discovery question might touch a target’s structure in PDB, its function in UniProt, its pathway in Reactome, and known active compounds in ChEMBL — four databases, four query languages, for one question.

Claude Science’s proposition is that a researcher asks the question once, in plain language, and specialist agents handle the querying and the synthesis across all of these. The databases themselves are mostly public and free; the value the tool adds is removing the translation and integration layer between them — the layer that has historically required either deep familiarity with each resource or laborious glue code. Whether the synthesis is accurate, whether the connectors stay current as the underlying databases evolve, and whether sixty sources is the right coverage for a given field are the practical questions, but the design correctly identifies integration, not access, as the problem worth solving.

The reproducibility problem this is meant to address

Claude Science’s heavy emphasis on auditable artifacts and a citation-checking reviewer agent is a response to a specific, well-documented crisis in modern science, and understanding that crisis clarifies why the features matter. Across several fields, a significant fraction of published results have proven difficult or impossible to reproduce, and the causes are mundane rather than dramatic: undocumented analysis steps, lost or unversioned code, unrecorded software environments, and parameters that no one wrote down. When the recipe for a result is not preserved, the result cannot be checked, and science’s core mechanism of verification breaks down.

The rise of AI-assisted research has added a sharper, newer failure mode on top of the old one. As researchers use language models to help write papers and analyze data, fabricated citations and untraceable statistics have crept into the literature — references to papers that do not exist, numbers with no derivable source, claims that sound authoritative and are simply wrong. A confident, well-formatted, fabricated citation is more dangerous than an obvious error because it survives casual review. The tool that helps write the paper can, if unguarded, be the tool that corrupts it.

Claude Science’s design targets both the old and the new versions of the problem. Against the old undocumented-analysis problem, it attaches the exact code, the computing environment, a plain-language description, and the full message history to every artifact, so the recipe travels with the result and a reviewer or the original author can reconstruct exactly how a figure or number was produced. The provenance is not a report generated after the fact; it is a record captured as the work happens, which is precisely what undocumented notebooks fail to provide.

Against the new fabricated-citation problem, the reviewer agent runs continuously, checking citations against sources, flagging numbers it cannot trace, and catching figures that have drifted from their code. This is a direct architectural answer to the specific way AI-assisted writing degrades the literature. It cannot catch every conceptual error, but the three failure modes it targets — unsupported citations, untraceable numbers, and figure-code mismatches — are exactly the ones that AI-assisted research has made more common. Building the check into the tool that does the work is a more reliable safeguard than hoping a human reviewer catches the error later.

The honest limit is that a tool cannot single-handedly fix a cultural problem. Reproducibility depends on incentives — journals, funders, and institutions rewarding careful, checkable work — as much as on tooling, and a tool that makes provenance easy still requires researchers to use it and reviewers to demand it. Claude Science lowers the cost of doing reproducible work and raises the odds of catching fabricated citations, which is a genuine contribution. But it sits inside a scientific culture that has to value verification for the tooling to matter. The reproducibility features are a real answer to a real crisis, and they will help most in the labs and journals that already care about the problem — which is both their promise and their limit.

The Claude Code template and why Anthropic keeps invoking it

Every account of the launch returns to the same comparison, because Anthropic put it at the center of its own framing: Claude Science is meant to do for life science what Claude Code did for software engineering. Understanding what actually happened with Claude Code explains both the ambition and why investors and rivals take the science bet seriously rather than dismissing it as hype.

Claude Code turned a general-purpose model into a tool that software developers open every working day. Rather than a chat window where a developer pastes code and copies answers back, it became an agent that works inside a developer’s actual environment — reading the codebase, running commands, editing files, and carrying out multi-step tasks from high-level instructions. The shift was from a model you consult to a tool you work inside, and it drove a large share of Anthropic’s recent revenue growth as it became something close to a default for many engineering teams. The lesson Anthropic drew is that the value was not only in the model’s capability but in meeting the professional inside their real workflow, with access to their real tools.

Claude Science applies that lesson almost point for point. Where Claude Code reads a codebase and runs commands, Claude Science queries databases and runs analyses. Where Claude Code edits files, Claude Science edits the code behind figures. Where Claude Code carries out multi-step engineering tasks from high-level instructions, Claude Science carries out multi-step research from plain-language requests. The coordinating agent, the delegation to specialists, and the operation inside the user’s own infrastructure are the same architectural ideas transposed from software to science. The workbench is, in a real sense, Claude Code’s design philosophy pointed at a new profession.

The comparison also carries the strategic logic. Claude Code showed that owning the operating layer of a profession — the daily environment where the credentialed work happens — is more valuable and more durable than selling raw model access, because it captures more of the value created and is harder for a rival to displace once adopted. Science is the next high-value profession Anthropic is trying to own the operating layer of, following the same playbook that worked in software and that the company is also running in legal work. The rivals understand this, which is why OpenAI and Google are racing to establish their own positions in science rather than ceding the operating layer.

The limit of the analogy is the one that determines whether the bet works, and it is worth stating plainly. Software has a forgiving error model — a bug is usually caught by tests, revealed at runtime, and fixed cheaply — while science has an unforgiving one, where a wrong number can waste months and survive review. Claude Code could be adopted fast partly because its mistakes were cheap and visible. Claude Science’s mistakes can be expensive and hidden, which raises the reliability bar far higher and is exactly why the reviewer agent and independent validation matter so much. The Claude Code template is a real and credible reason to take the science bet seriously, but the higher cost of error in science is the reason the template may not transfer as cleanly as the framing suggests.

The agentic shift and the voices in the room

Claude Science would not have been buildable eighteen months earlier, and the reason is a broader shift in what AI systems can do. Since late 2025, agents powered by large language models — including Anthropic’s Opus series — became capable of useful, independent work when given high-level instructions, rather than only responding turn by turn. That shift from conversational assistant to autonomous agent is what makes a coordinating agent that plans, delegates, runs jobs, and checks its own work possible at all. Claude Science is a product of the agentic era, and it inherits both that era’s new capability and its unresolved reliability questions.

The scale of what agents can now attempt in science has been noted by working researchers, not only by the companies selling the tools. In one account cited by Anthropic, a Harvard physicist estimated, on the basis of his own work with Claude Code and related tools, a striking degree of acceleration in his research. Whether any single such estimate holds up, the pattern is that scientists who are also serious users of these tools have been surprised by how much independent work the agents can do, which is the grassroots demand that a product like Claude Science is built to meet. The tool is riding a wave of researcher experimentation that predates it.

The launch event itself was unusually candid for a product reveal, and the candor is worth recording because it tempered the company’s own hype. Alongside Amodei, the stage held GLP-1 drug inventor Lotte Knudsen, Bristol Myers Squibb CEO Chris Boerner, Novartis CEO Vas Narasimhan, and Genentech executive Aviv Regev, with STAT’s Matthew Herper interviewing. This was not a room of AI enthusiasts; it was a room of pharmaceutical leaders whose companies would pay for the tool if it works and who have seen expensive technology promises fail before. Their presence signaled that the conversation was about serious therapeutics and real budgets rather than demos.

Narasimhan’s remarks were the sharpest, and they matter precisely because he sits on Anthropic’s board and had every incentive to cheerlead. Instead he insisted the industry has made a lot of bold proclamations and now needs to actually show results for patients, and he called for appropriate AI regulation before a crisis forces it, warning that it would be a shame if a crisis is what pushes the industry to act. Coming from a board member and major potential customer, that is a striking public setting of expectations — a demand that the technology prove itself and be governed, delivered from the same stage as the launch. The most credible thing about the event was that the pharma leaders on it declined to oversell, which reads as an industry that has been burned by hype and intends to judge Claude Science by delivered results rather than promises.

The broader talent and capital context frames why this moment is happening now across the whole industry. The leading AI labs are pouring resources into science at once — Anthropic’s workbench, OpenAI’s Rosalind and its expected sequence of domain models, DeepMind’s steady stream of science models and Isomorphic’s multi-billion-dollar raises — and pulling elite researchers toward the labs that let them move fastest. The convergence of new agentic capability, enormous capital, and a talent war has made mid-2026 the moment the AI-for-science contest went fully public. Claude Science is one company’s entry in a race that all the major labs are now running at full speed, and the pharma leaders in that San Francisco room were there to decide, eventually, which entries actually deliver for patients.

Questions researchers are asking about Claude Science

What is Claude Science?

Claude Science is an AI workbench for scientists that Anthropic launched in beta on June 30, 2026. It brings scientific databases, code, compute, and analysis into one environment where a coordinating AI agent can carry a research project from a literature question through to publication-ready figures and manuscripts, with an auditable history attached to every output.

Is Claude Science a new AI model?

No. Claude Science is an application, not a model. It runs on Anthropic’s existing models, including Claude Opus 4.8 and Sonnet 5, with no special access or biology-specific tuning. The new part is the environment around the models: specialist agents, more than sixty connected databases and skills, compute orchestration, and an automated reviewer agent.

Which plans include Claude Science?

It is available in beta to Claude Pro, Max, Team, and Enterprise subscribers. There is no separate license or per-seat science fee on top of the subscription. Team and Enterprise users need an administrator to enable it first.

What operating systems does Claude Science run on?

At launch it runs on macOS and Linux. It can run locally on your machine, on a remote machine over SSH, or on a high-performance computing login node. Windows is not supported in the beta.

How does Claude Science handle sensitive data?

It runs on the lab’s own infrastructure, so large or sensitive datasets stay on the systems they already live on and only the context each analysis step needs is sent to Claude. It asks before reaching new resources and lets you review or revoke decisions before a job runs. This design is intended to fit clinical and proprietary-data constraints.

What databases does Claude Science connect to?

It ships pre-configured with more than sixty scientific sources through skills and connectors. Named biological databases include UniProt, the Protein Data Bank, Ensembl, ClinVar, Reactome, ChEMBL, and GEO. It also connects to NVIDIA’s BioNeMo Agent Toolkit and models such as Evo 2, Boltz-2, and OpenFold3.

What is the reviewer agent?

The reviewer agent is a separate agent that runs alongside the work and checks it. It verifies citations, flags numbers it cannot trace to a source, and catches figures that do not match their underlying code, correcting errors as the pipeline runs. It is Anthropic’s technical answer to the problem of fabricated citations and untraceable statistics in AI-assisted research.

How does Claude Science make research reproducible?

When it generates a figure, it includes the exact code and computing environment that produced it, a plain-language description of how it was made, and the full message history. The recipe travels with the result, so the work can be validated and reproduced later, and figure edits are made by changing the underlying code rather than the image.

How is Claude Science different from GPT-Rosalind?

GPT-Rosalind is OpenAI’s specialized biology reasoning model, gated to qualified US enterprise customers. Claude Science is a workbench built around existing general models, offered wide to all paid subscribers. OpenAI is betting on a smarter specialized model; Anthropic is betting on better integration and workflow around models that are already capable enough.

How is it different from Google DeepMind’s approach?

DeepMind owns foundational scientific models like AlphaFold and AlphaGenome and bundles them in Gemini for Science, plus a drug-discovery spinout, Isomorphic Labs. Claude Science does not own such models; it connects to partner models as tools. DeepMind sells proprietary scientific capability, while Anthropic sells integration around existing models.

Can Claude Science design drugs?

It accelerates the early, computational stages of drug discovery — analyzing molecular and biological data, comparing findings, identifying promising compounds, and prioritizing experiments. It does not run wet-lab experiments or validate candidates physically. In the Manifold Bio case it was used to nominate targets, not to make final decisions.

What have beta users done with it?

Manifold Bio used it to nominate drug targets end to end using its proprietary criteria. Allen Institute neuroscientist Jérôme Lecoq built a multi-agent review pipeline that produced about ten long reviews from a process that once took up to two years each. UCSF epidemiologist Stephen Francis ran glioma germline analyses in roughly one-tenth the previous time and independently validated the results.

What is the AI for Science credits program?

Anthropic will support up to fifty projects with up to $30,000 in credits each, and Modal will add up to $2,000 in compute for select projects. Applications are open through July 15, 2026, awards go out by July 31, and funded projects run from September 1 to December 1, 2026, with an early focus on biology and biomedical research.

Is there a discount for academic labs?

Yes. Anthropic introduced a Team plan with discounted seats for active scientific labs at academic institutions and nonprofit research organizations, aimed at lowering the cost barrier for university and nonprofit researchers.

Does Claude Science manage its own compute?

Yes. It drafts a computing plan, asks before reaching new resources, and then writes and submits jobs to the resources a lab already uses — its own HPC cluster over SSH or a Modal account for on-demand GPUs — scaling from a single GPU to hundreds as needed. Datasets held in the running session load once and stay in memory.

What are the main limits of Claude Science?

It is a beta product, runs only on macOS and Linux, and accelerates computational work rather than replacing experimental or clinical validation. Its reliability at publication and regulatory standards is unproven, and the reviewer agent catches verifiable errors but can miss subtle conceptual ones. Independent validation of its outputs remains necessary.

How does Claude Science handle biosecurity risk?

It runs on models that have already passed Anthropic’s biological-capability evaluations and carry existing ASL-3 safeguards, so it does not introduce a new, more biologically capable model. Because it uses standard models rather than a new domain-tuned one, it inherits their safeguards rather than raising the capability ceiling.

Why did drug-discovery stocks fall when Claude Science launched?

Investors read a widely available AI research workbench as a potential threat to companies whose value rests on proprietary computational platforms or research services. Schrödinger fell as much as 8.3 percent, Recursion about 3.3 percent, and IQVIA more than 2.3 percent intraday. The concern is largely about the long run; the tool does not replace wet-lab work or deep domain platforms today.

How does Claude Science fit Anthropic’s IPO plans?

Anthropic filed confidential IPO paperwork on June 1, 2026, at a reported valuation around $965 billion, with an offering possible in the fall. Vertical products aimed at wealthy industries like pharma help show the growing, durable revenue public-market investors want, following the template of Claude Code’s success in software.

Should a lab adopt Claude Science now?

A measured approach is to run a pilot on real but low-stakes computational work that the lab can independently validate, exercise the distinctive features (database connectors, compute orchestration, reproducibility artifacts, and the reviewer agent), validate all outputs against normal standards, and clear a security and data-governance review before scaling. Treat it as a powerful assistant whose work is checked, not an oracle.

Author:
Jan Bielik
CEO & Founder of Webiano Digital & Marketing Agency

Claude Science bets on the workflow, not a smarter model
Claude Science bets on the workflow, not a smarter model

This article is an original analysis supported by the sources cited below

Claude Science, an AI workbench for scientists, is now available Anthropic’s official announcement of Claude Science, detailing the coordinating agent, reviewer agent, reproducible artifacts, compute management, connected databases, beta availability, named case studies, and the AI for Science credits program.

Anthropic’s Claude Science bets on workflow, not a new model, to win over scientists TechCrunch’s analysis of the launch, emphasizing that Claude Science uses existing models, comparing the distribution strategies of Anthropic, OpenAI, and Google DeepMind, and describing the coordinating and reviewer agents.

Claude Science is Anthropic’s newest flagship product MIT Technology Review’s report on the launch event, Anthropic’s own neglected-disease drug programs, the John Jumper hire, the profitability and IPO context, and the Claude Code comparison.

Anthropic releases Claude Science, a product aimed at researchers, the pharma industry STAT’s coverage of the San Francisco launch event, including Dario Amodei’s framing and the pharmaceutical-industry orientation of the product.

5 takeaways from Anthropic’s big science event Fast Company’s account of The Briefing: AI for Science event, the on-stage demo by product lead Alexander Tarashansky, and the panel of pharmaceutical and research leaders.

Anthropic releases Claude Science for automating research Reporting on the launch including Vas Narasimhan’s remarks on delivering results for patients and the need for AI regulation, the use of Opus 4.8, and the company’s valuation and IPO timing.

Anthropic Launches Claude Science to Enter Life Sciences, AI Drug Discovery Stocks Tumble in Response Analysis of the market reaction, including declines in Schrödinger, Recursion, and IQVIA shares, and Anthropic’s internal preclinical drug-discovery initiative.

Anthropic Launches Claude Science AI Workbench for Researchers A detailed overview of the workbench, its multi-agent architecture, the sixty-plus databases and toolkits, beta availability on macOS and Linux, and the credits program.

Claude Science: Anthropic Brings AI into Scientific Labs An explainer covering the named databases, the BioNeMo models Evo 2, Boltz-2, and OpenFold3, the Jupyter-like interface, local and HPC execution, and the reliability standard the reviewer agent must meet.

Anthropic launches Claude Science AI research workbench An editorial analysis of the workflow-versus-model tradeoff, the named beta use cases, and the practical evaluation criteria of connector security, provenance fidelity, and reviewer-agent reliability.

GPT-Rosalind: OpenAI’s 2026 Life Sciences AI Model Background on OpenAI’s specialized life-sciences model, its BixBench performance, and its positioning against Google DeepMind’s Isomorphic Labs and AlphaFold lineage.

OpenAI’s life sciences push energizes VCs PitchBook’s analysis of the competitive landscape in AI for life sciences, including DeepMind’s multi-year head start in domain-specific biological reasoning.

GPT-Rosalind: OpenAI’s Life Sciences AI Upgrade Coverage of the June 2026 GPT-Rosalind update, its enterprise partner roster including Novo Nordisk and the Allen Institute, and Isomorphic Labs’ Series B funding.

AI Giant Anthropic Leans Into Life Sciences With $400M Coefficient Bio Catch Reporting on Anthropic’s acquisition of Coefficient Bio and the October 2025 launch of Claude for Life Sciences, with its pharmaceutical customer base.

Why AI maker Anthropic’s deal with Coefficient Bio could be a pharma turning point Analysis of Anthropic’s life-sciences strategy, the expansion of Claude for Life Sciences into clinical and regulatory stages, and the wider wave of pharma-AI deals in 2026.

Novartis CEO joins Anthropic board, embedding AI in the heart of biopharma Coverage of Vas Narasimhan’s appointment to Anthropic’s board and the company’s broader turn toward healthcare and life sciences.

Nobel laureate John Jumper is leaving DeepMind for rival Anthropic Reporting on the AlphaFold creator’s move to Anthropic, part of the company’s deliberate AI-for-science buildout.

John Jumper to leave Google DeepMind for Anthropic CNBC’s account of Jumper’s departure, AlphaFold’s impact, and the broader AI talent war among leading labs.

Responsible Scaling Policy Anthropic’s framework tying model capabilities to required safeguards, including the ASL-3 deployment standard aimed at chemical and biological weapons risk.

Responsible Scaling Policy Version 3.0 Anthropic’s February 2026 policy revision, including its account of activating ASL-3 safeguards in May 2025 and the reasoning behind changes to its deployment commitments.

AI models for Life Sciences in 2026 A comparison of life-sciences AI offerings from Anthropic, OpenAI, Google DeepMind, NVIDIA, Microsoft, and Meta, including the connectors and anchor customers of Claude for Life Sciences.

NVIDIA launches BioNeMo Agent Toolkit, giving AI agents the tools to accelerate scientific discovery NVIDIA’s announcement of the BioNeMo Agent Toolkit that Claude Science uses to connect to life-science models and libraries including Evo 2, Boltz-2, and OpenFold3.

Citing this article? Brief excerpts are welcome. Please credit Webiano.digital, name the author where stated, and include a link to https://webiano.digital and to this original article. Full or substantial republication requires prior written permission. Read our Copyright and Content Use Policy.