Open source AI agents will live or die by trust

Open source AI agents will live or die by trust

AI agents are no longer only chat interfaces with a better memory. They are becoming software actors that read files, call tools, write code, move data, trigger workflows, coordinate with other agents and act across business systems. That change makes open source AI more valuable, but it also makes trust harder. A model that answers a question can be wrong. An agent that acts on a question can change records, leak data, approve transactions, alter repositories, or start a chain of events no human reviews in time.

Table of Contents

Agentic AI has crossed from software helper to system actor

The news is not simply that agents are improving. The sharper story is that the agent stack is becoming open, shared and reusable at the same time that the risk surface is moving from text generation into operational control. The Linux Foundation announced the Agentic AI Foundation in December 2025 with founding project contributions including Anthropic’s Model Context Protocol, Block’s goose and OpenAI’s AGENTS.md. The stated aim is to give agentic AI a neutral open foundation and shared infrastructure as autonomous systems start to work together.

That matters because open source is not a side channel in agentic AI. It is becoming part of the core infrastructure. MCP is meant to connect models and agents to tools, data and applications. AGENTS.md gives coding agents a predictable place to find project-specific instructions. Goose gives users a local open source agent that can run through desktop, command-line and API workflows. The same pattern is visible in LangChain, LlamaIndex, CrewAI, OpenAI’s Agents SDK and many smaller frameworks that let developers wire models to tools with a few lines of code.

The control problem is easy to state and hard to solve. An agent becomes useful only when it receives authority. It needs access to email, calendars, customer data, source code, cloud consoles, payment rails, ticketing tools, browsers, databases, identity providers and internal documentation. The more access it receives, the less it resembles a passive application. It starts to look like a privileged non-human worker. Enterprises know how to manage employees, service accounts, bots, APIs and workloads. Agentic AI blends pieces of all four, then adds probabilistic reasoning and tool selection on top.

Open source intensifies this tension. It gives developers transparency, portability, community review and freedom from a single vendor’s roadmap. It also spreads powerful orchestration patterns faster than most security teams can evaluate them. A weak agent framework can be forked. A dangerous plug-in can be packaged attractively. A tool server can be installed by a developer before procurement or security has seen it. A misconfigured local agent can move from a personal productivity toy to a bridge into corporate systems.

The future of agentic AI will not be decided only by which model plans better or writes cleaner code. It will be decided by whether agents can be identified, scoped, monitored, interrupted, audited and governed in ways that survive real deployment. Trust, identity and security are no longer support functions. They are the operating layer of autonomous AI.

Open source gives agentic AI its path to adoption

The open source route is attractive for AI agents because the agent layer is still unstable. Developers are not choosing one finished platform. They are assembling models, retrieval systems, tool protocols, workflow engines, memory stores, sandboxes, registries, policy engines and observability systems. In that setting, open code and open standards reduce the fear of lock-in. A company can start with one model provider, connect tools through a shared protocol, switch orchestration frameworks later, and retain some control over its own integration layer.

That is the practical reason MCP spread so quickly. The Model Context Protocol frames tool access as a standard interface between AI applications and external data sources or tools. The Linux Foundation announcement said MCP had more than 10,000 published servers and adoption across Claude, Cursor, Microsoft Copilot, Gemini, VS Code, ChatGPT and other platforms. The MCP project’s own announcement later cited more than 97 million monthly SDK downloads, 10,000 active servers and first-class client support across major AI platforms.

Open standards also reduce integration waste. Before a shared tool protocol, every vendor had an incentive to build its own connector system. That made each agent stack a private island. Developers had to rewrite adapters, duplicate security reviews and maintain parallel integration code. MCP and A2A do not remove complexity, but they give the industry a common vocabulary: MCP for agent-to-tool communication, A2A for agent-to-agent communication, and files such as AGENTS.md for predictable instructions in code repositories. Google introduced A2A in April 2025 as an open protocol for agents built by different vendors or frameworks to communicate, securely exchange information and coordinate actions across enterprise platforms.

The same openness helps smaller organizations. A startup cannot negotiate deep integrations with every major SaaS provider. A public protocol lets it build once and reach many environments. A security researcher cannot inspect a proprietary agent runtime deeply. An open framework gives researchers a place to test assumptions and report weaknesses. A regulated enterprise may distrust a black-box agent that sends instructions through a vendor’s cloud. A local, auditable or self-hosted agent is easier to evaluate, even if it is not automatically safer.

The Open Source Initiative’s Open Source AI Definition 1.0 sets a high bar for the phrase “open source AI.” It says an open source AI system must grant freedoms to use, study, modify and share the system, and it requires access to the preferred form for modification, including data information, code and parameters under appropriate terms. The definition is not just a legal debate. For agents, openness affects operational trust. A company needs to know not only the model weights, but also how the agent chooses tools, how it stores memory, how it handles prompts, how it loads plug-ins, and how it records actions.

The agent layer is where openness may matter most. A closed model with an open, inspectable agent runtime is very different from a closed model behind a closed agent platform with opaque tool access. An open model running inside an unsafe agent harness is not a trustworthy system. The unit of trust is no longer the model. It is the full agentic system: model, tools, permissions, memory, policy, logs, runtime and supply chain.

Open source therefore gives agentic AI a route to adoption, but not a free pass. The market will reward open projects that show clean governance, disciplined releases, secure defaults and clear accountability. Projects that treat openness as a badge rather than an engineering obligation will create the opposite effect. They will prove that visibility without control is only partial trust.

The agent stack is becoming an operating environment

An AI agent is best understood as an operating environment for decisions. The model is only one component. The useful agent has a loop: receive a goal, interpret context, select tools, call those tools, observe results, update state, decide whether to continue, and produce an answer or action. LangChain’s documentation describes agents as systems that combine language models with tools, reason about tasks, decide which tools to use and keep working until a stop condition is met. LlamaIndex describes an agent as an automated reasoning and decision engine that can break a question into smaller pieces, choose external tools, plan tasks and store completed work in memory.

That loop changes software governance. Traditional applications execute code paths written in advance. Agents may assemble a path at runtime. A human says, “reconcile these customer invoices,” and the agent may search documentation, query a database, call a billing API, write a draft message, compare records, and ask another agent for a missing field. The developer did not write every branch as an explicit workflow. The agent selected a sequence under constraints.

This does not make agents magical or conscious. It makes them operationally unusual. The model is not only producing content; it is selecting capabilities. The risk is not only hallucination; it is misapplied authority. A confident but wrong answer is a quality issue. A confident but wrong tool call can become a security event, compliance breach or financial error.

The operating-environment analogy explains why agent security keeps borrowing from older systems. Microsoft’s Agent Governance Toolkit announcement explicitly compared agent governance to operating-system kernels, service meshes and site reliability practices. The toolkit is designed to intercept agent actions before execution, add cryptographic identity for agent-to-agent communication, use execution rings, support kill switches and collect compliance evidence.

This is the right direction. Agents need kernel-like mediation because the model should not be the final authority on whether an action is allowed. They need service-mesh-like identity because agents will call tools and other agents across distributed environments. They need site-reliability patterns because autonomous workflows fail in cascades. They need supply-chain controls because plug-ins, skills, MCP servers and model packages are installable components, not abstract ideas.

The open source world has built pieces of this architecture before. Kubernetes taught organizations to manage distributed workloads declaratively. Service meshes normalized workload identity, mTLS and policy enforcement. OpenSSF projects gave maintainers tools to assess dependency risk, sign artifacts and record provenance. The task now is not to invent trust from nothing. It is to adapt proven controls to AI systems whose behavior is partly generated at runtime.

The hardest adjustment is mental. Security teams often ask, “Which application is this?” Agentic AI often answers, “It is a runtime that can become many workflows.” Procurement teams ask, “Which vendor is responsible?” Open source agent stacks may involve a foundation, a model provider, a framework maintainer, plug-in authors, cloud infrastructure, internal developers and downstream users. Compliance teams ask, “Where is the system boundary?” Agents blur that boundary by calling external tools and delegating work.

The companies that deploy agents safely will treat the agent stack as infrastructure, not as a productivity feature. They will inventory it, patch it, test it, restrict it, observe it and assign owners. They will not let agent frameworks grow as unofficial developer utilities hidden inside laptops and CI jobs.

Trust means control, not belief

Trust in agentic AI is often discussed as if it means confidence in a model’s intelligence. That framing is too soft for production systems. For agents, trust means control. A trustworthy agent is not one that sounds reasonable. It is one whose authority is bounded, whose actions are visible, whose identity is verifiable, whose dependencies are known, whose failures are contained and whose decisions can be challenged.

The distinction matters because agents will often behave persuasively before they behave reliably. They can write explanations for actions they should not have taken. They can produce polite rationales for tool calls that violated policy. They can hide uncertainty behind fluent text. A human reviewer who sees only the final answer may miss the dangerous step in the middle. The audit trail must cover the action path, not only the output.

Open source helps here by allowing inspection of the scaffolding around the model. Developers can examine how a framework handles tool schemas, approvals, memory, retries, exceptions, prompt templates and callbacks. Researchers can test whether tool descriptions are treated as trusted text. Enterprises can fork or patch code if a maintainer does not move fast enough. Community review can expose unsafe patterns before they become hidden dependencies inside thousands of products.

But open source does not equal trust by itself. A public repository can still be abandoned, underfunded, poorly reviewed or compromised. A permissive license does not prove secure design. A large star count does not prove safe defaults. The Reuters report on China’s warning about OpenClaw illustrates the point: the warning focused on improper configuration, public network exposure, identity authentication and access controls, not merely on whether the agent was open source.

The same is true for protocols. MCP’s security best practices document lists attacks and mitigations, including confused deputy risks in proxy servers. The MCP specification requires user consent and control for data access and tool invocation, warns that tools represent arbitrary code execution, and says tool behavior descriptions should be treated as untrusted unless obtained from a trusted server. These are strong words, but they become trust only when clients, servers, frameworks and enterprises implement them consistently.

Trust is therefore a layered property. It includes governance at the project level, assurance at the build level, authentication at the runtime level, authorization at the tool level, validation at the input level, monitoring at the operation level and accountability at the organizational level. Remove one layer and the system may still look open while acting unsafe.

The agentic market is likely to split along this line. Some tools will sell convenience: connect everything, automate anything, approve later. Others will sell controlled autonomy: prove who the agent is, limit what it can do, record what it did, stop it when it drifts. The second category will define enterprise agentic AI, because real systems do not run on vibes. They run on permissions.

Identity is the missing center of agent governance

The identity question is deceptively simple: who is acting? In a normal web application, the answer is usually a user, a service account or a workload. In agentic AI, the answer may be a chain: a human user delegated a task to an agent, the agent called an MCP server, the server used a connector to reach a SaaS API, another specialized agent handled a subtask, and a background workflow completed the action after the user left.

If the final API call appears only as the original user, the organization cannot tell whether the user clicked a button or an agent acted on their behalf. If the final API call appears only as a generic bot account, the organization loses the human source of delegation. If each tool receives broad static credentials, the agent becomes a privileged credential spreader. Agent identity needs to preserve both the non-human actor and the human or business authority behind it.

Existing standards provide part of the answer. OpenID Connect gives applications a way to receive verifiable assertions about users based on OAuth 2.0. RFC 8707 Resource Indicators let a client signal the protected resource for which it is requesting access, reducing token ambiguity in multi-resource settings. The MCP authorization specification points directly to this issue by requiring MCP clients to include the resource parameter when applicable and MCP servers to validate that tokens were specifically issued for their use.

Workload identity also matters. SPIFFE and SPIRE provide strongly attested cryptographic identities to workloads across varied platforms. They are designed for distributed systems where static secrets and network location are weak trust anchors. That pattern fits agent infrastructure because agents will run in containers, CI jobs, desktops, SaaS runtimes and multi-cloud environments.

The gap is that user identity, workload identity and agent identity are not the same. A human user has rights, intent and accountability. A workload has deployment metadata, attestation and lifecycle. An agent has delegated goals, tools, policies, memory and behavioral constraints. A secure enterprise architecture needs all three signals. The policy engine should know: this is the invoice-reconciliation agent, running in this approved environment, released from this signed build, acting for this finance analyst, limited to these customers, allowed to call these APIs, until this time, with this approval threshold.

This is where open source could shape the industry. If every vendor invents its own private agent identity model, agent traffic will become impossible to govern across organizational boundaries. Open standards and open implementations can let companies verify agents without surrendering control to one platform provider. Agent identity should be portable, cryptographically verifiable and understandable to existing IAM and security tools.

Identity also has a user-experience dimension. A user should know when an agent is acting as them, when it is acting as itself, and when it is asking another agent or tool to act. Consent screens that say “allow access to your data” are too vague for autonomous workflows. Consent should be specific to task, resource, duration, capability and risk. A well-designed agent identity system will not bury that detail; it will make authority legible.

Agentic AI will not be governed by prompt instructions alone. A prompt saying “do not access confidential files” is a weak substitute for an access token that cannot access confidential files. A system prompt saying “ask before making changes” is weaker than a policy gate that blocks state-changing tool calls without approval. Identity is where human intent becomes enforceable machine authority.

Permission boundaries matter more than model cleverness

A highly capable model inside a weak permission model is not an asset. It is an accelerant. The better the model becomes at planning, code writing, interface navigation and social reasoning, the more damage it can cause if permissions are loose. That is why the agentic AI race cannot be measured only by benchmark performance. The decisive question is what the agent is allowed to touch.

The principle of least privilege is old, but agents make it newly urgent. An agent that helps answer support tickets does not need write access to billing refunds by default. A coding agent does not need unrestricted access to production secrets. A research agent does not need permission to send external emails. A browser agent does not need access to every authenticated session in a user’s profile. The default should be narrow authority with explicit escalation, not broad access with hoped-for restraint.

MCP’s specification recognizes this by requiring explicit user consent before exposing user data to servers and before invoking tools. It also says users should understand what each tool does before authorizing its use. The hard part is implementation. Real users approve prompts quickly. Developers may bundle multiple actions behind one tool name. Tool descriptions may be confusing or malicious. A user may not know that “sync workspace” includes reading private documents, creating tickets and posting messages.

Permission boundaries therefore need technical shape. Tools should advertise capabilities in machine-readable form. Clients should display state-changing operations differently from read-only operations. Dangerous actions should require step-up approval. Agents should receive short-lived tokens with audience binding and scoped claims. Memory stores should be partitioned by user, project and sensitivity. Tool calls should carry purpose metadata, not only parameters.

For open source projects, secure defaults matter more than long security pages. Most developers copy examples. If the example grants a global API key to an agent, production systems will repeat that pattern. If the quickstart exposes a local server without authentication, someone will deploy it that way. If a tutorial hides approval hooks to reduce friction, the ecosystem pays later. The first five minutes of developer experience often become the first year of security debt.

Agent frameworks should make the safe path the short path. A developer building a file-editing agent should get sandboxed execution, path restrictions, audit logging and human approval patterns by default. A developer building an MCP server should get authentication, authorization, input validation and rate limiting from the template. A developer installing a skill should see publisher identity, version pinning, requested capabilities and signature status.

This is not anti-open-source. It is how open source earns adoption in high-stakes environments. The best open projects do not ask enterprises to choose between openness and control. They package control into the architecture. Kubernetes did not succeed because containers were free; it succeeded because orchestration, desired state, access control and ecosystem conventions made containers manageable. Agentic AI needs the same maturation.

Transparency is necessary, but incomplete

Open source AI discussions often place transparency at the center, and rightly so. A system that cannot be inspected is harder to trust. The OSI definition stresses the freedom to study how an AI system works and inspect its components, with access to data information, code and parameters as preconditions for meaningful modification. In agentic AI, transparency also means understanding tools, permissions, logs, prompts, policies, memory and external calls.

Yet transparency alone does not stop an unsafe action. A glass box can still be dangerous. An open protocol can still be misimplemented. A public repository can still ship insecure defaults. A visible tool description can still poison an agent. The Stanford Foundation Model Transparency Index found that the industry remained opaque on training data, training compute, model use and societal impact, and it warned that openness does not guarantee transparency.

The agent layer adds another transparency problem: runtime behavior. A model card may describe a foundation model. A repository may describe an agent framework. But the actual deployed agent is a composition. It depends on which tools were enabled, which MCP servers were installed, which memory was available, which policies were configured, which model version was called, which user delegated the task and which data appeared in the context window.

That means transparency must become operational. Enterprises need agent manifests, tool registries, permission inventories, prompt-change histories, memory-retention rules, signed releases, evaluation reports, incident logs and traceable execution records. A one-time disclosure is not enough. Agents evolve as tools and data change. A safe agent in a test environment may become unsafe when connected to production systems.

Open source can support operational transparency by making metadata standard. AGENTS.md is a modest but important example. It gives coding agents a predictable place for repository-specific instructions, separate from README files aimed at humans. That kind of convention reduces hidden behavior. A code repository can tell agents how to test, what style rules to follow and what boundaries apply. It is not a complete security system, but it makes agent behavior more inspectable.

The next step is stronger metadata for authority. Agent cards, tool manifests, signed plug-ins, SBOMs for agent bundles, and policy declarations should become normal. A security team should be able to ask, “What can this agent do?” and receive a structured answer. A compliance team should be able to ask, “Which model and tool chain produced this action?” and receive a trace. A user should be able to ask, “Did the agent send my data outside the organization?” and receive evidence.

Transparency becomes trust only when it is tied to control points. Seeing that an agent can call a dangerous tool is useful. Blocking that call unless the right identity, policy and approval are present is trust.

Security risk has moved into the tool layer

Early generative AI risk focused heavily on model outputs: hallucination, bias, unsafe content, copyright exposure and misinformation. Those risks remain, but agents move the center of gravity into the tool layer. Tool access is what lets an agent affect the world. It is also where prompt injection, token theft, confused deputy attacks, malicious plug-ins and supply-chain compromise become operational.

OWASP’s agentic AI threat work frames this shift directly. Its Agentic AI threats and mitigations guide describes agentic AI as autonomous systems increasingly enabled by LLMs and generative AI, with expanded scale, capabilities and risks. Its secure MCP server guidance says MCP servers are a critical connection point between AI assistants and external tools, APIs and data sources, and that delegated user permissions, dynamic tools and chained calls increase the potential impact of a single vulnerability.

The intermediate “skill” layer is also exposed. OWASP’s Agentic Skills Top 10 says skills are the execution layer that gives agents real-world impact, defining not only what resources agents can access but how they orchestrate multi-step workflows. It calls this behavior layer vulnerable and under-protected. This is a useful warning because many teams will secure the model API and forget the automation packages wrapped around it.

Tool poisoning is a clear example. An agent may read a tool description to decide when and how to use it. If the description includes malicious instructions, the model may treat those instructions as part of its context. The user may never see them. The tool can become a prompt injection carrier. The same risk appears in web pages, emails, documents, tickets and code comments. Any external content read by the agent may try to instruct the agent.

MCP’s own security guidance warns about attacks such as confused deputy problems in proxy servers, and the specification says tool behavior descriptions should be considered untrusted unless they come from a trusted server. That is a strong architectural clue: the agent runtime should not trust the text it consumes. It should treat tool metadata, external documents and retrieved content as data unless policy says otherwise.

Tool security also has a blast-radius problem. A single vulnerable MCP server can expose many downstream systems if it sits between agents and enterprise APIs. A single malicious skill can be installed by many users if a marketplace lacks publisher verification. A single popular framework pattern can spread an insecure execution model across thousands of applications. Tom’s Hardware reported in April 2026 on OX Security claims that a design choice in MCP SDKs could enable remote code execution across a large AI supply chain, affecting official SDKs and many server instances; the report also said the issue involved STDIO handling and marketplace poisoning vectors.

Those claims should be evaluated through the normal security-disclosure process, but they underline the same structural point: agent security is supply-chain security plus runtime security plus identity security. Treating it as prompt safety is too narrow.

Open standards need secure defaults to avoid becoming open attack rails

The open standards push around MCP, A2A and AGENTS.md is one of the most important developments in agentic AI. It lowers integration costs and gives developers shared primitives. It also creates concentration points. When a protocol becomes widely adopted, its mistakes scale. An unsafe pattern inside a de facto standard does not stay local; it becomes an attack rail across the ecosystem.

This is not a reason to reject standards. It is a reason to harden them early. The web benefited from open protocols, but it also spent decades patching security assumptions that were weak at the start. AI agent protocols have less time. Agents are entering enterprise systems while attackers already understand software supply chains, identity abuse, API exploitation and prompt injection.

MCP’s donation to the Linux Foundation’s Agentic AI Foundation is significant because governance shapes trust. Anthropic said it donated MCP to keep it open-source, community-driven and vendor-neutral, while the MCP project said Linux Foundation stewardship formalized vendor neutrality and long-term independence. Neutral governance helps prevent one vendor from controlling the protocol’s direction. It does not automatically guarantee strong security, but it creates a better venue for public review, shared standards and accountable change.

A2A raises a related issue. Google described A2A as an open protocol that lets agents built by different vendors or frameworks communicate and coordinate actions. The official A2A documentation says it provides a common language for interoperability, originally developed by Google and donated to the Linux Foundation. Agent-to-agent communication will need identity, authentication, capability discovery, task boundaries, non-repudiation, rate limits and abuse controls. Without those, interoperability becomes a way for untrusted agents to find and manipulate each other.

Open standards should adopt a security posture that assumes malicious participants. Discovery is risky. Delegation is risky. Tool metadata is risky. Cross-agent memory sharing is risky. Remote execution is risky. Marketplace installation is risky. The right question for each protocol feature is not “Does this make agents more powerful?” but “Can a hostile agent, hostile server or compromised dependency use this feature to gain authority?”

Secure defaults should be part of protocol design. Token audience binding should not be optional decoration. Human approval should be attached to sensitive actions. Tool calls should be typed by risk. Servers should authenticate clients. Clients should verify server identity. Registries should support signing and reputation. Logs should be structured enough for incident response. Dangerous transports should be sandboxed or narrowed.

Open standards win only if they are safe to adopt at scale. A protocol that spreads quickly and then forces every enterprise to build its own protective wrapper will still be useful, but it will not be trusted. The stronger path is to bake security expectations into the standard before insecure patterns become compatibility requirements.

The open source supply chain now includes models, tools and memory

Software supply-chain security used to focus on source code, dependencies, builds, packages and deployment artifacts. Agentic AI expands the chain. It now includes model weights, prompt templates, tool definitions, MCP servers, skills, vector indexes, memory stores, evaluation datasets, policy files, sandbox images and generated code. Each can affect behavior. Each can be poisoned, replaced, misconfigured or misunderstood.

OpenSSF’s SLSA framework defines standards and controls to prevent tampering, improve integrity and secure packages and infrastructure. It focuses on artifact integrity and trust across the software chain. Sigstore gives developers and consumers tools to sign and verify artifacts, including release files, container images, binaries and SBOMs, using ephemeral signing keys and a tamper-resistant public log. OpenSSF Scorecard assesses open source projects for security risks through automated checks and helps users evaluate the risk posture of dependencies.

These controls are directly relevant to agentic AI, but they need extension. A signed container image is not enough if the agent loads unsigned skills at runtime. A signed package is not enough if the prompt file changes without review. A dependency score is not enough if the tool registry has no publisher identity. A model checksum is not enough if the retrieval index has been poisoned.

SBOMs also need to widen. NSA, CISA and partners said in September 2025 that SBOMs improve visibility across an organization’s supply chain by documenting software dependencies and support risk management around software components. For agents, a useful bill of materials should include not only software libraries but also agent-relevant artifacts: model versions, tool servers, plug-ins, policies, prompts, connectors and data sources. The industry may need an “agent bill of materials” that records the components that can influence autonomous actions.

Open source communities have the advantage of existing culture. Maintainers understand release signing, dependency pinning, vulnerability reports, reproducible builds and package registries. The challenge is applying those habits to AI-native assets that are less familiar. A prompt template may not look like code, but it can change behavior. A tool manifest may not compile, but it can grant dangerous capability. A memory migration may not be a dependency update, but it can expose sensitive context.

Supply-chain security also has a sustainability problem. Many open source maintainers are volunteers. Agentic AI may depend on small packages that suddenly become part of enterprise automation paths. If those packages carry authority, maintainers need support, review and funding. The open source model works when responsibility is matched by resources. It becomes brittle when critical infrastructure is maintained as unpaid after-hours work.

The agent supply chain is the software supply chain with more behavioral inputs. Organizations that already struggle to inventory libraries will struggle more with agents unless standards, tooling and procurement demands mature quickly.

Table stakes for production agent security

Core controls for trustworthy open source agents

Control areaPractical requirementRisk reduced
Agent identityUnique, verifiable agent identity linked to owner, workload and delegated userAnonymous automation, weak audit trails
Scoped authorizationShort-lived, audience-bound tokens and least-privilege tool accessToken misuse, overbroad actions
Tool governanceVerified tool registries, signed plug-ins, capability manifests and version pinningMalicious tools, unsafe updates
Runtime mediationPolicy gates before state-changing actions, sandboxing and kill switchesRogue behavior, uncontrolled execution
ObservabilityStructured traces of prompts, tools, approvals, model versions and outcomesUndiagnosable incidents
Supply-chain assuranceSBOMs, SLSA-style provenance, Sigstore signing and Scorecard checksTampering, dependency compromise

These controls do not make agents risk-free. They make the risk governable. The point is to turn agent behavior from an opaque sequence of model choices into a controlled chain of authorized events.

Autonomy without revocation is not autonomy, it is exposure

Autonomy is often sold as the main benefit of agents. Less manual clicking. Fewer repetitive tasks. Faster execution. More work done across systems. But autonomy without revocation is a security anti-pattern. If an agent can act but cannot be stopped cleanly, paused safely, rolled back where possible or constrained when signals change, it is not mature enough for high-stakes work.

Revocation starts with credentials. Short-lived tokens reduce the damage of leakage. Audience binding reduces token replay across resources. Refresh-token rotation and secure storage matter when agents run on developer laptops, cloud workers and local desktops. MCP’s authorization specification calls for secure token storage, short-lived access tokens and validation that tokens were issued for the relevant MCP server.

Revocation also applies to tasks. A user should be able to cancel an agent’s delegated goal. An administrator should be able to disable a tool or MCP server immediately. A policy engine should be able to block a class of action when risk changes. A security team should be able to quarantine an agent identity. A deployment system should be able to roll back an agent bundle. A registry should be able to mark a plug-in as compromised.

Agents need circuit breakers. If an agent repeats a failed payment call, creates too many tickets, sends unusual data to an external endpoint, or triggers a high-cost cloud operation, the system should interrupt it. This is where site reliability patterns matter. Error budgets, rate limits, anomaly detection and progressive rollout are not glamorous, but they prevent small failures from becoming incidents. Microsoft’s toolkit announcement highlights SLOs, error budgets, circuit breakers, chaos engineering and progressive delivery as agent-system reliability practices.

Autonomy also needs rollback thinking. Some actions can be undone, such as draft creation, branch changes or reversible workflow updates. Others cannot, such as external emails, payments, deletion of records, disclosure of confidential information or public posts. Agents should treat irreversible actions differently. They should require stronger identity, explicit approval, policy checks and logging. A generic “confirm” button is too weak for actions with legal, financial or privacy consequences.

Open source agents running locally create a special revocation problem. A local agent may operate outside enterprise device management. It may use a user’s browser sessions or personal API keys. It may install community extensions. It may sync context to external services. Organizations need endpoint controls and acceptable-use rules for local agents before they become shadow infrastructure.

The test of safe autonomy is not whether the agent can complete a task. It is whether the organization can withdraw authority faster than the agent can misuse it.

Human approval must be designed as a control, not a ritual

Human-in-the-loop is often presented as the answer to agent risk. It helps, but only when it is designed carefully. A human approval step can become a ritual that trains users to click yes. If approvals are too frequent, too vague or too late in the workflow, people stop reading. If approvals appear after the agent has already collected sensitive data or prepared a risky action, the damage may already be partly done.

Good approval design is specific. The user should see what the agent wants to do, which resource it will touch, which identity it will use, whether data leaves the organization, whether the action is reversible, and what policy triggered the approval. A request that says “Agent wants to use email” is weak. A request that says “Finance reconciliation agent wants to send a draft email with customer invoice details to this external address” is meaningful.

Approval should also be risk-based. Low-risk read operations may not need repeated confirmation. Medium-risk internal writes may need approval within a bounded context. High-risk external or irreversible actions should need explicit, step-up approval. Administrative actions should require stronger identity and perhaps dual control. This mirrors how mature systems already treat payments, privileged access and production changes.

Agents complicate approval because they can chain actions. A user may approve an initial task, but the agent may discover that it needs extra data or a different tool. The right design is not a giant consent screen at the start. It is progressive authorization. The agent receives enough authority for the next safe step, then requests more only when needed. Each escalation should be logged and tied to the task.

Human approval also needs organizational ownership. Who is allowed to approve an agent’s access to customer data? Who approves code changes? Who approves autonomous remediation in infrastructure? A developer testing an agent should not be able to approve production actions simply because they launched the workflow. Approval rights must reflect the business process, not the agent interface.

Open source frameworks should make approval flows first-class. Tool definitions should mark read/write/destructive operations. Clients should support approval hooks. Frameworks should expose policy decisions to the UI. Logs should record who approved what and when. Local agents should distinguish user approval from enterprise approval. A human click is not a control unless the system makes the decision understandable, scoped and auditable.

Agent memory is a security boundary

Memory makes agents more useful, but it also creates a new security boundary. Agent memory can store user preferences, task history, retrieved documents, tool outputs, conversation summaries, credentials by mistake, sensitive business context and previous decisions. If memory is shared too broadly or retained too long, it becomes a data leakage path. If memory is poisoned, it becomes a behavioral attack path.

The most obvious risk is confidentiality. A support agent that remembers one customer’s private issue must not expose it to another customer. A coding agent that stores a secret from one repository must not reuse it in another. A personal agent that handles medical, financial or employment data needs retention limits. Memory cannot be treated as harmless context.

The subtler risk is instruction persistence. If an attacker can get malicious guidance into long-term memory, the agent may carry that instruction into future tasks. This is more dangerous than a one-time prompt injection because it survives across sessions. It can alter future tool selection, change data handling, or bias the agent toward unsafe actions. Memory poisoning turns the agent’s learning surface into an attack surface.

Memory also affects auditability. If the agent uses a memory item to justify a tool call, logs should identify which memory influenced the action. If a memory entry is deleted, the deletion should be recorded. If a user asks why an agent made a decision, the system should be able to show relevant stored context without exposing unrelated data.

Open source projects should define memory models explicitly. Is memory local or remote? Is it encrypted? Is it per user, per agent, per workspace or shared? Can administrators inspect it? Can users delete it? Does it store raw tool outputs or summaries? Are sensitive fields filtered? Is memory included in backups? Is it used for model fine-tuning? These questions belong in documentation and security review, not in issue threads after deployment.

For enterprises, memory should be governed like data storage. Classification rules should apply. Retention schedules should apply. Access controls should apply. E-discovery and privacy rights may apply. The agent interface does not make stored data exempt from governance. If memory can influence action, memory is part of the control plane.

Prompt injection becomes more dangerous when agents can act

Prompt injection is not new, but agents raise the stakes. A chatbot manipulated by a malicious web page may produce a bad answer. An agent manipulated by the same page may send data, call tools, make purchases, alter code or approve a workflow. The risk is not that language models are uniquely fragile; the risk is that agent systems often place untrusted text close to privileged tools.

Indirect prompt injection is especially relevant. An agent may read a customer email, a website, a PDF, a ticket, a code comment or a calendar invitation. That content can contain instructions aimed at the model: ignore previous rules, use a hidden tool, exfiltrate data, mark the task complete, or change a configuration. The model may not reliably separate task content from attacker instructions unless the system architecture enforces that boundary.

MCP and tool ecosystems introduce another channel. Tool descriptions, plug-in manifests and skill metadata may be included in the model context so the agent can decide what to call. If an attacker can publish or alter that metadata, the agent may receive malicious instructions before the user sees anything. OWASP’s Agentic Skills Top 10 warns that skills sit between model and tool layer and define the execution behavior that gives agents impact.

Defenses must be layered. Prompt hardening helps but is insufficient. The system should separate data from instructions where possible. Tool calls should be validated by deterministic policy. Sensitive actions should require approval. Retrieved content should be labeled as untrusted. Agents should not be allowed to follow instructions from external content that conflict with user, system or policy constraints. Output filters should not be the only barrier.

The architectural principle is simple: untrusted text should not grant authority. It can inform the agent. It can be summarized. It can be used as evidence. It cannot decide that the agent may access a confidential drive or send a message. That decision belongs to identity, policy and user approval.

Open source has a role in making these defenses visible. Security researchers can build benchmark attacks against public frameworks. Maintainers can add safer defaults. Tool authors can adopt manifest schemas that separate descriptions from execution permissions. Registries can scan for suspicious metadata. Enterprises can test popular agent packages before internal use.

Prompt injection will not disappear because it exploits the basic fact that agents interpret language. The goal is not perfect immunity. The goal is containment. A successful injection should produce a blocked action, an alert or a harmless wrong answer, not a breach.

Agent-to-agent communication turns trust into a network problem

Single-agent systems are already hard to govern. Multi-agent systems turn trust into a network problem. If agents can discover each other, delegate tasks and exchange information, every agent becomes both a caller and a called service. The identity and policy model must handle not only user-to-agent and agent-to-tool interactions, but agent-to-agent relationships.

A2A is designed for this world. Google’s announcement said the protocol would let agents communicate, exchange information securely and coordinate actions across enterprise platforms. The official A2A documentation says A2A enables agents built on different platforms to interoperate and delegate sub-tasks. This is valuable because no single agent will have every skill, data source or context. But delegation creates accountability questions.

If Agent A delegates to Agent B, whose authority is Agent B using? Does Agent B know the original user? Does it know the task purpose? Can it exceed Agent A’s scope? Can it call Agent C? What happens if Agent B is compromised or operated by another company? Can the original user see what happened? Can a policy engine enforce scope across the chain?

Traditional API security has similar problems, but agents add ambiguity because tasks may be expressed in natural language and decomposed dynamically. The solution must include structured delegation. Each handoff should carry identity, scope, purpose, constraints and trace identifiers. Downstream agents should receive attenuated authority, not the full rights of the upstream caller. A delegated subtask should not become a permission laundering mechanism.

Multi-agent systems also need reputation and trust tiers. Not every agent should be callable by every other agent. Internal agents may have strong identity and compliance controls. External agents may be untrusted. Community agents may be experimental. Marketplaces may need verification levels. Policy should reflect these tiers. An internal procurement agent should not accept pricing instructions from an unknown external agent without validation.

Observability becomes harder as agents collaborate. A user may see only the final response, while multiple agents and tools contributed. Logs need correlation across systems. Standards should support trace propagation. Incident responders need to reconstruct the chain: who requested, who delegated, who acted, what data moved, which policies fired and which approvals were granted.

Agent-to-agent interoperability will succeed only if trust travels with the task. Without portable identity and scoped delegation, multi-agent systems will recreate the weakest parts of API sprawl, but faster and with more autonomous behavior.

Local agents change the control equation

Local open source agents are appealing because they give users more control. They can run on a personal machine, use local files, connect to local tools and avoid sending every workflow through a vendor-hosted service. Goose, for example, describes itself as a native open source AI agent with desktop app, CLI and API, running on a user’s machine and connecting to many extensions through MCP.

Local execution supports privacy and sovereignty in some cases. A developer can run an agent against a local repository. A researcher can process files without uploading them to a SaaS workflow tool. A company can evaluate behavior in its own environment. Open code makes it possible to inspect and modify the runtime. These are real advantages.

But local agents also bypass centralized controls. A user may install extensions without security review. The agent may access browser sessions, SSH keys, environment variables, local documents and source code. It may run shell commands. It may connect to external APIs using personal tokens. Endpoint security may not understand agent behavior. Logs may stay on the device or not exist at all.

This makes device posture part of agent trust. A local agent running on a managed corporate laptop should have different permissions from the same agent running on an unmanaged home machine. A local agent inside a sandbox should have different rights from one with full file-system access. Enterprise policy should distinguish local experimentation from production authority.

The right model is not to ban local agents by default. Bans often create shadow use. The better model is approved local agents with clear guardrails: signed builds, managed configuration, extension allowlists, restricted file paths, secrets protection, command approval, network controls and centralized logging for business workflows. Developers need tools that are both useful and governable.

Open source maintainers can support this by documenting safe local deployment patterns. Quickstarts should avoid exposing network services without authentication. Extension installation should warn about capabilities. Dangerous tools should require explicit enablement. Logs should be easy to export. Enterprise configuration should be possible without patching the code.

Local control is not the opposite of governance. It is a different governance problem. The agent may run near the user, but its actions can still affect shared systems.

Closed models and open agent layers will coexist

The debate around open source AI often treats open and closed as opposing camps. Agentic AI will be more mixed. Many organizations will use closed frontier models through APIs, open weight models for local or specialized tasks, open source agent frameworks for orchestration, open protocols for tool access and proprietary enterprise systems for data. Trust will depend on the whole assembly.

This hybrid reality is visible in the ecosystem. OpenAI’s Agents SDK is open source and provider-agnostic according to its GitHub repository, while the OpenAI developer docs describe agents as applications that plan, call tools, collaborate across specialists and keep state for multi-step work. LangChain, LlamaIndex and CrewAI are open frameworks that can connect to different model providers.

A closed model can be part of a trustworthy system if the surrounding controls are strong and the provider gives enough assurance. An open model can be part of an untrustworthy system if the runtime is unsafe, the tools are overprivileged or the deployment lacks monitoring. The clean distinction is not open versus closed. It is controllable versus uncontrollable.

Open source agent layers may become the compromise that enterprises prefer. They allow companies to retain control over orchestration, permissions, logs, policy and integrations while using whatever model fits the task. That weakens model-provider lock-in and gives internal security teams a place to enforce rules. It also lets organizations swap models as costs, performance and regulation change.

Model openness still matters. An open weight or fully open source model may support offline operation, data locality, fine-tuning, inspection and resilience. The Model Openness Framework from LF AI & Data’s Generative AI Commons evaluates which model lifecycle components are publicly released and under what licenses. For regulated or sovereign deployments, that evidence may matter. But agent governance must not stop at the model boundary.

The likely enterprise pattern is tiered. High-risk workflows will use approved models, approved agent runtimes, approved tool servers and strict policies. Low-risk internal productivity tasks may allow more experimentation. Developer tools may use open source frameworks with managed configurations. Customer-facing agents will need stronger audit and safety evidence. The architecture will vary by risk, but identity and logging should be common.

The model is replaceable more often than the control layer. Enterprises that build governance around one model vendor may regret it. Enterprises that build governance around open protocols, identity, policy and observability will have more strategic freedom.

Regulation is moving toward documentation, safety and accountability

Agentic AI is arriving as regulators are already imposing new obligations on AI systems and general-purpose models. The EU AI Act’s general-purpose AI obligations entered into application on August 2, 2025, according to European Commission guidance. The Commission says providers of GPAI models placed on the market after that date must comply, with enforcement powers applying from August 2, 2026 and rules for earlier models by August 2, 2027.

The EU’s GPAI Q&A says providers must document technical information, provide information to authorities and downstream providers, maintain copyright policies and publish summaries of training content. It also says GPAI models with systemic risk have added duties, including systemic risk assessment and mitigation, model evaluations, serious incident documentation and reporting, and adequate cybersecurity protection.

Open source receives special treatment, but not unlimited exemption. The Q&A states that documentation obligations under Article 53(1)(a) and (b) do not apply when a model is released under a free and open-source license and its parameters, architecture information and usage information are publicly available, but the exception does not apply to GPAI models with systemic risk. It also notes that risk mitigations may be easier to circumvent or remove after an open-source release.

Agents complicate compliance because they sit downstream from models and upstream from actions. A model provider may comply with GPAI rules, but an enterprise deploying an agent into hiring, credit, healthcare, education, infrastructure or public services may face separate obligations depending on use. An open source agent framework may not itself be a regulated high-risk AI system, but its deployment can become one when connected to a high-risk process.

NIST’s AI Risk Management Framework gives organizations a broader structure for managing AI risks to individuals, organizations and society. NIST’s Zero Trust Architecture guidance also matters because it moves security from network location toward users, assets and resources, with authentication and authorization before sessions are established. Together, these frameworks point toward lifecycle governance: map risks, measure them, manage them and verify access continuously.

Regulation will push agent builders toward documentation that many teams currently lack. Which agent acted? Which data did it access? Which model version was used? Which tool made the change? Was there human approval? Was the action within scope? Was the incident reported? Was the component patched? These are not philosophical questions. They are evidence questions.

Open source projects that produce compliance-friendly metadata will have an advantage. Enterprises will prefer frameworks that make audit trails, policy evidence, model/tool inventories and risk documentation easier. Projects that ignore governance will still be used by hobbyists, but they will struggle in regulated deployment.

Enterprise adoption will depend on security teams, not demo teams

The first wave of agent demos is built for amazement. A browser agent books a trip. A coding agent fixes a bug. A research agent compiles a brief. A personal assistant schedules a meeting. These demos matter because they show the direction of the interface. They do not prove production readiness.

Enterprise adoption is decided later, often by security, legal, compliance, procurement, platform engineering and operations teams. These groups ask less exciting questions. Where does the data go? How are tokens stored? Can we restrict tools by group? Can we log prompts and tool calls without violating privacy? Who patches the MCP server? What happens if the agent loops? Can we revoke access? Can we prove the agent did not access a restricted record? Can we run it in our cloud? Can we disable a plug-in globally?

Open source helps answer some of those questions. Code can be reviewed. Deployment can be self-hosted. Controls can be added. Integrations can be inspected. Community issues can reveal patterns. But enterprises also need maturity: release discipline, vulnerability disclosure, stable APIs, documentation, support paths and clear governance. A fast-moving repository with unclear maintainership is a hard sell for high-risk workflows.

The enterprise path will likely run through platform teams. Rather than every department installing its own agents, companies will build approved agent platforms: model gateways, MCP gateways, tool registries, policy engines, identity integrations, observability pipelines, sandboxed runtimes and evaluation harnesses. Business teams will build agents on top of that platform. This mirrors cloud adoption: developers get speed, but within guardrails.

Shadow AI will be a major forcing function. Employees already use unapproved AI tools because they reduce friction. Agents will make shadow use more dangerous because they request deeper access. A banned tool may move to a local open source agent. A developer may connect it to GitHub, Slack, Jira or a database with personal tokens. The organization may discover it only after data leaves or code changes. Reuters’ OpenClaw report shows how quickly popular open source agents can become a public security concern when configuration and access control lag adoption.

The right enterprise answer is not blanket fear. It is a secure alternative that people want to use. Approved agents must be capable, fast and integrated. If the official path is worse than the unofficial path, people will route around it. Governance works only when it is paired with usability.

The buying center for agentic AI will shift from innovation teams to risk owners. Vendors and open source projects that make security teams comfortable will define the next phase.

Developers need guardrails that do not destroy speed

Developers are the early adopters of open source agents because their workflows are already tool-rich. Coding agents can inspect files, run tests, edit code, call package managers, read documentation and open pull requests. AGENTS.md has spread because repository-specific instructions matter when agents operate inside codebases. The Linux Foundation announcement said AGENTS.md had been adopted by more than 60,000 open source projects and frameworks, and OpenAI’s own announcement echoed that figure.

Coding agents expose the trust problem in miniature. They need enough file-system and command access to be useful. They may run tests, install packages and modify code. They may read secrets accidentally stored in repositories. They may follow malicious instructions hidden in dependencies, documentation or issues. They may generate insecure code. They may commit changes a human reviewer does not understand.

The answer is not to remove their power. A powerless coding agent is just autocomplete with extra steps. The answer is constrained power. Agents should run in isolated workspaces. Secrets should be unavailable by default. Network access should be restricted where possible. Commands should be approved by risk. Dependency installation should be logged. Generated changes should pass tests and security checks before merge. Pull requests should identify agent involvement and provide trace links.

Open source developer tooling can make this normal. Templates can include safe command lists. AGENTS.md can state test commands and forbidden operations. CI can detect agent-generated changes and require review. Repositories can expose task-specific sandboxes. Package managers can verify signatures and provenance. Security scanners can run automatically on agent commits.

The same design principle applies outside software development. Sales agents, finance agents, legal agents and operations agents all need domain guardrails that let them work quickly without crossing boundaries. A finance agent should reconcile invoices but not approve refunds above a threshold. A legal agent should summarize contracts but not send external redlines without review. An operations agent should suggest remediation before changing production.

Developers will accept guardrails if they are predictable and low-friction. They will reject controls that break every workflow without explanation. Policy engines should return clear reasons. Approval flows should be integrated into the tools developers already use. Logs should help debugging, not only compliance. Security that explains itself becomes part of the developer experience. Security that only blocks becomes a bypass target.

Table-driven comparison of the agent trust stack

Open source agent infrastructure and the trust questions it raises

LayerOpen source benefitTrust question
Model or model accessPortability, local deployment, inspection where artifacts are openIs the model suitable for the task and risk level?
Agent frameworkInspectable orchestration and easier customizationHow are tool calls, memory and approvals controlled?
Tool protocolShared connectors and less integration lock-inAre authentication, authorization and consent enforced?
Agent-to-agent protocolCross-vendor delegation and workflow compositionDoes scoped identity travel across delegation chains?
Skills and plug-insCommunity extension and faster capability growthAre publishers verified and capabilities bounded?
Runtime policyReusable control layer across frameworksCan actions be blocked before execution?
ObservabilityShared traces, debugging and audit evidenceCan incidents be reconstructed end to end?

This comparison shows the central trade-off. Open source improves inspection and portability, but each layer needs an explicit trust mechanism. Openness gives teams the chance to build control; it does not create control by default.

Market power will move to the control plane

The first agentic AI winners may be model providers and application vendors. The longer-term winners may be control-plane providers. Whoever manages identity, tool access, policy, memory, logging and interoperability will sit in the critical path of agentic work. That layer decides which agents are trusted enough to act.

Open source foundations understand this. The Agentic AI Foundation is not only a place for code. It is a governance venue for protocols and conventions that may become the connective tissue of agentic systems. The Linux Foundation announcement framed MCP, goose and AGENTS.md as groundwork for a shared ecosystem of tools and standards. Wired reported that OpenAI, Anthropic and Block were transferring widely used technologies to the foundation so others could contribute to their development under the Linux Foundation umbrella.

The strategic logic is clear. If agent standards become widely adopted, the companies that helped seed them gain influence, even if they do not own the protocols outright. Open governance reduces lock-in fears and encourages adoption. Adoption creates default patterns. Default patterns shape markets. The web’s history shows that open standards can produce huge private businesses around shared infrastructure.

Enterprises should welcome open standards but avoid passive dependence. A standard can be open and still reflect the assumptions of its largest contributors. Security teams should participate in governance, file issues, review proposals and push for controls that match real deployment needs. Regulators and public-interest groups should watch how agent standards handle identity, consent, privacy and auditability.

The control plane will also create new vendor categories. Agent gateways will broker tool access. MCP gateways will mediate servers. Registries will verify skills and plug-ins. Observability platforms will trace agent behavior. Identity providers will extend OAuth and workload identity patterns to agents. Policy engines will block actions. Sandboxes will isolate execution. Evaluation platforms will test task reliability and safety. Some will be open source, some commercial, many hybrid.

The economic prize is not merely building smarter agents. It is becoming the trusted layer through which agents act. That is why open source and security are now business strategy, not only engineering practice.

Open governance is a security feature when it has teeth

Open governance is often discussed in terms of fairness and neutrality. For agentic AI, it is also a security feature. A protocol or framework controlled by one vendor may move quickly, but its security priorities can be shaped by product deadlines. A neutral foundation does not remove politics, yet it gives competitors, users, researchers and enterprises a formal way to influence direction.

Anthropic’s MCP donation statement emphasizes vendor neutrality and transparent decision-making, while noting that the maintainer structure remains based on community input. The MCP project’s own announcement says individual projects under AAIF maintain technical autonomy while the foundation provides neutral infrastructure. That balance matters. Too much central control can slow innovation. Too little shared governance can leave standards fragmented and unsafe.

But governance needs teeth. A foundation should not become a logo under which insecure patterns spread. It should support security working groups, vulnerability disclosure, reference implementations, compatibility tests, secure default profiles and independent review. It should define what it means for a server, client, skill or agent to be compliant with a security baseline. It should make unsafe deviations visible.

Open governance also needs maintainer sustainability. Security work is often unglamorous: triage, patching, documentation, threat modeling, test harnesses, release signing and deprecation. If the open agent ecosystem depends on unpaid maintainers, the weakest projects will become hidden infrastructure risk. Foundations and companies that benefit from open standards should fund maintainers, security audits and tooling.

The community also needs norms for breaking changes. Sometimes security requires changing behavior that developers rely on. A permissive STDIO transport, broad dynamic registration or lax token validation may be convenient until attackers exploit it. If compatibility always wins, security loses. Standards bodies need processes for urgent hardening, migration guidance and deprecation timelines.

Open governance should include adversarial voices. Red-teamers, abuse researchers, privacy experts, open source maintainers, enterprise defenders and smaller vendors see different failure modes. Agentic AI is too consequential to be steered only by model labs and cloud platforms. A trustworthy open standard is one that can absorb criticism before attackers turn that criticism into an incident.

Evaluation must test actions, not just answers

Agent evaluation is harder than chatbot evaluation because the output is not always text. The agent may change a file, call an API, update a record, send a message, make a plan, ask for approval or delegate a task. Evaluation must test the whole loop: interpretation, tool choice, permission handling, recovery, memory use, refusal behavior, logging and final result.

The 2025 AI Agent Index documented 30 deployed agentic systems and found varied transparency levels among developers, with many sharing little information about safety, evaluations and societal impacts. That is a serious gap. If agents are acting in professional and personal tasks with limited human involvement, evaluation cannot remain a private marketing claim.

Action-based evaluation should include task success, but success is not enough. An agent that completes a task by using unauthorized data should fail. An agent that writes correct code but exposes secrets should fail. An agent that finishes quickly but ignores approval policy should fail. An agent that produces a good answer but cannot produce an audit trail should fail in regulated settings.

Security evaluation should include prompt injection, malicious tools, memory poisoning, token theft, confused deputy scenarios, sandbox escape attempts, dependency compromise, excessive delegation, denial-of-wallet attacks and policy bypass. It should test both the model and the runtime. A model may refuse a dangerous instruction in chat but follow it when hidden inside a tool description. A framework may pass unit tests but fail under chained tool calls.

Open source can improve evaluation by making test suites portable. A standard set of adversarial agent scenarios could be run against frameworks, MCP clients, tool servers and skills. Projects could publish results. Enterprises could require passing baselines before internal deployment. Security researchers could add cases as new attacks appear. This would move agent safety away from anecdotes and toward reproducible evidence.

Evaluation also needs operational feedback. Agents behave differently under real data, real users and real failures. Logs should feed into evaluation loops. Near misses should be recorded. Human overrides should be analyzed. Tool errors should be measured. Approval friction should be tracked. A safe deployment is not a one-time launch decision; it is a continuous measurement process.

The benchmark that matters is not whether an agent can solve a task in a clean sandbox. It is whether it can stay within authority when the environment is messy, adversarial and incomplete.

Audit logs are the memory of accountability

When agents act across systems, audit logs become the memory of accountability. Without them, organizations cannot prove what happened. They cannot distinguish user action from agent action. They cannot investigate incidents, satisfy regulators, debug failures or improve policies. A system that cannot explain its actions after the fact should not receive broad autonomy.

Good agent logs need more than timestamps and API endpoints. They should record the agent identity, user delegation, model version, prompt or summarized prompt where privacy requires, tool called, parameters, data classification, policy decision, approval event, external destination, result, error and trace ID. For multi-agent workflows, logs should record delegation chains. For memory use, logs should identify relevant memory references.

Privacy matters. Logging everything verbatim can create new risk. Prompts may contain personal data, trade secrets or regulated information. Tool outputs may include confidential records. The logging architecture must support redaction, hashing, access control and retention policies. Security teams need enough evidence to investigate without building a second uncontrolled data lake of sensitive AI context.

Open source agent frameworks should expose logging hooks at the right level. Developers should not need to monkey-patch tool calls to capture traces. The framework should treat observability as a first-class capability. OpenTelemetry-style patterns may become important, but agent traces need AI-specific fields. A normal distributed trace can show service calls; an agent trace must show intent, policy and tool reasoning where available.

Audit logs also affect user trust. Users should be able to see what an agent did on their behalf. In business workflows, managers should be able to review agent actions. In regulated settings, auditors should be able to reconstruct decisions. Logs should not be reserved only for engineers.

The strongest argument for open source in this area is inspectability. If a vendor says its agent is logging safely, a customer must trust the claim. If an open runtime implements logging, customers can verify what is captured and where it goes. They can adapt it to their policies. For agents, accountability is not a statement. It is a trace.

Sandboxing is the practical boundary between suggestion and execution

Sandboxing is where agentic ambition meets operating-system reality. A model can suggest a command. An agent can run it. The difference is a boundary. If that boundary is weak, every tool call becomes more dangerous. If it is strong, agents can experiment, test and repair within controlled limits.

Coding agents show why sandboxes matter. They may install dependencies, run tests, execute scripts, inspect files and start services. In an unsandboxed environment, a malicious package or prompt injection can reach the user’s machine, secrets or network. In a sandbox, the damage can be constrained. OpenAI’s developer documentation says sandbox agents are available in the Python Agents SDK for cases where an agent needs a container-based environment with files, commands, packages, ports, snapshots and memory.

Sandboxing should not be limited to coding. Browser agents should isolate sessions. Data agents should run queries with read-only credentials unless approved. Infrastructure agents should test changes in staging before production. Document agents should process files in environments that cannot freely exfiltrate data. Local agents should restrict file-system paths and shell commands.

A sandbox must also be understandable. Users and administrators should know what is inside and outside the boundary. Can the agent reach the internet? Can it read home directories? Can it mount cloud credentials? Can it start long-running processes? Can it communicate with other agents? Can it persist memory after the task? These settings should be explicit.

Open source can make sandbox mechanisms portable. Container profiles, policy templates, seccomp rules, network restrictions and file permissions can be shared. Frameworks can support default sandbox adapters. Enterprises can contribute hardened profiles. Security researchers can test escape paths. A proprietary sandbox may be strong, but open review often improves confidence.

Sandboxing does not replace identity and authorization. A sandboxed agent with a powerful token can still misuse the token through allowed network calls. A sandboxed browser with an authenticated session can still click dangerous buttons. The sandbox contains execution; policy contains authority. Both are needed.

The safest agent is one that can do useful work inside a small room, then ask before opening a door.

Open source agent registries need trust signals

Agents become more capable through extensions. MCP servers, skills, plug-ins and tool packages let the base agent reach new systems. This creates a marketplace dynamic. Developers want easy installation. Users want useful capabilities. Attackers want distribution. Registries therefore become trust infrastructure.

A registry without identity is a risk. Who published the extension? Is the publisher verified? Has the package changed ownership? Are releases signed? Is the source linked? Are permissions declared? Does the package request network access, file access, shell execution or credentials? Has it been scanned? Are there known vulnerabilities? Can versions be pinned? Can an enterprise allowlist or blocklist it?

OWASP’s Agentic Skills Top 10 checklist includes installing skills only from verified publishers, enabling automated scanning, reviewing skill permissions and pinning skill versions to prevent malicious updates. Those are standard software-supply-chain practices, but agent extensions make them more urgent because a skill may shape autonomous workflows, not only provide library functions.

Sigstore and SLSA-style provenance can help. Signing gives consumers a way to verify that an artifact came from an expected source. Provenance gives information about how it was built. Scorecards can expose repository hygiene. SBOMs can reveal dependencies. These signals should be displayed in agent registries, not hidden in advanced tabs.

Registries also need semantic capability labels. A package manager knows dependencies; an agent registry must know authority. A calendar plug-in that reads events is different from one that sends invitations. A GitHub plug-in that reads issues is different from one that merges pull requests. A browser extension that fetches pages is different from one that submits forms. Capability declarations should be machine-readable and enforceable by clients.

Enterprise registries may become standard. Companies will mirror approved MCP servers and skills, sign internal packages, scan updates and restrict installation from public sources. This is already common with container images and language packages. Agent ecosystems should expect the same.

The open agent registry that wins enterprise trust will look less like a toy marketplace and more like a secure software distribution system.

Data control will decide whether agents leave the pilot phase

Agents need data to be useful. They need customer histories, policies, contracts, code, metrics, emails, tickets, calendars and documents. They also need to know which data they should not use. Data control is therefore central to agent trust.

Retrieval-augmented generation already forced organizations to think about access-aware search. Agents raise the difficulty because they do not only retrieve; they act. A document agent may summarize a policy, then update a ticket. A sales agent may read CRM notes, then send a follow-up. A finance agent may compare invoice records, then trigger an approval. Each step may touch data with different classifications.

The simplest rule is that agents should not see data the delegated user could not see. But that is not enough. Some workflows require agent-specific restrictions narrower than the user’s rights. A manager may access salary data, but a scheduling agent acting for that manager may not need it. A developer may access production logs, but a code-formatting agent does not. Delegation should reduce authority to task purpose, not mirror all user privileges.

Data-loss prevention needs agent awareness. Traditional DLP may see an API call or file movement, but it may not know the agent’s goal or the source prompt. Agent-aware DLP should inspect tool outputs, external destinations, data classification and policy context. It should block or redact sensitive data before it enters model context when possible.

Open source tools can help organizations keep data local or self-hosted. They can also make data leakage easier if installed without controls. A local open source agent connected to a personal cloud drive and corporate email is a data-governance problem. The difference is configuration, identity and policy.

Data control also includes retention and training. Users need to know whether agent conversations, tool outputs and memory are used to improve models. Enterprises need contractual and technical controls. Open source deployments may avoid some vendor-retention issues, but they still must handle local logs, memory stores and backups.

No agent should receive data merely because it can process it. It should receive data because the task, identity, policy and user authority justify it.

Reliability failures become security failures when agents have tools

Agent reliability is often treated as a product-quality issue. Did the agent complete the task? Did it loop? Did it misunderstand? Did it ask too many questions? Once agents have tools, reliability failures become security and operations failures. A loop can create thousands of API calls. A misunderstanding can update the wrong records. A retry storm can trigger rate limits or costs. A partial failure can leave systems inconsistent.

Multi-step workflows are especially fragile. An agent might create a customer record, fail to attach a contract, retry with different data, then send a message anyway. A human employee would notice the inconsistency. An agent may not. Distributed systems solved similar problems with transactions, idempotency, sagas and compensating actions. Agent frameworks need those patterns.

Microsoft’s Agent Governance Toolkit announcement mentions saga orchestration for multi-step transactions, dynamic execution rings and kill switches. These ideas belong in agent infrastructure. A multi-step agent should know which actions are provisional, which are committed, which can be reversed and which require human intervention after failure.

Reliability also depends on deterministic boundaries. The model may decide flexibly, but policies should execute predictably. A policy gate should not hallucinate. A permission check should not be probabilistic. A circuit breaker should fire based on defined signals. Agents need a hybrid architecture: flexible reasoning inside deterministic control.

Open source frameworks should expose failure states clearly. Tool errors, model refusals, timeout retries, partial success and approval denials should be typed. Developers should be able to build recovery logic. Observability should show where a workflow broke. Silent fallback is dangerous when agents act.

A production agent is not successful because it works once. It is successful because it fails safely many times.

The business case for open source agents is control

The business case for open source AI agents is often framed as cost. Open source may reduce licensing fees or API dependence. But the stronger business case is control. Companies want to decide where agents run, which models they use, which data they see, which tools they call and how their actions are governed.

Control has direct economic value. It reduces vendor lock-in. It supports negotiating power. It lets companies meet data residency needs. It lets regulated industries adapt systems to internal controls. It lets platform teams build shared infrastructure rather than buying isolated agent products for every department. It lets security teams inspect and modify weak points.

Open source also supports competition. If agent protocols and frameworks are open, smaller vendors can build specialized tools without waiting for closed platform partnerships. Enterprises can integrate best-of-breed systems. Developers can contribute fixes. Researchers can evaluate safety. The ecosystem can move faster than a single vendor roadmap.

But control has costs. Self-hosting requires expertise. Forking creates maintenance burden. Open source dependencies need patching. Security reviews take time. Enterprises may need to fund or support projects they depend on. The business case must include operational responsibility, not only license savings.

The strategic value is resilience. If one model provider changes terms, an open agent layer can route to another. If one tool protocol evolves, a standards-based architecture can adapt. If a vulnerability appears, an organization with source access and internal expertise may patch faster. If regulation changes, configurable policy and logs reduce rework.

Open source agents will win in serious environments when they give organizations more control than closed platforms without pushing unreasonable security work onto every adopter. That requires shared tooling, strong defaults and mature governance.

The biggest risk is not rogue superintelligence, it is ordinary misdelegation

Public debate often jumps to extreme scenarios. Agentic AI does raise long-term safety questions, especially as systems become more capable. But the near-term enterprise risk is more mundane: ordinary misdelegation. A user gives an agent too much access. A developer installs an unsafe tool. A workflow lacks approval. A token is reused across resources. A prompt injection crosses a weak boundary. A registry package is malicious. A log is missing when something goes wrong.

These failures are not exotic. They are versions of problems security teams already know. What changes is speed and ambiguity. Agents can chain tools quickly. They can operate in places designed for humans. They can make mistakes that look like legitimate user actions. They can act through credentials that existing systems already trust.

That is why zero trust maps so well to agents. NIST’s zero trust guidance says trust should not be implicit based on network location or ownership, and that authentication and authorization should happen before sessions to enterprise resources. Agents should be treated the same way. They should not be trusted because they run inside the network, because a user launched them, because their repository is popular or because their model sounds competent.

Misdelegation is also a design problem. Products often ask users to connect everything because broad access makes demos smoother. The safer product asks what task the user is trying to accomplish and grants only the necessary capabilities. It makes escalation possible but visible. It treats autonomy as something earned through evidence, not assumed at onboarding.

Open source projects can reduce misdelegation by making examples narrow. A demo that uses a read-only token teaches one habit. A demo that uses an admin token teaches another. A template that separates read, write and destructive tools teaches developers to think in permissions. Documentation that says “never pass through tokens” helps only if the code makes the safe pattern easy.

The agent future will be shaped less by dramatic failures than by thousands of permission decisions that either compound safely or compound dangerously.

Open source communities must treat security as product quality

Security cannot remain a separate checklist for open source agent projects. For agents, security is product quality. An agent that cannot authenticate safely is incomplete. A framework that lacks approval hooks is incomplete. A tool protocol that cannot bind tokens to audiences is incomplete. A registry without publisher verification is incomplete. A local agent that can run commands without clear controls is incomplete.

This may feel demanding for open source maintainers, but it reflects the power of the software. A small CLI tool has one risk profile. A local agent that edits files, runs commands and connects to SaaS tools has another. A framework used by enterprises to orchestrate multi-agent workflows has another. As open source agent projects become infrastructure, expectations rise.

OpenSSF’s work is relevant because it gives communities shared language and tools. Scorecard checks, SLSA provenance, Sigstore signing and SBOM practices can become baseline expectations. Agent projects should not wait for regulators or customers to demand these controls. Early adoption will make later enterprise acceptance easier.

Security also needs documentation that developers read. Threat models should be short enough to use. Secure examples should be maintained. Dangerous patterns should be called out clearly. Vulnerability reporting should be easy. Release notes should identify security changes. Compatibility breaks for safety should be explained.

The community should value boring work. Tests, fuzzing, dependency updates, signed releases, docs and incident response are less visible than new agent abilities, but they decide trust. Microsoft’s toolkit announcement notes investment in tests, fuzzing, provenance and Scorecard tracking. Whether or not that toolkit becomes widely adopted, the pattern is right: security posture should be visible as part of the project.

In agentic AI, the open source project with fewer features but stronger controls may be the more advanced project.

The trust architecture that agentic AI needs

A mature trust architecture for open source agents starts with inventory. Organizations must know which agents exist, where they run, who owns them, which models they use, which tools they can call, which data they can access and which identities they use. Without inventory, every other control is partial.

The second layer is identity. Every agent should have a unique identity. Every action should carry the delegated user or business context. Every workload should be attested where possible. Every external tool should verify the caller. Existing IAM systems should be extended, not bypassed. OpenID Connect, OAuth resource indicators and SPIFFE-like workload identity patterns provide building blocks, but agent-specific claims and delegation chains will need standardization.

The third layer is authorization. Agents need scoped, short-lived access. Capabilities should be explicit. Read, write and destructive actions should be separated. Tokens should be audience-bound. Tool calls should pass through policy checks. Human approval should attach to high-risk actions.

The fourth layer is runtime containment. Agents should run in sandboxes appropriate to their task. Local file access, network access and command execution should be restricted. Tool servers should validate inputs. Memory should be partitioned. Multi-step actions should use transaction patterns and circuit breakers.

The fifth layer is supply-chain assurance. Agent components should be signed, versioned, scanned and traceable. Registries should verify publishers. SBOMs or agent bills of materials should record components. Build provenance should be available for high-risk deployments.

The sixth layer is observability. Prompts, tool calls, policies, approvals, model versions, memory references and outcomes should be traceable with privacy controls. Logs should support debugging, incident response and compliance.

The final layer is governance. Owners should review risk, approve deployment, monitor performance, handle incidents and retire agents when they are no longer needed. Open source communities should support governance with standards, documentation and secure defaults.

This architecture is not a luxury for later. It is the price of letting agents act.

The agentic AI future will be less closed than many expected

A year ago, it was plausible to imagine agentic AI becoming another closed platform race: each major model vendor offering its own agent environment, connector ecosystem and marketplace. That still may happen at the application layer. But the infrastructure layer is moving toward open protocols and open source building blocks faster than expected.

The formation of the Agentic AI Foundation under the Linux Foundation is part of that movement. MCP, AGENTS.md and goose are not the whole agent stack, but they represent connection, instruction and execution layers. A2A’s move from Google-originated protocol to Linux Foundation donation points in the same direction for agent-to-agent interoperability.

The reason is practical. No single vendor can connect every enterprise system, satisfy every deployment model, support every framework and win trust across all rivals. Open protocols reduce friction for everyone. They let closed model providers participate without owning the entire stack. They let open source frameworks remain relevant. They let enterprises avoid betting all agent infrastructure on one company.

But openness will not remove power struggles. Standards can be influenced by their largest contributors. Compliance profiles can favor certain architectures. Marketplaces can centralize distribution. Cloud providers can make their managed versions easier than self-hosting. The open agent stack will still have politics and business strategy.

The important question is whether the open layer remains genuinely inspectable, forkable and governable. If open standards become thin wrappers around proprietary control points, trust will weaken. If foundations maintain neutral governance and security discipline, open agent infrastructure could become the safest path for broad adoption.

Agentic AI may follow the cloud-native pattern: open infrastructure underneath, commercial platforms around it, and enterprise control built through policy, identity and observability.

Security incidents will shape standards faster than white papers

Standards often mature after incidents. Agentic AI will be no different. The first high-profile failures will force changes in defaults, procurement, regulation and user behavior. Some incidents will involve data leakage. Others will involve tool misuse, malicious extensions, unauthorized transactions, prompt injection, agent impersonation, model-driven code compromise or uncontrolled multi-agent loops.

The Reuters report on OpenClaw and the Tom’s Hardware report on MCP security claims show how quickly agent security stories move from technical communities into mainstream news. Even when details are disputed or context-specific, these reports affect trust. Regulators notice. Enterprises slow deployments. Vendors add controls. Foundations adjust priorities.

Security teams should not wait for a perfect taxonomy. They can act now by applying known controls: least privilege, identity, token scoping, sandboxing, approval, logging, dependency scanning, signing, provenance, red teaming and incident response. The novelty is in the composition, not in every individual defense.

Open source projects should prepare for vulnerability handling at scale. That means security policies, private reporting channels, coordinated disclosure, CVE processes where appropriate, patched releases, migration guidance and public postmortems. A project that becomes central to agent infrastructure will face scrutiny. Silence or defensiveness will damage trust more than the vulnerability itself.

Incident data should feed standards. If a prompt injection bypasses a tool description model, update the schema. If a registry attack works, strengthen publisher verification. If token passthrough causes cross-resource access, tighten authorization requirements. If local agents expose services, change defaults. Standards should learn from abuse quickly.

The safest agent ecosystem will be the one that treats incidents as design feedback, not public-relations threats.

Developers, enterprises and policymakers need a shared vocabulary

Agentic AI risk is hard to discuss because different groups use different words. Developers talk about tools, callbacks, prompts, MCP servers, skills and traces. Security teams talk about identity, least privilege, secrets, audit and threat models. Policymakers talk about transparency, accountability, systemic risk and human oversight. Business leaders talk about productivity, cost and control. These conversations often pass each other.

Open source standards can create a shared vocabulary. MCP names the tool-connection layer. A2A names agent-to-agent communication. AGENTS.md names repository instructions for coding agents. SLSA names provenance levels. Sigstore names artifact signing. SBOM names component visibility. SPIFFE names workload identity. NIST names zero trust and AI risk management. OWASP names agentic threat categories.

A shared vocabulary makes governance practical. A policy that says “all agent actions must be logged” is too vague. A policy that says “all state-changing MCP tool calls must include agent identity, delegated user identity, resource audience, policy decision and trace ID” can be implemented. A procurement rule that says “agent platforms must be safe” is vague. A rule requiring signed plug-ins, scoped tokens, sandboxing and audit export is actionable.

Policymakers also need technical specificity. Broad AI rules may not capture agent-specific risks. Agents are defined by action, delegation, tool access and autonomy, not only by model capability. Regulation that ignores tool layers may miss where harm occurs. Regulation that treats open source as a single category may miss the difference between open weights, open runtimes, open protocols and open deployments.

The open source community can help by publishing reference architectures and control mappings. OWASP, OpenSSF, NIST-aligned profiles and foundation projects can translate risk into implementation patterns. Enterprises can contribute requirements from real deployments. Researchers can test whether controls work. This is how agent governance becomes less abstract.

Agentic AI needs language precise enough for engineers and broad enough for law, procurement and public trust.

The next competitive edge is verifiable restraint

For much of the AI boom, competitive advantage meant more capability: bigger models, longer context, better reasoning, faster coding, richer multimodal interfaces. In agentic AI, the next competitive edge will be restraint that can be verified. The winning agents will not be those that claim they can do anything. They will be those that prove they will do only what they are allowed to do.

Verifiable restraint is not weakness. It is what lets organizations grant more autonomy. A bank will not let an agent handle customer workflows because it writes fluent text. It may let an agent act if the system proves identity, enforces policy, limits access, logs actions, supports review and passes security tests. A hospital, government agency, law firm or cloud operator will reason the same way.

Open source has a special role because restraint can be inspected. A policy engine can be reviewed. A sandbox profile can be tested. A token flow can be audited. A signing process can be verified. A trace format can be integrated. An open implementation does not eliminate trust questions, but it gives adopters evidence.

This is where the agentic market may surprise people. The most trusted projects may not be the most spectacular demos. They may be the boring layers: gateways, registries, identity brokers, policy engines, tracing libraries, sandbox managers and secure templates. These layers decide whether powerful agents are deployable.

Open source communities should embrace that. Infrastructure projects rarely win attention at first, but they shape ecosystems. The Linux kernel, Kubernetes, OpenSSL, Envoy, SPIFFE and many other projects became critical because they handled trust, execution and connectivity. Agentic AI will need its own equivalents.

The future agent will be judged not only by what it can do, but by what it can prove it did not do.

The path from open experimentation to trusted autonomy

Agentic AI is still in the transition from experimentation to infrastructure. The experimentation phase values speed, demos and broad capability. The infrastructure phase values stability, security and governance. Open source is present in both, but its responsibilities change. A prototype can tolerate rough edges. Infrastructure cannot.

The path forward starts with acknowledging that agents are actors in systems. They are not human, but they act under delegated authority. They are not ordinary APIs, but they call APIs. They are not employees, but they need identity and accountability. They are not just software packages, but they are built from software packages. This hybrid nature is why old controls must be recomposed.

The agentic AI stack should become more modular and more governed. Open protocols should connect tools and agents. Open frameworks should support orchestration. Open security projects should provide identity, signing, policy and tracing. Enterprises should assemble these pieces under risk-based controls. Model providers should compete on capability while accepting that the control layer may be open and shared.

There is still a risk that convenience wins too early. Developers may install agents faster than organizations can govern them. Vendors may hide risky defaults behind smooth onboarding. Users may approve broad access because the first task works. Open source communities may prioritize growth over security. Regulators may react after incidents with rules that fail to distinguish responsible open infrastructure from reckless deployment.

The better path is available now. Treat agent identity as mandatory. Treat tool access as privileged. Treat memory as sensitive. Treat registries as supply-chain infrastructure. Treat logs as accountability. Treat open governance as security. Treat human approval as a designed control. Treat evaluation as action testing. Treat autonomy as something granted in stages.

Open source AI agents can give users and organizations more control than closed agent platforms, but only if the ecosystem builds security into the foundation. The agentic future will not be decided by autonomy alone. It will be decided by trust that survives contact with real systems.

Questions readers are asking about open source AI agents

What is an open source AI agent?

An open source AI agent is an agentic system whose relevant code, and in stronger cases its model and supporting artifacts, are available under terms that allow use, study, modification and sharing. In practice, many “open source agents” combine open orchestration code with closed or open models.

Why are AI agents riskier than chatbots?

Chatbots mainly produce text. Agents can call tools, access data, write files, trigger workflows and act across systems. That means a mistake or attack can create operational consequences, not only a bad answer.

Why does identity matter for AI agents?

Identity tells systems who or what is acting. For agents, identity should show the agent, the delegated user, the workload and the scope of authority. Without it, audit trails and access control become weak.

Does open source automatically make AI agents safer?

No. Open source improves inspection and community review, but safety also requires secure defaults, scoped permissions, signed components, strong governance, logging, sandboxing and patching.

What is MCP in agentic AI?

The Model Context Protocol is an open protocol for connecting AI applications and agents to tools, data sources and external systems. It is becoming a common layer for agent-to-tool communication.

What is A2A in agentic AI?

Agent2Agent is an open protocol for communication and task delegation between agents built with different frameworks or by different vendors. It focuses on agent-to-agent interoperability.

What is AGENTS.md?

AGENTS.md is an open format that gives coding agents a predictable place to find repository-specific instructions such as setup commands, tests and coding conventions.

What is the biggest security risk for open source agents?

The biggest near-term risk is overbroad authority: agents receiving access to tools, data or credentials beyond what a task requires. Prompt injection, malicious plug-ins and weak identity become more dangerous when permissions are too broad.

How should companies control agent tool access?

They should use least privilege, short-lived tokens, audience-bound access, explicit capability declarations, policy checks before tool calls, human approval for risky actions and detailed logs.

What is prompt injection in agentic AI?

Prompt injection occurs when malicious text tries to override an agent’s instructions. In agents, this can happen through websites, emails, documents, tool descriptions or plug-in metadata.

Why is agent memory sensitive?

Memory can store private data, task history and instructions that influence future behavior. If memory is shared, retained too long or poisoned, it can leak data or distort agent actions.

Do agents need human approval?

Yes for high-risk actions, but approval must be specific, scoped and understandable. A vague approval prompt does not provide meaningful control.

What is an agent bill of materials?

It is an emerging idea similar to an SBOM, but focused on agent systems. It would list models, tools, prompts, policies, plug-ins, MCP servers, memory components and other artifacts that influence agent behavior.

How do SLSA and Sigstore relate to AI agents?

SLSA supports build provenance and supply-chain integrity. Sigstore supports artifact signing and verification. Both can help prove that agent components came from trusted sources and were not tampered with.

Can local open source agents be safer than cloud agents?

They can improve control and data locality, but they can also bypass enterprise monitoring if unmanaged. Safety depends on sandboxing, configuration, identity, extension controls and logging.

How does zero trust apply to agentic AI?

Zero trust means agents should not receive implicit trust based on network location, device ownership or user launch. Every action should be authenticated, authorized and scoped to the resource.

Will regulation affect open source agents?

Yes. Regulations such as the EU AI Act affect general-purpose AI models and high-risk AI deployments. Open source may receive some exemptions, but systemic-risk models and high-risk uses still face obligations.

What should enterprises ask before deploying agents?

They should ask which systems the agent can access, which identity it uses, how permissions are scoped, where logs go, how tools are verified, how memory is governed, how incidents are handled and how access can be revoked.

What will decide the future of agentic AI?

The future will be decided by whether agents can be trusted to act across real systems. That trust depends on identity, security, transparency, governance and control, not only model intelligence.

Author:
Jan Bielik
CEO & Founder of Webiano Digital & Marketing Agency

Open source AI agents will live or die by trust
Open source AI agents will live or die by trust

This article is an original analysis supported by the sources cited below

Linux Foundation announces the formation of the Agentic AI Foundation
The Linux Foundation announcement describing the creation of AAIF and the founding contributions of MCP, goose and AGENTS.md.

Agentic AI Foundation
The official AAIF site describing its open source agentic AI projects and foundation structure.

Anthropic donates the Model Context Protocol to the Linux Foundation
Anthropic’s announcement explaining the MCP donation and the governance intent behind the Agentic AI Foundation.

OpenAI co-founds the Agentic AI Foundation
OpenAI’s announcement about contributing AGENTS.md to the Agentic AI Foundation and supporting open agent infrastructure.

MCP joins the Agentic AI Foundation
The MCP project announcement describing its move under the Linux Foundation and reporting adoption figures.

The Open Source AI Definition 1.0
The Open Source Initiative’s definition of open source AI and the required freedoms to use, study, modify and share AI systems.

OWASP Agentic AI threats and mitigations
OWASP’s agentic AI threat-model resource for understanding risks and mitigations in autonomous AI systems.

The 2025 AI Agent Index
Academic index documenting technical and safety features of deployed agentic AI systems and transparency gaps.

Model Context Protocol security best practices
Official MCP security guidance covering attack vectors and implementation best practices.

Model Context Protocol specification
The MCP specification containing principles for user consent, data privacy, tool safety and sampling controls.

Model Context Protocol authorization specification
MCP authorization guidance covering token validation, resource indicators, HTTPS, PKCE and related OAuth security practices.

Google announces the Agent2Agent protocol
Google’s announcement of A2A as an open protocol for secure agent communication and coordination.

Agent2Agent protocol official documentation
Official A2A documentation describing agent interoperability, delegation and communication across frameworks.

SLSA supply-chain security framework
The official SLSA site explaining supply-chain controls for artifact integrity, tampering prevention and build security.

Sigstore OpenSSF project
OpenSSF’s Sigstore project page describing software artifact signing, verification and transparency logs.

OpenSSF Scorecard
OpenSSF project page describing automated security checks for open source project risk assessment.

SPIFFE and SPIRE
Official SPIFFE site describing cryptographic workload identity for distributed systems and zero trust environments.

OpenID Connect overview
OpenID Foundation resource explaining OpenID Connect as an interoperable identity protocol based on OAuth 2.0.

RFC 8707 Resource Indicators for OAuth 2.0
IETF RFC defining resource indicators so clients can signal the intended protected resource for OAuth access tokens.

NIST SP 800-207 Zero Trust Architecture
NIST guidance defining zero trust architecture and its shift from network perimeter trust to resource-focused verification.

NIST AI Risk Management Framework
NIST framework for managing risks to individuals, organizations and society associated with artificial intelligence.

European Commission guidelines for providers of general-purpose AI models
European Commission guidance on GPAI obligations under the EU AI Act, including timelines and open-source clarifications.

General-purpose AI models in the AI Act questions and answers
European Commission Q&A explaining GPAI documentation, systemic-risk duties and open-source exemptions.

The General-Purpose AI Code of Practice
European Commission page describing the voluntary GPAI Code of Practice for transparency, copyright, safety and security.

OpenAI Agents SDK documentation
OpenAI developer documentation describing agents, tool execution, orchestration, state and sandbox agents.

OpenAI Agents SDK GitHub repository
The open source Python Agents SDK repository for multi-agent workflows and provider-agnostic orchestration.

LangChain agents documentation
LangChain documentation explaining agents as systems that combine language models with tools and iterative task execution.

LlamaIndex agents documentation
LlamaIndex documentation defining agents as automated reasoning and decision engines that use tools, planning and memory.

CrewAI documentation
CrewAI documentation describing its open source framework for orchestrating autonomous agents and workflows.

Microsoft Agent Governance Toolkit
Microsoft’s announcement of an open source runtime governance toolkit for autonomous AI agents.

OpenSSF securing agentic AI tech talk recap
OpenSSF recap discussing the security layers of AI stacks, dependency visibility and SBOM relevance.

OWASP practical guide for secure MCP server development
OWASP resource describing secure architecture, authentication, authorization, validation and deployment for MCP servers.

OWASP Agentic Skills Top 10
OWASP project describing security risks in agentic skills, the execution layer that gives agents real-world impact.

Reuters report on OpenClaw security risks
Reuters coverage of China’s warning about OpenClaw misconfiguration, network exposure, identity authentication and access-control risks.

Stanford HAI report on declining AI transparency
Stanford HAI analysis of the 2025 Foundation Model Transparency Index and continuing opacity in key AI development areas.

NSA, CISA and partners release shared SBOM vision
NSA announcement describing SBOM’s role in software supply-chain visibility and cybersecurity risk management.

AGENTS.md
Official AGENTS.md site describing the open format for guiding coding agents with project-specific instructions.

Goose open source AI agent
Official goose documentation describing the local open source AI agent and its MCP-based extension model.