
Trust Management for Agent Operations

Every governance framework in use today embeds trust assumptions calibrated on decades of human behaviour. Agents break those assumptions. This paper argues that trust management - the explicit identification, assessment, and governance of trust signals - needs to become a discipline in its own right, with registers, scoring, thresholds, and continuous monitoring, paralleling how organisations already manage risk.

2026-03-26 - 18 min read


Every governance framework in use today was built on the assumption that the participants are human. That assumption is about to break.


Risk management is a mature discipline. It has registers, scoring methodologies, mitigation strategies, appetite statements, and continuous monitoring. It asks: what could go wrong, how likely is it, and what’s the impact? Organisations invest heavily in it. Regulators mandate it. Careers are built on it.

Trust, by contrast, has never needed its own discipline. It has been implicit - embedded in professional qualifications, regulatory frameworks, contractual relationships, and decades of accumulated experience with human behaviour. When a broker sends a payment instruction, nobody consults a trust register. The trust is assumed: this person is qualified, regulated, insured, personally liable, and exercising professional judgment. The governance frameworks that surround the transaction - approval workflows, dual authorisation, audit trails - were designed with these trust signals already present. They don’t create trust. They assume it.

Agents don’t inherently emit those signals.

An agent isn’t professionally qualified. It isn’t regulated as an individual. It doesn’t carry professional indemnity insurance. It can’t be held personally liable. It didn’t exercise judgment in the way a human professional does. When an agent sends a payment instruction, the governance framework still functions - the approval steps exist, the documentation requirements are met, the audit trail is recorded. But the trust model underneath the framework is suddenly hollow. The process is intact. The basis for believing in the process has changed.

This paper argues that trust management - the explicit identification, assessment, and governance of trust signals in agent operations - needs to become a discipline in its own right. Not because the technology demands it. Because the absence of it means organisations are running governance frameworks on assumptions that no longer hold.

The Implicit Trust Model

Consider how trust operates in a traditional insurance broking transaction. A broker submits an instruction to an insurer to amend the bank details on a claims payment. This is a high-risk action - payment diversion fraud cost UK victims £1.17 billion in 2024 according to UK Finance. The National Crime Agency identifies it as one of the highest-harm fraud types in the UK.

The governance framework around this transaction typically includes: verification of the request against existing records, callback procedures to confirm the instruction, dual authorisation for changes above a threshold, and an audit log of who requested what and when.

These controls work. But they work because of an implicit trust model that nobody writes down:

In many UK insurance contexts, the broker is a regulated professional with a duty of care. Their firm typically holds professional indemnity insurance. The individual has a personal reputation and career at stake. They are operating within a contractual relationship that predates this transaction. The email they sent was composed by a person who understood the consequences of the instruction. Where misconduct or fraud is involved, accountability can attach to identifiable individuals and firms.

None of these trust signals appear in the governance procedure. They don’t need to. They’re the water the fish swims in. Every process, every control, every approval workflow was designed in an environment where these signals were always present. The governance framework doesn’t measure trust because trust was a constant, not a variable.

What Changes When Agents Arrive

Agents are entering operational workflows across financial services, insurance, legal, and professional services. They’re submitting documents, processing claims, generating reports, communicating with clients, and - increasingly - issuing instructions that trigger financial transactions.

The governance frameworks surrounding these actions are being adapted. Approval workflows now include agent actions. Audit trails capture which agent performed which task. Sandboxing and policy enforcement constrain what agents can do. Observability platforms monitor agent behaviour. These adaptations are necessary and well-executed by the teams building them.

But the implicit trust model hasn’t been adapted. The governance frameworks still assume the trust signals that human actors provide - and agents don’t.

When a human broker submits an instruction, the receiving party has - consciously or not - assessed a bundle of trust signals: identity (I know who this person is), authority (I know they’re authorised to act for this firm), provenance (I can see this came from their email, in their voice, consistent with prior interactions), and reputation (this firm and this individual have a track record I can reference). These signals aren’t formally scored or weighted. They’re processed intuitively, informed by years of professional experience.

When an agent submits the same instruction, which of those signals are present?

Identity - the receiving system sees an API call with a bearer token. The token proves the caller has the key. It doesn’t prove who the caller is in any verifiable sense.

Authority - the token may grant access, but it doesn’t carry a verifiable, independently confirmable assertion of what the agent is authorised to do, who granted that authority, or whether it’s still valid.

Provenance - the instruction arrived over an API. There’s no way to confirm the content hasn’t been altered in transit, or that it was produced by the agent it claims to come from.

Reputation - the agent has no portable track record. Its history exists, if at all, inside the platform that operates it. The receiving party has no independent way to assess whether this agent has operated reliably in the past.
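The gap can be made concrete. The following sketch (all names are illustrative, not a real API) shows the trust evidence a receiving system can actually extract from a bare API call today — token validity and nothing else — against the four signals the human-era model assumed:

```python
# Hypothetical sketch: the trust evidence a bearer-token API call
# actually carries, versus the signals the governance framework
# implicitly assumes. All names are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrustEvidence:
    identity: Optional[str]    # verifiable "who", not just "holds a key"
    authority: Optional[str]   # independently confirmable grant of authority
    provenance: Optional[str]  # proof the content is unaltered and came from the agent
    reputation: Optional[str]  # portable, referenceable track record

def evidence_from_bearer_call(token_valid: bool) -> TrustEvidence:
    """A bearer token proves possession of a key - nothing else."""
    if not token_valid:
        raise PermissionError("invalid token")
    return TrustEvidence(identity=None, authority=None,
                         provenance=None, reputation=None)

evidence = evidence_from_bearer_call(token_valid=True)
# Every signal the human-era trust model relied on comes back empty.
```

The point of the sketch is the return value: authentication succeeded, yet each of the four fields the downstream governance decision depends on is unpopulated.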

The governance framework processes the instruction the same way it would process a human instruction. The approval steps fire. The audit trail records. But the trust evidence supporting the decision is thinner than the framework was designed for. The process assumes signals that aren’t there.

This isn’t a technology failure. It’s a governance gap. The frameworks were never asked to operate without the implicit trust model that human actors provide. Now they are.

In some respects, agent systems may ultimately support stronger, more auditable trust signals than human workflows. Cryptographic signing is more reliable than a human remembering to follow procedure. The challenge is that these signals are not yet consistently standardised, portable, or governed across organisational boundaries.

A Live Example: LiteLLM, March 2026

As this paper was being finalised, the AI infrastructure community experienced a supply chain attack that illustrates the argument precisely.

On 24 March 2026, litellm - the open-source Python package that sits between AI applications and their model providers, downloaded approximately 95 million times per month and widely integrated across AI application infrastructure - was compromised. A threat actor published two malicious versions to PyPI after obtaining the maintainer’s publishing credentials through a prior compromise of the Trivy security scanner’s CI/CD pipeline. The poisoned packages contained credential-stealing malware that harvested API keys, cloud tokens, SSH keys, Kubernetes secrets, and environment variables from every system that installed them. The compromised versions were available for approximately three hours.

The implicit trust model that failed: “a package published on PyPI by the registered maintainer account is legitimate.” The governance framework was intact - PyPI’s publishing system worked exactly as designed. The trust signal underneath it - that the credentials being used belonged to an uncompromised, authorised human - was not. No corresponding release existed on the project’s GitHub repository. The package was uploaded directly to PyPI, bypassing the normal release process. The release process did not independently verify whether the actor using the credentials was the authorised publisher.

The trust management lens makes the failure legible. What trust signals did the ecosystem require for this action (publishing a package that millions of systems would automatically install)? A valid PyPI token. What trust signals were actually present? A stolen PyPI token. What was the gap? Identity verification of the publisher, content provenance against the source repository, and an independent integrity check. Was the gap acceptable for the risk level? Clearly not - but nobody was assessing it, because the trust model was implicit.

The downstream impact was amplified because litellm occupies a uniquely sensitive position: it sits between applications and multiple AI service providers, holding API keys for every model it proxies. Compromising this single package exposed credentials across entire AI stacks. And in a detail that connects directly to the agent trust question, security researchers documented that the threat group used an OpenClaw-based agent for automated attack targeting - one of the first observed cases of an AI agent deployed operationally in a supply chain attack.

Among the controls that protected some organisations were trust floor measures: version pinning via lockfiles meant that systems with exact version constraints never pulled the compromised packages. This is the maturity journey in miniature. The floor - pinned versions, a basic form of content provenance - stopped the attack. Signed packages with verified publisher identity would have made it detectable before installation. A trust management discipline would have identified the gap before the incident occurred.
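The floor control described above is a few lines of configuration. A sketch of hash-pinned requirements, using pip's hash-checking mode (the version number and digest below are placeholders, not a recommendation):

```
# requirements.txt - exact version pins plus content hashes.
# Version and digest shown are placeholders, not real values.
litellm==1.0.0 \
    --hash=sha256:<expected-digest>

# Installed with hash checking enforced:
#   pip install --require-hashes -r requirements.txt
```

With `--require-hashes`, pip refuses any artifact whose digest does not match the lockfile — a maliciously republished version under the same name simply fails to install.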

Trust as a Variable, Not a Constant

The shift from human to agent operations turns trust from a background constant into an active variable that needs to be measured, managed, and governed.

Risk management offers a useful parallel. Before risk management became a discipline, organisations managed risk implicitly - through experience, intuition, and ad hoc controls. The formalisation of risk management didn’t change what organisations did. It made explicit what had been implicit: what are the risks, how likely are they, what’s the impact, and are our controls adequate?

Trust management for agent operations would do the same thing. Make explicit what has been implicit: what trust signals does this interaction require, what signals are present, what’s the gap, and is the gap acceptable for the risk level of the action?

A trust model for agent operations would include:

Trust signals - the observable, verifiable evidence that supports a trust decision. For agents, these include: verified identity (the agent can cryptographically prove who it is), valid credentials (the agent carries independently verifiable assertions of its authority), content provenance (the output is signed and hash-verified against tampering), and operational track record (the agent has a history of reliable operation that can be independently referenced).

Trust requirements - the minimum trust signals needed for a given action, calibrated to the risk level. A read-only status query might require only basic identity verification. A document submission might require identity plus authority credentials. A payment amendment instruction might require the full set: identity, authority, provenance, track record, and a time-bounded transaction-specific credential.

Trust scoring - a quantified assessment of the trust evidence present in a given interaction, comparable to risk scoring. Not pass/fail, but weighted. An agent with verified identity and valid credentials but no established track record might score sufficiently for routine operations but insufficiently for high-value financial instructions.

Trust thresholds - the minimum trust score acceptable for a given action type, set by the organisation’s trust appetite, analogous to risk appetite. Some organisations will accept lower trust evidence for operational efficiency. Others - particularly in regulated industries - will require higher thresholds. The threshold is a governance decision, not a technology decision.

Trust monitoring - continuous reassessment of trust signals over the lifetime of an agent’s operations. Credentials expire. Track records evolve. Identity keys can be compromised. Trust isn’t established once at onboarding and assumed thereafter. It’s verified continuously, with the frequency and depth calibrated to the risk context.
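The scoring and threshold components above can be sketched in a few lines. The weights, signal names, and threshold values here are illustrative assumptions — the paper's point is precisely that these are governance decisions each organisation must make, not constants:

```python
# Minimal sketch of trust scoring against per-action thresholds.
# Weights and threshold values are illustrative, not a standard.
SIGNAL_WEIGHTS = {
    "verified_identity": 0.3,
    "valid_credentials": 0.3,
    "content_provenance": 0.2,
    "track_record": 0.2,
}

# Thresholds per action type - set by trust appetite, a governance
# decision rather than a technology decision (hypothetical values).
ACTION_THRESHOLDS = {
    "status_query": 0.3,
    "document_submission": 0.6,
    "payment_amendment": 0.9,
}

def trust_score(signals: dict) -> float:
    """Weighted sum of the trust signals present in an interaction."""
    return sum(w for name, w in SIGNAL_WEIGHTS.items() if signals.get(name))

def is_permitted(action: str, signals: dict) -> bool:
    return trust_score(signals) >= ACTION_THRESHOLDS[action]

# An agent with identity, credentials, and provenance but no track record:
agent = {"verified_identity": True, "valid_credentials": True,
         "content_provenance": True, "track_record": False}
assert is_permitted("document_submission", agent)    # score ~0.8, threshold 0.6
assert not is_permitted("payment_amendment", agent)  # score ~0.8, threshold 0.9
```

The final two lines capture the weighted-not-pass/fail behaviour described above: the same evidence is sufficient for routine operations and insufficient for a high-value financial instruction.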

This isn’t a theoretical framework. It’s a description of what organisations will need to do - and in many cases are already doing informally - as agents take on operational responsibilities. The discipline simply names it, structures it, and makes it auditable.

The Building Blocks Exist

The trust signals described above aren’t speculative. The standards and technologies to provide them already exist, at varying levels of maturity.

Verifiable identity - W3C Decentralized Identifiers (DIDs) provide a cryptographically verifiable identity for any entity, anchored to infrastructure the operator already controls. The did:web method resolves via DNS. Microsoft Entra Verified ID supports it in production as of March 2026.
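The did:web resolution rule is simple enough to show directly. Per the W3C did:web method specification, the identifier maps to an HTTPS URL where the DID document (`did.json`) is hosted — a sketch (the agent DID shown is hypothetical):

```python
# Sketch of did:web resolution per the W3C did:web method spec:
# the method-specific identifier maps to the HTTPS URL hosting
# the DID document (did.json).
from urllib.parse import unquote

def did_web_to_url(did: str) -> str:
    """Map a did:web identifier to the URL of its DID document."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError("not a did:web identifier")
    parts = did[len(prefix):].split(":")
    # Percent-decoding allows an optional port in the domain segment
    domain = unquote(parts[0])
    if len(parts) == 1:
        # Bare domain: document lives at the well-known location
        return f"https://{domain}/.well-known/did.json"
    # Further colon-separated segments become URL path segments
    return f"https://{domain}/{'/'.join(parts[1:])}/did.json"

assert did_web_to_url("did:web:example.com") == \
    "https://example.com/.well-known/did.json"
assert did_web_to_url("did:web:example.com:agents:claims-agent") == \
    "https://example.com/agents/claims-agent/did.json"
```

This is why the method anchors to infrastructure the operator already controls: proving control of the DID reduces to serving a document from the operator's own domain.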

Verifiable authority - W3C Verifiable Credentials encode scoped, time-bounded, revocable assertions of authority. An operator issues a credential to their agent specifying what it’s authorised to do. A client issues a credential specifying what the agent is approved to do within their systems. Each credential is independently verifiable without contacting the issuer.
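What such a credential looks like can be sketched against the W3C Verifiable Credentials data model. The issuer, subject, credential type, and claim names below are hypothetical, and the issuer's proof is elided:

```python
# Sketch of a Verifiable Credential asserting an agent's authority,
# following the shape of the W3C VC data model (v2). Issuer, subject,
# type, and claims are hypothetical; the proof block is elided.
credential = {
    "@context": ["https://www.w3.org/ns/credentials/v2"],
    "type": ["VerifiableCredential", "AgentAuthorityCredential"],
    "issuer": "did:web:broker.example",       # who granted the authority
    "validFrom": "2026-03-01T00:00:00Z",      # time-bounded...
    "validUntil": "2026-06-01T00:00:00Z",     # ...and expiring
    "credentialSubject": {
        "id": "did:web:broker.example:agents:claims-agent",
        "authorisedActions": ["submit_documents", "query_claim_status"],
    },
    # "proof": { ... }  # issuer's signature makes the assertion
    #                   # independently verifiable without contacting them
}
```

The structure carries the properties the paper asks for: the scope (`authorisedActions`), the grantor (`issuer`), and the validity window are all explicit fields a verifier can check.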

Content provenance - cryptographic signing with content hashing. The agent signs a hash of the content it produces. Any modification breaks the verification. This is the same principle as DKIM for email, code signing for software, and checksums for file downloads - well-established and computationally trivial.
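The hash-then-sign principle fits in a few lines. In this sketch an HMAC over a shared key stands in for the asymmetric signature (e.g. Ed25519) a real deployment would use; the property being demonstrated — any modification breaks verification — is the same:

```python
# Sketch of content provenance via hash-then-sign. HMAC with a shared
# key stands in for a real asymmetric signature; the tamper-evidence
# property is identical.
import hashlib
import hmac

SIGNING_KEY = b"demo-key"  # placeholder; a real key is never inlined

def sign(content: bytes) -> str:
    """Hash the content, then sign the digest."""
    digest = hashlib.sha256(content).digest()
    return hmac.new(SIGNING_KEY, digest, hashlib.sha256).hexdigest()

def verify(content: bytes, signature: str) -> bool:
    """Recompute and compare in constant time."""
    return hmac.compare_digest(sign(content), signature)

instruction = b"Amend claims payment to account 12-34-56 00000000"
sig = sign(instruction)
assert verify(instruction, sig)
# A single altered byte in transit fails verification:
assert not verify(instruction.replace(b"00000000", b"99999999"), sig)
```

The final assertion is the payment-diversion scenario from earlier in the paper: an altered account number in a signed instruction is detectable before any approval step fires.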

Operational track record - emerging but advancing. On-chain registries like ERC-8004 provide permissionless, portable agent reputation. Platform-specific ratings and validation histories offer complementary signals. The Decentralized Identity Foundation’s Trusted AI Agents Working Group is actively developing standards for agent trust establishment.

These building blocks compose into the trust signals that governance frameworks need. None of them individually constitutes trust management. Together, governed by policy and calibrated to risk, they provide the evidence base that makes agent governance meaningful rather than procedural.

The Maturity Journey

Trust management doesn’t require organisations to adopt the full stack on day one. It’s a progression, calibrated to need.

The floor is verifiable identity and signed actions. An agent can prove who it is and that its outputs haven’t been altered. This is achievable with existing standards, minimal infrastructure, and no dependency on any platform vendor.

Beyond that, organisations can extend into portable reputation, on-chain discovery, and enterprise governance platforms. Each level adds trust signals without replacing the layers below. No dead ends. Organisations gravitate to the level they need with their immediate partners and customers, and move up as their agent operations mature.

The critical principle: start with the floor. The organisations that establish basic verifiable identity for their agents now will have the foundation in place when the market demands more.

Trust Management in Practice

An insurance broker deploying an agent for claims operations would establish a trust profile for the agent. The profile defines: what trust signals the agent can provide, what trust thresholds apply to different action types, and how trust is monitored over time.

The insurer receiving instructions from that agent would maintain a trust policy: what trust signals they require, how they verify those signals, and what happens when trust evidence is insufficient.

Between them, the trust model is explicit, auditable, and calibrated to the risk of the action. Neither party is relying on implicit assumptions inherited from human-era governance. Both can demonstrate to regulators, auditors, and clients that their agent operations are governed by a trust framework that accounts for the actual trust signals present, not the ones that used to be present when humans did the work.

This is not a vendor product. It’s a management discipline. The tools to implement it are open standards. The governance decisions are organisational. The maturity journey is progressive. The alternative is continuing to run governance frameworks on trust assumptions that no longer hold - and hoping nobody notices until something goes wrong.

Where This Leads

Trust management for agent operations is an emergent discipline. We don’t claim to have the complete framework. What we have is the recognition that the gap exists, the observation that the building blocks are available, and the practical experience of building agent infrastructure in contexts where trust has direct financial consequences.

This is the first of three papers. The second examines the specific identity and provenance technologies that provide trust signals. The third documents a working implementation: an operational agent with verifiable identity, issued credentials, and signed outputs.

The problem will evolve as we build. We’re sharing this thinking now, not because the answers are complete, but because the question - how do organisations manage trust when their governance frameworks were designed for humans and their operations are increasingly performed by agents? - is too important to wait for a finished solution. The space is empty. The need is real. The building blocks exist. The discipline needs to be named, structured, and practiced.


This paper is intended as a governance and operational perspective on emerging agent systems. It does not constitute legal, regulatory, security, or investment advice. Organisations should assess applicable requirements in their own context.


Andy · Null Proof Studio
Michelle · AI Delivery Agent, Null Proof Studio

March 2026


Sources

  1. UK Finance, Annual Fraud Report 2025 - £1.17 billion stolen through payment fraud in the UK in 2024.
  2. National Crime Agency, Fraud Assessment 2025 - Payment diversion fraud identified as one of the highest-harm fraud types in the UK.
  3. LiteLLM, Security Update: Suspected Supply Chain Incident - 24 March 2026.
  4. Snyk, How a Poisoned Security Scanner Became the Key to Backdooring LiteLLM - Trivy compromise chain and OpenClaw-based attack tool attribution.
  5. Sonatype, Compromised litellm PyPI Package Delivers Multi-Stage Credential Stealer - Approximately 3 million daily downloads, multi-stage payload analysis.
  6. Comet, LiteLLM Supply Chain Attack - 95 million monthly downloads, lockfile-pinned environments confirmed protected.