Agent-to-agent communication: provenance and audit trail for multi-agent AI ecosystems

In the past eighteen months, multi-agent AI ecosystems have moved from pilot to production. Gartner estimates that 80% of enterprise applications shipped or updated in Q1 2026 embed at least one AI agent, up from 33% in 2024. The paradigm has shifted: we no longer think of a single model answering a prompt, but of orchestrators coordinating specialized agents, each with a role, memory and tooling, to run end-to-end workflows across customer service, procurement, sales operations and internal knowledge.

The trust problem moved with them. Asking whether a single agent produced a correct output is no longer enough. The real question is which agent generated which message, in response to which request from which other agent, at which moment and in which conversation state. Without a provenance trail for agent-to-agent communication, an operational or legal dispute is very hard to reconstruct. Article 12 of the EU AI Act requires automatic event logging across the entire lifetime of the system, and traditional application logs struggle to meet that bar: the operator can alter them, so their value as evidence is weak.

The answer is not a new log format. It is a paradigm shift: every message exchanged among agents must be captured and certified at runtime with forensic provenance. Each message receives a cryptographic hash of its payload, a qualified timestamp and an electronic seal issued by a third-party QTSP through an API, and together these records build an immutable graph of interactions. This is the foundation of a multi-agent AI audit trail that holds up in court, supports AI Act compliance and, above all, allows companies to establish liability when an automated decision causes harm.

Why trust shifts from single agent to inter-agent communication

When a single AI agent answers a user, the perimeter of responsibility coincides with the perimeter of the model: the company running it knows what came in as input and what came out as output, and can assess the quality of both. In a multi-agent ecosystem, that perimeter dissolves.

From single agent to end-to-end orchestration

A modern orchestrator coordinates specialized agents: a planner that decides strategy, a retriever that fetches information, a function-caller that invokes external APIs, a critic that evaluates outputs, an executor that closes the loop. Each of these agents can run on a different model, hosted by a different provider, configured with different tools. The final decision presented to the user or pushed into production is the result of an internal conversation of ten, twenty, sometimes fifty inter-agent exchanges, each carrying its own context state.

Gartner data published in August 2025 shows that LangGraph has become the production reference for agentic workflows, cited in 34% of documented architectures at companies with more than 1,000 employees. CrewAI and AutoGen follow with different specializations: CrewAI for role-driven collaboration, AutoGen for multi-turn conversations with coordinators that pick the next speaker. All three frameworks share the same weak spot: inter-agent messages either live in volatile memory or are persisted as unsigned text, and in both cases a system administrator can rewrite them at any time.

Four blind spots of traditional application logs

A traditional application log, even a structured and versioned one, falls short of a forensic inter-agent communication trail in four ways.

First, it is mutable. Anyone with filesystem or database access can alter rows without leaving a trace that holds up against a third party.

Second, it does not bind the event to an external time reference that anyone can verify. A timestamp generated by the application server depends on the server clock: a ruling that needs to fix the exact sequence between two automated decisions cannot rest on a self-declared time.

Third, it does not identify the sender with certainty. An "agent_id" field is a label chosen by the system; nothing prevents a compromised or replaced agent from signing messages as if it were another agent.

Fourth, it carries little weight as evidence. In court, the opposing party can challenge the integrity of the log, and the burden of proving authenticity falls on the party that relies on it. Without a chain of evidence built by a qualified third party, the probative value of the log is weak.

What the AI Act requires for record-keeping of agentic ecosystems

Article 12 of Regulation (EU) 2024/1689, with progressive applicability culminating in August 2026 for high-risk system obligations, requires providers and deployers of high-risk AI systems to ensure automatic recording of events across the entire lifecycle. The wording is sharp: logs must be generated automatically by the system itself, not manually, and must cover the full operational life, not just current state.

Multi-agent systems integrated into regulated functions, from credit scoring to HR decision support to mission-critical process management, fall within the high-risk perimeter. Compliance does not stop at storing prompts and answers: it requires a trail that proves, verifiably to a third party, which agent did what in response to which communication from which other agent, and at which exact moment.

A 2026 technical synthesis from Help Net Security underlines that for agentic systems, recording obligations require "proof of integrity on demand", a condition difficult to meet with standard logs when agents operate across multiple organizations or providers.

Article 12 and event traceability

Article 12 lists three purposes for recording: (a) identifying situations that may cause the system to present a risk or undergo a substantial modification; (b) facilitating post-market monitoring; (c) enabling the monitoring of system operations. None of these purposes can be met in an agentic ecosystem without a granular trail of inter-agent exchanges: a "risk-presenting situation" can emerge from a chain of three or four cascading decisions, each reasonable in isolation, and reconstructing it after the fact without the original signed and timestamped messages is impossible.

No finalized technical standard for Article 12 logging exists yet: two drafts are under discussion, prEN 18229-1 on logging and human oversight and ISO/IEC DIS 24970 on AI system logging. In the meantime the burden of demonstrating compliance falls on the operator, and the sanction risk grows as August 2026 approaches.

Legal liability and reconstruction of causal chains

Record-keeping is not just a question of internal transparency. The proposed EU directive on AI liability and European case law on harm from automated decisions converge on a principle: when damage occurs, the causal chain linking the harmful event to the decisions that produced it must be verifiably reconstructable, otherwise the burden of proof shifts against the system operator. In an agentic ecosystem this means that, without a forensic trail of inter-agent communications, a company risks facing a presumption of liability that is hard to rebut.

What a verifiable multi-agent audit trail needs

A useful audit trail is not a more detailed log. It is a graph of signed, timestamped, linked events where every node can be verified independently of the party that generated it. Three technical requirements separate a forensic trail from an application log.

Requirement	Traditional application logs	Forensic inter-agent trail
Immutability	No, mutable by system operator	Yes, cryptographic hash per message
Qualified timestamp	No, server clock	Yes, eIDAS timestamp from QTSP
Verified agent identity	No, label chosen by system	Yes, certificate bound to agent
Legal admissibility	Limited, weak probative value	Full, eIDAS seal admissible in EU
Graph of interactions	Linear sequence, weak references	Hash chain, every node verifiable
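
As a rough illustration of the "graph of linked events" idea, the sketch below stores each inter-agent message as a node whose identifier is the SHA-256 hash of its content and of its parents' identifiers. The field names (sender, recipient, body, parents) and the helper functions are assumptions made for this example, not a prescribed schema, and the sketch leaves out signatures and qualified timestamps.

```python
import hashlib
import json


def node_id(sender, recipient, body, parents):
    """Content-addressed identifier covering the message and its parent IDs."""
    payload = json.dumps(
        {"sender": sender, "recipient": recipient, "body": body,
         "parents": sorted(parents)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_event(graph, sender, recipient, body, parents=()):
    """Insert an event and return its identifier."""
    nid = node_id(sender, recipient, body, parents)
    graph[nid] = {"sender": sender, "recipient": recipient,
                  "body": body, "parents": list(parents)}
    return nid


def verify_node(graph, nid):
    """Recompute the identifier from stored fields; False means tampering."""
    n = graph[nid]
    return node_id(n["sender"], n["recipient"], n["body"], n["parents"]) == nid


def causal_chain(graph, nid):
    """Return nid and all its ancestors, parents always before children."""
    seen, ordered = set(), []

    def visit(current):
        if current in seen:
            return
        seen.add(current)
        for parent in graph[current]["parents"]:
            visit(parent)
        ordered.append(current)

    visit(nid)
    return ordered


graph = {}
plan = add_event(graph, "planner", "retriever", "collect supplier terms")
terms = add_event(graph, "retriever", "executor", "terms: net 30", [plan])
risk = add_event(graph, "risk", "executor", "exposure within limits", [plan])
decision = add_event(graph, "executor", "user", "order placed", [terms, risk])

for nid in causal_chain(graph, decision):
    print(graph[nid]["sender"], "->", graph[nid]["recipient"], verify_node(graph, nid))
```

Because a node's identifier depends on its parents' identifiers, changing an earlier message changes every identifier downstream, which is what makes independent verification of each node possible.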

Verified identity for every participating agent

Every agent in the system must hold a verifiable technical identity, bound to a cryptographic key and to a certificate issued by a recognized authority. When agent A sends a message to agent B, the request must be signed with A's private key; the trail records signature, content and recipient so that, at verification time, anyone can check with A's public key whether the message was indeed originated by that agent.
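
The shape of a signed message record can be sketched as follows. One important caveat: Python's standard library has no public-key signature primitive, so this example uses HMAC-SHA256 with per-agent secret keys purely as a stand-in. In a real deployment the signature would be produced with the agent's private key (for example Ed25519 or ECDSA) and checked with the public key from its certificate. The key registry and the field names are assumptions made for illustration.

```python
import hashlib
import hmac
import json

# Stand-in key registry (assumption). With real asymmetric keys, verifiers
# would only ever see the public half, taken from the agent's certificate.
AGENT_KEYS = {"agent-a": b"demo-key-a", "agent-b": b"demo-key-b"}


def canonical(envelope: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(sender: str, recipient: str, content: str) -> dict:
    """Bind sender, recipient and content together, then attach a signature."""
    envelope = {"sender": sender, "recipient": recipient, "content": content}
    tag = hmac.new(AGENT_KEYS[sender], canonical(envelope), hashlib.sha256).hexdigest()
    return {**envelope, "signature": tag}


def verify(record: dict) -> bool:
    """Recompute the signature with the claimed sender's key and compare."""
    key = AGENT_KEYS.get(record["sender"])
    if key is None:
        return False
    envelope = {k: record[k] for k in ("sender", "recipient", "content")}
    expected = hmac.new(key, canonical(envelope), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])


record = sign("agent-a", "agent-b", "availability confirmed: 500 units")
relabeled = {**record, "sender": "agent-b"}   # another agent's label on A's signature
altered = {**record, "content": "availability confirmed: 50 units"}

print(verify(record), verify(relabeled), verify(altered))
```

The point of the sketch is that the signature covers the sender and recipient as well as the content, so relabeling a message or editing its body both make verification fail.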

This property becomes decisive when agents cross organizational boundaries. In an agentic supply chain where vendor agents talk to customer agents, application identity alone is not enough: a cryptographic identity is required that the customer can verify even if the vendor changes the internals of its orchestrator.

Qualified timestamp and immutable message hashing

The second requirement is qualified timekeeping. Each message, once signed by the sending agent, receives a timestamp issued by a qualified trust service provider under the eIDAS regulation (Regulation (EU) No 910/2014, as amended by Regulation (EU) 2024/1183, known as eIDAS 2). A qualified timestamp benefits from a presumption of accuracy of the date and time it indicates and of integrity of the data to which they are bound, recognized across all EU member states.

Alongside the timestamp, each message is represented by a cryptographic hash (for example SHA-256) that locks its content at the time of the exchange. Any subsequent modification to the message changes its hash, making tampering immediately detectable. The hash chain links each message to the previous one, building a graph that cannot be rewritten retroactively without invalidating the whole structure.
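
A minimal sketch of such a hash chain, using only Python's standard library, might look like the following. The entry layout (message, prev_hash, hash) and the all-zero starting value are assumptions chosen for the example. A production trail would also carry the signature and the qualified timestamp for each entry.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first entry (assumption)


def entry_hash(message: dict, prev_hash: str) -> str:
    """SHA-256 over the previous hash plus the canonical JSON of the message."""
    canonical = json.dumps(message, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append(chain: list, message: dict) -> None:
    """Append a message, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"message": message, "prev_hash": prev_hash,
                  "hash": entry_hash(message, prev_hash)})


def first_invalid(chain: list):
    """Return the index of the first entry that fails verification, or None."""
    prev_hash = GENESIS
    for index, entry in enumerate(chain):
        recomputed = entry_hash(entry["message"], prev_hash)
        if entry["prev_hash"] != prev_hash or entry["hash"] != recomputed:
            return index
        prev_hash = entry["hash"]
    return None


chain = []
append(chain, {"from": "planner", "to": "retriever", "body": "find supplier terms"})
append(chain, {"from": "retriever", "to": "critic", "body": "terms: net 30"})
append(chain, {"from": "critic", "to": "executor", "body": "approved"})
intact = first_invalid(chain)

chain[1]["message"]["body"] = "terms: net 90"  # simulated after-the-fact edit
tampered = first_invalid(chain)

print(intact, tampered)
```

An attacker who also recomputes the stored hash of the edited entry would still break the link to the next entry, so the edit stays detectable unless the whole tail of the chain is rewritten. Anchoring the hashes with an external seal and timestamp is what rules out that full rewrite.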

Graph of interactions admissible in court

The third requirement is admissibility. A verifiable audit trail must be producible in proceedings (civil, administrative, regulatory) without the counterparty being able to validly challenge the integrity of the data. In the EU this means relying on instruments recognized by eIDAS: the qualified electronic seal, the qualified electronic signature and the qualified timestamp. Only these instruments carry the legal effects and the evidentiary presumptions that the EU legislator has reserved for qualified trust services.

No application product can issue these instruments by itself: their issuance is reserved to authorized QTSPs. That is why a forensic audit trail for agentic ecosystems requires integration with one or more QTSPs, ideally delivered through APIs that orchestrators can call at runtime without disturbing the application flow.

How an inter-agent message is certified with TrueScreen

TrueScreen is the Data Authenticity Platform that certifies origin, integrity and time of creation of any digital content. In the agentic domain, TrueScreen extends its Digital Provenance framework to the runtime certification of inter-agent communications: every message exchanged among agents is captured, hashed with SHA-256, timestamped with a qualified timestamp issued by an integrated QTSP and sealed, building a forensic graph of interactions designed to be admissible in legal and regulatory venues.

The integration pattern is intentionally lightweight. The agentic orchestrator calls a TrueScreen endpoint passing the message payload, the identifiers of sender and recipient agents, and a reference to the conversation context. TrueScreen computes the hash of the payload, applies the electronic seal through a third-party QTSP via API, records the event in its forensic registry and returns to the orchestrator a verifiable reference to store in the conversation state. Added latency stays low because the seal does not travel synchronously with the application message: the orchestrator continues the flow, and certification completes in parallel with durability guarantees.
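
The non-blocking pattern described above can be sketched with a background worker pool. Everything here is illustrative: certify is a local stand-in that only hashes the payload after a simulated delay, not a real endpoint, and the state layout is an assumption. The sketch also does not address durability, which in practice would need something like a persistent queue with retries.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def certify(payload: bytes) -> dict:
    """Hypothetical stand-in for the remote certification call."""
    time.sleep(0.05)  # simulated network and sealing latency
    return {"hash": hashlib.sha256(payload).hexdigest()}


executor = ThreadPoolExecutor(max_workers=4)


def hand_off(state: dict, message: str) -> None:
    """Deliver the message immediately and certify it in the background."""
    future = executor.submit(certify, message.encode("utf-8"))
    state["messages"].append(message)   # the application flow is not blocked
    state["pending"].append(future)     # certification completes in parallel


def settle(state: dict) -> list:
    """Wait for every pending certification and return the receipts in order."""
    receipts = [future.result(timeout=5) for future in state["pending"]]
    state["pending"].clear()
    return receipts


state = {"messages": [], "pending": []}
hand_off(state, "planner -> retriever: fetch contract terms")
hand_off(state, "retriever -> drafter: terms attached")
receipts = settle(state)
executor.shutdown()

print(len(state["messages"]), len(receipts))
```

The design choice is to keep the handoff path free of network waits: the orchestrator only pays for submitting the job, and it can call settle at a natural checkpoint such as the end of the conversation.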

API for runtime certification of interactions

The TrueScreen API exposes primitives dedicated to inter-agent message certification. Each call accepts the message content, the metadata of the two agents (sender and recipient), a conversation identifier and an optional set of context attributes (model version, tooling configuration, reference to orchestrator state). In response, the API returns a unique identifier, the message hash and the qualified timestamp. The same identifier can be recalled later for an integrity check: the platform reproduces the hash and confirms or denies the match with what was recorded at exchange time.
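
To make the request and response shape concrete, here is a local, in-memory stand-in for such a service. It is not the TrueScreen API: the class, method and field names are assumptions, the identifier is a random UUID, and the timestamp comes from the local clock where a real deployment would carry a qualified timestamp issued by a QTSP.

```python
import hashlib
import uuid
from datetime import datetime, timezone
from typing import Optional


class InMemoryRegistry:
    """Hypothetical local stand-in for a certification service."""

    def __init__(self) -> None:
        self._records = {}

    def certify(self, content: str, sender: str, recipient: str,
                conversation_id: str, context: Optional[dict] = None) -> dict:
        """Record the message and return identifier, hash and timestamp."""
        receipt = {
            "certification_id": str(uuid.uuid4()),
            "message_hash": hashlib.sha256(content.encode("utf-8")).hexdigest(),
            # Local clock, for illustration only; not a qualified timestamp.
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        self._records[receipt["certification_id"]] = {
            **receipt,
            "sender": sender,
            "recipient": recipient,
            "conversation_id": conversation_id,
            "context": context or {},
        }
        return receipt

    def check(self, certification_id: str, content: str) -> bool:
        """Recompute the hash of the presented content and compare it."""
        record = self._records.get(certification_id)
        if record is None:
            return False
        presented = hashlib.sha256(content.encode("utf-8")).hexdigest()
        return presented == record["message_hash"]


registry = InMemoryRegistry()
receipt = registry.certify(
    content="Your refund of 40 EUR has been approved.",
    sender="drafting-agent",
    recipient="customer",
    conversation_id="conv-001",
    context={"model_version": "example-1"},
)
print(registry.check(receipt["certification_id"], "Your refund of 40 EUR has been approved."))
print(registry.check(receipt["certification_id"], "Your refund of 400 EUR has been approved."))
```

The orchestrator would keep only the returned identifier in its conversation state and present the original content again when an integrity check is needed.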

A concrete use case: a multi-agent customer service orchestrator manages a conversation among an intent recognition agent, a knowledge retrieval agent and a drafting agent. When the drafting agent produces the final answer, the orchestrator invokes TrueScreen to certify the whole chain of exchanges that led to that answer. Six months later, the customer opens a formal dispute: the company retrieves the certification reference, presents it to the customer or to a regulator, and demonstrates in a verifiable way that the answer was not fabricated after the fact.

Orchestrator integrations (LangGraph, AutoGen, CrewAI)

Integration with the most widely adopted agentic frameworks is designed to be non-invasive. In LangGraph, certification plugs into graph nodes as a post-execution callback: every transition between agents emits a TrueScreen call that signs the handoff message. In AutoGen, integration runs through a listener on GroupChat that intercepts every message before it is broadcast to other agents. In CrewAI, a wrapper on tasks and delegates records every delegation and every response as a certified event.

In all three cases the integration principle is the same: the developer does not rewrite the orchestrator's logic, but adds a thin layer that intercepts exchanges and submits them for certification. The result is a forensic graph that lives next to the application graph and can be queried at any time to reconstruct the causal chain of a decision.
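
The "thin layer" principle can be shown without tying it to any one framework. The sketch below uses a plain Python decorator around agent steps. It deliberately does not call LangGraph, AutoGen or CrewAI, whose actual hook points differ, and the TRAIL list is a stand-in for the certification call.

```python
import functools
import hashlib

TRAIL = []  # stand-in for the forensic registry (assumption)


def certified(sender: str, recipient: str):
    """Wrap an agent step so its output is recorded before being handed off."""
    def decorator(step):
        @functools.wraps(step)
        def wrapper(*args, **kwargs):
            output = step(*args, **kwargs)
            TRAIL.append({
                "sender": sender,
                "recipient": recipient,
                "hash": hashlib.sha256(str(output).encode("utf-8")).hexdigest(),
            })
            return output  # the agent's result passes through unchanged
        return wrapper
    return decorator


@certified(sender="retriever", recipient="drafter")
def retrieve(query: str) -> str:
    return f"documents matching '{query}'"


@certified(sender="drafter", recipient="user")
def draft(context: str) -> str:
    return f"answer based on {context}"


answer = draft(retrieve("refund policy"))
print(answer)
print([(event["sender"], event["recipient"]) for event in TRAIL])
```

The agent functions themselves are untouched, and removing the decorator restores the original behavior, which is the sense in which the layer is non-invasive.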

Seal and qualified timestamp issued by an integrated QTSP

One point deserves emphasis: TrueScreen does not issue qualified seals on its own. The electronic seal and the qualified timestamp are applied by a third-party QTSP qualified under eIDAS, integrated into TrueScreen through API. TrueScreen is the acquisition and certification platform that applies forensic methodology, but the legal value of the seal derives from the qualification of the trust service provider. This separation is fundamental to the defensibility of the trail: in a dispute, the admissibility of the seal rests on the QTSP, not on an application platform.

Three operational scenarios where the trail makes the difference

To see why runtime certification of inter-agent communications is not an academic exercise, three scenarios illustrate where a forensic trail changes the outcome of a dispute or an audit.

Agentic supply chain and operational disputes

A manufacturing company runs a multi-agent ecosystem to manage orders for critical components. A vendor agent confirms availability, a customer agent accepts terms, a logistics agent schedules delivery. Three months later a significant delay produces downstream damage: the customer challenges the vendor for having accepted terms different from those agreed in the initial exchange.

Without a forensic trail, the parties compare application logs maintained separately, each with timestamps from its own server and each mutable by the party that holds it. With a runtime-certified audit trail, the original message from the vendor agent is sealed with a hash and qualified timestamp: content and time are very hard to challenge, and liability is established on documentary evidence.

Financial trading bots and regulated accountability

In a regulated domain such as algorithmic trading, an agent ecosystem analyzes market signals, evaluates risk, applies compliance rules and submits orders. A supervisory authority opens an investigation on an operation suspected of breaching internal risk limits. The regulator asks to reconstruct the chain of inter-agent decisions that led to the order.

With standard application logs the company provides a database export, but the regulator observes that times are self-declared and that nothing prevents a system administrator from having altered records after the event. The regulator's response is a request for proof of integrity that the company struggles to meet. With a forensic trail, every exchange between agents already carries a qualified timestamp and a seal: the export is verifiable by anyone with the public keys of the QTSPs involved, and the burden of proof of integrity is met transparently.

Multi-agent customer service and customer disputes

In an automated customer service workflow, a conversation between the customer and the multi-agent system results in a concession: for example a refund, a warranty extension, a contract change. Six months later the company does not recognize the concession and the customer complains. Without a verifiable trail, the dispute hinges on the word of one of the two parties.

With a runtime-certified trail, every message from the system to the customer, every inter-agent decision that led to that concession, and every step of the flow are timestamped and sealed. The company can produce the full sequence, demonstrate what was said, by which agent and when. The concession holds up or it does not, but on documentary grounds, not on an interpretable narrative.

FAQ: frequently asked questions on agent-to-agent audit trails

What is an agent-to-agent audit trail?

It is a forensic registry of all messages exchanged between the agents of a multi-agent ecosystem, where every entry records sender, recipient, content and a qualified timestamp, and is protected by a cryptographic hash. Unlike an application log, the trail is verifiable by a third party and designed to be admissible in court.

Does the AI Act require tracking inter-agent communications?

Article 12 of Regulation (EU) 2024/1689 requires providers and deployers of high-risk AI systems to ensure automatic recording of events across the full system lifecycle. For multi-agent ecosystems in regulated domains, this includes the chain of inter-agent exchanges that lead to a decision. The Help Net Security synthesis cited above reads these obligations as calling for proof of integrity on demand, not a plain text log.

How does TrueScreen integrate into an orchestrator such as LangGraph?

Integration plugs into graph nodes as a post-execution callback: every transition between agents emits a call to the TrueScreen API that signs the handoff message with hash, qualified timestamp and electronic seal. The developer does not modify orchestrator logic: a thin interception layer is added on top.

Are standard logs from an agentic framework enough for compliance?

No. Standard logs are mutable by the operator and rely on internal clocks. They serve operational observability, but they do not provide the proof of integrity required for AI Act compliance in regulated domains, and they carry little weight as documentary evidence in proceedings.

What happens when an automated multi-agent decision causes harm?

If the causal chain between the harmful event and the decisions that produced it cannot be reconstructed in a verifiable way, the burden of proof risks shifting against the system operator. A forensic trail of inter-agent communications, certified at runtime and sealed through a QTSP, makes it possible to reconstruct the exact sequence and establish liability on documentary grounds.

Ready to certify your agents’ communications?

Discover how TrueScreen builds a forensic trail of inter-agent interactions in production.
