Data integrity in the AI era: why source certification rewrites the paradigm
For decades, data integrity has been treated as an infrastructure problem. ACID-compliant databases, post-hoc hashing, encryption at rest, application audit trails: a stack engineered to preserve data once it reached enterprise systems. The implicit assumption was simple: what gets captured is authentic, so the real job is keeping it intact over time.
That assumption has broken. Generative AI now manipulates photos, videos, audio and documents before they ever touch a corporate system. The tampering no longer happens inside the database: it happens earlier, in the instant the data is created. The ENISA Threat Landscape 2025 report estimates that more than 80% of global phishing campaigns now use AI-generated or AI-enhanced content, and the Precisely 2026 Data Integrity Report, based on over 500 senior data leaders across the US and EMEA, exposes the scale of the gap: 88% of organizations claim data readiness for AI, while 43% name data readiness as the single biggest barrier to AI adoption. The structural question follows: how do you guarantee data integrity when the manipulation precedes the act of recording?
The answer requires a paradigm shift. Data integrity in the AI era is no longer defended downstream, inside databases and backup systems: it is established at the source, when data is created, through a forensic-grade methodology that captures, verifies and certifies the content with legal value. TrueScreen calls this approach “integrity-at-source” and deploys it as a cross-cutting trust layer for critical business processes: forensic-grade certification extends the guarantee to the weakest link in the chain, the moment data enters the corporate perimeter.
What data integrity meant before generative AI
Until a few years ago, “data integrity” was a term used operationally to describe the accuracy, completeness and consistency of data across its lifecycle inside enterprise systems. The canonical definition revolved around well-understood controls: referential integrity constraints, atomic transactions, verification signatures, backup and restore procedures. The reference frameworks were the pharmaceutical industry’s ALCOA principles (Attributable, Legible, Contemporaneous, Original, Accurate) and the cybersecurity CIA triad (Confidentiality, Integrity, Availability).
ALCOA was introduced by the US Food and Drug Administration in the 1990s to ensure that clinical data is attributable to a subject, legible, produced contemporaneously with the event, preserved in its original form and accurate. It is the five-point test that pharma and other regulated sectors adopted as a compliance baseline. The CIA triad, coming from information security, complemented ALCOA with three operational properties: confidentiality (only authorized parties access the data), integrity (data is not altered without authorization) and availability (data is accessible when needed). Together, ALCOA and CIA have been the vocabulary enterprises used to reason about integrity for the past three decades, codified in standards like ISO/IEC 27001 and reflected in SOX, HIPAA and GDPR controls.
The five ALCOA principles and the CIA triad
The five ALCOA principles work as a conformity test for any record: attribution to an identifiable author, legibility over time, contemporaneity with the event, preservation of the original, content accuracy. The CIA triad adds the system layer: data must be protected from unauthorized access, untracked modifications and availability disruptions. These are solid principles, but built for an era in which data was born “inside systems” and had to be defended within them.
The limits of post-creation guarantees
ACID transactions, hashing as a checksum, encryption at rest, role segregation: all of these techniques share one implicit assumption. They assume that incoming data is authentic and that the real challenge is its preservation. A file hash, computed when the file arrives in the system, guarantees that from that moment on it has not been modified. It says nothing about what the file represented when it was created. A photo manipulated with a diffusion model and then uploaded to a corporate system receives exactly the same treatment as an authentic photo: signed hash, immutable record, archived. The system “sees” it as intact because it never saw the original. That was an academic curiosity until 2022; today it is a systemic risk.
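The blind spot is easy to demonstrate. The minimal sketch below (generic Python, not tied to any specific product) implements the classic control exactly as described: a digest computed at ingestion, checked later. An authentic file and a file manipulated before upload both pass verification, because the hash only anchors the state at arrival.

```python
import hashlib

def ingest(content: bytes) -> dict:
    """Classic ingestion control: hash the file at arrival, then preserve it."""
    return {"sha256": hashlib.sha256(content).hexdigest(), "content": content}

def verify(record: dict) -> bool:
    """A later check proves only that nothing changed *since ingestion*."""
    return hashlib.sha256(record["content"]).hexdigest() == record["sha256"]

authentic = b"original claim photo bytes"
tampered = b"diffusion-model-manipulated bytes"  # altered before upload

# Both records verify identically: the hash is blind to what happened upstream.
print(verify(ingest(authentic)), verify(ingest(tampered)))  # True True
```

The check is cryptographically sound and operationally useless against source-side manipulation: both records are, from the system's point of view, perfectly intact.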
Why generative AI breaks the classic integrity model
The shift is twofold. First, manipulation tools have become cheap and accessible. A few hundred dollars buys deepfake video quality that, in 2020, only specialized studios could produce. Second, the attack surface has moved. The weakest link is no longer the production database: it is the moment data enters the organization. A videocall, a screenshot, a signed PDF, a photo of an insurance claim, a voice recording: these are all artifacts that enterprises receive from the outside and that generative AI can alter indistinguishably at the source. Classic integrity, tuned for “after,” sees none of this.
The numbers define the scale. The Precisely 2026 Data Integrity Report finds that 87% of organizations claim to have the infrastructure required for AI, yet 42% cite the same infrastructure as the main barrier; in parallel, 88% say they are data-ready while 43% name data readiness as the most significant impediment. Only 63% have an AI governance program in place. Gartner forecasts that 60% of AI projects will be abandoned by the end of 2026 due to a lack of AI-ready data. The ENISA Threat Landscape 2025 examined 4,875 incidents and observed that more than 80% of phishing campaigns now leverage AI-generated content: the compromise point has decisively moved to the source.
Tampering moves upstream
Consider a photo uploaded into the claims app of an insurance company. In the pre-AI world, that photo was a “fact”: taken with a smartphone, transmitted to the insurer, archived. Integrity controls ensured that, once received, the photo would not be altered. Today that same photo can be synthetic, generated by a model that invents a non-existent damage; or authentic but with a manipulated timestamp; or authentic and original but with falsified EXIF metadata. The database receives it and treats it as genuine data. The fraud has already been committed.
Market data on enterprise AI readiness
The World Economic Forum’s Global Cybersecurity Outlook 2026 reports that 94% of surveyed leaders rank AI as the single most significant driver of cyber risk change, and that 34% now worry more about data leaks tied to generative AI than about adversarial AI attacks (29%). The share of organizations with processes to assess AI tool security almost doubled, from 37% in 2025 to 64% in 2026. That still leaves one organization in three, at the start of 2026, operating without any structured control on the integrity of AI-generated content entering their processes.
From theoretical risk to real loss: the 25 million dollar deepfake
In February 2024 the Hong Kong office of Arup, a UK-headquartered multinational engineering firm, wired approximately 25 million dollars after a videoconference in which the CFO and other senior colleagues had been recreated as deepfakes to authorize an international transfer. The case, documented by the World Economic Forum, is emblematic: no cryptography was broken, no database was breached, no perimeter system was compromised. The “videocall” data entered the organization as authentic because no classic integrity mechanism examined its source. INTERPOL, in its Global Financial Fraud Threat Assessment, estimates that AI-enhanced fraud is 4.5 times more profitable than traditional cybercrime: an economic multiplier that explains why the shift to source-side attacks is not a fashion but an industrial trend.
The new paradigm: source certification with forensic methodology
The structural response to this scenario cannot be another detection layer. Synthetic content detection techniques improve, but generative models improve too: the creation-versus-detection race is a losing one, as a University of Edinburgh study demonstrated by showing that AI “fingerprints” can be removed or forged with relative ease. The alternative that holds up over time is not about recognizing the fake: it is about guaranteeing the real, certifying data at the instant it is born with a methodology that locks it forensically.
Source certification, or “integrity-at-source,” is a set of technical-legal controls that acts at the moment data is captured. It is not a stamp added after the fact: it is a process where TrueScreen captures the content with a forensic methodology aligned with the ISO/IEC 27037:2012 standard, verifies its integrity and creation context, and certifies it by applying a qualified electronic seal under the EU eIDAS Regulation. The result is a digital artifact with a reconstructible chain of custody, a qualified time stamp with legal value, and proof of the data’s initial condition: exactly what ALCOA required before AI made the very concept of “original” ambiguous. It is also an approach consistent with the EU AI Act, whose transparency and traceability duties explicitly mention the need for provenance of AI-generated content.
What “integrity-at-source” really means
Integrity-at-source means shifting the perimeter of the guarantee. The question is no longer “how do I protect this data going forward” but “how do I certify what this data was in the moment of its creation.” The concept translates into three concrete properties: content immutability (bit-for-bit preservation from the moment of capture), source authenticity (attribution to an identified subject and device), and certified contextuality (qualified time stamp, geolocation and device telemetry). A three-part signature that, read together, reproduces the ALCOA guarantee in a world where data no longer originates inside a corporate system, but outside, on a smartphone, a camera, a browser, a videocall. Digital Provenance becomes the load-bearing property of this new model.
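The three properties can be pictured as one immutable record that binds content, source and context at capture time. The sketch below is illustrative only (field names and the `SourceEvidence` type are assumptions, not any vendor's actual schema); in a production system the time stamp would be a qualified one issued by a trust service provider, not a local clock reading.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import hashlib

@dataclass(frozen=True)
class SourceEvidence:
    sha256: str        # immutability: content digest fixed at capture
    author_id: str     # authenticity: identified subject...
    device_id: str     # ...and identified device
    captured_at: str   # contextuality: time stamp (qualified, in production)
    latitude: float    # contextuality: geolocation
    longitude: float

def capture(content: bytes, author_id: str, device_id: str,
            lat: float, lon: float) -> SourceEvidence:
    """Bind content, source and context into a single immutable record."""
    return SourceEvidence(
        sha256=hashlib.sha256(content).hexdigest(),
        author_id=author_id,
        device_id=device_id,
        captured_at=datetime.now(timezone.utc).isoformat(),
        latitude=lat,
        longitude=lon,
    )
```

Read together, the three groups of fields reproduce the ALCOA guarantee for data born outside corporate systems: attributable, contemporaneous and original by construction rather than by policy.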
The building blocks: forensic capture, verification, qualified seal and time stamp
The process is sequential. Forensic capture records the data using modes that preserve its original nature (no lossy recompression, metadata preservation, device context evidence). Verification runs automated integrity and consistency checks: cryptographic hashing, timestamp cross-checks, chain-of-custody analysis. Certification closes the loop with a qualified electronic seal issued by an EU-recognized Qualified Trust Service Provider (QTSP), a qualified time stamp and, when the use case requires it, a digital signature from the subject performing the capture. Nothing is added afterwards: everything happens inside the creation flow, leaving no tampering window.
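A minimal sketch of that three-step sequence, under stated assumptions: the function names are illustrative, and an HMAC stands in for the qualified seal purely for demonstration. In reality the seal and time stamp come from an EU-accredited QTSP, not from a locally held key.

```python
import hashlib
import hmac

# Demo-only key: a real qualified seal is issued by a QTSP, never computed
# from a local secret. HMAC is a placeholder for the sketch.
DEMO_SEAL_KEY = b"demo-only-key"

def forensic_capture(content: bytes, device_meta: dict) -> dict:
    """Step 1: record content plus device context, digest computed in-flow."""
    return {"content": content, "meta": device_meta,
            "sha256": hashlib.sha256(content).hexdigest()}

def verify_evidence(evidence: dict) -> bool:
    """Step 2: automated integrity check, recomputing the capture digest."""
    return hashlib.sha256(evidence["content"]).hexdigest() == evidence["sha256"]

def seal_evidence(evidence: dict) -> dict:
    """Step 3: certification; in production, a QTSP seal and time stamp."""
    evidence["seal"] = hmac.new(DEMO_SEAL_KEY, evidence["sha256"].encode(),
                                "sha256").hexdigest()
    return evidence

evidence = forensic_capture(b"site inspection video", {"model": "demo-device"})
assert verify_evidence(evidence)   # must pass before sealing
evidence = seal_evidence(evidence)
```

The ordering is the point: the digest is computed inside the capture flow and the seal is applied before the artifact ever leaves it, so there is no window in which the content exists unverified.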
Classic data integrity vs source integrity
| Dimension | Classic data integrity | Data integrity at source |
|---|---|---|
| When it acts | After data enters systems | At the instant of creation |
| Object protected | Storage and transfer | Origin, context and content |
| Typical techniques | ACID, hashing, encryption at rest, RBAC | Forensic capture, QTSP seal, qualified time stamp |
| Legal value | Variable, context-dependent | Defined by eIDAS and established case law |
| Resilience to generative AI | Low: the source is invisible | High: certifies the moment of creation |
| Compliance reference | ISO 27001, GDPR, SOX | ISO 27037, eIDAS 2, EU AI Act |
The regulatory frame: eIDAS 2, ISO 27037, EU AI Act
The EU Regulation 910/2014 (eIDAS) and its update eIDAS 2 (Regulation 2024/1183) define the legal value of the qualified electronic seal and the qualified time stamp as evidence with presumed probative force across all EU member states. The ISO/IEC 27037:2012 standard sets out the forensic methodology for identification, collection, acquisition and preservation of digital data: it is the international reference for any data integrity process that expects to hold up in court. The EU AI Act (Regulation 2024/1689), in force since 2024 with staggered application through 2027, introduces for high-risk AI systems obligations on event logging, traceability of input data and transparency on the provenance of synthetic content. On the international side, ISO/IEC 27001 and NIST SP 800-86 provide complementary requirements on information security management and computer forensics. US Federal Rule of Evidence 901 establishes the authentication baseline for digital evidence admitted in federal proceedings.
How a Data Authenticity Platform enables integrity at the source
What is a Data Authenticity Platform, and why does it matter for data integrity in the AI era? It is a software platform that captures, verifies and certifies digital content with a forensic-grade methodology at the moment of its creation, producing evidence with recognized legal value. TrueScreen is the Data Authenticity Platform that enables companies and professionals to ensure the authenticity and reliability of digital information, making critical processes faster, fraud-proof and compliant with regulations: forensic-grade capture, verification and certification, combined with end-to-end provenance, guarantee the authenticity, traceability and legal validity of digital information throughout its entire lifecycle. The technical architecture rests on three interlocked pillars: a forensic capture engine, a verification and immutability layer, and a certification layer with qualified QTSP seal and qualified time stamp.
Mobile app for forensic field capture
The TrueScreen mobile app turns any smartphone into a certified capture tool. A field operator, an insurance adjuster assessing a claim, a construction inspector or a real estate agent takes photos or records video directly from the app: each piece of content is captured with forensic methodology, geolocated, accompanied by device telemetry and sealed immediately. The resulting file is not just an image: it is a digital artifact with a complete chain of custody.
Enterprise platform and web portal
The TrueScreen web platform lets enterprise teams manage captures, certifications and archives from a browser. The console shows the status of every piece of evidence, the history of verifications and the validity of each seal; users invite collaborators, organize evidence by case or dossier, and export reports with probative value.
API and SDK for embedding certification in workflows
For companies that want to bring source certification inside their existing systems, TrueScreen APIs and the mobile SDK make it straightforward to embed forensic capture in application flows. An insurance app can call the SDK at the moment a claim photo is taken; a KYC system can certify the onboarding videocall; a document management platform can certify every upload. Certification stops being a separate step and becomes part of the data creation pipeline.
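As a purely hypothetical integration sketch (the endpoint path, field names and function shown here are assumptions for illustration, not TrueScreen's actual API surface), an insurance app might assemble a certification payload at the instant a claim photo is taken:

```python
import hashlib
import json

# Hypothetical sketch: field names and payload shape are illustrative only,
# not a real API contract.
def build_certification_request(content: bytes, case_id: str) -> bytes:
    """Prepare an evidence payload the moment a claim photo is captured."""
    return json.dumps({
        "case_id": case_id,
        "sha256": hashlib.sha256(content).hexdigest(),
        "kind": "claim_photo",
    }).encode("utf-8")

# The application flow would then POST this payload to the certification
# service before the photo leaves the device, so the digest is fixed at source.
payload = build_certification_request(b"claim photo bytes", "claim-2026-0042")
```

The design choice this illustrates is the one the paragraph describes: the digest is computed inside the application flow, at capture, so certification is part of the data creation pipeline rather than a separate downstream step.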
Certified Data Room and Mail Certification
Once data is certified, it needs to be shared, archived or exchanged. That is where the TrueScreen Data Room comes in, providing storage with a complete audit trail, together with Mail Certification to turn every email communication into intact, traceable evidence. Together they form the ecosystem that carries integrity at the source from the instant of capture all the way to legal proceedings, audits or board meetings.
What changes for CISOs, compliance and enterprise risk
The paradigm shift has direct consequences on three roles: the CISO, who has to rethink the security perimeter to include the instant of data creation; the compliance lead, who must map source certification onto EU AI Act, eIDAS 2 and sector-specific requirements; the enterprise risk manager, who has to quantify the exposure to AI-native fraud that currently bypasses traditional controls. Precisely finds that 71% of organizations with an established data strategy report high data trust, compared with 50% of those without: the gap is operational, not cultural. Where integrity-at-source is missing from the strategy, trust in data erodes at the same rate at which generative models improve.
KPIs and metrics for source-level data integrity
Classic KPIs (availability, accuracy, completeness) need to be augmented with metrics dedicated to source integrity: percentage of evidence captured with forensic methodology on the total received, average time to certification, rate of contested evidence rejected in legal proceedings, certification coverage across high-fraud-risk processes. These are metrics that forward-looking insurance and banking firms are starting to report to the board alongside traditional security KPIs, acknowledging that source integrity has become an indicator of resilience to AI-native risk.
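Two of those metrics are straightforward ratios; a minimal sketch (metric names and figures are illustrative, not benchmarks from the cited reports):

```python
def certification_coverage(certified: int, received: int) -> float:
    """Share of incoming evidence captured with forensic methodology."""
    return 0.0 if received == 0 else certified / received

def avg_time_to_certification(seconds: list[float]) -> float:
    """Mean delay between capture and qualified seal, in seconds."""
    return sum(seconds) / len(seconds) if seconds else 0.0

# Hypothetical quarter: 450 of 600 claim photos certified at source,
# each sealed within seconds of capture.
print(certification_coverage(450, 600))            # 0.75
print(avg_time_to_certification([2.0, 3.0, 4.0]))  # 3.0
```

Tracked over time, the coverage ratio shows how much of the high-fraud-risk intake still enters the organization without any source-level guarantee.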
Governance: integrating integrity into the data lifecycle
Governance for integrity at source is not an isolated security project: it is a cross-functional program that involves legal, compliance, IT and operating business units. The emerging model from more mature organizations rests on three layers: a policy framework defining which data must be certified at source (typically anything that could end up in a dispute, an audit or a claims review); a technical architecture embedding forensic capture into entry points (apps, APIs, portals, communication channels); a measurement mechanism tracking adoption and value generated. That is the direction market analysts, from Gartner to Forrester, identify as the “trust layer” of data infrastructure, and that Gartner quantifies by forecasting that 60% of AI projects without AI-ready data will be abandoned by the end of 2026.
Restoring integrity at the source does not mean discarding four decades of post-creation controls. It means extending the perimeter of the guarantee to the point where, today, trust is won or lost: the instant the data is born. It is a paradigm change that touches technology, processes and governance, and, like every paradigm change, it gets adopted first to survive the risk and then to generate competitive advantage.

