AB 2013 Documentation Guide: How to Write a Compliant Training Data Summary in 2026

California AB 2013 — the Generative Artificial Intelligence: Training Data Transparency Act — took effect January 1, 2026 and requires developers of generative AI systems made publicly available to Californians to post a high-level summary of their training data, organized around twelve enumerated disclosure categories. This guide walks through what each of those twelve categories actually asks for, how to write a summary that is "high-level" without being evasive, what the early implementation pattern from OpenAI and Anthropic looks like, and what the xAI litigation tells us about the trade-secret limits of the statute. It is built for the engineer, compliance lead, or general counsel who needs to ship a working disclosure this quarter — not a 2024-vintage preview of a law that's already in force.

What AB 2013 actually says (and what it doesn't)

AB 2013 is a transparency statute, not a content statute. It does not regulate what data you can train on, whether you can use copyrighted material, or how you must obtain consent. It only regulates what you must tell the public about the data you used. The covered behavior is making a generative AI system publicly available to Californians at any point on or after January 1, 2026, where that system was first released or substantially modified on or after January 1, 2022. The compliance act is publishing a documented, high-level summary on the developer's website. The statute applies regardless of whether use of the system involves payment, which means free chatbots, free image generators, and freemium products are covered as squarely as paid services.
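The coverage test described above can be sketched as a quick screening function. This is a minimal sketch, not legal advice: the `SystemRecord` type and its field names are hypothetical, and deciding whether a change counts as a "substantial modification" still requires human judgment. Payment is deliberately absent from the record because it does not affect coverage.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Dates taken from the guide's description of the statute.
EFFECTIVE_DATE = date(2026, 1, 1)
LOOKBACK_DATE = date(2022, 1, 1)


@dataclass
class SystemRecord:
    """Hypothetical record for one generative AI system (illustrative fields)."""
    name: str
    first_released: date
    last_substantial_modification: Optional[date]
    last_available_to_californians: Optional[date]  # None means still available


def likely_in_scope(system: SystemRecord) -> bool:
    """Rough screen: True if the system matches the coverage test described above."""
    # Publicly available to Californians at any point on or after the effective date.
    last_day = system.last_available_to_californians
    available = last_day is None or last_day >= EFFECTIVE_DATE
    # First released or substantially modified on or after January 1, 2022.
    modified = system.last_substantial_modification
    recent = system.first_released >= LOOKBACK_DATE or (
        modified is not None and modified >= LOOKBACK_DATE
    )
    return available and recent
```

A system first released in 2021 and never substantially modified would screen out, while the same system with a 2023 fine-tune that qualifies as a substantial modification would screen in.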

Two things AB 2013 does not contain are worth flagging up front. First, there is no trade-secret exemption — the absence is conspicuous, and is the central argument in the xAI v. Bonta litigation. Second, there is no standalone enforcement mechanism. The statute is enforceable through California's Unfair Competition Law (UCL), which the Attorney General, district attorneys, certain city attorneys, and (for actual injury) private plaintiffs can invoke. UCL penalties cap at $2,500 per violation, but the more meaningful exposure is class-action and reputational, especially because compliance disclosures are public and become discoverable evidence in unrelated copyright and privacy litigation.

The 12 required disclosure categories, in plain language

The statute enumerates twelve items the "high-level summary" must cover. Treat this as your document outline and you cannot accidentally omit a required field. The first category is the sources or owners of the datasets — who provided or owned the data, whether commercial vendor, public-web crawl, licensed corpus, or user-generated. The second is how the datasets further the system's intended purpose — a brief explanation of why each dataset was selected. The third is the number of data points, which the statute permits as a general range or as an estimate for dynamic datasets, so "approximately 12 billion text tokens" or "between 100M and 500M images" satisfies the standard.

The fourth category is a description of the types of data points — either the labels used (for supervised data) or general characteristics (for unsupervised). Fifth is copyright/trademark/patent status: whether the data is protected, public-domain, or mixed. Sixth is licensing and acquisition method: whether you purchased, licensed, or otherwise obtained the data. Seventh is whether the data includes personal information as defined by the CCPA. Eighth is whether it includes aggregate consumer information, a separate CCPA-defined category. Ninth is the time period during which the data was collected. Tenth is the dates of first use in training. Eleventh is whether the developer modified or cleaned the data and a description of those modifications. Twelfth is whether the system used or continuously uses synthetic data generation, with an optional description of the functional purpose of that synthetic data.
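One way to make the twelve items hard to omit is to model them as a record with one field per item and check for blanks before publishing. The sketch below is an assumption about how a team might structure this internally; the field names are illustrative paraphrases, not statutory language.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetDisclosure:
    """One disclosure entry; each field maps to one of the twelve items."""
    sources_or_owners: Optional[str] = None                 # item 1
    purpose_fit: Optional[str] = None                       # item 2
    data_point_count: Optional[str] = None                  # item 3 (range or estimate)
    data_point_types: Optional[str] = None                  # item 4
    ip_status: Optional[str] = None                         # item 5
    acquisition_method: Optional[str] = None                # item 6
    includes_personal_information: Optional[bool] = None    # item 7
    includes_aggregate_consumer_info: Optional[bool] = None  # item 8
    collection_period: Optional[str] = None                 # item 9
    first_use_dates: Optional[str] = None                   # item 10
    modifications: Optional[str] = None                     # item 11
    synthetic_data_used: Optional[bool] = None              # item 12


def missing_items(entry: DatasetDisclosure) -> List[str]:
    """Return the names of fields that are still unanswered (None or empty string)."""
    missing = []
    for f in fields(entry):
        value = getattr(entry, f.name)
        # False is a real answer for the yes/no items, so only None and "" count.
        if value is None or value == "":
            missing.append(f.name)
    return missing
```

Calling `missing_items` on a draft entry before review gives a simple checklist of what is still open.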

A practical compliance pattern that is now emerging from major-vendor disclosures: organize the document as a list of dataset categories (e.g., "Public Web Crawl," "Licensed Books," "Synthetic Reasoning Traces," "Code Repositories," "User Conversations"), and answer the twelve enumerated questions for each category. This is more useful for readers than a flat document and is easier to maintain when datasets change.
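The per-category layout described above can be sketched as a small renderer that prints every dataset category with all twelve items in a fixed order. The item labels are short paraphrases of the guide's list, and the `[TO BE COMPLETED]` placeholder is an assumption for illustration.

```python
from typing import Dict, List

# Short labels for the twelve items, paraphrased from the guide above.
ITEMS: List[str] = [
    "Sources or owners",
    "How the data furthers the intended purpose",
    "Number of data points (range or estimate)",
    "Types of data points",
    "Copyright, trademark, or patent status",
    "Purchased or licensed",
    "Personal information (CCPA)",
    "Aggregate consumer information (CCPA)",
    "Collection time period",
    "Dates of first use",
    "Modifications or cleaning",
    "Synthetic data generation",
]


def render_summary(categories: Dict[str, Dict[str, str]]) -> str:
    """Render one section per dataset category, listing all twelve items in order."""
    lines: List[str] = []
    for category, answers in categories.items():
        lines.append(category)
        for number, item in enumerate(ITEMS, start=1):
            # Unanswered items stay visible so they cannot be silently dropped.
            answer = answers.get(item, "[TO BE COMPLETED]")
            lines.append(f"  {number}. {item}: {answer}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Because every category is rendered against the same list, adding or retiring a dataset category changes one section without disturbing the rest of the document.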

How "high-level" is high-level? The xAI litigation as a real-world test

The most frequently asked question about AB 2013 is what "high-level" actually means in practice. The statute does not define it. Until very recently the answer was pure speculation, but there is now meaningful post-effective-date evidence. As of January 2026, OpenAI and Anthropic had both posted AB 2013 disclosures on their websites. Legal commentary describes OpenAI's as "generalized," and both disclosures enumerate the statutory categories without disclosing dataset-level proprietary detail. Neither company sued.

xAI took the opposite path. On December 29, 2025, three days before AB 2013 took effect, xAI filed a federal complaint in the Central District of California against Attorney General Rob Bonta, asserting that the statute is an "unconstitutional trade-secrets-destroying disclosure regime." The complaint advanced four counts: per se Takings, regulatory Takings, compelled speech in violation of the First Amendment, and unconstitutional vagueness under the Due Process Clause. xAI sought a preliminary injunction. On March 4, 2026, Judge Jesus G. Bernal denied the motion, finding that xAI had standing but had not demonstrated a likelihood of success on any of the three constitutional theories (Takings, First Amendment, and Due Process) underlying those four counts. The case continues, but enforcement was not blocked.

The most useful piece of the ruling for compliance teams was implicit rather than explicit: the court was unimpressed by xAI's generalized trade-secret claims given that similarly situated competitors (OpenAI, Anthropic) had complied without apparent harm. That sets the practical floor: once disclosures as specific as those OpenAI and Anthropic published are public, a "but it's a trade secret" argument has to be specific to your situation, not generalized. The converse also holds: if you publish materially less detail than your competitors and a regulator notices, you do not have a strong "everyone redacts" defense.

A working AB 2013 documentation process

For teams that are still building this, the right sequence is roughly as follows. Begin by inventorying every dataset used in training, fine-tuning, or substantial modification of every covered system. The legal definition of "train" explicitly includes testing, validation, and fine-tuning, so RLHF data, evaluation sets, and reward-model training data are all in scope, not just pretraining corpora. Then categorize: cluster datasets into logical groups (public web, licensed text, licensed media, user-contributed, synthetic, code, instruction-tuning). For each group, answer the twelve enumerated questions. Where any answer is genuinely sensitive, document the redaction and its specific justification. Have legal review the draft against trade-secret risk, with attention to whether your redactions hold up against the "but your competitor disclosed it" comparator. Publish on a stable, easy-to-find URL on your developer-facing website. Add a date and version. Update on substantial modifications. This guide's companion piece, the step-by-step inventory walkthrough, covers the audit phase in more depth, and the high-level summary writing guide covers tone and structure.
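The inventory, grouping, and redaction steps above can be sketched as follows. The `InventoryItem` record, the stage names, and the group names are hypothetical; the only points carried over from the guide are that every training stage belongs in the inventory and that each redaction needs its own written justification.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InventoryItem:
    """Hypothetical inventory row; names and fields are illustrative."""
    dataset: str
    stage: str   # e.g. "pretraining", "fine-tuning", "rlhf", "evaluation"
    group: str   # e.g. "public web", "licensed text", "synthetic"
    # Maps a redacted disclosure item to its written justification.
    redactions: Dict[str, str] = field(default_factory=dict)


def group_inventory(items: List[InventoryItem]) -> Dict[str, List[str]]:
    """Cluster dataset names by group; no stage is filtered out."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        grouped[item.group].append(item.dataset)
    return dict(grouped)


def unjustified_redactions(items: List[InventoryItem]) -> List[str]:
    """List 'dataset: item' pairs whose redaction has no written justification."""
    problems: List[str] = []
    for item in items:
        for redacted_item, justification in item.redactions.items():
            if not justification.strip():
                problems.append(f"{item.dataset}: {redacted_item}")
    return problems
```

Running `unjustified_redactions` before legal review surfaces the redactions that still need a specific rationale.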

The retroactivity problem and how to handle it

AB 2013 reaches backward: it covers any system released or substantially modified on or after January 1, 2022. For models trained in 2022 or 2023, before formal data-provenance practices were widespread, retrospective documentation is genuinely difficult. Many engineering teams did not retain detailed dataset manifests, license records, or modification logs from that era. The statute makes no allowance for this. Three practical approaches have emerged. First, reconstruct from secondary sources: git history, dataset access logs, vendor invoices, and engineering Slack archives often contain enough to answer the twelve categories at a defensible level. Second, narrow the scope: a substantial modification (a major retraining or significant fine-tune) restarts the relevant disclosure clock for that version, so for many actively developed systems the practical compliance horizon is the most recent training run, not the original 2022 one. Third, where reconstruction is impossible, disclose the gap: an honest statement that records prior to a specific date are incomplete is better than a fabricated retrospective and is consistent with the statute's spirit. Goodwin's post-effective-date analysis flags this as one of the most contested compliance areas to watch.

How AB 2013 fits with the rest of California's 2026 AI regime

AB 2013 is one of five regimes you should treat as a single compliance posture. It governs training data transparency. SB 942 governs watermarking and provenance for AI-generated output. SB 53 (the TFAIA, also effective January 1, 2026) governs catastrophic-risk safety frameworks for frontier developers above the 10^26 FLOP threshold. The CCPA/ADMT regulations from the California Privacy Protection Agency govern automated decision-making technology used for significant decisions. AB 489 and AB 3030 together govern healthcare-specific AI behavior. For most generative AI products available to Californians, AB 2013 and SB 942 apply at minimum; if your product makes significant decisions about people, ADMT layers in; if it ships into healthcare, AB 489 and AB 3030 layer in; if you are a frontier developer, SB 53 sits on top. Our 2026 California AI Compliance Roadmap walks through the combined sequencing.
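The layering described above can be sketched as a simple lookup. This is a first-pass triage aid under the guide's own simplifications, not a scoping determination: the function name and its four boolean inputs are assumptions, and each regime has its own thresholds and exemptions that this does not model.

```python
from typing import List


def applicable_regimes(
    generative_ai_for_californians: bool,
    makes_significant_decisions: bool,
    ships_into_healthcare: bool,
    frontier_developer: bool,
) -> List[str]:
    """Return the regimes the guide says to consider, in the order it layers them."""
    regimes: List[str] = []
    if generative_ai_for_californians:
        # The guide treats these two as the baseline for most products.
        regimes += ["AB 2013", "SB 942"]
    if makes_significant_decisions:
        regimes.append("CCPA/ADMT regulations")
    if ships_into_healthcare:
        regimes += ["AB 489", "AB 3030"]
    if frontier_developer:
        regimes.append("SB 53")
    return regimes
```

For example, a consumer chatbot used in a clinical setting would return the baseline pair plus the two healthcare statutes.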

What to do this week

If you ship a generative AI system to Californians and have not published an AB 2013 disclosure, act now: the law is already in force, and the AG's office is reportedly building in-house AI expertise to evaluate disclosures. The minimum-viable path to compliance is: produce a structured document covering the twelve enumerated categories for your covered systems, post it at a stable URL on your developer site, link it from your privacy policy or developer documentation index, and version it. The structure is the easy part: our free Training Data Transparency Generator outputs the twelve-category skeleton so you do not omit required fields. The hard part is the substantive content, which has to come from your team. Block out engineering, legal, and data-governance time, draft for the categories above, and treat the published version as a living document that updates with substantial modifications.

Sources

The primary statute is AB 2013 on California Legislative Information. Post-effective-date implementation analysis comes from Goodwin Procter's January 2026 alert, Perkins Coie's compliance guide, and the post-xAI ruling analysis from Hintze Law via Mondaq. The xAI litigation is tracked by the IAPP's xAI v. Bonta explainer. The constitutional commentary from the Institute for Law & AI provides additional doctrinal context. Watch the IAPP and Goodwin pages for updates on the merits ruling, which is expected later in 2026, and on any AG-issued interpretive guidance that narrows or clarifies what "high-level" requires.

Generate your AB 2013 disclosure structure for free

Our Training Data Transparency Generator outputs a document scaffold organized around the twelve required categories — so you produce a complete, properly-structured disclosure in minutes, not hours. The substantive content is yours to write; the structure is on us.

Open the AB 2013 Transparency Generator →

Frequently Asked Questions

What is California AB 2013 and when did it take effect?

AB 2013, formally the Generative Artificial Intelligence: Training Data Transparency Act (TDTA), is a California law authored by Assemblymember Jacqui Irwin and signed by Governor Newsom on September 28, 2024. It took effect January 1, 2026. It requires developers of generative AI systems that were released or substantially modified on or after January 1, 2022 and are made publicly available to Californians to publish a high-level summary of their training data on their website. Enforcement runs through California's Unfair Competition Law, which the Attorney General, district attorneys, certain city attorneys, and (for actual injury) private plaintiffs can invoke.

Who has to comply with AB 2013?

Any person, partnership, government agency, or corporation that designs, codes, produces, or substantially modifies a generative AI system or service made publicly available to Californians, regardless of whether the use involves payment. The statute defines 'train' to include testing, validation, and fine-tuning, which means downstream developers who fine-tune third-party foundation models can also be covered. There are narrow exemptions only for systems used solely for security and integrity, aircraft operation in national airspace, and national-security or military systems made available only to federal entities.

What are the 12 required disclosure categories under AB 2013?

The statute enumerates twelve items the high-level summary must address. They are: the sources or owners of the datasets; how the datasets further the system's intended purpose; the number of data points (which can be a range or estimate for dynamic datasets); a description of the types of data points; whether the datasets include data protected by copyright, trademark, or patent or are entirely public-domain; whether the data was purchased or licensed; whether personal information as defined by the CCPA is included; whether aggregate consumer information is included; the time period during which the data was collected; the dates of first use; whether the developer modified or cleaned the data; and whether the system used or continuously uses synthetic data generation.

What does 'high-level' mean for the summary?

The statute does not define it, and that ambiguity is at the center of the xAI litigation. In practice, OpenAI and Anthropic have published disclosures on their websites that are general but enumerate the categories required by the statute. A federal judge denied xAI's preliminary injunction motion on March 4, 2026, partly because xAI's own competitors had managed to comply without revealing what xAI argued were trade secrets. The practical floor seems to be: name the categories, give honest descriptive answers, redact only what you can specifically justify, and document what you redacted and why.

What happened with the xAI lawsuit against AB 2013?

xAI filed a federal complaint against California Attorney General Rob Bonta on December 29, 2025, three days before AB 2013 took effect, arguing that the law violates the Fifth Amendment's Takings Clause, the First Amendment's prohibition on compelled speech, and the Due Process Clause's vagueness doctrine. xAI sought a preliminary injunction. On March 4, 2026, Judge Jesus G. Bernal denied the motion, finding xAI had not shown a likelihood of success on any of the three constitutional theories. The underlying case continues but enforcement was not blocked. xAI subsequently filed a parallel challenge against the Colorado AI Act on April 9, 2026.

How does AB 2013 handle trade secrets?

Notably, the statute contains no trade-secret exemption. This is one of the most-discussed compliance challenges. The xAI litigation argues that this omission is unconstitutional, but the March 4 ruling suggested courts will require specific evidence of competitive harm rather than generalized trade-secret claims. The practical guidance from Goodwin and IAPP analysis is to disclose at the level of category and characteristic rather than at the level of specific dataset names or curation methodologies, and to document the rationale for any redactions. Consult counsel before publishing if your training data strategy is itself the moat.

What is the penalty for not posting an AB 2013 summary?

AB 2013 itself does not specify financial penalties. Enforcement runs through California's Unfair Competition Law (UCL), Business and Professions Code § 17200, which can be invoked by the Attorney General, district attorneys, certain city attorneys, and — for actual injury — private plaintiffs. UCL penalties are up to $2,500 per violation, but the more material exposure for AI vendors is class-action exposure and the reputational harm of being named in an enforcement action. Several legal commentators have flagged the risk of plaintiffs' attorneys mining published AB 2013 disclosures for ammunition in copyright and privacy litigation.

Does AB 2013 apply retroactively?

Yes, and this is one of its most aggressive features. The statute applies to any generative AI system or service released or substantially modified on or after January 1, 2022. That means developers must produce documentation about training runs that completed up to four years before the law took effect. For models trained before formal data-provenance practices were standard, retrospective documentation is genuinely difficult. The retroactivity is one of the constitutional pressure points in the xAI litigation, on the theory that pre-statute training was conducted under a settled expectation of confidentiality.

Can I just use a template or generator?

A generator gets you the structural skeleton (the twelve required categories, the right field names, a defensible format), but it cannot supply the content. Each category requires honest answers specific to your system. Use a generator to produce the document structure and avoid omitting required fields, then fill it with content that your engineering, legal, and data-governance teams can actually attest to. Our free Training Data Transparency Generator produces the structure; the substantive disclosures are yours to write.

