AB 2013 vs Trade Secrets: How to Disclose Training Data Without Giving Away Your IP
California AB 2013, the Generative Artificial Intelligence Training Data Transparency Act, took effect January 1, 2026 and requires generative AI developers to publish a high-level summary of their training data covering twelve specific statutory categories. Three things together give developers real working room to satisfy the transparency obligation without compromising the competitive IP that distinguishes their products: the statute's "high-level" framing, the absence of any requirement to disclose model architecture or training methodology, and the denial of a preliminary injunction in the trade-secret challenge in xAI v. Bonta. The puzzle is real, but the answer is generally available: draft the summary at the right level of generality from the start, characterize data categorically rather than specifically, and rely on the structural distinction between data (which AB 2013 requires you to summarize) and methodology (which it does not). This article walks through where the line actually sits, how the xAI v. Bonta litigation has clarified the terrain, and what drafting discipline lets compliance and competitive secrecy coexist.
What AB 2013 actually requires (and what it doesn't)
AB 2013 requires every covered generative AI developer to publish, on the developer's internet website, a high-level summary of the data used to train the AI system or service. The summary must address twelve statutory categories: the sources or ownership of the data; whether the data sets include data subject to copyright, trademark, or patent protection; whether the data sets were purchased or licensed; whether the data sets include personal information; whether the data sets include aggregate consumer information; whether there was any cleaning, processing, or modification of the data sets and a high-level description of those activities; the time period during which the data was collected; the date the data sets were first used during development; whether the AI system or service uses or continuously uses synthetic data generation; the purpose of the data in training; whether data sets include data the developer purchased or licensed in the form of training data sets specifically formatted for AI training; and a description of any source of any other type of data.
The structural reality is that the twelve categories address what data was used, not how the data was used. AB 2013 does not require disclosure of model architecture, training hyperparameters, optimization regimes, loss functions, fine-tuning procedures, or RLHF processes. It does not require disclosure of evaluation methodologies, alignment techniques, or safety training approaches. AB 2013 leaves each of those categories of information outside the disclosure obligation, so a developer can keep them as trade secrets and would give up that protection only by choosing to disclose them voluntarily. The trade-secret puzzle is therefore largely a question of how to describe the data without inadvertently disclosing methodology, because methodology is what most AI companies actually consider their core competitive moat.
The structural distinction: categories versus implementations
The general rule emerging from practitioner analysis is that AB 2013 requires disclosure of categories and characteristics, not implementations and specific identifiers. A compliant summary describes "publicly available web crawl data including academic papers, news articles, and reference works" — not "a 2.3 TB Common Crawl snapshot from August 2023 with custom filters removing X, Y, and Z and supplemented by a proprietary deduplication pipeline." The first description satisfies the statute by characterizing the data at the level of generality the statute's "high-level summary" framing contemplates. The second description discloses operational specifics that constitute trade secrets — the specific data version, the custom filtering pipeline, the deduplication methodology — and goes beyond what the statute actually requires.
The drafting discipline that makes this work has three rules. First, describe data sources as categories rather than identifying specific datasets, vendors, or content providers. "Licensed text and image datasets from third-party data providers" is a category; "LAION-5B and Anthropic-licensed dataset XYZ" is a list of identifiers. Second, describe data processing at the level of activity types rather than methodology. "Cleaning included deduplication, profanity filtering, and quality scoring" is an activity-type description; "a custom near-duplicate detection pipeline using Jaccard similarity estimated from MinHash signatures with threshold X" is methodology that goes well beyond what the statute requires. Third, describe time periods and provenance characteristics generally rather than specifically. "Data collected between 2018 and 2024 with rolling updates" satisfies the statute; "data collected from May 14, 2018 through November 3, 2024 in batches of approximately 50 GB per month" is operational specificity beyond the statutory floor.
Compliance teams typically draft summaries at the higher level and then have legal review confirm that no implementation details have inadvertently leaked through during drafting. The review pass often catches small specificity creep — a sentence that names a specific vendor when the category description would have sufficed, a methodology detail that wandered into what should have been a data description, a time-period precision that was tighter than necessary. Each of those is correctable through targeted re-drafting; what matters is making the review pass mandatory rather than optional.
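The review pass described above can be partly mechanized. The following is a minimal sketch, not a substitute for legal review: it assumes a handful of hypothetical regular expressions and a hypothetical watchlist of dataset and technique names, and it only flags candidate phrases for a human reviewer to reconsider. The function name `find_specificity_creep` and every pattern below are illustrative assumptions, not anything the statute or the practitioner guidance prescribes.

```python
import re

# Hypothetical patterns for "specificity creep". Tune them to your own drafts.
CREEP_PATTERNS = {
    # Full calendar dates such as "May 14, 2018" (a bare year does not match).
    "exact date": re.compile(
        r"\b(?:January|February|March|April|May|June|July|August|September|"
        r"October|November|December)\s+\d{1,2},\s+\d{4}\b"
    ),
    # Data volumes such as "2.3 TB" or "50 GB".
    "data volume": re.compile(r"\b\d+(?:\.\d+)?\s?(?:KB|MB|GB|TB|PB)\b"),
    # Numeric thresholds such as "threshold 0.85" or "threshold of 0.9".
    "numeric threshold": re.compile(
        r"\bthreshold\s+(?:of\s+)?\d+(?:\.\d+)?", re.IGNORECASE
    ),
}

# Hypothetical watchlist of dataset and technique names a reviewer may
# prefer to replace with a category-level description.
NAMED_IDENTIFIERS = ("Common Crawl", "LAION-5B", "MinHash")


def find_specificity_creep(text):
    """Return (label, matched_text) pairs that a reviewer should look at."""
    findings = []
    for label, pattern in CREEP_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group(0)))
    lowered = text.lower()
    for name in NAMED_IDENTIFIERS:
        if name.lower() in lowered:
            findings.append(("named identifier", name))
    return findings


if __name__ == "__main__":
    draft = (
        "Data collected from May 14, 2018 through November 3, 2024 "
        "in batches of approximately 50 GB per month."
    )
    for label, phrase in find_specificity_creep(draft):
        print(f"{label}: {phrase}")
```

A flag is a prompt for re-drafting, not a verdict: some specifics may be deliberate, and a clean result does not mean the summary is free of methodology detail.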
The xAI v. Bonta case and what the preliminary injunction denial signals
xAI Corporation filed suit in the Northern District of California on December 29, 2025, challenging AB 2013 on First Amendment compelled-speech grounds and alleging that the disclosure requirements would force disclosure of trade secrets. xAI sought a preliminary injunction to block enforcement before the January 1, 2026 effective date. On March 4, 2026, the court denied the preliminary injunction motion, finding that xAI had not demonstrated likelihood of success on the merits or irreparable harm sufficient to justify pre-enforcement relief.
The denial does not finally resolve the litigation — the case continues, and the merits adjudication may produce a different result — but the preliminary injunction analysis is informative for compliance planning. The court's reasoning indicated that the high-level summary requirement does not on its face require disclosure of trade-secret-protected material, because the "high-level" framing of the summary leaves room for developers to describe their data at a level of generality that does not compromise specific operational details. That signals a judicial reading consistent with the categories-not-implementations drafting discipline most practitioners have been recommending. Compliance plans built around the assumption that AB 2013 would be enjoined and not enter into force need to be revised; the law is operative and developers need to comply.
The litigation also clarifies that compliance is the safer posture even if a developer believes the law is constitutionally questionable. A developer who refuses to publish a summary because of constitutional objections faces enforcement exposure now, and would avoid that exposure only if the law were ultimately struck down, an outcome that could take years of litigation to reach. A developer who publishes a carefully drafted high-level summary satisfies the operative law while preserving any constitutional arguments for separate adjudication if specific enforcement actions raise them. This dual-track approach is the practitioner consensus and the appropriate response to the post-injunction-denial environment.
Specific drafting techniques for the twelve categories
Working through the twelve statutory categories with the drafting discipline above produces a summary structure that satisfies AB 2013 without compromising trade secrets. For sources or ownership of data, describe categories of source and arrangements rather than specific datasets — "publicly available web data, licensed datasets from commercial providers, and contributed data with explicit user consent" is the right level. For copyright/trademark/patent presence, the disclosure is binary at the high level — yes, the data sets include such material; no, they do not — without identifying specific protected works. For purchase or license, the disclosure is also binary; specific vendor identities and license terms are not required.
For personal information presence, the disclosure is again binary, with the additional CCPA-aligned characterization of whether sensitive personal information is included. The aggregate consumer information disclosure is similarly binary. For cleaning and processing activities, a high-level description is the appropriate framing: describe activity types (deduplication, filtering, quality scoring, format normalization) without describing the specific methods or thresholds used. For collection time period, a year range such as "between 2018 and 2024" is sufficient, and a precise date range is unnecessary detail. For first-use date, a year is sufficient. For synthetic data generation, the disclosure is binary plus a category-level description of how synthetic data is used in training.
For purpose of the data in training, the description should be functional rather than methodological — "data was used for pretraining the foundation model and for subsequent fine-tuning to support specific application capabilities" — without describing what those capabilities are at the architectural level. For training-data-specifically-formatted purchase, binary plus a category-level description. For other data, a residual category that captures any source not already addressed. The overall summary typically runs four to ten pages of structured prose covering all twelve categories with appropriate headings and the categorical-not-specific drafting throughout.
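The category-by-category walk-through above lends itself to a simple completeness check before publication. The sketch below is illustrative only: the category headings paraphrase the twelve categories as this article lists them, the short keys are hypothetical labels, and `render_summary` is an assumed helper name rather than part of any official template. It refuses to render a summary until every category has a non-empty answer.

```python
# The twelve categories as this article lists them. The short keys are
# hypothetical labels chosen for this sketch, not statutory terms.
CATEGORIES = [
    ("sources", "Sources or ownership of the data"),
    ("ip_protected", "Copyright, trademark, or patent protected data"),
    ("purchased_or_licensed", "Purchased or licensed data sets"),
    ("personal_information", "Personal information"),
    ("aggregate_consumer_information", "Aggregate consumer information"),
    ("cleaning_and_processing", "Cleaning, processing, or modification"),
    ("collection_period", "Time period of data collection"),
    ("first_use_date", "Date data sets were first used in development"),
    ("synthetic_data", "Synthetic data generation"),
    ("purpose", "Purpose of the data in training"),
    ("ai_formatted_datasets", "Data sets purchased or licensed in AI-training format"),
    ("other_data", "Other data sources"),
]


def missing_categories(answers):
    """Return the keys of categories that have no non-empty answer."""
    return [key for key, _ in CATEGORIES if not answers.get(key, "").strip()]


def render_summary(answers):
    """Render a plain-text summary, one heading per category.

    Raises ValueError if any category is unanswered, so an incomplete
    draft cannot be published by accident.
    """
    missing = missing_categories(answers)
    if missing:
        raise ValueError("Unanswered categories: " + ", ".join(missing))
    lines = []
    for key, heading in CATEGORIES:
        lines.append(heading)
        lines.append(answers[key].strip())
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    draft = {"collection_period": "Data collected between 2018 and 2024."}
    print("Still to draft:", missing_categories(draft))
```

Keeping the answers in a keyed structure rather than free prose makes it easier to confirm that a binary disclosure was actually answered and that no category was silently dropped during editing.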
What licensing agreements should now contain
Where training data is purchased or licensed from third-party providers, AB 2013 creates a contractual ripple effect that compliance teams should address proactively. The summary's "data sources" description does not require naming specific vendors, but it does require characterizing the data's origin in some form — and where licensing agreements with vendors include confidentiality provisions that prohibit even category-level disclosure, those provisions create a tension between contractual obligations and statutory disclosure requirements.
The practical resolution emerging in the market is that data licensing agreements signed in 2025 and later increasingly include explicit AB 2013 carve-outs — provisions that permit the licensee to make the high-level categorical disclosures AB 2013 requires without breaching confidentiality, while preserving confidentiality for more specific operational details. For agreements signed before AB 2013 was on the radar, an amendment process to add the carve-out is generally cooperative because vendors face the same compliance pressure indirectly: a customer that cannot make required disclosures may stop buying. Vendors that resist amendment requests are creating their own competitive risk.
Where the licensing agreement also includes provisions specifying particular content within the data, the disclosure obligation becomes more sensitive because the summary may need to describe characteristics of the licensed content (whether it includes copyrighted material, personal information, etc.) at a level that the licensing agreement may have intended to keep confidential. The drafting discipline here is to map the AB 2013 categories to the licensing agreement's confidentiality provisions explicitly during legal review, identify any conflicts, and resolve them through amendment or through deliberate categorical-level disclosure that does not breach confidentiality.
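The mapping exercise described above can be tracked in a small structure during legal review. This is a minimal sketch under stated assumptions: the `LicenseAgreement` fields, the category labels, and the internal vendor labels are all hypothetical, and whether a given confidentiality clause actually restricts a category-level disclosure is a legal judgment the code cannot make. It only surfaces agreements that lack a carve-out and restrict a category the summary must address.

```python
from dataclasses import dataclass, field


@dataclass
class LicenseAgreement:
    """Hypothetical record of one data licensing agreement."""
    vendor_label: str  # internal label only, never published
    # Categories counsel has judged to be restricted by the confidentiality clause.
    restricted_categories: set = field(default_factory=set)
    # True if the agreement permits high-level AB 2013 disclosures.
    has_ab2013_carve_out: bool = False


def find_conflicts(agreements, required_categories):
    """Map each agreement without a carve-out to the required categories it restricts."""
    required = set(required_categories)
    conflicts = {}
    for agreement in agreements:
        if agreement.has_ab2013_carve_out:
            continue
        overlap = sorted(agreement.restricted_categories & required)
        if overlap:
            conflicts[agreement.vendor_label] = overlap
    return conflicts


if __name__ == "__main__":
    # Illustrative labels only.
    required = ["sources", "ip_protected", "personal_information"]
    agreements = [
        LicenseAgreement("vendor-a", {"sources", "ip_protected"}),
        LicenseAgreement("vendor-b", {"sources"}, has_ab2013_carve_out=True),
    ]
    for vendor, categories in find_conflicts(agreements, required).items():
        print(f"{vendor}: amend or redraft for {', '.join(categories)}")
```

Each entry in the result is a candidate for the two resolutions named above: an amendment adding a carve-out, or a deliberately categorical disclosure that stays within the confidentiality terms.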
How AB 2013 fits with the broader California AI compliance picture
AB 2013's training-data transparency obligation is one of three core California AI disclosure regimes that operate together for many developers. SB 942 (the California AI Transparency Act) covers content provenance — disclosing that AI generated a specific piece of content. SB 53 (the Transparency in Frontier Artificial Intelligence Act) covers safety frameworks for large frontier developers. AB 2013 covers training data. A frontier AI developer with a consumer product can easily be subject to all three, with the AB 2013 disclosure being the most likely to raise trade secret concerns because the underlying subject matter (the data corpus) is both more central to competitive differentiation and more concretely describable than safety frameworks or content provenance.
For the operational compliance how-to, see our companion AB 2013 documentation guide. For the artifact-format deep dive on what the published summary actually looks like, see our companion AB 2013 high-level summary guide. For the broader 2026 California AI compliance picture, see our 2026 California AI Compliance Roadmap.
Sources
The primary statute is AB 2013 on California Legislative Information. For practitioner analysis, Morgan Lewis's overview and Cooley's analysis are the most useful starting references. The xAI v. Bonta complaint documents the constitutional challenge, with the preliminary injunction denial of March 4, 2026 establishing the operative pre-enforcement landscape. The California Attorney General's legal advisory provides the regulator-side framing on AI transparency. Watch the merits adjudication in xAI v. Bonta and any pre-enforcement guidance from the AG's office for further clarification on the trade-secret line.
Generate your AB 2013 training data summary
Our AI Transparency Generator outputs an AB 2013 high-level summary structured around the twelve statutory categories with categorical-not-specific drafting that satisfies the disclosure obligation while protecting trade secret IP. Free, no signup, exports as PDF.