AB 2013 vs Trade Secrets: How to Disclose Training Data Without Giving Away Your IP

California AB 2013, the Generative Artificial Intelligence Training Data Transparency Act, took effect January 1, 2026 and requires generative AI developers to publish a high-level summary of their training data covering twelve statutory categories. Three things together give developers real working room to satisfy the transparency obligation without compromising the competitive IP that distinguishes their products: the statute's "high-level" framing, the absence of any requirement to disclose model architecture or training methodology, and the failure of the trade-secret challenge in xAI v. Bonta at the preliminary injunction stage. The tension is real, but a workable answer exists: draft the summary at the right level of generality from the start, characterize data categorically rather than specifically, and rely on the structural distinction between data (which AB 2013 requires you to summarize) and methodology (which it does not). This article walks through where the line actually sits, how the xAI v. Bonta litigation has clarified the terrain, and what drafting discipline lets compliance and competitive secrecy coexist.

What AB 2013 actually requires (and what it doesn't)

AB 2013 requires every covered generative AI developer to publish, on the developer's internet website, a high-level summary of the datasets used to develop the AI system or service. The summary must address twelve statutory categories: the sources or owners of the datasets; how the datasets further the intended purpose of the AI system or service; the number of data points in the datasets, which may be stated in general ranges, with estimated figures for dynamic datasets; the types of data points within the datasets; whether the datasets include data protected by copyright, trademark, or patent, or are entirely in the public domain; whether the datasets were purchased or licensed by the developer; whether the datasets include personal information; whether the datasets include aggregate consumer information; whether the developer cleaned, processed, or otherwise modified the datasets, and the intended purpose of those efforts; the time period during which the data was collected, with a notice if collection is ongoing; the dates the datasets were first used during development; and whether the system or service used or continuously uses synthetic data generation in its development.

The structural reality is that the twelve categories address what data was used, not how the model was built or trained on it. AB 2013 does not require disclosure of model architecture, training hyperparameters, optimization regimes, loss functions, fine-tuning procedures, or RLHF processes. Nor does it require disclosure of evaluation methodologies, alignment techniques, or safety training approaches. AB 2013 leaves each of those outside its disclosure mandate, so a developer can keep them as trade secrets unless it chooses to disclose them voluntarily. The trade-secret puzzle is therefore largely a question of how to describe the data without inadvertently disclosing methodology, because methodology is what most AI companies actually consider their core competitive moat.

The structural distinction: categories versus implementations

The general rule emerging from practitioner analysis is that AB 2013 requires disclosure of categories and characteristics, not implementations and specific identifiers. A compliant summary describes "publicly available web crawl data including academic papers, news articles, and reference works" — not "a 2.3 TB Common Crawl snapshot from August 2023 with custom filters removing X, Y, and Z and supplemented by a proprietary deduplication pipeline." The first description satisfies the statute by characterizing the data at the level of generality the statute's "high-level summary" framing contemplates. The second description discloses operational specifics that constitute trade secrets — the specific data version, the custom filtering pipeline, the deduplication methodology — and goes beyond what the statute actually requires.

The drafting discipline that makes this work has three rules. First, describe data sources as categories rather than identifying specific datasets, vendors, or content providers. "Licensed text and image datasets from third-party data providers" is a category; "LAION-5B and Anthropic-licensed dataset XYZ" is a list of identifiers. Second, describe data processing at the level of activity types rather than methodology. "Cleaning included deduplication, profanity filtering, and quality scoring" is an activity-type description; "a custom semantic deduplication pipeline using cosine similarity over embedding vectors with threshold X" is methodology that goes well beyond what the statute requires. Third, describe time periods and provenance characteristics generally rather than specifically. "Data collected between 2018 and 2024 with rolling updates" satisfies the statute; "data collected from May 14, 2018 through November 3, 2024 in batches of approximately 50 GB per month" is operational specificity beyond the statutory floor.
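
The third rule is easy to make mechanical. The following is a minimal Python sketch, assuming a team tracks exact collection dates internally and wants only a year-level statement in the published summary. The function name `coarse_period` and the output wording are illustrative assumptions, not language taken from the statute.

```python
from datetime import date


def coarse_period(start: date, end: date, ongoing: bool = False) -> str:
    """Reduce a precise collection window to a year-level statement."""
    if end < start:
        raise ValueError("end date precedes start date")
    if start.year == end.year:
        text = f"Data collected during {start.year}"
    else:
        text = f"Data collected between {start.year} and {end.year}"
    if ongoing:
        # Hypothetical phrasing for collection that is still in progress.
        text += " with rolling updates"
    return text


print(coarse_period(date(2018, 5, 14), date(2024, 11, 3), ongoing=True))
```

Keeping the precise dates in internal records and generating the public wording from them means the published text never carries more precision than intended.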

Compliance teams typically draft summaries at the higher level and then have legal review confirm that no implementation details have inadvertently leaked through during drafting. The review pass often catches small specificity creep — a sentence that names a specific vendor when the category description would have sufficed, a methodology detail that wandered into what should have been a data description, a time-period precision that was tighter than necessary. Each of those is correctable through targeted re-drafting; what matters is making the review pass mandatory rather than optional.
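
Part of that review pass can be automated as a first filter before legal review. The sketch below is a rough illustration in Python, not a substitute for counsel: the regular expressions and the `NAMED_SOURCES` denylist are hypothetical and would need to be tailored to the terms a particular developer treats as sensitive.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Hypothetical patterns for the kinds of specificity creep described above.
PATTERNS = {
    "exact date": re.compile(rf"\b(?:{MONTHS})\s+\d{{1,2}},\s+\d{{4}}\b"),
    "data volume": re.compile(r"\b\d+(?:\.\d+)?\s?(?:KB|MB|GB|TB|PB)\b"),
    "numeric threshold": re.compile(
        r"\bthreshold\s+(?:of\s+)?\d+(?:\.\d+)?", re.IGNORECASE
    ),
}

# Hypothetical denylist of vendor or dataset names that should not appear.
NAMED_SOURCES = ("ExampleVendor", "Dataset XYZ")


def find_specificity_creep(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched text) pairs for details that look too specific."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group(0)))
    lowered = text.lower()
    for name in NAMED_SOURCES:
        if name.lower() in lowered:
            findings.append(("named source", name))
    return findings


draft = (
    "Data collected from May 14, 2018 through November 3, 2024 "
    "in batches of approximately 50 GB per month."
)
for kind, snippet in find_specificity_creep(draft):
    print(f"{kind}: {snippet}")
```

A clean result only shows that none of the listed patterns matched; a human reviewer still has to judge whether a sentence drifts from data description into methodology.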

The xAI v. Bonta case and what the preliminary injunction denial signals

xAI Corporation filed suit in the Northern District of California on December 29, 2025, challenging AB 2013 on First Amendment compelled-speech grounds and alleging that the disclosure requirements would force disclosure of trade secrets. xAI sought a preliminary injunction to block enforcement before the January 1, 2026 effective date. On March 4, 2026, the court denied the preliminary injunction motion, finding that xAI had not demonstrated likelihood of success on the merits or irreparable harm sufficient to justify pre-enforcement relief.

The denial does not finally resolve the litigation — the case continues, and the merits adjudication may produce a different result — but the preliminary injunction analysis is informative for compliance planning. The court's reasoning indicated that the high-level summary requirement does not on its face require disclosure of trade-secret-protected material, because the "high-level" framing of the summary leaves room for developers to describe their data at a level of generality that does not compromise specific operational details. That signals a judicial reading consistent with the categories-not-implementations drafting discipline most practitioners have been recommending. Compliance plans built around the assumption that AB 2013 would be enjoined and not enter into force need to be revised; the law is operative and developers need to comply.

The litigation also clarifies that compliance is the safer posture even if a developer believes the law is constitutionally questionable. A developer who refuses to publish a summary because of constitutional objections faces enforcement exposure now, and would only avoid that exposure if the law were ultimately struck down — a multi-year litigation timeline. A developer who publishes a carefully drafted high-level summary satisfies the operative law while preserving any constitutional arguments for separate adjudication if specific enforcement actions raise them. The dual-track approach is the practitioner consensus and the appropriate response to the post-injunction-denial environment.

Specific drafting techniques for the twelve categories

Working through the twelve statutory categories with the drafting discipline above produces a summary structure that satisfies AB 2013 without compromising trade secrets. For sources or ownership of data, describe categories of source and arrangements rather than specific datasets — "publicly available web data, licensed datasets from commercial providers, and contributed data with explicit user consent" is the right level. For copyright/trademark/patent presence, the disclosure is binary at the high level — yes, the data sets include such material; no, they do not — without identifying specific protected works. For purchase or license, the disclosure is also binary; specific vendor identities and license terms are not required.

For personal information presence, the disclosure is again binary, with the additional CCPA-aligned characterization of whether sensitive personal information is included. For aggregate consumer information, similarly binary. For cleaning and processing activities, the high-level description is the appropriate framing — describe activity types (deduplication, filtering, quality scoring, format normalization) without describing the specific methods or thresholds used. For collection time period, a year-range is sufficient — "between 2018 and 2024" — and a precise date range is unnecessary detail. For first-use date, a year is sufficient. For synthetic data generation, the disclosure is binary plus a category-level description of how synthetic data is used in training.

For purpose of the data in training, the description should be functional rather than methodological ("data was used for pretraining the foundation model and for subsequent fine-tuning to support specific application capabilities"), without describing what those capabilities are at the architectural level. For the number of data points, the statute itself allows general ranges, with estimated figures for dynamic datasets, so an order-of-magnitude range is enough. For the types of data points, describe general characteristics, or for labeled datasets the types of labels used, rather than schemas or field-level detail. The overall summary typically runs four to ten pages of structured prose covering all twelve categories with appropriate headings and the categorical-not-specific drafting throughout.
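
Several of the items above follow the same "binary answer plus optional category-level description" pattern, which lends itself to a small data structure. The Python sketch below is one way to represent it; the class name `BinaryDisclosure`, its fields, and the sample answers are all hypothetical and do not describe any real system.

```python
from dataclasses import dataclass


@dataclass
class BinaryDisclosure:
    """One yes/no item plus an optional category-level note."""
    topic: str             # what the data sets may or may not include
    present: bool          # the binary answer
    description: str = ""  # category-level characterization, no identifiers

    def render(self) -> str:
        verb = "include" if self.present else "do not include"
        sentence = f"The data sets {verb} {self.topic}."
        if self.description:
            sentence += f" {self.description}"
        return sentence


# Illustrative answers only; not a statement about any real product.
items = [
    BinaryDisclosure(
        "material subject to copyright, trademark, or patent protection",
        True,
    ),
    BinaryDisclosure(
        "personal information",
        True,
        "Sensitive personal information is not intentionally collected.",
    ),
    BinaryDisclosure("aggregate consumer information", False),
]

for item in items:
    print(item.render())
```

Because the structure has no field for vendor names, dataset identifiers, or thresholds, a drafter has to go out of their way to add them.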

What licensing agreements should now contain

Where training data is purchased or licensed from third-party providers, AB 2013 creates a contractual ripple effect that compliance teams should address proactively. The summary's "data sources" description does not require naming specific vendors, but it does require characterizing the data's origin in some form — and where licensing agreements with vendors include confidentiality provisions that prohibit even category-level disclosure, those provisions create a tension between contractual obligations and statutory disclosure requirements.

The practical resolution emerging in the market is that data licensing agreements signed in 2025 and later increasingly include explicit AB 2013 carve-outs — provisions that permit the licensee to make the high-level categorical disclosures AB 2013 requires without breaching confidentiality, while preserving confidentiality for more specific operational details. For agreements signed before AB 2013 was on the radar, an amendment process to add the carve-out is generally cooperative because vendors face the same compliance pressure indirectly: a customer that cannot make required disclosures may stop buying. Vendors that resist amendment requests are creating their own competitive risk.

Where the licensing agreement also includes provisions specifying particular content within the data, the disclosure obligation becomes more sensitive because the summary may need to describe characteristics of the licensed content (whether it includes copyrighted material, personal information, etc.) at a level that the licensing agreement may have intended to keep confidential. The drafting discipline here is to map the AB 2013 categories to the licensing agreement's confidentiality provisions explicitly during legal review, identify any conflicts, and resolve them through amendment or through deliberate categorical-level disclosure that does not breach confidentiality.
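
The mapping exercise can be tracked in a simple structure during legal review. The following Python sketch assumes each agreement has been read by counsel and reduced to the most specific disclosure it permits per topic. The `Level` scale, the agreement labels, and the topic names are hypothetical, and treating an unaddressed topic as "no disclosure permitted" is a deliberately conservative assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    NONE = 0      # agreement forbids any disclosure on this topic
    CATEGORY = 1  # category-level description permitted
    SPECIFIC = 2  # vendor or dataset may be named


@dataclass
class Agreement:
    name: str                    # hypothetical internal label
    permitted: dict[str, Level]  # topic -> most specific disclosure allowed


def find_conflicts(agreements, required_topics, needed=Level.CATEGORY):
    """List (agreement, topic) pairs where the contract permits less than needed."""
    conflicts = []
    for agreement in agreements:
        for topic in required_topics:
            # Topics the agreement does not address are treated as NONE.
            if agreement.permitted.get(topic, Level.NONE) < needed:
                conflicts.append((agreement.name, topic))
    return conflicts


agreements = [
    Agreement("license-A", {"origin": Level.CATEGORY, "copyright status": Level.NONE}),
    Agreement("license-B", {"origin": Level.SPECIFIC, "copyright status": Level.CATEGORY}),
]
print(find_conflicts(agreements, ["origin", "copyright status"]))
```

Each pair the function returns is a candidate for an amendment request or for more careful categorical wording.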

How AB 2013 fits with the broader California AI compliance picture

AB 2013's training-data transparency obligation is one of three core California AI disclosure regimes that operate together for many developers. SB 942 (the California AI Transparency Act) covers content provenance — disclosing that AI generated a specific piece of content. SB 53 (the Transparency in Frontier Artificial Intelligence Act) covers safety frameworks for large frontier developers. AB 2013 covers training data. A frontier AI developer with a consumer product can easily be subject to all three, with the AB 2013 disclosure being the most likely to raise trade secret concerns because the underlying subject matter (the data corpus) is both more central to competitive differentiation and more concretely describable than safety frameworks or content provenance.

For the operational compliance how-to, see our companion AB 2013 documentation guide. For the artifact-format deep dive on what the published summary actually looks like, see our companion AB 2013 high-level summary guide. For the broader 2026 California AI compliance picture, see our 2026 California AI Compliance Roadmap.

Sources

The primary statute is AB 2013 on California Legislative Information. For practitioner analysis, Morgan Lewis's overview and Cooley's analysis are the most useful starting references. The xAI v. Bonta complaint documents the constitutional challenge, with the preliminary injunction denial of March 4, 2026 establishing the operative pre-enforcement landscape. The California Attorney General's legal advisory provides the regulator-side framing on AI transparency. Watch the merits adjudication in xAI v. Bonta and any pre-enforcement guidance from the AG's office for further clarification on the trade-secret line.

Frequently Asked Questions

What does AB 2013 require developers to disclose?
Generative AI developers covered by AB 2013 must publish a high-level summary on their website describing the data used to train the AI system or service, before making the system or service publicly available to Californians. The summary must address twelve specific categories defined by statute, including the sources and ownership of the data, whether the data is publicly available or licensed, the purpose of the data in training, the time period of data collection, and whether copyrighted, trademarked, or patented material was used. The disclosure obligation took effect January 1, 2026 and applies to any AI system or service released on or after January 1, 2022.

Does AB 2013 protect trade secrets?
The statute does not contain an explicit trade secret exemption, but the high-level summary requirement is intentionally pitched at a level of generality designed to satisfy transparency without requiring disclosure of competitively sensitive technical details. The summary must describe data sources and categories — not specific data points, proprietary preprocessing methods, training hyperparameters, model architecture, or other implementation details that constitute trade secrets. Most legal analysis concludes that a properly drafted AB 2013 summary can meet the statutory requirements without disclosing trade-secret-protected material, though the line is fact-specific and conservative drafting is the rule.

What's the line between disclosure and trade secret?
The general rule emerging from practitioner analysis is that AB 2013 requires disclosure of categories and characteristics, not implementations and identifiers. A compliant summary describes 'publicly available web crawl data including academic papers, news articles, and reference works' — not 'a 2.3 TB Common Crawl snapshot from August 2023 with custom filters removing X, Y, and Z.' The first description satisfies the statute; the second discloses operational specifics that constitute trade secrets. Compliance teams typically draft summaries at the higher level and then have legal review confirm that no implementation details have inadvertently leaked through.

What happened in xAI v. Bonta?
xAI Corporation filed suit in the Northern District of California on December 29, 2025, challenging AB 2013 on First Amendment compelled-speech grounds and alleging that the disclosure requirements would force disclosure of trade secrets. xAI sought a preliminary injunction to block enforcement. On March 4, 2026, the court denied the preliminary injunction motion, finding that xAI had not demonstrated likelihood of success on the merits or irreparable harm sufficient to justify pre-enforcement relief. The case continues, but the preliminary injunction denial signals that the court does not currently view AB 2013's high-level summary requirement as facially unconstitutional or as automatically requiring trade secret disclosure. Compliance plans built around the assumption that AB 2013 would be enjoined need to be revised.

How do I disclose 'data sources' without revealing my actual data sources?
Categorically, not specifically. AB 2013 requires disclosure of the sources of the data used to train the AI, and the practical drafting answer is to describe sources at a category level — 'publicly available web data,' 'licensed text and image datasets from third-party data providers,' 'publicly available code repositories,' 'user-contributed data with consent' — without naming specific datasets, vendors, or content providers. Where a specific source is genuinely identifiable (Common Crawl, Wikipedia, certain widely-known academic corpora), naming it does not generally compromise trade secrets because the source is already public. Where the source is a proprietary licensing arrangement, categorical description is the protective approach.

What about purchased or licensed datasets: can I keep the vendor confidential?
Generally yes, but with attention to the licensing terms. AB 2013 does not require disclosure of specific data vendor names or licensing terms; the requirement is to disclose whether data is purchased or licensed and to characterize the data's origin and ownership. A summary that says 'training data includes images licensed from third-party stock photography providers' satisfies the statutory requirement without identifying the specific vendors. Where licensing agreements include confidentiality provisions, those provisions typically remain enforceable — AB 2013 does not override existing contractual confidentiality. Where licensing agreements do not include confidentiality provisions, vendors increasingly request that they be added precisely to manage AB 2013 disclosure exposure.

How do I describe model architecture and training methodology without giving away IP?
AB 2013 does not require disclosure of model architecture or training methodology — only of training data. The line is the data itself versus how the model uses it. A compliant summary describes what data was used; it does not describe how the model was trained on that data, what loss functions were used, what optimization regime was applied, or what architectural choices were made. Compliance teams sometimes inadvertently include methodology details in the data summary, which both increases trade secret exposure and expands the disclosure beyond what the statute actually requires. The discipline is to keep the data summary focused on data and to address methodology questions, when they arise from customers or partners, through separate channels with appropriate confidentiality.

What if my training data is itself confidential?
Some training data is genuinely confidential — synthetic data with proprietary generation methodologies, customer-derived data with confidentiality obligations to the customers, internally generated data with significant R&D investment. AB 2013 still requires a summary, but the summary can characterize the data at a level that does not breach confidentiality. 'Synthetically generated data using proprietary methods' is a compliant description that does not reveal the methods. 'Aggregated and de-identified user interaction data with explicit consent' is a compliant description that does not identify specific users. The drafting goal is to characterize the data category truthfully without specifying the protected details.

Should I redact my AB 2013 summary?
Generally no. AB 2013 contemplates a published summary, not a publicly redacted document. A summary with extensive redactions invites regulator scrutiny and signals that something material was withheld. The better approach is to draft the summary at the right level of generality from the start — categorical rather than specific — so that no redaction is needed because nothing trade-secret-protected was ever included. If genuine concerns about specific elements arise, the right response is to refine the wording, not to publish a redacted version. The summary you publish is the summary you defend; making the wording precise from the start is dramatically easier than defending a redaction strategy.
