How to Draft an AB 2013 Training Data Summary (With Template)

AB 2013 requires developers of Generative AI systems to post a "high-level summary" of their training datasets. But what exactly counts as "high-level," and how much detail is too much?

The Requirement

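By January 1, 2026, and again before each later public release or substantial modification of a covered system, developers must publish a summary on their website that describes the data used to train or fine-tune their GenAI system. The trigger is a release or substantial modification, not a fixed annual schedule.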

What Must Be Included?

The law specifies several key categories that must be addressed:

  1. Sources: Where did the data come from? (e.g., "Publicly available internet scrapes," "Licensed stock image libraries," "Proprietary internal datasets").
  2. Data Types: What kind of data is it? (e.g., text, images, audio, code).
  3. IP Status: Does the data include works protected by copyright, trademark, or patent, or is it entirely in the public domain?
  4. Personal Information: Was PII (Personally Identifiable Information) included, and was it scrubbed?
  5. Purchase/License: Did you purchase or license the data, or did you collect it yourself?
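
If you track your datasets internally, it can help to record the five categories above per dataset and flag any that are still unanswered before you draft the public page. The sketch below is one possible way to do that in Python; the class name, field names, and the `missing_categories` helper are illustrative assumptions, not terms from the statute.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DatasetDisclosure:
    """One dataset's answers to the five categories listed above.

    Field names are hypothetical; they are not taken from the statute.
    """
    source: Optional[str] = None                   # e.g. "Licensed stock image libraries"
    data_types: Optional[list[str]] = None         # e.g. ["text", "code"]
    ip_status: Optional[str] = None                # e.g. "includes copyrighted works"
    contains_personal_info: Optional[bool] = None  # False counts as an answer
    acquisition: Optional[str] = None              # e.g. "licensed", "self-collected"


def missing_categories(entry: DatasetDisclosure) -> list[str]:
    """Return the names of categories that have not been answered yet."""
    return [
        f.name
        for f in fields(entry)
        if getattr(entry, f.name) in (None, "", [])
    ]


draft = DatasetDisclosure(
    source="Publicly available internet scrapes",
    data_types=["text"],
    contains_personal_info=False,
)
print(missing_categories(draft))  # ['ip_status', 'acquisition']
```

An empty result only means every category has some answer; whether the answer is accurate and specific enough is still a question for counsel.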

Downloadable Template

Use this structure to create your disclosure page.

// AB 2013 Transparency Report

System Name: [Model Name]

Version: [Version Number]

Date: [Date]


1. Data Sources

- 40% Common Crawl (Web scrape)

- 30% Licensed Partner Data ([Partner Dataset Name])

- 30% Synthetic Data generated by [Model X]


2. Intellectual Property

This dataset contains copyrighted material used under [Fair Use / License Agreements].


3. Personal Information Handling

We utilized automated PII redaction tools (Presidio) to remove names, SSNs, and addresses prior to training.
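
If you maintain several models, you may prefer to generate this page from structured data rather than edit it by hand. The following is a minimal sketch of that idea in Python, assuming you already know each source's percentage share; the function name `render_summary` and its parameters are hypothetical, and the output simply mirrors the template above.

```python
def render_summary(system: str, version: str, date: str,
                   sources: dict[str, float], ip_basis: str, pii_note: str) -> str:
    """Fill in the template. `sources` maps a source description to its % share."""
    total = sum(sources.values())
    # Guard against shares that do not account for the whole training set.
    if abs(total - 100) > 0.01:
        raise ValueError(f"source shares add up to {total}%, expected 100%")
    lines = [
        "// AB 2013 Transparency Report",
        f"System Name: {system}",
        f"Version: {version}",
        f"Date: {date}",
        "",
        "1. Data Sources",
        *[f"- {share:g}% {name}" for name, share in sources.items()],
        "",
        "2. Intellectual Property",
        f"This dataset contains copyrighted material used under {ip_basis}.",
        "",
        "3. Personal Information Handling",
        pii_note,
    ]
    return "\n".join(lines)


print(render_summary(
    system="[Model Name]",
    version="[Version Number]",
    date="[Date]",
    sources={
        "Common Crawl (Web scrape)": 40,
        "Licensed Partner Data": 30,
        "Synthetic Data generated by [Model X]": 30,
    },
    ip_basis="[Fair Use / License Agreements]",
    pii_note="[Describe how personal information was handled.]",
))
```

The percentage check is a convenience rather than a legal requirement: it catches the common drafting slip where a source is added or removed and the remaining shares no longer total 100%.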

Strategic Considerations

"High-Level" vs. Specific

The law asks for a "summary," not a file manifest. You do not need to list every single URL or book title. However, you cannot be vague. Saying "We used data from the internet" is insufficient and likely non-compliant. You must categorize the sources (e.g., "News articles," "Social media posts," "Academic journals").

Placement

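The statute requires the summary to be posted on your website but does not prescribe a specific location. As a practical matter, make it easy to reach, for example from your website's footer or a dedicated "AI Safety" page. It should not be buried in a PDF or Terms of Service document.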

Conclusion

Transparency is now a competitive advantage. A clear, honest AB 2013 summary builds trust with users and regulators alike. Use the template above to get started, but always consult with counsel to ensure your specific data practices are accurately reflected.

