How to Draft an AB 2013 Training Data Summary (With Template)
California's AB 2013 requires developers of generative AI systems to post a "high-level summary" of their training datasets on their websites. But what exactly counts as "high-level," and how much detail is too much?
The Requirement
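By January 1, 2026, and again before each new or substantially modified GenAI system is made publicly available to Californians, developers must post documentation on their website describing the data used to train the system. "Training" here includes testing, validation, and fine-tuning, and the requirement covers systems released on or after January 1, 2022.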
What Must Be Included?
The statute enumerates specific elements the summary must address. The core categories are:
- Sources: Where did the data come from? (e.g., "Publicly available internet scrapes," "Licensed stock image libraries," "Proprietary internal datasets").
- Data Types: What kind of data is it? (e.g., text, images, audio, code).
- IP Status: Do the datasets include data protected by copyright, trademark, or patent, or are they entirely in the public domain?
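- Personal Information: Do the datasets include personal information as defined in the California Consumer Privacy Act, and was it removed or redacted?
- Purchase/License: Were the datasets purchased or licensed, as opposed to collected in-house?
- Other Elements: The statute also asks for the number of data points (general ranges are acceptable), how the datasets further the system's intended purpose, when the data was collected, when the datasets were first used in development, whether they include aggregate consumer information, any cleaning or processing performed, and whether synthetic data generation was used.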
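One way to keep a draft from shipping with gaps is to track each item above as a structured field and block publication while any is blank. The Python sketch below assumes your disclosure is generated in a build step; the `TrainingDataSummary` class, its field names, and `missing_fields` are hypothetical, and they cover only the sources, data types, IP, personal information, and purchase/license items, not every statutory element.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrainingDataSummary:
    """Draft disclosure for one system version. Field names are hypothetical, not statutory."""
    system_name: str
    version: str
    sources: list[str] = field(default_factory=list)     # e.g. "Licensed stock image libraries"
    data_types: list[str] = field(default_factory=list)  # e.g. "text", "images", "code"
    ip_status: Optional[str] = None               # copyright/trademark/patent or public domain
    personal_information: Optional[str] = None    # included? removed or redacted?
    purchased_or_licensed: Optional[bool] = None  # None means "not yet answered"


CHECKLIST = ("sources", "data_types", "ip_status",
             "personal_information", "purchased_or_licensed")


def missing_fields(summary: TrainingDataSummary) -> list[str]:
    """Return checklist fields that are still empty or unanswered."""
    missing = []
    for name in CHECKLIST:
        value = getattr(summary, name)
        # None, "" and [] count as unanswered; False is a legitimate answer.
        if value is None or value == "" or value == []:
            missing.append(name)
    return missing


draft = TrainingDataSummary("ExampleModel", "1.0",
                            sources=["Publicly available internet scrapes"],
                            data_types=["text"])
print(missing_fields(draft))
# ['ip_status', 'personal_information', 'purchased_or_licensed']
```

Treating `False` as an answer matters here: "we did not purchase or license any data" is a valid disclosure, not a gap.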
Template
Use this structure to create your disclosure page.
// AB 2013 Transparency Report
System Name: [Model Name]
Version: [Version Number]
Date: [Date]
1. Data Sources
- 40% Common Crawl (Web scrape)
- 30% Licensed Partner Data ([Partner Name] dataset)
- 30% Synthetic Data generated by [Model X]
2. Intellectual Property
The training datasets contain copyrighted material used under [Fair Use / License Agreements].
3. Personal Information Handling
We used automated PII redaction tools (e.g., Microsoft Presidio) to remove names, SSNs, and addresses prior to training.
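If the percentages in the template come from a data pipeline rather than being typed by hand, generating the page can also catch arithmetic slips. The sketch below assumes shares are whole percentages of one unit you pick (tokens, documents, or bytes; saying which on the page removes ambiguity). `render_report` is a hypothetical helper, not part of any compliance tool.

```python
def render_report(system_name: str, version: str, date: str,
                  sources: dict[str, int], ip_statement: str,
                  pii_statement: str) -> str:
    """Fill the plain-text template from structured inputs (hypothetical helper)."""
    total = sum(sources.values())
    if total != 100:
        # Catch arithmetic slips (e.g. 40/30/20) before they reach the public page.
        raise ValueError(f"source shares sum to {total}%, expected 100%")
    lines = [
        "// AB 2013 Transparency Report",
        f"System Name: {system_name}",
        f"Version: {version}",
        f"Date: {date}",
        "1. Data Sources",
    ]
    # Largest share first; sorted() is stable, so ties keep their input order.
    for name, share in sorted(sources.items(), key=lambda kv: kv[1], reverse=True):
        lines.append(f"- {share}% {name}")
    lines += ["2. Intellectual Property", ip_statement,
              "3. Personal Information Handling", pii_statement]
    return "\n".join(lines)


print(render_report(
    "ExampleModel", "1.0", "2026-01-01",
    {"Common Crawl (web scrape)": 40,
     "Licensed partner data ([Partner Name] dataset)": 30,
     "Synthetic data generated by [Model X]": 30},
    "The training datasets contain copyrighted material used under [License Agreements].",
    "Names, SSNs, and addresses were redacted with automated tools before training.",
))
```

Raising an error instead of silently normalizing keeps a human in the loop when the numbers do not add up.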
Strategic Considerations
"High-Level" vs. Specific
The law asks for a "summary," not a file manifest. You do not need to list every URL or book title. However, the summary cannot be vague: "We used data from the internet" is likely non-compliant. Categorize the sources instead (e.g., "News articles," "Social media posts," "Academic journals").
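Going from a file-level manifest to a category-level summary is essentially a roll-up. The sketch below assumes you have a list of source URLs and a hand-maintained mapping from domain to category; the `CATEGORY_OF` entries and `summarize_manifest` are illustrative only, and shares are counted by document rather than by token.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical mapping from source domains to public-facing categories.
# A real pipeline would use its own tagging; these entries are illustrative.
CATEGORY_OF = {
    "nytimes.com": "News articles",
    "reuters.com": "News articles",
    "arxiv.org": "Academic journals and preprints",
    "reddit.com": "Social media posts",
}
FALLBACK = "Other web content"


def summarize_manifest(urls: list[str]) -> dict[str, float]:
    """Roll a per-document URL manifest up into category shares (percent of documents)."""
    def category(url: str) -> str:
        host = (urlparse(url).hostname or "").removeprefix("www.")
        return CATEGORY_OF.get(host, FALLBACK)

    counts = Counter(category(u) for u in urls)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders categories from largest to smallest share.
    return {cat: round(100 * n / total, 1) for cat, n in counts.most_common()}


manifest = [
    "https://www.nytimes.com/a",
    "https://reuters.com/b",
    "https://arxiv.org/abs/1",
    "https://reddit.com/r/x",
    "https://example.org/",
]
print(summarize_manifest(manifest))
# {'News articles': 40.0, 'Academic journals and preprints': 20.0, 'Social media posts': 20.0, 'Other web content': 20.0}
```

A large "Other web content" share is the programmatic version of "we used data from the internet": keep extending the mapping until that bucket is small enough to describe honestly.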
Placement
The law requires the documentation to be posted on your website but does not dictate where. A link from your site footer or a dedicated "AI Safety" page keeps it easy to find; avoid burying it in a PDF or Terms of Service document.
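If you link the summary from your footer, a small check in a build or CI step can catch the link disappearing during a site redesign. The sketch below uses Python's standard `html.parser`; the `/ai-training-data` path and the `footer_links_to` helper are hypothetical.

```python
from html.parser import HTMLParser


class FooterLinkFinder(HTMLParser):
    """Collect href values of <a> tags that appear inside a <footer> element."""

    def __init__(self):
        super().__init__()
        self.footer_depth = 0  # > 0 while the parser is inside a <footer>
        self.footer_links = []

    def handle_starttag(self, tag, attrs):
        if tag == "footer":
            self.footer_depth += 1
        elif tag == "a" and self.footer_depth:
            href = dict(attrs).get("href")
            if href:
                self.footer_links.append(href)

    def handle_endtag(self, tag):
        if tag == "footer" and self.footer_depth:
            self.footer_depth -= 1


def footer_links_to(html: str, path: str) -> bool:
    """True if any footer link ends with the given path (trailing slashes ignored)."""
    finder = FooterLinkFinder()
    finder.feed(html)
    finder.close()
    target = path.rstrip("/")
    return any(href.rstrip("/").endswith(target) for href in finder.footer_links)


page = """<html><body>
<main>...</main>
<footer><a href="/privacy">Privacy</a>
<a href="/ai-training-data">AI Training Data</a></footer>
</body></html>"""
print(footer_links_to(page, "/ai-training-data"))  # True
```

Links outside the `<footer>` are deliberately ignored, so a link that only survives in, say, a blog post will not satisfy the check.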
Conclusion
Transparency is now a competitive advantage. A clear, honest AB 2013 summary builds trust with users and regulators alike. Use the template above to get started, but always consult with counsel to ensure your specific data practices are accurately reflected.