The Death of the 'Black Box': California's New AI Training Disclosure Laws
Transparency is no longer optional. Is your model's data ready for public scrutiny? 🔍
The Era of Openness
For years, AI companies treated their training data as a trade secret. They would say "we trained on a massive proprietary dataset" and leave it at that. AB 2013 ends that era. The state of California has decided that the public has a right to know what is feeding the algorithms that make decisions about their lives.
What Must Be Disclosed?
Under AB 2013, you must disclose:
- Sources: Where did the data come from? (e.g., "Publicly scraped web data," "Licensed medical records").
- Processing: How was it cleaned? Did you remove PII? Did you use RLHF?
- Content: Does it contain copyrighted works? Does it contain personal information?
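As a rough sketch (and not legal advice), the checklist above can be captured as a structured record your team fills out per model. The field names and validator here are entirely illustrative; AB 2013 does not prescribe a schema:

```python
# Illustrative disclosure record for an AB 2013-style training-data summary.
# Field names are hypothetical -- the statute does not mandate any particular format.

REQUIRED_FIELDS = {"sources", "processing", "contains_copyrighted", "contains_personal_info"}

def validate_disclosure(record: dict) -> list[str]:
    """Return the required fields missing from the record, sorted for stable output."""
    return sorted(REQUIRED_FIELDS - record.keys())

disclosure = {
    "sources": ["Publicly scraped web data", "Licensed medical records"],
    "processing": "De-identified PII; deduplicated; quality-filtered",
    "contains_copyrighted": True,
    "contains_personal_info": False,
}

missing = validate_disclosure(disclosure)
print("Missing fields:", missing)
```

A simple check like this won't make a disclosure compliant, but it can keep an internal checklist from shipping with obvious gaps before legal review.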
Public Trust
While this feels like a burden, it's an opportunity. In healthcare, trust is low. Showing that your medical AI was trained on high-quality, diverse, and ethically sourced data is a massive selling point. You can turn compliance into a marketing asset.
Conclusion
Embrace transparency. The black box has been pried open. If you have nothing to hide, this law is easy to satisfy. If you have something to hide (like improperly sourced data), you have a problem.
Frequently Asked Questions (FAQ)
Does this apply to small startups?
Yes. There is no "small business exemption" for AB 2013. If you make your system available to Californians, you must comply.
How detailed does the summary need to be?
"High-level." You don't need to list every URL, but you can't just say "internet data." Be descriptive enough for a reader to understand the nature and provenance of the data.
When do I need to post this?
By January 1, 2026 for systems already on the market; for systems released after that date, before you make them available to the public.