Training Data Transparency: Preparing for California’s AB 2013
The secret sauce of your AI model might have to go public. Here’s what AB 2013 requires. 📂
The Requirement
AB 2013, the Generative AI Training Data Transparency Act, requires developers of generative AI systems to post a high-level summary of the data used to train them. The obligation covers any generative AI system released on or after January 1, 2022 that is made available to Californians; the documentation must be posted on or before January 1, 2026, or before release for systems launched after that date.
For MedTech companies, this is particularly sensitive. Your training data—often curated clinical datasets—is a key competitive advantage. The law requires you to disclose the sources or owners of the datasets, how the data was cleaned and processed, and whether it includes personal information, copyrighted or licensed material, or synthetic data.
Balancing Transparency and IP
The good news is that the law asks for a "high-level summary," not a detailed line-item list. You don't need to list every patient record or every specific medical journal article.
Acceptable Disclosure: "The model was trained on de-identified MRI scans from three major US academic medical centers, licensed medical textbooks, and publicly available biomedical literature (PubMed)."
Too Vague: "The model was trained on medical data."
Too Detailed (Risky): "The model was trained on the specific patient records of Dr. Smith's clinic in Los Angeles between 2020 and 2024." (This risks re-identification and IP theft).
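The checklist implicit in these examples can be drafted as a simple completeness check before legal review. This is a hypothetical sketch only: AB 2013 prescribes no machine-readable schema, and the field names below are our own invention, loosely tracking the topics the law covers.

```python
# Hypothetical disclosure checklist. AB 2013 does not mandate any
# machine-readable format; these topic names are illustrative only.
REQUIRED_TOPICS = {
    "sources",                # where the training data came from
    "processing",             # how it was cleaned / de-identified
    "personal_information",   # whether it contains personal information
    "copyrighted_material",   # whether it contains copyrighted or licensed works
}

def missing_topics(disclosure: dict) -> set:
    """Return the required topics a draft disclosure fails to address."""
    return REQUIRED_TOPICS - disclosure.keys()

draft = {
    "sources": "De-identified MRI scans from three US academic medical "
               "centers; licensed medical textbooks; PubMed literature.",
    "processing": "De-identified before training; no free-text notes retained.",
    "personal_information": False,
}

print(missing_topics(draft))  # the draft still lacks a copyright statement
```

A draft that addresses every topic at the "high-level summary" altitude shown above is far easier for counsel to sign off on than prose assembled ad hoc.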
When to Post
The disclosure must be available on your website before the system is released to the public in California. For existing systems that are "substantially modified" after the effective date, you must update your disclosure.
Conclusion
Prepare your data summary now. It's a new requirement for doing business in the state. Work with your legal team to craft a disclosure that satisfies the regulator without giving away your trade secrets.
Frequently Asked Questions (FAQ)
Does this apply to models trained years ago?
Yes, if the model is still available to Californians. The law reaches back to generative AI systems released on or after January 1, 2022, so a model trained years ago must still be documented by the January 1, 2026 deadline if it remains in use.
What if I fine-tune an open-source model like Llama 3?
You are responsible for disclosing the data used for your fine-tuning. You may also need to link to the original model's transparency documentation.
Is there a penalty for non-compliance?
The statute itself does not spell out specific fines; enforcement is generally expected to come through the California Attorney General, with any penalties determined by a court. Those could be substantial, and the reputational damage of being labeled "non-transparent" can be even worse.