Training Data Transparency: Preparing for California’s AB 2013
The secret sauce of your AI model might have to go public. Here’s what AB 2013 requires. 📂
The Requirement
AB 2013, the Generative AI Training Data Transparency Act, requires developers of generative AI systems to post a high-level summary of the data used to train them. The obligation covers any generative AI system released on or after January 1, 2022 that is made available to Californians; the documentation must be posted on or before January 1, 2026, or before release for systems launched after that date.
For MedTech companies, this is particularly sensitive. Your training data—often curated clinical datasets—is a key competitive advantage. The law requires you to disclose the sources or owners of the datasets, how the data was cleaned and processed, and whether it includes personal information, copyrighted or licensed material, or synthetic data.
Balancing Transparency and IP
The good news is that the law asks for a "high-level summary," not a detailed line-item list. You don't need to list every patient record or every specific medical journal article.
Acceptable Disclosure: "The model was trained on de-identified MRI scans from three major US academic medical centers, licensed medical textbooks, and publicly available biomedical literature (PubMed)."
Too Vague: "The model was trained on medical data."
Too Detailed (Risky): "The model was trained on the specific patient records of Dr. Smith's clinic in Los Angeles between 2020 and 2024." (This risks re-identification and IP theft).
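The checklist implicit in these examples can be drafted as a simple completeness check before legal review. This is a hypothetical sketch only: AB 2013 prescribes no machine-readable schema, and the field names below are our own invention, loosely tracking the topics the law covers.

```python
# Hypothetical disclosure checklist. AB 2013 does not mandate any
# machine-readable format; these topic names are illustrative only.
REQUIRED_TOPICS = {
    "sources",                # where the training data came from
    "processing",             # how it was cleaned / de-identified
    "personal_information",   # whether it contains personal information
    "copyrighted_material",   # whether it contains copyrighted or licensed works
}

def missing_topics(disclosure: dict) -> set:
    """Return the required topics a draft disclosure fails to address."""
    return REQUIRED_TOPICS - disclosure.keys()

draft = {
    "sources": "De-identified MRI scans from three US academic medical "
               "centers; licensed medical textbooks; PubMed literature.",
    "processing": "De-identified before training; no free-text notes retained.",
    "personal_information": False,
}

print(missing_topics(draft))  # the draft still lacks a copyright statement
```

A draft that addresses every topic at the "high-level summary" altitude shown above is far easier for counsel to sign off on than prose assembled ad hoc.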
When to Post
The disclosure must be available on your website before the system is released to the public in California. For existing systems that are "substantially modified" after the effective date, you must update your disclosure.
Conclusion
Prepare your data summary now. It's a new requirement for doing business in the state. Work with your legal team to craft a disclosure that satisfies the regulator without giving away your trade secrets.
Frequently Asked Questions (FAQ)
Does this apply to models trained years ago?
Yes, if the model is still available to Californians. The law reaches back to generative AI systems released on or after January 1, 2022, so a model trained years ago must still be documented by the January 1, 2026 deadline if it remains in use.
What if I fine-tune an open-source model like Llama 3?
You are responsible for disclosing the data used for your fine-tuning. You may also need to link to the original model's transparency documentation.
Is there a penalty for non-compliance?
The statute itself does not spell out specific fines; enforcement is generally expected to come through the California Attorney General, with any penalties determined by a court. Those could be substantial, and the reputational damage of being labeled "non-transparent" can be even worse.