Is Your AI Training Data Legal? Preparing for the 2026 AB 2013 Deadline
2026 feels far away, but data provenance takes years to document. Start now. ⏳
The Timeline
AB 2013 takes effect on January 1, 2026. However, it reaches back in time: it covers the data used to train generative AI models released on or after January 1, 2022, if they are still in use. Because of this lookback, you are likely already on the hook for data you used years ago.
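To make the lookback concrete, here is a minimal Python sketch of a first-pass scope check over a model inventory. The `ModelRelease` type, the `needs_documentation` function, and the model names are hypothetical, and treating the January 1, 2022 cutoff as inclusive is an assumption to confirm with counsel.

```python
from dataclasses import dataclass
from datetime import date

# Cutoff taken from the timeline above; treated as inclusive (an assumption).
LOOKBACK_START = date(2022, 1, 1)


@dataclass
class ModelRelease:
    name: str            # hypothetical internal model identifier
    released: date       # release date of the model
    still_in_use: bool   # whether the model is still being offered


def needs_documentation(model: ModelRelease) -> bool:
    """Return True if the model is still in use and inside the lookback window."""
    return model.still_in_use and model.released >= LOOKBACK_START


# Hypothetical inventory, for illustration only.
models = [
    ModelRelease("triage-v1", date(2021, 6, 30), True),
    ModelRelease("triage-v2", date(2023, 3, 15), True),
    ModelRelease("notes-summarizer", date(2022, 9, 1), False),
]

in_scope = [m.name for m in models if needs_documentation(m)]
print(in_scope)  # ['triage-v2']
```

A check like this only narrows the list of models to review; whether a retired or modified model still counts is a question for your legal team.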
The Audit
You need to run a "data archaeology" project: dig up the records that show where each dataset came from and what you were allowed to do with it.
- Contracts: Find the contracts for data you bought in 2023. Do they permit use for AI training?
- Scraping Logs: If you scraped the web, do you have records of which domains you hit and whether you respected robots.txt?
- Consent Forms: For patient data, pull the specific consent forms used at the time of collection.
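The checklist above can be tracked as a simple inventory. The following Python sketch is one possible shape for it, not a required format: the `DatasetRecord` fields, the `audit_gaps` function, the dataset names, and the file path are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetRecord:
    name: str
    source_type: str                                   # "purchased", "scraped", or "patient"
    contract_path: Optional[str] = None                # where the signed contract is stored
    contract_allows_ai_training: Optional[bool] = None # None means not yet reviewed
    scraped_domains: List[str] = field(default_factory=list)
    robots_txt_respected: Optional[bool] = None        # None means no record either way
    consent_form_path: Optional[str] = None            # consent form used at collection time


def audit_gaps(record: DatasetRecord) -> List[str]:
    """List the provenance evidence that is still missing for one dataset."""
    gaps: List[str] = []
    if record.source_type == "purchased":
        if record.contract_path is None:
            gaps.append("contract not located")
        elif record.contract_allows_ai_training is None:
            gaps.append("AI training rights not reviewed")
        elif not record.contract_allows_ai_training:
            gaps.append("contract does not allow AI training")
    elif record.source_type == "scraped":
        if not record.scraped_domains:
            gaps.append("no log of scraped domains")
        if record.robots_txt_respected is None:
            gaps.append("robots.txt compliance unknown")
    elif record.source_type == "patient":
        if record.consent_form_path is None:
            gaps.append("consent form not located")
    else:
        gaps.append(f"unrecognized source type: {record.source_type}")
    return gaps


# Hypothetical datasets, for illustration only.
records = [
    DatasetRecord("vendor-notes-2023", "purchased",
                  contract_path="contracts/vendor-2023.pdf"),
    DatasetRecord("web-crawl-2022", "scraped",
                  scraped_domains=["example.org"], robots_txt_respected=True),
    DatasetRecord("clinic-intake", "patient"),
]

report: Dict[str, List[str]] = {r.name: audit_gaps(r) for r in records}
for name, gaps in report.items():
    print(name, "->", gaps or "no gaps found")
```

Keeping "not reviewed" (`None`) separate from "reviewed and not allowed" (`False`) is a deliberate choice here: the first means more digging is needed, the second means the dataset itself may be a problem.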
Legal Risk
If you can't prove where your data came from, you face two risks:
- Regulatory Fines: For failing to comply with AB 2013.
- Copyright/Privacy Lawsuits: If you disclose "we scraped the internet," you invite lawsuits from artists and patients. If you don't disclose, you violate AB 2013. Getting out of this catch-22 requires careful legal strategy.
Conclusion
Start your data audit today. It will take longer than you think to find those old contracts and logs.
Frequently Asked Questions (FAQ)
What if I lost the records?
You may need to retrain your model on data you can verify. This is expensive but safer than lying to the regulator.
Can I just say "Unknown"?
No. The law requires a summary of the datasets you used. Saying "we don't know" is an admission of negligence and non-compliance.
Does this apply to open-source datasets?
Yes. Just because a dataset is on Hugging Face doesn't mean it's legal to train on. You still need to disclose that you used it.