Is Your LLM Training Data Legal? The AB 2013 High-Level Summary Guide

Your training data is no longer a secret. Here's how to survive California's AB 2013. 📂

The New Transparency Mandate

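AB 2013 forces developers of generative AI systems to pull back the curtain on their training data. Describing a corpus as simply "internet data" is no longer enough: for any system or service released on or after January 1, 2022, the developer must post on its website a high-level summary that identifies the sources and categories of the data used to train it.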

What Needs to Be Disclosed?

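  • Data Sources: Where did the data come from, and who owns it? (e.g., Common Crawl, licensed datasets, user-generated content).
  • Data Categories: What kind of data is it? (e.g., text, images, code, medical records).
  • Copyright Status: Does the data include material protected by copyright, trademark, or patent, or is it entirely in the public domain?
  • Personal Information: Does it contain personal information as defined by the California Consumer Privacy Act (CCPA)?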
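One way to keep these answers from getting lost is to record them per dataset while you build the pipeline. The Python sketch below is illustrative only: AB 2013 does not prescribe a file format or schema, and the `DatasetDisclosure` class and its field names are hypothetical. Unknown answers are stored as `None` so they surface as gaps instead of quietly defaulting to "no".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetDisclosure:
    """One training dataset, described at the level of the questions above.

    Field names are illustrative assumptions, not a statutory schema.
    None means "not yet determined" and should be resolved before publishing.
    """
    source: str                                            # e.g. "Common Crawl" or a licensor's name
    categories: List[str] = field(default_factory=list)    # e.g. ["text", "code"]
    contains_copyrighted: Optional[bool] = None
    contains_personal_info: Optional[bool] = None

    def to_summary_line(self) -> str:
        def yes_no(flag: Optional[bool]) -> str:
            # Keep "unknown" visible instead of silently reporting "no".
            return "unknown" if flag is None else ("yes" if flag else "no")

        cats = ", ".join(self.categories) if self.categories else "unspecified"
        return (
            f"{self.source} | categories: {cats} | "
            f"copyrighted material: {yes_no(self.contains_copyrighted)} | "
            f"personal information: {yes_no(self.contains_personal_info)}"
        )


if __name__ == "__main__":
    record = DatasetDisclosure(
        source="Common Crawl",
        categories=["text"],
        contains_copyrighted=True,
        contains_personal_info=None,  # not yet assessed
    )
    print(record.to_summary_line())
```

Running it prints `Common Crawl | categories: text | copyrighted material: yes | personal information: unknown`. Recording `None` rather than `False` is deliberate: "unknown" is an honest gap to close, while a silent "no" could end up as an inaccurate disclosure.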

Creating Your Summary

The key is to be descriptive without being exhaustive. You don't need to list every URL, but you must characterize the dataset accurately.
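To characterize a dataset without listing every URL, you can roll item-level provenance up to one line per source, reporting categories and a coarse size band instead of individual links. A minimal Python sketch follows; the record keys (`source`, `category`, `url`), the `summarize` and `size_band` helpers, and the band boundaries are hypothetical choices for illustration, not anything AB 2013 specifies.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def size_band(n: int) -> str:
    """Report a count as a coarse band rather than an exact figure.

    The band boundaries are an arbitrary assumption for this sketch.
    """
    bands = [(1_000, "under 1K"), (100_000, "1K to 100K"), (10_000_000, "100K to 10M")]
    for upper, label in bands:
        if n < upper:
            return label
    return "10M or more"


def summarize(items: Iterable[Dict[str, str]]) -> List[str]:
    """Collapse item-level records (one per URL or file) into one line per source.

    URLs are counted, never listed, so the summary stays high-level.
    """
    counts: Dict[str, int] = defaultdict(int)
    categories: Dict[str, Set[str]] = defaultdict(set)
    for item in items:
        counts[item["source"]] += 1
        categories[item["source"]].add(item["category"])
    return [
        f"{src}: {', '.join(sorted(categories[src]))} ({size_band(counts[src])} items)"
        for src in sorted(counts)
    ]


if __name__ == "__main__":
    items = [
        {"source": "Common Crawl", "category": "text", "url": "https://example.com/a"},
        {"source": "Common Crawl", "category": "text", "url": "https://example.com/b"},
        {"source": "Licensed code corpus", "category": "code", "url": "https://example.org/repo"},
    ]
    for line in summarize(items):
        print(line)
```

This prints `Common Crawl: text (under 1K items)` and `Licensed code corpus: code (under 1K items)`. The item-level records stay in your internal audit trail; only the rolled-up lines need to reach the public summary.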

Conclusion

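Start auditing your data pipelines now. Documenting datasets retroactively is painful and error-prone, because provenance details are easily lost once data has been cleaned, merged, and deduplicated.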
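A practical first step in that audit is to check that every dataset your pipelines consume carries answers to the disclosure questions listed earlier. The sketch below assumes your pipeline keeps a JSON manifest of datasets; the manifest layout, the `REQUIRED_FIELDS` names, and the `audit_manifest` function are hypothetical, so adapt them to whatever metadata your tooling actually records.

```python
import json
from typing import Dict, List

# Hypothetical disclosure fields every dataset entry in the manifest should carry.
REQUIRED_FIELDS = ("source", "categories", "contains_copyrighted", "contains_personal_info")


def audit_manifest(manifest_json: str) -> Dict[str, List[str]]:
    """Return {dataset name: [missing fields]} for incomplete manifest entries.

    A field counts as missing when it is absent, null, or an empty list.
    An explicit false is an answer, so it is NOT flagged.
    """
    gaps: Dict[str, List[str]] = {}
    for index, entry in enumerate(json.loads(manifest_json)):
        name = entry.get("name", f"<unnamed #{index}>")
        missing = [
            key for key in REQUIRED_FIELDS
            if entry.get(key) is None or entry.get(key) == []
        ]
        if missing:
            gaps[name] = missing
    return gaps


if __name__ == "__main__":
    manifest = """[
      {"name": "web-crawl-2023", "source": "Common Crawl", "categories": ["text"],
       "contains_copyrighted": true, "contains_personal_info": null},
      {"name": "support-tickets", "source": "internal", "categories": [],
       "contains_copyrighted": false, "contains_personal_info": true}
    ]"""
    for name, missing in audit_manifest(manifest).items():
        print(f"{name}: missing {', '.join(missing)}")
```

Here the script reports `web-crawl-2023: missing contains_personal_info` and `support-tickets: missing categories`. Running a check like this in CI keeps new datasets from entering training without the metadata your summary depends on.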


2026 Legislative Tracker

Status of California AI regulations.

  • SB 53 (Transparency in Frontier AI): Enacted. Effective: Jan 1, 2026.
  • AB 2013 (Training Data Transparency): Deadline approaching. Effective: Jan 1, 2026.
  • SB 942 (AI Watermarking): Enacted. Effective: Jan 1, 2026.
  • SB 1047 (Safe & Secure Innovation): Vetoed. Effective: N/A.