How the CMIA Now Governs Your Medical AI Training Sets

The CA Attorney General is watching: Patient data in AI models is now a high-stakes legal game. 🔍

Beyond HIPAA

Most health tech developers are familiar with HIPAA. But in California, the Confidentiality of Medical Information Act (CMIA) is the law you really need to watch. Unlike HIPAA, which applies only to "covered entities" (like hospitals and insurers), the CMIA applies to any business that maintains medical information.

This means that even if you are a small startup selling a wellness app or an AI tool directly to consumers, you are likely subject to the CMIA if you handle health data.

Training Data Risks

The intersection of CMIA and AI training is a legal minefield. Using CMIA-protected data to train an AI model without explicit consent can be a violation.

  • Consent: Did the patient consent to have their data used for "product improvement" or "AI training"? Standard privacy policies may not be enough.
  • De-identification: CMIA has strict standards for de-identification. If you strip names but leave enough data points to potentially re-identify a patient (e.g., rare disease + zip code), the data is still regulated.
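The CMIA does not prescribe a specific de-identification algorithm, but the re-identification risk described above can be screened for with a k-anonymity-style check: flag any combination of quasi-identifiers (diagnosis, zip code, etc.) that appears in fewer than k records. The `k_anonymity` helper below is a hypothetical illustration, not a compliance test.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers, k=5):
    """Return True if every combination of quasi-identifier values
    appears in at least k records (a common de-identification heuristic).

    records: list of dicts, e.g. {"diagnosis": ..., "zip": ...}
    quasi_identifiers: fields that could combine to re-identify someone.
    """
    groups = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return all(count >= k for count in groups.values())

# A rare disease plus a zip code can single a patient out even with
# names stripped:
records = [
    {"diagnosis": "flu", "zip": "94105"},
    {"diagnosis": "flu", "zip": "94105"},
    {"diagnosis": "fabry_disease", "zip": "94105"},  # unique combination
]
print(k_anonymity(records, ["diagnosis", "zip"], k=2))  # False
```

Passing such a screen does not make data unregulated; it only surfaces obvious re-identification hazards before a lawyer reviews the pipeline.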

The "Derivative Data" Problem

A major concern is "derivative data." If your AI model "memorizes" patient data during training and can be prompted to regurgitate it (a known issue with Large Language Models), the model itself might be considered a container of regulated medical information. This creates complex liability issues regarding data retention and deletion rights.
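One practical way to probe for this kind of memorization is a simple regurgitation test: prompt the model with the start of a protected record and check whether its completion reproduces a long verbatim span of the rest. The sketch below assumes a `generate` callable standing in for your model API; the function name and thresholds are illustrative.

```python
def leaks_training_record(generate, training_records,
                          prompt_len=40, min_overlap=30):
    """Crude memorization probe.

    generate: callable prompt -> model output (stand-in for a model API).
    training_records: list of protected strings the model was trained on.
    Returns True if any record's continuation appears verbatim in the
    model's completion of that record's opening characters.
    """
    for record in training_records:
        prompt = record[:prompt_len]
        completion = generate(prompt)
        tail = record[prompt_len:prompt_len + min_overlap]
        if tail and tail in completion:
            return True  # model regurgitated protected data verbatim
    return False
```

A hit on a test like this is a strong signal that the model itself retains regulated content, which is exactly the derivative-data scenario described above.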

Conclusion

Audit your data pipeline. Ensure you have the right consents before you train. If you are buying data, verify that the vendor obtained it compliantly. In California, ignorance of the data's origin is not a defense.

Frequently Asked Questions (FAQ)

Are the penalties higher than HIPAA?

They can be. CMIA allows for a private right of action, meaning individuals can sue you directly for nominal damages ($1,000 per person) even without proving actual harm. Administrative fines can also be substantial.
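Because the $1,000 nominal figure applies per person without proof of harm, exposure scales linearly with the affected population. A back-of-the-envelope calculation (illustrative only; actual damages, attorney's fees, and administrative fines would be additional):

```python
def cmia_nominal_exposure(affected_individuals, nominal_per_person=1_000):
    """Rough floor on exposure under CMIA's private right of action:
    $1,000 in nominal damages per affected individual, no proof of
    actual harm required."""
    return affected_individuals * nominal_per_person

# A training set containing 50,000 patients' records:
print(cmia_nominal_exposure(50_000))  # 50000000
```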

Can I use synthetic data to avoid this?

Generally, yes. High-quality synthetic data that does not correspond to real individuals can fall outside the CMIA's scope, making it an excellent risk mitigation strategy. One caveat: if the synthetic dataset is generated from real patient records and preserves re-identifiable patterns, it may still be treated as regulated medical information, so validate the generation process, not just the output labels.

Does CMIA apply to data from wearables?

Yes, if that data is derived from a device that measures medical or health conditions. Recent amendments and interpretations have broadened the scope to include many consumer health apps.

Is Your AI Compliant?

Don't guess. Use our free calculator to check your AB 489 & AB 3030 status in minutes.


2026 Legislative Tracker

Live status of California AI regulations.

  • SB 53 (Enacted): Transparency in Frontier AI. Effective: Jan 1, 2026
  • AB 2013 (Deadline Approaching): Training Data Transparency. Effective: Jan 1, 2026
  • SB 942 (Enacted): AI Watermarking. Effective: Jan 1, 2026
  • SB 1047 (Vetoed): Safe & Secure Innovation. Effective: N/A