How the CMIA Now Governs Your Medical AI Training Sets
The CA Attorney General is watching: Patient data in AI models is now a high-stakes legal game. 🔍
Beyond HIPAA
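Most health tech developers are familiar with HIPAA. But in California, the Confidentiality of Medical Information Act (CMIA) is the law you really need to watch. HIPAA applies to "covered entities" (like hospitals and insurers) and their business associates. The CMIA reaches further: beyond health care providers, health plans, and their contractors, it also covers businesses that offer consumers software or hardware, including mobile apps, designed to maintain medical information.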
This means that even if you are a small startup selling a wellness app or an AI tool directly to consumers, you are likely subject to CMIA strictures if you handle health data.
Training Data Risks
The intersection of CMIA and AI training is a legal minefield. Using CMIA-protected data to train an AI model without the patient's authorization (the CMIA's term for consent), or another basis the statute permits, can be a violation.
- Consent: Did the patient consent to have their data used for "product improvement" or "AI training"? Standard privacy policies may not be enough.
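- De-identification: The CMIA protects medical information that is "individually identifiable," which includes data that reveals a person's identity either alone or in combination with other publicly available information. If you strip names but leave enough data points to re-identify a patient (e.g., rare disease + zip code), the data may still be regulated.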
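To make the "rare disease + zip code" risk concrete, here is a minimal sketch of a k-anonymity style check. It flags records whose combination of quasi-identifiers is shared by fewer than k rows. The column names, the sample rows, and the threshold are all illustrative assumptions; k is an engineering heuristic, not a standard set by the CMIA.

```python
from collections import Counter

# Hypothetical quasi-identifier columns; the right set depends on your data.
QUASI_IDENTIFIERS = ("diagnosis", "zip_code")


def risky_records(records, quasi_identifiers=QUASI_IDENTIFIERS, k=5):
    """Return records whose quasi-identifier combination appears in fewer than k rows."""
    def key(record):
        return tuple(record.get(col) for col in quasi_identifiers)

    group_sizes = Counter(key(r) for r in records)
    return [r for r in records if group_sizes[key(r)] < k]


# Synthetic rows for illustration only.
records = [
    {"diagnosis": "asthma", "zip_code": "94110"},
    {"diagnosis": "asthma", "zip_code": "94110"},
    {"diagnosis": "asthma", "zip_code": "94110"},
    {"diagnosis": "Fabry disease", "zip_code": "96161"},
]

flagged = risky_records(records, k=3)
print([r["diagnosis"] for r in flagged])  # ['Fabry disease']
```

A check like this only surfaces obvious small groups. It does not account for outside datasets an attacker could join against, so treat a clean result as a starting point for review rather than proof of de-identification.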
The "Derivative Data" Problem
A major concern is "derivative data." If your AI model "memorizes" patient data during training and can be prompted to regurgitate it (a known issue with large language models), the model itself might be considered a container of regulated medical information. That raises hard questions about liability, data retention, and how you would honor a deletion request once the data is baked into model weights.
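One way to probe for memorization is a simple extraction test: prompt the model with the beginning of a training record and check whether the rest comes back verbatim. The sketch below assumes a hypothetical `generate(prompt)` callable that wraps your model and returns text. The function name, the character thresholds, and the sample records are illustrative, and the records are synthetic.

```python
def find_regurgitated(records, generate, prefix_chars=40, min_suffix_chars=20):
    """Prompt with the start of each record; flag records whose remainder is returned verbatim."""
    leaked = []
    for text in records:
        prefix = text[:prefix_chars]
        suffix = text[prefix_chars:].strip()
        if len(suffix) < min_suffix_chars:
            continue  # too short to tell memorization from coincidence
        completion = generate(prefix)
        if suffix[:min_suffix_chars] in completion:
            leaked.append(text)
    return leaked


# Synthetic records for illustration only.
training_records = [
    "Patient 0001 presented on 2021-03-04 with chest pain and was prescribed atorvastatin 20 mg daily.",
    "Patient 0002 presented on 2022-07-19 with a migraine and was advised to rest and hydrate.",
]


def fake_generate(prompt):
    # Stand-in for a real model; it has "memorized" only the first record.
    memorized = training_records[0]
    if memorized.startswith(prompt):
        return memorized[len(prompt):]
    return "I cannot help with that."


leaked = find_regurgitated(training_records, fake_generate)
print(len(leaked))  # 1
```

An exact-match probe like this can only show that leakage exists. A model may still reproduce records under different prompts or with small variations, so an empty result is not evidence that nothing was memorized.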
Conclusion
Audit your data pipeline. Ensure you have the right consents before you train. If you are buying data, verify that the vendor obtained it compliantly. In California, do not count on ignorance of the data's origin as a defense.
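As a rough sketch of what such an audit step could look like in code, the example below gates records before training on two conditions: a consent scope that explicitly covers AI training, and verified provenance for anything that came from a vendor. The `Record` fields, the `"ai_training"` purpose label, and the `provenance_verified` flag are hypothetical names for illustration. Treating "product_improvement" consent as insufficient is a conservative assumption, not a statement of what the law requires.

```python
from dataclasses import dataclass, field

REQUIRED_PURPOSE = "ai_training"  # hypothetical consent label


@dataclass
class Record:
    record_id: str
    source: str  # "first_party" or a vendor name
    consented_purposes: set = field(default_factory=set)
    provenance_verified: bool = False  # set after reviewing vendor documentation


def partition_for_training(records):
    """Split records into those cleared for training and those held back, with reasons."""
    allowed, held_back = [], []
    for r in records:
        reasons = []
        if REQUIRED_PURPOSE not in r.consented_purposes:
            reasons.append("no ai_training consent")
        if r.source != "first_party" and not r.provenance_verified:
            reasons.append("vendor provenance unverified")
        if reasons:
            held_back.append((r.record_id, reasons))
        else:
            allowed.append(r)
    return allowed, held_back


records = [
    Record("r1", "first_party", {"treatment", "ai_training"}),
    Record("r2", "first_party", {"treatment", "product_improvement"}),
    Record("r3", "vendor_x", {"ai_training"}),
    Record("r4", "vendor_x", {"ai_training"}, provenance_verified=True),
]

allowed, held_back = partition_for_training(records)
print([r.record_id for r in allowed])  # ['r1', 'r4']
print(held_back)
```

Keeping the reasons alongside each held-back record gives you an audit trail: you can show why a record was excluded, and which consents or vendor documents you would need before it could be used.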