Healthcare is experiencing an AI gold rush. From clinical decision support to patient communication, medical documentation to diagnostic assistance—the potential applications seem endless. But there's a massive elephant in the room: HIPAA.
The Health Insurance Portability and Accountability Act sets strict requirements for how Protected Health Information (PHI) must be handled. Civil penalties range from roughly $100 to over $50,000 per violation depending on culpability (the tiers are adjusted annually for inflation), with annual maximums reaching into the millions.
So how do you leverage the power of LLMs while staying compliant? Let's break it down.
Understanding PHI in the Context of LLMs
Before we can protect PHI, we need to understand what it is. Under HIPAA, PHI is any individually identifiable health information, and the regulation enumerates 18 categories of identifiers:
- Patient names
- Geographic subdivisions smaller than a state (street address, city, county; ZIP codes beyond the first three digits)
- Dates directly related to an individual, other than year (birth, admission, discharge, and death dates)
- Phone numbers
- Fax numbers
- Email addresses
- Social Security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers and serial numbers
- Device identifiers and serial numbers
- Web URLs
- IP addresses
- Biometric identifiers
- Full-face photographs and any comparable images
- Any other unique identifying number or code
When you send a prompt to an LLM that contains any of this information, you're potentially transmitting PHI to a third party. This is where things get complicated.
The Business Associate Agreement Problem
Under HIPAA, any third party that handles PHI on behalf of a covered entity must sign a Business Associate Agreement (BAA). This agreement establishes the third party's responsibilities for protecting that data.
Here's the challenge: most LLM providers either don't offer BAAs, or their BAA-covered offerings come with significant limitations.
As of late 2025, major providers like OpenAI and Anthropic offer BAAs for their enterprise tiers, but these agreements often restrict certain use cases and require specific configurations. Azure OpenAI Service and AWS Bedrock offer more comprehensive HIPAA-compliant options, but at significantly higher price points.
Even with a BAA in place, you're still responsible for minimizing the PHI you transmit. The principle of minimum necessary applies: you should only share the minimum amount of PHI required to accomplish your purpose.
The De-identification Approach
The most robust approach to HIPAA compliance with LLMs is to ensure that no PHI ever reaches the LLM in the first place. This is done through de-identification.
HIPAA provides two methods for de-identification:
1. Expert Determination
A qualified statistical or scientific expert determines that the risk of identifying an individual is "very small." This method is flexible but requires expert involvement and documentation.
2. Safe Harbor
Remove all 18 types of identifiers listed above, and have no actual knowledge that the remaining information could identify an individual. This is more prescriptive but easier to implement systematically.
For LLM workflows, the Safe Harbor method is typically more practical because it can be automated.
Building a Compliant Architecture
Here's a practical architecture for HIPAA-compliant LLM usage:
Step 1: Intercept and Scan
Before any data leaves your infrastructure, it passes through a scanning layer that identifies all potential PHI. This includes:
- Named entity recognition for names, locations, organizations
- Pattern matching for SSNs, phone numbers, medical record numbers
- Date detection and normalization
- Custom identifiers specific to your organization
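Here's a minimal sketch of the pattern-matching layer in Python. The patterns and the `scan_for_phi` helper are illustrative assumptions, not a complete ruleset; in practice you'd pair them with an NER model and tune both against your own data:

```python
import re

# Illustrative patterns only; not an exhaustive PHI ruleset. A production
# scanner pairs regexes like these with an NER model for names, locations,
# and organizations, plus logic to resolve overlapping matches.
PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}|\b\d{3}-\d{3}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:#\s]*\d{6,10}\b", re.IGNORECASE),
    "DATE": re.compile(
        r"\b(?:\d{1,2}/\d{1,2}/\d{2,4}"
        r"|(?:January|February|March|April|May|June|July|August"
        r"|September|October|November|December)\s+\d{1,2},\s+\d{4})\b"
    ),
}

def scan_for_phi(text: str) -> list[dict]:
    """Return every match with its type and character span, in reading order."""
    findings = []
    for phi_type, pattern in PHI_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append({
                "type": phi_type,
                "value": match.group(),
                "start": match.start(),
                "end": match.end(),
            })
    return sorted(findings, key=lambda f: f["start"])
```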
Step 2: Tokenize and Replace
Each piece of identified PHI is replaced with a token. For example:
Original: "John Smith was diagnosed with Type 2 diabetes on March 15, 2024."
Tokenized: "[PATIENT_1] was diagnosed with Type 2 diabetes on [DATE_1]."
The mapping between tokens and original values is stored securely within your infrastructure—never transmitted to the LLM.
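Continuing the sketch, a tokenizer that consumes the `scan_for_phi` findings above might look like this (how you persist the mapping, whether an encrypted database or a vault, is your call):

```python
from collections import defaultdict

def tokenize_phi(text: str, findings: list[dict]) -> tuple[str, dict]:
    """Swap each detected span for a typed token like [DATE_1].

    Returns the de-identified text and the token -> original mapping,
    which must live only in your own infrastructure, never in the prompt.
    """
    counters: dict[str, int] = defaultdict(int)
    mapping: dict[str, str] = {}

    # Name tokens in reading order so [PATIENT_1] precedes [PATIENT_2].
    for f in findings:  # findings arrive sorted by start offset
        counters[f["type"]] += 1
        f["token"] = f"[{f['type']}_{counters[f['type']]}]"
        mapping[f["token"]] = f["value"]

    # Splice replacements in from the end so earlier offsets stay valid.
    for f in sorted(findings, key=lambda f: f["start"], reverse=True):
        text = text[: f["start"]] + f["token"] + text[f["end"] :]

    return text, mapping
```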
Step 3: Process with LLM
The de-identified prompt is sent to the LLM. Since no PHI is included, you've significantly reduced your compliance risk. The LLM processes the request and returns a response that may contain your tokens.
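There's little code at this step, but one detail is worth showing: telling the model to preserve your tokens. In this sketch, `call_provider` is a hypothetical stand-in for whichever BAA-covered SDK you use, and the system-prompt wording is ours, not anything the providers mandate:

```python
def call_provider(system: str, prompt: str) -> str:
    """Placeholder for your BAA-covered endpoint (Azure OpenAI,
    AWS Bedrock, etc.). Wire in that provider's SDK here."""
    raise NotImplementedError

def process_with_llm(deidentified_prompt: str) -> str:
    # Tell the model to leave tokens intact so rehydration can find them.
    system = (
        "Bracketed tokens such as [PATIENT_1] or [DATE_1] stand in for "
        "protected information. Reproduce them verbatim and never guess "
        "at the underlying values."
    )
    return call_provider(system=system, prompt=deidentified_prompt)
```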
Step 4: Rehydrate
The response is processed to replace tokens with original values before being returned to your application:
LLM Response: "Based on [PATIENT_1]'s diagnosis on [DATE_1], I recommend..."
Rehydrated: "Based on John Smith's diagnosis on March 15, 2024, I recommend..."
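A minimal rehydration helper, reusing the mapping from `tokenize_phi` above. The leftover-token check is our own addition, since models occasionally mangle placeholders:

```python
import re

TOKEN_RE = re.compile(r"\[[A-Z]+_\d+\]")

def rehydrate(response: str, mapping: dict[str, str]) -> str:
    """Swap tokens back for their original values before the response
    reaches the application."""
    for token, original in mapping.items():
        response = response.replace(token, original)
    # Surface tokens the model invented or mangled rather than passing
    # them through silently.
    leftover = TOKEN_RE.findall(response)
    if leftover:
        raise ValueError(f"Unresolved tokens in LLM response: {leftover}")
    return response
```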
De-identification doesn't eliminate all risk. You still need appropriate security controls, access logging, and policies. It's one layer in a defense-in-depth approach, not a silver bullet.
Implementation Considerations
Accuracy Matters
Your PHI detection must be highly accurate. False negatives (missing actual PHI) create compliance risk. False positives (flagging non-PHI) degrade the quality of your LLM interactions. Invest in robust detection that's tuned for medical terminology.
Context Preservation
The way you tokenize matters for LLM performance. Replace "John Smith" with "[PATIENT_1]" rather than "[REDACTED]"—this preserves the semantic structure of the text and helps the LLM understand what type of entity is being referenced.
Audit Everything
HIPAA requires audit trails. Log every request, what PHI was detected, how it was tokenized, and what was sent to the LLM. This documentation is essential for compliance audits.
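As an illustration, here's one shape such a log entry might take. The field names are ours, and note the keyed hash: logging raw PHI values would just create a second PHI store to protect:

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Keyed hash so low-entropy values (SSNs, dates) can't be brute-forced
# back out of the log. Load the key from your secrets manager.
AUDIT_HMAC_KEY = b"replace-with-a-managed-secret"

def fingerprint(value: str) -> str:
    return hmac.new(AUDIT_HMAC_KEY, value.encode(), hashlib.sha256).hexdigest()

def audit_entry(request_id: str, findings: list[dict], prompt_sent: str) -> str:
    """Serialize one record: what was detected, how it was tokenized,
    and exactly what left your infrastructure."""
    return json.dumps({
        "request_id": request_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "phi_detected": [
            {"type": f["type"], "token": f.get("token"),
             "value_hmac": fingerprint(f["value"])}
            for f in findings
        ],
        "prompt_sent": prompt_sent,
    })
```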
Local Processing
The PHI detection and tokenization should happen within your own infrastructure, not in a third-party service. This minimizes the number of parties handling PHI.
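As a concrete example (our choice of tool, not the only option), Microsoft Presidio is an open-source detector that runs entirely in-process on your own hardware and can be extended with recognizers for your custom identifier formats:

```python
# pip install presidio-analyzer, plus a spaCy model (en_core_web_lg).
# The engine loads locally; no text leaves the process.
from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()

text = "John Smith, MRN 4821337, can be reached at (555) 123-4567."
for result in analyzer.analyze(text=text, language="en"):
    print(result.entity_type, text[result.start:result.end], result.score)
```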
Common Pitfalls
Based on our experience working with healthcare organizations, here are the most common mistakes:
- Relying solely on BAAs: A BAA doesn't make non-compliant practices compliant. You still need to minimize PHI transmission.
- Incomplete detection: Custom identifiers (patient IDs, case numbers) are often missed by generic PII detectors.
- Ignoring the response: PHI can appear in LLM responses too, especially in conversational contexts. Monitor both directions.
- Inconsistent enforcement: Compliance must be systematic. One developer bypassing the guardrails can create liability.
The Path Forward
HIPAA compliance doesn't mean you can't use LLMs—it means you need to be thoughtful about how you use them. The healthcare organizations seeing the most success are those that:
- Build compliance into their architecture from day one
- Automate PHI detection and de-identification
- Maintain comprehensive audit trails
- Regularly review and update their controls
The potential of AI in healthcare is too significant to ignore. With the right architecture, you can capture that potential while meeting your compliance obligations.