BSidesPDX-2025

Keeping PHI Out of the Model: Practical Patterns for Privacy-Preserving LLMs in Healthcare
2025-10-25, Talk 2

LLMs are racing into clinics and back offices, but a single prompt, log, or misstep can leak Protected Health Information (PHI) and erode trust. This fast-paced, vendor-agnostic talk shows how to ship useful Large Language Model (LLM) features in healthcare without violating privacy or slowing delivery. Instead of theory, we’ll focus on what can go wrong across the LLM lifecycle (training, prompts, logs, embeddings, and more), how to think like an attacker, and how to translate all of it into a pragmatic, privacy-by-design workflow you can adopt immediately. You’ll leave with a concise blueprint, a threat-to-control matrix you can copy into your program, and a simple decision rubric for on-premises versus cloud deployments. If you own security, ML, or compliance and need practical patterns, this session is for you!


Healthcare AI systems face two simultaneous pressures: deliver real utility (across intake, documentation, triage, and clinical guidance) and avoid exposing Protected Health Information (PHI) at any point in the lifecycle. This talk presents a practical, privacy-by-design workflow for Large Language Model (LLM) use in healthcare that teams can implement without stalling delivery.

We begin with a concise threat model that traces how PHI can leak during training, inference, logging and analytics. From there, we build a layered architecture:
(1) a deterministic de-identification pipeline that removes direct identifiers, tokenizes sensitive terms, and generalizes remaining quasi-identifiers before text reaches prompts or training data (see the sketch after this list)
(2) input, output, and system guardrails that block prompt injection, redact emergent PHI, enforce tone and policy, and create auditable traces
(3) Retrieval Augmented Generation (RAG) constrained to pre-approved, up-to-date clinical sources to reduce hallucinations and citation drift
(4) a hosting decision rubric for on-device/on-premises versus cloud deployments, weighing factors such as control and scale alongside the compensating controls each option requires
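To make item (1) concrete, here is a minimal sketch of a deterministic de-identification pass. The regex patterns, token format, and sample note are illustrative assumptions rather than a reference implementation; a production pipeline would add a vetted NER-based PHI detector (names, addresses) and site-specific rules.

```python
import re
from typing import Dict, Tuple

# Hypothetical, simplified patterns for a few direct identifiers.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b"),
}
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/(\d{4})\b")


def deidentify(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace direct identifiers with opaque tokens; generalize dates to a year.

    Returns the scrubbed text plus a token map that stays outside the LLM
    boundary, so values can be re-linked downstream only where policy allows.
    """
    token_map: Dict[str, str] = {}

    def tokenize(kind: str, value: str) -> str:
        token = f"[{kind}_{len(token_map) + 1}]"
        token_map[token] = value
        return token

    for kind, pattern in PATTERNS.items():
        text = pattern.sub(lambda m, k=kind: tokenize(k, m.group(0)), text)

    # Generalizing exact dates to the year is one common quasi-identifier rule.
    text = DATE.sub(lambda m: f"[DATE_{m.group(1)}]", text)
    return text, token_map


if __name__ == "__main__":
    note = "Pt seen 03/14/2025, MRN: 00123456. Callback 503-555-0199, jdoe@example.org."
    scrubbed, mapping = deidentify(note)
    print(scrubbed)  # only opaque tokens and a generalized date reach the prompt
```

Keeping the token map outside the LLM boundary is the key design choice: the model only ever sees placeholders, while authorized downstream systems can re-link values when policy permits.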

On top of that foundation, we cover where Privacy Enhancing Technologies (PETs) fit: Differential Privacy during training to resist membership and attribute inference (sketched below), Federated Learning with Secure Aggregation to keep raw data local while learning across institutions, Confidential Computing to protect data in use at inference and training time, and Machine Unlearning to honor “right to be forgotten” requests without full retrains. The aim is for attendees to leave with a minimal threat-to-control matrix, a rollout checklist, and concrete patterns they can adopt in hospital or vendor environments.
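As a rough illustration of the Differential Privacy piece, the sketch below shows the core DP-SGD update: clip each per-example gradient so no single record dominates, then add Gaussian noise calibrated to that clipping bound. The function name, hyperparameters, and NumPy framing are assumptions for readability; real training would rely on an audited DP library and a privacy accountant to track the epsilon budget.

```python
import numpy as np


def dp_sgd_step(params, per_example_grads, lr=0.05,
                clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One differentially private SGD update over a batch of per-example gradients."""
    if rng is None:
        rng = np.random.default_rng(0)

    clipped = []
    for g in per_example_grads:
        # Clip each record's gradient to bound any one patient's influence.
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))

    summed = np.sum(clipped, axis=0)
    # Gaussian noise scaled to the clipping bound masks individual contributions.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    return params - lr * (summed + noise) / len(per_example_grads)


if __name__ == "__main__":
    params = np.zeros(4)
    batch_grads = [np.array([0.3, -1.2, 0.7, 2.5]),
                   np.array([0.1, 0.4, -0.2, 0.0])]
    print(dp_sgd_step(params, batch_grads))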

I’m Anoop Nadig, a security engineer with seven years of experience. I specialize in Cloud and Application security, with professional interests in automation, threat modeling, and “shift-left” practices.

Outside of work, you’ll often find me on a hiking trail, at a live concert, or supporting security conferences and community initiatives.