DICOM De-identification

Anonymization Solutions

Automated and scalable tool for your organization

Why DICOM De-identification Matters

Protecting Patient Privacy

Medical images stored in DICOM format contain a wealth of sensitive information, including patient names, dates of birth, medical record numbers, and anatomical details. Unauthorized access to this information can lead to breaches of patient privacy, identity theft, and legal repercussions for healthcare providers. DICOM de-identification mitigates these risks by stripping identifiable data from images, so that shared images no longer reveal who the patient is.

Regulatory Compliance

Healthcare organizations must adhere to stringent regulations such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States and the General Data Protection Regulation (GDPR) in Europe, which mandate the protection of patient data. DICOM de-identification is essential for compliance with these regulations, as it helps healthcare providers meet their legal obligations while facilitating data sharing for research and clinical purposes.

Promoting Data Sharing and Collaboration

DICOM de-identification promotes data sharing and collaboration among healthcare professionals, researchers, and institutions. By anonymizing patient information, DICOM images can be shared more freely for educational purposes, multi-center studies, and quality improvement initiatives, leading to improved patient outcomes and advancements in medical science.

How we de-identify/anonymize DICOM

A DICOM document can contain PHI and PII in four places:

Metadata

In most cases, DICOM metadata contains PHI. Below are some common examples of Protected Health Information (PHI) often present in DICOM metadata:

Patient Name: The complete name or other identifying details of the patient.
Patient ID: A unique identifier for the patient, which may consist of a medical record number or another patient-specific code.
Date of Birth: The patient’s birth date, valuable for confirming identity.
Study Date: The date when the medical examination, like an MRI or CT scan, took place.
Accession Number: A unique identifier assigned to a specific study or examination.
Institution Name: The name of the healthcare facility where the study occurred.
Referring Physician: The name of the physician who referred the patient for the study.
Performing Physician: The name of the physician who conducted the study.
Patient History: Relevant medical history or clinical information about the patient’s condition.
Patient’s Sex: The gender of the patient.
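
The fields listed above map to standard DICOM attribute keywords such as PatientName and AccessionNumber. The following is a minimal sketch of rule-based metadata scrubbing, not the product's implementation. It assumes the metadata has already been read into a plain keyword-to-value dictionary; the RULES table, the three actions, and the scrub_metadata function are hypothetical names chosen for illustration.

```python
# Hypothetical rule table: DICOM attribute keyword -> action.
# "remove" deletes the attribute, "empty" keeps it with a blank value,
# "keep" leaves it untouched. The choices below are illustrative only.
RULES = {
    "PatientName": "empty",
    "PatientID": "empty",
    "PatientBirthDate": "empty",
    "StudyDate": "empty",
    "AccessionNumber": "empty",
    "InstitutionName": "remove",
    "ReferringPhysicianName": "empty",
    "PerformingPhysicianName": "remove",
    "AdditionalPatientHistory": "remove",
    "PatientSex": "keep",
}


def scrub_metadata(metadata: dict, rules: dict = RULES) -> dict:
    """Return a de-identified copy of a keyword -> value mapping."""
    cleaned = {}
    for keyword, value in metadata.items():
        action = rules.get(keyword, "keep")  # unknown keywords are kept
        if action == "remove":
            continue
        cleaned[keyword] = "" if action == "empty" else value
    return cleaned


example = {
    "PatientName": "DOE^JOHN",
    "PatientID": "12345",
    "InstitutionName": "General Hospital",
    "PatientSex": "M",
    "Modality": "CT",
}
print(scrub_metadata(example))
```

Returning a copy rather than editing in place keeps the original values available for later steps, such as searching the image for the same strings.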

Pixel data

"Burned-in pixel data" refers to image information permanently embedded within the image itself, becoming an inseparable component of the pixel data. This integration makes the information resistant to easy removal or modification since it's part of the image's raw data.

This is the most challenging and expensive part, because pixel data is large. A single frame can sometimes be 1 to 2 GB, and our solution supports processing DICOM files up to 4 GB.
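
To see how a single frame can reach that size, the uncompressed size of a frame follows from the standard Rows, Columns, Samples per Pixel, and Bits Allocated attributes. The sketch below is only arithmetic; the function name is hypothetical, and it assumes Bits Allocated is a multiple of 8.

```python
def uncompressed_frame_bytes(rows: int, columns: int,
                             samples_per_pixel: int,
                             bits_allocated: int) -> int:
    """Size of one uncompressed frame in bytes.

    Assumes bits_allocated is a multiple of 8 (for example 8 or 16).
    """
    return rows * columns * samples_per_pixel * (bits_allocated // 8)


# A typical 512 x 512 grayscale slice stored with 16 bits per sample.
print(uncompressed_frame_bytes(512, 512, 1, 16))      # 524288

# A 20000 x 20000 RGB frame with 8 bits per sample: about 1.2 GB.
print(uncompressed_frame_bytes(20000, 20000, 3, 8))   # 1200000000
```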

Another challenge with pixel data is the variety of color schemes and compression methods. ApicomPro's solution supports the following color schemes (Photometric Interpretation values) in DICOM:

MONOCHROME2: Grayscale images in which each pixel holds a single intensity value and the lowest value is displayed as black. It is frequently used for X-rays and other grayscale medical images.
RGB: This color schema is used for full-color images, where each pixel is represented by three color channels: red, green, and blue. Combining these three channels in varying intensities creates a wide range of colors, making it suitable for standard color images.
YBR: YBR stands for YCbCr (Luminance, Chrominance Blue, Chrominance Red). It is a color schema used to represent color images in a way that separates the luminance (brightness) information from the chrominance (color) information. It’s often used in medical imaging and JPEG compression.
YBR_FULL: A YCbCr encoding in which every pixel carries its own luminance value and both chrominance values, with no chroma subsampling, so the full color information is preserved while luminance and chrominance stay separate.
YBR_FULL_422: A variation of YBR_FULL that uses 4:2:2 chroma subsampling. Chrominance is stored at half the horizontal resolution of luminance, which reduces the amount of data while preserving good color quality, making it useful for compression.
PALETTE COLOR: Each pixel stores an index into a color palette instead of its own color values. It is an efficient way to store and transmit color images with a limited set of colors, much like GIF images.
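
Because text detection usually expects a single color representation, one practical consequence of the list above is a normalization step before OCR. The following is a minimal sketch of converting one pixel to RGB, assuming 8-bit samples. The function names are hypothetical, the YBR_FULL conversion uses the usual full-range YCbCr equations (as in JPEG/JFIF), and chroma-subsampled data is out of scope here.

```python
def _clamp(value: float) -> int:
    """Round and clamp a sample to the 8-bit range 0..255."""
    return max(0, min(255, round(value)))


def ybr_full_to_rgb(y: int, cb: int, cr: int) -> tuple:
    """Full-range YCbCr to RGB for one 8-bit pixel."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return (_clamp(r), _clamp(g), _clamp(b))


def to_rgb(pixel, photometric: str, palette=None) -> tuple:
    """Convert one pixel to an (R, G, B) tuple.

    pixel is an int for MONOCHROME2 and PALETTE COLOR,
    and a 3-tuple for RGB and YBR_FULL.
    """
    if photometric == "MONOCHROME2":
        return (pixel, pixel, pixel)          # gray: equal channels
    if photometric == "RGB":
        return tuple(pixel)
    if photometric == "YBR_FULL":
        return ybr_full_to_rgb(*pixel)
    if photometric == "PALETTE COLOR":
        return tuple(palette[pixel])          # index into the lookup table
    raise ValueError(f"unsupported photometric interpretation: {photometric}")


palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0)]
print(to_rgb(200, "MONOCHROME2"))
print(to_rgb((128, 128, 128), "YBR_FULL"))
print(to_rgb(2, "PALETTE COLOR", palette=palette))
```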

Overlay data

"Overlay data" denotes additional graphical or textual elements that can overlay medical images, serving to provide annotations, measurements, or other enhancing information for interpretation. This overlay data is structured into overlay planes, each representing a distinct layer of graphical or textual data superimposable on the primary image. Multiple overlay planes enable the addition of various types of annotations or information to an image.

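Overlay planes are stored as bit-packed data, one bit per pixel, with the first pixel in the least significant bit of the first byte. The following is a minimal sketch of unpacking one plane into a grid of 0/1 values so that it can be scanned for text like any other image. The function name is hypothetical; rows and columns are assumed to come from the plane's Overlay Rows and Overlay Columns attributes.

```python
def unpack_overlay(data: bytes, rows: int, columns: int) -> list:
    """Unpack a 1-bit-per-pixel overlay plane into a rows x columns grid."""
    bits = []
    for byte in data:
        for position in range(8):             # least significant bit first
            bits.append((byte >> position) & 1)
    bits = bits[: rows * columns]             # drop padding bits at the end
    return [bits[r * columns:(r + 1) * columns] for r in range(rows)]


# Two bytes holding a 3 x 3 plane with three pixels set.
plane = unpack_overlay(bytes([0b00000101, 0b00000001]), rows=3, columns=3)
for row in plane:
    print(row)
```
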
Encapsulated Documents

Encapsulated Documents in DICOM are utilized to associate textual or document-based information with medical images. These documents may encompass clinical reports, patient histories, annotations, or any other relevant textual data. DICOM Encapsulated Documents support various formats such as PDF, HTML, or plain text, with the format typically specified within the DICOM object.

How we automate de-identification/anonymization

Extract each frame from the pixel data
Extract the overlay data
Detect text on each frame and overlay using a deep learning model
Recognize the detected text
Detect PHI in the recognized text using LLM or NER models
Detect PHI in the metadata
Find PHI values from the metadata on the image
Update the pixel and overlay data in the original DICOM file, to avoid breaking the internal structure of the DICOM document
Update the metadata
Compress the pixel data with lossless algorithms to reduce the size of the resulting file
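
The middle of this pipeline (deciding which recognized text is PHI, then updating the pixels) can be sketched as follows. This is an illustration under stated assumptions, not the production code: the TextBox structure stands in for the output of the text detection and recognition models, a simple date regex stands in for the LLM or NER model, and a frame is a plain list of rows. All names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class TextBox:
    """One recognized text region (assumed OCR output)."""
    text: str
    x: int
    y: int
    width: int
    height: int


# Stand-in for an NER model: matches dates such as 20240611 or 2024-06-11.
DATE_PATTERN = re.compile(r"\b\d{4}[-./]?\d{2}[-./]?\d{2}\b")


def is_phi(text: str, metadata_values: list) -> bool:
    """True if the text repeats a metadata PHI value or looks like a date."""
    lowered = text.lower()
    if any(v and v.lower() in lowered for v in metadata_values):
        return True
    return bool(DATE_PATTERN.search(text))


def redact(frame: list, boxes: list, metadata_values: list, fill: int = 0) -> int:
    """Overwrite PHI boxes in place and return how many were redacted."""
    redacted = 0
    for box in boxes:
        if not is_phi(box.text, metadata_values):
            continue
        for row in range(box.y, min(box.y + box.height, len(frame))):
            for col in range(box.x, min(box.x + box.width, len(frame[row]))):
                frame[row][col] = fill
        redacted += 1
    return redacted


frame = [[255] * 8 for _ in range(4)]
boxes = [TextBox("DOE^JOHN", 0, 0, 4, 1), TextBox("L", 6, 3, 1, 1)]
print(redact(frame, boxes, ["DOE^JOHN", "12345"]))
print(frame[0])
```

Modifying the frame in place mirrors the step of updating pixel data inside the original file rather than rebuilding the document from scratch.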

How we scale the de-identification/anonymization process

For single-frame documents, we can run a REST API service and scale it horizontally through containerization.

For multi-frame documents, we prefer to run the pipeline on Spark and distribute the processing of each frame. This approach makes it possible to handle very large files; we have experience with 3 to 4 GB files containing thousands of frames.
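
The per-frame distribution idea can be illustrated on a single machine. The sketch below deliberately swaps Spark for a local thread pool from the Python standard library, so it shows only the shape of the approach: each frame is processed independently and the results are collected in the original frame order. The deidentify_frame placeholder and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def deidentify_frame(frame: list) -> list:
    """Placeholder per-frame step: returns a copy with the top row blanked."""
    cleaned = [row[:] for row in frame]
    if cleaned:
        cleaned[0] = [0] * len(cleaned[0])
    return cleaned


def process_frames(frames: list, max_workers: int = 4) -> list:
    """Process frames independently; map() keeps the original order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(deidentify_frame, frames))


frames = [[[i] * 3 for _ in range(2)] for i in range(1, 6)]
result = process_frames(frames)
print(len(result))
print(result[2])
```

Because no frame depends on another, the same per-frame function could be handed to a distributed engine instead of a local pool without changing its logic.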
