Anonymize DICOM Pixel Data: A Comprehensive Guide

July 10, 2024 | Olga Druchek
dicom de-identification

In today’s healthcare landscape, protecting patient privacy and data security is paramount. Medical imaging data, particularly in the DICOM (Digital Imaging and Communications in Medicine) format, often contains sensitive patient information that must be safeguarded. ApicomPro offers a robust solution through CleverDoc, a sophisticated tool designed to anonymize DICOM pixel data effectively using Apache Spark. This comprehensive guide will explore the importance of anonymizing DICOM data, the features and benefits of CleverDoc, and provide a step-by-step tutorial on how to use this powerful tool.

The Importance of Anonymizing DICOM Data

Medical images, such as X-rays, CT scans, and MRIs, are typically stored in DICOM format. This format includes the actual image data and a wealth of metadata that can contain patient identifiers, such as names, birth dates, and other personal information. The need to anonymize this data arises from several key concerns:

  1. Patient Privacy: Ensuring that protected health information (PHI) remains confidential is a fundamental right and a cornerstone of medical ethics.
  2. Regulatory Compliance: Laws such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States and the General Data Protection Regulation (GDPR) in Europe mandate stringent measures for protecting personal data.
  3. Research and Data Sharing: Anonymized data can be used in medical research and shared between institutions without the risk of exposing patient identities, fostering advancements in healthcare.

Introducing CleverDoc by ApicomPro

CleverDoc is a cutting-edge tool developed by ApicomPro specifically for anonymizing DICOM, PDF, image, and text files. It provides a seamless, efficient, and reliable solution to ensure that all sensitive patient information is removed from medical images and file metadata. CleverDoc stands out due to its comprehensive de-identification capabilities and robust compliance with regulatory standards.

Explore CleverDoc in action through this detailed workshop notebook on GitHub.

Key Features of CleverDoc

CleverDoc is designed with several powerful features to address the complexities of DICOM data anonymization:

  1. Comprehensive De-Identification: CleverDoc meticulously removes all identifiable metadata and anonymizes pixel and overlay data within DICOM files. This includes patient names, IDs, and other personal information embedded in the image data itself.
  2. Automation Capabilities: CleverDoc supports:
    - Batch processing of multiple DICOM files.
    - Streaming processing for real-time handling of new files.
    - A REST API server for easy integration into an existing microservice infrastructure.
  3. Compliance Assurance: CleverDoc ensures that processed files meet the stringent requirements of data privacy regulations, providing peace of mind to healthcare providers.
  4. Scalability: Whether handling a small set of images or an extensive medical database, CleverDoc uses Apache Spark to scale to the needs of any healthcare organization.
  5. Performance: CleverDoc is optimized for efficiency, capable of processing one frame in approximately 0.2 seconds. This rapid processing time ensures that large volumes of DICOM files can be anonymized quickly and without significant delay.
  6. Large File Support: CleverDoc is designed to handle substantial file sizes, making it suitable for extensive medical imaging datasets. The tool supports:
    - File sizes up to 3 GB, allowing for comprehensive storage of detailed medical images.
    - Up to 2000 frames per file, accommodating multi-frame DICOM files often used in advanced imaging techniques like MRI and CT scans.
    - Frame sizes up to 1 GB, ensuring that even high-resolution images are processed efficiently and accurately.
  7. CPU and GPU Support: With support for both CPU and GPU, CleverDoc provides versatile performance options, catering to various operational needs and infrastructure capabilities within healthcare organizations. This ensures that regardless of the existing hardware, CleverDoc can deliver efficient and effective DICOM data anonymization.
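The figures in the list above lend themselves to a quick back-of-the-envelope check. The sketch below is plain Python and not part of CleverDoc: the helper names are hypothetical, 1 GB is assumed to mean 10**9 bytes, and the 0.2 seconds per frame figure is treated as a rough average that divides evenly across parallel workers, which real workloads may not do.

```python
# Limits and throughput quoted in the feature list above.
MAX_FILE_BYTES = 3 * 10**9    # 3 GB per file (assuming 1 GB = 10**9 bytes)
MAX_FRAMES = 2000             # frames per file
MAX_FRAME_BYTES = 1 * 10**9   # 1 GB per frame
SECONDS_PER_FRAME = 0.2       # approximate processing time per frame


def within_limits(file_bytes: int, frames: int, frame_bytes: int) -> bool:
    """Return True if a file fits the documented size limits."""
    return (
        file_bytes <= MAX_FILE_BYTES
        and frames <= MAX_FRAMES
        and frame_bytes <= MAX_FRAME_BYTES
    )


def estimate_seconds(total_frames: int, workers: int = 1) -> float:
    """Rough time estimate, assuming frames spread evenly over workers."""
    return total_frames * SECONDS_PER_FRAME / workers


# Hypothetical workload: 500 studies of 120 frames each on 8 parallel workers.
print(within_limits(file_bytes=250 * 10**6, frames=120, frame_bytes=2 * 10**6))
print(estimate_seconds(500 * 120, workers=8))
```

Under these assumptions, the hypothetical workload of 60,000 frames comes to roughly 25 minutes of processing.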

How to Use CleverDoc: A Step-by-Step Guide

Using CleverDoc to anonymize DICOM data is a straightforward process. Here’s a detailed guide to help you get started:

Step 1: Install CleverDoc

Before you can start using CleverDoc, you need to install the tool.

pip install -U cleverdoc[inference]

For GPU usage:

pip install -U cleverdoc[inference-gpu]

Step 2: Start Spark Session with CleverDoc

Once CleverDoc is installed, the next step is to start a Spark session with your license:

from cleverdoc import *

license = "your_license_here"
spark = start(license)
spark

Step 3: Load Your DICOM Files

Once the Spark session is running, the next step is to load your DICOM files. The example below reads every .dcm file in the current directory into a DataFrame using Spark's binaryFile data source; point dicom_path at the files or directories containing the DICOM images that you wish to anonymize.

dicom_path = "./*.dcm"
df = spark.read.format("binaryFile") \
 .load(dicom_path)
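Because the binaryFile source loads whatever matches the path, it can be useful to confirm beforehand that the files really are DICOM. A minimal sketch in plain Python, independent of CleverDoc: DICOM Part 10 files carry the marker "DICM" right after a 128-byte preamble. The helper name `looks_like_dicom` is hypothetical, and some legacy files omit the preamble, so a negative result is a hint rather than proof.

```python
import tempfile
from pathlib import Path


def looks_like_dicom(path: Path) -> bool:
    """Check for the 'DICM' marker that follows the 128-byte preamble."""
    with open(path, "rb") as fh:
        header = fh.read(132)
    return len(header) == 132 and header[128:132] == b"DICM"


# Demo with two synthetic files (no real patient data involved).
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.dcm"
    bad = Path(tmp) / "bad.dcm"
    good.write_bytes(b"\x00" * 128 + b"DICM" + b"\x00" * 16)
    bad.write_bytes(b"not a dicom file")
    checks = {p.name: looks_like_dicom(p) for p in (good, bad)}

print(checks)
```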

Step 4: Define pipeline

Next, define a Spark ML pipeline with the following stages:

  • DicomToImage: extracts images from pixel and overlay data
  • ImageToStringOnnx: detects and recognizes text on the extracted images
  • Ner: detects PHI/PII in the recognized text
  • DicomDrawRegions: hides the detected PHI/PII on the original DICOM file

from pyspark.ml.pipeline import PipelineModel
import pyspark.sql.functions as f

dicom = DicomToImage() \
    .setInputCols(["content"]) \
    .setOutputCol("image") \
    .setKeepInput(True)

image_to_string = ImageToStringOnnx()

ner = Ner() \
    .setModel("ApicomPro/deid-bert-onnx-1.2.0") \
    .setNumPartitions(1) \
    .setThreshold(0.8) \
    .setDevice(Device.CPU.value)

draw_regions = DicomDrawBoxes() \
    .setInputCols(["content", "ner"]) \
    .setOutputCol("dicom") \
    .setAggCols(["path", "content"]) \
    .setKeepInput(True) \
    .setCompression(DicomCompression.RLELossless) \
    .setForceCompress(False)


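def pipeline(debug=False):
    # Extraction, OCR, and NER always run; they produce the intermediate columns.
    stages = [
        dicom,
        image_to_string,
        ner
    ]
    # In debug mode, skip the drawing stage so intermediate results can be inspected.
    if not debug:
        stages.append(draw_regions)
    return PipelineModel(stages)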

CleverDoc provides a range of settings to customize the anonymization process. You can specify which PHI/PII to anonymize and apply different anonymization techniques to pixel data. The tool offers flexibility to ensure that all sensitive information is effectively removed while retaining the essential medical data.
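One way to picture this kind of customization is as a filter over detected entities. The following is a plain-Python sketch, not CleverDoc's API: `select_entities` and its `groups` parameter are hypothetical, the entity dictionaries only mimic the `entity_group`, `score`, and `word` fields of the NER output, and the sample scores are made up for illustration. The default threshold of 0.8 mirrors the `setThreshold(0.8)` call in the pipeline.

```python
from typing import Iterable, List, Optional, Set


def select_entities(
    entities: Iterable[dict],
    threshold: float = 0.8,
    groups: Optional[Set[str]] = None,
) -> List[dict]:
    """Keep entities scoring at least `threshold`, optionally limited to `groups`."""
    return [
        e for e in entities
        if e["score"] >= threshold
        and (groups is None or e["entity_group"] in groups)
    ]


# Illustrative sample; the scores are made up, not real model output.
sample = [
    {"entity_group": "ID", "score": 0.99, "word": "0545234234325V"},
    {"entity_group": "PATIENT", "score": 0.87, "word": "Stiles"},
    {"entity_group": "AGE", "score": 0.99, "word": "78"},
    {"entity_group": "DATE", "score": 0.55, "word": "Oct"},
]

to_hide = select_entities(sample, threshold=0.8, groups={"ID", "PATIENT", "DATE"})
print([e["word"] for e in to_hide])
```

Here the age is dropped because its group was not requested, and the date is dropped because its score falls below the threshold.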

Step 5: Run pipeline and display intermediate results

To check the intermediate results of the de-identification process, let’s run the pipeline with the debug=True param:

result = pipeline(True).transform(df).cache()

And display the extracted image:

show_images(result, "image", limit=1)

Show the recognized text:

print(result.select("text.text").collect()[0].text)

Output:

Chest
se: 3/2
Im: 1/1
Lat: F

ACC: 0545234234325V
2018 Oct 23
Img Tm: 14:23:53
Name: John Stiles
Age: 78
DOB: 05/12/42

1d DCM/Lin: DCM/ a 
w 3498 l 2000

Show detected PHI/PII entities:

result.limit(1).select(f.explode("ner.entities").alias("entities")).select("entities.*").show(50)

Output:

+------------+------------------+--------------+-----+---+--------------------+
|entity_group|             score|          word|start|end|               boxes|
+------------+------------------+--------------+-----+---+--------------------+
|          ID| 0.995708703994751|0545234234325V|   35| 49|[{0545234234325V,...|
|        DATE|0.9835751056671143|           Oct|   55| 58|[{Oct, 0.99454885...|
|     PATIENT|  0.98607337474823|          John|   85| 89|[{John, 0.5720782...|
|     PATIENT|0.8783086538314819|        Stiles|   90| 96|[{Stiles, 0.99987...|
|         AGE|0.9990577101707458|            78|  102|104|[{78, 0.481130033...|
|        DATE|0.9997254014015198|            05|  110|112|[{05/12/42, 0.995...|
|        DATE|0.9995417594909668|            12|  113|115|[{05/12/42, 0.995...|
|        DATE|0.8975017666816711|            42|  116|118|[{05/12/42, 0.995...|
+------------+------------------+--------------+-----+---+--------------------+

As a result, we have detected the sensitive data and know the coordinates on the image for each entity.
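The start and end columns in the table are character offsets into the recognized text, with the end offset exclusive. The sketch below, in plain Python and independent of CleverDoc, uses the offsets from the output above to merge neighbouring tokens of the same entity group (for example "John" and "Stiles") and mask them in the text. The helper names and the `max_gap` parameter are hypothetical, and the text is the recognized output up to the DOB line.

```python
text = (
    "Chest\nse: 3/2\nIm: 1/1\nLat: F\n\n"
    "ACC: 0545234234325V\n2018 Oct 23\nImg Tm: 14:23:53\n"
    "Name: John Stiles\nAge: 78\nDOB: 05/12/42\n"
)

# (entity_group, start, end) taken from the NER output table.
entities = [
    ("ID", 35, 49), ("DATE", 55, 58), ("PATIENT", 85, 89), ("PATIENT", 90, 96),
    ("AGE", 102, 104), ("DATE", 110, 112), ("DATE", 113, 115), ("DATE", 116, 118),
]


def merge_adjacent(spans, max_gap=1):
    """Merge same-group spans separated by at most `max_gap` characters."""
    merged = []
    for group, start, end in sorted(spans, key=lambda s: s[1]):
        if merged and merged[-1][0] == group and start - merged[-1][2] <= max_gap:
            merged[-1] = (group, merged[-1][1], end)
        else:
            merged.append((group, start, end))
    return merged


def mask(source, spans):
    """Replace every character inside the spans with an asterisk."""
    chars = list(source)
    for _, start, end in spans:
        chars[start:end] = "*" * (end - start)
    return "".join(chars)


merged = merge_adjacent(entities)
masked = mask(text, merged)
print(merged)
print(masked)
```

Merging reduces the eight tokens to five spans, so the full name and the full birth date are each covered by a single span. Note that only "Oct" was detected in the study date, so "2018" and "23" remain visible in the masked text.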

Step 6: Show final results and compare with original file

As the last step, we run the pipeline with the debug=False param and show the original file (left side) and the processed file (right side):

result = pipeline(False).transform(df).cache()
show_dicom(result, "content,dicom", show_meta=True)

[Image: DICOM anonymization]
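Conceptually, hiding a detected region means overwriting the pixels inside its bounding box. The sketch below illustrates that idea on a tiny grayscale image stored as nested lists. It is not how CleverDoc implements the drawing stage, and the box fields (x, y, width, height) are an assumption, since the full structure of the boxes column is not shown in this guide.

```python
def fill_box(pixels, x, y, width, height, value=0):
    """Overwrite a rectangular region in place, clipped to the image bounds."""
    rows = len(pixels)
    cols = len(pixels[0]) if rows else 0
    for r in range(max(y, 0), min(y + height, rows)):
        for c in range(max(x, 0), min(x + width, cols)):
            pixels[r][c] = value
    return pixels


# A 4x8 all-white image with one hypothetical box blacked out.
image = [[255] * 8 for _ in range(4)]
fill_box(image, x=2, y=1, width=4, height=2)
for row in image:
    print(row)
```

Clipping to the image bounds keeps the helper safe when a detected box extends slightly past the edge of the frame.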

Step 7: Save and Verify Anonymized Files

Once the anonymization process is complete, save the anonymized files to your desired location. It’s crucial to verify that all sensitive data has been successfully removed. ApicomPro provides annotation/verification tools to help you ensure that the anonymization has been thorough and effective.

result \
    .withColumn("fileName", get_name_udf(f.col("path"))) \
    .withColumn("dicom", f.col("dicom.data")) \
    .write \
    .format("dicomFormat") \
    .option("type", "dicom") \
    .option("field", "dicom") \
    .option("nameField", "fileName") \
    .option("extension", "dcm") \
    .option("prefix", "") \
    .mode("append") \
    .save("de-dicom")
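One simple verification is to run text recognition again on the anonymized output and confirm that known identifiers no longer appear. The sketch below covers only that comparison step in plain Python; `find_leaks` and `normalize` are hypothetical helpers, and the recognized text is passed in as an ordinary string rather than read from a CleverDoc result. Short identifiers can match by coincidence, so any hit should be reviewed manually.

```python
import re


def normalize(value: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def find_leaks(recognized_text, known_identifiers):
    """Return the identifiers that still occur in the recognized text."""
    haystack = normalize(recognized_text)
    return [
        value for value in known_identifiers
        if normalize(value) and normalize(value) in haystack
    ]


known = ["John Stiles", "0545234234325V", "05/12/42"]
before = "ACC: 0545234234325V\nName: John Stiles\nDOB: 05/12/42"
after = "ACC:\nName:\nDOB:"

print(find_leaks(before, known))
print(find_leaks(after, known))
```

Normalizing both sides makes the check tolerant of differences in spacing, case, and punctuation between the source identifiers and the recognized text.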

For a more detailed walkthrough, refer to the de-identification page on ApicomPro’s website.

Benefits of Using CleverDoc

By utilizing CleverDoc for anonymizing DICOM data, healthcare providers can enjoy several significant benefits:

  1. Enhanced Patient Privacy: Protects patient identities by ensuring that all personal information is removed from medical images.
  2. Regulatory Compliance: Helps healthcare organizations comply with data protection laws, avoiding legal issues and potential fines.
  3. Streamlined Workflows: Automation capabilities save time and reduce the manual effort required to anonymize large volumes of data.
  4. Facilitated Research: Enables the sharing of medical data for research purposes without compromising patient confidentiality.
  5. Operational Efficiency: Scalability makes it suitable for healthcare providers of all sizes.

Conclusion

Anonymizing DICOM data is crucial for protecting patient privacy, ensuring regulatory compliance, and enabling secure medical research. CleverDoc from ApicomPro provides a powerful, user-friendly solution to address these needs, ensuring that sensitive patient information is effectively anonymized. For more information and to get started, visit the de-identification page or explore the workshop notebook on GitHub.

Embrace the power of CleverDoc and make patient privacy a top priority in your medical imaging workflows. With CleverDoc, you can confidently handle DICOM data, knowing that patient information is secure and compliant with the highest standards of data protection.

Originally posted on Medium