
Anonymizing Healthcare Data: A Case Study on the Blood Transfusion Service

Noman Mohammed, Concordia University, Montreal, QC, Canada
Benjamin C. M. Fung, Concordia University, Montreal, QC, Canada
Patrick C. K. Hung, UOIT, Oshawa, ON, Canada
Cheuk-kwong Lee, Hong Kong Red Cross Blood Transfusion Service, Kowloon, Hong Kong

KDD 2009

Outline
  • Motivation & background
  • Privacy threats & information needs
  • Challenges
  • LKC-privacy model
  • Experimental results
  • Related work
  • Conclusions

Motivation & background
  • Organization: Hong Kong Red Cross Blood Transfusion Service and Hospital Authority

Data flow in Hong Kong Red Cross
  [Figure: data flow diagram. Recoverable labels: Donors; Collect Blood; Blood Donor Data & Blood Information; Blood Usage Report Generator; Submit Report; Publish Report; Distribute Blood; Public Hospitals; Manage Patients; Collect Patient Data; Patient Health Data & Blood Usage; Privacy Aware Health Information Sharing Service; Read, Write, and Own links.]

Healthcare IT Policies
  • Hong Kong Personal Data (Privacy) Ordinance
  • Personal Information Protection and Electronic Documents Act (PIPEDA)
  • Underlying principles:
      Principle 1: Purpose and manner of collection
      Principle 2: Accuracy and duration of retention
      Principle 3: Use of personal data
      Principle 4: Security of personal data
      Principle 5: Information to be generally available

Contributions
  • A successful showcase of privacy-preserving technology
  • Proposed the LKC-privacy model for anonymizing healthcare data
  • Provided an algorithm that satisfies both the privacy and the information requirements
  • Will benefit similar challenges in information sharing

Privacy threats
  • Identity linkage: takes place when the number of records containing the same QID values is small or unique.
      Identity linkage attack (figure): an adversary among the data recipients has the knowledge: Mover, age 34.
  • Attribute linkage: takes place when the attacker can infer the value of the sensitive attribute with high confidence.
      Attribute linkage attack (figure): the adversary has the knowledge: Male, age 34.

Information needs
  • Two types of data analysis:
      Classification model on blood transfusion data
      Some general count statistics
  • Why not release a classifier or some statistical information instead of the data?
      The data holder has neither the expertise nor the interest to do the analysis itself.
      It is impractical for data recipients to continuously request new results.
      With the data in hand, recipients have much better flexibility to perform the analysis they need.
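The two linkage attacks listed under Privacy threats can be sketched in a few lines of Python. The table below is hypothetical (it is not data from the slides), and the names `TABLE`, `matching`, and `confidence` are illustrative only. The two queries mirror the adversary knowledge used on the slides: "Mover, age 34" and "Male, age 34".

```python
COLUMNS = ("Job", "Sex", "Age", "Surgery")

# Hypothetical released table, invented for this illustration.
TABLE = [
    ("Janitor", "M", 34, "Transgender"),
    ("Doctor", "M", 58, "Plastic"),
    ("Mover", "M", 34, "Transgender"),
    ("Lawyer", "M", 24, "Vascular"),
    ("Mover", "M", 58, "Urology"),
    ("Janitor", "M", 44, "Plastic"),
    ("Doctor", "F", 44, "Vascular"),
]


def matching(table, knowledge):
    """Return the records consistent with the adversary's QID knowledge."""
    index = {name: i for i, name in enumerate(COLUMNS)}
    return [
        rec for rec in table
        if all(rec[index[attr]] == value for attr, value in knowledge.items())
    ]


def confidence(records, sensitive_value):
    """Fraction of the matching records that carry the given sensitive value."""
    if not records:
        return 0.0
    return sum(rec[-1] == sensitive_value for rec in records) / len(records)


# Identity linkage: the knowledge narrows the table down to very few records.
identity = matching(TABLE, {"Job": "Mover", "Age": 34})
print(len(identity))

# Attribute linkage: several records match, but they share the sensitive value.
attribute = matching(TABLE, {"Sex": "M", "Age": 34})
print(len(attribute), confidence(attribute, "Transgender"))
```

In this made-up table the first query matches a single record, so the victim is re-identified; the second matches two records that carry the same surgery, so the sensitive value is inferred without pinpointing the record.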

Challenges
  • Why not use the existing techniques?
  • The blood transfusion data is high-dimensional
  • It suffers from the curse of dimensionality
  • Our experiments also confirm this

Curse of high-dimensionality
  K=2, QID = {Job, Sex, Age, Education}
  Taxonomy: Job: ANY -> {Mover, Janitor}; Sex: ANY -> {Male, Female}; Age: ANY -> {25, 40}; Education: ANY -> {Primary, Secondary}

  Original records (the Sensitive Attribute column is shown without values):

  ID  Job      Sex  Age  Education  Sensitive Attribute
  1   Janitor  M    25   Primary    ...
  2   Janitor  M    40   Primary    ...
  3   Janitor  F    25   Secondary  ...
  4   Janitor  F    40   Secondary  ...
  5   Mover    M    25   Secondary  ...
  6   Mover    F    40   Primary    ...
  7   Mover    M    40   Secondary  ...
  8   Mover    F    25   Primary    ...

  After generalizing Job to Any:

  ID  Job  Sex  Age  Education  Sensitive Attribute
  1   Any  M    25   Primary    ...
  2   Any  M    40   Primary    ...
  3   Any  F    25   Secondary  ...
  4   Any  F    40   Secondary  ...
  5   Any  M    25   Secondary  ...
  6   Any  F    40   Primary    ...
  7   Any  M    40   Secondary  ...
  8   Any  F    25   Primary    ...

  After generalizing Job and Sex to Any:

  ID  Job  Sex  Age  Education  Sensitive Attribute
  1   Any  Any  25   Primary    ...
  2   Any  Any  40   Primary    ...
  3   Any  Any  25   Secondary  ...
  4   Any  Any  40   Secondary  ...
  5   Any  Any  25   Secondary  ...
  6   Any  Any  40   Primary    ...
  7   Any  Any  40   Secondary  ...
  8   Any  Any  25   Primary    ...

  Only at this point does every combination of QID values appear in at least K=2 records, and two of the four QID attributes have been generalized away to get there.

  • What if we have 10 attributes? 20 attributes? 40 attributes?
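The generalization steps in the curse-of-dimensionality example can be replayed with a short Python sketch. It uses the eight QID tuples from the slides; the helper names `generalize` and `smallest_group` are illustrative, and full suppression to "Any" stands in for whatever generalization a real algorithm would choose.

```python
from collections import Counter

# QID values of the eight example records: (Job, Sex, Age, Education).
RECORDS = [
    ("Janitor", "M", 25, "Primary"),
    ("Janitor", "M", 40, "Primary"),
    ("Janitor", "F", 25, "Secondary"),
    ("Janitor", "F", 40, "Secondary"),
    ("Mover", "M", 25, "Secondary"),
    ("Mover", "F", 40, "Primary"),
    ("Mover", "M", 40, "Secondary"),
    ("Mover", "F", 25, "Primary"),
]


def generalize(records, positions):
    """Replace the attributes at the given positions with the top value 'Any'."""
    return [
        tuple("Any" if i in positions else value for i, value in enumerate(rec))
        for rec in records
    ]


def smallest_group(records):
    """Size of the smallest group of records sharing all QID values."""
    return min(Counter(records).values())


original = smallest_group(RECORDS)
job_any = smallest_group(generalize(RECORDS, {0}))
job_sex_any = smallest_group(generalize(RECORDS, {0, 1}))
print(original, job_any, job_sex_any)  # prints: 1 1 2
```

A table is K-anonymous when `smallest_group` is at least K, so with K=2 the requirement is met only after both Job and Sex are reduced to "Any".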

LKC-privacy
  Example with L=2, K=2, C=50%:

  ID  Job      Sex  Age  Education  Surgery
  1   Janitor  M    25   Primary    Plastic
  2   Janitor  M    40   Primary    Transgender
  3   Janitor  F    25   Secondary  Transgender
  4   Janitor  F    40   Secondary  ...
  5   Mover    M    25   Secondary  ...
  6   Mover    F    40   Primary    Plastic
  7   Mover    M    40   Secondary  ...
  8   Mover    F    25   Primary    Urology

  [The Surgery values of records 4, 5, and 7 are not recoverable; the fragments "Urology" and "Vascular" appear in that part of the slides.]

  Taxonomy: Job: ANY -> {Mover, Janitor}; Sex: ANY -> {Male, Female}; Age: ANY -> {25, 40}; Education: ANY -> {Primary, Secondary}

  [Slide callouts step through the adversary's possible knowledge, labeled QID1 to QID5, and ask: "Is it possible for an adversary ... about a victim?"]

LKC-privacy (definition)
  A database T meets LKC-privacy if and only if |T(qid)| >= K and Pr(s | T(qid)) <= C for any adversary knowledge qid with |qid| <= L, where
  • s is a value of the sensitive attribute
  • K is a positive integer
  • qid denotes the adversary's prior knowledge
  • T(qid) is the group of records that contain qid
  • L bounds the number of QID values the adversary is assumed to know, and C is the confidence threshold

LKC-privacy (properties)
  • It only requires a subset of QID attributes to be shared by at least K records
  • K-anonymity is a special case of LKC-privacy with L = |QID| and C = 100%
  • Confidence bounding is a special case of LKC-privacy with L = |QID| and K = 1
  • (α, k)-anonymity is a special case of LKC-privacy with L = |QID|, K = k, and C = α
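The LKC-privacy definition can be checked directly by enumerating every combination of at most L QID attributes. The following is a minimal Python sketch of such a checker, not the authors' anonymization algorithm. The QID values follow the eight-record example; the Surgery values of records 4, 5, and 7 are hypothetical, and the function name `lkc_violations` is illustrative. Only qid combinations that actually occur in the table are examined.

```python
from collections import Counter, defaultdict
from itertools import combinations

QID = ("Job", "Sex", "Age", "Education")

# QID values follow the eight-record example. The Surgery values of
# records 4, 5 and 7 are hypothetical, filled in only for illustration.
RECORDS = [
    ("Janitor", "M", 25, "Primary", "Plastic"),
    ("Janitor", "M", 40, "Primary", "Transgender"),
    ("Janitor", "F", 25, "Secondary", "Transgender"),
    ("Janitor", "F", 40, "Secondary", "Urology"),
    ("Mover", "M", 25, "Secondary", "Vascular"),
    ("Mover", "F", 40, "Primary", "Plastic"),
    ("Mover", "M", 40, "Secondary", "Vascular"),
    ("Mover", "F", 25, "Primary", "Urology"),
]


def lkc_violations(records, L, K, C, sensitive):
    """List every qid group over at most L QID attributes that breaks LKC-privacy."""
    violations = []
    for size in range(1, min(L, len(QID)) + 1):
        for attrs in combinations(range(len(QID)), size):
            groups = defaultdict(list)  # qid values -> sensitive values in T(qid)
            for rec in records:
                groups[tuple(rec[i] for i in attrs)].append(rec[-1])
            names = tuple(QID[i] for i in attrs)
            for qid, values in groups.items():
                if len(values) < K:  # identity linkage: |T(qid)| < K
                    violations.append((names, qid, "|T(qid)| < K"))
                    continue
                counts = Counter(values)
                for s in sensitive:  # attribute linkage: Pr(s | T(qid)) > C
                    if counts[s] / len(values) > C:
                        violations.append((names, qid, f"Pr({s} | T(qid)) > C"))
    return violations


SENSITIVE = {"Plastic", "Transgender", "Urology", "Vascular"}
found = lkc_violations(RECORDS, L=2, K=2, C=0.5, sensitive=SENSITIVE)
for names, qid, reason in found:
    print(names, qid, reason)
```

With these made-up Surgery values, every group over one or two attributes has at least K=2 records, but the groups containing records 5 and 7 exceed C=50%. Setting C=1.0 removes the confidence condition, and raising L to 4 turns the check into plain 2-anonymity over the full QID, which the ungeneralized table fails.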

Algorithm for LKC-privacy
  • We extended TDS (Top-Down Specialization) to incorporate LKC-privacy:
      B. C. M. Fung, K. Wang, and P. S. Yu. Anonymizing classification data for privacy preservation. In TKDE, 2007.
  • The LKC-privacy model can also be achieved by other algorithms:
      R. J. Bayardo and R. Agrawal. Data Privacy Through Optimal k-Anonymization. In ICDE, 2005.
      K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Workload-aware anonymization techniques for large-scale data sets. In TODS, 2008.

Experimental Evaluation
  We employ two real-life datasets:
  • Blood: a real-life blood transfusion dataset
      10,000 blood transfusion records in 2008
      41 attributes are QID attributes
      Blood Group represents the Class attribute (8 values)
      Diagnosis Codes represents the sensitive attribute (15 values)
  • Adult: Census data (from the UCI repository)
      6 continuous attributes

Data Utility
  [Figures: data utility results on the Blood dataset (two slides) and on the Adult dataset (two slides).]

Efficiency and Scalability
  • Took at most 30 seconds for all previous experiments

Related work
  • Y. Xu, K. Wang, A. W. C. Fu, and P. S. Yu. Anonymizing transaction databases for publication. In SIGKDD, 2008.
  • Y. Xu, B. C. M. Fung, K. Wang, A. W. C. Fu, and J. Pei. Publishing sensitive transactions for itemset utility. In ICDM, 2008.
  • M. Terrovitis, N. Mamoulis, and P. Kalnis. Privacy-preserving anonymization of set-valued data. In VLDB, 2008.
  • G. Ghinita, Y. Tao, and P. Kalnis. On the anonymization of sparse high-dimensional data. In ICDE, 2008.

Conclusions
  • Successful demonstration of a real-life application
  • It is important to educate health institute management and medical practitioners
  • Health data are complex: a combination of relational, transaction, and textual data
  • Source code and datasets: http://www.ciise.concordia.ca/~fung/pub/RedCrossKDD09/

Thank You Very Much
Q&A
