SUPOR: Precise and Scalable Sensitive User Input Detection

SUPOR: Precise and Scalable Sensitive User Input Detection for Android Apps
Jianjun Huang, Zhichun Li, Xusheng Xiao, Zhenyu Wu, Kangjie Lu, Xiangyu Zhang, Guofei Jiang
USENIX Security 2015, 8/14/15

Sensitive Data Disclosures
Sensitive data kept in an app's local storage can be disclosed to the public, or hijacked and maliciously retrieved.

Sensitive Data
Existing work focuses on sensitive data defined by certain API methods, most of which are permission protected, e.g., TelephonyManager.getDeviceId() in Android. Examples include TaintDroid [OSDI10], AndroidLeaks [TRUST12], and FlowDroid [PLDI14] for Android, and PiOS [NDSS11] for iOS.

Sensitive User Inputs
We are among the first to detect user inputs as sensitive sources in mobile apps.

None of these inputs are permission protected, e.g., a user ID/password or a credit card number. Some user inputs are sensitive, while others are insensitive.

Example: User Input Disclosures
A sensitive input (card number) and an insensitive input (comment) are both read from the UI and sent over HTTP to a web server:

    EditText txtCN = findViewById(...);          // sensitive: card number
    String cnum = txtCN.getText().toString();
    // cnum is sent over HTTP to the web server

    EditText txtCM = findViewById(...);          // insensitive: comment
    String comment = txtCM.getText().toString();
    // comment is sent over HTTP to the web server

Research Problems
How can we systematically discover the input fields in an app's UI?
How can we identify which input fields are sensitive?
How can we associate the sensitive input fields with the variables in the app that store their values?

Intuition
From the user's perspective: if we can mimic how a user looks at the UIs, we can determine which input fields may contain sensitive data within the UI context.

Feasibility
Render the statically defined UI layouts, and associate labels with input fields based on their physical locations. All three major mobile platforms support this:

                              Android   iOS                  Windows Phone
    Layout format             XML       NIB/XIB/Storyboard   XAML/HTML
    Static UI render          ADT       Xcode                Visual Studio
    APIs map widgets to code  Yes       Yes                  Yes

SUPOR: Sensitive User inPut detectOR

Background: UI
A UI is made up of widgets such as text labels and input fields; an input field may also display an input hint.

Background: Layout File
A piece of an Android layout file, for example:

    <EditText android:id="@+id/pwd" android:inputType="textPassword"/>

Here android:id is the widget's identifier, and android:inputType is an interesting attribute.

Overview of SUPOR
Given an app, SUPOR performs layout analysis (layout parsing and UI rendering), UI sensitiveness analysis (using a keyword dataset), and variable binding. Its results feed downstream analyses such as privacy disclosure analysis and vulnerability analysis.

Parsing Layout
We need to know which layout files contain input fields, because this decides whether sensitive user input detection is needed: a layout that doesn't contain input fields is skipped, and a layout that contains input fields is analyzed.
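As a rough sketch of the layout-parsing step, the Java code below uses the JDK's built-in DOM parser to decide whether a layout XML string declares any EditText input fields, and therefore whether the rest of the pipeline needs to run on it. It assumes input fields appear as plain `<EditText>` elements (matching the EditText-only scope described under Implementation & Evaluation); custom EditText subclasses and included layouts are not handled. The class name `LayoutScan` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: decides whether a layout file needs sensitive-input analysis.
final class LayoutScan {
    // Returns true if the layout XML declares at least one <EditText> element.
    static boolean containsInputFields(String layoutXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(false); // the default; match raw tag names like "EditText"
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(layoutXml.getBytes(StandardCharsets.UTF_8)));
        // getElementsByTagName searches the whole tree, not just direct children.
        return doc.getElementsByTagName("EditText").getLength() > 0;
    }

    public static void main(String[] args) throws Exception {
        String login = "<LinearLayout xmlns:android=\"http://schemas.android.com/apk/res/android\">"
            + "<EditText android:id=\"@+id/pwd\" android:inputType=\"textPassword\"/>"
            + "</LinearLayout>";
        String about = "<LinearLayout xmlns:android=\"http://schemas.android.com/apk/res/android\">"
            + "<TextView android:text=\"About\"/></LinearLayout>";
        System.out.println(containsInputFields(login)); // true: analyze this layout
        System.out.println(containsInputFields(about)); // false: skip it
    }
}
```

Scanning the XML is cheap compared with rendering, so filtering layouts this way before rendering keeps the expensive step limited to layouts that can actually hold user input.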

Rendering UI
Statically render layout files into UIs as users would see them on smartphones, via tools such as ADT in Android (e.g., for layout file A and layout file B).

Extracting Information
Collect information about each widget from the rendered UI, for example:

    Text label:  text "Card Number",     coordinates [16, 231, 109, 249]
    Input field: hint "15 or 16 digit",  coordinates [16, 249, 464, 297]

UI Sensitiveness Analysis
An input field is sensitive if it has sensitive attributes in the layout file (e.g., android:inputType="textPassword"). Otherwise, it is sensitive if its input hint contains sensitive text. Otherwise, it is sensitive if its correlated text label is sensitive (e.g., "Card number" for a field with hint "15 or 16 digit", or "Expiration date" for a field with hint "MM - YYYY"); if not (e.g., "Comment"), the input field is insensitive.
Challenge: how do we precisely associate the correlated text label with a given input field?

Associating Labels (1)
Intuition: labels at different positions relative to the input field (e.g., to its left, above it, or below it) have different probabilities of being correlated with it.

Associating Labels (2)
Assign position-based weights based on empirical observations. The smaller the weight, the closer the correlation:

    4      2             8
    0.8    Input Field   9
    8      9             10

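Associating Labels (3)
Geometry-based correlation score computation. For a text label L spanning (x1, y1) to (x2, y2) and an input field I, compute distance(I, x, y) * posWeight(I, x, y) for each pixel (x, y) in the label, then average the correlation score over the text label:

$$\mathrm{score}(L, I) = \frac{1}{|L|} \sum_{(x, y) \in L} \mathrm{distance}(I, x, y) \cdot \mathrm{posWeight}(I, x, y)$$

where |L| is the number of pixels in L.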
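A minimal Java sketch of this score (Java 16+ for records), under stated assumptions: boxes read the Extracting Information coordinates as [left, top, right, bottom] with exclusive right and bottom edges; `distance` is taken to be the Euclidean distance from the pixel to the nearest pixel of the input field, which the slides do not specify; and `posWeight` looks up the 3x3 grid from Associating Labels (2), with a made-up weight of 1.0 for pixels that overlap the field (their distance is 0, so the value does not matter). `LabelScore` and `Box` are hypothetical names.

```java
// Hypothetical sketch of a geometry-based label/field correlation score.
final class LabelScore {
    // Axis-aligned box in pixel coordinates: x in [left, right), y in [top, bottom).
    record Box(int left, int top, int right, int bottom) {}

    // Weights from the slide's grid. Rows: above, same band, below the field;
    // columns: left of, same band as, right of the field. Centre 1.0 is an assumption.
    private static final double[][] WEIGHTS = {
        {4.0, 2.0, 8.0},
        {0.8, 1.0, 9.0},
        {8.0, 9.0, 10.0},
    };

    // 0 if v lies before [lo, hi), 1 if inside it, 2 if after it.
    private static int band(int v, int lo, int hi) {
        if (v < lo) return 0;
        if (v >= hi) return 2;
        return 1;
    }

    static double posWeight(Box field, int x, int y) {
        return WEIGHTS[band(y, field.top(), field.bottom())][band(x, field.left(), field.right())];
    }

    // Assumption: Euclidean distance from (x, y) to the nearest pixel of the field.
    static double distance(Box field, int x, int y) {
        int dx = Math.max(0, Math.max(field.left() - x, x - (field.right() - 1)));
        int dy = Math.max(0, Math.max(field.top() - y, y - (field.bottom() - 1)));
        return Math.sqrt((double) dx * dx + (double) dy * dy);
    }

    // Average of distance * posWeight over every label pixel; smaller means more correlated.
    static double score(Box label, Box field) {
        double sum = 0;
        long pixels = 0;
        for (int y = label.top(); y < label.bottom(); y++) {
            for (int x = label.left(); x < label.right(); x++) {
                sum += distance(field, x, y) * posWeight(field, x, y);
                pixels++;
            }
        }
        return pixels == 0 ? Double.POSITIVE_INFINITY : sum / pixels;
    }

    public static void main(String[] args) {
        Box cardNumberLabel = new Box(16, 231, 109, 249);
        Box numberField = new Box(16, 249, 464, 297);
        System.out.println(score(cardNumberLabel, numberField)); // 19.0
    }
}
```

On the card-number example this prints 19.0: the label sits directly above the field (weight 2) with an average vertical gap of 9.5 pixels. The scores on the slides come from SUPOR's own implementation, whose distance definition is not given here, so they are not expected to match this sketch.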

Associating Labels (4)
For a given input field, find the label with the smallest correlation score among all potential labels. Correlation scores:

    Label               Number field   Date field
    Credit card type    265.57         456.42
    Card number         76.47          271.23
    Expiration date     205.29         75.40

So "Card number" is associated with the number field, and "Expiration date" with the date field.

Determining Sensitiveness (1)
Keyword matching approach: the text of the associated label (e.g., "Card number", "Expiration date", "Comment") is matched against a dataset of sensitive keywords. If it matches, the input field is sensitive; otherwise, it is insensitive.
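The smallest-score rule and the keyword check combine with the attribute and hint checks from UI Sensitiveness Analysis into one decision. The Java sketch below (Java 16+) uses the score table from Associating Labels (4); everything else is an assumption: the three-entry keyword set is a tiny hypothetical stand-in for SUPOR's keyword dataset, `textPassword` is treated as the only sensitive attribute, matching is a case-insensitive substring test, and `SensitivityCheck` and `FieldInfo` are hypothetical names.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: label selection, keyword matching, and the three-step check.
final class SensitivityCheck {
    // Tiny hypothetical stand-in for a sensitive keyword dataset.
    static final Set<String> KEYWORDS = Set.of("card number", "expiration date", "password");

    // Hypothetical view of one rendered input field and its candidate labels' scores.
    record FieldInfo(String inputType, String hint, Map<String, Double> labelScores) {}

    // Returns the label with the smallest correlation score, or null if there are none.
    static String bestLabel(Map<String, Double> labelScores) {
        String best = null;
        double bestScore = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> e : labelScores.entrySet()) {
            if (e.getValue() < bestScore) {
                bestScore = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    // Simplification: case-insensitive substring match against each keyword.
    static boolean matchesKeyword(String text) {
        if (text == null) return false;
        String lower = text.toLowerCase(Locale.ROOT);
        for (String keyword : KEYWORDS) {
            if (lower.contains(keyword)) return true;
        }
        return false;
    }

    // Order follows the flowchart: attribute, then input hint, then associated label.
    static boolean isSensitive(FieldInfo field) {
        if ("textPassword".equals(field.inputType())) return true;
        if (matchesKeyword(field.hint())) return true;
        return matchesKeyword(bestLabel(field.labelScores()));
    }

    public static void main(String[] args) {
        // Scores from the "Associating Labels (4)" table.
        Map<String, Double> numberField = Map.of(
            "Credit card type", 265.57, "Card number", 76.47, "Expiration date", 205.29);
        Map<String, Double> dateField = Map.of(
            "Credit card type", 456.42, "Card number", 271.23, "Expiration date", 75.40);
        System.out.println(bestLabel(numberField)); // Card number
        System.out.println(bestLabel(dateField));   // Expiration date

        // Hypothetical fields; the "Comment" score is made up for illustration.
        FieldInfo number = new FieldInfo("number", "15 or 16 digit", numberField);
        FieldInfo comment = new FieldInfo("text", null, Map.of("Comment", 50.0));
        System.out.println(isSensitive(number));  // true, via the label "Card number"
        System.out.println(isSensitive(comment)); // false
    }
}
```

Only the single best label is matched, which reflects the point made in Determining Sensitiveness (2): analyzing just the most relevant label keeps keyword matching from picking up unrelated text elsewhere on the screen.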

Determining Sensitiveness (2)
Why is the keyword matching approach effective? Screens are small, so labels are short phrases or sentences, and we analyze only the most relevant text label.

Binding Variables (1)
Given a sensitive input field with identifier X, bind it to the variables that hold its widget and its data:

    Widget txtCN = findViewById(X);
    Data cnum = txtCN.getText();
    // use of cnum

Binding Variables (2)
Challenge: different widgets within one app can have the same identifier:

    txtInput1 = this.findViewById(input1);
    txtInput2 = this.findViewById(input1);

Binding Variables (3)
The identifier id/input1 denotes a sensitive field in layout billing_information.xml but an insensitive field in layout search.xml. The layout passed to setContentView determines which widget findViewById returns:

    this.setContentView(billing_information);
    txtInput1 = this.findViewById(input1);   // sensitive

    this.setContentView(search);
    txtInput2 = this.findViewById(input1);   // insensitive

Implementation & Evaluation
SUPOR is implemented for Android apps and built on Dalysis [CHEX CCS12], IBM WALA, and ADT. Only input fields of type EditText are analyzed, i.e., other user inputs such as checkboxes are ignored. We implemented a sensitive user input disclosure detection system by combining SUPOR with static taint analysis, and evaluated it on 16,000 apps.

Evaluating UI Sensitiveness Analysis (1)
9,653 apps (60.33%) contain input fields.
Performance: the average analysis time is 5.7 seconds per app; 96.30% of apps are analyzed in at most 10 seconds, and 3.70% take more than 10 seconds.

Evaluating UI Sensitiveness Analysis (2)
Accuracy: we manually examined 40 apps, in which 115 layouts are rendered and 485 input fields are analyzed.
TP: sensitive user inputs identified as sensitive.
FP: insensitive user inputs identified as sensitive.
FN: sensitive user inputs identified as insensitive.
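The TP, FP, and FN counts defined above combine into the precision and recall that the conclusion refers to, using the standard definitions:

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}
```

Precision drops with every insensitive field flagged as sensitive, and recall drops with every sensitive field that is missed.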

Causes for FN and FP
Insufficient context to identify sensitive keywords. False negative: a label reading only "Answer" is missed, whereas "Security Answer" would match. False positive: "Height" matches both the height of an image file (insensitive) and the height of a human being (sensitive).
Inaccurate text label association. False positive: e.g., a long sentence containing the keyword "email" is associated with the "Delivery Instructions" field.

Evaluating Disclosure Analysis
For all 16,000 apps, the throughput is 11.1 apps/minute on a cluster of 8 servers, with 3 apps analyzed in parallel on each server.
We manually examined 104 apps: the false positive rate is 8.7%, caused by limitations of the underlying taint analysis framework, e.g., the lack of accurate modeling of arrays.

Case Studies (1)
com.canofsleep.wwdiary: the 3 input fields associated with the labels Weight, Height, and Age are identified as sensitive, and their data are disclosed.

Case Studies (2)
Disclosure analysis based on SUPOR detects this disclosure, whereas existing approaches, which directly define certain APIs as sensitive sources, leave it undetected:

    txtWeight = this.findViewById(...);
    valWeight = txtWeight.getText().toString();   // source
    Log.i("weight", valWeight);                   // sink
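To make the source/sink contrast concrete, here is a toy forward taint propagation over the weight example, written in a hypothetical mini-IR rather than real Dalvik bytecode (Java 16+). It is not SUPOR's taint engine, which builds on Dalysis and WALA; it only handles straight-line code with no branches, loops, or aliasing. A variable becomes tainted if it is the result of looking up a widget that UI sensitiveness analysis marked sensitive, of calling a predefined source API, or of any statement with a tainted argument. The identifier `weight_field` and the names `ToyTaint` and `Stmt` are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy forward taint propagation over straight-line code.
final class ToyTaint {
    // Hypothetical mini-IR statement: dst = op(args); dst is null when there is no result.
    record Stmt(String dst, String op, List<String> args) {}

    // Returns true if some sink call receives a tainted argument.
    static boolean leaks(List<Stmt> program, Set<String> sensitiveWidgetIds,
                         Set<String> sourceApis, Set<String> sinkApis) {
        Set<String> tainted = new HashSet<>();
        for (Stmt s : program) {
            boolean taintedArg = s.args().stream().anyMatch(tainted::contains);
            if (sinkApis.contains(s.op()) && taintedArg) {
                return true;
            }
            // A source is a predefined API, or a lookup of a widget that
            // UI sensitiveness analysis marked as a sensitive input field.
            boolean isSource = sourceApis.contains(s.op())
                || (s.op().equals("findViewById") && s.args().size() == 1
                    && sensitiveWidgetIds.contains(s.args().get(0)));
            if (s.dst() != null && (isSource || taintedArg)) {
                tainted.add(s.dst());
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The weight example; getText().toString() is split into two statements.
        List<Stmt> program = List.of(
            new Stmt("txtWeight", "findViewById", List.of("weight_field")),
            new Stmt("tmp", "getText", List.of("txtWeight")),
            new Stmt("valWeight", "toString", List.of("tmp")),
            new Stmt(null, "Log.i", List.of("\"weight\"", "valWeight")));
        Set<String> sinks = Set.of("Log.i");

        // SUPOR-style sources: the weight field itself is sensitive.
        System.out.println(leaks(program, Set.of("weight_field"), Set.of(), sinks)); // true
        // API-only sources: nothing in this flow is a source, so the leak is missed.
        System.out.println(leaks(program, Set.of(), Set.of("getDeviceId"), sinks)); // false
    }
}
```

The propagation logic is identical in both runs; only the source set changes, which is exactly the gap that treating sensitive user inputs as sources closes.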

Conclusion
We study the possibility of detecting sensitive user inputs, an important yet mostly neglected sensitive source in mobile apps.
We propose SUPOR, among the first known approaches to detect sensitive user inputs with high recall and precision. SUPOR mimics the user's perspective by statically and scalably rendering the layout files, leverages a geometry-based approach to precisely associate text labels with input fields, and utilizes textual analysis to determine the sensitiveness of the text in labels.
We perform a sensitive user input disclosure analysis, with an FP rate of 8.7%, to demonstrate the usefulness of SUPOR.

Thank You! Q&A

Related work
Much existing work focuses on privacy disclosure problems for predefined sensitive data sources on the phone [FlowDroid PLDI14, PiOS NDSS11, AAPL NDSS15].
FlowDroid employs a limited form of sensitive input fields: password fields [PLDI14].
AsDroid checks UI text to detect contradictions between the expected behaviors and the program behaviors [ICSE14].
UIPicker uses supervised learning to collect sensitive keywords and the corresponding layouts. It also uses the sibling elements in layout files as the description text for a widget [USENIX Security15].

Keyword dataset construction
Crawl texts from apps' resource files.
Adapt NLP techniques to extract nouns and noun phrases from the 5,000 most frequent text lines.
Manually inspect the most frequent nouns and noun phrases to identify sensitive keywords.

Why not use XML structure to compute correlation scores?
Many developers define relative positions of the widgets, which are not what users perceive; in this case, the XML structure does not guarantee that sibling widgets are physically close.
Figure: cases from real Android apps (Label 1, Input 1, Label 2, Input 2).
