International Conference on Advanced Communications Technology (ICACT)

WhatsApp Network Forensics: Discovering the Communication Payloads behind Cybercriminals

Fu-Ching TSAI, En-Cih CHANG, Da-Yu KAO
Department of Information Management, Central Police University, Taiwan
Corresponding Author: [email protected], Tel: 886 3 328 2321*5100

Abstract: The ubiquity of instant messaging (IM) apps on smartphones has provided criminals with communication channels that are difficult to decode. Investigators and analysts increasingly face large data sets when conducting cybercrime investigations. Call record analysis is one of the critical criminal investigation strategies for law enforcement agencies (LEAs). The aim of this paper is to investigate cybercriminals through network forensics and sniffing techniques. The main difficulty in retrieving valuable information from specific IM apps is recognizing the criminals' IP address records on the Internet. This paper proposes a packet filter framework to recognize WhatsApp communication patterns in huge collections of network packets in order to locate a criminal's identity more effectively. A rule extraction method for sniffed packets is proposed to retrieve relevant attributes from high-dimensional analysis using geolocation and pivot tables. The results can support LEAs in discovering criminal communication payloads and can improve the effectiveness of modern call record analysis, helping LEAs to prosecute cybercriminals and bring them to justice.

Keywords: Cybercrime Investigation, Network Forensics, Packet Analysis, VoIP, WhatsApp, Lawful Interception

I. INTRODUCTION

Call record analysis is one of the critical criminal investigation strategies for law enforcement agencies (LEAs). Call records provide important information, such as the dates, times, and lengths of outgoing and incoming calls, which is highly relevant for examining crime scenes [1].
However, the ubiquity of instant messaging (IM) apps on smartphones has provided criminals with communication channels that are difficult to track using traditional investigation technologies. Nowadays, most criminals use IM apps instead of voice calls to avoid being targeted by LEAs. Finding the identity of a cybercriminal without the help of foreign authorities is difficult on the Internet, which can provide a high degree of anonymity and privacy and poses challenges during investigation [2]. Developing new techniques to analyse modern call records is an urgent task.

The main difficulty in retrieving valuable information from specific IM apps is filtering the mass of network connection records on the Internet. Raw data captured from the Internet is full of packets produced by different apps on various devices. Protocols, ports, and connection frequencies all vary between apps. Moreover, a smartphone can establish connections through its different network interfaces. Although retrieving call records or network connection logs from smartphones is a great challenge, such data can provide more advanced and detailed information than traditional phone records. For example, GIS data or the IP address reveals the locations of the calls, and the captured network packets provide the multimedia content of the communications.

ISBN 979-11-88428-00-7

WhatsApp is a cross-platform application for instant communication on electronic devices, including smartphones, tablet computers and personal computers. There were more than 1.3 billion active WhatsApp users as of July 2017. Its worldwide popularity is due not only to its low-cost subscription model but also to features that allow people to chat in groups and send text, pictures and other multimedia elements along with messages.
Since WhatsApp was acquired by Facebook in 2014, a snowball effect has brought even more users to the platform. Unfortunately, its convenience and rich functionality have also contributed to its wide use by criminals to communicate with each other more effectively and secretly.

This research tries to recognize WhatsApp communication patterns in huge collections of network logs and packets in order to locate criminal activities more effectively. Discovering criminal communication contents among ambiguous connections helps LEAs to filter criminal activities more effectively. The paper is organized as follows. Section 2 reviews network sniffing protocols and analysis tools. Section 3 describes the research framework. Section 4 presents the experimental results. Finally, the last section concludes the paper and makes suggestions for future work.

II. LITERATURE REVIEW

Packet analyzers are widely applied in the network security field to analyze raw traffic, detect attacks, sniff traffic, and troubleshoot networks [5].

ICACT2018, February 11-14, 2018

Wireshark is one of the best-known open-source packet analyzers. It provides both an easy-to-use GUI and a command-line utility, with very active community support [6]. In addition, it supports offline and online modes for flexible capture operations. Some of its important features are listed in Figure 2.

Figure 1. The function of packet analyzers

Figure 1 shows the functions of a packet analyzer. Packet analyzers can play different roles in various applications. From an ethical perspective, a packet analyzer helps perform security audits on packets; for network administrators, it is a tool to diagnose network problems. For white-hat hackers, reports from a packet analyzer help to find vulnerabilities in software applications, providing an early warning before cyber-attackers launch serious attacks. For protocol developers, packet analyzers can be used to diagnose protocol-related issues. A packet analyzer can also be used unethically, for example to inspect packet payloads to recover passwords or to sniff traffic to mount a man-in-the-middle attack.

Packet analysis is the process of capturing and interpreting live data as it flows across a network in order to better understand what is happening on that network. It is typically performed by a packet sniffer, a tool used to capture raw network data crossing wired or wireless interfaces. Packet analysis can help with understanding network characteristics, determining who or what is using available bandwidth, finding insecure and bloated applications, identifying peak network usage times, and detecting malicious activity.

There are various types of packet-sniffing programs, both free and commercial. Each program is designed with different goals in mind.
A few popular packet analysis programs are Tcpdump, OmniPeek, and Wireshark. Tcpdump is a command-line program, while OmniPeek and Wireshark have graphical user interfaces [9].

Figure 2. Wireshark features

VoIP, an abbreviation of Voice over Internet Protocol, sends voice over an IP-based network, which is fundamentally different from the circuit-switched public telephone network [7]. Circuit switching allocates dedicated resources to each individual call; IP networks, by contrast, are packet switched, and each packet is semi-autonomous: it has its own IP header and is forwarded separately by routers.

VoIP employs session control and signalling protocols to manage the signalling, set-up, and tear-down of calls. It works with several protocols, including SIP (Session Initiation Protocol), H.323, SDP (Session Description Protocol), RTP (Real-time Transport Protocol), and IAX (Inter-Asterisk eXchange). A traditional telephone system involves a lot of control signalling to accomplish its various tasks, while VoIP places all of these signalling messages inside IP packets. It is also worth mentioning that, since the Internet Protocol runs over almost every type of lower-layer communication architecture, VoIP can as well.

Comparing the VoIP architecture with traditional telephony side by side shows differences in both the topologies and the basic skills required to work on each. The equipment used in each, while serving the same functions, performs them differently and in fact operates using a completely different set of protocols [3].

The most frequently used feature of WhatsApp is voice calling. When users start a call from a private IP address behind a firewall, the STUN protocol (originally "Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs)") is used to assist devices behind a NAT firewall or router with packet routing. It allows an end host to discover its public IP address and enables NAT traversal for real-time voice, messaging, and other interactive communications. RFC 5389 redefines the term STUN as "Session Traversal Utilities for NAT" [8][10].

III. RESEARCH FRAMEWORK

This paper simulates communication between a victim and a suspect and tries to extract patterns that can serve as rules for filtering WhatsApp packets, in order to help LEAs target suspects more effectively. Figure 3 shows the three steps of our research framework: data collection, data preparation, and pattern recognition.

Figure 3. Research framework

A. Data Collection

In the data collection step, researchers use Wireshark to capture all network traffic on the route between victim and suspect. Setting up an intermediate point for packet sniffing is one of the law enforcement strategies for investigating criminal behaviour. For the scenario of two users making voice calls through WhatsApp, researchers assume that the node where Wireshark is deployed operates under a lawful interception warrant. Since WhatsApp does not support making voice calls from personal computers, and Wireshark is installed on personal computers, we cannot capture the call packets directly on the calling device. Instead, we turn a personal computer into a hotspot that shares its network connection with the cell phone and serves as the node where Wireshark captures the network packets.

B. Data Preparation

In the data preparation step, the pcap file, which is the file format of the captured packets, is imported into Wireshark to display the header and payload information.
Since voice calls rely on the STUN protocol, Wireshark's protocol filtering function is applied to further analyse the STUN packets. Figure 4 shows the list of STUN packets, ordered by timestamp, from the imported pcap files.

Besides STUN packet filtering, the geolocation of an IP address plays an important role in mapping locations from network space to physical space. Originally developed on Unix, the Whois database has become the most common mechanism for locating the registration information of IP resources registered with Internet number resource organizations. Querying the registries through one of the open-source Whois lookup tools returns a range of registration and geolocation information, including domain ownership, addresses, locations, and phone numbers.

To analyse the high-dimensional data, the pcap files are further exported to Excel, where a pivot table is applied to view the data from different angles.

C. Pattern Recognition

In the pattern recognition phase, researchers investigate the packet records to identify patterns generated by WhatsApp, in order to identify the suspect's IP address more efficiently. As shown in Figure 5, there are 90 attributes in the flat file. After deeper analysis using a pivot table to view the data from different angles, we found some key attributes for constructing patterns for LEAs: "Differentiated Services Field", "Flags", and "Differentiated Services Codepoint". Differentiated services provide low latency to critical network traffic such as voice or streaming media while providing simple best-effort service to non-critical traffic such as web traffic or file transfers. The Flags attribute is a bit field: the meaning of each bit is defined in the specification of the data structure that contains it, and each bit is usually associated with a property or privilege.

D. Evaluation Results

We conducted an experiment to evaluate the effectiveness of the proposed framework.
We collected network packets over a period of 34.505698 seconds that contains the WhatsApp communication. Although the traffic was captured only from the local area network, the IP list is complicated because of additional connections to software vendors and Internet service providers. The geolocations of the IP addresses in this experiment, obtained with Whois lookup tools, are shown in Table 1.
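
The STUN filtering used in the data preparation step (Section III.B) can be illustrated outside Wireshark with a minimal sketch. It assumes the RFC 5389 message layout: a 20-byte header whose first two bits are zero, a 16-bit message length that is a multiple of 4, and the fixed magic cookie 0x2112A442. The function name `looks_like_stun` and the sample payloads are hypothetical and are not taken from the experiment; real WhatsApp traffic may deviate from this layout, so the check should be treated as a first-pass heuristic only.

```python
import struct

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value defined in RFC 5389
STUN_HEADER_LEN = 20            # type (2) + length (2) + cookie (4) + transaction ID (12)


def looks_like_stun(udp_payload: bytes) -> bool:
    """Heuristic check: does this UDP payload start with an RFC 5389 STUN header?"""
    if len(udp_payload) < STUN_HEADER_LEN:
        return False
    # Network byte order: 16-bit type, 16-bit length, 32-bit magic cookie.
    msg_type, msg_len, cookie = struct.unpack("!HHI", udp_payload[:8])
    if msg_type & 0xC000:        # the two most significant bits must be zero
        return False
    if msg_len % 4 != 0:         # attributes are padded to 4-byte boundaries
        return False
    if cookie != STUN_MAGIC_COOKIE:
        return False
    # The declared attribute length must fit inside the payload.
    return STUN_HEADER_LEN + msg_len <= len(udp_payload)


# Hypothetical payloads for illustration (not captured data):
# a Binding Request (type 0x0001) with no attributes and an all-zero transaction ID,
binding_request = struct.pack("!HHI", 0x0001, 0, STUN_MAGIC_COOKIE) + bytes(12)
# and a payload starting with 0x80, as an RTP version-2 packet would.
rtp_like = b"\x80" + bytes(19)

print(looks_like_stun(binding_request))
print(looks_like_stun(rtp_like))
```

A check of this kind only narrows the candidate packets; attributing the remaining traffic to WhatsApp still relies on the attribute patterns and Whois lookups described in this section.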

Figure 4. Packet of STUN Protocol

Figure 5. A flat file for pattern recognition

TABLE 1. THE GEOLOCATIONS OF IP ADDRESSES
IP
.87.50
Whois Lookup
Victim
Suspect (, Taiwan)
United States Menlo Park Facebook Inc.
Taiwan, Province Of China Taiwan, Province Of China Taipei Facebook Ireland Ltd
31.13.70.48
157.240.11.51
United States United States Los Angeles Facebook Ireland Ltd
United States United States Menlo Park Facebook Inc.
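
The pivot-table analysis of Section III.C, which looks for attributes whose values stay fixed across packets, can be approximated in code. The following is a minimal sketch, assuming the flat file has been exported as CSV with one row per packet and one column per attribute. The function `fixed_value_attributes`, the `min_share` parameter, the three column names chosen, and every value in the sample are invented for illustration and do not reproduce the experimental data.

```python
import csv
import io
from collections import Counter


def fixed_value_attributes(rows, min_share=1.0):
    """Return {attribute: value} for attributes whose most frequent value
    covers at least `min_share` of all rows (1.0 means strictly constant)."""
    counters = {}
    total = 0
    for row in rows:
        total += 1
        for name, value in row.items():
            counters.setdefault(name, Counter())[value] += 1

    fixed = {}
    for name, counter in counters.items():
        value, count = counter.most_common(1)[0]
        if total and count / total >= min_share:
            fixed[name] = value
    return fixed


# Invented sample: three packets, three of the flat file's attributes.
SAMPLE_CSV = """Identification,Differentiated Services Field,Flags
0x1a2b,0x00,0x02
0x77c1,0x00,0x02
0x0f33,0x00,0x02
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
result = fixed_value_attributes(rows)
print(result)
```

Lowering `min_share` slightly below 1.0 would tolerate a few stray packets from other applications in the capture, at the cost of admitting attributes that are only mostly constant.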

In order to investigate the patterns of WhatsApp communications, we further imported the headers and payloads of the captured packets into a pivot table. Through frequency distribution analysis, we discovered that most of the packet fields contain random values, which are not useful for identifying communication patterns. However, the values of several attributes, such as Differentiated Services Field, Flags, and Differentiated Services Codepoint, are fixed. The attributes with fixed values are selected as the criteria for pattern recognition of WhatsApp communications. The derived packet at