Criminal Profiling: Android Malware
Axelle Apvrille - FortiGuard Labs, Fortinet
Nuit du Hack, June 2015
NDH2k15 Wargame
Keep an eye (or two ;) on my slides! Nuit du Hack 2015 - A. Apvrille
Criminal Profiling
Kind warning
Plenty of stats (or else)
Feel free to
Please tweet stats correctly though :)
Whenever possible, include how stats were computed: it matters (very much).
Want to re-use? Sure, please credit (fair, isn’t it?)
How are stats computed?
289 static properties. See SherlockDroid in Hack.Lu 2014 or IEEE TrustCom-15 (upcoming).
[Diagram: the Android Package is uncompressed (certificate, Dalvik executable, resources, assets...) and the executables are disassembled. Properties are extracted at each stage: file properties, manifest properties, certificate properties, DEX properties, ARM exec properties, API and action properties.]
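As a rough illustration of the "uncompress" stage, here is a minimal sketch of how a few package-level properties could be read from an APK, which is a ZIP archive. The function name `package_properties` and the chosen property names are assumptions made for this example; they are not SherlockDroid's actual property set.

```python
import io
import zipfile


def package_properties(apk_bytes):
    """Collect a few package-level properties from raw APK bytes (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        names = apk.namelist()
    return {
        "file_size": len(apk_bytes),
        # Incomplete samples may lack a manifest or a DEX file.
        "has_manifest": "AndroidManifest.xml" in names,
        "has_dex": "classes.dex" in names,
        # Signing blocks live under META-INF/ with these usual extensions.
        "has_certificate": any(
            n.startswith("META-INF/") and n.upper().endswith((".RSA", ".DSA", ".EC"))
            for n in names
        ),
        # Native (ARM or other) executables are shipped under lib/.
        "has_native_code": any(n.startswith("lib/") and n.endswith(".so") for n in names),
        "nb_assets": sum(1 for n in names if n.startswith("assets/")),
    }
```

A sample reduced to just classes.dex would show up here with `has_manifest` set to False, which is one reason why not every property can be computed on every sample.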
Datasets
- Malware: taken from Fortinet’s DB, unique and non-damaged samples only
- Clean: apps we analyzed manually, open source apps, top apps with a known developer in the Play Store
Why so few clean? Hey, it’s very difficult (and long) to be sure it’s clean!
Number of samples
Unless specified otherwise, we considered:

Property type                 Nb of samples
Package properties            945,785
DEX format properties         945,785
API call properties etc.      945,092
Manifest properties           617,942

Properties in 3rd party kits (AdMob, JUnit...) are ruled out.

Why not all?
- Some samples are incomplete (e.g. just classes.dex)
- Some samples are damaged
- Some properties are ’optional’ (e.g. targetSDK)
Comparisons
- Many research papers use datasets of 100-1000 samples. We use close to 1 million.
- Android Malware Genome dates back to 2011. Our study is on samples collected before March 2015.
- Extensive work: Andrubis (BADGERS’14), PlayDrone (SIGMETRICS’14). Our study focuses on malware, with stats on code-level properties.
Criminal Profiling: What Do Malware Look Like?
I'm smaller and simpler
Sample file size
End of 2014:
- Clean: 9.2M average
- Malware: 2.4M average
Clean apps are about 4x bigger. Malware don’t need to implement all features.
Activities, services, receivers
Criminal Profiling: What Do Malware Like?
I'm smaller and simpler
I just love to read / send SMS, install apps, create shortcuts
SMS: a strong indicator!
- 56% of malware implement an SMS receiver! (only 3% of clean)
- 43% of malware send SMS!
- 32% of malware use abortBroadcast() to conceal incoming SMS!
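To make the "SMS receiver" property concrete, here is a minimal sketch that lists the receivers registered for incoming SMS in a manifest. It assumes the manifest has already been decoded from Android's binary XML into text (for instance by a tool such as apktool); the sample manifest, its class names and the function `sms_receivers` are invented for illustration and are not the author's detector.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
SMS_RECEIVED = "android.provider.Telephony.SMS_RECEIVED"

# Hypothetical decoded manifest, for illustration only.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.demo">
  <application>
    <receiver android:name=".SmsSpy">
      <intent-filter android:priority="999">
        <action android:name="android.provider.Telephony.SMS_RECEIVED"/>
      </intent-filter>
    </receiver>
    <receiver android:name=".BootListener">
      <intent-filter>
        <action android:name="android.intent.action.BOOT_COMPLETED"/>
      </intent-filter>
    </receiver>
  </application>
</manifest>"""


def sms_receivers(manifest_xml):
    """Return the names of receivers that declare the SMS_RECEIVED action."""
    root = ET.fromstring(manifest_xml)
    found = []
    for receiver in root.iter("receiver"):
        for action in receiver.iter("action"):
            if action.get(ANDROID_NS + "name") == SMS_RECEIVED:
                found.append(receiver.get(ANDROID_NS + "name"))
                break  # one match per receiver is enough
    return found


print(sms_receivers(SAMPLE_MANIFEST))
```

Receivers registered dynamically in code would not appear in the manifest, so a manifest-only check like this one can only give a lower bound.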
Criminal Profiling: Other Interests
I'm smaller and simpler
I just love to read / send SMS, install apps, create shortcuts
Camera? Vibrating? Send e-mails. Pff! Not interesting.
What Malware Like / Don’t Like
Like
- INSTALL_PACKAGES: 24% of malware ask for it. Only 0.4% of clean apps do. NB: works for system applications only.
- Install shortcuts: 21% malware, 6% clean apps.
Don’t Like
- Emails. 14% malware < 29% clean (support/contact)
- Vibrate. 20% malware (ransomware?), 27% clean
- Is the era of premium phone number dialers over? 1%
- Camera. 3.7% malware, 7.1% clean. Only if you’re a VIP? ;)
- Disable the keyguard. Malware can run background tasks as services...
Criminal Profiling: Your Permissions, or Your Life!
I'm smaller and simpler
I just love to read / send SMS, install apps, create shortcuts
Camera? Vibrating? Send e-mails. Pff! Not interesting.
Gimme all ur permissions!
Web search for Angecryption
Permissions indicate evil will...
Clear over-use of permissions!!!
Top 5 permissions
Permissions are not so reliable
Why can’t we rely on permission stats?
- A permission may be requested but never used, or the permission can be used within (legitimate?) third party code. Example: call permission vs ACTION_CALL / ACTION_DIAL
- We don’t have the manifest for all malware. This explains rare cases where use > request. Example: BIND_DEVICE_ADMIN permission vs DeviceAdminReceiver
Bypassing permissions
- Call another app that has the permission
- Escalate privileges via updating
- Hijacking the Android installer
- Use an exploit...
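The gap between "requested" and "used" can be sketched as a set comparison. The marker table below is a tiny illustrative assumption (ACTION_CALL needs CALL_PHONE, a DeviceAdminReceiver goes with BIND_DEVICE_ADMIN); the helper names `used_permissions` and `permission_mismatch` are hypothetical, and a real mapping would be far larger.

```python
# Code-level evidence -> permission it implies (illustrative subset only).
EVIDENCE_TO_PERMISSION = {
    "android.intent.action.CALL": "android.permission.CALL_PHONE",
    "DeviceAdminReceiver": "android.permission.BIND_DEVICE_ADMIN",
}


def used_permissions(code_strings):
    """Permissions for which some evidence of use appears in the code strings."""
    return {
        permission
        for marker, permission in EVIDENCE_TO_PERMISSION.items()
        if any(marker in s for s in code_strings)
    }


def permission_mismatch(requested, used):
    """Compare manifest requests against code-level use."""
    requested, used = set(requested), set(used)
    return {
        "requested_not_used": sorted(requested - used),
        "used_not_requested": sorted(used - requested),
    }


# Manifest asks for CALL_PHONE, but the code only dials (no permission needed)
# and ships a device admin receiver that the manifest data does not cover.
requested = ["android.permission.CALL_PHONE"]
code = ["android.intent.action.DIAL", "Landroid/app/admin/DeviceAdminReceiver;"]
print(permission_mismatch(requested, used_permissions(code)))
```

A non-empty `used_not_requested` list is what the slide calls "use > request"; with a missing or partial manifest it says more about the data than about the sample.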
Wide Target
I target even “old” Android systems. Don't want to lose an opportunity, do I?
Declared Target SDK
On average:
- Malware target Gingerbread
- Clean apps target Jelly Bean
Stats
Considered ’only’ 6,976 malware and 707 clean. Why not 900K?
- Not all samples come with a manifest
- Not all manifests come with a target SDK
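One plausible way to turn declared target SDK values into a statement such as "malware target Gingerbread on average" is to average the API levels and map the result to a release name. This is only a sketch under that assumption: the slides do not say how the average was computed, and `codename` and `average_target` are names invented here. The API level ranges are the standard Android ones.

```python
# (first API level, last API level, release name)
RELEASES = [
    (9, 10, "Gingerbread"),
    (11, 13, "Honeycomb"),
    (14, 15, "Ice Cream Sandwich"),
    (16, 18, "Jelly Bean"),
    (19, 20, "KitKat"),
    (21, 22, "Lollipop"),
]


def codename(api_level):
    """Map an API level to its release name, or None if outside the table."""
    for first, last, name in RELEASES:
        if first <= api_level <= last:
            return name
    return None


def average_target(levels):
    """Average the declared levels, skipping samples without a target SDK."""
    known = [level for level in levels if level is not None]
    if not known:
        return None
    return codename(round(sum(known) / len(known)))


print(average_target([9, 10, 10, None]))   # samples without targetSDK are skipped
print(average_target([16, 17, 18, 16]))
```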
Malware profiling: targets
I target even “old” Android systems. Don't want to lose an opportunity, do I?
I target China, USA, Russia...
Geographic attribution statistics
Amount of data
- Country of application’s certificate (575,396)
- Rule out unknown countries, buggy and fake entries: 63% ruled out!
  - e.g. GF is not a correct country code
  - e.g. VU is Vanuatu but this entry is probably fake: CN=VU OU=VU O=VU L=VU ST=VU C=VU
- Rule out dev / debug certificates (12%)
- Remaining: 146,764 certificates. 14,919 in 2014, and only 6,308 in 2015 (incomplete).
Geographic attribution is complicated
Attribution script turned out to be tricky. Plenty of cases!
- Certificates using call codes (e.g. +86 for China) or zipcodes
- Match towns or ’states’ to countries (e.g. Gweru is in Zimbabwe)
- Deal with errors, e.g. C=CH for China, C=CA for California...
Fixed several bugs, but probably others :((
- C=gg-2 (fake country) was counting for ... Guernsey
- C=asd3f21asdf was counting for American Samoa
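The cases above can be sketched as a small attribution routine. This is not the author's script: the field parser, the tiny country and call-code tables, and the function names are assumptions for illustration. It matches country codes exactly (so that a value like gg-2 is rejected rather than counted for a real country), rejects entries where every field holds the same value, and skips the standard Android debug certificate.

```python
import re

# Distinguished-name fields as they appear in the slide examples.
FIELD = re.compile(r"\b(CN|OU|O|L|ST|C)=")

# Illustrative subsets only; a real script needs the full ISO and call-code lists.
KNOWN_COUNTRIES = {"CN", "VN", "IN", "US", "RU", "ZW", "VU"}
CALL_CODES = {"86": "CN", "84": "VN"}


def parse_dn(dn):
    """Split 'CN=a b O=c C=CN' into a dict of field -> value."""
    parts = FIELD.split(dn)
    return {key: value.strip() for key, value in zip(parts[1::2], parts[2::2])}


def attribute(dn):
    """Return a country code for the certificate, or None when ruled out."""
    fields = parse_dn(dn)
    if fields.get("CN") == "Android Debug":
        return None  # dev / debug certificate
    if len(fields) > 1 and len(set(fields.values())) == 1:
        return None  # same value in every field: probably fake
    country = fields.get("C", "").lstrip("+")
    if country.upper() in KNOWN_COUNTRIES:  # exact match, no prefix guessing
        return country.upper()
    return CALL_CODES.get(country)  # e.g. C=84 or C=+86


print(attribute("CN=Dau Dinh Manh O=Song Vang L=Ha noi ST=Ha Noi C=84"))
```

Zipcodes, towns and states (the C=400076 or Gweru cases) would need extra lookup tables that this sketch leaves out.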
Malware certificates: target or origin?
Examples
CN=Praveen Kumar Pendyala OU=Student O=IIT Bombay L=Mumbai ST=Maharastra C=400076
CN=Dau Dinh Manh O=Song Vang L=Ha noi ST=Ha Noi C=84
CN=Zhong Zhang OU=Zhainanzhi Inc O=Zhainanzhi Inc L=FuZhou ST=FuJian C=CN
- Many certificates with a seemingly valid identity
- Why mention a particular name?
  - For fame?
  - Because they don’t believe their app is malicious?
  - Because they think we won’t notice?
  - To complicate attribution?
  - Trojanized app where the original certificate name was retained?
Presumed Targets of 146,764 malware
Top target countries in 2014
Top target countries in 2015
Information, I want information!
I target even “old” Android systems. Don't want to lose an opportunity, do I?
I target China, USA, Russia...
Trust me! Gimme all ur secrets!
Most representative collected data
Collected Data
Captain Obvious:
- IMEI, IMSI, Phone number...
Not so obvious: we hadn’t expected the diff with clean apps would be so strong:
- IMEI collected ≈ 3 times more for malware
- Phone number, IMSI, S/N: 6 times more
- List apps, SIM operator: 4 times more
- Android ID, MAC address: twice
What reason for those???
- GPS (≈ 22% for both)
- Get accounts (9% malware, 13% clean)
Sidenote: comparing with F-Droid apps
F-Droid apps (Free and Open Source Software Android apps) are far cleaner than the average
Malware authors: how skilled?
I target even “old” Android systems. Don't want to lose an opportunity, do I?
I target China, USA, Russia...
Trust me! Gimme all ur secrets!
I like high level dev but not low level
Most frequent techniques
Reminder: code from third party kits is ruled out
Techniques: What Do We Make Out of It?
Malware authors are not Unix geeks:
- su (8-10%), chmod (< 2%), mount (< 1%), busybox (≈ 1.5%)
- Command line installation pm install: only 2.2%
- Android emulator detection: only 1.4%
Malware authors are not (particularly) keen on native dev:
- No significant difference in using JNI (23-26%), executing native process (21-24%)
Malware authors have development skills:
- Android SDK: abortBroadcast(), DexClassLoader, setComponentEnabledSetting()
- JavaScript (22.8% malware - only 0.6% clean)
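Percentages like these come down to counting, per technique, how many samples show some evidence of it. Here is a minimal sketch that looks for textual markers in disassembled code; the marker strings, the toy samples and the function names are assumptions for illustration, and plain substring matching is much cruder than a real property detector.

```python
from collections import Counter

# Technique -> substring expected in smali-style disassembly (illustrative).
TECHNIQUE_MARKERS = {
    "abortBroadcast": "->abortBroadcast()V",
    "DexClassLoader": "Ldalvik/system/DexClassLoader;",
    "setComponentEnabledSetting": "->setComponentEnabledSetting(",
    "pm install": "pm install",
}


def techniques(disassembly):
    """Set of techniques whose marker appears in one sample's disassembly."""
    return {name for name, marker in TECHNIQUE_MARKERS.items() if marker in disassembly}


def frequency(samples):
    """Percentage of samples showing each technique."""
    if not samples:
        return {}
    counts = Counter()
    for text in samples:
        counts.update(techniques(text))
    return {
        name: round(100.0 * counts[name] / len(samples), 1)
        for name in TECHNIQUE_MARKERS
    }


# Four made-up samples, for illustration only.
SAMPLES = [
    "invoke-virtual {p0}, Lcom/x/R;->abortBroadcast()V",
    "new-instance v0, Ldalvik/system/DexClassLoader;\n"
    "invoke-virtual {p0}, Lcom/x/R;->abortBroadcast()V",
    'const-string v0, "pm install -r /sdcard/a.apk"',
    "return-void",
]
print(frequency(SAMPLES))
```

Code belonging to third party kits would have to be filtered out before counting, as the slides note.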
Techniques: surprises
Why is everybody fond of reflection and encryption?
- Reflection: 68.9% malware, 50.3% clean
- Encryption: 39.7% malware, 27.9% clean
Because they’re old/well-known techniques?
What are clean apps doing with openDexFile and loadDex?!
- 0.3% malware - 0.4% clean
- dalvik.system.DexFile: openDexFile() is private
Obfuscation: smaller than expected?
- NOPs are meaningless
- Basic obfuscation = ProGuard a, b, c renaming
- @thuxnder obfuscation (2012) = abusing linear sweep with fill-array-data = 0.5%. All 4,800 samples in 2013.
- APKProtect: since 2014
Obfuscation (continued)
Reliable properties: nop opcode, APKProtect string, @thuxnder
  if-eq v0, v0, +9
  fill-array-data v0, +3
  fill-array-data-payload
Unreliable property: basic obfuscation
- AESObfuscator-1: used by Android LVL
- /a/a;->a: simplistic!!!
Issues
- NOPs mentioned by Mody (VB 2013)
- Lipovsky (CARO 2014) estimates all abusing linear sweep up to 30%
  - Seems too high
  - Unless I miss samples or case detections?
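The @thuxnder pattern quoted above (a register compared with itself, immediately followed by fill-array-data) can be approximated with a textual check on disassembly lines. This is a rough sketch under that reading of the slide, not the detector used for the statistics; the function name `looks_like_linear_sweep_abuse` is invented here.

```python
import re

# An if-eq whose two operands are the same register is always taken.
# The trailing comma keeps "v1, v10" from being mistaken for "v1, v1".
SELF_COMPARE = re.compile(r"^\s*if-eq\s+(v\d+),\s*\1,")


def looks_like_linear_sweep_abuse(lines):
    """True if an always-taken if-eq is directly followed by fill-array-data."""
    for current, following in zip(lines, lines[1:]):
        if SELF_COMPARE.match(current) and following.strip().startswith("fill-array-data"):
            return True
    return False


SLIDE_EXAMPLE = [
    "if-eq v0, v0, +9",
    "fill-array-data v0, +3",
    "fill-array-data-payload",
]
print(looks_like_linear_sweep_abuse(SLIDE_EXAMPLE))
```

A check this narrow misses variants of the trick, which fits the caveat above that some samples or cases may go undetected.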
Hash algorithms of app certificates
- malware: 617,942
- clean: 13,110
Are malware authors more tech-savvy than regular developers?
Sidenote: F-Droid developers even more tech-savvy?
Use of exploits is not widespread
Detectors
- Specific root exploits (Rage in the Cage, Levitator, Zerg Rush...)
- Generic (and very imperfect) exploit detector
Result 1: my specific root exploit detectors don’t work

Exploit              Detections
Rage in the Cage     3
Exploid              4
Levitator            0
Mempodroid           0
Towel Root           0
Zerg Rush            0

Result 2: generic exploit detector works
Detected in 1.6% of malware (I certainly miss cases though). Yet, exploits are not widespread.
Rooting is not specific to malware
Property looks for evidence of tools used on rooted devices:
- com.cyanogenmod
- com.noshufou.android.su
- Superuser.apk
- eu.chainfire.supersu
Both clean and malicious apps look for those (≈ 2%)
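A property of this kind can be sketched as a byte search for the four markers listed above inside a sample's executable. The function name `root_evidence` is an assumption for this example, and searching raw bytes is a simplification of whatever the real property does.

```python
# Markers taken from the list above; all are plain ASCII strings.
ROOT_MARKERS = (
    b"com.cyanogenmod",
    b"com.noshufou.android.su",
    b"Superuser.apk",
    b"eu.chainfire.supersu",
)


def root_evidence(dex_bytes):
    """Return the rooted-device markers that appear in the given bytes."""
    return [marker.decode("ascii") for marker in ROOT_MARKERS if marker in dex_bytes]


print(root_evidence(b"\x00/system/app/Superuser.apk\x00"))
```

A hit only says the app looks for these tools; as the figures show, clean apps do that about as often as malware.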
Recap
- Stats computed on ≈ 1 million malware. However, some properties (obfuscation, country...) are difficult to spot accurately.
- There’s a general belief that malware are complicated (assembly, emulator detection, exploits etc). Statistically, this is wrong.
  - Rooting is not specific to malware
  - Unix commands, exploits, emulator detection < 2%
  - Malware authors are skilled Android developers
  - They don’t like low level dev + Unix
- Why implement complex schemes when simple code achieves the goal?
  - Malware focus on their goals: money!
  - They are smaller (why code useless stuff?)
- ≈ half of malware read or send SMS, grab IMEI. They retrieve twice+ more sensitive data than clean apps
- Geographic attribution is difficult. Countries like China, Russia, USA, UK, Vietnam, Ukraine are top targets.
Thanks for your attention!
Contact info: @cryptax or aapvrille (at) fortinet (dot) com
Thanks to .. my husband
Alligator, Lobster...
More
- A. Apvrille, L. Apvrille, SherlockDroid: an Inspector for Android Marketplaces, Hack.Lu 2014
- M. Lindorfer, M. Neugschwandtner et al., ANDRUBIS - 1,000,000 Apps Later: A View on Current Android Malware Behaviors, BADGERS 2014
- N. Viennot, E. Garcia, J. Nieh, A Measurement Study of Google Play, SIGMETRICS 2014
That’s the key: Polyglot-File007