Tuesday, December 22, 2015

Big data is sometimes a big pain...

EDIT: scroll to bottom for a full script.

To start off, I want to explain the background of this project. I run an ADS-B receiver for FlightRadar24.com. If you don't know what ADS-B is, you should definitely check it out: it's the protocol that aircraft transponders use to broadcast their location and other data. Using an RTL-SDR module, available for around $30, you too can watch local aircraft with a program such as dump1090.

Anyway, the equipment I host is a slightly more expensive, professional-grade receiver with much better decoding capabilities. Combined with the supplied antenna, it is much more accurate than an RTL-SDR setup. There is a little box that sits on my desk (call it "the module"?) containing the receiver circuitry and a small ARM-based Linux system that sends data back to the main website for everyone to view. After installing it and confirming it was working, I pretty much ignored it.

That is, until recently, when I was debugging some port-scanning code I've been writing. I was doing service scans on several hosts on my LAN and decided to scan all 65535 ports of the module. Something caught my eye:

$ python main.py 10.0.1.75 1-65535
Scanning 1 hosts (65535 ports per host)
Hosts up: 1
10.0.1.75 22 : SSH-2.0-OpenSSH_6.0

10.0.1.75 10685 : 
10.0.1.75 30003 : AIR,,333,1,AB14D9,101,2015/12/22,23:23:43.125,2015/12/22,23:23:43.125

10.0.1.75 30334 : 2RDa{]?/???

Ports 30003 and 30334 didn't show up in any port databases. Netcatting port 30334 produced a bunch of unprintable bytes (sample hexdump here), but port 30003 turned out to be more promising:

MSG,3,333,8,780214,108,2015/12/22,23:33:14.4294967089,2015/12/22,23:33:14.4294967089,,35000,,,47.94058,-116.75212,,,0,0,0,0
MSG,4,333,8,780214,108,2015/12/22,23:33:14.4294967119,2015/12/22,23:33:14.4294967119,,,550.0,94.6,,,128,,,,,
MSG,3,333,2,A49441,102,2015/12/22,23:33:14.4294967122,2015/12/22,23:33:14.4294967122,,34975,,,47.56073,-113.82550,,,0,0,0,0
MSG,4,333,2,A49441,102,2015/12/22,23:33:14.4294967122,2015/12/22,23:33:14.4294967122,,,523.0,97.5,,,0,,,,,
MSG,4,333,3,A968DF,103,2015/12/22,23:33:14.4294967163,2015/12/22,23:33:14.4294967163,,,410.0,259.9,,,-64,,,,,
MSG,3,333,7,A94B48,107,2015/12/22,23:33:14.001,2015/12/22,23:33:14.001,,32975,,,46.94609,-115.34965,,,0,0,0,0
MSG,3,333,8,780214,108,2015/12/22,23:33:14.222,2015/12/22,23:33:14.222,,35000,,,47.94049,-116.75061,,,0,0,0,0
MSG,3,333,2,A49441,102,2015/12/22,23:33:14.230,2015/12/22,23:33:14.230,,34975,,,47.56056,-113.82371,,,0,0,0,0


It looks to me like real-time output of the ADS-B decoder, in CSV format. The only problem is that I have no idea what the columns mean. After capturing several thousand lines of output, I tried to analyze the columns by finding the most common values in each one, using a long shell command:

$ for i in {1..30}; do echo "=============== column $i ==============="; cat flightradar_dump.csv | cut -d ',' -f $i  | sort | uniq -c | sort -rn | head -n20; done

=============== column 1 ===============
6667 MSG
   5 AIR
   4 STA
   3 ID
=============== column 2 ===============
2607 8
1662 4
1650 3
 541 5
 203 1
  12 
   4 6
=============== column 3 ===============
6679 333
=============== column 4 ===============
1595 18
1485 8
1284 20
1127 17
 808 21
 220 6
 125 7
  22 22
  13 19
=============== column 5 ===============
1595 AD4C2A
1485 780214
1284 A48272
1127 AE07FF
 808 AB013C
 220 A81077
 125 A94B48
  22 A53317
  13 AA1F97
=============== column 6 ===============
1595 118
1485 108
1284 120
1127 117
 808 121
 220 106
 125 107
  22 122
  13 119
=============== column 7 ===============
6679 2015/12/22
=============== column 8 ===============
   4 23:45:55.349
   3 23:46:14.257
   3 23:46:10.4294966829
   3 23:46:10.4294966821
   3 23:46:05.4294967104
   3 23:46:05.4294967081
   3 23:46:05.4294967073
   3 23:46:00.4294967272
   3 23:46:00.090
   3 23:45:55.373
   3 23:45:41.221
   3 23:45:41.214
   3 23:45:37.4294966801
   3 23:45:37.4294966793
   3 23:45:32.4294967084
   3 23:45:32.4294967076
   3 23:45:32.252
   3 23:45:28.4294966934
   3 23:45:27.070
   3 23:45:18.4294966925
=============== column 9 ===============
6679 2015/12/22
=============== column 10 ===============
   4 23:45:55.349
   3 23:46:14.257
   3 23:46:10.4294966829
   3 23:46:10.4294966821
   3 23:46:05.4294967104
   3 23:46:05.4294967081
   3 23:46:05.4294967073
   3 23:46:00.4294967272
   3 23:46:00.090
   3 23:45:55.373
   3 23:45:41.221
   3 23:45:41.214
   3 23:45:37.4294966801
   3 23:45:37.4294966793
   3 23:45:32.4294967084
   3 23:45:32.4294967076
   3 23:45:32.252
   3 23:45:28.4294966934
   3 23:45:27.070
   3 23:45:18.4294966925
=============== column 11 ===============
6469 
  48 CPA846
  48 AAL1519
  46 HIRE71
  35 N39WP
  22 SCX285
   4 UAL670
   2 SL
   2 RM
   1 SCX285
   1 N39WP
   1 AAL1519
=============== column 12 ===============
4484 
 630 38000
 452 35000
 308 43000
 280 35975
 215 36000
  98 37975
  73 43025
  57 35025
  23 32975
  21 42975
  13 33000
  11 24000
   5 37025
   5 37000
   4 3675
=============== column 13 ===============
5017 
 163 394.0
 124 393.0
 111 541.0
 103 400.0
 102 543.0
  94 410.0
  92 411.0
  80 396.0
  69 545.0
  67 542.0
  58 395.0
  57 399.0
  55 547.0
  49 407.0
  46 546.0
  42 540.0
  37 406.0
  37 401.0
  35 392.0
=============== column 14 ===============
5017 
 175 273.2
 132 270.3
 125 270.1
 111 272.8
 110 96.1
  96 270.4
  92 272.5
  87 273.3
  84 96.4
  82 96.5
  75 96.6
  51 273.1
  51 272.6
  46 96.2
  42 273.5
  40 272.4
  39 96.3
  27 96.7
  25 272.9
=============== column 15 ===============
5029 
  21 47.47307
  15 47.47311
  15 47.47302
  15 47.47298
  14 47.47293
  14 47.47284
  12 47.47275
  11 47.47243
  10 47.47289
  10 47.47285
  10 47.47270
  10 47.47266
   9 47.47279
   9 47.47234
   9 47.47220
   8 47.47304
   8 47.47183
   7 47.47309
   7 47.47261
=============== column 16 ===============
5029 
   2 -114.02531
   2 -114.01941
   2 -113.90083
   2 -113.83525
   2 -113.78294
   2 -113.61738
   2 -113.53587
   2 -113.51678
   2 -113.48582
   2 -113.46893
   2 -113.46729
   2 -113.46295
   2 -113.45102
   2 -113.42651
   2 -113.41482
   2 -113.35567
   2 -113.34924
   2 -113.29115
   2 -113.28186
=============== column 17 ===============
5017 
 880 0
 388 64
 308 -64
  74 -128
  12 128
=============== column 18 ===============
6675 
   2 1756
   2 1366
=============== column 19 ===============
4484 
2195 0
=============== column 20 ===============
5025 
1654 0
=============== column 21 ===============
4484 
2195 0
=============== column 22 ===============
4582 0
1865 
 220 1
  12 

The first column in the output is the number of occurrences of the value in the second column.
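For what it's worth, the same per-column tally can be done in Python with collections.Counter instead of the sort | uniq -c pipeline. A sketch (flightradar_dump.csv is the same capture file as above):

```python
import csv
from collections import Counter

def column_counts(path, max_cols=22):
    """Count occurrences of each value in the first max_cols CSV columns."""
    counters = [Counter() for _ in range(max_cols)]
    with open(path) as f:
        for row in csv.reader(f):
            for i, value in enumerate(row[:max_cols]):
                counters[i][value] += 1
    return counters

def print_summary(path, top=20):
    """Print the most common values per column, like the shell one-liner."""
    for i, counter in enumerate(column_counts(path), start=1):
        print("=============== column %d ===============" % i)
        for value, n in counter.most_common(top):
            print("%7d %s" % (n, value))
```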

Now allow me to guess using this example message, based on what I know:

MSG,3,333,7,A94B48,107,2015/12/22,23:33:14.001,2015/12/22,23:33:14.001,,32975,,,46.94609,-115.34965,,,0,0,0,0

Column 1 looks like the message type. MSG is the most common, but AIR, STA, and ID were also seen occasionally. No idea what the difference is.

Column 5 is almost certainly the airplane identification number.

Column 6 might be the message length.

Columns 7 and 9 are both date fields. I don't understand why there are two, as they are always identical.

Columns 8 and 10 are time fields. Again they match, so the presence of two is strange; in my entire capture, no pair differed:

>>> for i in open('flightradar_dump.csv').readlines():
...     a = i.strip().split(',')
...     if a[7] != a[9]:
...             print 'different:',a[7],a[9]
(no output)

Column 11 appears to be the flight number, including the airline identifier.

Column 12, I would guess, is the plane's altitude. Whether this is in meters or feet is unknown.

I'm relatively sure column 13 is speed in mph.

Column 14 could be the plane's direction in degrees.

Columns 15 and 16 are almost certainly latitude and longitude of the plane.
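Putting those guesses together, here is a sketch that maps one MSG line onto named fields. Every name here is just my guess (the unknown_* columns are exactly that: unknown), and columns beyond 16 are ignored:

```python
# Field names are my guesses from the column analysis above -- unconfirmed.
GUESSED_FIELDS = [
    'msg_type',      # col 1: MSG / AIR / STA / ID
    'unknown_2',     # col 2
    'unknown_3',     # col 3: always 333 in my capture
    'unknown_4',     # col 4
    'aircraft_id',   # col 5: almost certainly the airplane ID
    'maybe_length',  # col 6: message length?
    'date_1',        # col 7
    'time_1',        # col 8
    'date_2',        # col 9: duplicate of col 7
    'time_2',        # col 10: duplicate of col 8
    'flight_no',     # col 11: flight number incl. airline
    'altitude',      # col 12: meters or feet?
    'speed',         # col 13: mph?
    'heading',       # col 14: degrees?
    'latitude',      # col 15
    'longitude',     # col 16
]

def parse_msg(line):
    # zip() stops at the shorter sequence, so trailing columns are dropped
    return dict(zip(GUESSED_FIELDS, line.strip().split(',')))
```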

As for the rest of the columns, I have no idea. The internet was no help in this endeavour, though I might be able to find something out from FlightRadar support. I doubt they'd add this feature if they didn't intend people to use it.

EDIT: I've built a simple script to process the streamed data. It has a similar (somewhat messy) interface to dump1090's, but uses the local feed data. Run it with:

python fr24_feed.py <ip of receiver>

Output looks something like this now:


ID:         Flight          Alt         Speed              Lat              Lon       Heading     Last seen   PPS
A7D3FE:                 11175ft           mph                E                N           deg          0sec     3
A58672:                 34950ft           mph                E                N           deg          1sec     0
A4CA6A:        752      35000ft      484.0mph      -115.41179E        46.45871N      107.2deg          0sec     7
A500CA:                 20450ft           mph                E                N           deg          0sec     2
89911D:                 38000ft           mph                E                N           deg          0sec     3
A5C4E2:         36      35000ft      466.0mph      -113.48465E        47.69261N       88.9deg          0sec     6
86DCE6:       JAL4      40975ft      507.0mph      -113.85406E        48.29695N       96.7deg          0sec     8
AB3FC8:                  8475ft           mph                E                N           deg          0sec     1
AB4508:                      ft           mph                E                N           deg         14sec     0

Bytes/sec: 3075  Packets/sec: 30  Avg bytes/pkt: 102

Planes visible: 9  Total seen: 9

In conclusion, the FlightRadar receiver is much more accurate than others I have seen. At a roughly calculated rate of 30-40 messages per second, heavy processing may be infeasible. Recording it may also be hard, as mine used about 150 kilobytes of disk space per minute.
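If you just want the raw feed without my script, the core read loop is only a few lines. A sketch (substitute your receiver's LAN address for the host):

```python
import socket

def stream_lines(host, port=30003):
    """Yield decoded lines from the receiver's port-30003 feed."""
    sock = socket.create_connection((host, port))
    buf = b''
    try:
        while True:
            chunk = sock.recv(4096)
            if not chunk:          # receiver closed the connection
                break
            buf += chunk
            while b'\n' in buf:
                line, buf = buf.split(b'\n', 1)
                yield line.decode('ascii', 'replace').strip()
    finally:
        sock.close()
```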

Also here's the script:





Sunday, December 20, 2015

Bro IDS + Python == success!

I recently learned about the Bro IDS project, and I think it's really cool! The only problem is that I didn't want to learn its special language just to process network data; I'm more used to Python's powerful tools for this kind of processing. So I wrote BroScanner, which essentially streams Bro logfiles and reads each entry into an easily parseable Python dict. It can also pipe the data out as JSON for other programs to analyze.

Today, I want to show you how to use BroScanner by creating some small passive IDS scripts with it. First, you need to download and configure Bro. Then you'll need BroScanner and its dependencies; see the aforementioned Github repo. Once Bro is up and running, you can run main.py to extract JSON from one of the logfiles. Assuming Mac or Linux, you can list the logfiles with ls /usr/local/bro/spool/bro.


$ ls /usr/local/bro/spool/bro
app_stats.log dns.log       files.log     notice.log    ssl.log       stdout.log    x509.log    conn.log      dpd.log       http.log      software.log  stderr.log    weird.log

(This isn't a complete list of the files; they are created as needed by Bro)

$ python main.py software.log
{"host_p": null, "name": "Unspecified WebKit", "unparsed_version": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Spotify/1.0.20.94 Safari/537.36", "version.minor": 36, "version.minor3": null, "ts": 1450646720.519703, "host": "10.0.1.19", "version.minor2": null, "version.addl": null, "software_type": "HTTP::BROWSER", "version.major": 537}

(don't include the path in any calls to BroScanner, it is added automatically)

$ cat /usr/local/bro/spool/bro/software.log 
#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path software
#open 2015-12-20-14-25-21
#fields ts host host_p software_type name version.major version.minor version.minor2 version.minor3 version.addl unparsed_version
#types time addr port enum string count count count count string string
1450646720.519703 10.0.1.19 - HTTP::BROWSER Unspecified WebKit 537 36 - Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Spotify/1.0.20.94 Safari/537.36

As you can see, this JSON is a lot more readable than the default CSV-style logfiles. It can be piped to any program that can decode JSON, but it can also be read directly inside your Python file as a dict. This makes it really easy to create your own intrusion detection programs.
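If you'd rather not pull in BroScanner, the log format is simple enough to parse directly: field names come from the #fields header line, columns are tab-separated (per #separator \x09), and - means unset. A minimal sketch (bparser does considerably more than this):

```python
def parse_bro_log(path):
    """Yield one dict per log entry; the unset marker '-' becomes None."""
    fields = []
    with open(path) as f:
        for line in f:
            line = line.rstrip('\n')
            if line.startswith('#fields'):
                fields = line.split('\t')[1:]   # column names for this log
            elif line.startswith('#') or not line:
                continue                        # other headers / blank lines
            else:
                values = [None if v == '-' else v for v in line.split('\t')]
                yield dict(zip(fields, values))
```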


Virus scanning


For example, I wrote this simple script to check for prohibited mime types and report their hash digests:

## simple file type detection.
## if i had unlimited api access, i would send every single md5 to something like VirusTotal

forbidden_types = ['text/x-bash', 'text/scriptlet', 'application/x-opc+zip', 'application/com', 'application/x-msmetafile', 'application/x-shellscript', 'text/x-sh', 'text/x-csh', 'application/x-com', 'application/x-helpfile', 'application/hta', 'application/x-bat', 'application/x-php', 'application/x-winexe', 'application/x-msdownload', 'text/x-javascript', 'application/x-msdos-program', 'application/bat', 'application/x-winhelp', 'application/vnd.ms-powerpoint', 'text/x-perl', 'application/x-javascript', 'application/x-ms-shortcut', 'application/vnd.msexcel', 'application/x-msdos-windows', 'text/x-python', 'application/x-download', 'text/javascript', 'text/x-php', 'application/exe', 'application/x-exe', 'application/x-winhlp', 'application/msword', 'application/zip']

import bparser

for i in bparser.parseentries('files.log'):
    if i['mime_type'] in forbidden_types: ## scan all of these types
        print
        print "{} downloaded a file from {} via {}".format(i['rx_hosts'], i['tx_hosts'], i['source'])
        print "Filename: {}  length: {}  mime type: {}".format(i['filename'], i['total_bytes'], i['mime_type'])
        print "MD5: {}  SHA1: {}".format(i['md5'], i['sha1'])

This script could be deployed in a workplace to monitor employee downloads for possible viruses. If a virus is detected, one could take the appropriate security measures on the affected machine (as we know its IP address). As stated in the comment, each hash could be sent to a virus database; rate limiting would be a problem, though, so in a workplace setting a local virus database would be more suitable.


Authorized devices only


Another example of a use of this is scanning for unauthorized devices on the network. Since each device has its own unique MAC address, only specific addresses could be allowed.

$ python main.py dhcp.log
{"lease_time": 86400.0, "uid": "CO9XE21x6UMR8CZvZ2", "id.orig_p": 68, "id.resp_h": "10.0.1.1", "ts": 1450648870.748178, "id.orig_h": "10.0.1.19", "id.resp_p": 67, "mac": "a8:bb:cf:07:92:50", "trans_id": 3937219451, "assigned_ip": "10.0.1.19"}

MAC addresses start with a prefix that identifies the manufacturer of the network interface. One could write a script that whitelists certain MAC addresses or manufacturers. Since leases here last a day (lease_time: 86400), each host has to re-register with DHCP daily, so there is little chance of escaping this detection. Additionally, once an unauthorized host is detected, a managed router could be used to sinkhole that host from any network communication.

One small problem with using only the manufacturer ID to whitelist hosts is the fact that MAC addresses can be spoofed. However, if there is a small list of full MAC addresses whitelisted, this problem virtually disappears.

If a company decided to disallow all mobile phones from using the protected corporate network, they could scan for MAC addresses that contained the manufacturer ID of popular smartphone brands. Since MAC address spoofing is practically nonexistent on phones, this would be highly effective.
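A sketch of what that whitelist check could look like; the specific MAC address and vendor prefix here are made up for illustration (the MAC is the one from the dhcp.log sample above):

```python
# Hypothetical policy tables -- fill these in for your own network.
ALLOWED_MACS = {'a8:bb:cf:07:92:50'}   # fully trusted devices
BLOCKED_OUIS = {'f0:db:e2'}            # e.g. a phone vendor's prefix (made up)

def check_mac(mac):
    """Classify a MAC address against the whitelist and blocked vendors."""
    mac = mac.lower()
    if mac in ALLOWED_MACS:
        return 'allowed'
    if mac[:8] in BLOCKED_OUIS:        # first three octets identify the vendor
        return 'blocked-vendor'
    return 'unknown'
```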


Notifying IT


A final example of a use for BroScanner is notifying admins of portscans and other possible intrusions that Bro detects. I discovered a logfile that contains this information, and all I needed to do was implement some kind of admin notification.

$ python main.py notice.log
{"uid": null, "id.resp_p": null, "actions": "Notice::ACTION_LOG", "fuid": null, "dropped": false, "remote_location.region": null, "remote_location.country_code": null, "sub": "local", "proto": null, "dst": "10.0.1.1", "ts": 1450650532.338251, "note": "Scan::Port_Scan", "peer_descr": "bro", "msg": "10.0.1.19 scanned at least 15 unique ports of host 10.0.1.1 in 0m0s", "remote_location.city": null, "remote_location.longitude": null, "remote_location.latitude": null, "src": "10.0.1.19", "id.orig_p": null, "id.resp_h": null, "n": null, "id.orig_h": null, "p": null, "file_mime_type": null, "file_desc": null, "suppress_for": 3600.0}

So a simple Python file can notify the appropriate people:

import bparser
import notifiers

for i in bparser.parseentries('notice.log'):
    print i
    notifiers.notify(i['msg'])

As you can see, the portscan was detected even though I ran it with nmap -v -PR -Pn -sV 10.0.1.1 to try to minimize traffic. Bro's built-in threat detection is no slouch.

The notifiers module contains whatever types of notifications you would like to send. I implemented the OS X notification center for a local Bro instance, so OS X users will see a popup notification informing them of the intrusion. Email, SMS, and many other forms of notification could be implemented, giving IT the edge in eliminating threats. As previously stated, network routers could then sinkhole these hosts, effectively removing the threat.

With Python, the possibilities are endless! There are many more uses for this program, and I'm interested to see what people come up with.

Extracting data from iPhone backups, part 2

Continuing with my previous post, here are some more things I've found digging through iPhone backups.

I found a gzipped json file, which apparently contained the entire StarWalk database. It has orbital parameters, links to 3d models, descriptions, and all kinds of other stuff. Have fun exploring that...

I also wrote a simple script to pull specific files into a local directory, findfiles.py. It looks through the locally generated database (from gendb.py) and copies matching file types to the local data directory, adding an extension you specify so the files are easier to sift through.
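The core of findfiles.py amounts to a query plus a copy loop. A sketch, assuming the files table schema (name, type, ...) that gendb.py produces (described in part 1):

```python
import os
import shutil
import sqlite3

def pull_files(db_path, backup_dir, out_dir, type_substring, ext):
    """Copy every backup file whose detected type matches, adding an extension."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT name FROM files WHERE type LIKE ?",
                            ('%' + type_substring + '%',)).fetchall()
    finally:
        conn.close()
    copied = 0
    for (name,) in rows:
        src = os.path.join(backup_dir, name)
        if os.path.exists(src):
            shutil.copy(src, os.path.join(out_dir, name + '.' + ext))
            copied += 1
    return copied
```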

A typelist of this backup looks like this:


{u'ASCII C++ program text': 4,
 u'ASCII English text': 17,
 u'ASCII FORTRAN program text': 5,
 u'ASCII Java program text': 61,
 u'ASCII Pascal program text': 17,
 u'ASCII text': 1396,
 u'Apple binary property list': 463,
 u'AppleDouble encoded Macintosh file': 44,
 u'Bio-Rad .PIC Image File 12850 x 29472': 8,
 u'CoreAudio Format audio file version 1': 1,
 u'DBase 3 data file': 1,
 u'DBase 3 data file (247624 records)': 1,
 u'DBase 3 data file with memo(s)': 1,
 u'Dyalog APL component file version 240 .64': 1,
 u'GIF image data': 3,
 u'HTML document text': 1,
 u'IFF data': 7,
 u'ISO Media': 11,
 u'JPEG image data': 2242,
 u'Little-endian UTF-16 Unicode English text': 3,
 u'Little-endian UTF-16 Unicode text': 2,
 u'Non-ISO extended-ASCII text': 3,
 u'PDP-11 UNIX/RT ldp': 7,
 u'PNG image data': 122,
 u'RIFF (little-endian) data': 466,
 u'SQLite 3.x database': 132,
 u'SysEx File -': 1,
 u'TIFF image data': 1,
 u'UTF-8 Unicode English text': 3,
 u'XML  document text': 1334,
 u'Zip archive data': 2,
 u'a python script text executable': 111,
 u'core file (Xenix)': 1,
 u'data': 190,
 u'empty': 25,
 u'gzip compressed data': 1,
 u'text/html;': 2,
 u'troff or preprocessor input text': 1,

 u'vCard visiting card': 3}

You'll notice a lot of source code files. Most of these are miscategorized Python files that came with a Python app I downloaded. Others, such as the APL file, are complete gibberish. The "Bio-Rad .PIC Image File 12850 x 29472" files were a mystery to me; I couldn't open them by any conventional means, and I couldn't find much info about them online either.

I found several "UTF-8 Unicode English text" files with lots of entries like this:

{
            "date_added": "13055051350450917",
            "id": "211",
            "meta_info": {
               "stars.id": "ssc_94e670585b99b60a",
               "stars.imageData": "Cg51dVN5dTZsZHZlWWM1TRIyCipodHRwOi8vaW1ncy54a2NkLmNvbS9jb21pY3MvZWZmaWNpZW5jeS5wbmcQyQIY1gEaYwpbaHR0cDovL3QyLmdzdGF0aWMuY29tL2ltYWdlcz9xPXRibjpBTmQ5R2NScmtkUkVtSmJ0WnFYV1ZJelRYMTJ4ZjNQSENtUDFNRVB5NzI1LW5TYVp5cTFMa1pqNRCHAhirASIeCAEQhwIYqwEgtTlCBhIEgAAAAEoJaW1hZ2UvcG5n",
               "stars.isSynced": "true",
               "stars.pageData": "Ig5rT21tdS1VQ2Z1MnR2TQ==",
               "stars.type": "2"
            },
            "name": "xkcd",
            "sync_transaction_version": "2",
            "type": "url",
            "url": "http://m.xkcd.com/"
         } ],
         "date_added": "13055044120539455",
         "date_modified": "13055051350450917",
         "id": "3",
         "name": "Mobile Bookmarks",
         "sync_transaction_version": "3",
         "type": "folder"
}

I assume this is from the Chrome app, and it looks to me like a list of my bookmarks.

Upon extracting all "ASCII text" files, I was mystified by the number of files looking like this:

22 serialization::archive 10 0 0 0 0 1025 2 0 2 0.32507935166358948 0.0028225002 3.1072781 0.51664400100708008 0.074370861 10.875457 0.62693876028060913 0.12955415 0.8343581 0.85913830995559692 0.16931733 0 1.3990023136138916 0.16308928 10.289221 1.4454421997070312 0.02279284 8.7684822 1.7705215215682983 0.24627544 0 

I have no idea what these are, and a search for "serialization::archive" was inconclusive, though the header resembles what Boost.Serialization writes for its text archives. Many hundreds of these files contained thousands of seemingly random floats. Based on related data, they may actually be encoded sound files for the app iMaschine, though I can't really prove that.

I am uploading some mysterious files to https://gist.github.com/red-green/132e5680b8ffc09e91d7, in the hope that someone can help me. The first one, mystery1.js, I can only assume to be internally executed, perhaps inside the AppStore. I added one of the mysterious lists of floats as serialization_archive1.txt. 

Another file contained lists of this:

"+T9EE5FPCpiAKOlsptLH050hrtQ+BwuL73a9JmAvGgY=": {
      "dynamic_spki_hashes_expiry": 0.0,
      "expiry": 1452557234.251196,
      "mode": "force-https",
      "pkp_include_subdomains": false,
      "pkp_observed": 1421811398.470302,
      "sts_include_subdomains": true,
      "sts_observed": 1421811398.470302
   }

My thought was HTTPS certificates. The "force-https" mode and the sts_/pkp_ fields suggest this is Chrome's saved HSTS/certificate-pinning state, which would make the base64 key a hash of a public key; that would explain why it decodes to nothing printable.

Another text file contained some settings for the GSM radio:

TxPowerModel::fProxSetting=0x0
TxPowerModel::fProxLogging=0x0
BootTime=0x551d95a5
CommCenterStartsThisBoot=0x1
TraceModuleExtreme::CSILog::fEnabled=0x0
TraceModuleExtreme::CSILog::fHistory=0xffffffff
GsmRadioModule::fPreviousBootUptime=0x538a5271
GsmSettingsModel::UnconditionalCallForwardingActiveIccid=<a big number>
GsmSettingsModel::UnconditionalCallForwardingActiveValue=0x0
Settings::AttachAPNSupport=0x2
MaxDataRateManager::Enable3GSwitchSupport=0x1
MaxDataRateManager::EnableLTESwitchSupport=0x2
SettingsModel::CallingLineIdRestriction=0x0
RegistrationModel::fLastKnownServingMCC=0x137
RegistrationModel::fLastKnownServingMNC=0x1e0
RegistrationModel::fNetworkSelection=
Capabilities::SimCallAndDataSupported=0x0
Capabilities::SimCallAndDataCurrentlySupported=0x0
MaxDataRateManager::EnableLTE=0x1
GsmRadioModule::fCellularDataIsEnabled=0x1
EURQMIC2KRegistrationController::fEnableOnlyHomeNetwork=0x0

It may be possible to edit one of these files and then restore the phone from backup to change settings. However, the fact that every file has an MD5 hash may prevent that from happening.

A few of these were found too:

{"secondsSpentInCurrentSession":29,"lastSuspendTime":1421191136,"numInterruptions":0}

They might be for some kind of power timer.

Finally, for now, I found several files claiming to be "PDP-11 UNIX/RT ldp" floppy drive images. I can't extract them with the tools I have, and I doubt they actually are what they claim to be; finding several kilobyte-sized PDP-11 floppy images on an iPhone is highly unlikely, so this is probably the file utility's magic tests misfiring.

In conclusion, data scraping is hard to do when you don't have the original file name or path. I hope somebody learned something from this. I might do more in the future.

Saturday, December 19, 2015

Extracting data from iPhone backups, part 1

If you ever get the opportunity to iTunes-sync with someone's iPhone, there is a huge treasure trove of data to be had that you may not have known about. Every time a phone syncs, it downloads a whole bunch of data to ~/Library/Application Support/MobileSync/Backup/<md5 folder name>/ (assuming you're on a Mac). All the files are named with an MD5-style hash and no extension, so at first glance it's impossible to sift through the data. I wrote a few tools to help me through it.

First, you need to set up the database by plugging the appropriate backup folder hash into gendb.py and then running it. It uses the Unix file utility to identify each file, then stores the results in the files table of filetypes.db, which has the schema name, type, xtype, len, atime, mtime.

The other two scripts work from this database. typecount.py tells you how many of each filetype there are. sqllist.py scans through all the SQLite3 databases and reports the tables they contain, which is useful for identifying the databases in question.
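The heart of sqllist.py is a single query against SQLite's built-in sqlite_master table; roughly:

```python
import sqlite3

def list_tables(db_path):
    """Return the sorted table names inside a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    finally:
        conn.close()
    return sorted(name for (name,) in rows)
```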

With that out of the way, I found several interesting databases:

$ python sqllist.py 
/Users/jacksonservheen/Library/Application Support/MobileSync/Backup/c02ecb74b86a49f95ec96d1a87f4dd4b6abf4b46-20151205-132750/Snapshot
Database 06196a61cf209070363f0b0a05b385a25976c2df
Tables: _SqliteDatabaseProperties, PairedDevices
------------------------------
Database 0d609c54856a9bb2d56729df1d68f2958a88426b
Tables: ZCHECKUPEVENT, ZDEMOLIVEUSAGE, ZEVENT, ZEVENTSCENE, ZLIVEUSAGE, ZPEER, ZPROCESS, ZTSHOOTINGDATA, ZWIFIDATA, Z_PRIMARYKEY, Z_METADATA
------------------------------
Database 12b144c0bd44f2b3dffd9186d3f9c05b917cee25
Tables: Z_METADATA, ZCLOUDSHAREDALBUMINVITATIONRECORD, ZCLOUDFEEDENTRY, ZMOMENTLIBRARY, ZASSETDESCRIPTION, ZALBUMLIST, ZMOMENT, ZADDITIONALASSETATTRIBUTES, Z_1KEYWORDS, ZMOMENTLIST, ZGENERICALBUM, Z_15ALBUMLISTS, Z_16ASSETS, ZFACE, ZCLOUDSHAREDCOMMENT, ZSEARCHDATA, ZGENERICASSET, ZADJUSTMENT, ZCLOUDMASTER, ZUNMANAGEDADJUSTMENT, ZASSETTOALBUMORDER, ZCLOUDRESOURCE, ZCLOUDMASTERMEDIAMETADATA, ZKEYWORD, ZSIDECARFILE, Z_PRIMARYKEY
------------------------------
Database 2041457d5fe04d39d0ab481178355df6781e6858
Tables: _SqliteDatabaseProperties, sqlite_sequence, Identity, ExceptionDate, Category, CategoryLink, ResourceChange, OccurrenceCache, OccurrenceCacheDays, Store, StoreChanges, Calendar, CalendarChanges, Participant, ParticipantChanges, Location, CalendarItem, CalendarItemChanges, Alarm, AlarmChanges, Recurrence, RecurrenceChanges, EventAction, EventActionChanges, Attachment, AttachmentChanges, Sharee, ShareeChanges, Notification, NotificationChanges, ScheduledTaskCache, ClientCursor, ClientSequence, ClientCursorConsumed
------------------------------
Database 22b5fb3c3890cfc5cee685c923922e8ebe8ee9fd
Tables: ZSETTING, Z_PRIMARYKEY, Z_METADATA, ZALARM
------------------------------
Database 2b2b0084a1bc3a5ac8c27afdf14afb42c61a19ca
Tables: _SqliteDatabaseProperties, call, sqlite_sequence
------------------------------
Database 303e04f2a5b473c5ca2127d65365db4c3e055c05
Tables: Z_METADATA, ZRECORDING, ZENTITYREVISION, ZDATABASEPROPERTY, Z_PRIMARYKEY
------------------------------
Database 31bb7ba8914766d4ba40d6dfb6113c8b614be442
Tables: _SqliteDatabaseProperties, ABStore, sqlite_sequence, ABPersonSearchKey, ABPersonBasicChanges, ABGroup, ABAccount, ABGroupMembers, ABMultiValueLabel, ABMultiValueEntry, ABMultiValueEntryKey, ABPhoneLastFour, ABPersonMultiValueDeletes, ABRecent, ABGroupChanges, ABPerson, ABPersonChanges, ABPersonLink, FirstSortSectionCount, LastSortSectionCount, FirstSortSectionCountTotal, LastSortSectionCountTotal, ABMultiValue, ABPersonFullTextSearch, ABPersonFullTextSearch_content, ABPersonFullTextSearch_segments, ABPersonFullTextSearch_segdir, ABPersonFullTextSearch_docsize, ABPersonFullTextSearch_stat
------------------------------
Database 3d0d7e5fb2ce288813306e4d4636395e047a3d28
Tables: _SqliteDatabaseProperties, message, sqlite_sequence, chat, attachment, handle, message_attachment_join, chat_handle_join, chat_message_join
------------------------------
Database 4096c9ec676f2847dc283405900e284a7c815836
Tables: TableInfo, Fences, CompassCalibration, ClusterCompassCalibration, BeaconFences
------------------------------
Database 462db712aa8d833ff164035c1244726c477891bd
Tables: cookies
------------------------------
Database 5a4935c78a5255723f707230a451d79c540d2741
Tables: ZCALLDBPROPERTIES, ZCALLRECORD, Z_PRIMARYKEY, Z_METADATA
------------------------------
Database 609b8b513a0f7eee156dd7fc06ffe1919d9da812
Tables: ZALBUMACTIVITY, ZALBUMCHANGECOUNTER, ZASSETACTIVITY, ZASSETCHANGECOUNTER, ZCHANGECOUNTERMAX, Z_PRIMARYKEY, Z_METADATA
------------------------------
Database 64d0019cb3d46bfc8cce545a8ba54b93e7ea9347
Tables: admin, access, access_times, access_overrides
------------------------------
Database 75b12106910f0b106f64d72eb75397427884fd5a
Tables: sqlite_sequence, properties, metadata, recents, contacts
------------------------------
Database 80c42a429a2e9877c4972b1e1ae246efc55f9c3c
Tables: ZCANCELEDDOWNLOAD, ZPUSHNOTIFICATION, ZPUSHNOTIFICATIONCLIENT, ZPUSHNOTIFICATIONENVIRONMENT, ZRINGTONEPURCHASE, Z_PRIMARYKEY, Z_METADATA
------------------------------
Database 8896671f94fe1f6dc638d66154c4799ebd07f7d3
Tables: TableInfo, GyroCalibration
------------------------------
Database 9143d986a77ab8cf5878e4e9ac80627477eb6674
Tables: ZMICROPAYMENTBASE, ZMICROPAYMENTCLIENT, ZMICROPAYMENTDOWNLOAD, Z_PRIMARYKEY, Z_METADATA
------------------------------
Database 943624fd13e27b800cc6d9ce1100c22356ee365c
Tables: ZACCESSOPTIONSKEY, Z_METADATA, ZDATACLASS, ZACCOUNT, Z_2PROVISIONEDDATACLASSES, Z_2ENABLEDDATACLASSES, ZACCOUNTTYPE, Z_4SUPPORTEDDATACLASSES, Z_4SYNCABLEDATACLASSES, ZAUTHORIZATION, ZACCOUNTPROPERTY, Z_1OWNINGACCOUNTTYPES, Z_PRIMARYKEY
------------------------------
Database 992df473bbb9e132f4b3b6e4d33f72171e97bc7a
Tables: _SqliteDatabaseProperties, voicemail, sqlite_sequence
------------------------------
Database 9c2390b6a6db7028ca5b61aef42d90cb6065bfc5
Tables: software_update
------------------------------
Database af0a461cff85322d0c029fedc42e7841ecbd5b9f
Tables: kvs_value
------------------------------
Database ca3bc056d4da0bbf88b5fb3be254f3b7147e639c
Tables: ZACCOUNT, ZNEXTID, ZNOTE, ZNOTEATTACHMENT, ZNOTEBODY, ZNOTECHANGE, ZPROPERTY, ZSTORE, Z_PRIMARYKEY, Z_METADATA
------------------------------
Database cd6702cea29fe89cf280a76794405adb17f9a0ee
Tables: _SqliteDatabaseProperties, ABThumbnailImage, ABFullSizeImage
------------------------------
Database d1f062e2da26192a6625d968274bfda8d07821e4
Tables: generations, sync_properties, bookmarks, folder_ancestors, bookmark_title_words
------------------------------
Database d22d4e3c06ed92f73666c24a1cd0c80466c7d4d5
Tables: ZMICROPAYMENTDOWNLOAD, Z_PRIMARYKEY, Z_METADATA, ZMICROPAYMENTCLIENT, ZMICROPAYMENTBASE
------------------------------
Database d2acb1ec24ed4669ec97974578478cff5bd236f9
Tables: CacheGroups, sqlite_sequence, Caches, CacheWhitelistURLs, CacheAllowsAllNetworkRequests, FallbackURLs, CacheEntries, CacheResources, CacheResourceData, DeletedCacheResources, Origins
------------------------------
Database ed1f8fb5a948b40504c19580a458c384659a605e
Tables: _SqliteDatabaseProperties, bundle_uuid, sqlite_sequence, bundle_info, subscriber_info
------------------------------
Database ff04ff376697dd4223132d3c16218676bbc24890

Tables: ZGAIHIT, ZGAIPROPERTY, Z_PRIMARYKEY, Z_METADATA

The first db that caught my attention contained tables such as message, chat, and attachment. Some SQLite digging revealed that the message table contained all the SMS/iMessage conversations on the phone.
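Pulling the conversations out is then a one-line query. A sketch (the text column name is from my own digging; other columns in this table vary by iOS version):

```python
import sqlite3

def dump_messages(db_path, limit=20):
    """Return up to `limit` non-empty message bodies from the message table."""
    conn = sqlite3.connect(db_path)
    try:
        return [row[0] for row in conn.execute(
            "SELECT text FROM message WHERE text IS NOT NULL LIMIT ?",
            (limit,))]
    finally:
        conn.close()
```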

I also found a tool that supposedly allows you to extract the cellphone location from whatever database contains the table CellLocation.

Along with all these SQLite dbs, I found a number of "Adaptive Multi-Rate Codec (GSM telephony)" files. I assume these are either voicemail recordings or some other recorded data, maybe even recorded phone calls. Since this backup wasn't from my phone, I was hesitant to try to listen to them.

More to come soon I guess?