Sunday, January 3, 2016

Easy passive host discovery, featuring Scapy

If you've ever Wiresharked a large, high-traffic LAN, you may have been overwhelmed by the volume of data. Packet captures can easily exceed 500 packets per second, and although there is a lot of useful information in there, you as a human can't possibly process it fast enough.

Enter the world of Python and Scapy. If you've never heard of Scapy, you're missing out. Among other things, it can send and receive custom packets with whatever layers you desire. It's also capable of dissecting packets, whether streamed from a pcap file or sniffed live. The latter is what I'll be talking about today. Most packet sniffing tools, like Wireshark and tcpdump, can dissect packets and do lots of powerful analysis on them. The main problem is that these tools, and others like them, can only do per-packet analysis. What's really useful is per-host inspection. That's where Python comes in.

If you understand all this stuff, take a quick look at the GitHub repository for this project. If not, read on.

Packet Dissection

To understand the following scripts, you'll have to know a bit about what is going on inside a network packet. Each packet is transmitted as a plain string of bytes, but depending on what it carries, that string can contain several nested layers of information.

The lowest layer (usually) is called the Ethernet layer. It contains the source MAC address, the destination MAC address, and a few other parameters, along with an encapsulated higher-level packet. A MAC address is a unique identifier of an interface card. It's not necessarily unique to each computer though, since your Wi-Fi card, Ethernet card, Bluetooth module, etc. all have different MAC addresses. In the following script, I identify each computer by its MAC address, assuming it only has one interface card connected to the network at a time. A useful side effect of knowing a MAC address is that you can take a good guess at the manufacturer of the network card or device. The first three bytes of a MAC address are a prefix assigned to the manufacturer, so the vendor can usually be looked up.
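To make the Ethernet layer concrete, here is a minimal sketch that unpacks the 14-byte header of a raw Ethernet II frame and looks up the vendor from the MAC prefix. It is not the project's code: the function names are made up for illustration, the frame is hand-built, and the one-entry vendor table is a stand-in for the full database that gendb.py generates (the Apple prefix is taken from the sample listing further down).

```python
import struct

# Illustrative one-entry vendor table; a real lookup needs the full vendor database.
OUI_VENDORS = {"a8:bb:cf": "Apple"}


def format_mac(raw: bytes) -> str:
    """Render six raw bytes as a lowercase, colon-separated MAC address."""
    return ":".join(f"{byte:02x}" for byte in raw)


def parse_ethernet_header(frame: bytes):
    """Return (destination MAC, source MAC, EtherType) from an Ethernet II frame."""
    if len(frame) < 14:
        raise ValueError("frame too short for an Ethernet header")
    # 6 bytes destination, 6 bytes source, 2 bytes EtherType (network byte order).
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return format_mac(dst), format_mac(src), ethertype


def vendor_for(mac: str) -> str:
    """Guess the manufacturer from the first three octets of the MAC address."""
    return OUI_VENDORS.get(mac.lower()[:8], "unknown")


# Hand-built broadcast frame carrying an (empty) IPv4 payload, EtherType 0x0800.
frame = bytes.fromhex("ffffffffffff" "a8bbcf079250" "0800")
dst, src, ethertype = parse_ethernet_header(frame)
print(src, vendor_for(src), hex(ethertype))
```

Scapy does this dissection for you; the point is only that the source MAC and its vendor prefix sit at fixed offsets at the very front of every frame.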

Often, the next layer up is Internet Protocol, either IPv4 or IPv6. This layer is also used for routing packets, but it identifies hosts by IP address instead of MAC address. This is the primary mode of communication for mainstream internet protocols, like HTTP.

Many different kinds of messages can be carried inside these two layers. For example, DHCP requests are sent every time a host connects to a network. They let a host obtain its IP address dynamically when it joins. For our purposes, a DHCP request associates a hostname with an IP address. This can be extremely useful, since most people leave their name in the default hostname of their computer (e.g. JoeSmith-Macbook-Pro).
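As a rough illustration of how the hostname can be pulled out of a DHCP request, the sketch below walks a list of DHCP options in the shape Scapy exposes them: (name, value) tuples mixed with bare marker strings such as "end". The function name and the sample options are made up for the example, and the list is hard-coded rather than taken from a sniffed packet, so treat it as a sketch and not as the project's DHCP plugin.

```python
def dhcp_identity(options):
    """Return (hostname, requested_addr) from a Scapy-style DHCP options list.

    Either value is None when the corresponding option is absent.
    """
    found = {"hostname": None, "requested_addr": None}
    for option in options:
        # Markers like "end" and "pad" are plain strings, not tuples; skip them.
        if not isinstance(option, tuple) or len(option) < 2:
            continue
        name, value = option[0], option[1]
        if name in found:
            # Option values may arrive as bytes; decode them for display.
            if isinstance(value, bytes):
                value = value.decode("utf-8", errors="replace")
            found[name] = value
    return found["hostname"], found["requested_addr"]


# Hypothetical options from a DHCP request (message-type 3).
sample_options = [
    ("message-type", 3),
    ("requested_addr", "10.0.1.19"),
    ("hostname", b"JoeSmith-Macbook-Pro"),
    "end",
]
print(dhcp_identity(sample_options))
```

With a live Scapy packet, the same list would come from the options field of the DHCP layer, assuming the packet actually has one.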

Passive Versus Active

The benefits of passive scanning over active are immense. Although active scanning can reveal much more information, it is much noisier on the network. Mass portscans are the worst offenders, since most IDSes and sysadmins will become suspicious of thousands of packets being sent at random to different hosts. Try Wiresharking while you perform an Nmap scan and you'll see what I mean.


If it's done right, you can passively scan without sending a single packet onto the network. Then, once you have enough info, targeting single hosts and specific ports will likely remain unnoticed.

On To The Script

I have created a Python program to attempt to simplify per-host analysis. It uses Scapy to sniff and dissect packets, and then uses various plugins to process the data. The idea is that each plugin has a specific list of required layers, and if a packet has all those layers, it is sent to the plugin for processing.

For example, I have a plugin called "ip" which associates IP addresses with MAC addresses in a table of the database. I also have a DHCP plugin that associates IP addresses with hostnames. There are a few more, and custom add-ons can be made easily.
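The required-layers idea can be sketched in a few lines. This is an illustration of the dispatch pattern only, not the project's plugin API: the class names, the required_layers attribute, and the dispatch function are all made up, and FakePacket is a stand-in that mimics just the haslayer() check and item access that Scapy packets provide.

```python
from types import SimpleNamespace


class Plugin:
    """Base class: subclasses name the layers they need and handle matching packets."""
    required_layers = ()

    def handle(self, packet):
        raise NotImplementedError


class IpPlugin(Plugin):
    """Associates each source MAC address with the source IP address seen with it."""
    required_layers = ("Ether", "IP")

    def __init__(self):
        self.mac_to_ip = {}

    def handle(self, packet):
        self.mac_to_ip[packet["Ether"].src] = packet["IP"].src


def dispatch(packet, plugins):
    """Pass the packet to every plugin whose required layers are all present."""
    handled_by = []
    for plugin in plugins:
        if all(packet.haslayer(layer) for layer in plugin.required_layers):
            plugin.handle(packet)
            handled_by.append(plugin)
    return handled_by


class FakePacket:
    """Demo stand-in for a dissected packet; holds layers by name."""

    def __init__(self, **layers):
        self._layers = layers

    def haslayer(self, name):
        return name in self._layers

    def __getitem__(self, name):
        return self._layers[name]


ip_plugin = IpPlugin()
full = FakePacket(Ether=SimpleNamespace(src="68:64:4b:55:93:8e"),
                  IP=SimpleNamespace(src="10.0.1.10"))
ether_only = FakePacket(Ether=SimpleNamespace(src="00:16:cb:c1:d9:7f"))

dispatch(full, [ip_plugin])        # has both layers, so the plugin runs
dispatch(ether_only, [ip_plugin])  # no IP layer, so the plugin is skipped
print(ip_plugin.mac_to_ip)
```

Keeping the layer check in the dispatcher means a new plugin only has to declare what it needs and implement handle().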

Running it

Running it is pretty simple. You need to do some prep work the first time: install a 2.x version of p0f, build the database of MAC address vendors by going to analyzers/data and running gendb.py, and create the data directory inside the main directory. Once that is done, you can run scanner.py <interface> to begin sniffing on the selected interface. You can also specify a pcap file, though I haven't tested that yet.

The program produces no output until you end it with ctrl-c or similar. It will then print a line saying how many packets it captured and exit. Depending on how long it was running, it will have amassed varying amounts of data. I recommend leaving it running for at least an hour or two; more than 24 hours is ideal, since that gives it a chance to catch DHCP requests from every host.
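For a feel of what such a sniff-until-interrupted loop can look like, here is a small sketch built on Scapy's sniff(). It is not scanner.py itself: PacketCounter and run are names invented for the example, the ".pcap" check is a simplistic assumption about how a capture file might be recognised, and live sniffing normally needs root privileges.

```python
class PacketCounter:
    """Callback that silently counts every packet it is handed."""

    def __init__(self):
        self.count = 0

    def __call__(self, packet):
        self.count += 1  # returns None, so sniff() prints nothing per packet


def run(source):
    """Sniff on an interface (or read a .pcap file) and report the packet count."""
    # Imported here so the rest of the sketch loads even without Scapy installed.
    from scapy.all import sniff

    counter = PacketCounter()
    try:
        if source.endswith(".pcap"):
            sniff(offline=source, prn=counter, store=False)
        else:
            # Runs until interrupted with ctrl-c.
            sniff(iface=source, prn=counter, store=False)
    except KeyboardInterrupt:
        pass  # fall through to the summary line
    print(f"Captured {counter.count} packets")
    return counter.count
```

Calling run("eth0") would sniff on a (hypothetical) eth0 interface. Passing store=False keeps Scapy from holding every packet in memory, which matters for a capture that runs for hours.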

Viewing the data

There is also a viewer script, viewer.py, which displays all the database information in an easy-to-browse format. The database works on association, so there is a table associating each MAC address with an IP address, a table associating each IP address with a hostname, etc. Besides pulling this kind of information directly from dissected packets, I wrote a plugin to fingerprint operating systems using Scapy's p0f module (note that Scapy's p0f support only covers the p0f 2.x databases, so you can't simply install the latest p0f). It doesn't always work, but sometimes it is useful.
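The association-table layout can be illustrated with a small in-memory SQLite database. The storage engine, table names, and column names here are assumptions made for the sketch (the project's actual schema may differ); the sample rows reuse addresses from the listing below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mac_ip (mac TEXT, ip TEXT, UNIQUE (mac, ip));
    CREATE TABLE ip_hostname (ip TEXT, hostname TEXT, UNIQUE (ip, hostname));
""")


def add_mac_ip(conn, mac, ip):
    """Record that a MAC address was seen using an IP address (duplicates ignored)."""
    conn.execute("INSERT OR IGNORE INTO mac_ip VALUES (?, ?)", (mac, ip))


def add_ip_hostname(conn, ip, hostname):
    """Record that an IP address announced a hostname (duplicates ignored)."""
    conn.execute("INSERT OR IGNORE INTO ip_hostname VALUES (?, ?)", (ip, hostname))


def list_hosts(conn):
    """Join the associations into one (mac, ip, hostname) row per known host."""
    return conn.execute("""
        SELECT m.mac, m.ip, h.hostname
        FROM mac_ip AS m
        LEFT JOIN ip_hostname AS h ON h.ip = m.ip
        ORDER BY m.ip
    """).fetchall()


add_mac_ip(conn, "68:64:4b:55:93:8e", "10.0.1.10")
add_mac_ip(conn, "68:64:4b:55:93:8e", "10.0.1.10")  # seen again, ignored
add_mac_ip(conn, "00:16:cb:c1:d9:7f", "10.0.1.70")
add_ip_hostname(conn, "10.0.1.10", "LivingRmAppleTV")

for mac, ip, hostname in list_hosts(conn):
    print(f"mac: {mac}  ip: {ip}  hostname: {hostname}")
```

The LEFT JOIN keeps hosts whose hostname was never observed, which is why some listings show "hostname: None".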

Upon running viewer.py, you get a nice listing of all the hosts that were identified. Unfortunately there is often a lot of traffic that falls below the IP layer, so there are a bunch of listings that only show a MAC address. Here's a sample of some more interesting listings:

mac: a8:bb:cf:07:92:50:
manuf: Apple
ip: 10.0.1.19
hostname: None
os: Linux:2.4.2x
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
mac: 68:64:4b:55:93:8e:
manuf: Apple
ip: 10.0.1.10
hostname: LivingRmAppleTV
os: None
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
mac: 00:16:cb:c1:d9:7f:
manuf: Apple
ip: 10.0.1.70
hostname: None
os: None
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
mac: f8:1e:df:df:8c:41:
manuf: Apple
ip: 10.0.1.16
hostname: <redacted>-MBP
os: None
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
mac: ec:35:86:4e:50:d2:
manuf: Apple
ip: 10.0.1.4
hostname: <redacted>-iMac
os: None

Soon, I hope to sort hosts by the amount of information known about them to make looking through this data a bit easier.

As you can see, the first entry's MAC address was identified as Apple, but the OS fingerprint identified it as Linux. I don't usually trust these OS fingerprints because they've never really been accurate in my experience. The hostname is what I really consider important. Each of these hosts kept its default hostname, so I can tell that there are some Mac computers and an Apple TV. This is something of a vulnerability in default installations, since the default hostnames of most personal computers are based on the computer model and the name of the owner. If you know someone's name, you now know their IP.