Capturing Packets with Scala Native and libpcap

By William Narmontas,

For the source code, see the Scala application (updated July 2017, March 2018): ScalaWilliam/scala-native-libpcap @ GitHub (PcapExample.scala in particular).

Background

libpcap (also: pcap) is a network traffic packet capture library that enables real-time and offline packet capture and analysis. Packet capture and analysis have many use cases.

Scala Native is an ahead-of-time compiler for Scala that targets LLVM and can therefore produce native binaries. This brings the promise of high-performance code written with existing Scala skills and high-quality tooling such as SBT (Scala Build Tool) and ScalaTest, as well as the availability of patterns like type classes.

Scala, which runs on the JVM, can interact with native libraries in two ways: JNA (Java Native Access) and JNI (Java Native Interface). JNA is slower but easier to use than JNI: for JNA you need nothing more than a dependency, but for JNI you need to write native code. When doing so in Scala, you can benefit from the sbt-jni plugin, which automates this compilation. Scala Native's interop is similar to JNA.

Pcap4j is an actively maintained library that wraps libpcap using JNA. The other way would be to use JNI with Unsafe for the highest performance, and the performance difference is huge. There may be other, even higher-performance approaches, but they are beyond the scope of this article; if you have ideas, do let me know on Twitter.

Packets can be captured in live mode using tcpdump, replayed with tcpreplay and visually analysed with Wireshark. libpcap supports live capture and reading from files.

libpcap flow involving data copy from kernel to user space

In live capture mode, each pcap_next call returns the next packet that has passed through any defined filters, after the kernel has copied its data into user space.

There are solutions for a pure zero-copy approach, but they are beyond the scope of this article.

Why this interests me

  1. A client needed a high performance online packet analyser for the binary-encoded GPRS Tunnelling Protocol (GTP) which runs over UDP (see the GTPv2 specification — large PDF!) and contains cell tower identifiers for 4G mobile subscribers. I implemented one solution with Scala & libpcap.
  2. Some Scala projects of mine needed a native (JNA) interface layer to ENet, a "Reliable UDP networking library", and lots of binary parsing.
  3. I was researching ways of achieving high performance data processing, including not only JVM Unsafe but also memory mapping and ring buffers.
  4. I'm a big Scala fan, having worked with it professionally and non-commercially since 2013.

Developing a program

To showcase what's possible, we're going to walk through developing a simple libpcap program using Scala Native.

The goal of the program is to output packet information both from files and from a live interface. The minimal packet information is: timestamp, source IP, destination IP, packet length, and a few bytes of the packet in hexadecimal.
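To pin the goal down, here is a small sketch of a value type for those fields. PacketSummary and its line method are hypothetical names, not taken from the repository; the rendering simply mirrors the print format the program uses further down.

```scala
// Hypothetical value type for the per-packet fields listed above.
// The original program prints these directly instead of building an object.
final case class PacketSummary(
    epochSecond: Long, // capture timestamp, seconds since the Unix epoch
    sourceIp: String,  // dotted-quad IPv4 source address
    destIp: String,    // dotted-quad IPv4 destination address
    length: Int,       // captured packet length in bytes
    hexPreview: String // the first few bytes, hex-encoded
) {
  // One line per packet, mirroring the article's output format.
  def line: String =
    s"Time: $epochSecond, $sourceIp --> $destIp, $length bytes: [$hexPreview...]"
}
```

Keeping the summary as a value separates extraction from printing, which makes the extraction logic easy to unit-test with ScalaTest in JVM mode.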

Note that functional and pure programming are not in scope of this article.

Resources

Preparing some reference data

Our reference data is simply a pcap file, which we can record with tcpdump:

$ tcpdump -i [interface] -w sample.pcap

Capture some packets (perhaps while running a speed test), then stop tcpdump with Ctrl-C, and you have a pcap file which you can consume later.

Use capinfos to get basic information about your pcap file. You can also look at it visually with Wireshark.

How a packet from sample.pcap looks in Wireshark

Tools

I use SBT to continuously run the program using Triggered Execution.

I use Docker and the ScalaWilliam/scala-native-sbt Docker image to get an isolated Linux execution environment:

$ docker run -v $PWD:/workspace -w /workspace -it scalawilliam/scala-native-sbt
root@0c957f870d61:/workspace# apt-get -y install libpcap-dev

And of course IntelliJ IDEA for its excellent Scala support.

Minimal required application flow

Packet-reading flows

libpcap offline flow

libpcap live flow

We shall combine the two flows into one app: Live and Offline (File).

Processing the packet

tshark can give you this output already, but we're not interested in replacing tshark.

  1. Read timestamp
  2. Read packet length
  3. Determine whether packet is IPv4
  4. Read source IP
  5. Read destination IP
  6. Read some bytes of data
  7. ... and at every step of the way, bounds-check
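Step 7 matters most: in native code an out-of-bounds read is a segfault rather than an exception. Since code can be iterated on in JVM mode (as the Impact section notes), here is a JVM-only sketch of bounds-checked reads. SafeBytes, u8 and u32be are hypothetical helpers, and the extra length parameter stands in for the captured length that libpcap reports.

```scala
// Hypothetical helpers: every read checks bounds first and returns None
// instead of touching memory past the end of the captured data.
object SafeBytes {
  // Unsigned 8-bit value at `offset`, if it lies within `length` captured bytes.
  def u8(data: Array[Byte], length: Int, offset: Int): Option[Int] =
    if (offset >= 0 && offset < length && offset < data.length)
      Some(data(offset) & 0xFF) // mask: JVM bytes are signed
    else None

  // Big-endian (network byte order) unsigned 32-bit value, widened to Long.
  def u32be(data: Array[Byte], length: Int, offset: Int): Option[Long] =
    for {
      b0 <- u8(data, length, offset)
      b1 <- u8(data, length, offset + 1)
      b2 <- u8(data, length, offset + 2)
      b3 <- u8(data, length, offset + 3)
    } yield (b0.toLong << 24) | (b1.toLong << 16) | (b2.toLong << 8) | b3.toLong
}
```

Returning Option makes a short packet an ordinary "skip this one" case rather than a crash.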

Minimal native mapping

The packet header has the same fields in memory and in storage (on 64-bit platforms the in-memory timestamp fields are wider)

In order to call native methods we need some sort of interface definition. It's similar to writing a C header file that is then #include'd.

This was not particularly difficult to achieve with the available resources, and it will be obvious to anyone who has done some C.

import scala.scalanative.native
import scala.scalanative.native._

@native.link("pcap")
@native.extern
object pcap {

  /** Opaque pcap_t handle: this is just a pointer for us, we don't care what is inside **/
  type pcap_handle = native.Ptr[Unit]

  /** struct pcap_pkthdr: ts.tv_sec, ts.tv_usec, caplen (bytes captured), len (bytes on the wire) **/
  type pcap_pkthdr = native.CStruct4[native.CUnsignedLong,
                                     native.CUnsignedLong,
                                     native.CUnsignedInt,
                                     native.CUnsignedInt]

  def pcap_open_live(deviceName: CString,
                     snapLen: CInt,
                     promisc: CInt,
                     to_ms: CInt,
                     errbuf: CString): pcap_handle =
    native.extern

  def pcap_open_offline(fname: CString, errbuf: CString): pcap_handle =
    native.extern

  /** Returns a pointer to the packet data, or null (end of file, timeout or error) **/
  def pcap_next(p: pcap_handle,
                h: native.Ptr[pcap_pkthdr]): native.CString = native.extern

  def pcap_close(p: pcap_handle): Unit = native.extern

}
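The same four header fields (seconds, microseconds, captured length, original length) also precede every packet in a pcap file, stored there as 32-bit values in the byte order indicated by the file's magic number. As a JVM-side sketch, the following reads them with java.nio.ByteBuffer. RecordHeader and PcapFile are hypothetical names, and only the classic microsecond-resolution magic number 0xA1B2C3D4 is handled.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Mirrors the four pcap_pkthdr fields as stored in a pcap file record header.
final case class RecordHeader(tsSec: Long, tsUsec: Long, inclLen: Long, origLen: Long)

object PcapFile {
  val GlobalHeaderLength = 24
  val RecordHeaderLength = 16
  private val Magic = 0xA1B2C3D4L // microsecond-resolution pcap magic number

  // Works out the file's byte order from the magic number at offset 0.
  def byteOrder(globalHeader: Array[Byte]): Option[ByteOrder] =
    if (globalHeader.length < GlobalHeaderLength) None
    else {
      val le = ByteBuffer.wrap(globalHeader).order(ByteOrder.LITTLE_ENDIAN).getInt(0)
      if ((le & 0xFFFFFFFFL) == Magic) Some(ByteOrder.LITTLE_ENDIAN)
      else if ((Integer.reverseBytes(le) & 0xFFFFFFFFL) == Magic) Some(ByteOrder.BIG_ENDIAN)
      else None
    }

  // Reads the 16-byte record header starting at `offset`, if it fits.
  def recordHeader(bytes: Array[Byte], offset: Int, order: ByteOrder): Option[RecordHeader] =
    if (offset < 0 || offset + RecordHeaderLength > bytes.length) None
    else {
      val buf = ByteBuffer.wrap(bytes, offset, RecordHeaderLength).order(order)
      // getInt is signed; mask so each field reads as an unsigned 32-bit value
      def u32(): Long = buf.getInt() & 0xFFFFFFFFL
      Some(RecordHeader(u32(), u32(), u32(), u32()))
    }
}
```

To walk a whole file, one would call byteOrder on the first 24 bytes, read a record header at offset 24, then skip inclLen bytes of packet data to reach the next record.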

Minimal code for opening a pcap handle

Here we'll capture from any interface by default.

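// errorBuffer must point to at least PCAP_ERRBUF_SIZE (256) bytes; libpcap writes error messages into it
val pcapHandle = if (live) {
  pcap.pcap_open_live(
    deviceName = c"any",      // Linux pseudo-device covering all interfaces
    snapLen = Short.MaxValue, // capture up to 32767 bytes of each packet
    promisc = 0,              // don't put the interface into promiscuous mode
    to_ms = 10,               // packet buffer timeout in milliseconds
    errbuf = errorBuffer
  )
} else {
  pcap.pcap_open_offline(fname = toCString(args.last),
                         errbuf = errorBuffer)
}
// Both functions return null on failure
if (pcapHandle == null) {
  System.err.println(s"Failed to open capture: ${fromCString(errorBuffer)}")
  sys.exit(1)
}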

Minimal code for continuously reading the handle

At this point I was working closer to raw pointers, and whenever I did something wrong I'd get a segfault with exit code 139 (128 + 11, the signal number of SIGSEGV). Still dislike Java exceptions and verbose stack traces?

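// The header struct lives on the stack and is reused for every packet
val packetHeaderPointer: native.Ptr[pcap.pcap_pkthdr] =
  native.stackalloc[pcap.pcap_pkthdr]
var packetReadData = pcap.pcap_next(pcapHandle, packetHeaderPointer)
var continue = true
while (continue) {
  if (packetReadData != null) {
    process_packet(
      epochSecond = (!packetHeaderPointer._1).toLong, // ts.tv_sec
      dataLength = (!packetHeaderPointer._3).toInt,   // caplen: bytes actually captured
      data = packetReadData,
      cooked = cooked
    )
  } else if (!live) {
    // Offline: null means end of file (or an error), so stop
    continue = false
  }
  // Live: null only means the timeout expired without a packet, so keep polling
  if (continue) {
    packetReadData = pcap.pcap_next(pcapHandle, packetHeaderPointer)
  }
}
pcap.pcap_close(pcapHandle)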
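The termination rule is the subtle part of that loop: a null from pcap_next ends an offline read but only signals a timeout in live mode. A hypothetical JVM model of the same rule, with a function returning Option standing in for pcap_next, fits in an Iterator pipeline:

```scala
// Hypothetical JVM-side model of the read loop: `next` stands in for
// pcap_next and returns None wherever pcap_next would return NULL.
object ReadLoop {
  def packets[A](next: () => Option[A], live: Boolean): Iterator[A] =
    Iterator
      .continually(next())
      // Offline: None means end of file, so stop.
      // Live: None is just a timeout, so keep reading.
      .takeWhile(p => live || p.isDefined)
      .collect { case Some(p) => p } // drop the timeouts
}
```

In live mode the iterator never ends on its own, so a consumer would bound it (for example with take) or run until the process is stopped, just like the while loop.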

Processing an individual packet

At this point we have extracted the key header fields, and we pass them to the processing method together with a C-style string (a pointer) to the packet data.

Note that this CString is NOT a null-terminated string, because packets may contain the byte 0x00 anywhere. You have to rely on the captured length to handle the incoming data.
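A quick JVM illustration of why: an IPv4 header typically starts with 0x45 followed by a type-of-service byte of 0x00, so a strlen-style scan stops after the very first byte. The object and method names below are hypothetical.

```scala
// Why the captured length, not a terminating 0x00, must bound packet data.
object NotNullTerminated {
  // What a strlen-style scan would "see": everything before the first 0x00.
  def strlenView(data: Array[Byte]): Array[Byte] = data.takeWhile(_ != 0)

  // What we actually want: exactly `length` captured bytes.
  def lengthView(data: Array[Byte], length: Int): Array[Byte] =
    data.slice(0, math.min(length, data.length))
}
```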

/**
  * We have a separate processing function to separate out the plumbing.
  *
  * @param data remember this is a pointer! But note that it may contain byte 0x00
  *             which is typically a string termination character - so we must pass dataLength explicitly.
  */
def process_packet(epochSecond: Long,
                   dataLength: Int,
                   data: CString,
                   cooked: Boolean): Unit = {

Linux Cooked Capture

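If we're capturing on Linux, we have to consider Linux cooked-mode capture (SLL), which can be confusing: capturing on the "any" pseudo-device, as above, produces cooked headers instead of Ethernet ones.

The SLL header is 16 bytes long, compared with 14 bytes for an Ethernet header, so in cooked mode there are 2 extra bytes at the front of the packet relative to an Ethernet frame.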

val offsetBytes = if (cooked) 2 else 0
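Rather than setting cooked by hand, the link-layer type could decide the offset. libpcap reports it via pcap_datalink(), and pcap files record it in their global header; DLT_EN10MB (1) is Ethernet and DLT_LINUX_SLL (113) is Linux cooked capture. The sketch below is a hypothetical JVM-side helper, not part of the repository:

```scala
// Hypothetical alternative to a hand-set `cooked` flag: derive the offset
// from the link-layer type (pcap_datalink() or the pcap file header).
object LinkLayer {
  val DltEthernet = 1   // DLT_EN10MB: Ethernet, 14-byte header (no VLAN tag)
  val DltLinuxSll = 113 // DLT_LINUX_SLL: Linux cooked capture, 16-byte header

  def headerLength(linkType: Int): Option[Int] = linkType match {
    case DltEthernet => Some(14)
    case DltLinuxSll => Some(16)
    case _           => None // other link types are out of scope here
  }

  // Extra bytes relative to Ethernet framing, i.e. the article's offsetBytes.
  def offsetBytes(linkType: Int): Option[Int] = headerLength(linkType).map(_ - 14)
}
```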

Check bounds and verify it's IPv4

Dealing with IPv6 and other protocols is a different matter. Here we start incrementing pointers, in a pure manner, mind you.

// The packet must be long enough to contain the whole 4-byte destination address
val hasEnoughData = dataLength >= (offsetBytes + PcapDestinationIpv4AddressOffset + 4)
if (!hasEnoughData) return

/** IP version is stored in the first nibble of the target byte **/
val isIpv4 = (!(data + IpVersionByteOffset + offsetBytes) >> 4) == 4
if (!isIpv4) return
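The offset constants used above are not shown in the article. Assuming Ethernet framing (a 14-byte header before the IPv4 header), a JVM sketch of the same two checks could look like this; the offset values are an assumption derived from the IPv4 header layout, not the repository's definitions.

```scala
// Assumed offsets, counted from the start of an Ethernet frame: a 14-byte
// Ethernet header followed by the IPv4 header. Names follow the article;
// the values are illustrative.
object Ipv4Check {
  val IpVersionByteOffset = 14                   // IPv4 header byte 0: version + IHL
  val PcapSourceIpv4AddressOffset = 14 + 12      // IPv4 header bytes 12-15
  val PcapDestinationIpv4AddressOffset = 14 + 16 // IPv4 header bytes 16-19

  // True when the whole 4-byte destination address lies within the captured data.
  def hasEnoughData(dataLength: Int, offsetBytes: Int): Boolean =
    dataLength >= offsetBytes + PcapDestinationIpv4AddressOffset + 4

  // Bounds-check first, then mask the signed Byte to 0-255 and test the high nibble.
  def looksLikeIpv4(data: Array[Byte], offsetBytes: Int): Boolean =
    hasEnoughData(data.length, offsetBytes) &&
      ((data(IpVersionByteOffset + offsetBytes) & 0xFF) >> 4) == 4
}
```

The source and destination addresses sit in the fixed part of the IPv4 header, before any options, so fixed offsets are safe for them.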

Read source and destination IPs

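// Reinterpret the 4 address bytes as one 32-bit value; the bytes keep the packet's
// network byte order, which is what inet_ntoa expects, so no swapping is needed.
// The destination address is read the same way, using PcapDestinationIpv4AddressOffset.
val ip = !(data + PcapSourceIpv4AddressOffset + offsetBytes)
  .cast[Ptr[CUnsignedInt]]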

Make an IP human readable

This was one thing that was easier in native land than in JVM land. As far as I'm aware, there is no inet_ntoa in the JVM to convert an IP address from an Int into text form, whereas a native binding made it easy.

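/**
  * We use this to avoid our own byte manipulation.
  * Ironically I have to do this with bytes in Java, so scala-native is already proving itself!
  */
@native.extern
object inet {

  /** C signature: char *inet_ntoa(struct in_addr in). struct in_addr wraps a single
    * 32-bit field, so passing a CUnsignedInt works on common ABIs such as x86-64.
    * The result points to a static buffer overwritten by the next call: copy it at once. **/
  def inet_ntoa(input: CUnsignedInt): native.CString = native.extern

}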

The usage is super simple:

val sourceIp = {
  val ip = !(data + PcapSourceIpv4AddressOffset + offsetBytes)
    .cast[Ptr[CUnsignedInt]]
  fromCString(inet.inet_ntoa(ip))
}
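For comparison, the JVM can produce the same text when given the four raw address bytes rather than an Int: java.net.InetAddress.getByAddress accepts a 4-byte array without performing any DNS lookup. Ipv4Text is a hypothetical helper, and both methods assume the caller has already bounds-checked the offset.

```scala
import java.net.InetAddress

// JVM-side counterparts to the inet_ntoa binding, working from the four
// address bytes in network byte order as they appear in the packet.
object Ipv4Text {
  // Manual dotted-quad formatting; masks each signed Byte to 0-255.
  def dottedQuad(data: Array[Byte], offset: Int): String =
    data.slice(offset, offset + 4).map(b => (b & 0xFF).toString).mkString(".")

  // Standard library route: getByAddress checks no name service for raw bytes.
  def viaInetAddress(data: Array[Byte], offset: Int): String =
    InetAddress.getByAddress(data.slice(offset, offset + 4)).getHostAddress
}
```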

Printing the packet summary

One line, one packet - with some data bytes in hex.

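print(s"Time: $epochSecond, $sourceIp --> $destIp, $dataLength bytes: [")
// Print at most 12 bytes, never reading past the captured data
(0 until Math.min(dataLength - offsetBytes, 12))
  .map { n =>
    !(data + offsetBytes + n)
  }
  .foreach { v =>
    // Mask to 0-255 so bytes >= 0x80 are not sign-extended into FFFFFFxx
    native.stdio.printf(c"%02X", v & 0xFF)
  }
println("...]")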
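The same preview on the JVM, as a sketch: HexPreview is a hypothetical helper that takes at most maxBytes bytes after the link-layer offset, stops at the captured length, and masks each signed Byte before formatting.

```scala
// JVM-side sketch of the hex preview: at most `maxBytes` bytes starting at
// `offsetBytes`, never reading past the captured length.
object HexPreview {
  def hex(data: Array[Byte], dataLength: Int, offsetBytes: Int, maxBytes: Int = 12): String = {
    val end = math.min(math.min(dataLength, data.length), offsetBytes + maxBytes)
    (offsetBytes until end)
      // Mask to 0-255 so bytes >= 0x80 print as two digits, not sign-extended.
      .map(i => f"${data(i) & 0xFF}%02X")
      .mkString
  }
}
```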

Running for yourself

...so you can reproduce this yourself.

Let's assume you've already started the Docker container as earlier in the article, and produced a sample pcap file. After cloning the ScalaWilliam/scala-native-libpcap repository, do:

root@0c957f870d61:/workspace/scala-native-libpcap# sbt clean 'show nativeLink'
...
[info] /workspace/scala-native-libpcap/target/scala-2.11/scala-native-libpcap-out
[success] Total time: 37 s, completed Mar 26, 2017 3:49:38 AM
root@0c957f870d61:/workspace/scala-native-libpcap# /workspace/scala-native-libpcap/target/scala-2.11/scala-native-libpcap-out /workspace/sample.pcap |head
Time: 1490492402, 192.168.1.53 --> 74.125.68.149, 790 bytes: [1C872...]       <-- Google
Time: 1490492402, 74.125.68.149 --> 192.168.1.53, 579 bytes: [38C9E...]       <-- Google
Time: 1490492403, 192.168.1.53 --> 104.16.59.5, 54 bytes: [1C872...]          <-- Cloudflare
Time: 1490492403, 104.16.59.5 --> 192.168.1.53, 64 bytes: [38C98...]          <-- Cloudflare
...

Running live

Exercise for the reader... just read the source :-)

Impact

scala-native opens up a plethora of integration opportunities: you are no longer restricted to JVM-only libraries or to waiting for JVM wrappers of native ones, and no longer restricted to C/C++ for lower-level or high-performance programming. You can now rapidly iterate on and test your code in JVM mode, then port it to native easily.

This also opens up the possibility of native interoperation with Python and Lua (from the JVM, this is possible via Luaj, jep and Jython). A good article on the topic: Integrating Python into Scala Stack.

You can write your application for the JVM first, knowing that you can potentially move it to native later should the JVM become the pain point. In my experience it really isn't, but the possibility is still worth having.

Of course there will be many other use cases, and I'd like to mention them here as well - so why not Tweet me about them?

Make sure to watch Denys Shabalin's Scala Days talk "Scala Goes Native" (slides).

Conclusion

We managed to read packets both offline and live using a native packet capture library and Scala.

There were no major difficulties along the way, showing that scala-native is a viable platform for native applications. An existing JVM development team should strongly consider Scala.

I also came across some interesting (PDF) reading about packet capture, and learned that ifconfig is missing from the latest Debian!