Example Parser
About this Tutorial¶
This tutorial walks you through building your own PCAP parser using BFP! You can read more about PCAP files themselves here.
Obtain a Sample File¶
Whenever you write a parser for a new file format, it is a good idea to collect a few small sample files so that you can test the parser as you code. For the purposes of this tutorial, we'll use the ipv4frags.pcap file from the Wireshark wiki.
Defining the Top Level Struct¶
We begin by defining a new class, PcapFile, which inherits from BaseStruct:
This is the class where we will define all the fields for the format. BFP will also use it to create instances when it parses a file.
Pcap Header¶
Next, we'll define the header for the PCAP file according to its specification:
A couple of things are new here, so let's go over them one by one:
- Retriever: this defines a new property that the struct will read or write during parsing. These two processes are often known as serialization (going from struct to binary representation) and deserialization (going from binary to struct representation).
- Each Retriever takes a type which specifies how it will interpret the bytes from the file and assign them to the property. This is the first argument to the constructor.
  Use the API Reference!
  If you can't guess what a type does, you can always check the API reference for the types (e.g. Bytes) used in each Retriever. You can do the same for any other types here that you may not recognise!
- Each Retriever optionally accepts a default argument, which is used if you ever create a new instance of your struct in code. This may not always be required, but it is good practice to specify it anyway.
- The properties are wrapped in @formatter:off and @formatter:on so that the vertical alignment of the types and arguments does not get messed up. This is the recommended way to write BFP structs, as it maintains readability and lets you determine a struct's schema at a glance.
- At this point, you can create a default PcapHeader instance of your own:
    header = PcapHeader()
But we're here to parse files into instances, not create our own instances! So let's add the header definition to the PcapFile struct:
- Yes, we have just passed PcapHeader as the type for Retriever to serialize! Every BaseStruct subclass is a valid type to provide to Retriever. This is part of what makes BFP serialization powerful: you can easily define nested structs and the parsing will just work!
- Notice that instead of using default, we've used a default_factory. This is the recommended way to provide default values for any mutable types in structs.
- This function is called with a ver: Version and it must return an instance of PcapHeader, which will be assigned to header when PcapFile is default initialized (when you use PcapFile() in code).
Why do we have two different syntaxes for defining defaults?
What stops us from using default = PcapHeader()? There are two reasons:
- The PcapHeader instance has no way to know what struct version it is in (this will make more sense in the next section on Struct Versioning in this tutorial).
- If done this way, every default instance of a PcapFile would point to the same PcapHeader. Read more about this here.
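The shared-instance problem is not specific to BFP. As a rough analogy only, the sketch below uses Python's standard dataclasses module rather than BFP; Header, SharedDefault and FactoryDefault are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass, field


class Header:
    """Hypothetical stand-in for a mutable nested struct."""

    def __init__(self) -> None:
        self.snaplen = 65535


SHARED = Header()  # a single instance created once, at definition time


@dataclass
class SharedDefault:
    # Every SharedDefault() ends up pointing at the same Header object.
    header: Header = SHARED


@dataclass
class FactoryDefault:
    # The factory runs on each construction, so each instance gets its own Header.
    header: Header = field(default_factory=Header)


a, b = SharedDefault(), SharedDefault()
a.header.snaplen = 1
leaked = b.header.snaplen      # the edit made through `a` is visible through `b`

c, d = FactoryDefault(), FactoryDefault()
c.header.snaplen = 1
isolated = d.header.snaplen    # `d` keeps its own untouched default

print(leaked, isolated)
```

The factory form trades a tiny amount of construction work for independent instances, which is usually what you want for anything mutable.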
At this point, the full code looks like:
You can try to read the pcap file now and see if all goes well:
We get an error:
So what happened?
When BFP parses a file into a struct, it expects all the data to be fully consumed. When this does not happen, it raises a ParsingError.
So how do we test the definition we have so far? We can set strict = False in the from_file call, and BFP will ignore the unused bytes at the end:
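To picture what the strictness check is doing, here is a minimal standard-library sketch. It is not BFP's implementation: parse_header, HEADER and this strict flag are hypothetical names, and the layout assumes the classic 24-byte PCAP global header stored little-endian.

```python
import struct

# Classic PCAP global header, assumed little-endian here:
# magic (u32), version major/minor (u16 each), thiszone (i32),
# sigfigs (u32), snaplen (u32), network (u32) = 24 bytes.
HEADER = struct.Struct("<IHHiIII")


def parse_header(data: bytes, strict: bool = True) -> tuple:
    """Hypothetical helper: decode the header, optionally rejecting leftover bytes."""
    fields = HEADER.unpack_from(data, 0)
    leftover = len(data) - HEADER.size
    if strict and leftover > 0:
        raise ValueError(f"{leftover} bytes were left unparsed")
    return fields


# A synthetic header followed by 4 bytes that no field accounts for.
sample = HEADER.pack(0xA1B2C3D4, 2, 4, 0, 0, 65535, 1) + b"\x00" * 4

try:
    parse_header(sample)  # strict: the trailing bytes are treated as an error
except ValueError as exc:
    print("strict parse failed:", exc)

print(parse_header(sample, strict=False))  # lenient: trailing bytes are ignored
```

A lenient mode like this is handy while a definition is still incomplete, but a strict parse is the better final check that every byte has been accounted for.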
Completing the Definition¶
Now we need to be able to parse a list of packets, so let's define a struct for it:
This is almost correct, but notice that the data field is currently only 1 byte long! We need some way to tell BFP that data is actually a list of bytes whose length is given by captured_length. We can use a set_repeat combinator to achieve this:
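At the byte level, a field whose length depends on an earlier field looks like the sketch below. It uses only Python's struct module, not BFP; read_packet and RECORD_HEADER are hypothetical names, and the layout assumes the classic 16-byte little-endian PCAP record header.

```python
import struct

# Classic PCAP per-packet record header, assumed little-endian:
# ts_sec, ts_usec, captured length, original length (u32 each) = 16 bytes.
RECORD_HEADER = struct.Struct("<IIII")


def read_packet(buf: bytes, offset: int = 0):
    """Hypothetical helper: return (header fields, payload, next offset)."""
    ts_sec, ts_usec, captured_length, original_length = RECORD_HEADER.unpack_from(buf, offset)
    start = offset + RECORD_HEADER.size
    end = start + captured_length  # the length field decides how much data follows
    if end > len(buf):
        raise ValueError("record claims more bytes than are available")
    return (ts_sec, ts_usec, captured_length, original_length), buf[start:end], end


# One synthetic record: header announcing 3 captured bytes, then the 3 bytes.
record = RECORD_HEADER.pack(1, 0, 3, 3) + b"abc"
fields, payload, next_offset = read_packet(record)
print(fields, payload, next_offset)
```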
We're almost there! Now let's add the Packet definition to the PcapFile and we'll be done:
Here, Tail reads a list of its given type until the end of the file.
Once again, notice that we can simply pass a struct (or any other type in BFP) to a container type like Tail. This composition of types is at the heart of BFP's declarative style and ease of use.
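Reading "until the end of file" amounts to repeating the element parser until no bytes remain. The following standard-library sketch shows that loop under the same assumptions as before (little-endian classic PCAP records); read_all_packets is a hypothetical helper and says nothing about how Tail is implemented.

```python
import struct

RECORD_HEADER = struct.Struct("<IIII")  # ts_sec, ts_usec, captured, original


def read_all_packets(buf: bytes) -> list:
    """Hypothetical helper: collect payloads until the buffer is exhausted."""
    payloads = []
    offset = 0
    while offset < len(buf):
        _, _, captured_length, _ = RECORD_HEADER.unpack_from(buf, offset)
        offset += RECORD_HEADER.size
        payloads.append(buf[offset:offset + captured_length])
        offset += captured_length
    return payloads


# Two synthetic records placed back to back, with no count stored anywhere.
body = (RECORD_HEADER.pack(1, 0, 2, 2) + b"hi"
        + RECORD_HEADER.pack(2, 0, 3, 3) + b"bye")
packets = read_all_packets(body)
print(packets)
```

Because nothing in the file records how many packets there are, the end of the data is the only stopping condition.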
We're now ready to remove strict = False and parse the whole file:
Yippee!! You've just created your first serialization file format using BFP!
If you now make edits to this file programmatically, you can save it to a new file:
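The edit-and-save round trip can be sketched with the standard library alone. This is an illustration of the idea, not BFP's API: it assumes the little-endian classic PCAP header, and the file name edited.pcap is made up for the example.

```python
import os
import struct
import tempfile

HEADER = struct.Struct("<IHHiIII")  # classic PCAP global header, 24 bytes

original = HEADER.pack(0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)

# Deserialize, edit one field, then serialize again.
magic, major, minor, thiszone, sigfigs, snaplen, network = HEADER.unpack(original)
snaplen = 1500
edited = HEADER.pack(magic, major, minor, thiszone, sigfigs, snaplen, network)

# Save the edited bytes to a new file and read them back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "edited.pcap")
    with open(path, "wb") as f:
        f.write(edited)
    with open(path, "rb") as f:
        reread = f.read()

# Only bytes inside the snaplen field (offsets 16 to 19) can differ.
changed = [i for i in range(HEADER.size) if original[i] != edited[i]]
print(changed, HEADER.unpack(reread)[5])
```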
The Code¶
Here's the completed code in all its glory: