PyBGPStream Tutorial

PyBGPStream provides a Python interface to the libBGPStream C library.

Below we provide the following tutorials:

Get familiar with the API (code)
Enable caching of downloaded MRT files (code)
Print the MOAS prefixes (code)
Measuring the extent of AS path inflation (code)
Studying the communities (code)
Accessing live-stream data sources (routeviews-stream code, ris-live code)

Get familiar with the API

As a first example, we use pybgpstream to output the information extracted from BGP records and BGP elems. We provide a step by step description and the link to download the script at the end of the section. The example is fully functioning and it can be run using the following command:

$ python pybgpstream-print.py
update|A|1499385779.000000|routeviews|route-views.eqix|None|None|11666|206.126.236.24|210.180.224.0/19|206.126.236.24|11666 3356 3786|11666:1000 3356:3 3356:2003 3356:575 3786:0 3356:22 11666:1002 3356:666 3356:86|None|None
update|A|1499385779.000000|routeviews|route-views.eqix|None|None|11666|206.126.236.24|210.180.0.0/19|206.126.236.24|11666 3356 3786|11666:1000 3356:3 3356:2003 3356:575 3786:0 3356:22 11666:1002 3356:666 3356:86|None|None
update|A|1499385788.000000|routeviews|route-views.eqix|None|None|11666|206.126.236.24|210.180.64.0/19|206.126.236.24|11666 6939 4766 4766|11666:2000 11666:2001|None|None
...

Step by step description

The first step in each pybgpstream script is to import the Python modules and create a new BGPStream instance.

import pybgpstream
stream = pybgpstream.BGPStream(
    from_time="2017-07-07 00:00:00", until_time="2017-07-07 00:10:00 UTC",
    collectors=["route-views.sg", "route-views.eqix"],
    record_type="updates",
    filter="peer 11666 and prefix more 210.180.0.0/16"
)

During the creation of the BGPStream instance, we also added a few filters to narrow the stream.

from_time and until_time specify the start and end of the stream's time interval.
collectors restricts the stream to records from the specified collectors.
record_type="updates" selects only updates (i.e. no RIB dumps).
filter takes a filter string that expresses more flexible and powerful conditions. More on filtering.
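As a quick sanity check on the time window, the timestamps in the sample output above can be compared against from_time and until_time. The sketch below assumes both bounds are interpreted as UTC; the helper name to_epoch is hypothetical and not part of pybgpstream.

```python
from datetime import datetime, timezone

def to_epoch(ts):
    """Convert 'YYYY-MM-DD HH:MM:SS' (assumed to be UTC) to a Unix timestamp."""
    parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc).timestamp()

start = to_epoch("2017-07-07 00:00:00")
end = to_epoch("2017-07-07 00:10:00")

# Timestamp of the first elem in the sample output above
elem_time = 1499385779.0
in_window = start <= elem_time <= end
elem_utc = datetime.fromtimestamp(elem_time, tz=timezone.utc).isoformat()
print(in_window, elem_utc)
```

The elem falls about three minutes after the start of the ten-minute window, as expected for a stream bounded by these two values.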

At this point we can start the stream, and repeatedly ask for new BGP elems. Each time a valid record is read, we extract from it the elems that it contains and print the record and elem fields. If a non-valid record is found, we do not attempt to extract elems.

for elem in stream:
    # record fields can be accessed directly from elem
    # e.g. elem.time
    # or via elem.record
    # e.g. elem.record.time
    print(elem)
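Each printed line is pipe-delimited, so it can be split back into named fields when post-processing saved output. This is only a sketch: the column names below are inferred from the sample output and from the elem attributes used in this tutorial, so treat them as an assumption rather than a documented format. The function name parse_elem_line is hypothetical.

```python
# Column names are an assumption inferred from the sample output
COLUMNS = [
    "record_type", "elem_type", "time", "project", "collector",
    "router", "router_ip", "peer_asn", "peer_address", "prefix",
    "next_hop", "as_path", "communities", "old_state", "new_state",
]

def parse_elem_line(line):
    """Split one printed elem line into a dict keyed by the assumed column names."""
    values = line.strip().split("|")
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d fields, got %d" % (len(COLUMNS), len(values)))
    return dict(zip(COLUMNS, values))

# Third line of the sample output shown earlier
line = ("update|A|1499385788.000000|routeviews|route-views.eqix|None|None|"
        "11666|206.126.236.24|210.180.64.0/19|206.126.236.24|"
        "11666 6939 4766 4766|11666:2000 11666:2001|None|None")
fields = parse_elem_line(line)
print(fields["prefix"], fields["as_path"].split()[-1])
```

Inside a running script there is no need to parse text; the same values are available directly on the elem object.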

Complete Example

Get the code: pybgpstream-print.py.

#!/usr/bin/env python

import pybgpstream
stream = pybgpstream.BGPStream(
    from_time="2017-07-07 00:00:00", until_time="2017-07-07 00:10:00 UTC",
    collectors=["route-views.sg", "route-views.eqix"],
    record_type="updates",
    filter="peer 11666 and prefix more 210.180.0.0/16"
)

for elem in stream:
    # record fields can be accessed directly from elem
    # e.g. elem.time
    # or via elem.record
    # e.g. elem.record.time
    print(elem)

Enable caching of downloaded MRT files

Data files processed by the broker can be cached in a local directory, which is checked before a dump file is downloaded. Without a cache, using BGPStream to repeatedly process the same data (e.g., when testing or debugging code) means downloading it again each time, and poor network connectivity adds to the processing time. The caching implementation is thread-safe and supports parallel instances of BGPStream (either as threads or as separate processes). To enable the cache, set the cache-dir parameter of the "broker" data interface:

stream.set_data_interface_option("broker", "cache-dir", "/path/to/cache")
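A small wrapper can make sure the cache directory exists before the option is set. This is a sketch under assumptions: whether BGPStream creates a missing directory on its own is not stated here, so creating it up front is purely defensive. The names enable_cache and FakeStream are hypothetical; FakeStream only stands in for a real BGPStream instance so the example runs without network access.

```python
import os
import tempfile

def enable_cache(stream, cache_dir):
    """Create cache_dir if needed, then point the broker interface at it."""
    os.makedirs(cache_dir, exist_ok=True)
    stream.set_data_interface_option("broker", "cache-dir", cache_dir)
    return cache_dir

class FakeStream:
    """Stand-in that records option calls; replace with a real BGPStream."""
    def __init__(self):
        self.options = []

    def set_data_interface_option(self, interface, option, value):
        self.options.append((interface, option, value))

cache_dir = os.path.join(tempfile.mkdtemp(), "bgpstream-cache")
stream = FakeStream()
enable_cache(stream, cache_dir)
print(stream.options[0][:2], os.path.isdir(cache_dir))
```

With a real stream, the call would presumably need to happen before iteration begins, since that is when data files start being fetched.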

Print the MOAS prefixes

In this second tutorial we show how to use pybgpstream to output the MOAS prefixes and their origin ASes. The example is fully functioning and it can be run using the following command:

$ python pybgpstream-moas.py
('194.68.55.0/24', '43893,30893')
('199.45.53.0/24', '701,65403')
('207.188.170.0/24', '13332,26640')
('8.6.245.0/24', '11096,10490')
('65.111.243.0/24', '20193,30691')
('195.246.126.0/23', '6714,49258')
('63.139.84.0/24', '65000,53286')
('219.232.108.0/24', '4808,4847')
...

The program parses the BGP elems extracted from the BGP records that match the filters (collectors, record type, and time), stores the set of unique origin ASns for each prefix in a hash map, and outputs the prefixes that have multiple origin ASns.

Step by step description

In this case the stream is configured to return the BGP records read from a RIB dump generated by the Route Views Singapore collector, with a timestamp in the interval 07:50:00 - 08:10:00 Sat, 01 Aug 2015 GMT.

stream = pybgpstream.BGPStream(
    from_time="2015-08-01 07:50:00", until_time="2015-08-01 08:10:00",
    collectors=["route-views.sg"],
    record_type="ribs",
)

We use a dictionary to associate a set of origin ASns with each prefix observed in the RIB dump.

from collections import defaultdict

prefix_origin = defaultdict(set)

Each time a new BGP elem is extracted, the program reads its prefix and origin ASn and updates the prefix_origin dictionary. The prefix and AS path are string fields present in every BGP elem of type RIB. The split function converts the AS path string into a list of strings, one per AS hop; the last hop is the origin AS.

pfx = elem.fields["prefix"]
ases = elem.fields["as-path"].split()
if ases:
    origin = ases[-1]
    prefix_origin[pfx].add(origin)
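The same bookkeeping can be tried without a stream. The sketch below runs the MOAS logic over a few hypothetical (prefix, AS path) pairs that stand in for the values read from elem.fields; the prefixes and ASns are documentation-range placeholders, not real routing data.

```python
from collections import defaultdict

# Hypothetical (prefix, AS path) pairs standing in for elem.fields values
sample_routes = [
    ("192.0.2.0/24", "64500 64501 64510"),
    ("192.0.2.0/24", "64502 64511"),
    ("198.51.100.0/24", "64500 64512"),
    ("198.51.100.0/24", "64503 64504 64512"),
]

prefix_origin = defaultdict(set)
for pfx, as_path in sample_routes:
    ases = as_path.split()
    if ases:
        # The rightmost ASn is the origin
        prefix_origin[pfx].add(ases[-1])

# Keep only prefixes announced by more than one origin
moas = {
    pfx: sorted(origins)
    for pfx, origins in prefix_origin.items()
    if len(origins) > 1
}
print(moas)
```

Only 192.0.2.0/24 is reported: the second prefix is seen over two different paths, but both end at the same origin, so its set holds a single ASn.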

Complete Example

Get the code.

#!/usr/bin/env python

from collections import defaultdict
import pybgpstream

stream = pybgpstream.BGPStream(
    # Consider this time interval:
    # Sat, 01 Aug 2015 7:50:00 GMT -  08:10:00 GMT
    from_time="2015-08-01 07:50:00", until_time="2015-08-01 08:10:00",
    collectors=["route-views.sg"],
    record_type="ribs",
)

# <prefix, origin-ASns-set > dictionary
prefix_origin = defaultdict(set)

for rec in stream.records():
    for elem in rec:
        # Get the prefix
        pfx = elem.fields["prefix"]
        # Get the list of ASes in the AS path
        ases = elem.fields["as-path"].split()
        if ases:
            # Get the origin ASn (rightmost)
            origin = ases[-1]
            # Insert the origin ASn in the set of
            # origins for the prefix
            prefix_origin[pfx].add(origin)

# Print the MOAS prefixes and their origin ASns
for pfx in prefix_origin:
    if len(prefix_origin[pfx]) > 1:
        print((pfx, ",".join(prefix_origin[pfx])))

Measuring the extent of AS path inflation

In this example, we show how to use pybgpstream to measure the extent of AS path inflation, i.e. measure how many AS paths are longer than the shortest path between two ASes due to the adoption of routing policies. The example is fully functioning and it can be run using the following command:

$ python pybgpstream-aspath.py
   ...
   3549 27316 6 5
   3549 27314 3 3
   3549 27313 3 3
   3549 27310 3 3
   3549 27311 3 3
   3549 45834 4 4
   3549 27318 4 3
   3549 27319 5 4
   3549 18173 4 4
...

The program reads a RIB dump originated by the RIS RRC00 collector, computes the number of AS hops between the peer ASn and the origin AS, and compares it to the shortest path between the same AS pair in a simple undirected graph built from the AS path adjacencies. The output has the following format:

<monitor ASn> <destination ASn> <#AS hops in BGP> <#AS hops in undirected graph>

Step by step description

In this case the stream is configured to return the BGP records read from a RIB dump generated by the RIS RRC00 collector, with a timestamp in the interval 07:50:00 - 08:10:00 Sat, 01 Aug 2015 GMT.

stream = pybgpstream.BGPStream(
    from_time="2015-08-01 07:50:00", until_time="2015-08-01 08:10:00",
    collectors=["rrc00"],
    record_type="ribs",
)

The script uses the NetworkX package to build a simple undirected graph (i.e. a graph with no self-loops or parallel edges). A dictionary of dictionaries stores, for each peer ASn and origin ASn pair, the length of the shortest path observed in BGP.

import networkx as nx
from collections import defaultdict

as_graph = nx.Graph()

bgp_lens = defaultdict(lambda: defaultdict(lambda: None))

Each time a new BGP elem is extracted, the program removes the ASns that are repeatedly prepended in the AS path (using the groupby function), counts the number of AS hops between the peer and the destination AS (i.e. the origin AS), and saves this information in the bgp_lens dictionary. Each adjacency in the reduced AS path is used to add a new link to the NetworkX graph.

from itertools import groupby

peer = str(elem.peer_asn)
# Collapse consecutive duplicate ASns (AS path prepending)
hops = [k for k, g in groupby(elem.fields['as-path'].split(" "))]
if len(hops) > 1 and hops[0] == peer:
    origin = hops[-1]
    for i in range(0, len(hops) - 1):
        as_graph.add_edge(hops[i], hops[i + 1])
    bgp_lens[peer][origin] = min(filter(bool, [bgp_lens[peer][origin], len(hops)]))
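Two idioms in this step are easy to misread: groupby collapses only consecutive duplicates, and filter(bool, ...) drops the initial None so that min works on the first observation. The sketch below isolates both, using the AS path from the sample output of the first tutorial; the function names collapse_prepending and shortest_seen are hypothetical.

```python
from itertools import groupby

def collapse_prepending(as_path):
    """Drop consecutive duplicate ASns, keeping one copy of each run."""
    return [asn for asn, _ in groupby(as_path.split())]

def shortest_seen(current, candidate):
    """Return the smaller length, ignoring a missing (None) current value."""
    return min(filter(bool, [current, candidate]))

hops = collapse_prepending("11666 6939 4766 4766")
print(hops)

first = shortest_seen(None, len(hops))  # nothing stored yet
later = shortest_seen(first, 5)         # a longer path does not replace it
print(first, later)
```

Note that filter(bool, ...) would also discard a length of 0, which is harmless here because only paths with more than one hop are recorded.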

Finally, for each peer and origin pair, the script uses the NetworkX utility functions to compute the length of the shortest path between the two nodes in the simple undirected graph. The output juxtaposes the minimum length observed in BGP and the shortest path computed in the simple undirected graph.

for peer in bgp_lens:
    for origin in bgp_lens[peer]:
        nxlen = len(nx.shortest_path(as_graph, peer, origin))
        print(peer, origin, bgp_lens[peer][origin], nxlen)
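To see why the two numbers can differ, the comparison can be reproduced on a toy topology. The sketch below deliberately swaps NetworkX for a plain breadth-first search so that it has no dependencies; like len(nx.shortest_path(...)), it counts the nodes on the path. The AS paths are hypothetical.

```python
from collections import defaultdict, deque

def shortest_path_nodes(adjacency, source, target):
    """Number of nodes on a shortest path, or None if target is unreachable."""
    seen = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return seen[node]
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return None

# Hypothetical AS paths; the first ASn of each path is the peer
paths = ["64500 64501 64502 64510", "64500 64503", "64504 64503 64510"]

adjacency = defaultdict(set)
for path in paths:
    hops = path.split()
    for left, right in zip(hops, hops[1:]):
        adjacency[left].add(right)
        adjacency[right].add(left)

bgp_len = 4  # ASes on the only path from peer 64500 to origin 64510
graph_len = shortest_path_nodes(adjacency, "64500", "64510")
print(bgp_len, graph_len)
```

Peer 64500 reaches origin 64510 over four ASes in BGP, yet the undirected graph contains the three-AS route 64500, 64503, 64510, assembled from adjacencies seen on other paths. That gap is what the tutorial reports as inflation.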

Complete Example

Get the code.

import pybgpstream
import networkx as nx
from collections import defaultdict
from itertools import groupby

# Create an instance of a simple undirected graph
as_graph = nx.Graph()

bgp_lens = defaultdict(lambda: defaultdict(lambda: None))

stream = pybgpstream.BGPStream(
    # Consider this time interval:
    # Sat, 01 Aug 2015 7:50:00 GMT -  08:10:00 GMT
    from_time="2015-08-01 07:50:00", until_time="2015-08-01 08:10:00",
    collectors=["rrc00"],
    record_type="ribs",
)

for rec in stream.records():
    for elem in rec:
        # Get the peer ASn
        peer = str(elem.peer_asn)
        # Get the array of ASns in the AS path and remove repeatedly prepended ASns
        hops = [k for k, g in groupby(elem.fields['as-path'].split(" "))]
        if len(hops) > 1 and hops[0] == peer:
            # Get the origin ASn
            origin = hops[-1]
            # Add new edges to the NetworkX graph
            for i in range(0,len(hops)-1):
                as_graph.add_edge(hops[i],hops[i+1])
            # Update the AS path length between 'peer' and 'origin'
            bgp_lens[peer][origin] = \
                min(list(filter(bool,[bgp_lens[peer][origin],len(hops)])))

# For each 'peer' and 'origin' pair
for peer in bgp_lens:
    for origin in bgp_lens[peer]:
        # compute the shortest path in the NetworkX graph
        nxlen = len(nx.shortest_path(as_graph, peer, origin))
        # and compare it to the BGP hop length
        print(peer, origin, bgp_lens[peer][origin], nxlen)
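Once the output has been saved, the share of inflated paths can be tallied with a few lines. This is a minimal sketch that assumes the four-column, space-separated format described above; the rows are the ones shown in the sample output at the start of this section.

```python
sample_output = """\
3549 27316 6 5
3549 27314 3 3
3549 27313 3 3
3549 27310 3 3
3549 27311 3 3
3549 45834 4 4
3549 27318 4 3
3549 27319 5 4
3549 18173 4 4
"""

rows = [line.split() for line in sample_output.splitlines() if line.strip()]

# A path is inflated when the BGP length exceeds the graph length
inflated = [
    (monitor, origin, int(bgp) - int(graph))
    for monitor, origin, bgp, graph in rows
    if int(bgp) > int(graph)
]
print("%d of %d paths inflated" % (len(inflated), len(rows)))
```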

Studying the communities

In this example, we show how to use pybgpstream to extract information about the prefixes that are associated with a specific type of community. Specifically, we use the bgpstream filtering options to select a specific set of prefixes of interest, a specific peer ASn, and any message carrying at least one community with 3400 as its value. The example is fully functioning and it can be run using the following command:

$ python pybgpstream-communities.py
Community: 2914:3400 ==> 185.84.167.0/24,185.84.166.0/24,185.84.166.0/23
Community: 2914:2406 ==> 185.84.167.0/24,185.84.166.0/24,185.84.166.0/23
Community: 2914:410 ==> 185.84.167.0/24,185.84.166.0/24,185.84.166.0/23
Community: 2914:3475 ==> 185.84.167.0/24,185.84.166.0/24,185.84.166.0/23
Community: 2914:1405 ==> 185.84.167.0/24,185.84.166.0/24,185.84.166.0/23

The program reads a RIB dump originated by the RIS RRC06 collector and selects messages originated by peer 25152 that are associated with 185.84.166.0/23 (or more specifics) and carry at least one community with 3400 as its value. The output has the following format:

Community: <ASn>:<value> ==> <prefixes affected by the community>

Step by step description

In this case the stream is configured to return the BGP records read from a RIB dump generated by the RIS RRC06 collector, with a timestamp in the interval 07:50:00 - 08:10:00 Sat, 01 Aug 2015 GMT. The elems are filtered on three conditions: the originating peer AS number, the prefix announced, and the presence of at least one community with 3400 as its value.

stream = pybgpstream.BGPStream(
    from_time="2015-08-01 07:50:00", until_time="2015-08-01 08:10:00",
    collectors=["rrc06"],
    record_type="ribs",
    filter="peer 25152 and prefix more 185.84.166.0/23 and community *:3400"
)

A dictionary of sets maintains the set of prefixes affected by each community.

from collections import defaultdict

community_prefix = defaultdict(set)

Each time a new BGP elem is extracted, the program reads its communities, each represented as an ASn:value string, and adds the elem's prefix to the set of every community it carries.

pfx = elem.fields['prefix']
communities = elem.fields['communities']
for c in communities:
    community_prefix[c].add(pfx)

Finally, the dictionary is written to standard output.

for ct in community_prefix:
    print("Community:", ct, "==>", ",".join(community_prefix[ct]))
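The sample output lists communities other than *:3400 because the filter selects elems, not individual communities: once an elem matches, all of its communities are recorded. The sketch below illustrates this on hypothetical (prefix, communities) pairs. The carries_value check is a plain-Python approximation of the community *:3400 filter term, not the library's own matching code.

```python
from collections import defaultdict

def carries_value(communities, value):
    """True if any 'ASn:value' community string has the given value part."""
    return any(c.split(":")[-1] == str(value) for c in communities)

# Hypothetical (prefix, communities) pairs standing in for RIB elems
sample_elems = [
    ("185.84.166.0/23", {"2914:3400", "2914:410"}),
    ("185.84.167.0/24", {"2914:3400", "2914:2406"}),
    ("185.84.166.0/24", {"2914:1405"}),  # no *:3400, so the elem is skipped
]

community_prefix = defaultdict(set)
for pfx, communities in sample_elems:
    if carries_value(communities, 3400):
        # Record every community of a matching elem, not just *:3400
        for c in communities:
            community_prefix[c].add(pfx)

for ct in sorted(community_prefix):
    print("Community:", ct, "==>", ",".join(sorted(community_prefix[ct])))
```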

Complete Example

Get the code.

#!/usr/bin/env python

import pybgpstream
from collections import defaultdict

stream = pybgpstream.BGPStream(
    # Consider this time interval:
    # Sat, 01 Aug 2015 7:50:00 GMT -  08:10:00 GMT
    from_time="2015-08-01 07:50:00", until_time="2015-08-01 08:10:00",
    collectors=["rrc06"],
    record_type="ribs",
    filter="peer 25152 and prefix more 185.84.166.0/23 and community *:3400"
)

# <community, prefix > dictionary
community_prefix = defaultdict(set)

# Get next record
for rec in stream.records():
    for elem in rec:
        # Get the prefix
        pfx = elem.fields['prefix']
        # Get the associated communities
        communities = elem.fields['communities']
        # for each community save the set of prefixes
        # that are affected
        for c in communities:
            community_prefix[c].add(pfx)

# Print each community and the prefixes it affects
for ct in community_prefix:
    print("Community:", ct, "==>", ",".join(community_prefix[ct]))

Accessing live stream data sources

In this example, we show how to use pybgpstream to access live data streams from Route Views and RIPE RIS. The example programs print out real-time BGP updates received from Route Views BMP Stream (routeviews-stream) and RIPE RIS Live (ris-live).

Accessing these live stream data sources is as simple as setting the project or projects field to routeviews-stream or ris-live when initiating a BGPStream object in your script.

Route Views Stream

Route Views Stream code.

#!/usr/bin/env python

import pybgpstream
stream = pybgpstream.BGPStream(
    # accessing routeviews-stream
    project="routeviews-stream",
    # filter to show only stream from amsix bmp stream
    filter="router amsix",
)

for elem in stream:
    print(elem)

RIPE RIS Live

RIPE RIS Live code.

#!/usr/bin/env python

import pybgpstream
stream = pybgpstream.BGPStream(
    # accessing ris-live
    project="ris-live",
    # filter to show only stream from rrc00
    filter="collector rrc00",
)

for elem in stream:
    print(elem)
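Because a live stream has no end time, the loops above run until interrupted. When experimenting, it can help to stop after a fixed number of elems. The sketch below uses itertools.islice for that; the helper name take is hypothetical, and fake_live_stream is a stand-in generator so the example runs without network access. Any iterable, including a BGPStream object, could be passed in its place.

```python
from itertools import islice

def take(stream, limit):
    """Return at most `limit` items from an iterable without exhausting it."""
    return list(islice(stream, limit))

def fake_live_stream():
    """Endless stand-in for a live BGPStream; yields placeholder elems."""
    n = 0
    while True:
        yield "elem-%d" % n
        n += 1

first_elems = take(fake_live_stream(), 3)
print(first_elems)
```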