The following are a few data encoding practices we apply in the design and implementation of libBGPStream and PyBGPStream.
Each AS path contains a list of AS path segments separated by spaces.
Each AS path segment is represented in the following string format, depending on its type:
If the segment is a simple ASN (BGPSTREAM_AS_PATH_SEG_ASN), then the string will be the decimal representation of the ASN (not dotted-decimal).
If the segment is an AS set (BGPSTREAM_AS_PATH_SEG_SET), then the string will be a comma-separated list of ASNs, enclosed in braces. E.g., "{12345,6789}".
If the segment is a confederation set (BGPSTREAM_AS_PATH_SEG_CONFED_SET), then the string will be a comma-separated list of ASNs, enclosed in brackets. E.g., "[12345,6789]".
If the segment is a confederation sequence (BGPSTREAM_AS_PATH_SEG_CONFED_SEQ), then the string will be a space-separated list of ASNs, enclosed in parentheses. E.g., "(12345 6789)".
Note: it is possible to have a set/sequence with only a single element.
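Because a confederation sequence contains spaces, splitting an AS path string on whitespace alone would break such a segment apart. The following is a minimal sketch of a parser for the string format described above; the function name parse_as_path and the segment labels are hypothetical and are not part of libBGPStream or PyBGPStream.

```python
import re

# One token per segment: {set}, [confed set], (confed sequence), or a bare ASN.
_SEGMENT_RE = re.compile(r"\{[^}]*\}|\[[^\]]*\]|\([^)]*\)|\d+")


def parse_as_path(path):
    """Split an AS path string into (kind, [ASNs]) tuples.

    The kind labels ("asn", "set", "confed_set", "confed_seq") are
    illustrative names chosen for this sketch.
    """
    segments = []
    for token in _SEGMENT_RE.findall(path):
        body = token[1:-1]
        if token[0] == "{":
            # AS set: comma-separated, in braces
            segments.append(("set", [int(a) for a in body.split(",")]))
        elif token[0] == "[":
            # Confederation set: comma-separated, in brackets
            segments.append(("confed_set", [int(a) for a in body.split(",")]))
        elif token[0] == "(":
            # Confederation sequence: space-separated, in parentheses
            segments.append(("confed_seq", [int(a) for a in body.split()]))
        else:
            # Simple ASN in plain decimal
            segments.append(("asn", [int(token)]))
    return segments


if __name__ == "__main__":
    for segment in parse_as_path("3356 (65001 65002) 15169 {12345,6789}"):
        print(segment)
```

The sketch assumes well-formed input; an empty set such as "{}" or a dotted-decimal ASN is not handled.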
An IP prefix is represented in the usual NETWORK_ADDR/MASK notation, for both IPv4 and IPv6 prefixes.
For example, you may see announcements from Google with prefixes of 8.8.8.0/24 or 2001:4860::/32.
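Prefix strings in this notation can be handled with Python's standard ipaddress module. The sketch below is one possible approach, with a hypothetical helper name describe_prefix; it does not show how PyBGPStream itself processes prefixes.

```python
import ipaddress


def describe_prefix(text):
    """Return (IP version, network address, prefix length) for a prefix string."""
    # ip_network accepts both IPv4 and IPv6 prefixes and raises ValueError
    # if the string is malformed or has host bits set.
    network = ipaddress.ip_network(text)
    return (network.version, str(network.network_address), network.prefixlen)


if __name__ == "__main__":
    for prefix in ("8.8.8.0/24", "2001:4860::/32"):
        print(describe_prefix(prefix))
```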
Community values (see RFC 1997 and RFC 8642) in BGP announcements are represented as a list of community value segments separated by spaces.
Each community value segment is represented as ASN:VALUE, where ASN is the AS number of the
AS that originally set the community value, and VALUE is the actual community value.
Both ASN and VALUE are represented as 16-bit numbers.
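For example, the community value 65535:65281 is the well-known NO_EXPORT community (0xFFFFFF01 in RFC 1997), where the ASN field holds the reserved value 65535 rather than the AS that set it.
A route carrying this community should only be propagated within the receiving AS via iBGP, and should not be advertised to eBGP peers.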
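As an illustration of the ASN:VALUE format, the sketch below splits a space-separated community string into pairs and checks the 16-bit range of each half. It also shows how the two halves combine into the single 32-bit value that RFC 1997 defines for a community. The function names are hypothetical and are not part of PyBGPStream.

```python
def parse_communities(text):
    """Parse a space-separated community string into (asn, value) pairs."""
    communities = []
    for segment in text.split():
        asn_text, value_text = segment.split(":")
        asn, value = int(asn_text), int(value_text)
        # Both halves must fit in 16 bits.
        if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
            raise ValueError("community out of 16-bit range: " + segment)
        communities.append((asn, value))
    return communities


def to_uint32(asn, value):
    """Pack an (asn, value) pair into the 32-bit community value."""
    return (asn << 16) | value


if __name__ == "__main__":
    for asn, value in parse_communities("10000:65535 3356:100"):
        print(asn, value, hex(to_uint32(asn, value)))
```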
In BGPStream we have two types of records, RIB (BGP RIB table dump) and UPDATE (BGP update), represented as follows:
R: BGP RIB dump
U: BGP update
For a BGP RIB dump, we also represent the position of the record within the dump as:
B: start of dump
M: middle of dump
E: end of dump
Each record has a status, represented as:
V: valid record
F: filtered source
E: empty source
O: outside time interval
S: corrupted source
R: corrupted record
U: unsupported record
In most cases, you will likely see valid records (V) in your stream.
For example, from the bgpreader tutorial, you can see the following output:
$ bgpreader -w 1445306400,1445306402 -c route-views.sfmix -r
R|B|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400
R|B|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400
R|M|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400
R|M|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400
This command outputs all the records collected by route-views.sfmix between 1445306400 and 1445306402.
The first column shows the record type, which is a RIB dump (R).
The second column shows the position of the record in the resource: the output begins with start-of-dump records (B), followed by middle-of-dump records (M).
The second-to-last column shows the status of the record; all records are valid (V) in this example.
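The three columns discussed above can be decoded with a few lookup tables. The following sketch only interprets the first, second, and second-to-last pipe-separated columns, since those are the ones described here; the meaning of the remaining columns is not assumed. The name describe_record is hypothetical.

```python
RECORD_TYPES = {"R": "BGP RIB dump", "U": "BGP update"}
DUMP_POSITIONS = {"B": "start of dump", "M": "middle of dump", "E": "end of dump"}
RECORD_STATUSES = {
    "V": "valid record",
    "F": "filtered source",
    "E": "empty source",
    "O": "outside time interval",
    "S": "corrupted source",
    "R": "corrupted record",
    "U": "unsupported record",
}


def describe_record(line):
    """Decode the type, dump position, and status codes of one record line."""
    fields = line.rstrip("\n").split("|")
    return {
        # Unknown or empty codes are reported as "unknown" rather than raising.
        "type": RECORD_TYPES.get(fields[0], "unknown"),
        "position": DUMP_POSITIONS.get(fields[1], "unknown"),
        "status": RECORD_STATUSES.get(fields[-2], "unknown"),
    }


if __name__ == "__main__":
    line = "R|B|1445306400.000000|routeviews|route-views.sfmix|||V|1445306400"
    print(describe_record(line))
```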
Each record may contain multiple elements. For example, a BGP update message may contain both announcements and withdrawals in the same message.
Each element can be one of the following types:
R: RIB table entry
A: prefix announcement
W: prefix withdrawal
S: peer state change
A peer of a BGP route collector can be in one of the following states:
IDLE
CONNECT
ACTIVE
OPENSENT
OPENCONFIRM
ESTABLISHED
CLEARING
DELETED
For a peer state change, BGPStream shows both the old and new states, represented as above.
Each resource used in BGPStream can be identified by the following unique string representation:
PROJECT.COLLECTOR.RECORD_TYPE.INITIAL_TIME.DURATION
PROJECT: project of the resource (e.g. routeviews or rrc)
COLLECTOR: name of the collector (e.g. rrc02)
RECORD_TYPE: the type of records this resource contains (ribs or updates)
INITIAL_TIME: the start time of the resource, as an integer Unix timestamp
DURATION: the duration of the data this resource covers, in seconds
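Collector names such as route-views.sfmix contain a dot themselves, so a resource string cannot be split into exactly five parts on "." alone. The sketch below takes the first part as the project and the last three as record type, start time, and duration, then treats whatever remains in the middle as the collector name. This parsing strategy, the ResourceId type, and the example string (including its duration of 7200 seconds) are assumptions made for illustration, not BGPStream's own implementation.

```python
from typing import NamedTuple


class ResourceId(NamedTuple):
    project: str
    collector: str
    record_type: str
    initial_time: int  # Unix time, in seconds
    duration: int      # seconds


def parse_resource_id(text):
    """Parse PROJECT.COLLECTOR.RECORD_TYPE.INITIAL_TIME.DURATION."""
    parts = text.split(".")
    if len(parts) < 5:
        raise ValueError("expected at least 5 dot-separated fields: " + text)
    project = parts[0]
    record_type, initial_time, duration = parts[-3:]
    # Everything between the project and the last three fields is the
    # collector name, which may itself contain dots.
    collector = ".".join(parts[1:-3])
    return ResourceId(project, collector, record_type,
                      int(initial_time), int(duration))


if __name__ == "__main__":
    # Hypothetical resource string built from values used on this page.
    print(parse_resource_id("routeviews.route-views.sfmix.ribs.1445306400.7200"))
```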