gtk-gnutella Binary Format in GGEP Extensions
=============================================

Date: 2002-06-09
Status: Draft, UNUSED (2011-05-06)

GGEP does not define the format of each extension.

This memo describes the binary format used by gtk-gnutella to encode its
various extensions (held within a GGEP payload).

The aim of this format is to retain the compactness of the binary format,
yet allow servents to selectively implement what they want, allowing them
to skip the parts they don't understand.

The extension describes a set of properties.  Each property is identified
by an architecturally-defined numerical ID.  The property value follows
the ID.  In order to be skippable, the property length is also described.
Since a servent may not know about an ID, the length is not implied by the
ID but is explicited.

The scope of each ID is limited to a given extension, i.e. two different GGEP
extensions may use the same numerical ID for some of their properties.

Formally, the extension is a sequence of:

    <ID> <length> <field value..>
    <-- 1 byte -> <-- n bytes -->

The leading byte is architected as:

    bits 7-3    ID
    bit 2-0     length

Valid ID values in that byte range from 1 to 31.  We'll call this value
the relative ID value.

ID 0 is special and enables segmentation (in order to be able to code more
than 31 fields).  ID 0 has no attached value, and the length bits indicate
the segment number (from 0 to 7), each segment defining a 31-slot range.
Subsequent IDs are then offsetted by that amount, so if a segment of 2 is
seen, then a following relative ID of 5 is really 31*2+5, an absolute ID of 67.

Segments are active until a new segment change is seen in the extension, using
a scanning from the start to the end of the extension.

Here are the absolute ID ranges, depending on the segment:

    Segment #    Range
        0       001-031
        1       032-062
        2       063-093
        3       094-124
        4       125-155
        5       156-186
        6       187-217
        7       218-248

Implicitly, at the beginning of the extension, the segment #0 is active (i.e
a leading NUL byte is implied, but not physically present).

All IDs belonging to the same segment should be groupped together, to avoid
the overhead of switching segments.  Segments may appear in any order.
A given ID may appear more than once, if it makes sense repeating it (with
different values each time, most probably).

For IDs other than 0, the length bits are interpreted thusly:

    0           NUL-terminated value
    1           1 byte value
    2           2-byte value
    3           3-byte value
    4           4-byte value
    5           8-byte value
    6           explicit length (next byte gives length)
    7           RESERVED

All extensions using this binary format also use big-endian to store
their numbers.  For boolean values: TRUE is encoded as 0x1, FALSE as 0x2
(hence bit 0 is really the boolean value, whilst preventing a NUL byte
that could trigger COBS encoding).

Note that when a field length says "1 byte value", nothing is said about
the semantics of that byte.  It could be a boolean, a signed value or an
unsigned value.  Its interpretation depends on the ID: it is architecturally
defined and immutable for a given ID.

Example:

We want to encode the following properties (absolute IDs given):

    ID=4, boolean, 1-byte long, value=FALSE
    ID=28, timestamp, 4-byte long, value=1023567521
    ID=55, unsigned value, 1-byte long, value=0x32
    ID=89, string, value="sample"

In our first representation, we'll show <ID> <length> as <4:1>, meaning ID=4
and length = 1, which would be coded as "00100 001" in binary, or 0x21 hexa.

Recall that we begin the extension being implictly in segment #0, so 4 and
28 belong to that segment (covers IDs 1 to 31). Segment #1 holds ID values
from 32 to 62, therefore ID=55 is within segment #1.  Segement #2 holds
IDs from 63 to 93, hence holds ID=89.

It follows the following representation:

    <4:1> 0x02                          ; we're in segment #0
    <28:4> 1023567521 = 0x3d0266a1      ; UNIX timestamp
    <0:1>                               ; entering segment #1
    <24:1> 0x32                         ; 55 = 1*31+24
    <0:2>                               ; entering segment #2
    <27:0> "sample" 0x00                ; 89 = 2*31+27

The string "sample" is coded thusly:

    's' = 0x73
    'a' = 0x61
    'm' = 0x6d
    'p' = 0x70
    'l' = 0x6c
    'e' = 0x65
    NUL = 0x00

The final binary representation is therefore:

    0x21 0x02
    0xe4 0x3d 0x02 0x66 0xa1
    0x01
    0xc1 0x32
    0x02
    0xd8 0x73 0x61 0x6d 0x70 0x6c 0x65 0x00

Total payload length: 19 bytes.
