
Encoding scheme

An encoding scheme is a mapping between structured data and sequences of bytes, uniquely defined by a pair of serialization and deserialization functions.

Serialization

The serialize function serializes a data value into a bytestring.

type Serialize = DataValue -> Bytestring

Deserialization

The deserialize function attempts to deserialize a bytestring into a data value of the specified type.

type Deserialize = DataType -> Bytestring -> Maybe DataValue
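As an illustration of the two signatures, here is a minimal sketch of a toy encoding scheme. The data model (`TBool`, `TByte`, `VBool`, `VByte`), the one-byte type tag, and the use of `[Word8]` as the bytestring representation are all assumptions made for this example; the specification does not prescribe them.

```haskell
import Data.Word (Word8)

-- Assumption: a bytestring is modelled as a plain list of bytes.
type Bytestring = [Word8]

-- Hypothetical toy data model, invented for this sketch.
data DataType = TBool | TByte deriving (Eq, Show)
data DataValue = VBool Bool | VByte Word8 deriving (Eq, Show)

-- Each encoding starts with a one-byte type tag (0 = bool, 1 = byte),
-- so values of different types never share an encoding.
serialize :: DataValue -> Bytestring
serialize (VBool b) = [0, if b then 1 else 0]
serialize (VByte w) = [1, w]

-- Decoding succeeds only when the tag matches the requested type
-- and the payload is well formed; everything else yields Nothing.
deserialize :: DataType -> Bytestring -> Maybe DataValue
deserialize TBool [0, 0] = Just (VBool False)
deserialize TBool [0, 1] = Just (VBool True)
deserialize TByte [1, w] = Just (VByte w)
deserialize _     _      = Nothing
```

In this sketch, asking for the wrong type (for example `deserialize TByte (serialize (VBool True))`) or passing a malformed bytestring returns `Nothing` rather than raising an error.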

Properties

  • The serialize function must be injective (distinct data values have distinct encodings).

  • The deserialize function must be surjective (any data value can be represented by a serialized value).

  • For a fixed type, the deserialize function must be injective on the bytestrings it successfully decodes (no two distinct bytestrings deserialize to the same data value). Injectivity across types can be achieved by also serializing the type alongside the value.
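The three properties can be checked mechanically on a small finite domain. The sketch below is one possible way to do so, not part of the specification: the helper names (`injectiveOn`, `surjectiveOnto`, `decodeInjectiveOn`), the single-type boolean scheme, and the choice of candidate bytestrings are all hypothetical. The helpers assume their input lists contain no duplicates.

```haskell
import Data.List (nub)
import Data.Maybe (mapMaybe)
import Data.Word (Word8)

-- Assumption: a bytestring is modelled as a plain list of bytes.
type Bytestring = [Word8]

-- Hypothetical single-type scheme used only to exercise the checks.
serializeBool :: Bool -> Bytestring
serializeBool b = [if b then 1 else 0]

deserializeBool :: Bytestring -> Maybe Bool
deserializeBool [0] = Just False
deserializeBool [1] = Just True
deserializeBool _   = Nothing

-- Property 1: no two (distinct) inputs in xs share an output.
injectiveOn :: Eq b => [a] -> (a -> b) -> Bool
injectiveOn xs f = let ys = map f xs in length (nub ys) == length ys

-- Property 2: every value in vs is the decoding of some candidate bytestring.
surjectiveOnto :: Eq v => [v] -> [Bytestring] -> (Bytestring -> Maybe v) -> Bool
surjectiveOnto vs bss dec = all (`elem` mapMaybe dec bss) vs

-- Property 3: among the (distinct) bytestrings that decode successfully,
-- no two decode to the same value.
decodeInjectiveOn :: Eq v => [Bytestring] -> (Bytestring -> Maybe v) -> Bool
decodeInjectiveOn bss dec =
  let decoded = mapMaybe dec bss
  in length (nub decoded) == length decoded

-- All bytestrings of length 0 or 1 (257 candidates).
candidates :: [Bytestring]
candidates = [] : [[w] | w <- [minBound .. maxBound]]
```

For the toy boolean scheme, `injectiveOn [False, True] serializeBool`, `surjectiveOnto [False, True] candidates deserializeBool`, and `decodeInjectiveOn candidates deserializeBool` all evaluate to `True`. A decoder that also accepted, say, `[2]` as `True` would fail the third check.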

Multiformat

The protocol standardizes a table of encoding schemes, where each encoding scheme is associated with a unique natural number.

Nodes running the protocol then associate each number with a pair of serialize and deserialize functions. In order to interoperate correctly, nodes must agree on which number is associated with which encoding scheme, so this table is part of the definition of any particular protocol version, and entries, once added, cannot be changed. In general, adding new entries to the table should not break anything: a node that encounters an encoding scheme it does not know simply fails.
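A node-side lookup over such a table might be sketched as follows. Everything concrete here is an assumption for illustration: the `EncodingScheme` record, the single table entry with code `0`, the placeholder data model, and the `DecodeError` type are invented, and the real table is the one given in the Implementation section. Reporting an unknown code as an error value is only one possible reading of "fails".

```haskell
import Data.Word (Word8)
import Numeric.Natural (Natural)

-- Assumption: a bytestring is modelled as a plain list of bytes.
type Bytestring = [Word8]

-- Hypothetical placeholders for the data model.
newtype DataValue = VByte Word8 deriving (Eq, Show)
data DataType = TByte deriving (Eq, Show)

-- An encoding scheme is a pair of serialize and deserialize functions.
data EncodingScheme = EncodingScheme
  { schemeSerialize   :: DataValue -> Bytestring
  , schemeDeserialize :: DataType -> Bytestring -> Maybe DataValue
  }

-- Illustrative table only: the code 0 and its scheme are invented here.
schemeTable :: [(Natural, EncodingScheme)]
schemeTable =
  [ (0, EncodingScheme (\(VByte w) -> [w]) rawDecode) ]
  where
    rawDecode TByte [w] = Just (VByte w)
    rawDecode _     _   = Nothing

data DecodeError = UnknownScheme Natural | Malformed deriving (Eq, Show)

-- Look up the scheme by its number; an unknown number is reported as a
-- failure instead of being guessed at.
decodeWith :: Natural -> DataType -> Bytestring -> Either DecodeError DataValue
decodeWith code ty bytes =
  case lookup code schemeTable of
    Nothing     -> Left (UnknownScheme code)
    Just scheme -> maybe (Left Malformed) Right (schemeDeserialize scheme ty bytes)
```

Because existing entries never change, a node with an older table still decodes everything it could decode before; data tagged with a newer code (for example `decodeWith 99 TByte [42]` in this sketch) yields `Left (UnknownScheme 99)`.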

The concrete table is provided in the Implementation section of the specs.
