////
WireProto Specification © 2024 by Brent Saner is licensed under Creative Commons Attribution-ShareAlike 4.0 International.
To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/4.0/
////
= WireProto Specification
Brent Saner
Last rendered {localdatetime}
:doctype: book
:docinfo: shared
:data-uri:
:imagesdir: images
:sectlinks:
:sectnums:
:sectnumlevels: 7
:toc: preamble
:toc2: left
:idprefix:
:toclevels: 7
:source-highlighter: rouge
:this_protover: 1
:this_protover_hex: 0x00000001
:lib_ver: master
:lib_ver_ref: branch

[id="license"]
== License

++++
include::LICENSE.html[]
++++

In a nutshell, this means you can:

* Use it in commercial/proprietary/internal works...
* Expand upon/change the specification...
** (As long as it is released under the same Creative Commons license)

As long as you attribute the original (this document). This can be as simple as something like:

====
Based on WireProto version as found at https://wireproto.io/.
====

More detail certainly helps, though; you may want to mention the exact date you "forked" it, etc.

Please see the full text as collapsed above or https://creativecommons.org/licenses/by-sa/4.0/legalcode.en[the online version^] of the license for full legal copy.

NOTE: In the event of the embedded text in this document differing from the online version, the online version is assumed to take precedence as the valid license applicable to this work.

[id="proto"]
== Protocol

The WireProto data packing API is a custom wire protocol/message format designed for incredibly performant, unambiguous, predictable, platform-agnostic, client-agnostic communication.
It is based heavily on the https://github.com/openssh/openssh-portable/blob/master/PROTOCOL.key[OpenSSH "v1" key format^] https://git.r00t2.io/r00t2/go_sshkeys/src/branch/master/_ref/KEY_GUIDE.html#v1_plain_2[(example/details)] packing method.
It supports arbitrary binary values: a value can be anything the implementation defines. A common practice is to encode ("marshal") a Go struct to JSON bytes and set those bytes as a WireProto field's value.

It supports both static construction/parsing/dissection and stream approaches in a single format, as well as multiple commands per request message/multiple answers per response message.

*All* packed uint32 values are big-endian.

This specification <> is `{this_protover}` (`{this_protover_hex}`).

[id="lib"]
=== Library

This protocol specification is accompanied by a reference library for Golang, https://git.r00t2.io/r00t2/go_wireproto["WireProto"^] (https://git.r00t2.io/r00t2/wireproto[_source_^]):

++++
Go Reference
++++

[id="ytho"]
=== Why a Custom Message Format?

Because existing ones (e.g. JSON, XML, YAML) are slow/bloaty, inaccurate, and/or inflexible. They struggle with binary or arbitrary data (or, in e.g. XML's case, require intermediate conditional encoding/decoding).

If it can be represented as bytes (which all digital data can), WireProto can send and receive it.

Additionally:

* https://protobuf.dev/[*Protobuf*^] has performance issues (yes, really; protobufs have large overhead) and is restrictive on data types for future-proofing.
* https://go.dev/blog/gob[*Gob*^] is very language-limiting and does not support e.g. nil pointers and cyclical values.
* https://capnproto.org/[Cap'n Proto^] has wide language support and excellent performance but is terribly non-idiomatic, requiring the code to be generated from the schema and not vice versa (which is only ideal if you have only one communication interface).
* https://en.wikipedia.org/wiki/JSON_streaming[JSON streams^] have no delimiters defined, which is inconvenient when using a parser that does not know when a message ends/is complete, or one that expects a standalone JSON object.

[TIP]
====
WireProto is only used for binary packing/unpacking; this means it can be used with any e.g. https://pkg.go.dev/net#Conn[`net.Conn`^] (and even has helper functions explicitly to facilitate this), storage on-disk, etc.

Thus it is transport/storage-agnostic, and can be used with a https://pkg.go.dev/net#Dial[TCP socket, UDP socket, IPC (InterProcess Communication)/UDS (UNIX Domain Socket) handle,^] https://pkg.go.dev/crypto/tls#Dial[TLS-tunneled TCP socket^], etc.
====

[id="msg"]
== Message Format

[TIP]
====
Throughout this document, you may see references to things like `LF`, `SOH`, and so forth. These refer to _ASCII control characters_. You will also see many values represented in hex.

You can find more details about this (along with a full ASCII reference) https://square-r00t.net/ascii.html[here^].

Note that the socket API fully supports UTF-8 -- just be sure that your <> are aligned to the byte count, not character count.
====

Each *message* is generally composed of:

* The <>footnote:responly[Response messages only.]
* A <>footnote:optclient[Optional for Request.]footnote:reqsrv[Required for Response.]
* A <>
* A <>
* A <> <>
* A <> <>
* A <>
* One (or more) <>(s), each of which contains:
** One (or more) <>(s), each of which contains:
*** One (or more) <>(s), each of which contains:
**** A <>
**** A <>
**** A <>footnote:responly[]
* A <>
* A <>

[id="msg_respstatus"]
=== Response Status

Response messages have an additional byte prepended: a status indicator. This allows client programs to quickly bail in the case of an error if no further parsing is desired.

The status will be indicated by one of <>: an ASCII `ACK` (`0x06`) if all requests were returned successfully, or an ASCII `NAK` (`0x15`) if one or more errors were encountered across all records.

[id="proto_ver"]
=== Protocol Version

The protocol version is a packed uint32 that denotes which version of this protocol specification is being used.
It is maintained separately from the *library* version/repo tags.

The current protocol version (as demonstrated in this document) is `{this_protover}` (`{this_protover_hex}`).

NOTE: Version `0` is reserved for current `HEAD` of the `master` branch of this specification and should be considered experimental.

[id="msg_grp"]
=== Record Group

A record group contains multiple related <>. It is common to only have a single Record Group.

Its structure is:

. <> <>
. <> <>
. One (or more) <>

[id="msg_grp_rec"]
==== Record

A record contains multiple related <>. It is typical to only have a single Record.

Its structure is:

. <> <>
. <> <>
. One (or more) <>

[IMPORTANT]
====
For response messages, the record's size allocator (but NOT the count allocator) includes the <> size for each response record copy!footnote:responly[]
====

[id="msg_grp_rec_kv"]
===== Field/Value Pair (Key/Value Pair)

A field/value pair (also referred to as a key/value pair) contains a matched <> and its <>.

Its structure is:

. <> <>
. <> <>
. A single <>
. A single matching <>

[IMPORTANT]
====
Unlike most/all other <> for other sections/levels, the field name and value allocators are consecutive <>! This is because there is only one field name and one value per field/value pair.
====

[id="msg_grp_rec_kv_nm"]
====== Field Name

The field name is usually from a finite set of allowed names. The <>, while written as bytes, often contains a data structure defined by the field name. (A field name is closer to a "value type".)

It *must* be a UTF-8 string.

Its structure is:

. The name in bytes

[id="msg_grp_rec_kv_val"]
====== Field Value

A field's value is, on the wire, just a series of bytes. The actual content of those bytes, including any structure or encoding, likely depends on the paired <>.

Its structure is:

. The value in bytes

[id="msg_grp_recresp"]
===== Copy Record (Response Copy of Request)

This contains a "copy" of the original/request's <> that this record is in response to.
It is a variant of a <> used exclusively in responses, and is tied to (included in) each response's <>.

Its structure is:

. <> <>
. <> <>
. One (or more) <>

[id="msg_grp_rec_kvcpy"]
====== Field/Value Pair (Key/Value Pair) (Response Copy)

A field/value pair (also referred to as a key/value pair) contains a matched <> and its <>. It is a variant of a <> used exclusively in response copies of the original request's field/value pair.

Its structure is:

. <> <>
. <> <>
. A single <>
. A single matching <>

[IMPORTANT]
====
Unlike most/all other <> for other sections/levels, the field name and value allocators are consecutive <>! This is because there is only one field name and one value per field/value pair.
====

[id="cksum"]
== Checksums

Checksums are optional for the client but the server will *always* send them. *If present* in the request, the server will validate that the checksum matches the message body (<> to <>, headers included). If the checksum does not match, an error will be returned.

They are represented as a big-endian-packed uint32.

The checksum must be prefixed with a <>. If no checksum is provided, this prefix must *not* be included in the sequence.

[TIP]
====
You can quickly check if a checksum is present by checking the first byte in requests or the second byte in responses. If it is `ESC` (`0x1b`), a checksum is provided. If it is `SOH` (`0x01`), one was *not* provided.
====

The checksum method used is the https://users.ece.cmu.edu/~koopman/crc/crc32.html[IEEE 802.3 CRC-32^], which should be natively available for all/most client implementations as it is perhaps the most ubiquitous of CRC-32 variants. (Polynomial `0x04c11db7`, reversed polynomial `0xedb88320`.)

To confirm you are using the correct CRC-32 implementation (as there are a *ton* of "CRC-32" algorithms and methods out there), use the following validations:

.CRC-32 Validations
[cols="^.^2m,3m,^.^1m,^.^2m,^.^2m",options="header"]
|===
| String ^.^| Bytes | Checksum (integer) | Checksum (bytes, little-endian) | Checksum (bytes, big-endian)
| FooBarBazQuux | 0x466f6f42617242617a51757578 | 983022564 | 0xe4bb973a | 0x3a97bbe4
| 0123456789abcdef | 0x30313233343536373839616263646566 | 1757737011 | 0x33f0c468 | 0x68c4f033
|===

[id="hdrs"]
== Headers

Certain sections are wrapped with an identifying header. Those headers are included below for reference.

[id="hdrs_respstart"]
=== `RESPSTART` Byte Sequence

Responses have a <>.footnote:responly[] It is either an `ACK` (`0x06`) or `NAK` (`0x15`).

[id="hdrs_cksum"]
=== `CKSUM` Header Prefix

A checksum, if provided, will have a prefix header of `ESC` (`0x1b`).

[id="hdrs_msgstart"]
=== `MSGSTART` Header Prefix

The message start header indicates the start of a message. It is an `SOH` (`0x01`).

[id="hdrs_bodystart"]
=== `BODYSTART` Header Prefix

The body start header indicates that actual data/records follow. It is an `STX` (`0x02`).

[id="hdrs_bodyend"]
=== `BODYEND` Sequence

The body end sequence indicates the end of data/records. It is an `ETX` (`0x03`).

[id="hdrs_msgend"]
=== `MSGEND` Sequence

The message end sequence indicates that a message in its entirety has ended. It is an `EOT` (`0x04`).

[id="alloc"]
== Allocators

There are two types of allocators included for each following sequence of bytes: `count allocators` and `size allocators`. They can be used by clients to determine the size of destination buffers, and are used by the server to efficiently unpack requests.

They are usually paired together with the count allocator preceding the size allocator, but not always (e.g. <> have two <>).

All allocators are unsigned 32-bit integers, big-endian-packed.

[id="alloc_cnt"]
=== Count Allocator

Count allocators indicate *how many* child objects are contained.

[id="alloc_size"]
=== Size Allocator

Size allocators indicate *how large* (in bytes) all child objects are when combined. This includes e.g. separators, etc.

[id="ref"]
== Reference Model and Examples

For a more visual explanation, given the following e.g. Golang structs from the https://pkg.go.dev/r00t2.io/wireproto[Golang reference library^] (`wireproto.Request{}` and `wireproto.Response{}`):

[id="ref_single"]
=== Single/Simple

[id="ref_single_req"]
==== Single/Simple Request

[%collapsible]
.Example Message Structure (Simple Request)
====
[source,go]
----
include::https://git.r00t2.io/r00t2/go_wireproto/raw/{lib_ver_ref}/{lib_ver}/test_obj_simple_req.go[]
----
====

Would then serialize as (in hex):

[%collapsible]
.Annotated Hex
====
[source,text]
----
include::docs/data/request.simple.txt[]
----
====

Or, non-annotated:

[source,text]
----
include::docs/data/request.simple.hex[]
----

[id="ref_single_resp"]
==== Single/Simple Response

[%collapsible]
.Example Message Structure (Simple Response)
====
[source,go]
----
include::https://git.r00t2.io/r00t2/go_wireproto/raw/{lib_ver_ref}/{lib_ver}/test_obj_simple_resp.go[]
----
====

Would then serialize as (in hex):

[%collapsible]
.Annotated Hex
====
[source,text]
----
include::docs/data/response.simple.txt[]
----
====

Or, non-annotated:

[source,text]
----
include::docs/data/response.simple.hex[]
----

[id="ref_multi"]
=== Multiple/Many/Complex

Multiple commands, parameters, etc. can be specified in one message.

[id="ref_multi_req"]
==== Complex Request

[%collapsible]
.Example Message Structure (Multiple/Many Requests, Single Message)
====
[source,go]
----
include::https://git.r00t2.io/r00t2/go_wireproto/raw/{lib_ver_ref}/{lib_ver}/test_obj_multi_req.go[]
----
====

Would then serialize as (in hex):

[%collapsible]
.Annotated Hex
====
[source,text]
----
include::docs/data/request.multi.txt[]
----
====

Or, non-annotated:

[source,text]
----
include::docs/data/request.multi.hex[]
----

[id="ref_multi_resp"]
==== Complex Response

[%collapsible]
.Example Message Structure (Response to Multiple/Many Requests, Single Message)
====
[source,go]
----
include::https://git.r00t2.io/r00t2/go_wireproto/raw/{lib_ver_ref}/{lib_ver}/test_obj_multi_resp.go[]
----
====

Would then serialize as (in hex):

[%collapsible]
.Annotated Hex
====
[source,text]
----
include::docs/data/response.multi.txt[]
----
====

Or, non-annotated:

[source,text]
----
include::docs/data/response.multi.hex[]
----
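
When reading serialized messages such as the ones in this section, the leading bytes can be inspected before any further parsing. The Go sketch below reads the response status byte (responses only), detects an optional `ESC`-prefixed checksum, and confirms that `SOH` follows. `peekPreamble` and the `preamble` type are hypothetical and do not come from the reference library. The sketch assumes the four checksum bytes directly follow the `ESC` prefix, and it does not verify the checksum.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Control bytes taken from the Headers section of this specification.
const (
	ack byte = 0x06 // RESPSTART: success
	nak byte = 0x15 // RESPSTART: one or more errors
	esc byte = 0x1b // CKSUM header prefix
	soh byte = 0x01 // MSGSTART header prefix
)

// preamble is a hypothetical type holding what the leading bytes reveal.
type preamble struct {
	OK          bool   // meaningful for responses only
	HasChecksum bool   // true if an ESC-prefixed checksum was found
	Checksum    uint32 // big-endian CRC-32 as sent; zero if absent
	Rest        []byte // remainder, starting at the MSGSTART byte
}

var errShort = errors.New("message too short")

// peekPreamble is a hypothetical helper; it does not verify the checksum.
func peekPreamble(msg []byte, isResponse bool) (preamble, error) {
	var p preamble
	if isResponse {
		if len(msg) < 1 {
			return p, errShort
		}
		switch msg[0] {
		case ack:
			p.OK = true
		case nak:
			p.OK = false
		default:
			return p, fmt.Errorf("unexpected status byte %#x", msg[0])
		}
		msg = msg[1:]
	}
	if len(msg) < 1 {
		return p, errShort
	}
	if msg[0] == esc {
		// Assumption: ESC is followed directly by the 4 checksum bytes.
		if len(msg) < 5 {
			return p, errShort
		}
		p.HasChecksum = true
		p.Checksum = binary.BigEndian.Uint32(msg[1:5])
		msg = msg[5:]
	}
	if len(msg) < 1 || msg[0] != soh {
		return p, errors.New("missing MSGSTART (SOH)")
	}
	p.Rest = msg
	return p, nil
}

func main() {
	// Synthetic, truncated response used only for illustration:
	// ACK, ESC, four checksum bytes, SOH.
	msg := []byte{0x06, 0x1b, 0x3a, 0x97, 0xbb, 0xe4, 0x01}
	p, err := peekPreamble(msg, true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("ok=%t checksum=%t value=%#x\n", p.OK, p.HasChecksum, p.Checksum)
}
```

Checking the status byte first lets a client stop early on `NAK` without unpacking any record groups.
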