lighthouse-pulse/beacon_chain/utils/ssz/README.md
Paul Hauner 0fbe4179b3
Heavily restructure repo
Separate most modules into crates
2018-10-02 16:41:10 +10:00

544 lines
15 KiB
Markdown

# simpleserialize (ssz) [WIP]
This is currently a ***Work In Progress*** crate.
SimpleSerialize is a serialization protocol described by Vitalik Buterin. The
method is tentatively intended for use in the Ethereum Beacon Chain as
described in the [Ethereum 2.1 Spec](https://notes.ethereum.org/s/Syj3QZSxm).
The Beacon Chain specification is the core, canonical specification which we
are following.
The current reference implementation has been described in the [Beacon Chain
Repository](https://github.com/ethereum/beacon_chain/blob/master/ssz/ssz.py).
*Please Note: This implementation is presently a placeholder until the final
spec is decided.*\
*Do not rely upon it for reference.*
## Table of Contents
* [SimpleSerialize Overview](#simpleserialize-overview)
+ [Serialize/Encode](#serializeencode)
- [int or uint: 8/16/24/32/64/256](#int-or-uint-816243264256)
- [Address](#address)
- [Hash32](#hash32)
- [Bytes](#bytes)
- [List](#list)
+ [Deserialize/Decode](#deserializedecode)
- [Int or Uint: 8/16/24/32/64/256](#int-or-uint-816243264256)
- [Address](#address-1)
- [Hash32](#hash32-1)
- [Bytes](#bytes-1)
- [List](#list-1)
* [Technical Overview](#technical-overview)
* [Building](#building)
+ [Installing Rust](#installing-rust)
* [Dependencies](#dependencies)
+ [bytes v0.4.9](#bytes-v049)
+ [ethereum-types](#ethereum-types)
* [Interface](#interface)
+ [Encodable](#encodable)
+ [Decodable](#decodable)
+ [SszStream](#sszstream)
- [new()](#new)
- [append(&mut self, value: &E) -> &mut Self](#appendmut-self-value-e---mut-self)
- [append_encoded_val(&mut self, vec: &Vec)](#append_encoded_valmut-self-vec-vec)
- [append_vec(&mut self, vec: &Vec)](#append_vecmut-self-vec-vec)
- [drain(self) -> Vec](#drainself---vec)
+ [decode_ssz(ssz_bytes: &(u8), index: usize) -> Result](#decode_sszssz_bytes-u8-index-usize---resultt-usize-decodeerror)
+ [decode_ssz_list(ssz_bytes: &(u8), index: usize) -> Result, usize), DecodeError>](#decode_ssz_listssz_bytes-u8-index-usize---resultvec-usize-decodeerror)
+ [decode_length(bytes: &(u8), index: usize, length_bytes: usize) -> Result](#decode_lengthbytes-u8-index-usize-length_bytes-usize---resultusize-decodeerror)
* [Usage](#usage)
+ [Serializing/Encoding](#serializingencoding)
- [Rust](#rust)
* [Deserializing/Decoding](#deserializingdecoding)
- [Rust](#rust-1)
---
## SimpleSerialize Overview
The ``simpleserialize`` method for serialization follows simple byte conversion,
making it effective and efficient for encoding and decoding.
The decoding requires knowledge of the data **type** and the order of the
serialization.
Syntax:
| Shorthand | Meaning |
|:-------------|:----------------------------------------------------|
| `big` | ``big endian`` |
| `to_bytes` | convert to bytes. Params: ``(size, byte order)`` |
| `from_bytes` | convert from bytes. Params: ``(bytes, byte order)`` |
| `value` | the value to serialize |
| `rawbytes` | raw encoded/serialized bytes |
| `len(value)` | get the length of the value. (number of bytes etc) |
### Serialize/Encode
#### int or uint: 8/16/24/32/64/256
Convert directly to bytes the size of the int. (e.g. ``int16 = 2 bytes``)
All integers are serialized as **big endian**.
| Check to perform | Code |
|:-----------------------|:------------------------|
| Int size is not 0 | ``int_size > 0`` |
| Size is a byte integer | ``int_size % 8 == 0`` |
| Value is less than max | ``2**int_size > value`` |
```python
buffer_size = int_size / 8
return value.to_bytes(buffer_size, 'big')
```
#### Address
The address should already come as a hash/byte format. Ensure that length is
**20**.
| Check to perform | Code |
|:-----------------------|:---------------------|
| Length is correct (20) | ``len(value) == 20`` |
```python
assert( len(value) == 20 )
return value
```
#### Hash32
The hash32 should already be a 32 byte length serialized data format. The safety
check ensures the 32 byte length is satisfied.
| Check to perform | Code |
|:-----------------------|:---------------------|
| Length is correct (32) | ``len(value) == 32`` |
```python
assert( len(value) == 32 )
return value
```
#### Bytes
For general `byte` type:
1. Get the length/number of bytes; Encode into a 4 byte integer.
2. Append the value to the length and return: ``[ length_bytes ] + [
value_bytes ]``
```python
byte_length = (len(value)).to_bytes(4, 'big')
return byte_length + value
```
#### List
For lists of values, get the length of the list and then serialize the value
of each item in the list:
1. For each item in list:
1. serialize.
2. append to string.
2. Get size of serialized string. Encode into a 4 byte integer.
```python
serialized_list_string = ''
for item in value:
serialized_list_string += serialize(item)
serialized_len = len(serialized_list_string)
return serialized_len + serialized_list_string
```
### Deserialize/Decode
The decoding requires knowledge of the type of the item to be decoded. When
performing decoding on an entire serialized string, it also requires knowledge
of what order the objects have been serialized in.
Note: Each return will provide ``deserialized_object, new_index`` keeping track
of the new index.
At each step, the following checks should be made:
| Check Type | Check |
|:-------------------------|:----------------------------------------------------------|
| Ensure sufficient length | ``length(rawbytes) > current_index + deserialize_length`` |
#### Int or Uint: 8/16/24/32/64/256
Convert directly from bytes into integer utilising the number of bytes the same
size as the integer length. (e.g. ``int16 == 2 bytes``)
All integers are interpreted as **big endian**.
```python
byte_length = int_size / 8
new_index = current_index + int_size
return int.from_bytes(rawbytes[current_index:current_index+int_size], 'big'), new_index
```
#### Address
Return the 20 bytes.
```python
new_index = current_index + 20
return rawbytes[current_index:current_index+20], new_index
```
#### Hash32
Return the 32 bytes.
```python
new_index = current_index + 32
return rawbytes[current_index:current_index+32], new_index
```
#### Bytes
Get the length of the bytes, return the bytes.
```python
bytes_length = int.from_bytes(rawbytes[current_index:current_index+4], 'big')
new_index = current_index + 4 + bytes_lenth
return rawbytes[current_index+4:current_index+4+bytes_length], new_index
```
#### List
Deserailize each object in the list.
1. Get the length of the serialized list.
2. Loop through deseralizing each item in the list until you reach the
entire length of the list.
| Check type | code |
|:------------------------------------|:--------------------------------------|
| rawbytes has enough left for length | ``len(rawbytes) > current_index + 4`` |
```python
total_length = int.from_bytes(rawbytes[current_index:current_index+4], 'big')
new_index = current_index + 4 + total_length
item_index = current_index + 4
deserialized_list = []
while item_index < new_index:
object, item_index = deserialize(rawbytes, item_index, item_type)
deserialized_list.append(object)
return deserialized_list, new_index
```
## Technical Overview
The SimpleSerialize is a simple method for serializing objects for use in the
Ethereum beacon chain proposed by Vitalik Buterin. There are currently two
implementations denoting the functionality, the [Reference
Implementation](https://github.com/ethereum/beacon_chain/blob/master/ssz/ssz.py)
and the [Module](https://github.com/ethereum/research/tree/master/py_ssz) in
Ethereum research. It is being developed as a crate for the [**Rust programming
language**](https://www.rust-lang.org).
The crate will provide the functionality to serialize several types in
accordance with the spec and provide a serialized stream of bytes.
## Building
ssz currently builds on **rust v1.27.1**
### Installing Rust
The [**Rustup**](https://rustup.rs/) tool provides functionality to easily
manage rust on your local instance. It is a recommended method for installing
rust.
Installing on Linux or OSX:
```bash
curl https://sh.rustup.rs -sSf | sh
```
Installing on Windows:
* 32 Bit: [ https://win.rustup.rs/i686 ](https://win.rustup.rs/i686)
* 64 Bit: [ https://win.rustup.rs/x86_64 ](https://win.rustup.rs/x86_64)
## Dependencies
All dependencies are listed in the ``Cargo.toml`` file.
To build and install all related dependencies:
```bash
cargo build
```
### bytes v0.4.9
The `bytes` crate provides effective Byte Buffer implementations and
interfaces.
Documentation: [ https://docs.rs/bytes/0.4.9/bytes/ ](https://docs.rs/bytes/0.4.9/bytes/)
### ethereum-types
The `ethereum-types` provide primitives for types that are commonly used in the
ethereum protocol. This crate is provided by [Parity](https://www.parity.io/).
Github: [ https://github.com/paritytech/primitives ](https://github.com/paritytech/primitives)
---
## Interface
### Encodable
A type is **Encodable** if it has a valid ``ssz_append`` function. This is
used to ensure that the object/type can be serialized.
```rust
pub trait Encodable {
fn ssz_append(&self, s: &mut SszStream);
}
```
### Decodable
A type is **Decodable** if it has a valid ``ssz_decode`` function. This is
used to ensure the object is deserializable.
```rust
pub trait Decodable: Sized {
fn ssz_decode(bytes: &[u8], index: usize) -> Result<(Self, usize), DecodeError>;
}
```
### SszStream
The main implementation is the `SszStream` struct. The struct contains a
buffer of bytes, a Vector of `uint8`.
#### new()
Create a new, empty instance of the SszStream.
**Example**
```rust
let mut ssz = SszStream::new()
```
#### append<E>(&mut self, value: &E) -> &mut Self
Appends a value that can be encoded into the stream.
| Parameter | Description |
|:---------:|:-----------------------------------------|
| ``value`` | Encodable value to append to the stream. |
**Example**
```rust
ssz.append(&x)
```
#### append_encoded_val(&mut self, vec: &Vec<u8>)
Appends some ssz encoded bytes to the stream.
| Parameter | Description |
|:---------:|:----------------------------------|
| ``vec`` | A vector of serialized ssz bytes. |
**Example**
```rust
let mut a = [0, 1];
ssz.append_encoded_val(&a.to_vec());
```
#### append_vec<E>(&mut self, vec: &Vec<E>)
Appends some vector (list) of encodable values to the stream.
| Parameter | Description |
|:---------:|:----------------------------------------------|
| ``vec`` | Vector of Encodable objects to be serialized. |
**Example**
```rust
ssz.append_vec(attestations);
```
#### drain(self) -> Vec<u8>
Consumes the ssz stream and returns the buffer of bytes.
**Example**
```rust
ssz.drain()
```
### decode_ssz<T>(ssz_bytes: &[u8], index: usize) -> Result<(T, usize), DecodeError>
Decodes a single ssz serialized value of type `T`. Note: `T` must be decodable.
| Parameter | Description |
|:-------------:|:------------------------------------|
| ``ssz_bytes`` | Serialized list of bytes. |
| ``index`` | Starting index to deserialize from. |
**Returns**
| Return Value | Description |
|:-------------------:|:----------------------------------------------|
| ``Tuple(T, usize)`` | Returns the tuple of the type and next index. |
| ``DecodeError`` | Error if the decoding could not be performed. |
**Example**
```rust
let res: Result<(u16, usize), DecodeError> = decode_ssz(&encoded_ssz, 0);
```
### decode_ssz_list<T>(ssz_bytes: &[u8], index: usize) -> Result<(Vec<T>, usize), DecodeError>
Decodes a list of serialized values into a vector.
| Parameter | Description |
|:-------------:|:------------------------------------|
| ``ssz_bytes`` | Serialized list of bytes. |
| ``index`` | Starting index to deserialize from. |
**Returns**
| Return Value | Description |
|:------------------------:|:----------------------------------------------|
| ``Tuple(Vec<T>, usize)`` | Returns the tuple of the type and next index. |
| ``DecodeError`` | Error if the decoding could not be performed. |
**Example**
```rust
let decoded: Result<(Vec<usize>, usize), DecodeError> = decode_ssz_list( &encoded_ssz, 0);
```
### decode_length(bytes: &[u8], index: usize, length_bytes: usize) -> Result<usize, DecodeError>
Deserializes the "length" value in the serialized bytes from the index. The
length of bytes is given (usually 4 stated in the reference implementation) and
is often the value appended to the list infront of the actual serialized
object.
| Parameter | Description |
|:----------------:|:-------------------------------------------|
| ``bytes`` | Serialized list of bytes. |
| ``index`` | Starting index to deserialize from. |
| ``length_bytes`` | Number of bytes to deserialize into usize. |
**Returns**
| Return Value | Description |
|:---------------:|:-----------------------------------------------------------|
| ``usize`` | The length of the serialized object following this length. |
| ``DecodeError`` | Error if the decoding could not be performed. |
**Example**
```rust
let length_of_serialized: Result<usize, DecodeError> = decode_length(&encoded, 0, 4);
```
---
## Usage
### Serializing/Encoding
#### Rust
Create the `simpleserialize` stream that will produce the serialized objects.
```rust
let mut ssz = SszStream::new();
```
Encode the values that you need by using the ``append(..)`` method on the `SszStream`.
The **append** function is how the value gets serialized.
```rust
let x: u64 = 1 << 32;
ssz.append(&x);
```
To get the serialized byte vector use ``drain()`` on the `SszStream`.
```rust
ssz.drain()
```
**Example**
```rust
// 1 << 32 = 4294967296;
// As bytes it should equal: [0,0,0,1,0,0,0]
let x: u64 = 1 << 32;
// Create the new ssz stream
let mut ssz = SszStream::new();
// Serialize x
ssz.append(&x);
// Check that it is correct.
assert_eq!(ssz.drain(), vec![0,0,0,1,0,0,0]);
```
## Deserializing/Decoding
#### Rust
From the `simpleserialize` bytes, we are converting to the object.
```rust
let ssz = vec![0, 0, 8, 255, 255, 255, 255, 255, 255, 255, 255];
// Returns the result and the next index to decode.
let (result, index): (u64, usize) = decode_ssz(&ssz, 3).unwrap();
// Check for correctness
// 2**64-1 = 18446744073709551615
assert_eq!(result, 18446744073709551615);
// Index = 3 (initial index) + 8 (8 byte int) = 11
assert_eq!(index, 11);
```
Decoding a list of items:
```rust
// Encoded/Serialized list with junk numbers at the front
let serialized_list = vec![ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 32, 0, 0, 0,
0, 0, 0, 0, 15, 0, 0, 0, 0, 0, 0, 0, 15, 0, 0, 0, 0,
0, 0, 0, 15, 0, 0, 0, 0, 0, 0, 0, 15];
// Returns the result (Vector of usize) and the index of the next
let decoded: (Vec<usize>, usize) = decode_ssz_list(&serialized_list, 10).unwrap();
// Check for correctness
assert_eq!(decoded.0, vec![15,15,15,15]);
assert_eq!(decoded.1, 46);
```