Ordering Machine¶
Introduction¶
Purpose¶
The ordering machine is a set of communicating engines that collaborate in
-
receiving transaction candidates from users or solvers
-
updated the state accordingly,
-
making the state available.
Background¶
In full generality, Anoma nodes form a distributed system that implements a transaction processing system (TPS) and a replicated state machine (RSM). This state machine must represent all anoma state, including commitments and nullifiers. The Ordering Machine, however, does not need to understand the full complexity of the anoma state: only that it can be represented as a Key Value Store, where transaction candidates each read some (arbitrary) data from keys, and can write some (arbitrary) data to keys. We outline the full requirements of the abstraction boundary with the State Machine in the Execution Engines page.
In V1, the implementation is running on a single physical machine. As a consequence the consensus problem is trivial. Roughly, in V1, we only have the machine that is going to be replicated from V2 onward.
Scope
We start by describing the overall structure of the ordering machine while the details of the protocol and the functionality of the engines is described in the respective pages. The collection of all engines of the ordering machine is organized into three groups:
We mention the consensus engine, because it is one of the most important components from V2 onward, although it is not present in V1.
Overview¶
Transaction requests trigger transaction processing. Users or Solvers send transaction requests to the ordering machine: more specifically, to a worker engine. The ordering machine will eventually order and execute the transaction candidates included in these requests. Successfully processing a transaction request amounts to invoking the transition function of the RSM, according to an agreed upon order (determined by the mempool in collaboration with consensus). Typical transactions contain read and write operations to a “global” key-value store representing the state of the RSM. In general, transactions may have side effects besides state updates, but these are not considered in V1.
Todo
Are they ignored? Are ExecutionSummary and pub sub information of execution data side effects?
-
The mempool engines are responsible for managing transaction data, (pre-)processing them for consensus and execution.
-
The trivial consensus problem is already implicitly solved by the worker in V1.
-
In V1, there is only one worker. There will be multiple (on each Node) in future versions.
-
The execution engines execute the transactions:
-
Shards maintain the local copy of the global state (of the RSM), i.e., serve read and write requests of Executors to the key-value store (of the RSM).
-
Executors process transaction candidates, effectively invoking the transition function of the RSM.
A life cycle with some details¶
Let us consider a typical/generic case of what transaction submission triggers in the ordering machine. Note that all message sending is asynchronous.
Todo
removed ExecutorProcess--)ExecutorProcess: .
just before
activate ExecutorProcess
. I'm not sure what it represented. -->
sequenceDiagram
participant User
participant Worker
participant ExecutionSupervisor
participant ExecutorProcess
User-)Worker: TransactionRequest
Worker--)Worker: fix batch №
Worker-)User: TransactionAck
Worker--)Worker: Buffering & Shuffling
Worker--)Worker: fix tx №
Worker-)ExecutionSupervisor: spawnExecutor
ExecutionSupervisor-)Worker: EPID
Worker-)ExecutorProcess: ExecuteTransaction
Worker-)Shard: KVSAcquireLock
Shard-)Worker: KVSLockAcquired
Worker-)Shard: UpdateSeenAll
activate ExecutorProcess
ExecutorProcess-)Shard: KVSReadRequest
Shard-)ExecutorProcess: KVSRead
ExecutorProcess-)Shard: KVSWrite(Request)
%% ExecutorProcess-)WhereToIdontKnow: pub sub information of execution data
ExecutorProcess-)User: ExecutionSummary
ExecutorProcess-)Worker: ExecutorFinished
deactivate ExecutorProcess
The origin of the request¶
at user
The user creates a TransactionCandidate \(T\).
Quote
transaction executable \(\mathit{code}\) eager read keys \(r_e\) lazy read keys \(r_l\) \(w_w\) will-write keys \(w_m\) may-write keys
The User transmits the TransactionCandidate, packaged as a TransactionRequest, to the worker.
-
TransactionRequest → Worker Engine
This message is the origin of the life cycle.
Acknowledging the Users's TransactionRequest¶
Once it is clear that the transaction can be included into the current batch, the transaction candidate is "stamped" with the current batch number.
-
TransactionAck → user
This is mainly for purposes of UX, but gives also information for re-submission strategies.
Buffering and shuffling (optional)¶
One may want to order each transaction request within a batch only after a minimum number of other requests are received such that they can be "slightly" re-ordered within a very short time period---for several reasons.
Todo
add footnote / explain exactly the issues this avoids
Assigning a transaction number¶
Eventually, possibly triggered by a timer or when sufficiently many other transaction candidates have been received, the transaction request will be assigned a transaction number (within the current batch). Together the batch and transaction numbers can form a unique TxFingerprint for the TransactionRequest.
Transaction Numbers can be assigned immediately from some set of available transaction numbers, provided this complies with the buffering and shuffling scheme. For example, a worker could maintain a shuffled list of natural numbers for each batch, and assign them sequentially from the list. Alternatively, a worker could assign Transaction Numbers sequentially, in which case transaction candidates will end up ordered in the order they were received. This may slightly change in V2.
Requesting an available executor engine¶
The Worker Engine either directly spawns an Executor process or contacts an Execution Supervisor that provides an Executor Process ID (EPID).
-
SpawnExecutor→Execution Supervisor
Request a fresh process id of an available Executor, typically a newly spawned executor engine instance.
Note that the Execution Supervisor cannot, in general, wait for some existing Executor processes to terminate before spawning a new Executor process. It is possible that the existing Executor is waiting for a transaction candidate with an earlier TxFingerprint to be executed, but which does not yet have an Executor, in order to run. Therefore waiting for existing Executors to terminate can cause deadlock.
Also, we do not want to Execution Supervisor to become a bottleneck. Hence, the supervisor itself may be a concurrently running group of engines.
Providing a "fresh" executor¶
-
Send the external identity of a "fresh" executor engine instance, either newly spawned or a waiting in a fixed pool of available executors.
Todo
add one supervisor for each executor -->
Informing shard(s) about upcoming read and write requests¶
-
The Worker Engine informs all relevant Shards about locks for this TransactionCandidate (at this TxFingerprint). This also allows the shard to prepare for read and write requests (which can be used for optimizations like warming up disk storage).
If it helps, these messages can be batched and sent periodically.
Notifying the curator about acquired locks¶
at Shard
-
KVSLockAcquired → Worker Engine
The Shard informs the Worker Engine about locks acquired or "recorded" for this Worker Engine's TransactionCandidate. This becomes crucial below, at "Notifying shards about locks seen."
Starting transaction execution¶
-
This will trigger the actual execution. This execution includes any finalization or resource logic checks. Reads, for executor process purposes, include any reads of state necessary for any post-ordering execution, resource logic checks, or anything else dependent on the "current" state of the state machine. Writes, for purposes of the state machine, include only final, valid updates to state that are definitely committed.
Sending read requests¶
at Executor
-
While executing the transaction (depending on previous reads and or static data in the code) send the optional read requests to the Shard.
Sending write requests¶
at Executor
Todo
this should be a Request!
-
When the transaction candidate has run, for each write lock, the Executor informs the relevant Shard of a value to write (or, for
may_write
s, maybe to not update this value).
Notifying shards about locks "seen"¶
-
The Worker Engine informs each Shard of the greatest TxFingerprint such that it can be sure (because it has heard enough KVSLockAcquired messages) that the Shard will never hear a KVSAcquireLock from this Worker Engine with an equal or lower TxFingerprint in the future. This is crucial, because Shards cannot "read" values from storage at a particular TxFingerprint until they are sure that no writes with earlier TxFingerprints will happen.
Serving read and write requests from executors¶
at Shards
-
For each read lock, if all preceding write locks have been recorded, and the unique preceding write has produced a value, the Shard can read a value. For eager reads (as defined in KVSAcquireLock), it sends this value immediately to the Executor, and for lazy reads, it sends it in response to a KVSReadRequest from the Executor.
Handling transaction candidate side effects¶
at Executor
In addition to updates to state, transaction candidates can do other stuff (so long as it does not non-deterministically effect updates to state). This can include logging and sending messages to users.
Informing users about results and logs¶
The user is informed about the results of the transaction outcome. In future versions (from V2 onward), this comes in two flavors:
-
communication from only one validator's executor process (probably whatever validator the user submitted to)
-
some reliable transmission protocol between the user and at least a weak quorum of validators.
Informing the curator about execution termination¶
at Executor
-
ExecutorFinished → Worker Engine
Notify the Worker Engine about the finished execution. This triggers a log dump at the Worker Engine, and will be used for garbage collection from V2 onward. For logging purposes, we could encode values read / written in here.
-
In fact it is the latter time stamp that is most relevant; the former is merely an indicator about performance of the worker. ↩
-
This response may be delayed until the TxFingerprint is assigned. In V2, the "shuffling" of transactions may be pseudo-random so that we can quickly pass on transaction data to mirror workers. ↩