I have been looking into Protocol Buffers as a way to represent events in RMS architectures, trying to figure out which of its features match RMS requirements and which ones might be a drag.
As a reminder, RMS is a collection of patterns that are ideal for event-driven, low-latency applications, and especially useful in high-frequency finance. Events in RMS are described in terms of a header and a body. The header is usually a dictionary, and the body must support elaborate graphs of objects that represent real-world analytical use cases. QuantLET is an open-source RMS framework.
Protocol Buffers
“Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data”. Comparisons with alternatives like XML or JSON come to mind immediately, but for now let’s just say that Protocol Buffers uses a smaller binary wire format, so transfers usually take a fraction of the time needed by XML or JSON.
Benefits?
A few features of Protocol Buffers are a great match for representing RMS events in high-frequency applications.
Fast
The transport format is binary and compact, and therefore fast to transfer and to serialize. It is well suited to high-frequency solutions where the expected latency is less than a millisecond.
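To make the size argument concrete, here is a minimal Python sketch that hand-encodes a tiny event using the Protocol Buffers wire format (varint tags, little-endian doubles) and compares it with the same data as JSON. The event fields (price, quantity) and the helper names are made up for illustration; real code would use classes generated by protoc, and byte counts alone say nothing definitive about latency.

```python
import json
import struct


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field tag is the field number and wire type packed into one varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_double_field(field_number: int, value: float) -> bytes:
    # Wire type 1: fixed 64-bit value, little-endian.
    return encode_tag(field_number, 1) + struct.pack("<d", value)


def encode_varint_field(field_number: int, value: int) -> bytes:
    # Wire type 0: varint.
    return encode_tag(field_number, 0) + encode_varint(value)


# Hypothetical event body: price is field 1 (double), quantity is field 2 (integer).
binary = encode_double_field(1, 101.25) + encode_varint_field(2, 300)
text = json.dumps({"price": 101.25, "quantity": 300}).encode("utf-8")

print(len(binary), "bytes as binary wire format")
print(len(text), "bytes as JSON")
```

Note that the binary form carries field numbers rather than field names, which is where most of the saving comes from in a message this small.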
Backward Compatible
Backward compatibility is built in from the ground up. Optional attributes and the unique identification of fields through a number allow messages to be consumed by older clients as you roll out new versions of your information model.
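The mechanism can be sketched in a few lines of Python: a reader that walks the wire format field by field and keeps only the field numbers it knows about. This is a simplified illustration, not the actual Protocol Buffers runtime; the decode_known helper, the field names and the choice to read every 64-bit field as a double are assumptions made for the example.

```python
import struct


def read_varint(buf: bytes, pos: int):
    """Read a base-128 varint starting at pos; return (value, new position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_known(buf: bytes, known: dict) -> dict:
    """Decode a message, keeping only field numbers listed in `known`."""
    out = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit, read here as a double (assumption)
            value = struct.unpack_from("<d", buf, pos)[0]
            pos += 8
        elif wire_type == 2:  # length-delimited bytes
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        elif wire_type == 5:  # fixed 32-bit, read here as a float (assumption)
            value = struct.unpack_from("<f", buf, pos)[0]
            pos += 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        # Unknown field numbers were still consumed above, so they are skipped safely.
        if field_number in known:
            out[known[field_number]] = value
    return out


# A message written by a newer producer: field 1 (weight) plus a new field 2 (color).
message = b"\x09" + struct.pack("<d", 4.5) + b"\x10\x03"

OLD_CLIENT = {1: "weight"}              # schema before field 2 existed
NEW_CLIENT = {1: "weight", 2: "color"}  # schema after the rollout

print(decode_known(message, OLD_CLIENT))
print(decode_known(message, NEW_CLIENT))
```

Because every field announces its own number and wire type, the older client can step over field 2 without knowing what it means.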
Drawbacks?
A few features of Protocol Buffers speak against its use as a transport protocol in RMS architectures.
Not Plain Objects
In RMS lingo we define a ‘plain object’ as a structure capable of describing cyclical graphs of objects, in which objects are represented through an interface that allows access to their members either directly or through getters and setters.
With that definition in mind, Protocol Buffers objects cannot be seen as RMS plain objects. They carry programmatic features, such as fluent builders and bindings to language-specific structures (like streams in Java or C++).
These features bring benefits, but they also make objects depend on the framework in places where Protocol Buffers is not really needed (in RMS a transport format is only necessary at endpoints, usually during marshalling/unmarshalling and transformation).
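As a rough illustration of the distinction, here is a Python sketch of two plain objects that reference each other (a cycle), with the transport concern confined to a single function at the endpoint. The Order and Portfolio classes and the marshal_order helper are hypothetical names invented for this example; they come from neither RMS nor Protocol Buffers.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison, so cycles cannot cause endless recursion
class Order:
    order_id: str
    quantity: int
    portfolio: Optional["Portfolio"] = None  # back-reference, creates a cycle


@dataclass(eq=False)
class Portfolio:
    name: str
    orders: List[Order] = field(default_factory=list)

    def add(self, order: Order) -> None:
        """Link the order and the portfolio in both directions."""
        order.portfolio = self
        self.orders.append(order)


def marshal_order(order: Order) -> dict:
    """Endpoint-only step: flatten the object into a tree-shaped record.

    The cycle is broken by sending the portfolio name instead of the object.
    """
    return {
        "order_id": order.order_id,
        "quantity": order.quantity,
        "portfolio": order.portfolio.name if order.portfolio else None,
    }


book = Portfolio("book-1")
order = Order("o-1", 100)
book.add(order)

print(marshal_order(order))
```

The domain classes know nothing about the wire format; only marshal_order would need to change if the transport moved to Protocol Buffers or anything else.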
Lacks Inheritance Support
Ok, I know: Protocol Buffers lets you represent a quasi-inheritance in a few different ways, depending on how you intend to use it.
When a consumer knows in advance the expected subtype of a message, you can either embed an instance of the base type in the derived type, or embed the subtypes in the base type as optional extension messages:
message Animal {
  // General animal properties.
  optional double weight = 1;
  extensions 1000 to max;
}
message Dog {
  // Dog-specific properties.
  optional float average_bark_frequency = 1;
}
message Cat {
  // Cat-specific properties.
  enum Breed { TABBY = 1; CALICO = 2; ... }
  optional Breed breed = 1;
}
extend Animal {
  optional Dog dog = 1000;
  optional Cat cat = 1001;
}
Ok, it might work. The problem is that you need to know all the possible derived types in advance, which is an impediment for anything beyond really trivial use cases. The alternative, for when a consumer does not know in advance what types to expect, is to play with a mix of features.
message Animal reserved 10 {
  optional double weight = 1;
  optional Color color = 2;
}
message Dog extends Animal reserved 10 {
  optional float average_bark_frequency = 1;
}
message Cat extends Animal reserved 10 {
  optional Breed breed = 1;
}
message Lion extends Cat reserved 10 {
  optional boolean isAlphaMale = 1;
}
Now the dirty details: the description above is written in a ‘protoext’ file, and to get something useful out of it you have to compile it into a proto file like this:
//@Hierarchi Lion:Cat:Animal
message Lion {
  //-- Animal's indices start at 1 because it is the base class (Lifeform does not count since it is an interface)
  //@Class Animal
  optional double weight = 1;
  //@Class Animal
  optional Color color = 2;
  //-- Cat's indices start at 11 because reserved is set to 10: 10+1
  //@Class Cat
  optional Breed breed = 11;
  //-- Lion's indices start at 21 because reserved is set to 10: 10+10+1
  //@Class Lion
  optional bool isAlphaMale = 21;
}
//@Hierarchi Dog:Animal
message Dog {
  //-- See comments for Lion above
  //@Class Animal
  optional double weight = 1;
  //@Class Animal
  optional Color color = 2;
  //-- See comments for Cat above
  //@Class Dog
  optional float average_bark_frequency = 11;
}
So you have already compiled it once, and the result has to be compiled again before you get something you can finally use. You get the picture: yes, it is complicated.
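The index arithmetic described in the generated comments (base class fields start at 1, and each level of the hierarchy is shifted by the reserved block size) can be sketched in Python as follows. This is only my reading of those comments, not the actual protoext compiler; the assign_field_numbers function and its input layout are assumptions made for illustration.

```python
def assign_field_numbers(hierarchy, reserved=10):
    """Flatten a class hierarchy into (class, field, number) triples.

    `hierarchy` is a list of (class_name, [field names]) ordered from the base
    class to the most derived class. Each level owns a block of `reserved`
    field numbers, so level 0 uses 1..10, level 1 uses 11..20, and so on.
    """
    numbered = []
    for depth, (class_name, fields) in enumerate(hierarchy):
        if len(fields) > reserved:
            raise ValueError("%s has more fields than its reserved block" % class_name)
        for local_index, field_name in enumerate(fields, start=1):
            numbered.append((class_name, field_name, depth * reserved + local_index))
    return numbered


LION = [
    ("Animal", ["weight", "color"]),
    ("Cat", ["breed"]),
    ("Lion", ["isAlphaMale"]),
]

for class_name, field_name, number in assign_field_numbers(LION):
    print(class_name, field_name, number)
```

The sketch also shows the built-in limit of the scheme: once a class needs more fields than its reserved block allows, the numbering of every class below it in the hierarchy would have to move.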
Alternatives
So are there any alternatives to Protocol Buffers out there? Yes, there are a few, each bringing its own benefits and drawbacks:
Specialized Types
You can use the collection types of your language of choice to represent graphs through keys and values. Some examples are maps of maps in Java, or nested dictionaries in Python. This approach is expensive and loosely typed. I just can’t think of any good reason to use it, and you should avoid it if you can.
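A short Python sketch of what ‘loosely typed’ means in practice: nothing stops a producer from storing the wrong type, and a misspelled key fails silently. The event layout and field names are invented for this example.

```python
# An event held as nested dictionaries: a header plus a small object graph in the body.
event = {
    "header": {"type": "quote", "sequence": 42},
    "body": {"instrument": {"symbol": "ABC", "bid": 101.25}},
}

# Nothing prevents a producer from writing a string where a number is expected.
event["body"]["instrument"]["bid"] = "101.25"

# A misspelled key is not an error with .get(); it quietly yields None.
misspelled = event["body"]["instrument"].get("bidd")

print(type(event["body"]["instrument"]["bid"]).__name__)
print(misspelled)
```

Both mistakes only surface later, far from where they were made, which is exactly what a typed information model is meant to prevent.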
Native Serialization
This is language dependent, which brings obvious and critical drawbacks. Think of Java’s serialization, Python pickles, C++ streams, and so forth. Of course those formats are hardly interchangeable. In some cases (Java) the format is so large and cumbersome that it is a hard fit for any serious low-latency application.
XML
Large, slow, over-engineered, overcomplicated, human readable (why, if it is intended to be used by computers?). I still have not figured out why people seem to like it and why its use is so widespread. I can only see a justification for XML away from low-latency solutions, perhaps for representing complex, text-driven structures like documents.
JSON
JSON (JavaScript Object Notation) is a lightweight data-interchange format based on a subset of JavaScript. Again, it is human readable (can someone explain to me why this is a good thing?) and supported by virtually every major platform out there. Faster and a bit smarter than XML, JSON is still too slow for low-latency applications.
SLICE
SLICE (Specification Language for ICE) is the abstraction mechanism for separating ICE object interfaces from their implementations in various languages. It is really fast and lightweight; the problem is that it drags in the whole ICE paraphernalia. It is an option if you plan on using the whole stack.
So?
Overall, given the alternatives and their pros and cons, Protocol Buffers is a reasonable option to define and transport events and the information model in RMS systems, as long as you clearly understand its limitations and pitfalls and plan to work around them.