I was looking into Protocol Buffers as an alternative for representing events in RMS architectures, trying to figure out which of its features are a match for RMS requirements and which ones might be a drag.

As a reminder, RMS is a collection of patterns that are ideal for event-driven, low-latency applications, especially useful in high-frequency finance. Events in RMS are described in terms of a header and a body. The header is usually a dictionary, and the body must support the representation of elaborate graphs of objects, representing real world analytical use cases. QuantLET is an open source RMS framework.

Protocol Buffers

“Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data”. Comparisons with alternatives like XML or JSON come to mind immediately, but let’s just say for now that Protocol Buffers translates to a smaller binary wire format, and therefore transfers usually take a fraction of the time needed by alternatives like XML or JSON.

Benefits?

A few features of Protocol Buffers are a great match for representing RMS events and for high-frequency applications:

Fast

The transport format is binary and compact, and therefore really fast to transfer and serialize. It is ideal for high-frequency solutions where the expected latency is less than a millisecond.
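To make the compactness claim concrete, here is a minimal sketch of how the Protocol Buffers wire format encodes a single integer field: a key built from the field number and wire type, followed by the value as a base-128 varint. The field name `quantity` and the helper names are assumptions made for illustration, and the size comparison is only a rough proxy for speed, not a latency benchmark.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one integer field: key = (field_number << 3) | wire type 0, then the value."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(value)


# Hypothetical event body: field number 1 carries the integer 150.
wire = encode_int_field(1, 150)

# The same information as JSON text, using the assumed field name "quantity".
text = json.dumps({"quantity": 150}).encode("utf-8")

print(wire.hex(), len(wire), len(text))
```

The binary form carries only the field number and the value, while the text form repeats the field name and punctuation in every message.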

Backward Compatible

Backward compatibility is a notion that is present from the ground up. Optional attributes and the unique identification of fields through a number allow messages to be consumed by older clients as you roll out new versions of your information model.
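The role of field numbers can be sketched with a toy decoder: because every value on the wire is tagged with its field number, an older client can skip numbers it does not recognize. This is a simplified illustration that handles varint fields only; the field names `price_ticks` and `venue_id` are hypothetical, and a real Protocol Buffers runtime does considerably more.

```python
from typing import Dict, Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)
        else:
            out.append(low_bits)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Key (field number and wire type 0) followed by the varint value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def decode_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Return (decoded value, position just after the varint)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_known_fields(data: bytes, known: Dict[int, str]) -> Dict[str, int]:
    """Decode varint fields, silently skipping field numbers absent from `known`."""
    decoded: Dict[str, int] = {}
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 0:
            raise ValueError("this sketch handles varint fields only")
        value, pos = decode_varint(data, pos)
        if field_number in known:
            decoded[known[field_number]] = value
    return decoded


# A version 2 producer sends field 1 plus a newly added field 2.
message = encode_int_field(1, 150) + encode_int_field(2, 7)

old_client = decode_known_fields(message, {1: "price_ticks"})
new_client = decode_known_fields(message, {1: "price_ticks", 2: "venue_id"})
```

The old client still reads the fields it knows about and simply ignores the new one.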

Drawbacks?

A few features of Protocol Buffers speak against its use as a transport protocol in RMS architectures.

Not Plain Objects

In RMS lingo we define ‘plain object’ as a structure capable of describing cyclical graphs of objects, in which objects are represented through an interface allowing access to its members either directly or through getters and setters.

With that definition in mind, Protocol Buffers objects cannot be seen as RMS plain objects. They carry programmatic features, like fluent builders and bindings to language-specific structures (like streams in Java or C++).

While bringing benefits, these same features make objects dependent on the framework even where Protocol Buffers is not really needed (in RMS a transport format is only necessary at the endpoints, usually during marshalling/unmarshalling and transformation).

Lacks Inheritance Support

Ok, I know – Protocol Buffers allows the representation of a quasi inheritance in a few different ways, depending on how you intend on using it.

When a consumer knows in advance the expected subtype of a message, you can simply embed an instance of your base type in the derived type. Alternatively, you can embed subtypes as optional aggregated messages through extensions:

message Animal { // General animal properties.
  optional double weight = 1;
  extensions 1000 to max;
}
message Dog { // Dog-specific properties.
  optional float average_bark_frequency = 1;
}
message Cat { // Cat-specific properties.
  enum Breed { TABBY = 1; CALICO = 2; ... }
  optional Breed breed = 1;
}
extend Animal {
  optional Dog dog = 1000;
  optional Cat cat = 1001;
}

Ok, it might work. The problem is that you need to know all the possible derived types in advance, and for anything beyond really trivial use cases this is an impediment. The alternative, for when a consumer does not know in advance what types to expect, is to play with a mix of features.

message Animal reserved 10 {
  optional double weight = 1;
  optional Color color = 2;
}
message Dog extends Animal reserved 10 {
  optional float average_bark_frequency = 1;
}
message Cat extends Animal reserved 10 {
  optional Breed breed = 1;
}
message Lion extends Cat reserved 10 {
  optional boolean isAlphaMale = 1;
}

Now the dirty details: the description above is made in a ‘protoext’ file, and in order to give you something useful it has to be compiled to produce a proto file like this:

//@Hierarchi Lion:Cat:Animal
message Lion {
  //-- Animal's indices start at 1 because it is the base class (Lifeform does not count since it is an interface)
  //@Class Animal
  optional double weight = 1;
  //@Class Animal
  optional Color color = 2;
  //-- Cat's indices start at 11 because reserved is set to 10: 10+1
  //@Class Cat
  optional Breed breed = 11;
  //-- Lion's indices start at 21 because reserved is set to 10: 10+10+1
  //@Class Lion
  optional bool isAlphaMale = 21;
}
//@Hierarchi Dog:Animal
message Dog {
  //-- See comments for Lion above
  //@Class Animal
  optional double weight = 1;
  //@Class Animal
  optional Color color = 2;
  //-- See comments for Cat above
  //@Class Dog
  optional float average_bark_frequency = 11;
}

So you have already compiled it once, and the result has to be compiled again to give you something you can finally use. You get the picture: yes, it is complicated.
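The index arithmetic described in the generated file's comments can be sketched in a few lines. This is only an illustration of the numbering rule as inferred from those comments (each level of the hierarchy gets a block of `reserved` field numbers), not the actual protoext tool; the function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def flatten_hierarchy(
    hierarchy: List[str],
    fields: Dict[str, List[str]],
    reserved: int,
) -> List[Tuple[int, str, str]]:
    """Assign field numbers to a base-first class hierarchy.

    Returns (field number, declaring class, field name) tuples, where the class
    at depth d starts numbering at d * reserved + 1.
    """
    flat = []
    for depth, cls in enumerate(hierarchy):
        names = fields[cls]
        if len(names) > reserved:
            raise ValueError(f"{cls} declares more than {reserved} fields")
        for offset, name in enumerate(names):
            flat.append((depth * reserved + 1 + offset, cls, name))
    return flat


fields = {
    "Animal": ["weight", "color"],
    "Cat": ["breed"],
    "Lion": ["isAlphaMale"],
}

lion = flatten_hierarchy(["Animal", "Cat", "Lion"], fields, reserved=10)
for number, cls, name in lion:
    print(number, cls, name)
```

With `reserved` set to 10, a class can never declare more than ten fields without colliding with the numbers of its subclasses, which is one more constraint to keep in mind.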

Alternatives

So are there any alternatives to Protocol Buffers out there? Yes, there are a few, each bringing its own benefits and drawbacks:

Specialized Types

You can use the collection types of your language of choice to represent graphs through keys and values. Some examples are maps of maps in Java, or nested dictionaries in Python. This is expensive and loosely typed. I just can’t think of any good reason to use it, and you should avoid it if you can.

Native Serialization

This is language dependent, which brings obvious and critical drawbacks: Java’s serialization, Python pickles, C++ streams, and so forth. Of course those formats are hardly interchangeable. In some cases (Java) the format is so large and cumbersome that it is a hard fit for any serious low-latency application.

XML

Large, slow, over-engineered, overcomplicated, human readable (why, if it is intended to be used by computers?). I still have not figured out why people seem to like it and why its use is so widespread. I can only see justification for XML away from low-latency solutions, perhaps representing complex, text-driven structures like documents.

JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format, based on a subset of JavaScript. Again, human readable (can someone explain to me why this is a good thing?) and supported by virtually every major platform out there. Faster and a bit smarter than XML, JSON is still too slow for low-latency applications.

SLICE

SLICE (Specification Language for ICE) is the abstraction mechanism for separating ICE object interfaces from their implementations in various languages. It is really fast and lightweight; the problem is that it forces the whole ICE paraphernalia on you. It is an option if you plan on using the whole stack.

So?

Overall, as long as you clearly understand its limitations and pitfalls and plan to work around them, Protocol Buffers is, given the alternatives and their pros and cons, a reasonable option to define and transport events and the information model in RMS systems.

Finally found some time to get out of my bubble and go over the configuration of a new Snow Leopard installation. Time to make a choice: Fink or MacPorts? For some reason I have *both* wandering around on my hard disk.

First of all, they both have to be upgraded to Mac OS X 10.6.1 – so, to the uninstall:

Fink:

%% sudo rm -rf /sw

and MacPorts:

%% sudo rm -rf \
    /opt/local \
    /Applications/DarwinPorts \
    /Applications/MacPorts \
    /Library/LaunchDaemons/org.macports.* \
    /Library/Receipts/DarwinPorts*.pkg \
    /Library/Receipts/MacPorts*.pkg \
    /Library/StartupItems/DarwinPortsStartup \
    /Library/Tcl/darwinports1.0 \
    /Library/Tcl/macports1.0 \
    ~/.macports

Finally the installation: I decided to stay with MacPorts, since it seems to provide more packages and most of the tools I usually look for are included. I downloaded the 10.6.1 DMG for it and ran

%% sudo port -v selfupdate

Simple and straightforward, it all works like a charm.

In event-driven architectures following RMS patterns there are a few different ways of handling incoming events. The patterns below are the ones we can think of; three of them are supported in QuantLET:

Specialized Declarative

Mostly through a declarative language (CEP guys/gals love that) like EQL or its dialects. In other words, a domain-specific language (DSL) specialized for event handling. But be advised: despite its ‘coolness’ and being (most of the time at least) a time saver and a shortcut, it requires specialized skills and tools not available on every corner (translation: lots of $$$).

Functional Declarative

Similar to the above; one example would be a Lisp-like dialect. Another example is constructs based on paradigms like map-reduce for large data fabric clusters. Everything said above applies.

Second-Order Declarative

Based on inferences derived from second-order logic descriptions. You: “What? Why are you bringing this up here?” Me: Well, inference engines, especially RETE engines, are basically correlated forward chains of rules and facts, optimized in a tree-like fashion for speed, in which the “triggering” of a rule can be seen as an event’s action.

A time saver, flexible, and ideal when proper tools and resources are not available.
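The idea of rules firing as event actions can be sketched with a naive forward-chaining loop. To be clear, this is not a RETE implementation (there is no optimized network of nodes, just repeated passes over the rule list), and the fact and rule names are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# A rule is (premises, conclusion): when all premises are known, assert the conclusion.
Rule = Tuple[FrozenSet[str], str]


def forward_chain(facts: Set[str], rules: List[Rule]) -> List[str]:
    """Fire rules until no new fact can be derived; return conclusions in firing order."""
    known = set(facts)
    fired: List[str] = []
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                fired.append(conclusion)  # the "triggering" is the event's action
                changed = True
    return fired


rules = [
    (frozenset({"raise_alert"}), "notify_desk"),
    (frozenset({"price_above_limit", "position_open"}), "raise_alert"),
]

# Incoming events assert facts; the chain of rules derives the resulting actions.
actions = forward_chain({"price_above_limit", "position_open"}, rules)
print(actions)
```

Note that the second rule fires first even though it is listed last: firing order is driven by the facts, not by the order in which the rules were written.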

Finite State Machines

Actions are associated with changes in state represented by a directed graph. Conditional transitions are supported through ‘guards’. Incoming events carry either the next state of the graph or the indication of a transition. Different types of actions are then performed based on the transition and the association, i.e., actions performed when entering a state (entry), leaving a state (exit), or changing from one specific state to another (transition).
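A minimal sketch of such a state machine follows, with a guard on the transition and exit, transition and entry actions recorded in a log. The class, the order-lifecycle states and the `quantity` payload are assumptions made for illustration; this is not QuantLET's API.

```python
from typing import Callable, Dict, List, Optional, Tuple

Guard = Callable[[dict], bool]


class StateMachine:
    """Toy finite state machine: events select transitions, guards may veto them."""

    def __init__(self, initial: str) -> None:
        self.state = initial
        self.log: List[str] = []
        # (source state, event name) -> (target state, guard)
        self._transitions: Dict[Tuple[str, str], Tuple[str, Guard]] = {}

    def add_transition(
        self,
        source: str,
        event: str,
        target: str,
        guard: Guard = lambda payload: True,
    ) -> None:
        self._transitions[(source, event)] = (target, guard)

    def handle(self, event: str, payload: Optional[dict] = None) -> bool:
        """Apply an incoming event; return True if a transition was taken."""
        entry = self._transitions.get((self.state, event))
        if entry is None:
            return False  # no transition defined for this event in this state
        target, guard = entry
        if not guard(payload or {}):
            return False  # guard rejected the conditional transition
        self.log.append(f"exit {self.state}")
        self.log.append(f"transition {self.state}->{target}")
        self.log.append(f"entry {target}")
        self.state = target
        return True


order = StateMachine("new")
order.add_transition(
    "new", "fill", "filled", guard=lambda payload: payload.get("quantity", 0) > 0
)

rejected = order.handle("fill", {"quantity": 0})    # guard fails, state stays "new"
accepted = order.handle("fill", {"quantity": 100})  # guard passes, state becomes "filled"
```

In a real system the log entries would be replaced by the actual entry, exit and transition actions.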

Observer-Observable Imperative

Simplest, but it can lead to a lot of code that *will* turn into spaghetti if written by the uninitiated. Base yourself on proven paradigms that enforce separation of control and state to avoid adding to your own pain.
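As a rough sketch of what separation of control and state can look like in this style, the observable below only routes events, the controller holds the handling logic, and the position object holds nothing but state. All names are hypothetical and the event bodies are plain dictionaries for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Observer = Callable[[dict], None]


class Observable:
    """Routes published events to the observers subscribed to each event type."""

    def __init__(self) -> None:
        self._observers: Dict[str, List[Observer]] = {}

    def subscribe(self, event_type: str, observer: Observer) -> None:
        self._observers.setdefault(event_type, []).append(observer)

    def publish(self, event_type: str, body: dict) -> None:
        for observer in self._observers.get(event_type, []):
            observer(body)


@dataclass
class Position:
    """State only: no event-handling logic lives here."""

    quantity: int = 0


class PositionController:
    """Control only: reacts to events and updates the state it was given."""

    def __init__(self, position: Position) -> None:
        self.position = position

    def on_fill(self, body: dict) -> None:
        self.position.quantity += body["quantity"]


feed = Observable()
position = Position()
controller = PositionController(position)
feed.subscribe("fill", controller.on_fill)

feed.publish("fill", {"quantity": 100})
feed.publish("fill", {"quantity": -40})
feed.publish("quote", {"price": 101.5})  # no observer subscribed: ignored
```

Keeping the state object free of handlers makes it easier to test the controller in isolation and to swap the event source later.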

Procedural Imperative

Read ‘anything-else-imperative-except-the-observer-pattern-mentioned-above’ – Don’t even think about trying that unless your model is *really* simple…

Ok — Now, pick one, any one…

In finance (and in most domains for that matter) your information is only as good as the way it is presented. Few features of your applications will be more critical than visualization.

[Figure: Price, Volume, Gain & Loss in 3D]

The example above, from MIT’s financial visualization lab, shows price and volume over a time series along with instantaneous gain/loss. As you can see, this and other similar applications out there resemble a neat 3D gaming interface, not grandpa’s trading blotter.

Anyway, needless to say, the choice of a framework to support your user experience can go as far as determining the success or failure of your application. In some cases it can even compensate for limitations in other functions performed in the back end.

As of now, thankfully, there are many good options out there. Hands down, RIA (rich internet [or interactive, in MS parlance] applications) frameworks are best suited for the task. In essence, RIA frameworks allow thick, full-featured clients to be deployed and launched over the internet. The main current players are JavaFX, Silverlight, Flex, GWT and OpenLaszlo.

JavaFX

The JavaFX platform was announced by Sun about a year ago, a long-awaited response to the deficiencies of standard Swing/AWT. It is available by default on JRE 6 and can optionally rely on battle-tested JNLP for remote deployment, execution and security.

Some of its major limitations come from the fact that, even after one year, good development tools are still not available. The most common development tools are provided as part of the NetBeans IDE; some limited plugins are available for the Eclipse IDE. We could not find many commercial applications relying on the framework either.

Strong points are embedded JDK support, reliance on stable Java protocols, Java language binding, and a neat scripting language for view definition, for example:

Button {
  text: "Click Me"
  action:
    function():Void {
      MessageDialog {
        title: "JavaFX Script Rocks!"
        // This string has a newline in the source code
        message: "JavaFX Script is Simple, Elegant,
         and Leverages the Power of Java"
        visible: true
       }
     }
}

This defines a button and a function to be triggered by an action on that button. The sample code is an exact copy of code found in the scripting tutorial.

Silverlight

Silverlight is Microsoft’s RIA framework for .NET, very popular in financial software shops. The runtime is supported on Mac and Windows; the development environment in Visual Studio requires Windows. Views are declared through XML descriptors (XAML):

<UserControl x:Class="DiggSample.Page"
    xmlns="http://schemas.microsoft.com/client/2007"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:Digg="clr-namespace:DiggSample;assembly=DiggSample">
    <Grid Style="{StaticResource TopGrid}">
            <Button x:Name="btnSearch"
                    Content="Search"
                    Click="SearchBtn_Click"
                    Style="{StaticResource SearchButton}" />
    </Grid>
</UserControl>

Again, the same usage sample: a button and an action. This code snippet, reduced for simplicity, is from Silverlight’s tutorial and samples. Silverlight is targeted mainly at Windows platforms, with Mac supported as a runtime as well. For an open source implementation covering other OSes, as of this moment you will have to rely on Moonlight, Mono’s implementation.

Flex

Adobe’s Flex has a commercial and an open source SDK distribution. Since it was one of the early products in this arena, it is the most widely used, especially for public web applications.

It is no exception when it comes to an XML-like language for UI definition, called MXML, paired with a “powerful object-oriented language” called ActionScript (yes, we need yet another). Below is a sample MXML and script of a Flickr-like photo application.

<?xml version="1.0" encoding="utf-8"?>
<mx:Application xmlns:mx="http://www.adobe.com/2006/mxml"
    backgroundGradientColors="[0xFFFFFF, 0xAAAAAA]"
    horizontalAlign="left"
    verticalGap="15"
    horizontalGap="15">

    <mx:Script>
        <![CDATA[
            import mx.collections.ArrayCollection;
            import mx.rpc.events.ResultEvent;

            [Bindable]
            private var photoFeed:ArrayCollection;

            private function requestPhotos():void {
                photoService.cancel();
                var params:Object = new Object();
                params.format = 'rss_200_enc';
                params.tags = searchTerms.text;
                photoService.send(params);
            }

            private function photoHandler(event:ResultEvent):void {
                 photoFeed = event.result.rss.channel.item as ArrayCollection;
            }
         ]]>
    </mx:Script>

    <mx:HTTPService id="photoService"
        url="http://api.flickr.com/services/feeds/photos_public.gne"
        result="photoHandler(event)" />

    <mx:HBox>
        <mx:Label text="Flickr tags or search terms:" />
        <mx:TextInput id="searchTerms" />
        <mx:Button label="Search"
            click="requestPhotos()" />
    </mx:HBox>

    <mx:TileList width="100%" height="100%"
        dataProvider="{photoFeed}"
        itemRenderer="FlickrThumbnail">
    </mx:TileList>

</mx:Application>

There is extensive documentation and tutorials available. This code snippet came from there as well.

It is interesting to note the recent swap of designers and developers working on JavaFX and the Flex SDK: some common strategies in these two arenas have started to take shape. It would not come as a surprise if these platforms converged in the near future.

GWT

The GWT platform from all-powerful Google is basically a set of development tools and IDE plugins that allow you to use Java to define RIA solutions. No surprises here either: although you should not need to manipulate them directly, XML is used again to define GWT “modules”. You can find plenty of examples and documentation in GWT’s tutorial resources.

OpenLaszlo

My favorite: a DHTML (optionally Flash) based framework with many strong points. It is stable, open source, and simple to deploy and test, with a variety of resources to support data binding and scripting. My only complaint would be, once more, the overuse of XML, here as a language called LZX:

<canvas height="200" width="500">
  <window x="20" y="20" width="150" title="Simple Window" resizable="true">
    <button text="My button" onclick="this.parent.setAttribute('title', 'You clicked it');"/>
  </window>
</canvas>

Well, self explanatory. A button again and an action. You can embed scripts in LZX as well:

<canvas height="120">
   <script>
     <![CDATA[
     for (var i = 0; i < 11; i++) {
       Debug.write(i);
     }
     ]]>
   </script>
</canvas>

Again, these examples and many others can be found in OpenLaszlo’s scripting tutorial.

Finally

These are all great options; you might want to try these tutorials and find out which ones fit your use cases better. Sorry if you were expecting this article to provide a definitive answer; it is far from that.

This is an active area of development currently, so you should expect many improvements for each of these contenders over the upcoming months.

I like the simple approach taken by John Ousterhout in something that people fancily refer to as “Ousterhout’s Dichotomy”. In essence, high-level computer programming languages are separated into two groups: “system programming languages” and “scripting languages”.

The thing is that it is difficult to define something as a “script” – is C# or Java a scripting language?

On a related subject, I came across a rather old post from Linus Torvalds in which, despite the tone, some interesting points came through:

If you have to be concerned with layers upon layers of abstraction, a language is not adequate as a system programming language. Even though abstraction layers simplify programmatic representation, they of course reduce traceability (why is this not working?) and performance. That would rule out C++ as a system programming language, along with any other object-oriented one. We could rewrite that and create our very own “Quick and Dirty Dichotomy”:

  • System programming languages: fast, real-time, strongly typed, natively compiled, with features designed for interaction with lower-level assembly and hardware drivers: C, Pascal, Fortran
  • Application programming languages: fast, although not real-time, with just-in-time compilation, features designed for productivity and large-scale code production, and support for abstraction-enhancing features like object orientation and loose typing: Java, Python, Ruby, C#

Where exactly does C++ fit in? — I hope that does not qualify me as a dinosaur…
