The Bank of Finland has something they call “the simulator” – essentially a platform for stress-testing large financial institutions at a systemic level.

It is basically the same thing we have here at the Federal Reserve; the difference is that in Finland it is not as opaque. The “simulator” is a computational platform that allows scenario testing through plug-and-play quantitative models leveraging payment and liquidity data from large financial institutions. It is used by several nations outside Finland as well.

They invited me to give a talk about the paper introducing FRACTI, which we published about a year ago as part of our research at the CCFEA.

If by any remote chance you find yourself around Helsinki on 8/25 and 26, and want to engage in interesting conversations about modelling of crowd behavior, data-driven predictive models, large-scale simulations and systemic risk, come join us… the first and second rounds of Karhu are on me…

Languages are more than communication. They are often one’s window to reality.

Your language shapes how you think, what you can achieve, and how you achieve it. Some languages facilitate concepts in a domain of knowledge; others make them more obscure. You might use French for philosophy, or German for poetry. Using them the other way around might force you to be more verbose. Using the wrong language can even impede the expression of your ideas.

Language defines reality. This is the case not only with natural languages, but also with computer languages.

Like natural languages, computer languages often grow from the needs of specialized domains, and are therefore better suited to use cases relevant to those domains. In the past, computer languages were born and bred in a specific domain, frozen to the requirements of that domain at a specific point in time. When the requirements of that domain evolved to follow the increasing complexity of the problems at hand, the language would no longer fit.

In modern times computer languages must be dynamic, quasi-living things, able to evolve and adapt to solve new classes of problems in new computing environments. Modern problems are different from what we had to deal with a few years back, and you must have adequate tools and methods to approach them properly. In the same way, computational environments change in the face of new demands and new hardware technologies: single to multiple cores, cloud, cluster, and grid computing.

The way in which you describe to a binary being how to resolve a problem plays a very special role. This role is tied to the concept of representability. The effectiveness of your representation is limited by the features of your language, your familiarity with the specific domain knowledge, and your experience, i.e. the thinking patterns you have used when approaching previous problems in that domain.

“A good notation has a subtlety and suggestiveness which at times makes it almost seem like a live teacher.” – Bertrand Russell, in The World of Mathematics (1956).

If we zoom closer into the specialized domain of our interest, computational finance, and look at the problems we had to approach in the past and the patterns we used to resolve them, we can list a number of important features our language (and environment) will have to support:

  • Responsiveness: Deterministic response time is critical in common use cases in computational finance. As harsh as this may sound, the fact that you can keep your response time under a few dozen microseconds 99.99% of the time is irrelevant when you take a few seconds to decide what to do while waiting on a garbage collection. Even if it happened just once, that one time wiped out all the hard-gained profits of your day.
  • Adequate representation of data structures: Plain old data structures have to be represented properly. It is hard to believe several widespread programming platforms still have problems properly representing data structures introduced in CS 101 curricula, cases like contiguous arrays and sparse vectors. In computational finance we care about very specific abstractions, like proper representation of time series and currencies.
  • Functional-vectorization friendly: Representation of data structures must be able to leverage the vectorial nature of modern computer architectures through lambda functors. Functional support is crucial.
  • Simplified concurrency through continuations: Continuations, or co-routines, are probably the simplest and most abstract way to express concurrency. You can handle streams, vectors and parallelism using simple patterns, with no shared-state synchronization required.
  • Interactive: Support for an interactive command line for preliminary brainstorming, prototyping and testing. Being able to record, share and story-tell the resolution of a problem is very important. The record must support rich representation – plots, tables, structured formatting, etc. – the more, the better. Communication and collaboration are critical, and your representation cannot ignore that. The hardcore problems of our times cannot be solved without proper and organic collaboration. Your representation must be collaboration-friendly.
  • Mini-representations: Notations matter in any representation. Domains have specific ways to represent concepts, and your representation has to be flexible enough to adhere to the use cases of that domain. Mini-representations are used here in the same sense as mini-languages – also called little languages, or domain-specific languages (DSLs) – as ways to leverage a host language for meta-representation. In other words, you could use a language to “override” its tokens and represent a language appropriate for streaming, or behavior.
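As a minimal, framework-free sketch of the continuation and vectorization points above, Python generators can express a lazy stream pipeline with no shared-state synchronization. All names here are made up for the example:

```python
def prices(ticks):
    """Co-routine style: lazily yield one price at a time from a tick stream."""
    for tick in ticks:
        yield tick["price"]

def moving_sum(stream, window=3):
    """Consume a stream lazily, yielding a sliding-window sum."""
    buffer = []
    for value in stream:
        buffer.append(value)
        if len(buffer) > window:
            buffer.pop(0)
        yield sum(buffer)

ticks = [{"price": p} for p in (10, 11, 12, 13)]
print(list(moving_sum(prices(ticks), window=2)))  # [10, 21, 23, 25]
```

Each stage pulls values on demand from the previous one, so the same pattern scales from a list to an unbounded real-time feed.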

These preferences are personal (and of course biased), limited by my own experience of patterns that seem to work best when solving practical problems in computational finance.

As our research goes on, it seems the major missing piece is a proper representation of financial models: which “language” properly represents financial models across all use cases – risk, trading, simulation, back-testing, and others. The search continues.

R has several packages for interaction with relational databases; one of them is RMySQL. You might notice during installation that RMySQL requires the presence of mysql binaries:

package ‘RMySQL’ is available as a source package but not as a binary
Warning in install.packages :
package ‘RMySQL’ is not available (for R version 3.1.1)

The message is pretty straightforward: you need to compile the package against the mysql headers and binaries to have it available. You will need to go through a few simple steps to work around this:

Install homebrew:

Homebrew is the package manager that “installs the stuff you need that Apple didn’t”. Installation is done through a one-liner ruby/curl command.

itacoatiara jfaleiro$ ruby -e "$(curl -fsSL"
itacoatiara:~ jfaleiro$ which brew

You can see brew is now available.

Install mysql using homebrew:

itacoatiara jfaleiro$ brew install mysql

Start a mysql server locally

Start a mysql server process. You will need this for testing. Like any other process you might decide to install on your machines, make sure you understand its vulnerabilities before anything else.

itacoatiara:~ jfaleiro$ mysql.server start
Starting MySQL

Sanity check your mysql client installation

itacoatiara:~ jfaleiro$ mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.21 Homebrew

mysql> exit

Install RMySQL as source

> install.packages("RMySQL", type = "source")

Load library

> library(RMySQL)


Now you can test your connection

> db <- dbConnect(MySQL(), user='genome')
> result <- dbGetQuery(db, 'show databases;'); dbDisconnect(db)
> result
            Database
1 information_schema
2               test

This post was made with information made available by the human genome database and Stack Overflow.

The meta-model of QuantLET arranges components in terms of Models, Execution Modes, Shocks and Benchmarks, aggregating from the bottom up.

As we browse through some of the concepts behind this meta-model, we will give an overall description of the fundamentals behind the QuantLET framework.

Models

Models represent a very specific problem in terms of dependent and independent parameters.

An applied example of an (unrealistic) model would be something like: given LIBOR, benchmark interest rates in the US, the US/EUR spread and the crude oil spot price, what is the limit price of a long position, and how many pips for a trailing stop in a conditional order, with the highest chance of maximized profit?

A typical problem of stochastic optimization.

In this example, the independent variables are given by:

  • Benchmark US rates
  • US/EUR spread in the spot rate
  • Crude oil spot price

And dependent variables:

  • Limit price of the long position
  • Rate maximum move in the stop loss leg
  • Exit price of the long position
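The example above could be sketched as a plain function mapping independent to dependent variables. Every name and formula below is a made-up placeholder for illustration, not QuantLET's actual API or a real trading rule:

```python
def model(benchmark_rate, eur_usd_spread, oil_spot):
    """Toy mapping from independent to dependent variables.
    The formulas are placeholders, not a real pricing rule."""
    limit_price = oil_spot * (1 + benchmark_rate) - eur_usd_spread
    stop_move = 0.01 * limit_price  # maximum move in the stop-loss leg
    exit_price = limit_price + 2 * stop_move
    return {"limit": limit_price, "stop": stop_move, "exit": exit_price}

print(model(benchmark_rate=0.05, eur_usd_spread=0.1, oil_spot=80.0))
```

A stochastic optimizer would search over the inputs of such a function for the combination maximizing expected profit.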

Models might have one or more Execution Modes associated with them.

Execution Mode

An execution mode defines source and destination of financial data, represented by a high-frequency stream of events. For the example above, we could think about a few execution modes: simulation, back-testing, paper-trading and production.

Simulation

In simulation mode, our model reads a stream of events from an endpoint simulating a synthetic Geometric Brownian Motion with pre-defined parameters for drift and volatility.

Back Testing

In back testing, our model reads a stream of events built from historical values of each of the independent variables.

Paper Trading

In paper trading, our model posts orders to a simulated exchange environment, based on inbound data from real-time live data.

Production

In production, our model reads inbound data from real-time live channels and posts orders to real exchange channels.

Shocks

Shocks are re-configurations of execution modes with different parameters. For obvious reasons, not all execution modes allow the association of shocks. In the example before, we could think about a few shocks:

  • Simulation mode, with drifts in the range of (-0.30 to 0.30) in increments of 0.05
  • Back-testing mode, with LIBOR rates shocked up (or down) in the range of (-2.0 to +2.0) in increments of 0.1
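The shock ranges above can be generated programmatically. A sketch in plain Python, stepping over scaled integers so that 0.05 or 0.1 increments do not accumulate floating-point rounding error:

```python
def shock_range(start, stop, step):
    """Inclusive range of shock values, computed over scaled integers
    to avoid accumulating floating-point error on each increment."""
    scale = 100
    return [i / scale for i in range(round(start * scale),
                                     round(stop * scale) + round(step * scale),
                                     round(step * scale))]

drift_shocks = shock_range(-0.30, 0.30, 0.05)   # simulation-mode drifts
libor_shocks = shock_range(-2.00, 2.00, 0.10)   # back-testing LIBOR shocks
print(len(drift_shocks), drift_shocks[0], drift_shocks[-1])  # 13 -0.3 0.3
```

Each value in the grid would re-configure one execution mode run.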

Benchmarks

A benchmark allows the comparison of results from various shocks, looking for the lowest cost of the optimization function, or any other dimension you could think of. Some examples:

  • Highest single profit in a day
  • Highest accumulated profit in a period (week, month)
  • Highest number of profitable periods (days, weeks)
  • Highest value at risk exposure in a period

You can have multiple benchmarks active at any given time, and historical benchmarks are available for comparison.
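Comparing shocks under a benchmark can be as simple as a key function over the results of each run. A hypothetical sketch, with made-up P&L numbers standing in for real shock results:

```python
# Hypothetical results: daily P&L series keyed by the shock that produced them
results = {
    "drift=-0.05": [120.0, -40.0, 75.0],
    "drift=+0.05": [60.0, 90.0, 10.0],
    "drift=+0.10": [200.0, -150.0, 30.0],
}

def highest_single_profit(pnl):   # benchmark: best single day
    return max(pnl)

def accumulated_profit(pnl):      # benchmark: total over the period
    return sum(pnl)

best_day = max(results, key=lambda k: highest_single_profit(results[k]))
best_period = max(results, key=lambda k: accumulated_profit(results[k]))
print(best_day, best_period)  # drift=+0.10 drift=+0.05
```

Note that different benchmarks select different shocks as “best”, which is why multiple benchmarks can be active at once.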

The Remaining Framework

QuantLET is a framework – and as such it provides many battle-tested, time-saving components with a number of rich features. A few other pieces of the framework:

  • Tapestry: A powerful framework for data- and CPU-intensive computing based on map-reduce abstractions and data partitioning
  • Dashboard: A UNO-based front-end for OpenOffice family of spreadsheets.
  • Mediation: Configuration of models, modes, shocks and benchmarks at run-time

Among many other features – check the QuantLET home page often for updates.

Let me try to define the problem with a question: how do you mutate variables defined in an outer scope from inside an inner function?

Let’s illustrate with an example: cumulative moving averages of real-time price points.

In PyQuantLET we define a simple model to read ticks from an inbound endpoint, generate a cumulative moving average, and forward the result to an outbound endpoint in two lines:

f = quantlet.handler.stat.cma()
[outbound((x, f(x))) for x in inbound]

Concrete inbound and outbound endpoints are defined as part of the execution mode of this model. A mode of execution could be a simulation in which the inbound endpoint is a real-time feed for trades of a specific underlying, and the outbound endpoint is a plot of price points and the cumulative moving average.


Of course, nothing realistic, but that’s all it takes. Ok, let’s now define the function for the cumulative moving average (in reality we would use classes for the same purpose, but let’s stick to this for now):

def cma():  # this is wrong
    total, counter = (0, 0)
    def result(value):
        total = total + value
        counter = counter + 1
        return total / counter
    return result

If you try this version, you will get something like this:

UnboundLocalError: local variable 'total' referenced before assignment

The message is self-explanatory – the reason is that rebinding names in an enclosing scope is not allowed in Python. Nothing like a dictatorship for life…

So, now the good part – the solution: rewrite your function so variables in the outer scope are bound to a container object, see…

def cma():
    total, counter = ([0], [0])
    def result(value):
        total[0] = total[0] + value
        counter[0] = counter[0] + 1
        return total[0] / counter[0]
    return result

Looks weird, but it works – you can try it yourself.
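For completeness: in Python 3 the `nonlocal` keyword makes the container trick unnecessary, since it allows rebinding names in the enclosing scope directly:

```python
def cma():
    total, counter = 0.0, 0
    def result(value):
        nonlocal total, counter  # rebind the enclosing scope's names
        total += value
        counter += 1
        return total / counter
    return result

f = cma()
print(f(10), f(20), f(30))  # 10.0 15.0 20.0
```

The container version remains the only option if you are stuck on Python 2.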

I was looking into Protocol Buffers as an alternative to represent events in RMS architectures and trying to figure out which of its features are a match for RMS requirements, and which ones might be a drag.

As a reminder, RMS is a collection of patterns that are ideal for event-driven, low-latency applications, especially useful in high-frequency finance. Events in RMS are described in terms of a header and a body. The header is usually a dictionary, and the body must support the representation of elaborate graphs of objects, representing real world analytical use cases. QuantLET is an open source RMS framework.

Protocol Buffers

“Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data.” Comparisons with alternatives like XML or JSON come to mind immediately, but let’s just say for now that Protocol Buffers translate to a smaller binary wire format, and therefore transfers are usually done in a fraction of the time of alternatives like XML or JSON.


A few features of Protocol Buffers are a great match for representing RMS events in high-frequency applications.

Fast

The transport format is binary and compact, and therefore really fast for transfers and serialization. It is ideal for high-frequency solutions where the expected latency is less than a millisecond.

Backward Compatible

We can see that backward compatibility is a notion that is present from the ground up. Optional fields and the unique identification of fields through numbers allow messages to be consumed by older clients as you roll out new versions of your information model.


A few features of Protocol Buffers speak against its use as a transport protocol in RMS architectures.

Not Plain Objects

In RMS lingo we define a ‘plain object’ as a structure capable of describing cyclical graphs of objects, in which objects are represented through an interface allowing access to their members either directly or through getters and setters.

With that definition in mind, Protocol Buffers objects cannot be seen as RMS plain objects: they carry programmatic features – like fluent builders and bindings to language-specific structures (like streams in Java or C++).

While bringing benefits, these same features make objects dependent on the framework where Protocol Buffers is not really necessary (in RMS a transport format is only needed at endpoints, usually during marshalling/unmarshalling and transformation).

Lacks Inheritance Support

Ok, I know – Protocol Buffers allows the representation of a quasi-inheritance in a few different ways, depending on how you intend to use it.

When a consumer knows in advance the expected subtype of a message, you can simply embed subtypes in the base type as optional aggregated messages, using extensions:

message Animal { // General animal properties.
  optional double weight = 1;
  extensions 1000 to max;

  message Dog { // Dog-specific properties.
    optional float average_bark_frequency = 1;
  }
  message Cat { // Cat-specific properties.
    enum Breed { TABBY = 1; CALICO = 2; /* ... */ }
    optional Breed breed = 1;
  }
}

extend Animal {
  optional Animal.Dog dog = 1000;
  optional Animal.Cat cat = 1001;
}

Ok, it might work. The problem is that you need to know all the possible derived types in advance; for anything beyond really trivial use cases, this is an impediment. Here is the alternative: when a consumer does not know in advance what types to expect, you have to play with a mix of features.

message Animal reserved 10 {
  optional double weight = 1;
  optional Color color = 2;
}
message Dog extends Animal reserved 10 {
  optional float average_bark_frequency = 1;
}
message Cat extends Animal reserved 10 {
  optional Breed breed = 1;
}
message Lion extends Cat reserved 10 {
  optional boolean isAlphaMale = 1;
}

Now the dirty details: the description above is made in a ‘protoext’ file, and in order to give you something useful it has to be compiled to produce a proto file like this:

//@Hierarchi Lion:Cat:Animal
message Lion {
  //-- Animal's indices start at 1 because it is the base class (Lifeform does not count since it is an interface)
  //@Class Animal
  optional double weight = 1;
  //@Class Animal
  optional Color color = 2;
  //-- Cat's indices start at 11 because reserved is set to 10: 10+1
  //@Class Cat
  optional Breed breed = 11;
  //-- Lion's indices start at 21 because reserved is set to 10: 10+10+1
  //@Class Lion
  optional boolean isAlphaMale = 21;
}

//@Hierarchi Dog:Animal
message Dog {
  //-- See comments for Lion above
  //@Class Animal
  optional double weight = 1;
  //@Class Animal
  optional Color color = 2;
  //-- See comments for Cat above
  //@Class Dog
  optional float average_bark_frequency = 11;
}

So you have already compiled it once, and this thing will have to be compiled again to give you something you can finally use. You get the picture: yes, it is complicated.


So are there any alternatives to Protocol Buffers out there? Yes, there are a few, each bringing its own benefits and drawbacks:

Specialized Types

You can use the collection types of your language of choice to represent graphs through keys and values. Some examples are maps of maps in Java, or nested dictionaries in Python. Expensive and loosely typed – I just can’t think of any good reason to use this approach; you should avoid it if you can.
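A sketch of that approach in Python, which makes the loose typing obvious. The event layout here is hypothetical, just following the header/body shape of RMS events described above:

```python
# An RMS-style event as nested dictionaries: a header plus a body graph
event = {
    "header": {"type": "trade", "seq": 42},
    "body": {
        "instrument": {"symbol": "CLZ4", "currency": "USD"},
        "price": 80.25,
        "quantity": 100,
    },
}

# Nothing stops a typo or a wrong type -- errors only surface at read time
symbol = event["body"]["instrument"]["symbol"]
print(symbol)  # CLZ4
```

There is no schema anywhere: a misspelled key or a mistyped value is only caught, if ever, when some consumer dereferences it.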

Native Serialization

This is language-dependent, and brings its own obvious and critical drawbacks: Java serialization, Python pickles, C++ streams, and so forth. Of course these formats are hardly interchangeable. In some cases (Java) the format is so large and cumbersome that it is a hard fit for any serious low-latency application.
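To make the trade-off concrete, a quick sketch comparing Python's native pickle with JSON for the same payload. The payload is a made-up example, and exact sizes vary by Python version, so treat the printed numbers as indicative only:

```python
import json
import pickle

payload = {"header": {"type": "trade"}, "body": {"price": 80.25, "qty": 100}}

as_pickle = pickle.dumps(payload)
as_json = json.dumps(payload).encode("utf-8")

# Native serialization round-trips faithfully...
assert pickle.loads(as_pickle) == payload
# ...but the bytes are opaque and Python-specific, unlike the JSON text
print(len(as_pickle), len(as_json))
```

Only a Python process can read `as_pickle` back, which is exactly the interchangeability problem noted above.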

XML

Large, slow, over-engineered, over-complicated, human-readable (why, if it is intended to be used by computers?). I still have not figured out why people seem to like it and why its use is so widespread. I can only see justification for XML away from low-latency solutions, representing maybe complex and text-driven structures like documents.

JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format based on a subset of JavaScript. Again, human-readable (can someone explain to me why this is a good thing?) and supported by virtually every major platform out there. Faster and a bit smarter than XML, JSON is still too slow for low-latency applications.

SLICE

SLICE (Specification Language for ICE) is the abstraction mechanism for separating ICE object interfaces from their implementations in various languages. Really fast and lightweight; the problem is that it forces the whole ICE paraphernalia on you. It is an option if you plan on using the whole stack.


Overall, as long as you clearly understand its limitations and pitfalls and plan workarounds for them, given the alternatives and their pros and cons, Protocol Buffers is a reasonable option to define and transport events and the information model in RMS systems.

A few notes on the configuration of a MacBook Pro video resolution (1280×800) in a Ubuntu 9.10 virtual appliance in Parallels 5:

  1. Make sure you install Parallels Tools
    You do that from Parallels menu: Virtual Machine -> Install Parallels Tools

  2. Modify X configuration to account for MacBook Pro proper resolution modes
    jfaleiro@ubuntu:~$ sudo vi /etc/X11/xorg.conf

    enter your admin password, and then change the file, commenting out the old line and adding a new one with your desired configuration:

    #                Modes   "800x600"
                     Modes   "1280x800" "1152x820" "1024x768"
  3. Re-apply configuration to all additional xserver configuration files
    jfaleiro@ubuntu:~$ sudo dpkg-reconfigure -phigh xserver-xorg
  4. Modify the amount of video memory on the virtual machine
    Again, from the Parallels menu: Virtual Machine -> Configure -> Video. In my case, 16 MB did the job
  5. Finally, a fresh Restart
    jfaleiro@ubuntu:~$ sudo reboot

The highest resolution is fine for a full-screen view. You can switch Ubuntu (System -> Preferences -> Display) to a lower resolution (1152×820 or 1024×768) if you intend on using a windowed view.