R has several packages for interacting with relational databases; one of them is RMySQL. You might notice during installation that RMySQL requires the presence of MySQL binaries:

> install.packages("RMySQL")
package ‘RMySQL’ is available as a source package but not as a binary
Warning in install.packages :
package ‘RMySQL’ is not available (for R version 3.1.1)

The message is pretty straightforward: you need to compile the package against the MySQL headers and binaries to have it available. A few simple steps work around this:

Install homebrew:

Homebrew is the package manager that “installs the stuff you need that Apple didn’t”. Installation is done through a Ruby one-liner that fetches the installer with curl.

itacoatiara jfaleiro$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
itacoatiara:~ jfaleiro$ which brew
/usr/local/bin/brew

You can see brew is now available.

Install MySQL using Homebrew:


itacoatiara jfaleiro$ brew install mysql

Start a mysql server locally

Start a MySQL server process; you will need it for testing. As with any other process you decide to install on your machines, make sure you understand its vulnerabilities before anything else.

itacoatiara:~ jfaleiro$ mysql.server start
Starting MySQL
. SUCCESS!

Sanity-check your MySQL client installation


itacoatiara:~ jfaleiro$ mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.21 Homebrew

(…)
mysql> exit

Install RMySQL from source


> install.packages("RMySQL", type = "source")

Load the library


> library(RMySQL)

Enjoy

Now you can test your connection


> db <- dbConnect(MySQL(), user='genome')
> result <- dbGetQuery(db, 'show databases;'); dbDisconnect(db)
> result
Database
1 information_schema
2 test

This post was written with information made available by the human genome database and Stack Overflow.

The meta-model of QuantLET arranges components in terms of Models, Execution Modes, Shocks and Benchmarks, aggregating from the bottom up.

As we go over some of the concepts behind this meta-model, we will give an overall description of the fundamentals of the QuantLET framework.

Models

Models represent a very specific problem in terms of dependent and independent parameters.

An applied example of an (unrealistic) model would be something like: given LIBOR, benchmark interest rates in the US, the US/EUR spread and the crude oil spot price, what is the limit price of a long position, and how many pips should the trailing stop of a conditional order be set to, for the highest chance of maximized profit?

This is a typical stochastic optimization problem.

In this example, the independent variables are:

  • LIBOR
  • Benchmark US rates
  • US/EUR spread in the spot rate
  • Crude oil spot price

And the dependent variables:

  • Limit price of the long position
  • Maximum rate move in the stop-loss leg
  • Exit price of the long position

Models might have one or more Execution Modes associated with them.

Execution Mode

An execution mode defines the source and destination of financial data, represented by a high-frequency stream of events. For the example above, we could think of a few execution modes: simulation, back-testing, paper trading and production.

Simulation

In simulation mode, our model reads a stream of events from an endpoint simulating a synthetic Geometric Brownian Motion with pre-defined parameters for drift and volatility.
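QuantLET wires up the simulation endpoint through its own configuration, which is not shown here. As a rough, stdlib-only sketch of what such an endpoint could emit, the hypothetical generator below produces prices following a Geometric Brownian Motion for a given drift (mu) and volatility (sigma), using the exact log-normal step S(t+dt) = S(t) exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z). The function name, the seed and the time step are assumptions for illustration.

```python
import math
import random
from typing import Iterator


def gbm_ticks(s0: float, mu: float, sigma: float, dt: float,
              n: int, seed: int = 42) -> Iterator[float]:
    """Yield n synthetic prices following a Geometric Brownian Motion.

    Each step applies S(t+dt) = S(t) * exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z),
    with Z drawn from a standard normal distribution.
    """
    rng = random.Random(seed)  # seeded so that a simulation run is reproducible
    price = s0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        price *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        yield price
```

Calling gbm_ticks(100.0, 0.05, 0.2, 1 / 252, 252) would give roughly one year of daily prices; a fixed seed keeps each run repeatable, which matters once different shocks are compared against each other.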

Back-Testing

In back-testing, our model reads a stream of events built from historical data for each of the independent variables.

Paper Trading

In paper trading, our model posts orders to a simulated exchange environment, based on real-time live inbound data.

Production

In production, our model reads inbound data from real-time live channels and posts orders to real exchange channels.

Shocks

Shocks are re-configurations of execution modes with different parameters. For obvious reasons, not all execution modes allow shocks to be associated with them. In the example above, we could think of a few shocks:

  • Simulation mode, with drifts in the range of (-0.30 to 0.30) in increments of 0.05
  • Back-testing mode, with LIBOR rates shocked up (or down) in the range of (-2.0 to +2.0) in increments of 0.1
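Both shocks above are plain parameter sweeps. Here is a minimal sketch of expanding them into concrete runs, assuming a hypothetical dictionary-based mode configuration (the keys are illustrative, not QuantLET's). The ranges are generated from integer step counts, so floating-point error does not drop or duplicate the end points.

```python
from typing import Dict, List


def shock_range(low: float, high: float, step: float) -> List[float]:
    """Return low, low + step, ..., high (inclusive), rounded to hide float noise."""
    count = round((high - low) / step)
    return [round(low + i * step, 10) for i in range(count + 1)]


# Simulation mode: drift shocked from -0.30 to 0.30 in increments of 0.05
drift_shocks = shock_range(-0.30, 0.30, 0.05)

# Back-testing mode: LIBOR shifted from -2.0 to +2.0 in increments of 0.1
libor_shocks = shock_range(-2.0, 2.0, 0.1)

# Hypothetical base configuration; each shock overrides a single parameter.
base_simulation: Dict[str, object] = {"mode": "simulation", "drift": 0.0, "volatility": 0.2}
simulation_runs = [{**base_simulation, "drift": d} for d in drift_shocks]
```

This yields 13 simulation runs and 41 LIBOR shift values; each run keeps every other parameter of its execution mode unchanged.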

Benchmark

A benchmark allows the comparison of results from various shocks, looking for the lowest cost of the optimization function, or along any other dimension you can think of. Some examples:

  • Highest single profit in a day
  • Highest accumulated profit in a period (week, month)
  • Highest number of profitable periods (days, weeks)
  • Highest value at risk exposure in a period
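A rough sketch of scoring shocks against a few of these benchmarks, assuming each shock's outcome is summarized as a list of daily profit-and-loss figures (a hypothetical representation). The value-at-risk benchmark is left out because it needs a return distribution, not just a P&L series.

```python
from typing import Dict, List


def benchmark(daily_pnl: List[float]) -> Dict[str, float]:
    """Score one shock's daily profit-and-loss series on a few benchmark dimensions."""
    return {
        "highest_single_day": max(daily_pnl),
        "accumulated": sum(daily_pnl),
        "profitable_days": sum(1 for p in daily_pnl if p > 0),
    }


def best_shock(results: Dict[str, List[float]], dimension: str) -> str:
    """Return the name of the shock that scores highest on the given dimension."""
    return max(results, key=lambda name: benchmark(results[name])[dimension])


results = {
    "drift=-0.05": [1.0, -2.0, 3.0],
    "drift=+0.05": [2.0, 2.0, -1.0],
}
```

Different benchmarks can disagree: here the first shock wins on the best single day, the second on accumulated profit, which is why keeping several benchmarks active at once is useful.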

You can have multiple benchmarks active at any given time, and historical benchmarks are available for comparison.

The Remaining Framework

QuantLET is a framework, and as such it provides many battle-tested, time-saving components with a number of rich features. A few other pieces of the framework:

  • Tapestry: A powerful framework for data- and CPU-intensive processing, based on map-reduce abstractions and data partitioning
  • Dashboard: A UNO-based front-end for the OpenOffice family of spreadsheets.
  • Mediation: Configuration of models, modes, shocks and benchmarks at run-time

And many other features: check the QuantLET home page often for more, and for updates.

Let me try to define the problem with a question: how do you mutate, inside an inner function, variables defined in an outer scope?

Let’s illustrate with an example: cumulative moving averages of real-time price points.

In PyQuantLET we define a simple model to read ticks from an inbound endpoint, generate a cumulative moving average, and forward the result to an outbound endpoint in two lines:

f = quantlet.handler.stat.cma()
[outbound((x, f(x))) for x in inbound]
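The quantlet.handler.stat module is part of PyQuantLET and is not reproduced here. To see the same read-compute-forward shape in plain Python, the sketch below uses stand-ins that are assumptions, not QuantLET endpoints: inbound is a list of prices, outbound is a list's append method, and the running average comes from itertools.accumulate rather than from a stateful handler.

```python
from itertools import accumulate

inbound = [10.0, 12.0, 11.0, 13.0]  # stand-in for a live tick feed
outbound_events = []
outbound = outbound_events.append   # stand-in for an outbound endpoint

# Running sums divided by the number of ticks seen so far give the cumulative moving average.
for i, (x, total) in enumerate(zip(inbound, accumulate(inbound)), start=1):
    outbound((x, total / i))
```

A plain loop is used instead of a list comprehension because the outbound call is made only for its side effect.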

Concrete inbound and outbound endpoints are defined as part of the execution mode of this model. A mode of execution could be a simulation in which the inbound endpoint is a real-time feed of trades for a specific underlying, and the outbound endpoint is a plot of price points and their cumulative moving average.

execute_mode(id='simulation',
             model=cma(inbound=feed('tcp://localhost:61616?wireFormat=openwire'),
                       outbound=graph(['plot'])))

Of course, this is nothing realistic, but that’s all it takes. Now let’s define the function for the cumulative moving average (in reality we would use classes for the same purpose, but let’s stick to this for now):

def cma():  # this is wrong
    total, counter = (0, 0)
    def result(value):
        total = total + value
        counter = counter + 1
        return total / counter
    return result

If you try this version, you will get something like this:

UnboundLocalError: local variable 'total' referenced before assignment

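The message is self-explanatory once you know the rule: assigning to total inside result makes total a local variable of result, so it is read before it has been assigned. Python 2 offers no way to rebind a name in an enclosing function’s scope (Python 3 added the nonlocal statement for this). Nothing like a dictatorship for life…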

So, now the good part, the solution: rewrite your function so that the variables in the outer scope are held in a container object (here, a one-element list), which can be mutated without rebinding the name:

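def cma():
    total, counter = ([0], [0])  # one-element lists act as mutable cells
    def result(value):
        total[0] = total[0] + value  # mutates the list; the name itself is not rebound
        counter[0] = counter[0] + 1
        return float(total[0]) / counter[0]  # float() avoids Python 2 integer division
    return result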

It looks weird, but it works; try it yourself.
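In Python 3 the container trick is no longer needed: the nonlocal statement (PEP 3104) declares that a name refers to the enclosing function’s variable, so it can be rebound directly. A sketch of the same handler written that way:

```python
def cma():
    """Return a handler that keeps a cumulative moving average of the values it sees."""
    total, counter = 0.0, 0

    def result(value):
        nonlocal total, counter  # Python 3 only: rebind the names in cma's scope
        total += value
        counter += 1
        return total / counter

    return result


f = cma()
averages = [f(x) for x in (10.0, 12.0, 11.0, 13.0)]
```

The class-based alternative mentioned in the post would keep total and counter as instance attributes; in both designs, every call to cma() creates an independent average.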

I was looking into Protocol Buffers as an alternative for representing events in RMS architectures, trying to figure out which of its features are a match for RMS requirements and which ones might be a drag.

As a reminder, RMS is a collection of patterns that are ideal for event-driven, low-latency applications, especially useful in high-frequency finance. Events in RMS are described in terms of a header and a body. The header is usually a dictionary, and the body must support the representation of elaborate graphs of objects, representing real world analytical use cases. QuantLET is an open source RMS framework.

Protocol Buffers

“Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data”. Comparisons with alternatives like XML or JSON come to mind immediately, but let’s just say for now that Protocol Buffers produce a smaller binary wire format, so transfers usually take a fraction of the time they take with XML or JSON.

Benefits?

A few features of Protocol Buffers are a great match for representing RMS events in high-frequency applications:

Fast

The wire format is binary and compact, and therefore fast to transfer and to serialize. It suits high-frequency solutions where the expected latency is under a millisecond.
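To make “compact” concrete, here is a hand-rolled sketch of the Protocol Buffers wire encoding for a single integer field (it does not use the protobuf library). Each field is written as a varint key, (field_number << 3) | wire_type, followed by the value as a base-128 varint. Field 1 holding 150 takes three bytes, the same example used in the Protocol Buffers encoding documentation; the JSON field name “price” is an assumption for comparison.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a Protocol Buffers base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Key = (field_number << 3) | wire type 0 (varint), followed by the value."""
    return encode_varint(field_number << 3) + encode_varint(value)


binary = encode_varint_field(1, 150)        # field 1 holding the value 150
text = json.dumps({"price": 150}).encode()  # the same datum as JSON
```

Part of the saving comes from sending a field number instead of a field name; the schema on each side supplies the name.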

Backward Compatible

Backward compatibility is present from the ground up. Optional attributes, and the unique identification of each field by a number, allow messages to be consumed by older clients as you roll out new versions of your information model.
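The field numbers are what make this work on the wire: a reader can skip any field number it does not recognize, because the key also says how the value is encoded. Below is a hand-rolled, stdlib-only sketch of such an “old client” decoder; it handles only the varint and length-delimited wire types, and the field names are hypothetical.

```python
from typing import Dict, Tuple


def encode_varint(value: int) -> bytes:
    """Base-128 varint, used here only to build test messages."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_known(data: bytes, known: Dict[int, str]) -> Dict[str, int]:
    """Decode the varint fields this client knows about, skipping all others."""
    fields, pos = {}, 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint: read it, keep it only if the number is known
            value, pos = decode_varint(data, pos)
            if field_number in known:
                fields[known[field_number]] = value
        elif wire_type == 2:  # length-delimited (strings, nested messages): skipped here
            length, pos = decode_varint(data, pos)
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
    return fields


# A "version 2" message: field 1 (price) plus a new field 2 (quantity).
v2_message = (encode_varint(1 << 3) + encode_varint(150)
              + encode_varint(2 << 3) + encode_varint(7))
old_view = decode_known(v2_message, {1: "price"})
new_view = decode_known(v2_message, {1: "price", 2: "quantity"})
```

The older client still reads price and silently ignores quantity, which is exactly the rollout scenario described above.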

Drawbacks?

A few features of Protocol Buffers speak against their use as a transport protocol in RMS architectures:

Not Plain Objects

In RMS lingo we define ‘plain object’ as a structure capable of describing cyclical graphs of objects, in which objects are represented through an interface allowing access to its members either directly or through getters and setters.

With that definition in mind, Protocol Buffers objects cannot be seen as RMS plain objects. They carry programmatic features, like fluent builders and bindings to language-specific structures (like streams in Java or C++).

While they bring benefits, these same features make objects dependent on the framework even where Protocol Buffers are not really necessary (in RMS a transport format is only needed at endpoints, usually during marshalling/unmarshalling and transformation).
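A minimal sketch of the RMS view, assuming a hypothetical Tick event and a hypothetical fixed binary layout built with Python’s struct module. Handlers only ever see the plain dataclass; the transport format appears in exactly two endpoint functions, where Protocol Buffers (or anything else) could be swapped in.

```python
import struct
from dataclasses import dataclass


@dataclass
class Tick:
    """Plain event body: no framework base class, no builders, no stream bindings."""
    symbol: str
    price: float
    size: int


# Hypothetical layout: 8-byte ASCII symbol (null-padded), little-endian double, int32.
# Symbols longer than 8 bytes would be truncated; a real layout would need a length prefix.
_TICK_FORMAT = "<8sdi"


def marshal(tick: Tick) -> bytes:
    """Outbound endpoint only: turn a plain Tick into bytes."""
    return struct.pack(_TICK_FORMAT, tick.symbol.encode("ascii"), tick.price, tick.size)


def unmarshal(data: bytes) -> Tick:
    """Inbound endpoint only: rebuild a plain Tick from bytes."""
    symbol, price, size = struct.unpack(_TICK_FORMAT, data)
    return Tick(symbol.rstrip(b"\x00").decode("ascii"), price, size)
```

Everything between the two endpoints works on Tick instances, so changing the wire format touches two functions rather than every handler.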

Lacks Inheritance Support

OK, I know: Protocol Buffers allow a kind of quasi-inheritance in a few different ways, depending on how you intend to use it.

When a consumer knows in advance which subtypes of a message to expect, you can either embed an instance of your base type in the derived type, or embed the subtypes as optional aggregated messages (here, through extensions):

message Animal { // General animal properties.
  optional double weight = 1;
  extensions 1000 to max;
}
message Dog { // Dog-specific properties.
  optional float average_bark_frequency = 1;
}
message Cat { // Cat-specific properties.
  enum Breed { TABBY = 1; CALICO = 2; ... }
  optional Breed breed = 1;
}
extend Animal {
  optional Dog dog = 1000;
  optional Cat cat = 1001;
}

OK, it might work. The problem is that you need to know all the possible derived types in advance, and for anything beyond really trivial use cases this is an impediment. For when a consumer does not know in advance what types to expect, here is the alternative, which mixes a few features:

message Animal reserved 10 {
  optional double weight = 1;
  optional Color color = 2;
}
message Dog extends Animal reserved 10 {
  optional float average_bark_frequence = 1;
}
message Cat extends Animal reserved 10 {
  optional Breed breed = 1;
}
message Lion extends Cat reserved 10 {
  optional boolean isAlphaMale = 1;
}

Now the dirty details: the description above is written in a ‘protoext’ file, and to give you something useful it has to be compiled into a proto file like this:

//@Hierarchi Lion:Cat:Animal
message Lion {
  //-- Animal's indices start at 1 because it is the base class (Lifeform does not count since it is an interface)
  //@Class Animal
  optional double weight = 1;
  //@Class Animal
  optional Color color = 2;
  //-- Cat's indices start at 11 because reserved is set to 10: 10+1
  //@Class Cat
  optional Breed breed = 11;
  //-- Lion's indices start at 21 because reserved is set to 10: 10+10+1
  //@Class Lion
  optional bool isAlphaMale = 21;
}
//@Hierarchi Dog:Animal
message Dog {
  //-- See comments for Lion above
  //@Class Animal
  optional double weight = 1;
  //@Class Animal
  optional Color color = 2;
  //-- See comments for Cat above
  //@Class Dog
  optional float average_bark_frequence = 11;
}
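The numbering rule in those comments fits in a few lines. The sketch below is not the protoext compiler; it only reproduces the rule as described: with reserved set to 10, each class in the chain gets its own block of ten field numbers, so Animal starts at 1, Cat at 11 and Lion at 21.

```python
from typing import Dict, List, Tuple

RESERVED = 10  # field-number block reserved per class, as in 'reserved 10' above


def flatten(hierarchy: List[Tuple[str, List[str]]]) -> Dict[str, Tuple[str, int]]:
    """Assign flattened field numbers for a base-to-leaf chain of classes.

    The base class starts at 1, its subclass at RESERVED + 1, the next at
    2 * RESERVED + 1, and so on. Returns {field_name: (declaring_class, number)}.
    """
    flattened = {}
    for depth, (cls, fields) in enumerate(hierarchy):
        if len(fields) > RESERVED:
            raise ValueError(f"{cls} declares more than {RESERVED} fields")
        for offset, name in enumerate(fields, start=1):
            flattened[name] = (cls, depth * RESERVED + offset)
    return flattened


lion = flatten([("Animal", ["weight", "color"]), ("Cat", ["breed"]), ("Lion", ["isAlphaMale"])])
dog = flatten([("Animal", ["weight", "color"]), ("Dog", ["average_bark_frequence"])])
```

The same rule also shows the limit of the scheme: a class that outgrows its reserved block would collide with its subclass’s numbers, hence the check.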

So you have already compiled once, and this output has to be compiled again (by protoc) before you have something you can finally use. You get the picture: yes, it is complicated.

Alternatives

So are there any alternatives to Protocol Buffers out there? Yes, there are a few, each bringing its own benefits and drawbacks:

Specialized Types

You can use the collection types of your language of choice to represent graphs through keys and values, for example maps of maps in Java, or nested dictionaries in Python. Expensive and loosely typed: I just can’t think of any good reason to use them, and you should avoid them if you can.

Native Serialization

Java serialization, Python pickles, C++ streams, and so forth: these are language dependent, which brings obvious and critical drawbacks, since those formats are hardly interchangeable. In some cases (Java) the format is so large and cumbersome that it is a hard fit for any serious low-latency application.

XML

Large, slow, over-engineered, over-complicated, human readable (why, if it is intended to be used by computers?). I still haven’t figured out why people seem to like it and why its use is so widespread. I can only see a justification for XML away from low-latency solutions, perhaps for complex, text-driven structures like documents.

JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format based on a subset of JavaScript. Again, human readable (can someone explain to me why this is a good thing?) and supported by virtually every major platform out there. Faster and a bit smarter than XML, JSON is still too slow for low-latency applications.

SLICE

SLICE (Specification Language for ICE) is the abstraction mechanism for separating ICE object interfaces from their implementations in various languages. It is really fast and lightweight; the problem is that it drags in the ICE paraphernalia. It is an option if you plan on using the whole stack.

So?

Overall, as long as you clearly understand their limitations and pitfalls and plan workarounds for them, and given the alternatives with their pros and cons, Protocol Buffers are a reasonable option for defining and transporting events and the information model in RMS systems.

A few notes on configuring the MacBook Pro video resolution (1280×800) in an Ubuntu 9.10 virtual appliance under Parallels 5:

  1. Make sure you install Parallels Tools
    You do that from Parallels menu: Virtual Machine -> Install Parallels Tools

  2. Modify the X configuration to add the MacBook Pro’s proper resolution modes
    jfaleiro@ubuntu:~$ sudo vi /etc/X11/xorg.conf

    Enter your admin password, then change the file, commenting out the old line and adding a new one with your desired configuration:

    #                Modes   "800x600"
                     Modes   "1280x800" "1152x820" "1024x768"
  3. Re-apply configuration to all additional xserver configuration files
    jfaleiro@ubuntu:~$ sudo dpkg-reconfigure -phigh xserver-xorg
  4. Modify the amount of video memory on the virtual machine
    Again, from the Parallels menu: Virtual Machine -> Configure -> Video. In my case, 16 MB did the job.
  5. Finally, a fresh restart
    jfaleiro@ubuntu:~$ sudo reboot

The highest resolution is fine for a full-screen view. You can switch Ubuntu (System -> Preferences -> Display) to a lower resolution (1152×820 or 1024×768) if you intend to use a windowed view.

Finally found some time to get out of my bubble and go over the configuration of a new Snow Leopard installation. Time to make a choice: Fink or MacPorts? For some reason I have *both* wandering around on my hard disk.

First of all, both have to be upgraded for Mac OS X 10.6.1, so on to the uninstall:

Fink:

%% sudo rm -rf /sw

and MacPorts:

%% sudo rm -rf \
    /opt/local \
    /Applications/DarwinPorts \
    /Applications/MacPorts \
    /Library/LaunchDaemons/org.macports.* \
    /Library/Receipts/DarwinPorts*.pkg \
    /Library/Receipts/MacPorts*.pkg \
    /Library/StartupItems/DarwinPortsStartup \
    /Library/Tcl/darwinports1.0 \
    /Library/Tcl/macports1.0 \
    ~/.macports

Finally, the installation: I decided to stay with MacPorts, which seems to provide more packages and includes most of the tools I usually look for. I downloaded the 10.6.1 DMG for it and ran

%% sudo port -v selfupdate

Simple and straightforward, all works like a breeze.

In event-driven architectures following RMS patterns, you have a few different ways of handling incoming events. We can think of four different patterns, three of them supported in QuantLET:

Specialized Declarative

Mostly through a declarative language (CEP guys/gals love that) like EQL or its dialects; in other words, a domain-specific language (DSL) for event handling. But be advised: despite its ‘coolness’, and despite being (most of the time, at least) a time saver and a shortcut, it requires specialized skills and tools not available on every corner (translation: lots of $$$).

Functional Declarative

Similar to the above. One example would be a Lisp-like dialect; another is constructs based on paradigms like map-reduce for large data fabric clusters. All of the above applies.
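As a toy illustration of the map-reduce style in plain Python (the trades and the volume-weighted average price calculation are assumptions chosen for the example, not a QuantLET API): the map step projects each trade onto a (notional, volume) pair, and the reduce step sums the pairs.

```python
from functools import reduce

# Hypothetical trades: (symbol, price, volume)
trades = [("EURUSD", 1.10, 100), ("EURUSD", 1.20, 300), ("USDJPY", 150.0, 50)]

# map: keep EURUSD trades and project each one onto (notional, volume)
mapped = map(lambda t: (t[1] * t[2], t[2]), filter(lambda t: t[0] == "EURUSD", trades))

# reduce: add the (notional, volume) pairs component-wise
notional, volume = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), mapped, (0.0, 0))
vwap = notional / volume
```

Because the reduce step is associative, partial sums could be computed on separate partitions and combined, which is what makes the style attractive on data clusters.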

Second-Order Declarative

Based on inferences derived from second-order logic descriptions. You: “What? Why are you bringing this up here?” Me: Well, inference engines, especially RETE engines, are basically a correlated forward chain of rules and facts, optimized in a tree-like fashion for speed, in which “triggering” a rule can be seen as an event’s action.
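To make the “chain of rules and facts” concrete, here is a naive forward-chaining loop. It is explicitly not RETE: it re-checks every rule on every pass, whereas a RETE network caches partial matches so that only rules affected by a new fact are re-evaluated. The fact names are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (conditions that must all hold, fact to assert)


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Fire every rule whose conditions hold until no new fact is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)  # "triggering" a rule plays the role of an event action
                changed = True
    return known


rules: List[Rule] = [
    (frozenset({"price_at_or_below_limit"}), "long_filled"),
    (frozenset({"long_filled", "trailing_stop_hit"}), "exit_long"),
]
```

Feeding it an incoming event amounts to adding a fact and re-running the chain; the second rule only fires once the first one has produced long_filled.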

A time saver, flexible, and ideal when proper tools and resources are not available.

Finite State Machines

Actions are associated with changes of state represented by a directed graph. Conditional transitions are also supported through ‘guards’. Incoming events carry either the next state of the graph or the indication of a transition. Different types of actions are then performed based on the transition and the association, i.e., actions performed when entering a state (entry), when leaving a state (exit), or when changing from one specific state to another (transition).
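A minimal sketch of these semantics, covering only events that indicate a transition (not events that carry the next state directly). The class, states and events are hypothetical; actions run in the usual order of exit, then transition, then entry.

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Callable[[], None]
Guard = Callable[[dict], bool]


class StateMachine:
    """Minimal FSM: states are nodes, (state, event) pairs are directed edges."""

    def __init__(self, initial: str) -> None:
        self.state = initial
        self.transitions: Dict[Tuple[str, str], Tuple[str, Optional[Guard], Optional[Action]]] = {}
        self.on_entry: Dict[str, Action] = {}
        self.on_exit: Dict[str, Action] = {}

    def add(self, source: str, event: str, target: str,
            guard: Optional[Guard] = None, action: Optional[Action] = None) -> None:
        self.transitions[(source, event)] = (target, guard, action)

    def handle(self, event: str, payload: dict) -> bool:
        """Apply an incoming event; return True if a transition fired."""
        edge = self.transitions.get((self.state, event))
        if edge is None:
            return False  # event has no meaning in the current state
        target, guard, action = edge
        if guard is not None and not guard(payload):
            return False  # guard vetoed the transition
        if self.state in self.on_exit:
            self.on_exit[self.state]()  # exit action of the state being left
        if action is not None:
            action()  # transition action
        self.state = target
        if target in self.on_entry:
            self.on_entry[target]()  # entry action of the new state
        return True


log: List[str] = []
fsm = StateMachine("flat")
fsm.add("flat", "signal", "long",
        guard=lambda p: p["price"] <= p["limit"],
        action=lambda: log.append("transition: buy"))
fsm.add("long", "stop", "flat", action=lambda: log.append("transition: sell"))
fsm.on_entry["long"] = lambda: log.append("entry: long")
fsm.on_exit["long"] = lambda: log.append("exit: long")

blocked = fsm.handle("signal", {"price": 101.0, "limit": 100.0})  # guard rejects it
entered = fsm.handle("signal", {"price": 99.5, "limit": 100.0})   # flat -> long
exited = fsm.handle("stop", {})                                    # long -> flat
```

Keeping the graph in a table rather than in if/else chains is what lets the same engine serve different models.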

Observer-Observable Imperative

The simplest, but it can lead to a lot of code that *will* turn into spaghetti if written by the uninitiated. Base yourself on proven paradigms that enforce separation of control and state to avoid adding to your own pain.
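One way to keep control and state apart in the observer style, sketched under assumptions (class names and event dictionaries are hypothetical): the observable only fans events out and owns no trading state, while each observer owns its own state and only reacts to events.

```python
from typing import Callable, List

Listener = Callable[[dict], None]


class Observable:
    """Event source: controls delivery, but holds no trading state itself."""

    def __init__(self) -> None:
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def publish(self, event: dict) -> None:
        for listener in list(self._listeners):  # copy, in case a listener subscribes others
            listener(event)


class PositionTracker:
    """Observer that owns its state and never calls back into the source."""

    def __init__(self) -> None:
        self.position = 0

    def __call__(self, event: dict) -> None:
        if event["type"] == "fill":
            self.position += event["quantity"]


feed = Observable()
tracker = PositionTracker()
feed.subscribe(tracker)
feed.publish({"type": "fill", "quantity": 100})
feed.publish({"type": "quote", "price": 1.1})
feed.publish({"type": "fill", "quantity": -40})
```

The spaghetti usually starts when observers publish back into the source or share mutable state; keeping that flow one-directional is the discipline this sketch is meant to show.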

Procedural Imperative

Read ‘anything-else-imperative-except-the-observer-pattern-mentioned-above’ – Don’t even think about trying that unless your model is *really* simple…

OK, now pick one, any one…
