This is a question that has been around since the early 20th century, since the first notes of Dow Theory: does technical analysis work? And by "work" I mean: can you get consistent profits using technical analysis?

In this latest publication we answer this question using data, simulations… and science.

If you wonder what technical analysis, or TA, is: you might not know the term, but you have most certainly seen its proponents on daytime television, staring at the camera, barking buy and sell commands at you, yelling terms like resistance, support, momentum and trends, and pointing at charts of the price variation of an asset with lines overlapped on top.

At first sight they make about as much sense as black magic or tea-leaf reading.

Another common trait of TAs is that they usually (if not always) try to convince investors to assume the same long or short positions they recommend, which should damage the profitability of their own strategy. There are also some TA blogs on the web advertising consistent gains in the triple digits in a year. Yes, that is right… if you doubt it, google it. And use your common sense to judge it.

Technical analysis is all about the formalization of visual patterns. Indeed, TAs refer to charts so often that their detractors call them "chartists".

[Image: annotated price chart with example signals]

This is an example of the signals one can derive from charts, taken from an online source that gives instructions on how to read chart patterns.

Since I mentioned their detractors, let me list them, the way I see it. People going around trying to figure out how to make money in financial markets using computers, in one way or another, are divided into three sects: technical analysts, fundamentalists and quantitative analysts.

Fundamentalists believe that value is given by intrinsic economic features of the environment associated with the underlying. If an underlying is issued by a corporation, some of those features could be product and market placement, quality of management, competition, and culture. If the underlying is issued by a sovereign government: interest rates, geopolitics, public policies. Names like Warren Buffett and Philip Fisher follow fundamentalist strategies.

Quantitative analysts look at statistical properties of price movements, individually or in correlated pairs or groups, trying to predict future price movements.

Lastly, the sect we care about in this study: if we navigate past all the terms they use and get to the basics, technical analysts believe they can predict the future value of a price path by looking at past features of a chart. They believe, for example, that if the price hits something like a resistance line it has a tendency to fall, or, on the other hand, that if the price hits a support line it can go nowhere but up.

So, which sect's teachings are more profitable? Who should you listen to?

If I move my personal biases aside, I have to say that the answer so far has actually depended on who you ask: on the personal beliefs and biases of that individual.

Like I said before, with this publication we push beliefs and biases aside and look solely at method and data. We use quantitative techniques to dissect and back-test technical analysis, answering one simple question using science and data: does technical analysis really work?

[Image: scatter plot of momentum-strategy fitness]

Scatter plot showing how dampening, load (cost) and sigma affect the profitability (fitness) of a momentum strategy used in technical analysis, one of the several insights in the research paper. Refer to the link for all details.

The original bootstrap of this work is a FRACTI notebook dissecting the crown strategy of technical analysts: the momentum strategy. You can follow the analysis step by step there, or you can download and execute it yourself.
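
To make "momentum strategy" concrete before you open the notebook, here is a minimal toy sketch in R. It is not the FRACTI notebook itself: the look-back window, the transaction cost and the simulated return series are all made-up, illustrative values. The rule is simply to go long after a positive trailing return, go short after a negative one, and pay a proportional cost whenever the position flips.

set.seed(42)

k    <- 20       # look-back window, illustrative
cost <- 0.0005   # proportional cost per unit of turnover, illustrative
n    <- 2500

ret   <- rnorm(n, mean = 0, sd = 0.01)   # stand-in for daily asset returns
price <- cumprod(1 + ret)

# trailing k-period return known at the close of day t
trailing <- rep(NA_real_, n)
trailing[(k + 1):n] <- price[(k + 1):n] / price[1:(n - k)] - 1

# trade on yesterday's signal: +1 long, -1 short, 0 while there is no history
signal <- c(0, sign(trailing[-n]))
signal[is.na(signal)] <- 0

turnover <- abs(c(0, diff(signal)))          # position changes incur costs
strategy <- signal * ret - turnover * cost   # daily strategy returns

tail(cumprod(1 + strategy), 1)               # terminal wealth of the toy strategy

Swapping the simulated returns for real price data is the first step toward the kind of back-test the notebook formalizes.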

From November of 2015 until now we have had internal discussions at the CCFEA on structure and content, which generated 22 different revisions of this publication.

And now, after a year, the first version is out for public review. That means it is now your turn. We should be submitting it for external peer review and publication over the next few weeks, and we welcome your insights and ideas. This is the essence of our research, FRACTI: transparency, collaboration, data and method.

I don't want to give any spoilers; you can skip all the formalization and the math and go straight to the conclusions. But the ending might surprise you…

Enjoy!

I write this in the early AM of 11/9/2016 as I finish a bottle of wine, watch TV and try to come to terms with the fact that we are all going to have a pussy-grabber as the president of the United States of America.

Power, prestige and influence, arguably the highest office on earth… and a pussy-grabbing president. Strongly supported by American women, the very group who should have felt most deeply offended by the self-proclaimed 'grab'em by the pussy' candidate. Yes, 53% of American women voted for the pussy-grabber.

Girls, this is the time for some deep soul-searching. I am sure history will judge you properly, but in the meantime, google self-value.

Given that premise, I feel entitled to use PUSSY-GRABBING as much as I want, even using capitals and bold for proper emphasis. When I first heard those lowly words spoken by a fellow man on national television I felt embarrassed. How can someone talk about women like that? But now, hey, screw it: since you seem to cherish this treatment, I don't care.

This was the pussy-grabbing campaign, driven by biases and pussy-grabbing journalism. Very important: this could never have happened without them, prime-time pussy-grabbing journalism. Well done. Kudos to them.

Pussy-grabbing journalism outlets are cash-strapped, massive corporations catering specifically to information-illiterate consumers.

Pussy-grabbing outlets desperately try to raise revenue streams by spinning massive amounts of false and irrelevant data, tailored to feed the twisted biases of information-illiterate individuals.

Information-illiterate individuals by definition do not carry the proper tools to filter and evaluate all that noise. They are unable to associate cause and effect, or to make informed decisions about what will influence their lives and those of their families and communities.

And here is the disconnect: democratic institutions require strong, independent, educated, informed individuals to function. Without them, democracies simply collapse.

The result: the grab’em-by-the-pussy candidate won.

Welcome to the pussy-grabbing age. Enjoy it while it lasts, because next could be the end of civilization as we know it.

Yes. Throw away your dumbbells, fire your overpriced personal trainer and get a data-powered dumbbell instead.

Have you ever imagined a scenario in which your training equipment would play the role of your personal trainer?

People regularly quantify how much of a particular activity they do, but they rarely quantify how well that same activity is performed. More often than not, discerning the quality of a workout requires the specialized supervision of a personal trainer.

This is what this whole analysis is about. We predict how well people exercise based on data produced by accelerometers attached to their belt, forearm, arm, and dumbbell.

The overall quality with which people exercise is given by the "classe" variable in the training set. Classe 'A' indicates an exercise performed correctly (all kudos to you, athlete). The other classes indicate common exercising mistakes.

All credits for data collection and the original analysis go to the Human Activity Recognition (HAR) laboratory, previously detailed in this paper. Credits for the educational notes go to the Johns Hopkins School of Biostatistics and our valuable co-workers.

In a simple and quick fitting we were able to get very close to the weighted-average baseline accuracy of 99.4%. Despite the numerical proximity of the results, we can see the baseline sits at the upper boundary of the confidence interval of this study.

We were limited in terms of computing resources and time (this analysis was performed beginning to end in about 3 hours). With more time we could try boosting ensembles for classification, specifically AdaBoost, but that would be beyond the intent and time allocated for this exercise.

If you want a more elaborate analysis you can either check the original paper or, better yet, refer to a longer version of this study for details on each step of the analysis (a minimal sketch of the pipeline follows the list):

  • First things first, obtaining the data, pre-processing and clean-up: downloading and caching the raw data, and cleaning up the data generated by the electronic devices
  • “Raw” feature selection: selecting, among the dozens of features, which ones are relevant to classify an exercise as well or poorly executed. Since we will fit using a model with implicit feature selection, this is a “raw” feature selection.
  • Data partitioning: a 75:25 training:testing split of the data set, for training, cross-validation and in/out-of-sample testing
  • Data imputation: electronic devices leave lots of gaps in the data. NA values are imputed with a K-nearest-neighbors imputation algorithm
  • Model fitting:
    • Feature selection: we use a random forest for fitting, in which feature selection is performed implicitly (see below).
    • Training: training is performed with a random forest model over 6 folds of cross-validation
    • Feature importance: although there is no separate feature-selection step, the random forest classification algorithm does keep a ranking of how much each feature contributes to the outcome of each class. We call this ranking "importance".
  • Prediction and in/out-of-sample measurements: we measure and track in- and out-of-sample error by comparing confusion matrices against the training partition and a testing partition centered and scaled around metrics of the training partition
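
For the curious, here is a minimal sketch in R of what this pipeline looks like using the caret package. It is not the actual code from the report: the data frame name har, the assumption that "classe" is a factor, and the assumption that the remaining columns are numeric sensor features are mine, for illustration only.

library(caret)   # partitioning, preprocessing, training, confusion matrices
set.seed(1234)

# 'har' is assumed: a cleaned data frame of numeric sensor features plus 'classe'
in_train <- createDataPartition(har$classe, p = 0.75, list = FALSE)
training <- har[in_train, ]
testing  <- har[-in_train, ]

predictors <- setdiff(names(training), "classe")

# center, scale and impute gaps by k-nearest neighbors, using training statistics only
pre_proc <- preProcess(training[, predictors], method = c("center", "scale", "knnImpute"))
train_x  <- predict(pre_proc, training[, predictors])
test_x   <- predict(pre_proc, testing[, predictors])

# random forest over 6 folds of cross-validation
fit <- train(x = train_x, y = training$classe, method = "rf",
             trControl = trainControl(method = "cv", number = 6))

confusionMatrix(predict(fit, train_x), training$classe)   # in-sample error
confusionMatrix(predict(fit, test_x),  testing$classe)    # out-of-sample error
varImp(fit)                                               # the 'importance' ranking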

The final results: for this model, with an accuracy of 99.27%, the out-of-sample error is 0.73%, with a confidence interval of 98.99% to 99.49%. Considering the little time and computational resources, really good.

This, like most R analyses, relies on a 'literate programming' paradigm, which basically means the output of this model is a final paper, or in this case a web page. You can find all the details of each step, including real executable R code, there.

If you want to dig even deeper, understand the details of the model, run it yourself, improve it and compare results, you can check it out from the github repository where it is graciously made available and run it yourself. You will need R and RStudio for that. The report with the executable analysis is also available.

Have fun!

Yes, Oracle. Sorry. The make-believe situation: we have a huuuge table in Oracle, and Oracle says we need to partition it so we can query it. We need to allocate data to partitions in intervals of 7.

We decide to use range-interval partitions.

create table A (
   id number primary key,
   partition_id number not null
)
partition by range(partition_id) interval (7)
(partition FIRST values less than (0))
enable row movement
parallel;

The catch, though, is that we want the second partition to start at 5, not at the default boundary.

How do we do that?

Change the values less than clause of the first partition to 5, the first value (and beginning) of the second partition:

create table A (
   id number primary key,
   partition_id number not null
)
partition by range(partition_id) interval (7)
(partition FIRST values less than (5))
enable row movement
parallel;

How many partitions do we have?

select partition_name, high_value
from user_tab_partitions
where table_name = 'A' order by 1

Which yields…

PARTITION_NAME                  HIGH_VALUE
--------------------------------------------
FIRST                           5
SYS_P586996                     12

Let’s test it. First, we add a few rows:

insert into A (id, partition_id) values (2,2);
insert into A (id, partition_id) values (3,3);
insert into A (id, partition_id) values (5,5);
insert into A (id, partition_id) values (10,10);

Where did each row land?

First things first, the partition named FIRST…

select * from A partition (FIRST);
ID                PARTITION_ID
------------------------------
2                 2
3                 3

And what does the second partition hold?

select * from A partition (SYS_P586996);
ID                PARTITION_ID
------------------------------
5                 5
10                10

Exactly what we wanted. You can go back to your painful Oracle life.

The Bank of Finland has this thing they call "the simulator": it is basically a platform for the stress-testing of large financial institutions at a systemic level.

Basically the same thing we have here at the Federal Reserve; the difference is that in Finland it is not as opaque. The "simulator" is a computational platform allowing scenario testing through plug-and-play quantitative models leveraging payment and liquidity data from large financial institutions. It is used by several nations outside of Finland as well.

They invited me for a talk about the paper introducing FRACTI, which we published about a year ago as part of our research at the CCFEA.

If you by any remote chance find yourself around Helsinki on 8/25 and 26, and want to get engaged in interesting conversations about the modelling of crowd behavior, data-driven predictive models, large-scale simulations and systemic risk, come join us… the first and second rounds of Karhu are on me…

Languages are more than communication. They are often one’s window to reality.

Your language shapes how you think, what you can achieve, and how you achieve it. Some languages facilitate concepts in a domain of knowledge; others make them more obscure. You might try to use the French language for philosophy, or German for poetry. Using them the other way around might force you to write more, being forcibly more verbose. Using the wrong language can even impede the expression of your ideas.

Language defines reality. This is the case not only with natural languages, but also with computer languages.

Like natural languages, computer languages often grow from the needs of specialized domains, and are therefore better suited for use cases relevant to that specific domain. In the past, computer languages were born and bred in a specific domain, frozen to the requirements of that domain at that specific point in time. When the requirements of that domain evolved, following the increasing complexity of the problems at hand, the language no longer fit.

In modern times computer languages must be dynamic, quasi-living things, able to evolve and adapt to solve new classes of problems in new computing environments. Modern problems are different from what we had to deal with a few years back. You must have adequate tools and methods to approach them properly. In the same way, computational environments change in the face of new demands and new hardware technologies: single to multiple cores, cloud, cluster, and grid computing.

The way in which you describe to a binary being how to resolve a problem plays a very special role. This role is tied to the concept of representability. The effectiveness of your representation is limited by the features of your language, your familiarity with that specific domain knowledge, and your experience, i.e. the thinking patterns you have used when approaching previous problems in that domain.

“A good notation has a subtlety and suggestiveness which at times makes it almost seem like a live teacher.” Bertrand Russell, The World of Mathematics (1956).

If you zoom in on the specialized domain of our interest, computational finance, and look at the problems we have had to approach in the past and the patterns we used to resolve them, we can list a number of important features our language (and environment) will have to support:

  • Responsiveness: Deterministic response time is critical in common use cases in computational finance. As rash as this may sound, the fact that you can keep your response time under a few dozen microseconds 99.99% of the time is irrelevant if you took a few seconds to decide what to do while waiting on a garbage collection. Even if it happened just once, that one time wiped out all your hard-gained profits of the day.
  • Adequate representation of data structures: Plain old data structures have to be represented properly. It is hard to believe several widespread programming platforms still have problems properly representing data structures introduced in CS 101 curricula, such as contiguous arrays and sparse vectors. In computational finance we care about very specific abstractions, like the proper representation of time series and currencies.
  • Functional-vectorization friendly: Representation of data structures must be able to leverage the vectorial nature of modern computer architectures through lambda functors. Functional support is crucial (see the sketch after this list).
  • Simplified concurrency through continuations: Continuations, or co-routines, are probably the simplest and most abstract way to leverage concurrency. You can leverage streams, vectors and parallelism using simple patterns. No shared state synchronization required.
  • Interactive: Support for an interactive command line for preliminary brainstorming, prototyping and testing. Being able to record, share and story-tell the resolution of a problem is very important. The record must support rich representation (plots, tables, structured formatting, etc.); the more, the better. Communication and collaboration are critical, and your representation cannot ignore that. The hardcore problems of our times cannot be solved without proper and organic collaboration. Your representation must be collaboration friendly.
  • Mini-representations: Notations matter in any representation. Domains have specific ways to represent concepts, and your representation has to be flexible enough to adhere to the use cases of that domain. Mini-representations are used here in the same sense as mini-languages, also called little languages or domain-specific languages (DSLs): ways to leverage a host language for meta-representation. In other words, you could use a host language to "override" its tokens and represent a language appropriate for, say, streaming, or behavior.
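
To illustrate the functional-vectorization point above, here is a small self-contained sketch in R, with made-up prices: transformations are expressed as vectorized operations, lambdas and folds, with no explicit loops and no shared mutable state.

prices <- c(100, 101.5, 100.8, 102.3, 99.7, 101.1)   # illustrative price path

log_returns <- diff(log(prices))                     # vectorized transformation, no loop

# a lambda applied over rolling windows of size 3 (window size is illustrative)
roll_mean <- vapply(seq_len(length(prices) - 2),
                    function(i) mean(prices[i:(i + 2)]),
                    numeric(1))

# Reduce expresses a left fold: the running peak of the path, and the drawdown from it
running_peak <- Reduce(max, prices, accumulate = TRUE)
drawdown     <- 1 - prices / running_peak

The same style carries over naturally to streams and parallel back-ends, precisely because none of these steps depends on shared mutable state.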

These preferences are personal (and of course biased), limited by my own experience of the patterns that seem to work best when solving practical problems in computational finance.

As our research goes on, it seems the major missing piece is a proper representation of financial models: which "language" properly represents financial models across all use cases, risk, trading, simulation, back-testing, and others? The search continues.

R has several packages for interaction with relational databases; one of them is RMySQL. You might notice during installation that RMySQL requires the presence of mysql binaries:

>install.packages("RMySQL")
package ‘RMySQL’ is available as a source package but not as a binary
Warning in install.packages :
package ‘RMySQL’ is not available (for R version 3.1.1)

The message is pretty straightforward: you need to compile the package against the mysql headers and binaries to have it available. You will need to go through a few simple steps to work around this:

Install homebrew:

Homebrew is the package manager that "installs the stuff you need that Apple didn't". Installation is done through a one-liner ruby curl command.

itacoatiara jfaleiro$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
itacoatiara:~ jfaleiro$ which brew
/usr/local/bin/brew

You can see brew is now available.

Install mysql using homebrew:


itacoatiara jfaleiro$ brew install mysql

Start a mysql server locally

Start a mysql server process. You will need this for testing. Like any other process you might decide to install on your machines, make sure you understand its vulnerabilities before anything else.

itacoatiara:~ jfaleiro$ mysql.server start
Starting MySQL
. SUCCESS!

Sanity-check your mysql client installation


itacoatiara:~ jfaleiro$ mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.21 Homebrew

(…)
mysql> exit

Install RMySQL as source


> install.packages("RMySQL", type = "source")

Load library


> library(RMySQL)

Enjoy

Now you can test your connection


> db <- dbConnect(MySQL(), user='genome')
> result <- dbGetQuery(db, 'show databases;'); dbDisconnect(db)
> result
            Database
1 information_schema
2               test
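
As a follow-up, a small hypothetical round-trip against the same local server. The table name and the 'test' database are illustrative, and this assumes the connecting user is allowed to write to 'test'.

> con <- dbConnect(MySQL(), user='genome', dbname='test')
> df <- data.frame(id = 1:3, symbol = c('AAA', 'BBB', 'CCC'))
> dbWriteTable(con, 'rmysql_smoke_test', df, overwrite = TRUE, row.names = FALSE)
> dbReadTable(con, 'rmysql_smoke_test')
> dbRemoveTable(con, 'rmysql_smoke_test')
> dbDisconnect(con)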

This post was made with information made available by the human genome database and Stack Overflow.