Interrupted

Unordered thoughts about programming, engineering and dealing with the people in the process.

Computational Investment QSTK Framework From Python to Clojure


The Computational Investing course on Coursera started last week, and I’ve been taking a look.

What caught my attention is the library used for portfolio construction and management: QSTK, an open-source Python framework built on numpy, scipy, matplotlib, pandas, etc.

Looking at the first tutorial’s source code, I saw an opportunity to migrate the tutorials and libraries to Clojure and get to play a little with Incanter.

I’m going to highlight what I’ve found interesting when migrating the tutorials. I’m assuming you have QSTK installed and the QS environment variable is set, since the code depends on that for data reading.

(def ^:dynamic *QS* (System/getenv "QS"))

NYSE operation dates

As part of the initialization process the tutorial calls a function, getNYSEDays, which retrieves all the days on which there was trading at the NYSE. The migration is straightforward: use Incanter’s read-dataset to read the file into memory and then filter the required range.

Pay attention to the time of day, set at 16 hours, the time the NYSE closes. We’ll see it again in unexpected places.
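
To make the idea concrete, here is a rough sketch of the filtering step in plain Python. It is not the QSTK implementation: the function name nyse_days, the MM/DD/YYYY date format and the sample dates are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

def nyse_days(date_lines, start, end, close_hour=16):
    """Return the trading days within [start, end], stamped with the closing hour."""
    days = []
    for line in date_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        day = datetime.strptime(line, "%m/%d/%Y")  # assumed date format
        if start <= day <= end:
            # Every timestamp carries the market close time (16 hours)
            days.append(day + timedelta(hours=close_hour))
    return days

# Hypothetical sample standing in for the trading-days file
sample = ["04/30/2012", "05/01/2012", "05/02/2012", "05/03/2012", "05/04/2012"]
days = nyse_days(sample, datetime(2012, 5, 1), datetime(2012, 5, 3))
```

The closing hour is a parameter here only to make the assumption visible; the tutorial fixes it at 16.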

Data Access

QSTK provides a helper class called DataAccess used for reading and caching stock prices.

As you can see in the snippet below, there’s some data reading happening. We’re going to take a look at these functions, since we’ll need to write them from scratch.

Data initialization in python tutorial
dataobj = da.DataAccess('Yahoo')
voldata = dataobj.get_data(timestamps, symbols, "volume",verbose=True)
close = dataobj.get_data(timestamps, symbols, "close",verbose=True)
actualclose = dataobj.get_data(timestamps, symbols, "actual_close",verbose=True)

We’re going to split this into two functions. The first reads symbol data from disk, again using read-dataset, and creates a hash-map indexed by symbol name.

Creating a symbols hash-map of incanter datasets
(defn read-symbols-data
  "Returns a hashmap of symbols/incanter datasets read from QS data directory"
  [source-in symbols]
  (let [data-dir (str *QS* "/QSData/" source-in "/")]
    (reduce #(assoc %1 %2 (incanter.io/read-dataset (str data-dir %2 ".csv") :header true)) {} symbols)))

Then, if you take a look at voldata in a Python REPL, you can see pretty much what it’s doing:

                       AAPL       GLD     GOOG        $SPX       XOM
 2012-05-01 16:00:00  21821400   7414800  2002300  2706893315  13816900
 2012-05-02 16:00:00  15263900   5632300  1611500  2634854740  11108700
 2012-05-03 16:00:00  13948200  13172000  1868000  2673299265   9998600

It grabs the specified column (volume, close or actual_close) from each symbol dataset and creates a new table in which each resulting column is renamed to its symbol.
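
That reshaping can be sketched in a few lines of plain Python. This is only an illustration of the idea, not what get_data actually does internally: get_column and the nested-dict layout are hypothetical, the volumes are taken from the table above, and GLD's second day is left out on purpose to show how a missing date becomes None.

```python
def get_column(symbol_data, timestamps, symbols, column):
    """Build one row per timestamp with one value per symbol (None where missing)."""
    table = []
    for ts in timestamps:
        row = {"Date": ts}
        for sym in symbols:
            # Chained .get calls: a missing date or column yields None instead of raising
            row[sym] = symbol_data[sym].get(ts, {}).get(column)
        table.append(row)
    return table

# Hypothetical in-memory layout: symbol -> timestamp -> column -> value
symbol_data = {
    "AAPL": {"2012-05-01": {"volume": 21821400}, "2012-05-02": {"volume": 15263900}},
    "GLD": {"2012-05-01": {"volume": 7414800}},
}
voldata = get_column(symbol_data, ["2012-05-01", "2012-05-02"], ["AAPL", "GLD"], "volume")
```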

All the get_data magic happens inside get_data_hardread. It’s an ugly piece of code that makes a lot of assumptions about column names, and even about market closing time. I guess you can only use this library for markets closing at 16 hours local time.

timemonth = int((timebase-timeyear*10000)/100)
timeday = int((timebase-timeyear*10000-timemonth*100))
timehour = 16

I’ve translated that into these two functions:

In this case Clojure shines: the original function is almost 300 lines of code. I’m missing a couple of checks, but it’s not bad for a rookie, I think.

The helper function select-value is there to avoid an exception when trying to find stock data for a nonexistent date. The function also returns :Date as a double, since that’s easier to handle later for charting.
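
As a side note, the date arithmetic in the get_data_hardread excerpt above can be tried in isolation. This is a minimal Python sketch, assuming timebase is an integer of the form YYYYMMDD; the excerpt does not show how timeyear is derived, so that line and the name decode_date are guesses.

```python
from datetime import datetime

def decode_date(timebase, close_hour=16):
    """Split a YYYYMMDD integer into a datetime stamped with the closing hour."""
    timeyear = timebase // 10000  # assumed; not shown in the excerpt
    timemonth = (timebase - timeyear * 10000) // 100
    timeday = timebase - timeyear * 10000 - timemonth * 100
    return datetime(timeyear, timemonth, timeday, close_hour)

stamp = decode_date(20120501)
```

Integer division replaces the int(.../100) calls from the excerpt; for positive dates the result is the same.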

Charting

Charting with Incanter is straightforward, but there’s a subtle difference from Python: you need to add each series one by one. This is what Python does here, charting multiple series at once:

newtimestamps = close.index
pricedat = close.values # pull the 2D ndarray out of the pandas object
plt.plot(newtimestamps,pricedat)

We need a little function to do the same with Incanter. Each iteration is reduced into the next, so all the series accumulate in one chart.

Creates multiple time-series at once
(defn multi-series-chart
  "Creates an xy-chart with multiple series extracted from column data
  as specified by the series parameter"
  [{:keys [series title x-label y-label data]}]
  (let [chart (incanter.charts/time-series-plot :Date (first series)
                                                 :x-label x-label
                                                 :y-label y-label
                                                 :title title
                                                 :series-label (first series)
                                                 :legend true
                                                 :data data)]
    ;; Fold the remaining series into the chart, one add-lines call per series
    (reduce #(incanter.charts/add-lines %1 :Date %2 :series-label %2 :data data)
            chart
            (rest series))))

Data Mangling

Incanter has a lot of built-in functions and helpers to operate on your data. Unfortunately I couldn’t use any of the many options for operating on a matrix, or even $=, because the dataset we’re processing contains many nil values for dates on which a stock didn’t trade. to-matrix tries to create an array of Doubles, and a nil raises an exception when treated as a number.

There’s one more downside: we need to keep the :Date column as-is when operating on the dataset, so we have to remove it, operate, and add it back afterwards. In Python this happens to be a beautiful one-liner.

This attempts a naive normalization dividing each row by the first one.
normdat = pricedat/pricedat[0,:]
Or the daily return function.
dailyrets = (pricedat[1:,:]/pricedat[0:-1,:]) - 1

I ended up writing the iteration and function-applying code from scratch.

Maybe there’s an easier way, but I couldn’t think of one. If you know a better way, please drop me a line!

Now normalization and daily returns are at least manageable.

Normalization and Daily Returns
(defn normalize
  "Divide each row in a dataset by the first row"
  [ds]
  (let [first-row (vec (incanter.core/$ 0 [:not :Date] ds))]
    (apply-rows ds (/ first-row) 0 (fn [n m] (and (not-any? nil? [n m]) (> m 0))))))


(defn daily-rets
  "Daily returns"
  [data]
  (apply-rows data
            ((fn [n m] (- (/ n m) 1)) (vec (incanter.core/$ (- i 1) [:not :Date] data)))
            1
            (fn [n m] (and (not-any? nil? [n m]) (> m 0)))))
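
For comparison, here is roughly the same nil-aware logic in plain Python, with the :Date column already stripped. It is a sketch, not a port of apply-rows: safe_div mirrors the guard predicate used above, returning None for a cell that fails the guard is an assumption, and the prices are made up to keep the arithmetic exact.

```python
def safe_div(n, m):
    """n / m, or None when either value is missing or the divisor is not positive."""
    if n is None or m is None or m <= 0:
        return None
    return n / m

def normalize(rows):
    """Divide each row by the first row, cell by cell."""
    first = rows[0]
    return [[safe_div(n, m) for n, m in zip(row, first)] for row in rows]

def daily_rets(rows):
    """Return (today / yesterday) - 1 for every row after the first."""
    result = []
    for prev, cur in zip(rows, rows[1:]):
        ratios = [safe_div(n, m) for n, m in zip(cur, prev)]
        result.append([None if r is None else r - 1 for r in ratios])
    return result

# Made-up prices; None marks a day on which the second symbol did not trade
prices = [[10.0, 20.0], [15.0, None], [30.0, 25.0]]
normdat = normalize(prices)
dailyrets = daily_rets(prices)
```

Like the NumPy one-liner, daily_rets returns one row fewer than its input.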

With the helper functions done, running the tutorial is almost declarative.

If you wanna take a look at the whole thing together, here’s the gist. I may create a repo later.

Please remember that NumPy is much faster than this Clojure code, since it links against BLAS/LAPACK libraries.
