The Coursera course on Computational Investing started last week, and I’ve been taking a look.
What caught my attention is the library used for portfolio construction and management: QSTK, an open-source Python framework built on numpy, scipy, matplotlib, pandas, etc.
Looking at the first tutorial’s source code, I saw an opportunity to migrate the tutorials and libraries to Clojure and get to play a little with Incanter.
I’m going to highlight what I’ve found interesting when migrating the tutorials. I’m assuming you have QSTK installed and the QS environment variable is set, since the code depends on that for data reading.
As part of the initialization process, the tutorial calls a function, getNYSEDays, which retrieves all the days on which there was trading at the NYSE. Migration is straightforward: use Incanter’s read-dataset to read the file into memory and then filter the required range.
Pay attention to the time of day set at 16 hours, the time the NYSE closes; we’ll see it again in unexpected places.
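To make the closing-time detail concrete, here is a minimal Python sketch of the same idea: parse a list of trading dates, stamp each one with 16:00, and keep those inside a range. The sample dates, the MM/DD/YYYY layout, and the `trading_days` name are assumptions for illustration, not QSTK’s actual code.

```python
from datetime import datetime, time

# Hypothetical sample of a trading-calendar file: one MM/DD/YYYY date per line.
# The real file layout is an assumption here.
SAMPLE_DATES = """\
12/29/2011
12/30/2011
01/03/2012
01/04/2012
01/05/2012
"""

def trading_days(lines, start, end, close=time(16, 0)):
    """Return trading timestamps in [start, end], each stamped with the closing time."""
    days = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        day = datetime.strptime(line, "%m/%d/%Y").date()
        stamp = datetime.combine(day, close)  # every trading day is pinned to 16:00
        if start <= stamp <= end:
            days.append(stamp)
    return days

days = trading_days(SAMPLE_DATES.splitlines(),
                    datetime(2011, 12, 30), datetime(2012, 1, 4, 16))
print(days)
```

Note that passing `datetime(2012, 1, 4)` (midnight) as the end of the range would drop January 4 in this sketch, because its stamp is 16:00. That is the kind of surprise the closing time can cause.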
Data Access
QSTK provides a helper class called DataAccess used for reading and caching stock prices.
As you can see here, there’s some data reading happening. We’re going to take a look at these functions, since we’ll need to write them from scratch.
We’re going to separate this into two functions. The first reads symbol data from disk, again using read-dataset, and creates a hash-map indexed by symbol name.
Creating a symbols hash-map of incanter datasets
    (defn read-symbols-data
      "Returns a hashmap of symbols/incanter datasets read from QS data directory"
      [source-in symbols]
      (let [data-dir (str *QS* "/QSData/" source-in "/")]
        (reduce #(assoc %1 %2 (incanter.io/read-dataset (str data-dir %2 ".csv") :header true))
                {}
                symbols)))
Then, if you take a look at voldata in a Python REPL, you can see pretty much what it’s doing.
It grabs the specified column, volume or close, from each symbol dataset and creates a new table with the resulting column renamed as the symbol.
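As a rough illustration of that reshaping, here is a small plain-Python sketch. The nested-dict layout, the symbols, the prices, and the `pivot_column` name are all made up for the example; it is not how QSTK or pandas implement it. Dates with no data for a symbol come out as `None`.

```python
# Hypothetical per-symbol data: {symbol: {date: {column: value}}}.
# Symbols, dates, and prices below are made up for illustration.
symbol_data = {
    "AAA": {"2012-01-03": {"close": 10.0, "volume": 100},
            "2012-01-04": {"close": 11.0, "volume": 150}},
    "BBB": {"2012-01-04": {"close": 20.0, "volume": 300}},  # no row for 01-03
}

def pivot_column(data, dates, column):
    """Build one row per date, with the chosen column of every symbol
    stored under the symbol's name. Missing dates yield None."""
    table = []
    for date in dates:
        row = {"Date": date}
        for symbol, rows in data.items():
            # .get twice: no exception for a missing date or a missing column
            row[symbol] = rows.get(date, {}).get(column)
        table.append(row)
    return table

close = pivot_column(symbol_data, ["2012-01-03", "2012-01-04"], "close")
for row in close:
    print(row)
```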
All the get_data magic happens inside get_data_hardread. It’s an ugly piece of code that makes a lot of assumptions about column names, and even about market closing time. I guess you can only use this library for markets closing at 16 hours local time.
This is where Clojure shines: the original function is almost 300 lines of code. I’m missing a couple of checks, but it’s not bad for a rookie, I think.
The helper function select-value is there to avoid an exception when trying to find stock data for a nonexistent date. The function also returns :Date as a double, since that’s easier to handle later for charting.
Charting
Charting with Incanter is straightforward. There is a subtle difference with Python, since you need to add each series one by one. Here is what Python does to chart multiple series at once:
    newtimestamps = close.index
    pricedat = close.values  # pull the 2D ndarray out of the pandas object
    plt.plot(newtimestamps, pricedat)
We need a little function to solve it with Incanter. Each iteration gets reduced into the next with all the series accumulated in one chart.
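The folding pattern itself is language-independent. Here is a minimal Python sketch of it, using a made-up `Chart` class that only records series labels; it stands in for a real chart object and is purely hypothetical.

```python
from functools import reduce

class Chart:
    """Hypothetical stand-in for a chart object; it only records its series."""
    def __init__(self, first_label):
        self.series = [first_label]

    def add_lines(self, label):
        # Return the chart itself so it can be threaded through reduce.
        self.series.append(label)
        return self

def multi_series_chart(labels):
    """Create the chart from the first series, then fold the rest into it."""
    chart = Chart(labels[0])
    return reduce(lambda acc, label: acc.add_lines(label), labels[1:], chart)

chart = multi_series_chart(["AAA", "BBB", "CCC"])
print(chart.series)
```

The first series creates the chart and acts as the initial accumulator; every remaining series is added to the accumulated chart in turn.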
creates multiple time-series at once
    (defn multi-series-chart
      "Creates a xy-chart with multiple series extracted from column data
      as specified by series parameter"
      [{:keys [series title x-label y-label data]}]
      (let [chart (incanter.charts/time-series-plot :Date (first series)
                                                    :x-label x-label
                                                    :y-label y-label
                                                    :title title
                                                    :series-label (first series)
                                                    :legend true
                                                    :data data)]
        (reduce #(incanter.charts/add-lines %1 :Date %2 :series-label %2 :data data)
                chart
                (rest series))))
Data Mangling
Incanter has a lot of built-in functions and helpers to operate on your data. Unfortunately, I couldn’t use any of the many options for operating on a matrix, or even $=, since the data we’re processing has many nil values for dates on which a stock didn’t trade. A nil raises an exception when treated as a number, which is exactly what to-matrix does when it tries to create an array of Doubles.
There’s one more downside: we need to keep the :Date column as-is when operating on the dataset, so we have to remove it, operate, and add it back afterwards. All of this happens to be a beautiful one-liner in Python.
This attempts a naive normalization dividing each row by the first one.
    normdat = pricedat / pricedat[0, :]
Or the daily return function.
    dailyrets = (pricedat[1:, :] / pricedat[0:-1, :]) - 1
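To show what those two one-liners compute once missing values have to be handled by hand, here is a plain-Python sketch. It assumes missing prices are represented as `None` and that a ratio is only computed when both values are present and the divisor is positive. The function names and the sample prices are made up for the example.

```python
def safe_div(n, m):
    """Divide n by m, or return None when either is missing or m is not positive."""
    if n is None or m is None or m <= 0:
        return None
    return n / m

def normalize(rows):
    """Divide every row by the first row, column by column."""
    first = rows[0]
    return [[safe_div(n, m) for n, m in zip(row, first)] for row in rows]

def daily_returns(rows):
    """Return row[i] / row[i-1] - 1 for every row after the first."""
    out = []
    for prev, cur in zip(rows, rows[1:]):
        ratios = (safe_div(n, m) for n, m in zip(cur, prev))
        out.append([None if r is None else r - 1 for r in ratios])
    return out

# Made-up prices: two symbols, three days, one missing value.
prices = [[10.0, 20.0],
          [11.0, None],
          [12.0, 25.0]]

print(normalize(prices))
print(daily_returns(prices))
```

A missing price on one day leaves two gaps in the daily returns for that symbol: the day itself and the following day, which has no previous value to divide by.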
I ended up writing the iteration and function-applying code from scratch.
Maybe there’s an easier way, but I couldn’t think of one. If you know a better way, please drop me a line!
Now normalization and daily-returns are at least manageable.
Normalization and Daily Returns
    (defn normalize
      "Divide each row in a dataset by the first row"
      [ds]
      (let [first-row (vec (incanter.core/$ 0 [:not :Date] ds))]
        (apply-rows ds (/ first-row) 0
                    (fn [n m] (and (not-any? nil? [n m]) (> m 0))))))

    (defn daily-rets
      "Daily returns"
      [data]
      (apply-rows data
                  ((fn [n m] (- (/ n m) 1)) (vec (incanter.core/$ (- i 1) [:not :Date] data)))
                  1
                  (fn [n m] (and (not-any? nil? [n m]) (> m 0)))))
With the helper functions done, running the tutorial is almost declarative.
If you want to take a look at the whole thing together, here’s the gist; I may create a repo later.
Please remember that NumPy is much faster than Clojure here, since it links against BLAS/LAPACK libraries.