During the past year we have written several stories on using machine learning for trading. Developing trading programs using artificial intelligence is not easy, not only because it’s hard work to develop something predictive but also because the best trading platforms offer limited tools for machine learning.
Other programming languages have tools for time series forecasting and machine learning. The top two are R and Python. R does not have a good interface for developing and testing trading systems the way traders would like, even though many libraries attempt to allow backtesting of trading systems and charting of market data. Python backtesting is somewhat better, and Python also has many machine learning libraries, but its statistical forecasting libraries are not as good as R’s.
One solution is that several trading platforms have added a native interface to R; failing that, you could always pass information through files. One platform with a native R interface is InfoReach. Another is the TradersStudio Turbo 2017 release, where it is available through an add-in.
In the next few articles we will show you how R can be used to develop machine learning and modeling trading technologies, and use them to backtest and trade trading systems.
The end of the average trader
In 2015 Ray Dalio’s $165 billion hedge fund, Bridgewater Associates, started an artificial-intelligence unit with six people. This team is led by David Ferrucci, who joined Bridgewater at the end of 2012 after leading the International Business Machines (IBM) engineers who developed Watson, the computer that beat human players on “Jeopardy.” This new unit at Bridgewater will have deep pockets to apply prediction technology to the markets, which adapt to changing market conditions; it’s a new paradigm that could mean the end of the average trader.
Developing adaptive systems and machine learning for trading systems is very expensive for two reasons. First, the code for the machine learning methods, such as neural network, wavelet and deep learning algorithms, has to be developed. This software takes high-level math and programming skills to write. Developing a plug-in for Excel or TradeStation from an algorithm in an academic paper could take several months, even using world-class minds. The algorithm alone could cost up to $100,000 to develop, and it might not work for you. Ideally, you have a collection of dozens of tools, which would cost millions. This approach is ideal for large hedge funds like Bridgewater because it allows for better integration between the learning algorithms and the trading signals.
Quantitative investment firms, including $24 billion Two Sigma Investments and $25 billion Renaissance Technologies, are increasingly hiring programmers and engineers to expand their artificial-intelligence staffs. Machine learning gives hedge funds a competitive advantage in markets where trading has been handicapped by rich asset prices, according to Gustavo Dolfino, CEO of recruitment firm WhiteRock Group. Dolfino says, “Machine learning is the new wave of investing for the next 20 years and the smart players are focusing on it.”
The other expensive skill set needed is the knowledge engineer with market and trading knowledge. They need to use the tools and understand the algorithms but not to the level of the developers who developed the code.
The cost of building algorithms has slowed down research. This is why R has become so valuable. R has thousands of open source libraries built by a community of developers. If a firm had to develop these libraries itself it would cost tens of millions of dollars, but by using R it can tap into them and only has to worry about the knowledge engineering and system development.
The R Programming Language is now one of the hottest programming languages for machine learning and trading.
R offers libraries for machine learning, statistical analysis, fractal and wavelet analysis, natural language processing and more. R has a learning curve, but when you consider the advanced modeling tools it provides, it’s worth it. R backtesting is very primitive. Because of this, R’s best use for traders is either to do analysis that helps with system development or to produce output: the results of market studies performed in R can be used as input data for a strategy. Some trading platforms can use R scripts. We currently have R available for TradersStudio as an add-in. At the time of this writing it’s in beta, with a release planned for sometime in 2017.
These are free sources for learning R. There are also many paid courses for learning R, and DataCamp offers reasonably priced services with many good courses on R and machine learning.
Tools of the traders for all
Now everyone can have the tools of the trade used by the top hedge funds by using R and, to a point, Python, although R has the best interface to outside programs, which allows better integration between trading strategies and the intelligence. These tools range from state-of-the-art time series analysis tools like hybrid ARIMA/GARCH models to wavelets, machine learning methods, a host of neural network algorithms, rule induction and evolutionary algorithms from genetic programming to swarm technology. In addition, there are chaos theory and game theory modeling tools and hidden Markov models.
Even advanced time series modeling tools can take trading where it has not been before. TradersStudio has developed a hybrid ARIMA/GARCH system. This model predicts the S&P 500 close for the following day minus the present day’s close. This is fine because futures trade after 4 p.m. and the SPY is liquid until about 6 p.m., so it’s possible to get a signal on the close and place a trade a minute or two later. The raw predictions earned profits of more than 2,800 points, showing that this type of technology is important and valuable for traders. Here’s how a machine learning method that builds a recursive tree can be used in trading.
Money grows on trees
Recursive partitioning is a statistical method for multivariable analysis. Recursive partitioning creates a decision tree that strives to correctly classify members of the population by splitting it into sub-populations based on several dichotomous independent variables. The process is termed recursive because each sub-population may in turn be split an indefinite number of times.
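To make the idea concrete, here is a minimal sketch of recursive partitioning in plain Python. It is not the rpart implementation: the Gini impurity criterion, the max_depth stopping rule and the tiny made-up dataset are all assumptions chosen for illustration. Each call looks for the single yes/no split that best separates the classes, then calls itself on the two resulting sub-populations.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature index, threshold) of the split that lowers impurity most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow(rows, labels, depth=0, max_depth=3):
    """Recursively split the population; a leaf is the majority class label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# Made-up toy data: two indicator columns and a buy/sell label per row.
toy_rows = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [4.0, 6.0]]
toy_labels = ["sell", "sell", "buy", "buy"]
toy_tree = grow(toy_rows, toy_labels)
```

On the toy data a single split on the first column separates the two classes, so the recursion stops after one level because both children are already pure.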
Recursive partitioning methods have been developed since the 1980s. Well-known methods of recursive partitioning include Ross Quinlan’s ID3 algorithm and its successors, C4.5 and C5.0, and Classification and Regression Trees. Ensemble learning methods such as Random Forests help to overcome a common criticism of these methods, their vulnerability to overfitting the data, by employing different algorithms and combining their output in some way.
A variation is Cox linear recursive partitioning. These methods can be used to pick stocks on a fundamental basis and judge whether a stock is overvalued, correctly valued or undervalued (we will only use stocks that are undervalued).
These methods can also be used to predict market returns over the next period as categorical variables such as Up Big, Up, Flat, Down and Down Big. Let’s take a look at a simple example. The goal is not to give you the holy grail but to show you how you can start building these models yourself. R is a great language for preprocessing data, but it requires a deeper understanding of R data types. For example, some libraries use matrices or data frames while others use xts, which is based on the zoo library, and you need to pass the correct type. There are functions to do this conversion on the fly. Another option, if you are not an R expert, is to do your preprocessing in TradersStudio or TradeStation. We feel more comfortable doing the data processing in those platforms, and we know how to call all the indicators we need as well as write any new ones we think of. We will create a CSV file, which we will load into R to test these tree algorithms.
We will make life easy and use the TradersStudio print terminal to output the data. We will then save the print terminal to a file. We will do our analysis on weekly S&P 500 data. We will use S&P 500 earnings, Dow Transports and S&P dividends in addition to the S&P 500 price data. Our goal is to predict the change in the S&P 500 one week into the future. We will do this by means of a regression tree using the rpart library in R. Originally we predicted the direction of the S&P 500 using a series of inputs. We failed badly. The tree could not resolve any splits that added information for separating the output classes. The key in developing these tree algorithms is to define the variable being predicted smartly.
The problem is noise. Predicting the one-bar direction is very noisy and we can have whipsaw trades, which would cost extra slippage and commissions. We really don’t want to change positions when the market might move slightly against us, and we also want to filter out noise. These factors have to be balanced against profit, so we should test our target as if we knew it perfectly to see how profitable it would have been. We chose to use the following target.
If (ForClose-Average(ForClose,3,0)>=0) Or (ForClose-Average(ForClose,3,0)>=-.1*Average(ForRange,3,0) And Perfect=1) Then Perfect=1
If (ForClose-Average(ForClose,3,0)<0) Or (ForClose-Average(ForClose,3,0)<=.1*Average(ForRange,3,0) And Perfect=-1) Then Perfect=-1
Let ForClose be the price series for the close shifted one bar into the future. You can see that we don’t change positions when price does not move more than 10% of the range against us one bar in the future.
Because of the noise reduction, this target is much easier to predict and would still make crazy money if we could predict it perfectly. Backtesting from Jan. 10, 1980 to Dec. 9, 2016, without slippage and commission, we would make more than 23,000 points if we knew this target perfectly. Knowing the price change perfectly one bar in the future makes more than 26,700 points, about 12% more, but is much harder to predict. We will use this new target for the rest of this analysis.
We used S&P 500 weekly data, S&P 500 earnings, S&P 500 dividends and DJ Transports for our model. We named this file DataWkPred.csv. We will then use it and the rpart R package to implement a recursive tree. We will use 80% of our data for training and 20% for testing as out-of-sample data. (Both sets of code are available online.)
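For readers who want to experiment outside TradersStudio, here is a rough Python sketch of this kind of target. It follows the description in the text (keep the current position unless the forward close moves more than 10% of the average range against it) rather than transcribing the two statements literally. The function name, the use of high minus low for ForRange and the way the first position is seeded are our assumptions for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def perfect_target(close, high, low, lookback=3, band=0.1):
    """Return one entry per bar: +1 (long), -1 (short) or None where undefined.

    ForClose and ForRange are taken to be the close and the high-low range
    shifted one bar into the future (an assumption about the original series).
    """
    for_close = close[1:]
    for_range = [h - l for h, l in zip(high[1:], low[1:])]
    target = []
    state = 0  # 0 means no position has been established yet
    for t in range(len(for_close)):
        if t < lookback - 1:
            target.append(None)  # not enough bars for the average
            continue
        window = slice(t - lookback + 1, t + 1)
        diff = for_close[t] - mean(for_close[window])
        threshold = band * mean(for_range[window])
        if diff > threshold:
            state = 1
        elif diff < -threshold:
            state = -1
        elif state == 0:
            state = 1 if diff >= 0 else -1  # seed the first position by sign
        # otherwise the move is inside the band, so the position is kept
        target.append(state)
    target.append(None)  # the last bar has no future bar to look at
    return target

# Made-up prices, each bar given a range of 2 points.
demo_close = [99.0, 100.0, 101.0, 102.0, 101.45, 99.0, 100.3]
demo_high = [c + 1.0 for c in demo_close]
demo_low = [c - 1.0 for c in demo_close]
demo_target = perfect_target(demo_close, demo_high, demo_low)
```

In the made-up series the fourth bar sees a small move below the average that stays inside the band, so the long position is held; only the larger drop on the next bar flips the target to short.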
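The 80/20 split can be sketched in a few lines of Python. We assume here that the split is chronological, with the most recent 20% of bars held out, which is the usual choice for time series but is not spelled out above. The column names and values in the sample are made-up placeholders, since the actual layout of DataWkPred.csv is not shown in this article.

```python
import csv
import io

# Made-up placeholder rows standing in for DataWkPred.csv.
SAMPLE_CSV = """Date,RSI_Z,PriceMA,Target
2016-01-08,-1.2,0.97,-1
2016-01-15,-1.5,0.95,-1
2016-01-22,-0.9,0.96,1
2016-01-29,-0.3,0.98,1
2016-02-05,-0.8,0.96,-1
2016-02-12,-1.0,0.95,1
2016-02-19,0.1,0.98,1
2016-02-26,0.6,1.00,1
2016-03-04,1.1,1.02,1
2016-03-11,1.3,1.03,1
"""

def load_rows(handle):
    """Read every CSV record into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))

def chronological_split(rows, train_fraction=0.8):
    """Keep the oldest rows for training and the newest for out-of-sample testing."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

rows = load_rows(io.StringIO(SAMPLE_CSV))
train_rows, test_rows = chronological_split(rows)
```

With a real file you would pass an open file handle to load_rows instead of the in-memory sample. Splitting by position rather than at random keeps future bars out of the training set.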
This code outputs a lot of information. We will study the more interesting and useful information from this output (see “Explaining the output,” below).
The complexity parameter (CP) is the amount by which the relative error changes if a split is made. The number of splits (nsplit) starts at zero, where the total number of nodes is 0+1 = 1, which is the parent node; nsplit = 1 indicates that one split has been made, i.e., the parent node has been divided and the total number of nodes becomes 1+1 = 2. Relative error across the splits is measured relative to the parent node. The parent node is considered to have an error of 1.000, assuming the highest number of misclassifications.
The first line of the table shows nsplit = 0, which indicates the parent node. The relative error in this case is 1.0. The value of the complexity parameter is 0.18969849, which is the reduction in relative error when one split is made (nsplit = 1). The relative error changes to 1 - 0.18969849 = 0.8103015, which is evident from the second line. The output is shown up to the point where the value of the complexity parameter becomes very small, in this case nsplit = 4, because beyond it there is only a marginal reduction in error. The ten-fold cross-validation error (xerror) is computed by dividing the whole dataset into 10 samples. Each sample in turn is held out as the test set (10% of the data) while the model is trained on the other nine (90%). The error is then calculated on the held-out sample and averaged over the 10 samples.
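The cross-validation procedure can be sketched in plain Python as follows. This is an illustration, not what rpart does internally: it uses contiguous folds for simplicity and returns a raw misclassification rate rather than an error scaled to the parent node. The fit and predict arguments are hypothetical callables that you supply.

```python
def k_fold_indices(n, k=10):
    """Yield (train indices, test indices) for k contiguous folds over n rows."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def cross_validated_error(rows, labels, fit, predict, k=10):
    """Average held-out misclassification rate over k folds (needs at least k rows)."""
    fold_errors = []
    for train, test in k_fold_indices(len(rows), k):
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        wrong = sum(predict(model, rows[i]) != labels[i] for i in test)
        fold_errors.append(wrong / len(test))
    return sum(fold_errors) / len(fold_errors)

def fit_majority(train_rows, train_labels):
    """Toy 'model' for the demo: remember the most common training label."""
    return max(set(train_labels), key=train_labels.count)

def predict_majority(model, row):
    """Always predict the remembered label, whatever the inputs are."""
    return model

# Made-up data: 25 rows that are all labeled "buy".
demo_rows = [[float(i)] for i in range(25)]
demo_labels = ["buy"] * 25
demo_error = cross_validated_error(demo_rows, demo_labels, fit_majority, predict_majority)
```

Each row lands in exactly one test fold, so every observation is scored once by a model that never saw it during training.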
In this case error is calculated at each split. It is also measured relative to the parent node. Standard error is generally used to obtain the optimal number of splits where the standard error is minimum and 1-SE is maximum.
This procedure is followed because initially the standard error decreases with the number of splits but after a particular split the Standard Error increases. This split value can be considered as the optimum value of split. However, there is another rule of thumb used to determine the optimum number of splits and prune the tree, which is relative error + SE < xerror.
It can be seen from the above examples that according to the ‘1-SE’ rule the optimum number of splits is 2, because the SE value at nsplit = 3 is higher than the SE value at nsplit = 2.
By the rule of thumb, the optimum number of splits should be taken as 3, because at this stage relative error + SE is less than xerror. Variable importance is also output when we run the R code for the recursive tree.
RSI_Z  PriceMA  DJTrans_Osc  Close.Open  SP_Osc
   35       23           17          13      13
We also have a lot of information about how the nodes are split and what critical values are used. This information tells us how much each split helps with the separation of the output class, and at the leaf level it gives us the accuracy of the system. Each variable can be used multiple times in the same tree. We also produce an analysis of the output distribution of our data. For example, 57% of all cases are for output class “buy.” It also creates an output that can be translated into a tree (see “Money tree,” below).
The output of the code also gives us all types of information about how predictive each split is and how much information is gained. This code uses the rpart.plot function, which outputs a nicely drawn tree with more detail (see “Trimming the tree,” below).
Pruning is a technique used to reduce the size of a decision tree while retaining the parts that contribute most to classifying instances. Too large a tree can overfit the data and classify new observations poorly. Too small a tree might not contain all the structural information. The R code includes pruning code. The prune code removes two leaves, reducing the tree from seven leaves to five, and improves its robustness.
A confusion matrix, or error matrix, is a table that categorizes predictions according to whether they match actual values. One of the table’s directions indicates the possible categories of predicted values and the other the categories of actual values. When the predicted value matches the actual value it is called a correct classification. These fall on the diagonal of the confusion matrix. The off-diagonal elements indicate the incorrect predictions (see “Prediction statistics,” below).
For the class Buy: The rate of correct classifications (true positive rate) = 174/205 = 0.8488. This is also known as sensitivity. The rate of incorrect classifications is known as the false positive rate = 71/158 = 0.4494.
For the class Sell: The rate of correct classifications (true negative rate) = 87/158 = 0.5506. This is also known as specificity. The rate of incorrect classifications is known as the false negative rate = 31/205 = 0.1512.
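These rates are simple ratios of the four cells of the confusion matrix, as the short Python sketch below shows using the counts quoted above. The function and key names are ours; the overall accuracy line is an extra ratio derived from the same four counts.

```python
def classification_rates(true_pos, false_neg, false_pos, true_neg):
    """Compute the standard rates from the four confusion-matrix cells."""
    actual_pos = true_pos + false_neg   # all bars that really were "buy"
    actual_neg = true_neg + false_pos   # all bars that really were "sell"
    total = actual_pos + actual_neg
    return {
        "sensitivity": true_pos / actual_pos,          # true positive rate
        "false_negative_rate": false_neg / actual_pos,
        "specificity": true_neg / actual_neg,          # true negative rate
        "false_positive_rate": false_pos / actual_neg,
        "accuracy": (true_pos + true_neg) / total,
    }

# Counts from the article: 174 of 205 buys and 87 of 158 sells were classified correctly.
rates = classification_rates(true_pos=174, false_neg=31, false_pos=71, true_neg=87)
for name, value in rates.items():
    print(f"{name}: {value:.4f}")
```

Sensitivity and the false negative rate always sum to one, as do specificity and the false positive rate, so only one number from each pair needs to be tracked.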
Machine learning in trading is not simply the hot new thing, but an essential tool for traders to be successful in today’s markets. Here, we showed how we can use recursive trees. In future installments we will cover other methods that will help you keep your edge in trading the markets.