Trading Social Media Sentiment Indicators

With social media growing in popularity and finding its way into investment and trading metrics, interest in its ability to predict market movement is growing as well. Here, we review prior research and attempt to answer whether social media sentiment can be a valid predictor of stock and currency returns.

Seminal research from University of Pennsylvania Professor Philip Tetlock distinguishes between the information and the sentiment contained in a news or media release. New news, or information, such as earnings or economic developments, has permanent effects on stock returns. Sentiment effects are the opposite: the media may simply be repeating news that is already in the public domain. This is called a media effect, and it still moves stock returns, but only temporarily. The major point of his sentiment research is that negative news, when it is new information, can cause greater risk-adjusted returns in stocks up to one quarter after the release. The research also shows that the stock returns produced by so-called sentiment effects were quickly reversed.

In this case, the focus is on social media as quantified by the five-day raw score from Social Market Analytics (SMA). The raw score is an unweighted, unbiased statistic of social media activity on Twitter and StockTwits. It is our proxy for the so-called Social Media Factor (SMF), which can separate real news and information from pure sentiment. A positive five-day raw score generates a buy signal, and a negative five-day raw score generates a sell signal. Because this requires sophisticated text analytics, the best way to replicate the SMF indicator is either to license the data from SMA or to gather end-of-day data from similar data vendors.

Data & Methodology
One possible way to emulate the five-day raw score is to simply compile data from Sentdex online and take a five-day moving average of its data on the spot euro.
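The moving-average emulation and the sign-based signal rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample readings are hypothetical, the sample values are not real Sentdex or SMA data, and treating an exactly zero score as "no signal" is our choice, not something the vendors specify.

```python
from typing import List, Optional


def five_day_average(scores: List[float], window: int = 5) -> List[Optional[float]]:
    """Trailing moving average; None until `window` observations exist."""
    averages: List[Optional[float]] = []
    for i in range(len(scores)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(sum(scores[i + 1 - window : i + 1]) / window)
    return averages


def signal_from_score(score: Optional[float]) -> int:
    """+1 (buy) for a positive score, -1 (sell) for a negative one, 0 otherwise."""
    if score is None or score == 0:
        return 0  # assumption: no trade when the score is missing or exactly zero
    return 1 if score > 0 else -1


# Hypothetical daily sentiment readings, for illustration only.
daily_sentiment = [0.2, -0.1, 0.4, 0.3, 0.1, -0.6, -0.5, -0.4]
smoothed = five_day_average(daily_sentiment)
signals = [signal_from_score(s) for s in smoothed]
print(signals)
```

The first four days produce no signal because a full five-day window is not yet available.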

Sentdex provides multiple sentiment indicators for different markets and trading instruments. In its explanation of the data, the company also shows the difference between keyword usage and weighting and a natural language processing (NLP) method.

The data from SMA comes from a proprietary NLP algorithm that weights and synthesizes posts from StockTwits and Twitter. This matters because the source of the data is purely social media, which makes it our best proxy for sentiment with regard to an SMF, as expounded in prevailing research. The Sentdex source, on the other hand, generates an indicator from news sources. That is also relevant, but it is more a proxy for the sentiment given news, as described and illustrated in “How it works” (page 65).

An alternative method is to test sample data from Accern for free via Quantopian. That company provides a sample set of news sentiment scores on the entire stock universe for the period of August 2012 to February 2014.

Proprietary Sentiment Indicator
One of the baseline indicators, the Options Sentiment Indicator (OSI), is our noise trading indicator, calculated from the total open interest of the underlying options. For this dataset, we use the Euro ETF (FXE). The OSI differs from traditional put-call ratios in that it can explain all option buying and option writing activity, whereas a traditional put-call ratio can only explain the buying activity, without accounting for positions that were sold to open.

The indicator generates a buy signal when there is net call buying and net put writing, effectively a synthetic long futures contract bought by the collective market. It generates a sell signal when there is net put buying and net call writing, equivalent to a synthetic short futures contract. The calculation can be seen later on in this article.
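The signal logic alone (not the OSI calculation itself) can be sketched as follows. The inputs are hypothetical: we assume the day's option activity has already been reduced to two signed numbers, where a positive value means net buying and a negative value means net writing. How those numbers are derived from open interest is not reproduced here.

```python
def osi_signal(net_call_flow: float, net_put_flow: float) -> int:
    """Map signed net option flows to a trade signal.

    Assumed sign convention (hypothetical): flow > 0 is net buying,
    flow < 0 is net writing (selling to open).
    Returns +1 for buy, -1 for sell, 0 when the flows do not line up.
    """
    if net_call_flow > 0 and net_put_flow < 0:
        return 1   # net call buying + net put writing: synthetic long
    if net_put_flow > 0 and net_call_flow < 0:
        return -1  # net put buying + net call writing: synthetic short
    return 0       # mixed or flat activity: no signal (our assumption)


print(osi_signal(1200, -800))   # calls bought, puts written
print(osi_signal(-500, 900))    # calls written, puts bought
print(osi_signal(300, 400))     # both bought: no synthetic position
```

Returning 0 for mixed activity is a design choice for this sketch; the article does not say how such days are treated.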

Backtested Strategy Results
“Keeping score” (below) summarizes the portfolio statistics for each sentiment trading strategy on the euro. The backtest period is from May 2014 to May 2016. Notice that the five-day raw score indicator has the best risk-adjusted performance of all the data sets.

“Taking the next step” (below) shows the portfolio statistics when incorporating an added sentiment indicator based on whether the ETF’s Net Asset Value (NAV) is trading at a premium or discount. This is an additional sentiment trading indicator called the Closed-End-Fund Discount (CEFD) indicator, which is ideal for discounting sentiment as a trading tactic.

For the CEFD indicator to generate a buy signal, the NAV should be trading at a premium to the previous day’s average price. When the NAV is at a discount to the previous day’s average price, the indicator generates a sell signal. The indicator’s formula is:

CEFD (%) = ln(Midpoint Price_{t-1} ÷ NAV_{t-1})
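As a rough illustration, the formula and the signal rule described above might be coded like this. Two points are assumptions on our part: the log ratio is multiplied by 100 because the formula is labeled in percent, and the signal mapping follows our reading of the text (NAV above the previous day's midpoint price means a negative CEFD and a buy). The function names and sample prices are hypothetical.

```python
import math


def cefd_percent(prev_midpoint: float, prev_nav: float) -> float:
    """Log ratio of the prior day's midpoint price to the prior day's NAV.

    Scaling by 100 is an assumption based on the "(%)" label.
    """
    return 100.0 * math.log(prev_midpoint / prev_nav)


def cefd_signal(cefd: float) -> int:
    """Signal under our reading of the rule in the text.

    cefd < 0: NAV above price (NAV at a premium)  -> +1 buy
    cefd > 0: NAV below price (NAV at a discount) -> -1 sell
    """
    if cefd < 0:
        return 1
    if cefd > 0:
        return -1
    return 0


# Hypothetical prior-day values for an ETF.
value = cefd_percent(prev_midpoint=108.90, prev_nav=109.10)
print(round(value, 4), cefd_signal(value))
```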

This indicator is contrarian, unlike the other two indicators, which are coinciding indicators. Notice in the results from “Taking the next step” that the risk-adjusted performance is much the same for all the data sets except the Euro ETF, which exceeds the five-day raw score + CEFD indicator.

Differentiating Sentiment Indicators
Because the five-day raw score and the OSI are uncorrelated (a correlation of -0.019), they are useful in gauging sentiment for the tactical trader.
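For readers who want to check the correlation between two indicator series on their own data, a minimal Pearson correlation in plain Python looks like the following. Which paired daily series to feed in (signals, scores, or strategy returns) is left open here, since the article does not specify it; the sample inputs are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired daily readings of two indicators.
indicator_a = [0.3, -0.2, 0.1, 0.4, -0.1]
indicator_b = [1, 1, -1, -1, 1]
print(round(pearson(indicator_a, indicator_b), 3))
```

A value near zero, like the -0.019 reported above, would indicate that the two series carry largely independent information.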

One way to look at this is that the performance of your proxy for the SMF is your main sentiment equity line; any extra profitability coming from the coinciding OSI is simply noise, which should be a reason for profit taking. In other words, when backtesting and tracking the performance of both indicators, whenever the OSI equity line touches or crosses over that of the five-day raw score, most of your profits from trading sentiment are noise. The sentiment trader should therefore be ready to take profits when the OSI is outperforming the five-day raw score (Social Media Factor).
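The profit-taking rule can be expressed as a comparison of two equity lines. The sketch below assumes simple cumulative (non-compounded) daily returns for each strategy and flags every day on which the OSI line is at or above the raw-score line; both the accumulation method and the sample returns are assumptions for illustration.

```python
from itertools import accumulate
from typing import List, Sequence


def equity_curve(daily_returns: Sequence[float]) -> List[float]:
    """Running sum of daily returns (assumption: no compounding)."""
    return list(accumulate(daily_returns))


def take_profit_flags(raw_equity: Sequence[float], osi_equity: Sequence[float]) -> List[bool]:
    """True on days when the OSI equity line touches or exceeds the raw-score line."""
    return [osi >= raw for raw, osi in zip(raw_equity, osi_equity)]


# Hypothetical daily strategy returns.
raw_score_returns = [0.010, 0.005, -0.002, 0.000]
osi_returns = [0.002, 0.003, 0.012, 0.001]

flags = take_profit_flags(equity_curve(raw_score_returns), equity_curve(osi_returns))
print(flags)
```

In this made-up sample, the OSI line overtakes the raw-score line on the third day, which under the rule above would be the cue to take profits.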

The five-day raw score indicator has a positive correlation (0.4) with the underlying FXE. The greater this number, the stronger the tendency for prices to move with sentiment. This also gives rise to a contrarian viewpoint: if the raw score were to reach an extreme, with a near-perfect correlation, chances are that prices have topped.

In this case, the correlation is moderately positive. Yet the correlation for the OSI is slightly negative, making it ideal to implement as a coinciding indicator rather than a contrarian one. We would therefore expect the two indicators to repel each other whenever they come close; all things remaining equal, the OSI shouldn’t be outperforming the five-day raw score with regard to sentiment.

In the flow chart, you will notice the expressions “information given news” and “sentiment given news.” It’s important to note that the former can be construed as a leading indicator, whereas the latter is more of a lagging indicator. Sentiment given news does not produce sustainable excess returns and adds no real information that can be discounted; it is simply noise. Once again, it must be pointed out that the OSI is still able to profit from the noise, capturing profitable temporary price swings as a trading program.

OSI Summary
The OSI is an intraday indicator based on end-of-day data without momentum; each day’s buy or sell signal is valid for that day only. The backtest results show a winning percentage at or above 50% with the OSI, which is in line with previous statistical tests of the OSI across all population datasets. The options data for constructing the daily trade signals were compiled from Market Data Express, summarizing the total open interest for FXE.

When this indicator is combined with the five-day raw score and the CEFD indicator, the backtest results improve significantly, with a winning percentage of 58%, albeit with far fewer trade signals.
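The article does not spell out how the three indicators are combined. One plausible rule, consistent with the drop in the number of trade signals, is to trade only when all three agree. The sketch below implements that assumed rule together with a winning-percentage calculation; the function names and the sample data are hypothetical.

```python
from typing import List, Optional, Sequence


def combined_signal(raw_score: int, osi: int, cefd: int) -> int:
    """Trade only on unanimous agreement (an assumed combination rule)."""
    if raw_score == osi == cefd and raw_score != 0:
        return raw_score
    return 0


def win_rate(signals: Sequence[int], day_returns: Sequence[float]) -> Optional[float]:
    """Share of trades whose direction matched the sign of that day's return."""
    trades = [(s, r) for s, r in zip(signals, day_returns) if s != 0]
    if not trades:
        return None  # no trades were taken
    wins = sum(1 for s, r in trades if s * r > 0)
    return wins / len(trades)


# Hypothetical daily signals (+1 buy, -1 sell, 0 none) and same-day returns.
raw_signals = [1, 1, -1, -1, 1]
osi_signals = [1, -1, -1, 1, 1]
cefd_signals = [1, 1, -1, -1, -1]
returns = [0.004, -0.002, 0.003, 0.001, 0.002]

combined: List[int] = [
    combined_signal(r, o, c) for r, o, c in zip(raw_signals, osi_signals, cefd_signals)
]
print(combined, win_rate(combined, returns))
```

Requiring agreement filters out most days, which mirrors the trade-off described above: fewer signals in exchange for a potentially higher hit rate.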

In short, a sentiment prediction model can profit from the noise, and when combined with a noise trading indicator such as the OSI, our proxy for the SMF shows improved results.