Posted by Team Topstep, January 30, 2020

How to Test Strategies and Become a Better Trader With Backtesting

What is Backtesting?

Backtesting is the practice of testing a trading strategy against historical data to gauge its statistical validity. While the practice has various flaws and biases, it can provide you with additional confidence in your strategy and serve as a simple way to quickly test out any ideas about price behavior you may come up with.

Backtesting takes many forms, from the simple (testing a few indicator signals with basic stop-loss and profit-taking rules) to the complex (using order flow data to automate market making, which is basically what high-frequency traders do).

At its core, backtesting aims to quantify the historical expectancy of a trade signal. While possessing many flaws, backtesting is the first and most crucial step in identifying whether a strategy idea is worth pursuing further. Generally, if an idea returns an unfavorable backtest, further optimization, testing, and live trading won’t look much better.
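To make "expectancy" concrete, here is a minimal sketch of one common way to compute it: the average profit or loss per trade, split into its win and loss components. The function name and the trade values are hypothetical and made up purely for illustration.

```python
def expectancy(trade_pnls):
    """Average P&L per trade: win rate * avg win - loss rate * avg loss."""
    if not trade_pnls:
        raise ValueError("need at least one trade")
    wins = [p for p in trade_pnls if p > 0]
    losses = [p for p in trade_pnls if p <= 0]  # breakeven trades count as losses
    win_rate = len(wins) / len(trade_pnls)
    avg_win = sum(wins) / len(wins) if wins else 0.0
    avg_loss = -sum(losses) / len(losses) if losses else 0.0
    return win_rate * avg_win - (1 - win_rate) * avg_loss


# Hypothetical P&L per trade in dollars (illustrative only)
example_trades = [150.0, -100.0, 200.0, -100.0, -50.0, 300.0]
print(f"Expectancy per trade: {expectancy(example_trades):.2f}")
```

A positive number means the signal made money on average over the sample; it says nothing yet about whether that result is repeatable.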

Dr. Ernest Chan, a CTA and the author of several books on quantitative analysis, presented the typical backtesting workflow in a talk at QuantCon 2018. I think this graphic is an excellent way for beginners to grasp what backtesting is, even without understanding quant jargon.

[Figure: Backtesting Workflow]

Backtesting as a Discretionary Trader

The majority of funded traders are discretionary, meaning they might have rough mechanical criteria for placing trades, but several other qualitative factors play into their market analysis.

Perhaps your overall trading style is based on market profile and auction market theory, but unquantifiable factors like sentiment, intuition, and forward-thinking invariably creep into your decision making. Maybe you see that everyone on FinTwit is super bullish, giving you a near-term short or neutral bias, which may affect where you look for setups. Things like these cannot be backtested or act as mechanical inputs in a system.

For this reason, many discretionary traders completely write off backtesting as a practice reserved only for quants and those trying to sell you a trading system.

I think throwing out the baby with the bathwater here is a mistake. While backtesting will never act as an accurate representation of your results, testing your ideas about market behavior is crucial to progress as a trader. Sure, you may have found a couple of setups that work well for you, but markets change, and setups go in and out of favor. Trend following, which created countless millionaires throughout the 80s and 90s, has seen a decade of underperformance. Those trading a classic Turtle-style system without any adaptations or other strategies have been crushed.

Continuously testing new ideas and patterns, even if you don’t apply them mechanically, can tip you off to anomalies in market behavior that you can profit from. This is a practice that super-traders like Linda Raschke and Adam Grimes (both of whom we’ve profiled on the blog) are huge advocates of.

Even better, they recommend backtesting by hand rather than coding an automated system. Given that most discretionary traders aren’t programmers, this probably comes as a relief.

A Basic Example of Backtesting

Let’s build the most straightforward trading system possible to illustrate the concept of backtesting: a long-only moving average crossover system on the S&P 500.

Perhaps you’ve observed that the price of the S&P 500 acts positively when the 50-day moving average crosses above the 200-day moving average. Your intuition tells you that it’s a good trade, but backtesting is how you separate your gut feelings and intuition from the data.
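For those who do want to code it, the crossover rule described above can be sketched in a few lines of plain Python. This is a rough sketch under stated assumptions, not a production backtester: the prices are synthetic (a seeded random walk, not real S&P 500 data), there are no fees or slippage, and the function names are my own.

```python
import random


def sma(prices, window):
    """Simple moving average; None until enough history exists."""
    out, running = [], 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            running -= prices[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


def backtest_crossover(prices, fast=50, slow=200):
    """Long while the fast SMA is above the slow SMA, flat otherwise.

    Yesterday's averages decide today's position, so the test never
    trades on information it would not have had at the time.
    Returns the equity curve, starting at 1.0.
    """
    fast_ma, slow_ma = sma(prices, fast), sma(prices, slow)
    equity = [1.0]
    for t in range(1, len(prices)):
        f, s = fast_ma[t - 1], slow_ma[t - 1]
        in_market = f is not None and s is not None and f > s
        daily_return = prices[t] / prices[t - 1] - 1.0
        growth = (1.0 + daily_return) if in_market else 1.0
        equity.append(equity[-1] * growth)
    return equity


def synthetic_prices(n_days=2000, seed=42):
    """Made-up price series (random walk with drift), for illustration only."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n_days - 1):
        prices.append(prices[-1] * (1.0 + rng.gauss(0.0003, 0.01)))
    return prices


prices = synthetic_prices()
equity = backtest_crossover(prices)
print(f"Strategy growth of $1: {equity[-1]:.2f}")
print(f"Buy-and-hold growth of $1: {prices[-1] / prices[0]:.2f}")
```

Lagging the signal by one day is a deliberate choice: using today's moving averages to take today's return is an easy way to sneak look-ahead bias into a test.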

[Figure: Portfolio Growth]

The strategy is profitable, but that result alone raises a plethora of questions, like the following:

How do other options compare?
Does this additional risk justify itself through commensurate additional returns?
How does it compare to the risk-free interest rate?
Did I introduce bias in my backtesting?
How do I know that I identified an actual repeatable pattern and not randomness?
Is my sample size of trades large enough to reach statistical significance?
How does this strategy compare to buying and holding SPY or a similar portfolio which requires fewer fees and work?
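A couple of these questions can be put into numbers. Here is a minimal sketch of two standard yardsticks: the Sharpe ratio (return in excess of the risk-free rate per unit of volatility) and maximum drawdown. I'm assuming daily returns and 252 trading days per year; the sample returns are hypothetical.

```python
import math
import statistics


def sharpe_ratio(daily_returns, annual_risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from a list of daily returns."""
    rf_daily = annual_risk_free / periods_per_year
    excess = [r - rf_daily for r in daily_returns]
    volatility = statistics.stdev(excess)
    if volatility == 0:
        raise ValueError("returns have zero volatility")
    return statistics.mean(excess) / volatility * math.sqrt(periods_per_year)


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Hypothetical daily returns (illustrative only)
sample_returns = [0.01, -0.01, 0.02, 0.0, -0.005, 0.007]
print(f"Sharpe vs. 0% risk-free: {sharpe_ratio(sample_returns):.2f}")
print(f"Sharpe vs. 5% risk-free: {sharpe_ratio(sample_returns, 0.05):.2f}")
print(f"Max drawdown: {max_drawdown([1.0, 1.2, 0.9, 1.5]):.0%}")
```

Running the same two functions on a buy-and-hold equity curve gives you a like-for-like comparison against simply holding SPY.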

Kevin Davey’s Backtesting Cycle

In a Chat With Traders interview, Kevin Davey spoke about his strategy development process, and it basically looks like a backtesting cycle.

Run a quick backtest with the most straightforward possible criteria. Right off the bat, this tells you if there’s any validity to your idea.
From there, optimize the strategy through techniques like Walk Forward Optimization (WFO).
Run random simulations like Monte Carlo analysis to see how different trade sequences will affect the expectancy of the strategy.
“The Shelf”: At this point, Davey watches the strategy in real time without trading it. He periodically checks on it to see if it behaves the way he expects. If it does for some time, he begins to allocate capital to the strategy for live trading.
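To give a feel for the Monte Carlo step, here is a rough sketch of one common variant: resampling the backtest's trades with replacement to build many alternative trade sequences, then looking at the spread of drawdowns. This is my own illustration, not Davey's actual procedure, and the trade values and starting equity are hypothetical.

```python
import random


def max_drawdown_from_pnls(pnls, starting_equity):
    """Worst peak-to-trough drop in account equity, in dollars."""
    equity = peak = starting_equity
    worst = 0.0
    for pnl in pnls:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def monte_carlo_drawdowns(pnls, starting_equity=10_000.0, n_sims=1000, seed=0):
    """Resample trades with replacement; return sorted max drawdowns."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        sequence = rng.choices(pnls, k=len(pnls))
        results.append(max_drawdown_from_pnls(sequence, starting_equity))
    return sorted(results)


# Hypothetical backtest trades in dollars (illustrative only)
backtest_trades = [150.0, -100.0, 200.0, -100.0, -50.0, 300.0, -120.0, 80.0]
drawdowns = monte_carlo_drawdowns(backtest_trades)
median = drawdowns[len(drawdowns) // 2]
pct95 = drawdowns[int(0.95 * (len(drawdowns) - 1))]
print(f"Median simulated drawdown: ${median:.0f}")
print(f"95th percentile drawdown: ${pct95:.0f}")
```

If the 95th percentile drawdown is more than you could stomach, the single drawdown figure from the original backtest was probably flattering you.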

Common Backtesting Mistakes

Overfitting

Overfitting is typically the first bias that backtesting newbies run into. It goes kind of like this:

They look for a basic trading strategy and test the results.
They tweak the parameters a bit and observe better performance.
They continue tweaking until the Sharpe ratio and max drawdown are out-of-this-world good.
The live trading goes nothing like the backtest.

The mistake here is that you’re not optimizing for real market behavior but for the quirks of that specific dataset. You’ve tweaked the parameters so much that you’re now just finding (likely random) patterns in the historical data that have no real predictive value for future prices.

How to tell if you’re overfitting:

The strategy performs radically differently with small changes in parameters. If all it takes is slightly shifting your lookback period to alter the returns dramatically, it’s likely you just found an anomaly in your data.
You can’t put into words what market tendency you’re taking advantage of. If you don’t know why your strategy works, you likely tested parameters until one showed excellent returns.
You’re training your strategy over the same set of data that you’re testing it on.
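The first warning sign is easy to check mechanically. Below is a minimal sketch: score the parameter you picked alongside its neighbors and flag the result if performance falls off a cliff nearby. The scoring functions are toy stand-ins for a real backtest, and the 50% tolerance is an arbitrary threshold I chose for illustration.

```python
def neighborhood_scores(score_fn, best_param, step, n_neighbors=3):
    """Score the chosen parameter and the values on either side of it."""
    params = [best_param + k * step for k in range(-n_neighbors, n_neighbors + 1)]
    return {p: score_fn(p) for p in params}


def looks_fragile(scores, best_param, tolerance=0.5):
    """True if any neighbor scores below tolerance * the chosen score."""
    best = scores[best_param]
    if best <= 0:
        raise ValueError("chosen parameter must have a positive score")
    return any(s < tolerance * best for p, s in scores.items() if p != best_param)


# Toy stand-ins for "run a backtest with this lookback, return a score"
def spiky(lookback):
    return 2.0 if lookback == 50 else 0.1  # great at exactly 50, poor elsewhere


def plateau(lookback):
    return 1.0 - 0.01 * abs(lookback - 50)  # degrades gently away from 50


print(looks_fragile(neighborhood_scores(spiky, 50, 5), 50))
print(looks_fragile(neighborhood_scores(plateau, 50, 5), 50))
```

A strategy that only works at one exact lookback looks like the spiky case; a robust one looks more like the plateau.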

“The backtest is fit when the strategy is profiting from a signal, and overfit when the backtest is profiting from noise.” – Marcos Lopez de Prado

Not Using Out-of-Sample Data

Most algorithmic trading experts agree that it’s smart to have at least two sets of data to test on. The first is the set you train on (the in-sample data). You test several parameters and strategies on this dataset, looking for tendencies. Once you’ve found what looks like a profitable system, you check the strategy on your second dataset (the out-of-sample data), which played no part in building it.

The intention here is to avoid fitting your strategy to work on one set of historical prices.
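In code, the idea looks something like the sketch below: split the data chronologically, choose parameters using only the first part, then score the winner once on the part it has never seen. The 70/30 split is a common convention rather than a rule, and `score_fn` is a hypothetical stand-in for whatever backtest you run.

```python
def split_chronologically(data, in_sample_fraction=0.7):
    """Earlier data for training, later data for testing. No shuffling."""
    cut = int(len(data) * in_sample_fraction)
    return data[:cut], data[cut:]


def pick_then_verify(data, candidate_params, score_fn, in_sample_fraction=0.7):
    """Choose the best parameter in-sample, then score it out-of-sample."""
    in_sample, out_of_sample = split_chronologically(data, in_sample_fraction)
    best = max(candidate_params, key=lambda p: score_fn(in_sample, p))
    return best, score_fn(in_sample, best), score_fn(out_of_sample, best)


# Toy stand-in for a backtest: a parameter scores well when it sits
# close to the average of the data it is tested on.
def toy_score(data, param):
    return -abs(sum(data) / len(data) - param)


toy_data = [1.0] * 5 + [5.0] * 5  # the "market" changes halfway through
best, in_score, out_score = pick_then_verify(toy_data, [1, 3, 5], toy_score, 0.5)
print(best, in_score, out_score)
```

The toy data is rigged so the parameter that looks perfect in-sample does badly out-of-sample, which is exactly the situation this check exists to catch.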

Survivorship Bias

One of my favorite anecdotes to illustrate survivorship bias comes from World War II, when Allied forces looked to reinforce the armor on their planes by studying the damaged aircraft that returned.

Essentially, the military observed that the red zones in the diagram below took the most damage, so they decided to reinforce those areas on the planes. That was until a statistician pointed out that they were only studying the aircraft that survived, and that the planes that were shot down would give more accurate data on the aircraft’s failure points.

[Figure: Airplane]

Survivorship bias is one of the biggest mistakes newer quants make, and one of the most common pitfalls relates to delisted stocks that were previously in indexes like the S&P 500.

According to Tucker Balch of Lucena Research, 58 stocks that were at one point in the S&P 500 index have been delisted since 2008. That means plenty of quants with bad data providers are looking at backtesting results that don’t include those delisted stocks, which is likely to alter the results of their backtests dramatically.
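A tiny worked example shows how large the distortion can be. The tickers and returns below are entirely made up; the point is only the arithmetic of dropping the losers from the sample.

```python
def average_return(returns_by_ticker):
    """Equal-weighted average of total returns."""
    return sum(returns_by_ticker.values()) / len(returns_by_ticker)


# Hypothetical total returns over a test window (illustrative only)
full_universe = {"AAA": 0.40, "BBB": 0.15, "CCC": -1.00, "DDD": 0.25, "EEE": -0.80}
delisted = {"CCC", "EEE"}  # missing from a survivorship-biased data feed

survivors_only = {t: r for t, r in full_universe.items() if t not in delisted}

print(f"Survivors only: {average_return(survivors_only):+.1%}")
print(f"Full universe:  {average_return(full_universe):+.1%}")
```

The same hypothetical basket goes from a healthy gain to a loss once the delisted names are put back in.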

While this issue is more present among traders of individual stocks, it can still creep into futures trading, like in the analysis of Commodity Trading Advisor (CTA) performance. One may observe that mean-reversion-focused funds are performing better than trend-following funds without accounting for the fact that several mean-reversion funds just blew up, leaving only the most skilled funds in existence, inflating the perceived returns of the strategy (hypothetical example).

Not Forward Testing

Randomness abounds in financial markets. No amount of data analysis will allow you to escape randomness. This is why even if you avoid all of the common backtesting pitfalls, you can still end up with a strategy that tests excellently but fails in live trading.

Kevin Davey, a winner of multiple trading championships, has a simple approach to forward-testing, which we briefly touched upon when we went over his “backtesting cycle.”

Once a strategy Davey has developed has qualified and reached the end of his cycle, he’ll let it sit “on the shelf” for a while. He periodically returns to the strategy to see how the forward tests compare to the backtests. Only if the forward tests match up to the backtests over a large enough sample size will he begin trading it live.

Small Sample Size

This is a bias I still observe in published materials by supposed experts adored by the trading community, speaking on stage at conferences. They’ll publish a trading system in a book with only 20 trades in their backtest. That’s hardly enough to determine the win/loss ratio, let alone determine if the system is repeatedly profitable.

The conventional wisdom is that a backtest needs at least 100 historical trades even to be considered valid. As the sample size gets larger, you can place a bit more confidence in metrics like maximum drawdown, because the larger the sample, the less your measured statistics tend to stray from their true values.
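You can see why 20 trades isn't enough with a quick calculation. The sketch below uses the textbook normal approximation for a 95% confidence interval around a win rate; it's a rough gauge rather than a rigorous test, and the trade counts are hypothetical.

```python
import math


def win_rate_interval(wins, trades, z=1.96):
    """Approximate 95% confidence interval for the true win rate."""
    if trades <= 0:
        raise ValueError("need at least one trade")
    p = wins / trades
    margin = z * math.sqrt(p * (1 - p) / trades)
    return max(0.0, p - margin), min(1.0, p + margin)


# Same 60% observed win rate, two different sample sizes
for wins, trades in [(12, 20), (60, 100)]:
    low, high = win_rate_interval(wins, trades)
    print(f"{trades} trades: true win rate plausibly between {low:.0%} and {high:.0%}")
```

With 20 trades, a 60% observed win rate is still consistent with a true win rate below 50%. With 100 trades, the lower bound sits just above 50%, which is better but hardly a wide margin.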

The solution to a strategy with a small sample size is to put the strategy on the proverbial “shelf,” as Kevin Davey calls it, see how it plays out in forward testing, and allow the sample size to grow.

Backtesting Platforms

I divide backtesting platforms into two camps, those that require programming knowledge and those that do not. The sky’s the limit for those that need programming, with your only limitations being your coding skills and the language itself.

A done-for-you system like FinViz Elite will have plenty of essential criteria, more than enough for most. Still, there may be an “if, then” statement you want to use, but the platform doesn’t support that combination of criteria.

Here are a few done-for-you backtesting platforms (meaning they have easy-to-use wizards or only require simple pseudocode).

Backtesting by hand
FinViz Elite
NinjaTrader
TradeStation
MultiCharts
Portfolio Visualizer

Here are some platforms that require more coding but have a higher ceiling in terms of what they’re capable of (some aren’t limited to data analysis in financial markets):

Stata
Amibroker
Matlab
Sierra Chart
A slew of Python libraries

Further Study Material

Quantitative Trading Fundamentals

I think Ernie Chan is the best at distilling the complexities of algorithmic trading down to simple-to-grasp paragraphs. He writes a blog at epchan.blogspot.com and has penned a few quant trading classics like Quantitative Trading: How to Build Your Own Algorithmic Trading Business.

Kevin Davey, who has been mentioned in this article several times, is also an excellent resource. Davey has a blog, courses, a membership program, and several books. As a primer, I recommend his interview with Chat With Traders.

Sourcing Ideas

When you run out of ideas to test, you don’t really have anything to do. Personally, I keep a whiteboard next to my computer and quickly jot down any ideas I have. I find it works better than a Google Doc or iPhone note because it’s within my sight all the time. As you read the ideas of other traders, talk in chat rooms, or watch webinars, you’ll always be coming up with fun ideas to test.

However, sometimes, it’s good to “borrow” ideas from others. Here are a few places to source ideas:

Street Smarts: High Probability Short Term Trading Strategies by Linda Raschke and Larry Connors
- The book is old but supplies tons of ideas for you to test. The rules in the book probably aren’t profitable on their own, but they offer an excellent jumping-off point from which to tweak.
Academic papers
Futures.io
Reddit Algo Trading Subreddit

Final Thoughts

I think it’s fitting to end this post with this quote from Dimitris Melas, head of research at MSCI: “I’ve never seen a bad backtest.”

There’s an interesting Dunning-Kruger phenomenon at play when it comes to backtesting. It’s usually those who are brand new to the concept who have the utmost confidence in it. They have yet to grasp the statistical concepts that make it a flawed practice, and they assume that if they can find a holy grail backtest, then they’ve basically found a blank check.

[Figure: Dunning-Kruger Effect]

It’s typically the most experienced quants who distrust a backtest and see it more as a confirmation tool (confirm if the strategy is even worth the time spent on looking into it) than a validation tool.

Whether you plan to automate your strategies or use backtesting for research purposes, keep in mind that the human brain is imperfect and seeks pleasure. We will unconsciously go out of our way to create a favorable backtest because of the spike in dopamine it provides us. There’s no real way to eradicate the bias, so keep your BS-detector on at all times regarding backtesting your trading strategies.