Performance Measures for Trading Systems

In the autumn of 2008 it became clear that the “hidden risks” which Taleb and others have been talking about for years are much more than academic rumblings: these risks are real, and they can affect the real world severely.
These hidden risks are related to probability distributions with “fat tails”, serial correlation, positive feedback loops and other signs of nonlinear behaviour of complex systems.
This review tries to show how the existing performance measures deal with these effects.
It also sets a reference frame for our new performance measure called SysQ (System Quality).

Existing Performance Measures

Many measures try to estimate the benefits or quality of a trading strategy. We consider here only those measures which don’t need any knowledge about the underlying trading strategy, such as individual trades or positions. All measures on this page work from an equity curve, i.e. the daily, weekly or monthly values of total account value. Such a measure can therefore easily be applied to a benchmark or an index as well.

Performance Measures

It is relatively easy to measure the “upside” of a trading strategy: it is simply the profit. However, it is important to have a measure that is independent of the starting capital and of the length of the equity curve’s history, so some scaling is used.

Annualized Compounded Returns

This is the profit of a trading strategy, expressed as a percentage and annualized. It is also known as Compound Annual Growth Rate (CAGR), expressed as a percentage (CAGR%), or as average annual total return (geometric).
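
The definition above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `cagr`, the parameter `periods_per_year` and the sample account values are all assumptions made for this example.

```python
def cagr(equity, periods_per_year):
    """Annualized compounded return of an equity curve, as a fraction."""
    if len(equity) < 2 or equity[0] <= 0:
        raise ValueError("need at least two values and a positive starting value")
    # Number of years covered by the curve (one step per period).
    years = (len(equity) - 1) / periods_per_year
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0


# Hypothetical monthly account values covering exactly one year.
monthly_equity = [100, 104, 101, 108, 112, 105, 98, 103, 109, 115, 111, 118, 120]
print(f"CAGR: {cagr(monthly_equity, periods_per_year=12):.2%}")
```

Because only the first and last values and the elapsed time enter the formula, the result is independent of the starting capital and of the sampling frequency.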

CAGR on Wikipedia

Risk Measures

Risk is the “downside” of a trading strategy. Unlike profit, it is not obvious how the “riskiness” of a strategy can be quantified.

Max. Drawdown

Maximum drawdown is easy to calculate precisely, and it tells us something about the exact history of a system’s equity curve. But the number is not very stable: usually the max. drawdown depends on the exact sequence of a small number of trades. If you remove a single symbol from your portfolio or change the parameters of your strategy ever so slightly, a big change in max. drawdown can occur. Because the max. drawdown depends on such a small set of numbers, it also does not tell us much about the future. Usually future drawdowns are worse than past drawdowns.
Furthermore, the drawdown observed depends on the length of your backtest/simulation/data series: the longer the series, the more extreme the drawdown you should expect to see. With 5 years of data you expect to see a P80 drawdown, with 10 years of data a P90 drawdown, and so forth.
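
The instability described above is easy to reproduce. The sketch below computes the maximum drawdown as the largest peak-to-trough decline, expressed as a fraction of the peak. The function name and the sample curve are assumptions for illustration only.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                     # highest value seen so far
        worst = max(worst, (peak - value) / peak)   # decline from that peak
    return worst


# Hypothetical equity curve.
equity = [100, 120, 90, 110, 130, 104]
# The same curve with one single value (the 90) removed.
equity_without_one_point = [100, 120, 110, 130, 104]

print(max_drawdown(equity))
print(max_drawdown(equity_without_one_point))
```

Dropping one value changes the result from a 25% to a 20% drawdown, which shows how strongly the number hangs on a handful of data points.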

Drawdown on Wikipedia

Std.Deviation of Returns

Standard deviation (SD) is a well-established measure in statistics. It is well defined and works well for normally distributed values. Unfortunately, trading returns are not normally distributed, and for sufficiently fat-tailed distributions the SD is, strictly speaking, not even defined. The good thing about SD is that it takes all values into account, which makes it a stable measure. The result is somewhat “abstract”, though, because it does not relate directly to the experienced behaviour of a trading system.
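
As a rough sketch, the SD of returns can be computed from an equity curve as follows. The function names are hypothetical, and the square-root-of-time annualization is itself an assumption: it presumes uncorrelated returns, which is exactly what serial correlation violates.

```python
import math
import statistics


def period_returns(equity):
    """Simple returns between consecutive equity values."""
    return [later / earlier - 1.0 for earlier, later in zip(equity, equity[1:])]


def annualized_volatility(equity, periods_per_year):
    """Sample SD of period returns, scaled by sqrt(periods per year)."""
    returns = period_returns(equity)
    return statistics.stdev(returns) * math.sqrt(periods_per_year)


# Hypothetical monthly values: +10%, -10%, +10%.
equity = [100.0, 110.0, 99.0, 108.9]
print(annualized_volatility(equity, periods_per_year=12))
```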

Standard Deviation on Wikipedia

Ulcer Index

Root mean square (RMS) of all drawdown values. Contains much more information than the max. drawdown alone. Captures some of the non-linearities of a trading system.
Ulcer Index on Wikipedia
Peter Martin’s Ulcer Index page
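
A minimal sketch of the calculation, assuming drawdowns are measured as fractions of the running peak (the Ulcer Index is often quoted in percent, i.e. multiplied by 100). The function names and the sample curve are illustrative assumptions.

```python
import math


def drawdowns(equity):
    """Drawdown from the running peak at every point of the curve."""
    peak = equity[0]
    result = []
    for value in equity:
        peak = max(peak, value)
        result.append((peak - value) / peak)
    return result


def ulcer_index(equity):
    """Root mean square of all drawdown values."""
    dd = drawdowns(equity)
    return math.sqrt(sum(d * d for d in dd) / len(dd))


# Hypothetical equity curve.
equity = [100, 120, 90, 110, 130, 104]
print(ulcer_index(equity))
```

Every point of the curve contributes to the result, so depth and duration of all drawdowns matter, not just the single worst one.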

Value at Risk (VaR)

VaR is both easy to misunderstand, and dangerous when misunderstood. Mr. Einhorn compared VaR to “an airbag that works all the time, except when you have a car accident.”
VaR on Wikipedia
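
The airbag remark can be illustrated with a small sketch. It uses historical simulation, which is only one of several ways to estimate VaR and is chosen here as an assumption. The function name and both return series are made up for the example.

```python
def historical_var(returns, confidence=0.95):
    """Loss threshold not exceeded in `confidence` of the observed periods."""
    ordered = sorted(returns)  # worst return first
    # Round before truncating to absorb floating-point noise in 1 - confidence.
    index = int(round((1.0 - confidence) * len(ordered), 9))
    index = min(index, len(ordered) - 1)
    return -ordered[index]


# Two hypothetical series of 20 returns that differ only in the worst period.
mild_tail = [-0.03, -0.02] + [0.01] * 18
fat_tail = [-0.50, -0.02] + [0.01] * 18

print(historical_var(mild_tail), min(mild_tail))
print(historical_var(fat_tail), min(fat_tail))
```

Both series report the same 95% VaR, although one of them contains a 50% loss. VaR says nothing about how bad things get beyond the threshold.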

Risk/Reward Ratios

While profit and risk tell us something about a system, each measure taken alone is not too interesting. If you change your position size or leverage, both profit and risk will change also. For this reason it is helpful to concentrate on Risk/Reward ratios in assessing the relative merits of a trading system.

Calmar Ratio / Sterling Ratio / MAR Ratio / SOL Quotient

This is simply the annualized return divided by the max. drawdown, so all the problems of max. drawdown are present in these ratios too. The original definition of the Calmar Ratio uses three years of data. The MAR Ratio is very similar, but it uses all available data. The Sterling Ratio is also given as

APR / MaxDD or APR / (MaxDD + 10%)
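
Both variants of the formula can be sketched as below. The helper names, the `penalty` parameter and the sample data are assumptions for illustration. The sketch uses all available data, as the MAR Ratio does.

```python
def cagr(equity, periods_per_year):
    """Annualized compounded return, as a fraction."""
    years = (len(equity) - 1) / periods_per_year
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def return_over_drawdown(equity, periods_per_year, penalty=0.0):
    """APR / (MaxDD + penalty); penalty=0.10 gives the '+10%' variant."""
    denominator = max_drawdown(equity) + penalty
    if denominator == 0.0:
        raise ValueError("no drawdown observed; ratio undefined")
    return cagr(equity, periods_per_year) / denominator


# Hypothetical monthly account values covering one year.
monthly_equity = [100, 104, 101, 108, 112, 105, 98, 103, 109, 115, 111, 118, 120]
print(return_over_drawdown(monthly_equity, 12))                # APR / MaxDD
print(return_over_drawdown(monthly_equity, 12, penalty=0.10))  # APR / (MaxDD + 10%)
```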

Sterling Ratio on Wikipedia
Calmar Ratio on Wikipedia

Sharpe Ratio

Invented in 1966 by William Forsyth Sharpe (who later won the Nobel Memorial Prize) the Sharpe Ratio is widely used to measure the risk reward ratio of investments and trading strategies.

This is the well-accepted standard. Nearly everybody states a Sharpe Ratio for their system, so it is a good number for comparing systems. But it is not very clearly defined: the formula contains the “risk-free rate”, and everybody seems to use a different value for it. This component also makes the Sharpe Ratio change when position sizes are changed, even though a mere change in position size does not change the overall quality of a system at all!
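
The dependence on position size can be shown with a small sketch. It assumes that doubling the position size simply doubles every period return (financing costs are ignored), and the function name, the returns and the risk-free value are all hypothetical.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_per_period=0.0, periods_per_year=12):
    """Annualized mean excess return divided by the SD of excess returns."""
    excess = [r - risk_free_per_period for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


# Hypothetical monthly returns, and the same system at twice the position size.
monthly = [0.02, -0.01, 0.03, 0.01, -0.02, 0.015]
doubled = [2 * r for r in monthly]

# With a zero risk-free rate the ratio ignores position size ...
print(sharpe_ratio(monthly), sharpe_ratio(doubled))
# ... with a non-zero risk-free rate it does not.
print(sharpe_ratio(monthly, 0.003), sharpe_ratio(doubled, 0.003))
```

The choice of risk-free rate therefore affects not only the level of the ratio but also how two differently sized versions of the same system compare.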

Investopedia says:

The returns measured can be of any frequency (i.e. daily, weekly, monthly or annually), as long as they are normally distributed, as the returns can always be annualized. Herein lies the underlying weakness of the ratio – not all asset returns are normally distributed. Abnormalities like kurtosis, fatter tails and higher peaks, or skewness on the distribution can be problematic for the ratio, as standard deviation doesn’t have the same effectiveness when these problems exist. Sometimes it can be downright dangerous to use this formula when returns are not normally distributed.

The biggest problem with the Sharpe Ratio is the way it is usually calculated: from monthly returns. That way only a relatively small set of numbers goes into the calculation. If a drawdown happens in the middle of a month and has recovered by month-end, it is not reflected at all by this version of the Sharpe Ratio.
Sharpe Ratio on Wikipedia
The Sharpe Ratio by W.F.Sharpe
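
The effect of monthly sampling can be made visible with a toy example. The curve below is hypothetical and uses only five “days” per “month” to keep it short. The helper names are assumptions as well.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sample_every(equity, step):
    """Keep every `step`-th value, e.g. month-end values of a daily curve."""
    return equity[::step]


# Hypothetical daily curve: a sharp dip inside the second "month".
daily = [100, 101, 102, 101, 102,
         103, 95, 88, 97, 102,
         104, 105, 104, 106, 107,
         108]
month_end = sample_every(daily, 5)  # [100, 103, 104, 108]

print(max_drawdown(daily))      # sees the dip from 103 to 88
print(max_drawdown(month_end))  # sees no loss at all
```

Any statistic computed from the month-end values alone, including the Sharpe Ratio, is blind to the dip.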

Sortino Ratio

Following the psychology of traders, this ratio takes only downward movements into account, because they “hurt” more than upward movements. While this may intuitively make sense, it means that half of the available information is ignored. If a strategy shows sharp upward movements, this is just as much a sign of high risk as sharp downward movements are.
Sortino Ratio on Wikipedia
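
The “ignored half” can be seen in a short sketch. It assumes a target return of zero and the common definition of downside deviation (root mean square of the shortfalls below the target, averaged over all periods). Names and data are illustrative.

```python
import math


def downside_deviation(returns, target=0.0):
    """RMS of shortfalls below the target; returns above it count as zero."""
    shortfalls = [min(r - target, 0.0) for r in returns]
    return math.sqrt(sum(s * s for s in shortfalls) / len(returns))


def sortino_ratio(returns, target=0.0):
    """Mean return above the target divided by the downside deviation."""
    dd = downside_deviation(returns, target)
    if dd == 0.0:
        raise ValueError("no returns below target; ratio undefined")
    return (sum(returns) / len(returns) - target) / dd


# Two hypothetical series that differ only in one sharp upward move.
calm = [0.02, -0.01, 0.03, 0.01, -0.02, 0.015]
spiky = [0.02, -0.01, 0.30, 0.01, -0.02, 0.015]

print(downside_deviation(calm), downside_deviation(spiky))
print(sortino_ratio(calm), sortino_ratio(spiky))
```

The risk term is identical for both series, so the 30% spike only improves the ratio and is never counted as a warning sign.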

Ulcer Performance Index (UPI)

This one uses the Ulcer Index (see above) in place of the standard deviation to create a Sharpe-Ratio-like number. It shares all the benefits of the Ulcer Index.
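
A minimal sketch, assuming the UPI is taken as the annualized return in excess of a risk-free rate, divided by the Ulcer Index measured as a fraction. The function names, the default risk-free rate of zero and the sample data are assumptions.

```python
import math


def cagr(equity, periods_per_year):
    """Annualized compounded return, as a fraction."""
    years = (len(equity) - 1) / periods_per_year
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0


def ulcer_index(equity):
    """Root mean square of all drawdowns from the running peak."""
    peak, squares = equity[0], []
    for value in equity:
        peak = max(peak, value)
        squares.append(((peak - value) / peak) ** 2)
    return math.sqrt(sum(squares) / len(squares))


def ulcer_performance_index(equity, periods_per_year, risk_free=0.0):
    """Excess annualized return per unit of Ulcer Index."""
    ui = ulcer_index(equity)
    if ui == 0.0:
        raise ValueError("no drawdown observed; ratio undefined")
    return (cagr(equity, periods_per_year) - risk_free) / ui


# Hypothetical monthly account values covering one year.
monthly_equity = [100, 104, 101, 108, 112, 105, 98, 103, 109, 115, 111, 118, 120]
print(ulcer_performance_index(monthly_equity, periods_per_year=12))
```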

UPI on Wikipedia

Modigliani Risk-Adjusted Performance or M2

The Modigliani Risk-Adjusted Performance is derived from the widely used Sharpe ratio, but in units of percent return (as opposed to the Sharpe ratio – a dimensionless ratio), which makes it probably more intuitive to interpret.
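
As a sketch, M2 can be written as the risk-free rate plus the Sharpe ratio multiplied by the standard deviation of a benchmark, which gives a number in return units. The function name, the per-period (non-annualized) convention and both return series are assumptions for illustration.

```python
import statistics


def m2(portfolio_returns, benchmark_returns, risk_free=0.0):
    """Return the portfolio would have earned at the benchmark's volatility."""
    excess = statistics.mean(portfolio_returns) - risk_free
    sharpe = excess / statistics.stdev(portfolio_returns)
    return risk_free + sharpe * statistics.stdev(benchmark_returns)


# Hypothetical monthly returns of a strategy and of its benchmark.
portfolio = [0.04, -0.02, 0.06, 0.02, -0.04, 0.03]
benchmark = [0.01, -0.01, 0.02, 0.005, -0.015, 0.01]

print(f"M2 per month: {m2(portfolio, benchmark):.4%}")
print(f"Benchmark mean per month: {statistics.mean(benchmark):.4%}")
```

The result can be compared directly with the benchmark’s own mean return: a higher M2 means the strategy earned more at the same level of volatility.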

M2 on Wikipedia

Risk adjusted return on capital (RAROC)

RAROC is Expected Return / VaR. Because it uses VaR (see above), it inherits all of VaR’s problems.

RAROC on Wikipedia


The following table summarizes our findings:

Name                             Fat Tails   Serial Corr.   Predictive Power
CAGR                             Yes         Yes            Good
Calmar/Sterling Ratio, MAR       Yes         Yes            Very Bad
Drawdown                         Yes         Yes            Very Bad
Modigliani M2
Robust Sharpe Ratio              No          No             Medium
Sharpe Ratio                     No          No             Good
Sortino Ratio                    No          No             Medium
Standard Deviation of Returns    No          No             Good
Ulcer Index                      Yes         Yes            Good
UPI                              Yes         Yes            Good
VaR                              No          No             Medium

Further Information

Popularity of Performance Measures