In autumn of 2008 it became clear that the “hidden risks” that Taleb and others have been talking about for years are much more than academic rumblings: these risks are real, and they can hit the real world severely.
These hidden risks are related to probability distributions with “fat tails”, serial correlation, positive feedback loops and other signs of nonlinear behaviour in complex systems.
This review tries to show how the existing performance measures deal with these effects.
It also sets a reference frame for our new performance measure called SysQ (System Quality).
Existing Performance Measures
There are many measures that try to estimate the benefits or quality of a trading strategy. We consider here only those measures which don’t need any knowledge about the underlying trading strategy, such as individual trades or positions. All measures on this page work from an equity curve, that is, the daily, weekly or monthly values of total account value. Such a measure can easily be applied to a benchmark or an index as well.
It is relatively easy to measure the “upside” of a trading strategy: it’s simply the profit. The measure should be independent of the starting capital and of the length of the equity history, however, so some scaling is used.
Annualized Compounded Returns
This is the profit of a trading strategy, expressed as a percentage and annualized. It is also known as Compound Annual Growth Rate (CAGR), expressed as a percentage (CAGR%), or as average annual total return (geometric).
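As a minimal sketch of this scaling, the annualized compounded return can be computed from the first and last value of an equity curve. The function name `cagr` and the parameter `periods_per_year` are illustrative choices, not taken from any particular library.

```python
def cagr(equity, periods_per_year):
    """Annualized compounded return of an equity curve.

    equity: account values at equally spaced times (assumed positive).
    periods_per_year: e.g. 252 for daily, 52 for weekly, 12 for monthly data.
    """
    if len(equity) < 2 or equity[0] <= 0:
        raise ValueError("need at least two values and a positive start value")
    years = (len(equity) - 1) / periods_per_year
    # Total growth factor, scaled to a one-year growth rate.
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0


# Two yearly steps of +10% each give a CAGR of 10%.
print(cagr([100.0, 110.0, 121.0], periods_per_year=1))
```

Because only the ratio of end value to start value enters, the result does not depend on the starting capital, and the exponent removes the dependence on the length of the history.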
Risk is the “downside” of a trading strategy. Unlike profit, it is not very clear how the “riskiness” of a strategy can be quantified.
Maximum drawdown is easy to calculate precisely. It tells us something about the exact history of a system’s equity curve. But this number is not very stable. Usually the max. drawdown depends on the exact sequence of a small number of trades. If you remove a single symbol from your portfolio or change the parameters of your strategy ever so slightly, a big change in max. drawdown can occur. Because the max. drawdown depends on such a small set of numbers, it also does not tell us much about the future. Usually future drawdowns are worse than past drawdowns.
Furthermore, the drawdown observed depends on the length of your backtest/simulation/data series. With 5 years of data you expect to see a P80 drawdown, with 10 years of data a P90 drawdown, and so forth.
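The calculation can be sketched as follows, assuming drawdown is measured as the fractional decline from the running peak of the equity curve. The function name `max_drawdown` is hypothetical.

```python
def max_drawdown(equity):
    """Largest fractional decline from a running peak (0.25 means -25%)."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest value seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that peak
    return worst


curve = [100.0, 120.0, 90.0, 130.0, 117.0]
print(max_drawdown(curve))                         # driven by the drop 120 -> 90
print(max_drawdown([100.0, 120.0, 130.0, 117.0]))  # same curve without the 90
```

Dropping the single value 90 from this toy curve changes the result from 25% to 10%, which illustrates how strongly the number depends on a few data points.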
Standard Deviation of Returns
Standard Deviation (SD) is a well-established measure in statistics; it works well for normally distributed values. Unfortunately, trading returns are not normally distributed, and for sufficiently fat-tailed distributions the SD is, strictly speaking, not even defined. The good thing about SD is that it takes all values into account, which results in a stable measure. The result is somewhat “abstract”, though, because it does not relate directly to the experienced behaviour of a trading system.
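A minimal sketch of the calculation from an equity curve is shown below. The square-root-of-time annualization is a common convention that assumes returns without serial correlation, which is exactly one of the effects this review is concerned with, so the annualized figure should be read with that caveat. The function names are illustrative.

```python
import math
import statistics


def simple_returns(equity):
    """Period-to-period fractional returns of an equity curve."""
    return [b / a - 1.0 for a, b in zip(equity, equity[1:])]


def annualized_volatility(equity, periods_per_year):
    """Sample SD of returns, scaled by sqrt(periods per year).

    Assumption: returns are serially uncorrelated; otherwise the
    sqrt scaling misstates the annual figure.
    """
    sd = statistics.stdev(simple_returns(equity))
    return sd * math.sqrt(periods_per_year)


print(annualized_volatility([100.0, 110.0, 99.0, 108.9], periods_per_year=12))
```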
The Ulcer Index is the root mean square (RMS) of all drawdown values. It contains much more information than the max. drawdown alone and captures some of the non-linearities of a trading system.
Ulcer Index on Wikipedia
Peter Martin’s Ulcer Index page
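The RMS-of-drawdowns idea can be sketched as below. This version measures each drawdown in percent from the running peak of the whole series; published definitions often use a fixed lookback window for the peak instead, so treat the details as an assumption.

```python
import math


def ulcer_index(equity):
    """Root mean square of percentage drawdowns from the running peak."""
    peak = equity[0]
    squared = []
    for value in equity:
        peak = max(peak, value)
        drawdown_pct = 100.0 * (peak - value) / peak
        squared.append(drawdown_pct ** 2)
    return math.sqrt(sum(squared) / len(squared))


# Drawdowns are 0%, 10%, 0%, 20%, so the index is sqrt((100 + 400) / 4).
print(ulcer_index([100.0, 90.0, 100.0, 80.0]))
```

Every point of the curve contributes, so both the depth and the duration of drawdowns enter the number.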
Value at Risk (VaR)
VaR is both easy to misunderstand, and dangerous when misunderstood. Mr. Einhorn compared VaR to “an airbag that works all the time, except when you have a car accident.”
VaR on Wikipedia
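One way to see the point of the airbag comparison is a historical (empirical) VaR, sketched here under a simple convention: the loss that is exceeded only by the worst (1 - confidence) share of observations. Quantile conventions differ between sources, and the function name is hypothetical.

```python
def historical_var(returns, confidence=0.95):
    """Empirical VaR as a positive loss fraction.

    Convention (an assumption): skip the worst round((1 - confidence) * n)
    observations and report the next-largest loss.
    """
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    skip = int(round((1.0 - confidence) * len(losses)))
    skip = min(skip, len(losses) - 1)
    return losses[skip]


calm = [0.01] * 8 + [-0.02, -0.03]
crash = [0.01] * 8 + [-0.02, -0.50]
print(historical_var(calm, 0.90), historical_var(crash, 0.90))
```

Both series report the same 90% VaR of 2%, although the worst loss is 3% in one and 50% in the other: VaR says nothing about how bad things get beyond the threshold.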
While profit and risk tell us something about a system, neither measure is very informative on its own. If you change your position size or leverage, both profit and risk change with it. For this reason it is helpful to concentrate on Risk/Reward ratios in assessing the relative merits of a trading system.
Calmar Ratio / Sterling Ratio / MAR Ratio / SOL Quotient
This is simply annualized return divided by max. drawdown. All the problems of max. drawdown are present in these ratios as well. The original definition of the Calmar Ratio uses three years of data. The MAR Ratio is very similar, but it uses all available data. The Sterling Ratio is also stated as APR / MaxDD or APR / (MaxDD + 10%).
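The two formulas can be written down directly. In this sketch both inputs are fractions (0.20 for 20%), the function names are illustrative, and the 10% cushion of the Sterling variant is exposed as a parameter.

```python
def mar_ratio(annual_return, max_dd):
    """Annualized return divided by maximum drawdown (both as fractions)."""
    if max_dd <= 0:
        raise ValueError("max_dd must be positive")
    return annual_return / max_dd


def sterling_ratio(annual_return, max_dd, cushion=0.10):
    """Variant with a fixed cushion added to the drawdown."""
    return annual_return / (max_dd + cushion)


print(mar_ratio(0.20, 0.25))       # 20% return, 25% max. drawdown
print(sterling_ratio(0.20, 0.25))  # same inputs with the 10% cushion
```

The Calmar and MAR Ratio differ only in the data window used to compute the two inputs, so the same function covers both.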
The Sharpe Ratio is the well-accepted standard. Everybody states the Sharpe Ratio for their system, so it is a good number for comparing systems. But it is not very clearly defined. The formula contains the “risk-free rate”, but everybody seems to use a different value for it. This component also makes the Sharpe Ratio change when position sizes are changed, even though a mere change in position size does not change the overall quality of a system at all!
The returns measured can be of any frequency (i.e. daily, weekly, monthly or annually), as long as they are normally distributed, as the returns can always be annualized. Herein lies the underlying weakness of the ratio: not all asset returns are normally distributed. Abnormalities like kurtosis (fatter tails and higher peaks) or skewness of the distribution can be problematic for the ratio, as standard deviation is less effective when these problems exist. Sometimes it can be downright dangerous to use this formula when returns are not normally distributed.
The biggest problem with the Sharpe Ratio is the way it is usually calculated: from monthly returns. That way only a relatively small set of numbers goes into the calculation. If a drawdown happens in the middle of a month and has recovered by month-end, it is not reflected at all by this version of the Sharpe Ratio.
Sharpe Ratio on Wikipedia
The Sharpe Ratio by W.F.Sharpe
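The dependence on position size can be checked with a small sketch. It uses one common form of the ratio (mean excess return over its sample SD, annualized with the square root of the number of periods) and models a larger position naively by doubling every return without any financing cost; both choices are assumptions for illustration.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_per_period=0.0, periods_per_year=12):
    """Annualized Sharpe Ratio from periodic returns (one common convention)."""
    excess = [r - risk_free_per_period for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


monthly = [0.02, -0.01, 0.03, 0.01]
doubled = [2.0 * r for r in monthly]  # same system, twice the position size

# With a zero risk-free rate the ratio is unchanged by position size.
print(sharpe_ratio(monthly), sharpe_ratio(doubled))
# With a non-zero rate the two numbers differ.
print(sharpe_ratio(monthly, 0.005), sharpe_ratio(doubled, 0.005))
```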
Following the psychology of traders, the Sortino Ratio takes only downward movements into account, because they “hurt” more than upward movements. While this may make sense intuitively, it ignores half of the available information. Sharp upward movements in a strategy are just as much a sign of high risk as downward movements.
Sortino Ratio on Wikipedia
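The effect of ignoring upward movements can be made concrete with a sketch. It uses a common form of the ratio: mean return above a target, divided by the downside deviation (RMS of shortfalls below the target over all periods). The target of zero and the annualization are assumptions.

```python
import math


def sortino_ratio(returns, target=0.0, periods_per_year=12):
    """Mean excess return over downside deviation, annualized."""
    shortfalls_sq = [min(0.0, r - target) ** 2 for r in returns]
    downside_dev = math.sqrt(sum(shortfalls_sq) / len(returns))
    if downside_dev == 0.0:
        raise ValueError("no returns below target; ratio is undefined")
    mean_excess = sum(r - target for r in returns) / len(returns)
    return mean_excess / downside_dev * math.sqrt(periods_per_year)


base = [0.02, -0.01, 0.03, -0.02]
spiky = [0.02, -0.01, 0.30, -0.02]  # one sharp upward move added
print(sortino_ratio(base), sortino_ratio(spiky))
```

The upward spike raises the ratio because it only increases the mean; the downside deviation is identical for both series.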
Ulcer Performance Index (UPI)
This one uses the Ulcer Index (see above) to create a Sharpe-Ratio-like number. It shares all the benefits of the Ulcer Index.
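A self-contained sketch of such a number follows: annualized return in excess of a risk-free rate, in percent, divided by the RMS of percentage drawdowns. Measuring drawdowns from the running peak of the whole series and the choice of risk-free rate are assumptions, and the function name is hypothetical.

```python
import math


def ulcer_performance_index(equity, periods_per_year, risk_free_pct=0.0):
    """(Annualized return % - risk-free %) / Ulcer Index."""
    # Annualized compounded return in percent.
    years = (len(equity) - 1) / periods_per_year
    annual_pct = 100.0 * ((equity[-1] / equity[0]) ** (1.0 / years) - 1.0)

    # Ulcer Index: RMS of percentage drawdowns from the running peak.
    peak = equity[0]
    squared = []
    for value in equity:
        peak = max(peak, value)
        squared.append((100.0 * (peak - value) / peak) ** 2)
    ulcer = math.sqrt(sum(squared) / len(squared))
    if ulcer == 0.0:
        raise ValueError("no drawdown in the series; ratio is undefined")
    return (annual_pct - risk_free_pct) / ulcer


print(ulcer_performance_index([100.0, 90.0, 110.0, 121.0], periods_per_year=1))
```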
Modigliani Risk-Adjusted Performance or M2
The Modigliani Risk-Adjusted Performance is derived from the widely used Sharpe ratio, but in units of percent return (as opposed to the Sharpe ratio – a dimensionless ratio), which makes it probably more intuitive to interpret.
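In its usual form the measure rescales the strategy to the benchmark's volatility and reports the return it would then have earned. A minimal sketch, with all inputs as annualized fractions and an illustrative function name:

```python
def modigliani_m2(portfolio_return, portfolio_sd, benchmark_sd, risk_free=0.0):
    """Sharpe Ratio times benchmark volatility, plus the risk-free rate."""
    sharpe = (portfolio_return - risk_free) / portfolio_sd
    return risk_free + sharpe * benchmark_sd


# 12% return at 20% volatility, benchmark volatility 15%, risk-free rate 2%.
print(modigliani_m2(0.12, 0.20, 0.15, risk_free=0.02))
```

The result is a return figure that can be compared directly with the benchmark's own return, which is what makes it easier to interpret than a dimensionless ratio.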
Risk adjusted return on capital (RAROC)
RAROC is Expected Return / VaR. Because it uses VaR (see above), it shares all of VaR's problems.
The following table summarizes our findings:
| Name | Fat Tails | Serial Corr. | Predictive Power |
| Calmar/Sterling Ratio, MAR | Yes | Yes | Very Bad |
| Robust Sharpe Ratio | No | No | Medium |
| Standard Deviation of Returns | No | No | Good |