The Case for Monte Carlo Simulations

When narrative scenarios first became a standard tool for risk management, around the time of the Supervisory Capital Assessment Program (SCAP) in 2009, I was frankly skeptical that the technique would last. Having emerged from academia, I was used to more rigorous methods and had spent years coding up very detailed and complex Monte Carlo experiments. I always thought that simulation methods, aided by advances in computing power, would eventually replace narrative scenarios for stress testing.

It hasn’t happened. Today, while simulation tools are routinely used in the insurance industry, they are far less commonly applied in banking.

Over the years, I have come to appreciate the strengths of narrative scenarios, but my earlier concerns remain. In this column, I want to explain these issues and discuss why Monte Carlo methods may provide a useful augmentation.

Tony Hughes

The main problem with narrative, scenario-based forecasting is that, even with the benefit of hindsight, you can’t tell whether you’re doing it very well. Indeed, since stress scenarios never actually come true, it’s impossible to determine the accuracy of previously constructed forecasts.

After more than a decade of practice, therefore, we do not know which teams are producing the best scenario analytics and we do not know who’s consistently getting it right or wrong. You could argue that scenarios work because no major banks have failed in the past 10 years, but this is a very blunt way to determine the effectiveness of a particular statistical technique.

Though I didn’t recognize this 10 years ago, the strength of scenarios is primarily as a pedagogical device. They give us a common language with which to talk about downside risk and they allow non-technical industry participants to communicate with the quants.

It’s obviously difficult to place a value on these benefits, but they are substantial. Scenario analysis is worthwhile, but we definitely shouldn’t have all our eggs in a single basket – especially one whose wickerwork cannot be properly inspected.

Assessing the Value of Additional Scenarios

The most critical scenario considered by most banks follows a deep, generic recession narrative. This scenario is used in most jurisdictions to aid in the determination of regulatory capital, which is clearly an important calculation for any bank.

Since the global financial crisis (GFC), we have seen a substantial increase in capital for banks, which will provide a greater cushion against a wider range of possible future downside risks. It should be noted, though, that there are many other ways (outside of scenario-based stress testing) that capital levels could have been calculated – and, thus, increased – relative to pre-GFC levels.

The law of diminishing returns certainly applies to scenarios. So, the question is, once we’ve run the severely adverse scenario and capital has been determined, how much additional impact do subsequent scenarios provide?

Given that the narrative-driven forecasts cannot be rigorously analyzed, our ability to draw subtle distinctions between adjacent narratives must be viewed as highly questionable. We cannot construct confidence intervals, so there are no reliable tests of whether any two scenarios are statistically distinct. One could perhaps argue that one or two additional narratives offer some qualitative flavor, provided they are quite distinct from the primary scenario.

Beyond this, it’s difficult to see how these subsequent scenarios will have very much impact on the behavior of the bank, the nature of its portfolio or its future financial performance.

If the additional scenarios are cheap to produce, it may be tempting to keep churning them out, even as the marginal benefit wanes. Access to many scenarios may, however, simply increase levels of confusion for business leaders, undermining any pedagogical benefits provided by the analysis.

Monte Carlo or Bust?

Clearly, there are downsides to the narrative-scenarios approach, but what advantages do Monte Carlo simulations offer – and should they be used to augment traditional stress testing?

In comparison to narrative scenarios, simulations offer significant changes in both the style of analysis and the language we use to communicate. Instead of agonizing over individual variable paths (à la traditional stress tests), the simulation approach builds an engine that rapidly populates the entire distribution of possible future outcomes. Moreover, since the deliverable is in the form of a single distribution, the problem of diminishing returns no longer applies.
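
To make the idea of a simulation engine concrete, here is a minimal Python sketch. Everything in it is an assumption chosen purely for illustration and is not taken from any production model: the mean-reverting unemployment process and its parameters, the linear link from peak unemployment to a loss rate, and the hypothetical function names simulate_losses and tail_value. The Gaussian shocks in particular are a simplification that likely understates tail risk.

```python
import random

def simulate_losses(n_paths=10_000, horizon=8, seed=42):
    """Return a sorted list of simulated portfolio loss rates (percent)."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_paths):
        u = 5.0      # assumed starting unemployment rate, percent
        peak = u
        for _ in range(horizon):
            # Assumed AR(1) process: mean-reverts toward 5% with Gaussian shocks.
            u = max(0.0, 5.0 + 0.9 * (u - 5.0) + rng.gauss(0.0, 0.8))
            peak = max(peak, u)
        # Hypothetical loss link: 1% base loss plus 0.4 points for every
        # point of peak unemployment above 5%.
        losses.append(1.0 + 0.4 * max(0.0, peak - 5.0))
    return sorted(losses)

def tail_value(sorted_values, one_in):
    """Smallest value among the worst 1/one_in share of outcomes."""
    n = len(sorted_values)
    return sorted_values[n - max(1, n // one_in)]

losses = simulate_losses()
for one_in in (2, 10, 100):
    print(f"1-in-{one_in} loss rate: {tail_value(losses, one_in):.2f}%")
```

The deliverable here is the whole sorted list of outcomes. Any percentile of interest can be read from it without running a further scenario.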

Rather than making point forecasts for assessing portfolio performance, simulations instead describe the outcomes using probabilities. The use of distributions and probabilities means, on the possible downside, that we instantly lose a lot of non-technical audience members from the conversation.

However, as long as the simulations are well constructed, we still reap several technical benefits. Most notably, simulations give us a better sense of the uncertainty inherent in our estimates, while also allowing us to better explore interconnections in the portfolio – across a wide variety of layered risks.

Complicating matters, developing useful simulations is extremely challenging. In the context of financial risk, attention will invariably focus on the shape of the downside tail; we know the distribution of economic and financial outcomes will probably be leptokurtic (fat-tailed), but we have only a small amount of data with which to calibrate the probability of tail outcomes.
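
A small sketch may help show why the tail shape matters so much. It compares a normal distribution with a Student-t distribution with four degrees of freedom, rescaled to the same variance. The choice of four degrees of freedom and the helper names student_t and exceedances are assumptions made for illustration only, not a calibrated view of any economic series.

```python
import random

def student_t(rng, df):
    """Draw from a Student-t, rescaled to unit variance (requires df > 2)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    t = z / (chi2 / df) ** 0.5
    return t * ((df - 2.0) / df) ** 0.5

def exceedances(draws, threshold):
    """Count draws more than `threshold` standard deviations below zero."""
    return sum(1 for x in draws if x < -threshold)

rng = random.Random(0)
n = 100_000
normal_draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
fat_draws = [student_t(rng, df=4) for _ in range(n)]

for k in (3, 4, 5):
    print(f"beyond -{k} sd: normal={exceedances(normal_draws, k)}, "
          f"t(4)={exceedances(fat_draws, k)}")
```

The two distributions have the same variance, yet the fat-tailed one should produce far more outcomes beyond three or four standard deviations. With only around 100 annual observations, fewer than one such point would be expected under either assumption, which is one way to see why the data alone struggle to pin down the tail.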

Thinking, for simplicity, about U.S. unemployment rates, we can observe that the highest recorded rate over the past century was 25% during the Great Depression; it hit roughly 15% during the recent pandemic.

Given these data points, what should a one-in-10,000 unemployment rate then look like? I remember seeing a set of macro simulations a few years ago; the maximum observed unemployment rate in a set of 10,000 sims was around 15%, roughly in line with the recently observed pandemic maximum. At the time, the depression-era number was not considered plausible by the producers of the sims, due to labor market deregulation and improvements in mobility and welfare that have been implemented since the 1930s.

To my mind, however, the pandemic doesn’t feel like it was a one-in-10,000 event. The Great Depression doesn’t either, though I didn’t actually experience it.

It would be possible but extremely unlucky to live through a one-in-10,000-level economic calamity twice in a randomly selected century. When you think about extremely rare tail events this way, the maximum unemployment rate in 10,000 simulated future paths should be 40%, 50%, perhaps even 100% – assuming such a dire outcome is even meaningful.
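
The arithmetic behind this intuition can be sketched under two simplifying assumptions that the argument above does not itself state: that "one-in-10,000" is read as an annual probability, and that years are independent of one another. The function name prob_at_least is hypothetical.

```python
from math import comb

def prob_at_least(k, n_years, p_annual):
    """Binomial probability of at least k events in n_years independent years."""
    p_fewer = sum(
        comb(n_years, i) * p_annual**i * (1.0 - p_annual) ** (n_years - i)
        for i in range(k)
    )
    return 1.0 - p_fewer

# Assumed reading: each calamity has a fixed annual probability.
for label, p in (("one-in-10,000", 1e-4), ("one-in-100", 1e-2)):
    print(f"{label}: P(two or more in a century) = {prob_at_least(2, 100, p):.6f}")
```

Under these assumptions, two or more genuine one-in-10,000 events in a single century would occur with a probability of roughly 5 in 100,000, whereas a one-in-100 event would recur within a century about a quarter of the time. Having seen two severe episodes in living memory therefore sits much more comfortably with the second reading than with the first.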

Even though the 99.99th percentile is a bit shaky, you can have rather more confidence in the location of the 90th or perhaps even the 99th percentile using a Monte Carlo simulation. Armed with a decent representation of the distribution of interest, which will normally target credit losses or something similar, an array of analytical options then becomes available.
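
One narrow part of this claim can be illustrated directly: how much a percentile estimate moves from one batch of 10,000 simulations to the next. The sketch below assumes a lognormal stand-in for the loss distribution, chosen only for its skewed tail, and uses hypothetical helper names. It captures sampling noise alone; uncertainty about the calibration of the model itself comes on top of this.

```python
import random
import statistics

def tail_value(sorted_values, one_in):
    """Smallest value among the worst 1/one_in share of outcomes."""
    n = len(sorted_values)
    return sorted_values[n - max(1, n // one_in)]

def quantile_spread(levels=(10, 100, 10_000), n_runs=30, n_sims=10_000):
    """Mean and standard deviation of each tail estimate across independent runs."""
    estimates = {level: [] for level in levels}
    for run in range(n_runs):
        rng = random.Random(run)  # a different seed for every run
        # Assumed stand-in loss distribution: lognormal(0, 1).
        draws = sorted(rng.lognormvariate(0.0, 1.0) for _ in range(n_sims))
        for level in levels:
            estimates[level].append(tail_value(draws, level))
    return {
        level: (statistics.mean(values), statistics.stdev(values))
        for level, values in estimates.items()
    }

spread = quantile_spread()
for level, (mean, sd) in spread.items():
    print(f"1-in-{level}: mean={mean:.2f}, run-to-run sd={sd:.2f} "
          f"({sd / mean:.1%} of mean)")
```

With 10,000 draws per run, the 1-in-10 and 1-in-100 levels (the 90th and 99th percentiles) rest on hundreds or thousands of observations, while the 1-in-10,000 level is simply the single largest draw, so its estimate should vary far more between runs.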

In my view, it is well worth considering the more frequent use of Monte Carlo simulations as a supplement to traditional, narrative scenarios for stress testing. However, we should remain careful when analyzing extreme tail outcomes, given the difficulties faced in capturing these seldom-visited areas of the target distribution.
