Interactive exploration of VaR backtesting: binomial coverage analysis with the Kupiec test, and the Christoffersen tests (unconditional coverage, independence, conditional coverage)
Risk models promise specific probabilistic guarantees: a 1% VaR should be exceeded only 1% of the time. Backtesting checks whether the model delivers on that promise by comparing forecasts against realized outcomes (Christoffersen 2012, chap. 13; Hull 2023, sec. 11.10).
// ============================================================
// SHARED UTILITIES
// ============================================================
// Seeded PRNG (Mulberry32)
prng = {
  function mulberry32(seed) {
    return function () {
      seed |= 0
      seed = seed + 0x6D2B79F5 | 0
      let t = Math.imul(seed ^ seed >>> 15, 1 | seed)
      t = t + Math.imul(t ^ t >>> 7, 61 | t) ^ t
      return ((t ^ t >>> 14) >>> 0) / 4294967296
    }
  }
  function boxMuller(rng) {
    const u1 = rng(), u2 = rng()
    return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2)
  }
  return { mulberry32, boxMuller }
}
// Standard normal CDF (Abramowitz & Stegun approximation)
normalCDF = x => {
  const a1 = 0.254829592, a2 = -0.284496736, a3 = 1.421413741
  const a4 = -1.453152027, a5 = 1.061405429, p = 0.3275911
  const sign = x < 0 ? -1 : 1
  const z = Math.abs(x) / Math.sqrt(2)
  const t = 1.0 / (1.0 + p * z)
  const y = 1 - (((((a5 * t + a4) * t) + a3) * t + a2) * t + a1) * t * Math.exp(-z * z)
  return 0.5 * (1 + sign * y)
}
// Log-gamma function (Lanczos approximation, g=7)
lgamma = {
  const g = 7
  const coef = [
    0.99999999999980993, 676.5203681218851, -1259.1392167224028,
    771.32342877765313, -176.61502916214059, 12.507343278686905,
    -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7
  ]
  function _lgamma(x) {
    if (x <= 0) return Infinity
    if (x < 0.5) {
      return Math.log(Math.PI / Math.sin(Math.PI * x)) - _lgamma(1 - x)
    }
    x -= 1
    let a = coef[0]
    const t = x + g + 0.5
    for (let i = 1; i < g + 2; i++) a += coef[i] / (x + i)
    return 0.5 * Math.log(2 * Math.PI) + (x + 0.5) * Math.log(t) - t + Math.log(a)
  }
  return _lgamma
}
fmt = (x, d) => x === undefined || isNaN(x) ? "N/A" : x.toFixed(d)
lnBinomPMF = (k, n, p) => {
  if (k < 0 || k > n) return -Infinity
  if (p === 0) return k === 0 ? 0 : -Infinity
  if (p === 1) return k === n ? 0 : -Infinity
  return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
       + k * Math.log(p) + (n - k) * Math.log(1 - p)
}
binomPMF = (k, n, p) => Math.exp(lnBinomPMF(k, n, p))
// Regularized lower incomplete gamma function P(a, x):
// series expansion for x < a + 1, Lentz continued fraction otherwise
regularizedGammaP = {
  function _series(a, x) {
    let sum = 1 / a, term = 1 / a
    for (let n = 1; n < 300; n++) {
      term *= x / (a + n)
      sum += term
      if (Math.abs(term) < 1e-14 * Math.abs(sum)) break
    }
    return sum * Math.exp(-x + a * Math.log(x) - lgamma(a))
  }
  function _cf(a, x) {
    const TINY = 1e-30, EPS = 1e-14
    let b = x + 1 - a, c = 1 / TINY, d = 1 / b, h = d
    for (let i = 1; i <= 300; i++) {
      const an = -i * (i - a)
      b += 2
      d = an * d + b; if (Math.abs(d) < TINY) d = TINY
      c = b + an / c; if (Math.abs(c) < TINY) c = TINY
      d = 1 / d
      const del = d * c
      h *= del
      if (Math.abs(del - 1) < EPS) break
    }
    return 1 - Math.exp(-x + a * Math.log(x) - lgamma(a)) * h
  }
  return (a, x) => {
    if (x <= 0) return 0
    if (x < a + 1) return _series(a, x)
    return _cf(a, x)
  }
}
chi2CDF = (x, df) => x <= 0 ? 0 : regularizedGammaP(df / 2, x / 2)
// Kupiec unconditional coverage LR statistic (chi-square(1) under H0).
// The edge cases m = 0 and m = n are handled in closed form, since the
// pi-hat terms vanish there (0 * log 0 = 0 by convention).
kupiecLR = (n, m, p) => {
  if (m <= 0) return p > 0 ? -2 * n * safeLog(1 - p) : 0
  if (m >= n) return p < 1 ? -2 * n * safeLog(p) : 0
  const piHat = m / n
  return -2 * ((n - m) * safeLog(1 - p) + m * safeLog(p))
       + 2 * ((n - m) * safeLog(1 - piHat) + m * safeLog(piHat))
}
VaR Backtesting
Backtesting compares ex ante VaR forecasts with ex post realized returns. Whenever the loss on a given day exceeds the VaR, we record a violation (or hit):

\[
I_{t+1} = \begin{cases} 1 & \text{if } R_{t+1} < -\mathrm{VaR}^{p}_{t+1} \\ 0 & \text{if } R_{t+1} \ge -\mathrm{VaR}^{p}_{t+1} \end{cases}
\]
We construct the hit sequence \(\{I_{t+1}\}_{t=1}^T\) across \(T\) days. If the VaR model is correctly specified, this sequence should be unpredictable:

\[
H_0 : \quad I_{t+1} \overset{\text{i.i.d.}}{\sim} \text{Bernoulli}(p)
\]
This null hypothesis implies two properties: (1) the average violation rate equals \(p\) (unconditional coverage), and (2) violations are randomly scattered over time (independence).
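Both properties can be checked directly on a concrete hit sequence. Below is a minimal standalone sketch in plain JavaScript (not one of the notebook cells above); the helper names `hitSequence` and `violationRate` are illustrative:

```javascript
// Build the hit sequence from realized returns and one-day-ahead VaR
// forecasts (VaR quoted as a positive number): a hit is R_{t+1} < -VaR.
function hitSequence(returns, varForecasts) {
  return returns.map((r, t) => (r < -varForecasts[t] ? 1 : 0))
}

// Sample estimate of the unconditional coverage, pi-hat = T1 / T.
function violationRate(hits) {
  return hits.reduce((s, h) => s + h, 0) / hits.length
}

// Toy data: four days with a constant 2% VaR; only day 2's loss breaches it.
const hits = hitSequence([0.01, -0.03, 0.005, -0.01], [0.02, 0.02, 0.02, 0.02])
// hits = [0, 1, 0, 0]; violationRate(hits) = 0.25
```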
The unconditional coverage test checks whether the observed violation rate \(\hat{\pi} = T_1/T\) differs from \(p\):

\[
LR_{uc} = -2 \ln \frac{(1-p)^{T_0}\, p^{T_1}}{(1-\hat{\pi})^{T_0}\, \hat{\pi}^{T_1}} \;\sim\; \chi^2_1,
\]

where \(T_1\) is the number of violations and \(T_0 = T - T_1\).
The independence test models the hit sequence as a first-order Markov chain and tests whether the probability of a violation depends on yesterday's outcome. Define \(\pi_{01} = \Pr(I_{t+1}=1 \mid I_t=0)\) and \(\pi_{11} = \Pr(I_{t+1}=1 \mid I_t=1)\). Under independence, \(\pi_{01} = \pi_{11}\):

\[
LR_{ind} = -2 \ln \frac{(1-\hat{\pi})^{T_0}\, \hat{\pi}^{T_1}}{(1-\hat{\pi}_{01})^{T_{00}}\, \hat{\pi}_{01}^{T_{01}}\, (1-\hat{\pi}_{11})^{T_{10}}\, \hat{\pi}_{11}^{T_{11}}} \;\sim\; \chi^2_1,
\]

where \(T_{ij}\) counts transitions from state \(i\) to state \(j\), \(\hat{\pi}_{01} = T_{01}/(T_{00}+T_{01})\), and \(\hat{\pi}_{11} = T_{11}/(T_{10}+T_{11})\). The conditional coverage test combines both statistics: \(LR_{cc} = LR_{uc} + LR_{ind} \sim \chi^2_2\).
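As a standalone sketch (plain JavaScript, separate from the notebook cells, with illustrative names), the Markov-chain machinery of the independence test reduces to counting transitions and comparing two log-likelihoods:

```javascript
// Count transitions T_ij (from state i to state j) in a 0/1 hit sequence.
function transitionCounts(hits) {
  const T = { t00: 0, t01: 0, t10: 0, t11: 0 }
  for (let t = 1; t < hits.length; t++) {
    if (hits[t - 1] === 0) hits[t] === 0 ? T.t00++ : T.t01++
    else hits[t] === 0 ? T.t10++ : T.t11++
  }
  return T
}

// Christoffersen independence LR: restricted model is one pooled violation
// probability; unrestricted model conditions on yesterday's state.
function lrIndependence(hits) {
  const { t00, t01, t10, t11 } = transitionCounts(hits)
  const ln = x => (x > 0 ? Math.log(x) : 0) // 0 * ln(0) terms drop out
  const pi01 = t01 / (t00 + t01) || 0
  const pi11 = t11 / (t10 + t11) || 0
  const pi = (t01 + t11) / (t00 + t01 + t10 + t11) // pooled estimate under H0
  const logL0 = (t00 + t10) * ln(1 - pi) + (t01 + t11) * ln(pi)
  const logL1 = t00 * ln(1 - pi01) + t01 * ln(pi01)
              + t10 * ln(1 - pi11) + t11 * ln(pi11)
  return -2 * (logL0 - logL1) // asymptotically chi-square(1) under H0
}
```

The statistic is nonnegative by construction, since the unrestricted likelihood can never be below the restricted one.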
Why clustering matters. Even with correct average coverage, clustered violations are dangerous. If all losses concentrate in a short period, the risk of bankruptcy is much higher than if violations are scattered randomly. Historical evidence shows that commercial bank VaRs, particularly those based on Historical Simulation, tend to produce exactly this pattern.
Note
Simulation setup. Returns are simulated from a GARCH(1,1) data-generating process: \(R_t = \sigma_t z_t\) with \(z_t \sim N(0,1)\) and \(\sigma^2_{t+1} = \omega + \alpha R_t^2 + \beta \sigma^2_t\). The three VaR methods differ in what they know about this process:
Normal (constant): estimates a single standard deviation from the full sample and assumes constant volatility. This is misspecified because the true volatility varies over time.
Historical Simulation: uses a rolling window of past raw returns to compute the VaR percentile. Also misspecified, as it adapts slowly to volatility changes.
GARCH(1,1): uses the true conditional volatility \(\sigma_t\) from the simulation. This is correctly specified and should produce well-behaved violations.
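The data-generating process above can be sketched in a few lines of plain JavaScript. This is a self-contained illustration, not the notebook's simulation cell: a simple seeded LCG stands in for Mulberry32, and the parameter values in the usage line are arbitrary:

```javascript
// Simulate R_t = sigma_t * z_t, z_t ~ N(0,1), with GARCH(1,1) variance
// sigma^2_{t+1} = omega + alpha * R_t^2 + beta * sigma^2_t.
function simulateGarch({ T, omega, alpha, beta, seed = 1 }) {
  let s = seed >>> 0
  // Numerical Recipes LCG constants; Box-Muller for the normal draws
  const rng = () => ((s = (1664525 * s + 1013904223) >>> 0) / 4294967296)
  const normal = () => Math.sqrt(-2 * Math.log(1 - rng())) * Math.cos(2 * Math.PI * rng())
  let sigma2 = omega / (1 - alpha - beta) // start at the unconditional variance
  const returns = [], sigmas = []
  for (let t = 0; t < T; t++) {
    const sigma = Math.sqrt(sigma2)
    const r = sigma * normal()
    returns.push(r)
    sigmas.push(sigma)
    sigma2 = omega + alpha * r * r + beta * sigma2 // variance update
  }
  return { returns, sigmas }
}

// Illustrative parameters (alpha + beta < 1 keeps the process stationary)
const sim = simulateGarch({ T: 500, omega: 1e-6, alpha: 0.1, beta: 0.85, seed: 42 })
```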
Tip
How to experiment
Try the Normal (constant) method first: it assumes constant volatility and will produce clustered violations when the true volatility spikes. Then switch to GARCH(1,1): because it tracks the true volatility dynamics, violations should be scattered randomly. Compare the test statistics across methods. Increase \(\alpha\) to create more volatile data and observe how the Normal and HS methods deteriorate.
html`<p style="color:#666;font-size:0.85rem;">Each red tick marks a VaR violation. Under a correct model, violations should be scattered randomly. Clustering indicates the model fails when risk is elevated.</p>`
Under a correctly specified VaR model, the number of violations in \(n\) days follows a binomial distribution: \(M \sim \text{Binomial}(n, p)\). The Kupiec likelihood ratio test checks whether the observed number of violations \(m\) is consistent with the promised coverage rate \(p\):

\[
LR = -2 \ln \frac{(1-p)^{n-m}\, p^{m}}{(1-m/n)^{n-m}\, (m/n)^{m}} \;\sim\; \chi^2_1
\]
A critical challenge is the low power of backtests at high confidence levels with limited data. At a 99% VaR with 250 trading days, we expect only 2.5 violations, making it difficult to distinguish a correct model from an incorrect one.
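A back-of-the-envelope calculation makes the power problem concrete. The sketch below (plain JavaScript, separate from the notebook cells; the cutoff of 6 violations is purely illustrative, not the test's actual critical region) compares how often the violation count stays small under a correct model versus one that violates twice as often:

```javascript
// Binomial PMF in log space to avoid factorial overflow
function binomialPmf(k, n, p) {
  let logPmf = k * Math.log(p) + (n - k) * Math.log(1 - p)
  for (let i = 1; i <= k; i++) logPmf += Math.log((n - k + i) / i) // log C(n,k)
  return Math.exp(logPmf)
}

// P(M <= kMax) under Binomial(n, p)
function cumBinom(kMax, n, p) {
  let cum = 0
  for (let k = 0; k <= kMax; k++) cum += binomialPmf(k, n, p)
  return cum
}

// Correct 1% VaR over 250 days: only 2.5 violations expected on average.
const expected = 250 * 0.01
// Suppose the test fails to reject whenever M <= 6 (illustrative cutoff):
const pSmallCorrect = cumBinom(6, 250, 0.01) // correct model: almost always <= 6
const pSmallBad = cumBinom(6, 250, 0.02)     // twice the promised rate: often still <= 6
```

Because both models frequently produce counts in the same small range, the samples are hard to tell apart at this horizon.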
Tip
How to experiment
Compare the power curve for \(n = 250\) versus \(n = 1000\) at the 99% confidence level. With fewer observations the power curve is much flatter, meaning the test struggles to distinguish between models with very different true violation rates. A model with a true violation rate of 3% (three times the promised 1%) may still not be rejected.
// Compute binomial PMF and rejection region
cvData = {
  const data = []
  const rejSet = new Set()
  for (let k = 0; k <= cvMaxK; k++) {
    const pmf = binomPMF(k, cvN, cvP)
    const lr = kupiecLR(cvN, k, cvP)
    const rejected = lr > cvCritical
    if (rejected) rejSet.add(k)
    data.push({ k, pmf, rejected, lr })
  }
  // Find acceptance range
  let minAccept = 0, maxAccept = cvMaxK
  for (let k = 0; k <= cvMaxK; k++) { if (!rejSet.has(k)) { minAccept = k; break } }
  for (let k = cvMaxK; k >= 0; k--) { if (!rejSet.has(k)) { maxAccept = k; break } }
  return { data, rejSet, minAccept, maxAccept }
}
// Power curve: P(reject | piTrue) for different true violation rates
cvPowerData = {
  const { rejSet, maxAccept } = cvData
  const result = []
  const maxPi = Math.min(1, cvP * 5)
  const step = maxPi / 200
  for (let pi = step; pi <= maxPi; pi += step) {
    // Sum PMF over the rejection region within k <= cvMaxK
    let power = 0
    for (const k of rejSet) power += binomPMF(k, cvN, pi)
    // All k > cvMaxK are also in the rejection region:
    // add P(X > cvMaxK) = 1 - sum_{k=0}^{cvMaxK} PMF(k)
    let cumBelow = 0
    for (let k = 0; k <= cvMaxK; k++) cumBelow += binomPMF(k, cvN, pi)
    power += 1 - cumBelow
    result.push({ piTrue: pi, power: Math.min(power, 1) })
  }
  return result
}
html`<div style="display:flex;gap:18px;font-size:0.85rem;margin-top:-6px;flex-wrap:wrap;"> <span><svg width="24" height="10"><line x1="0" y1="5" x2="24" y2="5" stroke="#2f71d5" stroke-width="2" stroke-dasharray="4 2"/></svg> Promised rate p = ${(cvP*100).toFixed(1)}%</span> <span><svg width="24" height="10"><line x1="0" y1="5" x2="24" y2="5" stroke="#d62728" stroke-width="1" stroke-dasharray="4 2"/></svg> Significance level α = ${cvAlpha}</span></div><p style="color:#666;font-size:0.85rem;">The power curve shows the probability of rejecting H₀ as a function of the true violation rate π. Because the Kupiec test is <strong>two-tailed</strong> (rejecting both too few and too many violations), the curve has a characteristic U-shape: power is high on the left (too few violations, e.g., an overly conservative model) and on the right (too many violations, e.g., a model that underestimates risk). The minimum near π = p is where the test has the least ability to detect misspecification. For risk management, the right side is most relevant: how well can the test detect a model that produces more violations than promised?</p>`
References
Christoffersen, Peter F. 2012. Elements of Financial Risk Management. 2nd ed. Academic Press.
Hull, John. 2023. Risk Management and Financial Institutions. 6th ed. John Wiley & Sons.