SB21-169 wants the disparate-impact test, not the model's internals. Here is the test, and the fix.
A protected-class denial under Colorado-style rules is examined on its outcome gap: are two subgroups declined at materially different rates that the underlying risk does not justify? This harness runs that test on a synthetic decline-decision set, then shows the decoupled-classifier fix compressing the gap. The outcomes are synthetic and seeded; the headline compression result is Jeff's own published CAMH thesis figure, cited as track record.
SYNTHETIC OUTCOMES (seeded PRNG, no real applicants, no Corgi data). The compression from a 35-point gap to roughly 1 point is Jeff's published UofT/CAMH result, not a Corgi number.
Try a scenario: one click runs the test
Fine-tune (optional)
The single model trains one scorer for everyone; the decoupled approach fits per-subgroup calibration so the decline threshold means the same thing for each group. This is the fix Jeff's thesis used.
How much a correlated, non-risk-bearing feature skews the single model's subgroup decline rates. Illustrative dial; in a real book this is whatever your features encode.
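A rough sketch of the two controls above, under loud assumptions: it is not the harness's code and not the thesis architecture. It swaps full per-subgroup calibration for a simpler stand-in, per-subgroup cut-offs chosen to hit a common false-positive rate, and models the dial as a flat score shift on group B. Every name and number here (make_group, threshold_for_fpr, the 0.15 shift, the 0.55 cut-off, the 7% target) is hypothetical.

```python
import random

def fpr(scores, labels, threshold):
    """Share of true negatives (label 0) scored strictly above the threshold."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    return sum(s > threshold for s in negatives) / len(negatives)

def threshold_for_fpr(scores, labels, target_fpr):
    """Per-group cut-off: at most target_fpr of this group's negatives exceed it."""
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    k = int(target_fpr * len(negatives))  # negatives allowed above the cut
    return negatives[k]

def make_group(n, shift, rng):
    """Synthetic scores; `shift` is the assumed non-risk-bearing skew (the dial)."""
    labels = [int(rng.random() < 0.3) for _ in range(n)]
    scores = [rng.gauss(0.6 if y else 0.4, 0.1) + shift for y in labels]
    return scores, labels

rng = random.Random(7)  # seeded, so the series is identical on every run
groups = {"A": make_group(1000, 0.0, rng), "B": make_group(1000, 0.15, rng)}

# Single model: one cut-off applied to everyone.
single = {g: fpr(s, y, 0.55) for g, (s, y) in groups.items()}
# Decoupled stand-in: each group gets the cut-off that yields the same target FPR.
decoupled = {g: fpr(s, y, threshold_for_fpr(s, y, 0.07)) for g, (s, y) in groups.items()}

gap_single = abs(single["A"] - single["B"])
gap_decoupled = abs(decoupled["A"] - decoupled["B"])
print(f"FPR gap, single cut-off: {gap_single:.3f}")
print(f"FPR gap, per-group cut-offs: {gap_decoupled:.3f}")
```

The cut-offs are fitted and scored on the same synthetic rows, so the near-zero decoupled gap is by construction. A real test would fit on one split and measure the gap on a held-out one.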
Synthetic decline rate by subgroup
Track record (not a Corgi number). On a clinical corpus of about 1,500 OCR'd psychiatric notes (UofT MSc thesis, CAMH), this decoupled-plus-calibrated architecture compressed a 35-point false-positive-rate parity gap to roughly 1 point, with no degradation to overall accuracy, on a held-out set of 200 notes. Underwriting and claims fairness testing is the same problem shape. Published: jeffpinto.com/notes/decoupled-classifiers.
Sources & method
Synthetic outcomes: a seeded decline-decision set (deterministic PRNG, same series every reload). No real applicants, no protected-class data, no Corgi outcomes. The subgroup gap is generated by a labelled latent-bias dial; it demonstrates the test, it is not a measured Corgi figure.
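What "seeded" and "latent-bias dial" could look like in practice, as a minimal sketch using Python's standard random module. This is an illustration only, not the harness's own generator: the function names, the two group labels, the 0.7 decline cut-off and the default dial value are all assumptions.

```python
import random

def make_decline_set(n=2000, bias=0.15, seed=42):
    """Return (group, declined) pairs from a private, seeded PRNG."""
    rng = random.Random(seed)  # same seed, same series on every run
    rows = []
    for _ in range(n):
        group = rng.choice(["A", "B"])
        risk = rng.random()                    # true risk: same distribution for both groups
        proxy = bias if group == "B" else 0.0  # the dial: a non-risk shift on group B only
        rows.append((group, risk + proxy > 0.7))
    return rows

def decline_rate(rows, group):
    """Fraction of a subgroup's rows that were declined."""
    flags = [declined for g, declined in rows if g == group]
    return sum(flags) / len(flags)

rows = make_decline_set()
for g in ("A", "B"):
    print(g, round(decline_rate(rows, g), 3))
```

Because both groups draw risk from the same distribution, any gap in the printed rates comes from the dial alone. That is the sense in which the gap is generated rather than measured.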
The legal hook: Colorado SB21-169 / Reg 10-1-1 bars unfair discrimination against protected classes driven by external consumer data and information sources (ECDIS) across rating, underwriting, and claims, and asks for protected-class testing and a governance framework, not the model's internals. The NAIC AIS Program expects the same testing evidence over the reporting period (NAIC bulletin).
Four-fifths rule: the 80% adverse-impact ratio is a long-standing US disparate-impact screen; used here as one illustrative pass/fail line. It is a screen, not the whole legal test.
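The screen itself is a single ratio. The sketch below assumes approval (1 minus the decline rate) counts as the favourable outcome, since the four-fifths convention compares favourable-outcome rates. The function names and example rates are illustrative, not taken from the harness.

```python
def adverse_impact_ratio(decline_rates):
    """decline_rates: dict of subgroup -> decline rate in [0, 1].

    Returns the lowest approval rate divided by the highest approval rate.
    """
    approval = {g: 1.0 - r for g, r in decline_rates.items()}
    best = max(approval.values())
    if best == 0:
        raise ValueError("no subgroup has any approvals; ratio is undefined")
    return min(approval.values()) / best

def passes_four_fifths(decline_rates, threshold=0.8):
    """True when the adverse-impact ratio is at or above the 80% line."""
    return adverse_impact_ratio(decline_rates) >= threshold

# Illustrative rates only: 70% vs 55% approval gives a ratio of about 0.79.
print(passes_four_fifths({"A": 0.30, "B": 0.45}))
# 70% vs 68% approval gives a ratio of about 0.97.
print(passes_four_fifths({"A": 0.30, "B": 0.32}))
```

Running the ratio on decline rates directly would give a different answer, so it matters which outcome is treated as favourable when reading a pass or fail.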
The compression result (a 35-point gap to roughly 1 point): Jeff's OWN published figure from his UofT/CAMH MSc thesis (decoupled-classifiers note), cited as track record. The synthetic harness shows the method's shape on insurance-shaped outcomes; it does not reproduce the clinical result.