What a ReguLume Governance Score Actually Measures
“We’re 73% compliant.”
That sentence appears in board presentations every quarter across thousands of organizations. And in almost every case, nobody in the room can explain what it means.
73% of what? Which obligations? Weighted how? Does 73% mean three-quarters of the checkboxes are ticked, or that three-quarters of the risk surface is covered? If we close five more gaps, does the score go to 78%? Or 74%? Or 73.2%?
Compliance scores without published methodology are vanity metrics. They give the board a number to track and the compliance team a direction to point (“it went up!”). They don’t tell anyone what the number actually represents, what drives it, or what would change it.
We publish our methodology. Here’s how the ReguLume Governance Score works.
Four Dimensions, Not One
A single compliance score that only measures gap severity is incomplete. An organization can have zero critical gaps and still be exposed – because it has assessed only one of four applicable regulations, or because it identified the gaps but never closed them, or because the gap analysis exists but no evidence proves remediation happened.
The RGS measures four dimensions:
Readiness – How severe are the compliance gaps across assessed regulations? This is the dimension most people think of when they hear “compliance score.” It measures the actual gap findings: how many obligations have gaps, how severe those gaps are, and how the severity distribution looks across the client’s AI systems. A client with two critical gaps and zero low gaps has a very different readiness profile than a client with zero critical gaps and twenty low ones – even though a raw gap count would rank them the other way around.
Evidence – How strong is the compliance evidence? Identifying gaps is step one. Collecting evidence that proves compliance is step two. This dimension measures the strength of validated evidence against applicable obligations. A client who has mapped all obligations and closed all gaps but hasn’t documented proof scores high on readiness but low on evidence. An auditor would notice.
Coverage – What percentage of applicable regulations have been assessed? A client subject to the EU AI Act, NIST AI RMF, Colorado AI Act, and CCPA who has assessed only the EU AI Act has 25% coverage. The unassessed regulations represent unknown risk. Coverage measures how much of the regulatory surface has been evaluated – because you can’t score gaps you haven’t looked for.
Remediation – What percentage of identified remediation tasks have been resolved? Gaps identified but never remediated are compliance theater. This dimension measures forward progress: of all the tasks generated from gap findings, how many have been completed? A rising remediation score means the organization is closing gaps, not just documenting them.
Each dimension contributes to the composite score. The weighting is designed so that no single dimension can produce a high overall score alone – because compliance posture is genuinely multi-dimensional. High readiness with zero evidence is a different kind of exposure than low readiness with excellent documentation.
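To make the weighting concrete, here’s a minimal sketch of a composite calculation. The weights are illustrative placeholders, not ReguLume’s actual values – those are defined in the published methodology at /api/scoring-methodology.

```python
# A minimal sketch of the four-dimension composite. The weights below
# are hypothetical placeholders; the actual values are defined in the
# published methodology at /api/scoring-methodology.

WEIGHTS = {
    "readiness": 0.40,    # severity-weighted gap findings
    "evidence": 0.25,     # strength of validated evidence
    "coverage": 0.20,     # share of applicable regulations assessed
    "remediation": 0.15,  # share of remediation tasks resolved
}

def composite_score(dimensions: dict[str, float]) -> float:
    """Each dimension scores 0-100; the composite is their weighted sum."""
    return round(sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS), 1)

# Perfect readiness with nothing else yields 40, not 100: no single
# dimension can carry the composite on its own.
print(composite_score({"readiness": 100, "evidence": 0,
                       "coverage": 0, "remediation": 0}))  # 40.0
```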
Why Severity Weighting Matters
Not all gaps are equal. This seems obvious. Most scoring systems treat it as obvious and then fail to implement it meaningfully.
A critical gap – a system that violates an Article 5 prohibition or lacks a mandatory control entirely – represents immediate enforcement exposure. The EU AI Act’s maximum penalty tier for prohibited practices: EUR 35 million or 7% of global turnover, whichever is higher. That’s not a risk to manage over time. That’s a risk to address this week.
A low gap – a documentation item that needs updating, a configuration that needs minor adjustment – represents cleanup work. Important for audit readiness. Not a board-level emergency.
The readiness dimension weights gaps by severity. A critical gap impacts the score far more than a low gap. Closing one critical gap can move the score more than closing ten low gaps – because the risk reduction from eliminating a prohibition violation dwarfs the risk reduction from fixing a documentation format.
This weighting also considers the AI system’s risk level. A critical gap on a high-risk system matters more than a critical gap on a minimal-risk system. And obligation type matters – a prohibition violation carries different weight than a documentation deficiency. The scoring reflects what the regulation itself treats as most serious.
The consequence: you can’t game the RGS by closing easy gaps first. Twenty low-severity fixes won’t significantly move a score dragged down by two critical findings. The score rewards what the regulation rewards – addressing the highest-risk exposures first.
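Here’s a sketch of how severity weighting defeats gap-count gaming. The weight values and multipliers below are invented for illustration; what matters is the relative magnitudes.

```python
# Hypothetical severity weights and system-risk multipliers, chosen only
# to illustrate the principle; the real values live in the published
# methodology.

SEVERITY_WEIGHT = {"critical": 25, "high": 10, "medium": 4, "low": 1}
SYSTEM_RISK_MULTIPLIER = {"high": 1.5, "limited": 1.0, "minimal": 0.5}

def readiness_penalty(gaps: list[tuple[str, str]]) -> float:
    """Total penalty for a list of (severity, system_risk_level) findings."""
    return sum(SEVERITY_WEIGHT[severity] * SYSTEM_RISK_MULTIPLIER[risk]
               for severity, risk in gaps)

two_critical = [("critical", "high")] * 2  # penalty: 75.0
twenty_low = [("low", "high")] * 20        # penalty: 30.0

# Closing the two critical gaps recovers more score than closing all
# twenty low ones.
print(readiness_penalty(two_critical) > readiness_penalty(twenty_low))  # True
```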
Four Tiers
The composite score maps to a governance tier. Each tier carries a specific oversight recommendation – not because we’re prescribing governance policy, but because the compliance posture at each level implies different operational constraints.
Exemplary (90-100) – Comprehensive governance posture. Standard review cycles are appropriate. AI deployments can proceed through normal approval channels. This tier means the evidence exists, the gaps are minimal, the coverage is broad, and remediation is nearly complete.
Strong (70-89) – Solid foundation with minor gaps remaining. Periodic review is warranted. High-risk AI systems should require committee approval before deployment. The organization is in good shape but hasn’t closed every finding.
Developing (40-69) – Material gaps exist. Enhanced oversight is recommended. All AI deployments should require review before proceeding. There are gaps significant enough that deploying new AI systems without addressing them could compound the exposure.
Critical (0-39) – Significant compliance exposure. A governance hold is recommended – executive sign-off required for all AI initiatives until the score improves. This tier means the organization has substantial unaddressed gaps, limited evidence, or major coverage holes.
The tiers aren’t arbitrary ranges. They reflect practical governance thresholds: the point at which oversight should escalate, the point at which deployment should slow down, and the point at which the board needs to understand the exposure before authorizing further AI investment.
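The tier mapping itself is mechanical. A sketch, using the ranges above:

```python
def governance_tier(score: float) -> str:
    """Map a 0-100 composite score to its governance tier."""
    if score >= 90:
        return "Exemplary"   # standard review cycles
    if score >= 70:
        return "Strong"      # periodic review; committee approval for high-risk
    if score >= 40:
        return "Developing"  # enhanced oversight; review all deployments
    return "Critical"        # governance hold; executive sign-off required

print(governance_tier(64))  # "Developing"
```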
The Score Over Time
A snapshot score tells you where you are. A trend tells you where you’re going. Both matter for different audiences.
Every time a gap analysis completes, an evaluation plan is validated, or remediation tasks are updated, the governance score is recalculated and a snapshot is recorded. These snapshots create a history that answers the board’s most common question: “Are we getting better?”
The trend chart shows the trajectory. A score that went from 34 to 58 over three months is a different narrative than a score that dropped from 72 to 65 after a new regulation was assessed. Both are useful – one shows remediation progress, the other shows that expanding coverage reveals new gaps.
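A snapshot history is simple to represent. This sketch uses illustrative field names, not the platform’s actual schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ScoreSnapshot:
    recorded_at: datetime
    composite: float
    trigger: str  # e.g. "gap_analysis_completed", "remediation_task_updated"

history: list[ScoreSnapshot] = []

def record_snapshot(composite: float, trigger: str) -> None:
    """Append an immutable snapshot whenever a scoring event fires."""
    history.append(ScoreSnapshot(datetime.now(timezone.utc), composite, trigger))

record_snapshot(34.0, "gap_analysis_completed")
record_snapshot(58.0, "remediation_task_updated")

# The trend is the delta across snapshots: +24 points over the period.
print(history[-1].composite - history[0].composite)  # 24.0
```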
Per-regulation breakdowns let the consultant and the client see which regulations are driving the overall score. A composite of 64 that consists of 82% EU AI Act readiness and 31% Colorado readiness tells a clear story: the EU assessment is in good shape, Colorado needs work. The aggregate alone wouldn’t reveal that.
Preliminary Scores: Before the Assessment
A full governance score requires a completed gap analysis. But clients want to know their exposure before committing to the assessment – and consultants want a way to quantify urgency during the sales conversation.
The preliminary risk assessment generates an instant exposure estimate from data that already exists in the platform: the client’s system inventory, their risk classifications, and which regulations they’re associated with.
A client with six AI systems – two classified as high-risk, one as limited, three as undetermined – associated with the EU AI Act and Colorado AI Act, with zero completed assessments, gets a preliminary exposure score that reflects three facts: high-risk systems exist, multiple regulations apply, and none have been assessed. That score is a starting point, not a final answer.
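A heuristic like the following could produce that estimate. Every weight here is an invented placeholder; the actual preliminary logic is part of the published methodology.

```python
def preliminary_exposure(risk_levels: list[str],
                         regulation_count: int,
                         assessed_count: int) -> float:
    """Estimate 0-100 exposure from inventory data (higher = more exposed)."""
    base = (15.0 * risk_levels.count("high")           # known high-risk systems
            + 8.0 * risk_levels.count("undetermined")  # unclassified = unknown risk
            + 3.0 * len(risk_levels))                  # every system adds surface
    unassessed = regulation_count - assessed_count
    return min(100.0, base + 10.0 * unassessed)

# Six systems (two high-risk, one limited, three undetermined), two
# associated regulations, zero completed assessments:
print(preliminary_exposure(
    ["high", "high", "limited", "undetermined",
     "undetermined", "undetermined"],
    regulation_count=2, assessed_count=0))  # 92.0
```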
The preliminary score displays differently in the dashboard – a dashed gauge rather than a solid one, with a clear label indicating it’s estimated. The distinction matters. The client shouldn’t confuse “we think your exposure is significant” with “we measured your compliance and it’s low.” One is an estimate. The other is an assessment.
What Makes This Different from a Maturity Model
Maturity models measure organizational capability. “Do you have a risk management process?” “Is there a governance committee?” “Are policies reviewed annually?” They measure whether structures exist.
The RGS measures compliance state. Not “do you have a risk management process?” but “does the risk management documentation for this specific system address these specific obligations with this specific evidence?” It measures what an auditor measures – not organizational maturity, but demonstrable compliance.
The distinction is practical. A mature organization with excellent governance structures can still score low on the RGS if it hasn’t applied those structures to specific AI systems against specific regulatory obligations. A less mature organization that has methodically assessed its three AI systems against the EU AI Act’s requirements and documented everything can score higher.
This aligns with how enforcement works. A regulator won’t ask whether you have a governance committee. A regulator will ask whether the AI system you deployed into the EU market has documentation that satisfies Annex IV requirements. The RGS measures what the regulator measures.
Published Methodology
The scoring methodology is publicly accessible at /api/scoring-methodology – no authentication required. Dimensions, tier definitions, update triggers, and the logic that drives each component.
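Retrieving it takes one unauthenticated request. The host below is a placeholder for wherever your ReguLume instance lives, and the snippet assumes a JSON response:

```python
import json
from urllib.request import urlopen

# Placeholder host; substitute your deployment's base URL.
URL = "https://app.regulume.example/api/scoring-methodology"

with urlopen(URL) as response:
    methodology = json.load(response)  # assumes a JSON body

print(list(methodology))  # e.g. dimensions, tiers, update triggers
```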
This isn’t standard practice. Most compliance scoring tools treat their methodology as proprietary. They show you a number and ask you to trust it.
We made the opposite choice. Compliance professionals evaluate tools by the same standard they evaluate their clients’ programs: show your work. A score that can’t be explained is a score that can’t be defended. If a consultant presents the RGS to a client’s board and a board member asks “how is this calculated?” – the answer should be specific, verifiable, and public.
The heatmap shows where the gaps are. The cross-regulation mapping shows how frameworks overlap. The evidence layer proves compliance was addressed. The governance score reduces all of it to a number the board can track.
One number. A published methodology. A trajectory you can explain.
The ReguLume Governance Score is recalculated automatically when gap analyses complete, evaluation plans are validated, or remediation tasks are updated. Historical snapshots enable trend analysis. The full methodology is published at /api/scoring-methodology. Learn more about how we validate AI outputs.
Map obligations to your AI systems
ReguLume covers 2,964 obligations across 15 regulations. Score your compliance posture in hours, not months.
Get Started