Data & Analytical Methodology

Full transparency on how we process, analyze, and present OSHA enforcement records.

Data Source

All enforcement data originates from the U.S. Department of Labor, Occupational Safety and Health Administration (OSHA). This includes:

Data is sourced from the DOL's public enforcement data systems and imported into our database. We do not modify source data fields — penalty amounts, citation counts, dates, and addresses are presented exactly as reported by OSHA.

Risk Score Methodology

Each facility receives a composite risk score on a 0–100 scale. The score is calculated using a weighted model that considers multiple enforcement dimensions:

Scoring Components

| Component | Weight | What It Measures |
|---|---|---|
| Penalty Severity | 40% | Total penalties relative to industry median |
| Citation Gravity | 25% | Presence and count of willful/repeat violations |
| Violation Density | 20% | Number of citations per inspection |
| Abatement Status | 15% | Percentage of citations marked as abated |

The penalty severity component uses log-scale normalization to prevent outliers (e.g., $38M in the BP Texas City case) from distorting the scale. A facility with penalties at the 95th percentile of its NAICS sector receives a penalty component score near the maximum.
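As a rough illustration of how the weighted model and log-scale normalization described above could fit together, here is a minimal Python sketch. The weights come from the Scoring Components table; everything else (the function names, the `cap` parameter, the use of `log1p`, and the assumption that every component is already expressed on a 0 to 1 scale where higher means more risk) is hypothetical and not the production implementation.

```python
import math
from typing import Dict

# Weights taken from the Scoring Components table.
WEIGHTS: Dict[str, float] = {
    "penalty_severity": 0.40,
    "citation_gravity": 0.25,
    "violation_density": 0.20,
    "abatement_status": 0.15,
}


def log_normalize(value: float, cap: float) -> float:
    """Map a non-negative value onto 0..1 using a log scale.

    `cap` is a hypothetical reference value (for example a high sector
    percentile) at or above which the result saturates at 1.0.
    """
    if value <= 0 or cap <= 0:
        return 0.0
    return min(1.0, math.log1p(value) / math.log1p(cap))


def composite_score(components: Dict[str, float]) -> float:
    """Combine 0..1 component scores into a 0..100 composite score.

    Assumes each component is already oriented so that higher means
    more risk; missing components count as 0.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        # Clamp each component so bad inputs cannot push the score out of range.
        total += weight * min(1.0, max(0.0, components.get(name, 0.0)))
    return round(100 * total, 1)
```

Because the logarithm compresses large values, a penalty many times larger than `cap` still maps to 1.0 instead of stretching the scale for every other facility.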

Tier Classification

Facilities are classified into four enforcement tiers based on their risk score:

| Tier | Risk Score | Criteria | Significance |
|---|---|---|---|
| Tier 1 Critical | 75–100 | Penalties >$100K or willful violations | Most severe enforcement actions; substantial probability of death or serious physical harm |
| Tier 2 Severe | 50–74 | Penalties $25K–$100K, serious violations | Significant enforcement with elevated citation gravity |
| Tier 3 Elevated | 25–49 | Penalties $5K–$25K, multiple citations | Above-average enforcement activity for the sector |
| Tier 4 Standard | 0–24 | Penalties <$5K, routine inspections | Baseline enforcement consistent with industry norms |
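The score ranges in the tier table can be read as a simple threshold lookup. The sketch below is one possible reading, not the platform's actual code: the function name `classify_tier` is hypothetical, and it assumes that each range's lower bound acts as the cutoff, so a fractional score such as 74.5 falls into the lower tier.

```python
def classify_tier(risk_score: float) -> str:
    """Return the tier label for a 0..100 risk score.

    Assumption: the lower bound of each range in the tier table is the
    cutoff, so scores between two listed ranges go to the lower tier.
    """
    if not 0 <= risk_score <= 100:
        raise ValueError("risk_score must be between 0 and 100")
    if risk_score >= 75:
        return "Tier 1 Critical"
    if risk_score >= 50:
        return "Tier 2 Severe"
    if risk_score >= 25:
        return "Tier 3 Elevated"
    return "Tier 4 Standard"
```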

Industry Benchmarking

Every facility report includes a comparison against its NAICS (North American Industry Classification System) sector average. This answers the critical question: "Is this facility's enforcement history unusual for its industry?"

How Benchmarks Are Calculated

We maintain a benchmark lookup table (svep_naics_stats) containing pre-computed statistics for each 2-digit NAICS sector:

The benchmark multiplier shown on each report (e.g., "27.2× the national average") is the facility's total penalty divided by the average penalty for its NAICS sector:

multiplier = facility_penalty / sector_avg_penalty
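A minimal Python sketch of the multiplier formula above follows. The function name and the guard for a missing or zero sector average are assumptions added for illustration, and the dollar figures in the usage line are invented rather than taken from any real record.

```python
from typing import Optional


def benchmark_multiplier(
    facility_penalty: float, sector_avg_penalty: float
) -> Optional[float]:
    """Return facility_penalty / sector_avg_penalty.

    Returns None when the sector average is zero or negative, since the
    ratio is undefined there (this handling is an assumption).
    """
    if sector_avg_penalty <= 0:
        return None
    return facility_penalty / sector_avg_penalty


# Hypothetical figures: a $136,000 facility total against a $5,000 sector average.
print(f"{benchmark_multiplier(136_000, 5_000):.1f}x")
```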

State Context

In addition to industry benchmarks, each report includes state-level positioning. The state percentile indicates where a facility ranks among all inspected facilities in its state:

percentile = (facilities_with_lower_penalty / total_state_facilities) × 100
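The percentile formula above can be sketched in Python as follows. The function name is hypothetical, and two details are assumptions: the list of state penalties includes the facility itself, and only strictly lower penalties count toward `facilities_with_lower_penalty`, so ties do not raise the percentile.

```python
from bisect import bisect_left
from typing import Sequence


def state_percentile(
    facility_penalty: float, state_penalties: Sequence[float]
) -> float:
    """Percentage of facilities in the state with a strictly lower penalty."""
    if not state_penalties:
        raise ValueError("state_penalties must not be empty")
    ordered = sorted(state_penalties)
    # bisect_left returns the number of values strictly below facility_penalty.
    lower = bisect_left(ordered, facility_penalty)
    return 100 * lower / len(ordered)
```

With four hypothetical state totals of 1,000, 2,000, 5,000, and 9,000 dollars, a facility at 5,000 has two facilities below it and lands at the 50th percentile.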

State statistics are maintained in a dedicated lookup table (svep_state_stats) covering all 50 states and U.S. territories.

AI-Enriched Analysis

Our 49,800+ highest-severity facility reports include AI-generated analytical narratives that synthesize enforcement data into readable, contextual assessments. These narratives are generated using Google's Gemini language model and include:

AI Transparency Statement

AI-generated narratives are grounded in the underlying enforcement data and regulatory frameworks. The AI does not fabricate facts, invent citations, or create penalty amounts. All quantitative claims in the narrative (penalty amounts, citation counts, dates) are sourced directly from the DOL database record for that facility.

The analytical commentary (e.g., "penalties significantly exceed the national median") is derived from our benchmark calculations, not from AI speculation.

Citation Standard Translation

OSHA citations reference Code of Federal Regulations (CFR) standards using numeric identifiers (e.g., 1926.0501). We maintain a translation table that maps the 30+ most frequently cited standards to plain-English descriptions, making reports accessible to non-specialist readers.
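A translation table of this kind can be as simple as a dictionary lookup with a fallback. The Python sketch below is illustrative only: the three entries, their wording, the zero-padded key format for `1910.0147`, and the function name are assumptions, not the contents of the actual translation table.

```python
# Illustrative entries only; the real table covers 30+ standards.
PLAIN_ENGLISH = {
    "1926.0501": "Fall protection: duty to have fall protection (construction)",
    "1910.1200": "Hazard communication",
    "1910.0147": "Control of hazardous energy (lockout/tagout)",
}


def describe_standard(standard_id: str) -> str:
    """Return a plain-English description, or a generic label if unmapped."""
    key = standard_id.strip()
    # Standards outside the table fall back to the raw identifier.
    return PLAIN_ENGLISH.get(key, f"CFR standard {key}")
```

Falling back to the raw identifier keeps reports complete for less frequently cited standards that the table does not cover.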

Data Limitations

Users should be aware of the following limitations:

Update Frequency

Our database is periodically refreshed from DOL source data. Analytical enrichment is applied to the highest-severity facilities first, with ongoing expansion of coverage. The current dataset includes 2,336,195 facility records with 49,800+ receiving full analytical enrichment.

See It in Action

The methodology described on this page drives every facility report on SVEP Navigator. Explore the platform to see how these analytical frameworks translate into actionable intelligence: