Analyze AI model fairness using industry-standard metrics including demographic parity, equalized odds, equal opportunity, and disparate impact. Compare model predictions across groups to detect bias and ensure fair machine learning systems.
The AI Fairness Calculator helps data scientists and ML engineers assess whether their models treat different demographic groups equitably. Analyze six industry-standard fairness metrics, check disparate impact compliance, and receive actionable recommendations to reduce bias. Essential for responsible AI deployment in hiring, lending, healthcare, and criminal justice applications.
AI fairness ensures machine learning models don't discriminate against protected groups based on characteristics like race, gender, age, or disability. Even well-intentioned models can exhibit unfair behavior due to biased training data, proxy variables, or historical disparities. Fairness metrics quantify how differently a model treats various groups, enabling detection and mitigation of algorithmic bias. The field encompasses multiple definitions because different contexts prioritize different notions of fairness.
Disparate Impact Formula
Disparate Impact = P(Ŷ=1|Group B) / P(Ŷ=1|Group A) ≥ 0.80

The disparate impact rule (the 80% rule) has legal standing in employment and lending decisions. Organizations can face lawsuits if their AI systems produce discriminatory outcomes, even without discriminatory intent. Proactive fairness assessment helps limit legal exposure.
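The ratio can be computed directly from selection counts. A minimal Python sketch, using invented counts purely for illustration:

```python
def disparate_impact(pos_b, n_b, pos_a, n_a):
    """Ratio of the protected group's selection rate (B) to the reference group's (A)."""
    rate_b = pos_b / n_b
    rate_a = pos_a / n_a
    return rate_b / rate_a

# Hypothetical counts: 30 of 100 protected-group applicants selected,
# 45 of 100 reference-group applicants selected.
di = disparate_impact(30, 100, 45, 100)
print(f"Disparate impact: {di:.2f}")   # 0.67, below the 0.80 threshold
print("Passes 80% rule:", di >= 0.80)  # False
```

A ratio of 1.0 means identical selection rates; values below 0.80 flag potential disparate impact under the rule.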
AI systems increasingly affect people's lives through hiring decisions, loan approvals, healthcare recommendations, and criminal justice predictions. Ensuring fair treatment across groups is an ethical imperative for responsible AI development.
Biased AI systems generate negative publicity and erode customer trust. Companies face backlash when algorithms discriminate against protected groups. Fairness testing protects brand reputation and customer relationships.
Fairness analysis often reveals data quality issues, feature engineering problems, or model limitations. Addressing fairness issues frequently improves overall model performance and generalization.
Resume screening and candidate ranking systems must not discriminate based on protected characteristics. The 80% rule originated from employment discrimination law. Hiring AI requires careful fairness analysis to avoid disparate impact on gender, race, age, and disability status.
Loan approval algorithms must comply with fair lending regulations. Credit scoring models that produce different approval rates across racial or gender groups face regulatory scrutiny. Fairness metrics help ensure equitable access to credit.
Medical AI systems for diagnosis, treatment recommendations, and resource allocation must work equitably across demographic groups. Healthcare disparities can be amplified by biased algorithms, making fairness critical for health equity.
Recidivism prediction and bail algorithms have faced criticism for racial bias. Tools like COMPAS demonstrated how seemingly neutral features can produce discriminatory outcomes. Fairness analysis is essential for criminal justice applications.
Insurance pricing and approval models must balance actuarial accuracy with fair treatment across protected groups. Regulations increasingly require fairness documentation for AI-driven insurance decisions.
AI systems that curate content, recommend products, or moderate user-generated content should work fairly across user demographics. Biased content systems can reinforce stereotypes and exclude minority voices.
Demographic parity (also called statistical parity) requires that the positive prediction rate is equal across groups: P(Ŷ=1|A=a) = P(Ŷ=1|A=b). Use it when you want equal selection rates regardless of group membership, such as ensuring equal interview rates in hiring. However, it may conflict with accuracy if base rates differ between groups.
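Demographic parity reduces to comparing positive prediction rates group by group. A short sketch with hypothetical predictions (the arrays below are illustrative, not real data):

```python
def positive_rate(preds):
    """Fraction of positive (1) predictions in a group."""
    return sum(preds) / len(preds)

# Hypothetical binary predictions for two groups
preds_a = [1, 1, 0, 1, 0, 1, 0, 0]   # group A: selection rate 0.50
preds_b = [1, 0, 0, 1, 0, 0, 0, 0]   # group B: selection rate 0.25

gap = abs(positive_rate(preds_a) - positive_rate(preds_b))
print(f"Demographic parity difference: {gap:.2f}")  # 0.25; 0 means parity
```

A gap of 0 satisfies demographic parity exactly; in practice a small tolerance (e.g. 0.05 or 0.10) is typically applied.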
Equalized odds requires both true positive rate (TPR) and false positive rate (FPR) to be equal across groups. Equal opportunity is a relaxed version requiring only equal TPR. Equal opportunity ensures qualified individuals from all groups have equal chances of positive outcomes, while equalized odds additionally ensures equal error rates for negative outcomes.
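Both rates follow directly from each group's confusion matrix. A hedged sketch, with invented counts for illustration:

```python
def rates(tp, fp, fn, tn):
    """True positive rate and false positive rate from confusion-matrix counts."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return tpr, fpr

# Hypothetical confusion matrices for two groups
tpr_a, fpr_a = rates(tp=40, fp=10, fn=10, tn=40)   # group A: TPR 0.8, FPR 0.2
tpr_b, fpr_b = rates(tp=30, fp=10, fn=20, tn=40)   # group B: TPR 0.6, FPR 0.2

# Equal opportunity looks only at the TPR gap;
# equalized odds takes the larger of the TPR and FPR gaps.
equal_opportunity_gap = abs(tpr_a - tpr_b)
equalized_odds_gap = max(abs(tpr_a - tpr_b), abs(fpr_a - fpr_b))
print(f"Equal opportunity gap: {equal_opportunity_gap:.2f}")  # 0.20
print(f"Equalized odds gap:    {equalized_odds_gap:.2f}")     # 0.20
```

Here the groups already have equal FPR, so the model violates equal opportunity (and therefore equalized odds) through its TPR gap alone.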
The 80% rule, established in EEOC employment guidelines, states that the selection rate for any protected group should be at least 80% of the rate for the group with the highest selection rate. Mathematically: min(selection_rate_B / selection_rate_A, selection_rate_A / selection_rate_B) ≥ 0.80. A violation suggests potential disparate impact requiring justification.
Generally no. Research shows that satisfying multiple fairness criteria simultaneously is often mathematically impossible when base rates differ between groups. This is known as the 'impossibility theorem' of fairness. You must choose which fairness criterion matters most for your specific application and document the trade-offs.
Different base rates (different proportions of actual positives) create inherent tension between fairness metrics. You cannot simultaneously achieve demographic parity and calibration when base rates differ. Consider whether historical base rate differences reflect true differences or historical bias, and choose metrics accordingly.
Consider your context: Use demographic parity when you want equal representation regardless of qualifications. Use equal opportunity when you want qualified members of all groups to have equal success rates. Use predictive parity when positive predictions should be equally reliable across groups. Use calibration when probability estimates should be accurate for all groups.
Common causes include: biased training data reflecting historical discrimination, proxy variables correlated with protected attributes (e.g., zip codes correlated with race), unequal representation in training data, optimization for overall accuracy at the expense of minority performance, and feedback loops that amplify initial biases.
Strategies include: pre-processing (reweighting or resampling training data), in-processing (adding fairness constraints during training), post-processing (adjusting decision thresholds per group), removing or transforming biased features, collecting more balanced data, and using fairness-aware algorithms like adversarial debiasing.
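As one illustration of the post-processing strategy, decision thresholds can be tuned per group so that selection rates align. A minimal sketch (the scores and thresholds are hypothetical, and in practice thresholds would be fit on validation data):

```python
def select(scores, threshold):
    """Apply a decision threshold to model scores, returning 0/1 decisions."""
    return [1 if s >= threshold else 0 for s in scores]

# Hypothetical model scores for two groups
scores_a = [0.9, 0.8, 0.7, 0.4, 0.3]
scores_b = [0.7, 0.6, 0.5, 0.3, 0.2]

# A single threshold of 0.6 selects 60% of group A but only 40% of group B;
# lowering group B's threshold to 0.5 equalizes selection rates at 60%.
single = [select(scores_a, 0.6), select(scores_b, 0.6)]
adjusted = [select(scores_a, 0.6), select(scores_b, 0.5)]
print([sum(p) / len(p) for p in single])    # [0.6, 0.4]
print([sum(p) / len(p) for p in adjusted])  # [0.6, 0.6]
```

Per-group thresholds trade a single decision rule for equalized outcomes; whether that trade is appropriate (or legally permissible) depends on the application.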
Popular tools include: IBM AI Fairness 360 (comprehensive toolkit), Google What-If Tool (interactive exploration), Microsoft Fairlearn (Python library), Aequitas (bias audit toolkit), LIME and SHAP (for understanding feature importance by group), and custom implementations using confusion matrices as shown in this calculator.