Gender bias in recruitment is well-documented, but hard to shift. Awareness campaigns, unconscious bias training, and DEI commitments have produced limited measurable change in who actually gets hired, particularly in male-dominated industries like technology, engineering, and finance. The first field experimental study to assess the impact of AI tools on gender diversity in recruitment suggests a more direct route to closing the gap.
Researchers from Monash University and the University of Gothenburg examined whether AI recruitment tools change both who applies for roles and how those applications are evaluated. The results are significant for any organisation serious about reducing gender bias in its hiring process.
The researchers advertised a real web designer role and invited over 700 people to apply. Candidates completed a structured chat interview through Sapia.ai’s AI-powered platform. Each candidate was randomly assigned to be told either that AI would assess their application or that a human evaluator would.
The first finding came from the application stage. When candidates knew AI would assess their application, 30% more women applied. That increase came without any reduction in the overall number or quality of applications. The prospect of AI assessment, rather than human evaluation, was enough to shift the gender composition of the applicant pool in a role where women are typically underrepresented.
The second experiment tested the effect of AI on the evaluators themselves. Five hundred people working in tech were asked to evaluate the applications. Some were shown the applicant’s gender. Some were not. Some were shown both the applicant’s gender and their AI score.
The pattern was stark. Evaluators shown an applicant’s gender chose significantly more male candidates. When gender was withheld, they chose equal numbers of men and women. When evaluators were shown both the gender and the AI score, they again chose equal numbers of men and women.
The AI score acted as an objective anchor. It did not replace human judgment. It reduced the influence of gender on that judgment. The overall result across both experiments was a 36% reduction in the gender gap. The full research is available in the Does AI Help or Hurt Gender Diversity whitepaper.
The 30% increase in female applications is not a statistical quirk. It reflects a real and rational response to how hiring has historically worked.
Women in male-dominated industries have significant experience of bias in human-led hiring processes: being evaluated on presentation rather than substance, facing assumptions about their commitment or suitability, being held to different standards than male counterparts for identical behaviours. The perception that AI assessment removes these variables is enough to change application behaviour.
This matters because pipeline diversity is one of the most cited barriers to gender equity in hiring. Organisations frequently attribute low female representation in shortlists to a pipeline problem. The research suggests a different explanation: the process itself is discouraging women from applying in the first place. Changing the assessment method, and communicating that change to candidates, shifts the pipeline without requiring any changes to sourcing or outreach.
Sapia’s own data reinforces this finding. Female candidates consistently score higher than male candidates on Sapia’s chat-based assessments, yet are hired at lower rates overall. This gap does not emerge from the AI scoring. It emerges from human decisions made after the AI has produced its recommendations, precisely the pattern the second experiment confirmed.
The finding that showing evaluators an AI score neutralised gender bias in their decisions is the most practically significant result of the research.
It demonstrates that objective data, presented at the point of decision, is sufficient to reduce the influence of gender on hiring choices. Evaluators who knew a candidate’s gender were significantly more likely to favour men. Evaluators who knew both the gender and the AI score chose equal numbers of men and women. The AI score did not remove the evaluator’s awareness of gender. It gave them something more objective to anchor their decision to.
This has direct implications for how AI recruitment tools should be implemented. The benefit is not just in the automated screening stage. It extends to the human decisions that follow. Providing hiring managers with structured, objective candidate scores before they make shortlisting or offer decisions reduces the window in which gender bias can re-enter the process.
It also challenges the assumption that AI in recruitment is primarily a speed or efficiency tool. The research shows it is equally a fairness tool, when the AI is built on clean, demographic-free assessment rather than biased historical data. The contrast with Amazon’s recruiting tool, which trained on a decade of male-dominated CV data and learned to penalise female applicants, illustrates what happens when that design principle is ignored. The AI Bias in Hiring blog covers how to identify and avoid these failure modes when evaluating AI recruitment tools.
The research points to two specific actions that reduce gender bias in the recruitment process, both supported by experimental evidence.
The first is telling candidates that AI will assess their application. This alone increased female applications by 30% in the study. For organisations struggling with underrepresentation at the application stage, stating in the job advert that applications are assessed by blind, structured AI is a low-cost, high-impact change.
The second is providing human evaluators with AI scores before they make hiring decisions. The research showed this was sufficient to neutralise gender bias in evaluator choices. Structured, objective candidate data anchors human judgment in a way that training and awareness programmes have consistently failed to do.
Both actions depend on the AI being built correctly. An AI that uses demographic data, trains on historically biased hiring decisions, or processes visual inputs like video or photos will not produce the neutralising effect the research describes. It will encode and amplify the same biases human evaluators bring. The FAIR Framework whitepaper sets out the specific design and testing standards that determine whether an AI recruitment tool reduces or reinforces gender bias.
For organisations tracking gender outcomes across their hiring funnel, the Diversity Recruiting Metrics blog covers how to measure where the gender gap is widest and which stages of the process are driving it.
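As a rough illustration of that kind of funnel tracking, the sketch below computes the share of women at each stage and reports the stage where that share falls the most. The stage names and counts are hypothetical and chosen only for the example. They are not figures from the research or from Sapia’s data, and the function names are assumptions of this sketch.

```python
# Minimal sketch: locate the hiring stage where the female share drops most.
# All stage names and counts below are hypothetical illustration data.

def female_share_by_stage(funnel: dict[str, dict[str, int]]) -> dict[str, float]:
    """Return the proportion of women among candidates at each stage."""
    shares = {}
    for stage, counts in funnel.items():
        total = counts["women"] + counts["men"]
        if total == 0:
            raise ValueError(f"stage {stage!r} has no candidates")
        shares[stage] = counts["women"] / total
    return shares


def largest_drop(shares: dict[str, float]) -> tuple[str, float]:
    """Return (stage, drop) for the biggest fall in female share
    between one stage and the next. Assumes stages are in funnel order."""
    stages = list(shares)
    drops = [
        (stages[i + 1], shares[stages[i]] - shares[stages[i + 1]])
        for i in range(len(stages) - 1)
    ]
    return max(drops, key=lambda item: item[1])


# Hypothetical funnel, ordered from first stage to last.
funnel = {
    "applied": {"women": 120, "men": 180},
    "shortlisted": {"women": 30, "men": 50},
    "interviewed": {"women": 12, "men": 28},
    "hired": {"women": 3, "men": 7},
}

shares = female_share_by_stage(funnel)
stage, drop = largest_drop(shares)
print(f"Largest fall in female share happens entering: {stage}")
```

A drop concentrated at a stage where humans make the decision would be consistent with the pattern the second experiment describes, though a descriptive count like this cannot establish the cause on its own.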
The research conclusion is straightforward. AI, designed and implemented correctly, can substantially remove gender bias from the recruitment process and encourage applications from candidates who might otherwise not have applied. The 36% reduction in the gender gap is not a projection. It is a measured outcome from a real hiring process. That is the standard organisations should be holding their AI recruitment tools to.
For a broader framework on using AI to build equity into hiring from the ground up, the AI for Equity eBook is a practical starting point.
Researchers from Monash University and the University of Gothenburg ran two field experiments on a real job posting. They found that telling candidates AI would assess their application increased female applications by 30%. They also found that human evaluators shown a candidate’s gender chose significantly more men, but when shown an AI score alongside the gender information, they chose equal numbers of men and women. The overall effect was a 36% reduction in the gender gap across the recruitment process.
The research suggests women perceive AI assessment as fairer than human evaluation, based on their experience of gender bias in human-led hiring processes. The prospect of being assessed on job-relevant responses rather than appearance, presentation, or personal impression is enough to change application behaviour. This has direct implications for how organisations communicate their recruitment process to candidates.
AI can make gender bias in hiring worse if it is built incorrectly. AI trained on historically biased hiring data learns to replicate those biases at scale. Amazon’s recruiting tool, trained on a decade of male-dominated CV data, penalised female applicants and downgraded graduates of all-women’s colleges. AI that processes video or photos introduces visual signals that correlate with demographic characteristics. The design of the model and the data it is trained on determine whether AI reduces or amplifies gender bias.
The research found that evaluators shown both a candidate’s gender and their AI score chose equal numbers of men and women, despite knowing the applicant’s gender. The objective score provided an anchor for the decision that reduced the influence of gender on the outcome. This suggests that providing hiring managers with structured candidate data before they make shortlisting decisions is one of the most direct ways to reduce gender bias at the human decision-making stage.
An AI recruitment tool should not use demographic data, including gender, age, or ethnicity, as inputs to the scoring model. It should be tested at every stage of development for adverse impact across gender groups using accepted statistical methods. Bias testing results should be available on request. The model should be monitored continuously after deployment, not just tested before launch. Vendors should be able to explain exactly what data the model was trained on and how fairness is measured and maintained over time.
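One widely used adverse impact check is the four-fifths rule, which compares each group’s selection rate with the highest group’s rate and flags ratios below 0.8. The sketch below shows that calculation. It is an illustration of one common heuristic, not necessarily the method used in the research or by any particular vendor, and the counts, class name, and function names are hypothetical.

```python
# Minimal sketch of a four-fifths (80%) adverse impact check.
# The counts are hypothetical; the 0.8 threshold is the conventional
# four-fifths heuristic, used here as an assumption.
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupOutcome:
    group: str
    applicants: int
    selected: int

    @property
    def selection_rate(self) -> float:
        if self.applicants <= 0:
            raise ValueError(f"group {self.group!r} has no applicants")
        return self.selected / self.applicants


def adverse_impact_ratios(outcomes: list[GroupOutcome]) -> dict[str, float]:
    """Each group's selection rate divided by the highest group's rate."""
    rates = {o.group: o.selection_rate for o in outcomes}
    best = max(rates.values())
    if best == 0:
        raise ValueError("no group has any selected candidates")
    return {group: rate / best for group, rate in rates.items()}


def flag_adverse_impact(
    outcomes: list[GroupOutcome], threshold: float = 0.8
) -> dict[str, bool]:
    """True for groups whose ratio falls below the threshold."""
    ratios = adverse_impact_ratios(outcomes)
    return {group: ratio < threshold for group, ratio in ratios.items()}


# Hypothetical shortlisting outcomes.
outcomes = [
    GroupOutcome("women", applicants=200, selected=30),
    GroupOutcome("men", applicants=300, selected=60),
]

ratios = adverse_impact_ratios(outcomes)
flags = flag_adverse_impact(outcomes)
for group, ratio in ratios.items():
    print(f"{group}: ratio {ratio:.2f}, flagged: {flags[group]}")
```

A ratio check like this is a screening signal only. With small samples it is usually paired with a significance test, and running it on a schedule after launch is one way to act on the continuous monitoring point above.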