
8 critical questions to ask when selecting your ‘Ai for Hiring’ technology

 

Interrupting bias in people decisions

We hope that the debate over the value of diverse teams is now over. There is plenty of evidence that diverse teams make better decisions and, therefore, deliver better business outcomes for any organisation.

This means that CHROs today are being charged with interrupting bias in their people decisions, and are expected to manage bias as closely as the CFO manages the financials.

But the use of Ai tools in hiring and promotion requires careful consideration to ensure the technology does not inadvertently introduce bias or amplify any existing biases.

To help HR decision-makers navigate these decisions confidently, we invite you to consider these 8 critical questions when selecting your Ai technology.

You will find not only the key questions to ask when testing the tools, but also why they are critical and how to differentiate between the answers you are given.

1. What training data do you use?

Another way to ask this is: what data do you use to assess someone’s fit for a role?

First up: why is this an important question to ask?

Machine-learning algorithms use statistics to find and apply patterns in data. Data can be anything that can be measured or recorded, e.g. numbers, words, images, clicks etc. If it can be digitally stored, it can be fed into a machine-learning algorithm.

The process is quite basic: find the pattern, apply the pattern.

This is why the data you use to build a predictive model, called training data, is so critical to understand.
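To make “find the pattern, apply the pattern” concrete, here is a deliberately naive Python sketch. It assumes a hypothetical set of past hiring decisions recorded against a single CV attribute (the `university_tier` field and all numbers are invented for illustration). The “model” simply learns the historical hire rate for each value and reuses it to score new candidates, so any skew in past decisions flows straight into new recommendations.

```python
from collections import defaultdict

# Hypothetical historical hiring records: (university_tier, was_hired).
# The attribute and the numbers are illustrative only.
history = [
    ("tier_1", True), ("tier_1", True), ("tier_1", False),
    ("tier_2", True), ("tier_2", False), ("tier_2", False),
]

def find_pattern(records):
    """'Find the pattern': learn the historical hire rate for each attribute value."""
    counts = defaultdict(lambda: [0, 0])  # value -> [hires, total]
    for value, hired in records:
        counts[value][0] += int(hired)
        counts[value][1] += 1
    return {value: hires / total for value, (hires, total) in counts.items()}

def apply_pattern(model, value):
    """'Apply the pattern': score a new candidate with the learned rate."""
    return model.get(value, 0.0)

model = find_pattern(history)
print(round(apply_pattern(model, "tier_1"), 2))  # 0.67
print(round(apply_pattern(model, "tier_2"), 2))  # 0.33
```

Real models are far more complex, but the dependency is the same: a model trained on past decisions can only be as fair as the decisions it learned from.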

In HR, the kinds of data that could be used to build predictive models for hiring and promotion are:

  • CV data and cover letters
  • Games built to measure someone’s memory capacity and processing speed
  • Behavioural data, e.g. how you engage in an assessment
  • Video data, where Ai captures how you act in an interview: your gestures, pose and lean, as well as your tone and cadence
  • Your text or voice responses to structured interview questions
  • Public data sources such as your social media profile, your tweets, and other social media activity

If you consider the range of data that can be used as training data, not all data sources are equal. Even on the surface, you can see how some carry the risk of amplifying existing biases and of alienating your candidates.

Consider the training data through these lenses:

> Is the data visible or opaque to the candidate?

Using data that is invisible to the candidate may impact your employer brand. Relying on behavioural data such as how quickly a candidate completes the assessment, on social data, or on any other data that is invisible to the candidate might expose you not only to brand risk but also to legal risk. Will your candidates trust an assessment that uses data that is invisible to them, scraped about them, or that can’t be readily explained?

Increasingly, companies are measuring the business cost of poor hiring processes, including the customer churn they contribute to. 65% of candidates with a positive experience would be a customer again even if they were not hired, and 81% will share their positive experience with family, friends and peers (Source: Talent Board).

Visibility of the data used to generate recommendations is also linked to explainability, an attribute now commonly demanded by both governments and organisations in the responsible use of Ai.

Video Ai tools have been legally challenged on the grounds that they fail to comply with baseline standards for AI decision-making, such as the OECD AI Principles and the Universal Guidelines for AI, or that they perpetuate societal biases and could end up penalising non-native speakers, visibly nervous interviewees or anyone else who doesn’t fit the model for look and speech.

If you are keen to attract and retain applicants through your recruitment pipeline, you may also care about how explainable and trustworthy your assessment is. When the candidate can see the data that is used about them and knows that only the data they consent to give is being used, they may be more likely to apply and complete the process. Think about how your own trust in a recruitment process could be affected by different assessment types.

> Is the data 1st party data or 3rd party data?

1st party data is data such as the interview responses written by a candidate to answer an interview question. It is given openly, consensually and knowingly. There is full awareness about what this data is going to be used for and it’s typically data that is gathered for that reason only.

3rd party data is data drawn from or acquired through public sources about a candidate, such as their Twitter or other social media profiles. It is not created for the specific use case of interviewing for a job; instead, it is scraped, extracted and applied for a different purpose. It follows that an Ai tool that relies on visible, 1st party data is likely to be both more accurate when applied to recruitment and more likely to produce outcomes that the candidate and the recruiter trust.


Trust matters to your candidates and to your culture …

At PredictiveHire, we are committed to building ethical and engaging assessments. This is why we have taken the path of a text chat with no time pressure. We allow candidates to take their own time, reflect and submit answers in text format.

We strictly do not use any information other than the candidate’s responses to the interview questions (i.e. fairness through unawareness: the algorithm knows nothing about sensitive attributes).

For example, we make no explicit use of race, age, name, location etc.; of candidate behavioural data such as how long they take to complete the assessment, how fast they type or how many corrections they make; or of information scraped from the internet. While these signals may carry information, we do not use any such data.
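One way to enforce fairness through unawareness is an allow-list in front of the model, so that only the candidate’s own answers ever reach it. The sketch below is illustrative only: the record layout and field names are hypothetical, not PredictiveHire’s actual schema.

```python
# Hypothetical candidate record; field names are illustrative, not a real schema.
candidate = {
    "name": "Alex",
    "age": 29,
    "location": "Sydney",
    "completion_seconds": 1840,      # behavioural signal: not used
    "keystrokes_per_minute": 210,    # behavioural signal: not used
    "corrections": 14,               # behavioural signal: not used
    "answers": {
        "q1": "I once had to calm an upset customer...",
        "q2": "When my team missed a deadline...",
    },
}

# Allow-list: only the candidate's own interview answers reach the model.
ALLOWED_FIELDS = {"answers"}

def model_inputs(record: dict) -> dict:
    """Drop every field that is not explicitly allowed (fairness through unawareness)."""
    return {key: value for key, value in record.items() if key in ALLOWED_FIELDS}

inputs = model_inputs(candidate)
print(sorted(inputs))  # ['answers']
```

An allow-list is used rather than a block-list so that any new field added upstream is excluded by default. Removing these fields does not by itself rule out proxies hidden in the answers, which is why the bias testing covered in question 4 still matters.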


2. Can you explain why ‘person y’ was recommended by the Ai and not ‘person z’?

Another way to ask this is: can you explain how your algorithm works, and does your solution use deep learning models?

This is an interesting question especially given that we humans typically obfuscate our reasons for rejecting a candidate behind the catch-all explanation of “Susie was not a cultural fit”.

For some reason, we humans have a higher-order need and expectation to unpack how an algorithm arrived at a recommendation. Perhaps that is because there is not much to say in response to a phone call telling you that you were rejected for cultural fit.

This is probably the most important aspect to consider, especially if you are the change leader in this area. It is fair to expect that if an algorithm affects someone’s life, you need to see how that algorithm works.

Transparency and explainability are fundamental ingredients of trust, and there is plenty of research to show that high trust relationships create the most productive relationships and cultures.

This is also one substantial benefit of using AI at the top of the funnel to screen candidates: depending on the kind of Ai you use, it enables you to explain why a candidate was screened in or out.

This means recruitment decisions can become more consistent and fairer with AI screening tools.

But if an Ai solution cannot make clear why certain inputs (called “features” in machine learning jargon) are used and how they contribute to the outcome, explainability becomes impossible.

For example, when deep learning models are used, you are often sacrificing explainability for accuracy, because it is very difficult to explain how a particular data feature contributed to a recommendation. This can further erode candidate trust and impact your brand.
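By contrast, in a simple linear scoring model each feature’s contribution to the score is just its weight multiplied by its value, so “why person y and not person z?” can be answered feature by feature. This is a minimal sketch assuming a hypothetical model with three illustrative features and made-up weights; it is not how any particular vendor scores candidates.

```python
# Hypothetical linear scoring model: feature names, weights and values are illustrative.
weights = {"empathy": 0.5, "accountability": 0.3, "teamwork": 0.2}

person_y = {"empathy": 0.9, "accountability": 0.6, "teamwork": 0.7}
person_z = {"empathy": 0.4, "accountability": 0.8, "teamwork": 0.7}

def contributions(weights: dict, features: dict) -> dict:
    """In a linear model, each feature contributes weight * value to the score."""
    return {name: weights[name] * features[name] for name in weights}

def explain_difference(weights: dict, a: dict, b: dict) -> list:
    """Per-feature contribution gap between two candidates, largest absolute gap first."""
    ca, cb = contributions(weights, a), contributions(weights, b)
    gaps = {name: ca[name] - cb[name] for name in weights}
    return sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True)

for feature, gap in explain_difference(weights, person_y, person_z):
    print(f"{feature}: {gap:+.2f}")
# empathy: +0.25
# accountability: -0.06
# teamwork: +0.00
```

Here person y’s higher empathy score accounts for most of the gap, which is the kind of answer you can actually give a hiring manager or a candidate.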

The most important thing is that you know what data is being used. Ultimately, it’s your choice whether you feel comfortable explaining the algorithm’s recommendations to both your people and the candidate.

3. What assumptions and scientific methods are behind the product? Are they validated?

Assessment should be underpinned by validated scientific methods and like all science, the proof is in the research that underpins that methodology.

This raises another question for anyone looking to rely on AI tools for human decision-making: where is the published, peer-reviewed research that gives you confidence that a) it works and b) it’s fair?

This is an important question given the novelty of AI methods and the pace at which they advance.

At PredictiveHire, we have published our research to ensure that anyone can investigate for themselves the science that underpins our AI solution.




We continuously analyse the data used to train models for latent patterns that reveal insights for our customers and inform how we improve outcomes.

4. What are the bias tests that you use and how often do you test for bias?

It’s probably self-evident why this is an important question to ask. You can’t have much confidence in the algorithm being fair for your candidates if no one is testing that regularly.

Many assessment providers report on studies they have conducted to test for bias. While this is useful, it does not guarantee that the assessment will be free of bias in the new candidate cohorts it is applied to.

The notion of “data drift” discussed in machine learning highlights how changing patterns in data can cause models to behave differently than expected, especially when the new data is significantly different from the training data.

Therefore, ongoing monitoring of models is critical to identifying and mitigating the risk of bias.
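One common way to monitor for drift is the Population Stability Index (PSI), which compares the distribution of model scores on a new cohort with the distribution at training time. The sketch below assumes fixed score bins and uses invented scores; the thresholds usually quoted (below 0.1 stable, above 0.25 a significant shift) are rules of thumb, not standards.

```python
import math

def population_stability_index(expected, actual, bins):
    """PSI between a baseline score sample and a new one, over fixed bin edges.

    `bins` is a sorted list of edges; scores outside the edges are clipped into
    the first or last bin. A small epsilon avoids log(0) for empty bins.
    """
    eps = 1e-6

    def proportions(scores):
        counts = [0] * (len(bins) - 1)
        for s in scores:
            i = 0
            while i < len(bins) - 2 and s >= bins[i + 1]:
                i += 1
            counts[i] += 1
        return [max(c / len(scores), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    # PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

baseline = [0.1, 0.2, 0.3, 0.4, 0.55, 0.6, 0.8, 0.9]      # hypothetical training-time scores
new_cohort = [0.1, 0.3, 0.55, 0.6, 0.8, 0.85, 0.9, 0.95]  # hypothetical new-cohort scores
edges = [0.0, 0.25, 0.5, 0.75, 1.0]

psi = population_stability_index(baseline, new_cohort, edges)
print(f"PSI = {psi:.2f}")  # PSI = 0.35
```

Running the same check separately for each group of interest, not just on the whole cohort, shows whether a shift affects some candidates more than others.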

Potential biases in data can be tested for and measured.

These include commonly assumed biases, such as those between gender and race groups, which can be added to a suite of tests. These tests can be extended to other groups of interest where the relevant group attributes are available, such as English as a Second Language (EASL) users.

On bias testing, look out for at least these 3 tests, and ask to see the technical manual and an example bias testing report.

  • Proportional Parity Test. This is the standard EEOC measure for adverse impact on selection and recommendations.
  • Score Distribution Test. This measures whether the assessment score distributions are similar across groups of interest.
  • Fairness Test. This measures whether the assessment is making the same rate of errors across groups of interest.
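As a rough sketch of two of these tests, the Python below computes each group’s selection rate and its ratio to the highest-rate group (under the EEOC four-fifths rule, a ratio below 0.8 is evidence of adverse impact), plus the false negative rate per group, i.e. how often suitable candidates were not recommended. The outcome records and the “later judged suitable” label are hypothetical; in practice, that label needs a validated outcome measure.

```python
# Hypothetical screening outcomes (illustrative numbers only).
# Each record: (group, recommended_by_model, later_judged_suitable)
outcomes = (
    [("A", True, True)] * 40 + [("A", False, True)] * 10 + [("A", False, False)] * 50
    + [("B", True, True)] * 25 + [("B", False, True)] * 15 + [("B", False, False)] * 60
)

def selection_rates(records):
    """Proportion of each group recommended by the model."""
    totals, selected = {}, {}
    for group, recommended, _ in records:
        totals[group] = totals.get(group, 0) + 1
        selected[group] = selected.get(group, 0) + int(recommended)
    return {g: selected[g] / totals[g] for g in totals}

def impact_ratios(rates):
    """Each group's selection rate divided by the highest group's rate."""
    best = max(rates.values())
    return {g: r / best for g, r in rates.items()}

def false_negative_rates(records):
    """Share of suitable candidates the model did NOT recommend, per group."""
    suitable, missed = {}, {}
    for group, recommended, is_suitable in records:
        if is_suitable:
            suitable[group] = suitable.get(group, 0) + 1
            missed[group] = missed.get(group, 0) + int(not recommended)
    return {g: missed[g] / suitable[g] for g in suitable}

rates = selection_rates(outcomes)
for group, ratio in impact_ratios(rates).items():
    flag = "below 4/5ths threshold" if ratio < 0.8 else "ok"
    print(group, round(rates[group], 2), round(ratio, 3), flag)
print(false_negative_rates(outcomes))  # {'A': 0.2, 'B': 0.375}
```

The false negative rate is only one choice of error rate; the same pattern applies to false positive rates.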



At PredictiveHire, we conduct all the above tests. We run statistical tests to check for significant differences between groups in feature values, model outcomes and recommendations, using methods such as t-tests, effect sizes, ANOVA, the 4/5ths rule and chi-squared tests. We consider this standard practice.
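Two of the statistics named above can be sketched with the Python standard library alone: a Pearson chi-squared test on a 2x2 table of selected vs not-selected counts for two groups, and Cohen’s d for the gap in mean scores between them. The counts and scores are invented for illustration, and this is not PredictiveHire’s testing code.

```python
import math
import statistics

def chi_squared_2x2(sel_a, total_a, sel_b, total_b):
    """Pearson chi-squared test (no continuity correction) on a 2x2
    selected / not-selected table. Returns (statistic, p_value) for df = 1."""
    table = [[sel_a, total_a - sel_a], [sel_b, total_b - sel_b]]
    n = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [sel_a + sel_b, n - sel_a - sel_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def cohens_d(scores_a, scores_b):
    """Standardised mean difference using the pooled sample standard deviation."""
    na, nb = len(scores_a), len(scores_b)
    va, vb = statistics.variance(scores_a), statistics.variance(scores_b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(scores_a) - statistics.mean(scores_b)) / pooled_sd

stat, p = chi_squared_2x2(sel_a=40, total_a=100, sel_b=25, total_b=100)
print(f"chi-squared = {stat:.2f}, p = {p:.3f}")
d = cohens_d([0.6, 0.7, 0.8, 0.9], [0.5, 0.6, 0.7, 0.8])
print(f"Cohen's d = {d:.2f}")  # Cohen's d = 0.77
```

A significant chi-squared result says the selection rates differ by more than chance would explain; the effect size says how large a score gap is, which matters because large cohorts can make trivial gaps statistically significant.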

We go beyond these standard proportional and distribution tests and adhere to stricter fairness considerations, especially on error rates at the model training stage. These include following the guidelines set by IBM’s AI Fairness 360 Open Source Toolkit (reference: https://aif360.mybluemix.net/) and by the Aequitas project at the Centre for Data Science and Public Policy at the University of Chicago.


5. How can you remove bias from an algorithm?

We all know that, despite our best intentions, we cannot be trained out of our biases, especially our unconscious ones.

This is another reason why using data-driven methods to screen candidates is fairer than using humans.

Biases can occur in many different forms. Algorithms and Ai learn according to the profile of the data we feed them. If the data they learn from is taken from a CV, they are only going to amplify our existing biases. Only cleaner data, like answers to specific job-related questions, gives us a real chance of a bias-free outcome.

If any biases are discovered, the vendor should be able to investigate and identify the cause (e.g. a particular feature, or the definition of fitness used) and take corrective measures to mitigate it.

6. On which minority groups have you tested your products?

If you care about inclusivity, then you want every candidate to have an equal and fair opportunity at participating in the recruitment process.

This means taking account of minority groups such as people with autism, dyslexia or English as a second language (EASL), as well as the obvious need to ensure the approach is inclusive of different ethnic groups, ages and genders.

At PredictiveHire, we test the algorithms for bias on gender and race. Tests can be conducted for almost any group the customer is interested in. For example, we run tests on English as a Second Language (EASL) speakers vs. native speakers.

7. What kind of success have you had in terms of creating hiring equity?

If one motivation for introducing Ai tools to your recruitment process is to deliver more diverse hiring outcomes, you should naturally expect the provider to have demonstrated this kind of impact with its customers.

If you don’t measure it, you probably won’t improve it. At PredictiveHire, we provide you with tools to measure equality. Multiple dimensions are measured across the pipeline: who applied, who was recommended and who was ultimately hired.
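Measuring equality through the pipeline can be as simple as tracking each group’s share of candidates at every stage and the rate at which each group passes from one stage to the next. The stage names follow the paragraph above; the counts and the two placeholder groups are hypothetical.

```python
# Hypothetical pipeline counts by group (illustrative only).
pipeline = {
    "applied":     {"group_a": 500, "group_b": 500},
    "recommended": {"group_a": 150, "group_b": 160},
    "hired":       {"group_a": 40,  "group_b": 42},
}
stages = ["applied", "recommended", "hired"]

def representation(stage_counts):
    """Each group's share of the candidates at one stage."""
    total = sum(stage_counts.values())
    return {g: n / total for g, n in stage_counts.items()}

def stage_pass_rates(pipeline, stages):
    """Share of each group that moves from one stage to the next."""
    rates = {}
    for prev, curr in zip(stages, stages[1:]):
        rates[f"{prev}->{curr}"] = {
            g: pipeline[curr][g] / pipeline[prev][g] for g in pipeline[prev]
        }
    return rates

for stage in stages:
    shares = representation(pipeline[stage])
    print(stage, {g: round(s, 2) for g, s in shares.items()})
print(stage_pass_rates(pipeline, stages))
```

Tracking both views helps: representation shows whether a group’s share shrinks as candidates move through the funnel, and pass rates show at which stage it happens.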

8. What is the composition of the team building this technology?

Thankfully, HR decision-makers are now much more aware of how human bias can creep into technology design. Think of how the dominance of one trait among a product’s designers and builders can create an inadvertently unfair outcome.

In 2012, YouTube noticed something odd.

About 10% of the videos being uploaded were upside down.

When designers investigated the problem, they found something unexpected: left-handed people picked up their phones differently, rotating them 180 degrees, which led to upside-down videos being uploaded.

The issue here was a lack of diversity in the design process. The engineers and designers who created the YouTube app were all right-handed, and none had considered that some people might pick up their phones differently.

In our team at PredictiveHire, from the top down, we look for diversity in its broadest sense.

Gender, race, age, education, immigrant vs native-born, personality traits, work experience: it all adds up to help us minimise our collective blind spots and create a candidate and user experience that works for the greatest number of people and minimises bias.

What other questions have you used to validate the fairness and integrity of the Ai tools you have selected to augment your hiring and promotion processes?

We’d love to know!



Neuroinclusion by design. Not by exception.

Why neuroinclusion can’t be a retrofit and how Sapia.ai is building a better experience for every candidate.

In the past, if you were neurodivergent and applying for a job, you were often asked to disclose your diagnosis to get a basic accommodation – extra time on a test, maybe the option to skip a task. That disclosure often came with risk: of judgment, of stigma, or just being seen as different.

This wasn’t inclusion. It was bureaucracy. And it made neurodiverse candidates carry the burden of fitting in.

We’ve come a long way, but we’re not there yet.

Shifting from retrofits to inclusive-by-design

Over the last two decades, hiring practices have slowly moved away from reactive accommodations toward proactive, human-centric design. Leading employers began experimenting with:

  • Sharing interview questions in advance

  • Replacing group exercises with structured simulations

  • Offering a variety of assessment formats

  • Co-designing assessments with neurodiverse candidates

But even these advances have often been limited in scope, applied to special hiring programs or specific roles. Neurodiverse talent still encounters systems built for neurotypical profiles, with limited flexibility and a heavy dose of social performance pressure.

Hiring needs to look different.

Insight 1: The next frontier of hiring equity is universal design

Truly inclusive hiring doesn’t rely on diagnosis or disclosure. It doesn’t just give a select few special treatment. It’s about removing friction for everyone, especially those who’ve historically been excluded.

That’s why Sapia.ai was built with universal design principles from day one.

Here’s what that looks like in practice:

  • No time limits — Candidates answer at their own pace
  • No pressure to perform — It’s a conversation, not a spotlight
  • No video, no group tasks — Just structured, 1:1 chat-based interviews
  • Built-in coaching — Everyone gets personalised feedback

It’s not a workaround. It’s a rework.

Insight 2: Not all “friendly” methods are inclusive

We tend to assume that social or “casual” interview formats make people comfortable. But for many neurodiverse individuals, icebreakers, group exercises, and informal chats are the problem, not the solution.

When we asked 6,000 neurodiverse candidates about their experience using Sapia.ai’s chat-based interview, they told us:

“It felt very 1:1 and trustworthy… I had time to fully think about my answers.”

“It was less anxiety-inducing than video interviews.”

“I like that all applicants get initial interviews which ensures an unbiased and fair way to weigh-up candidates.”

Insight 3: Prediction ≠ Inclusion

Some AI systems claim to infer skills or fit from resumes or behavioural data. But if the training data is biased or the experience itself is exclusionary, you’re just replicating the same inequity with more speed and scale.

Inclusion means seeing people for who they are, not who they resemble in your data set.

At Sapia.ai, every interaction is transparent, explainable, and scientifically validated. We use structured, fair assessments that work for all brains, not just neurotypical ones.

Where to from here?

Neurodiversity is rising in both awareness and representation. However, inclusion won’t scale unless the systems behind hiring change as well.

That’s why we built a platform that:

  • Doesn’t rely on disclosure

  • Removes ambiguity and pressure

  • Creates space for everyone to shine

  • Measures what matters, fairly

Sapia.ai is already powering inclusive, structured, and scalable hiring for global employers like BT Group, Costa Coffee and Concentrix. Want to see how your hiring process can be more inclusive for neurodivergent individuals? Let’s chat. 


Skills Measurement vs Skills Inference – What’s the Difference and Why Does It Matter?

There’s growing interest in AI-driven tools that infer skills from CVs, LinkedIn profiles, and other passive data sources. These systems claim to map someone’s capability based on the words they use, the jobs they’ve held, and patterns derived from millions of similar profiles. In theory, it’s efficient. But when inference becomes the primary basis for hiring or promotion, we need to scrutinise what’s actually being measured and what’s not.

Let’s be clear: the technology isn’t the problem. Modern inference engines use advanced natural language processing, embeddings, and knowledge graphs. The science behind them is genuinely impressive. And when they’re used alongside richer sources of data, such as internal project contributions, validated assessments, or behavioural evidence, they can offer valuable insight for workforce planning and development.

But we need to separate the two ideas:

  • Skills Measurement: Directly observing or quantifying a skill based on evidence of actual performance. 
  • Skills Inference: Estimating the likelihood that someone has a skill, based on indirect signals or patterns in their data. 

The risk lies in conflating the two.

The Problem Isn’t Inference of Skills. It’s the Data Feeding It

CVs and LinkedIn profiles are riddled with bias, inconsistency, and omission. They’re self-authored, unverified, and often written strategically – for example, to enhance certain experiences or downplay others in response to a job ad. 

And different groups represent themselves in different ways. Ahuja (2024) showed, for example, that male MBA graduates in India tend to self-promote more than their female peers. Something as simple as a longer LinkedIn ‘About’ section becomes a proxy for perceived competence.

Job titles are vague. Skill descriptions vary. Proficiency is rarely signposted. Even where systems draw on internal performance data, the quality is often questionable. Ratings tend to cluster (remember the year everyone got a ‘3’ at your org?) and can often reflect manager bias or company culture more than actual output.

Sophisticated ≠ Objective

The most advanced skill inference platforms use layered data: open web sources like job ads and bios, public databases like O*NET and ESCO, internal frameworks, even anonymised behavioural signals from platform users. This breadth gives a more complete picture, and the models powering it are undeniably sophisticated.

But sophistication doesn’t equal accuracy.

These systems rely heavily on proxies and correlations, rather than observed behaviour. They estimate presence, not proficiency. And when used in high-stakes decisions, that distinction matters.

Transparency (or Lack Thereof)

In many inference systems, it’s hard to trace where a skill came from. Was it picked up from a keyword? Assumed from a job title? Correlated with others in similar roles? The logic is rarely visible, and that’s a problem, especially when decisions based on these inferences affect access to jobs, development, or promotion.

Presence ≠ Proficiency

Inferred skills suggest someone might have a capability. But hiring isn’t about possibility. It’s about evidence of capability. Saying you’ve led a team isn’t the same as doing it well. Collecting or observing actual examples of behaviour allows you to evaluate someone’s true competence at a claimed skill. 

Some platforms try to infer proficiency, too, but this is still inference, not measurement. No matter how smart the model, it’s still drawing conclusions from indirect data.

By contrast, validated assessments like structured interviews, simulations, and psychometric tools are designed to measure. They observe behaviour against defined criteria, use consistent scoring frameworks (like Behaviourally Anchored Rating Scales, or BARS), and provide a transparent, defensible basis for decision-making. In doing this, the level of proficiency in a skill can be placed on a properly calibrated scale.

But here’s the thing: we don’t have to choose one over the other.

A Smarter Way Forward: The Hybrid Model

The real opportunity lies in combining the rigour of measurement with the scalability of inference.

Start with measurement
Define the skills that matter. Use structured tools to capture behavioural evidence. Set a clear standard for what good looks like. For example, define Behaviourally Anchored Rating Scales (BARS) when assessing interviews for skills. Using a framework like Sapia.ai’s Competency Framework is critical for defining what you want to measure. 
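A BARS can be represented as a mapping from rating levels to the observable behaviour that defines each level, so that every assessor scores against the same anchors. This is a minimal sketch: the skill, the three levels and the anchor wording are invented for illustration and are not taken from Sapia.ai’s Competency Framework.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BarsScale:
    """A Behaviourally Anchored Rating Scale for one skill.

    `anchors` maps each rating level to the observable behaviour that defines it.
    """
    skill: str
    anchors: dict[int, str]

    def rate(self, level: int) -> int:
        """Accept only levels that have a defined behavioural anchor."""
        if level not in self.anchors:
            raise ValueError(f"{level} is not a defined level for {self.skill}")
        return level

# Hypothetical scale: levels and anchor wording are illustrative only.
accountability = BarsScale(
    skill="Accountability",
    anchors={
        1: "Blames others or external factors for the outcome.",
        2: "Acknowledges own role but describes no corrective action.",
        3: "Owns the outcome and describes specific corrective steps taken.",
    },
)

# Two assessors rate the same interview answer against the same anchors.
ratings = [accountability.rate(2), accountability.rate(3)]
print(sum(ratings) / len(ratings))  # 2.5
```

Rejecting undefined levels is a deliberate choice: every score must point back to a written behavioural anchor, which is what makes the rating defensible.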

Layer in inference
Apply AI to scale scoring, add contextual nuance, and detect deeper patterns that human assessors might miss, especially when reviewing large volumes of data.

Anchor the whole system in transparency and validation
Ensure people understand how inferences are made by providing clear explanations. Continuously test for fairness. Keep human oversight in the loop, especially where the stakes are high. More information on ensuring AI systems are transparent can be found in this paper.

This hybrid model respects the strengths and limits of both approaches. It recognises that AI can’t replace human judgement but can enhance it, and that inference can extend reach, while only measurement can give you higher confidence in the results.

The Bottom Line

Inference can support and guide, but only measurement can prove. And when people’s futures are on the line, proof should always win.

References

Ahuja, A. (2024). LinkedIn profile analysis reveals gender-based differences in self-presentation among Indian MBA graduates. Journal of Business and Psychology.

 


Making Healthcare Hiring Human with Ethical AI

Hiring for care is unlike hiring in any other sector. Recruiters are looking for people who can bring empathy, resilience, and energy to some of the most demanding human roles. Whether it’s dental care, mental health, or aged care, new hires are charged with looking after others when they’re most vulnerable. The stakes are high.

Hiring for care is exactly where leveraging ethical AI can make the biggest impact.

Hiring for the traits that matter

The best carers don’t always have the best CVs.

That’s why our chat-based AI interview doesn’t screen for qualifications. It screens for the skills that matter when caring for others: the traits that define a brilliant care worker, such as:

Empathy, Self-awareness, Accountability, Teamwork, and Energy. 

The best way to uncover these traits is through structured behavioural science, delivered through an experience that allows candidates to open up, giving them space to share real-life, open-text answers with no time pressure or video stress. Our AI then picks up the signals that matter, free from any demographic data or bias-inducing signals.

Candidates’ answers to our structured interview questions aren’t simply ticking boxes. They’re a window into how someone shows up under pressure, and they’re helping leading care organisations hire people who belong in care and who stay.

Inclusion, built in

Inclusivity should be a core foundation of any talent assessment, and it’s a fundamental requirement for hirers in the care industry. 

When healthcare hirers use chat-based AI interviews, designed to be inclusive for all groups, candidates complete their interviews when and where they choose, without the bias traps of face-to-face or phone screening. There are no accents to judge, no assumptions, just their words and their story.

And it works:

  • 91.8% of carer candidates complete their interviews
  • Carer candidates report 9/10 Candidate Satisfaction with their interview experience 
  • 80% of candidates would recommend others to apply 
  • Every candidate receives personalised feedback, regardless of the outcome

Drop-offs are reduced, and engagement & employer brand advocacy go up. Building a brand that candidates want to work for includes providing a hiring experience that candidates want to complete. 

Real outcomes in care hiring

Our smart chat already works for some of the most respected names in healthcare and community services. Here’s a sample of the outcomes that are possible by combining ethical AI and a validated scientific assessment in an experience that candidates love:

Anglicare – a leading provider of aged care services
  • Time-to-offer dropped from 40+ days to just 14
  • Candidate pool grew by 30%
  • Turnover dropped by 63%
Abano Healthcare – Australasia’s largest dental support organisation
  • 1,213 recruiter hours saved in the first month (67 hours per individual hiring team member)
  • $25,000 saved in screening and interviewing time
Berry Street – a not for profit family & child services organisation
  • Time-to-hire down from 22 to 7 days
  • 95.4% of candidates completed their chat interviews

A smarter way to hire

The case study tells the full story of how Sapia.ai helped Anglicare, Abano Healthcare, and Berry Street transform their hiring processes by scaling up, reducing burnout, and hiring with heart. 

Download it here:
