
The Responsible Use of LLMs in Structured Interviews

 

Large Language Models, or LLMs, are the engines that have driven some of the biggest technological advancements in recent years. OpenAI’s GPT-4, Google’s PaLM 2, and Meta’s Llama are examples of LLMs. Models like these power chat interfaces such as ChatGPT and Google’s Bard, which have changed how many of us work and manage aspects of our daily lives. 

At Sapia.ai, we use responsible AI to help organizations find the people who belong with their brands. Chat Interview enables our customers to interview every candidate, widening their talent pools and creating a more efficient, effective, and fair recruitment process. 

With the advent of LLMs with enhanced language understanding and reasoning, there is a promising opportunity to apply these models in recruitment. However, using LLMs for activities like interview scoring demands not only innovation but also a strong commitment to responsible use and ethics. Any technology that makes hiring more efficient should also make it fairer and more transparent. 

This blog post summarises the paper that we presented at SIOP 2024, titled “Responsible Use of Large Language Models for Response Grading and Explanations in Structured Interviews”, written by Yimeng Dai, Leo Pham, Ashlie Plants, and Buddhi Jayatilleke. 

The Beginning of InterviewLLM

Our goal was to build an LLM that could grasp the unique nuances of interview conversations. The result was InterviewLLM, a domain-specific model trained on a rich dataset from our Chat Interview™ platform: over 1.3 billion words spanning diverse industries, job roles, and geographies. The model can understand and generate interview dialogue with less risk of reproducing stereotypes. 

The Gold Standard in Interview Grading

Building on InterviewLLM, we developed Saige™ (Sapia AI for Interview Grading and Explanations), a tool that takes interview grading to new heights. Saige™ is fine-tuned using a grading rubric designed by our subject matter experts (SMEs). The rubric uses Behaviorally Anchored Rating Scales (BARS), so every score given to a candidate response is anchored to descriptions of observable behaviors. By adopting BARS, the gold standard for rubrics that humans use to score interviews, Saige™ lets SMEs define new competencies for it to rate, something most mainstream automated interview grading approaches cannot do.
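The extensibility described above comes from treating the rubric as data that SMEs edit, rather than as logic baked into the model. The Python sketch below illustrates that idea with a BARS rubric in which each competency maps scores to observable behaviors. The class names, fields, and the example "Teamwork" anchors are hypothetical and chosen for illustration; this is not Saige™’s actual rubric format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Anchor:
    """One point on a behaviorally anchored rating scale (hypothetical schema)."""
    score: int
    behavior: str  # the observable behavior that earns this score


@dataclass
class Competency:
    name: str
    anchors: list[Anchor]

    def __post_init__(self) -> None:
        scores = [a.score for a in self.anchors]
        if len(set(scores)) != len(scores):
            raise ValueError(f"duplicate scores in anchors for {self.name!r}")
        # Keep anchors ordered from lowest to highest score.
        self.anchors.sort(key=lambda a: a.score)


@dataclass
class Rubric:
    competencies: dict[str, Competency] = field(default_factory=dict)

    def add_competency(self, competency: Competency) -> None:
        """Extending the rubric means adding data, not changing grading code."""
        if competency.name in self.competencies:
            raise ValueError(f"competency {competency.name!r} already exists")
        self.competencies[competency.name] = competency

    def anchor_for(self, name: str, score: int) -> str:
        """Return the behavior description that justifies a given score."""
        for anchor in self.competencies[name].anchors:
            if anchor.score == score:
                return anchor.behavior
        raise KeyError(f"no anchor with score {score} for {name!r}")


# Hypothetical example competency written by an SME.
rubric = Rubric()
rubric.add_competency(
    Competency(
        name="Teamwork",
        anchors=[
            Anchor(5, "Gives a concrete example of coordinating with others toward a shared goal."),
            Anchor(1, "Describes working alone, with no mention of others' contributions."),
            Anchor(3, "Mentions collaborating but gives few specifics."),
        ],
    )
)
print(rubric.anchor_for("Teamwork", 5))
```

Adding another competency is one more `add_competency` call with its own anchors; the grading side only ever reads the rubric, which is the property the paragraph attributes to BARS-based grading.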

We didn’t stop at building a more intelligent grading system with extensibility and explainability. Using LLMs in interviews demands a strong ethical commitment, and we recognized the importance of addressing potential biases in AI. That led us to implement a de-biasing process during Saige™’s training. By combining the expertise of our human SMEs with AI feedback, we created a model that delivers accurate grades in a manner that is fair and objective.

Saige™ also uses the LLM to explain why a candidate received a particular score. Each explanation cites examples from the candidate’s own response and describes how they do or do not exemplify the characteristics of the attribute, as defined by the SMEs. Those characteristics cover not only the attribute’s many facets but also how frequently and how deeply each is demonstrated.

The Impact and Implications of Saige™

The introduction of Saige™ marks a significant milestone in the recruitment industry. In testing, Saige™ outperformed LLaMA-2, achieving higher agreement and correlation with SME grades. 
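Agreement and correlation with SME grading can be measured in several ways. The sketch below assumes integer scores on a 1 to 5 scale and computes three common statistics: exact agreement, Pearson correlation, and quadratic weighted kappa. The choice of metrics, the scale, and the scores are assumptions for illustration; the paper’s evaluation protocol may differ, and no results from it are reproduced here.

```python
import math


def exact_agreement(a: list[int], b: list[int]) -> float:
    """Share of responses where both graders gave the identical score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pearson_r(a: list[int], b: list[int]) -> float:
    """Pearson correlation between two graders' scores."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def quadratic_weighted_kappa(a: list[int], b: list[int], min_score: int, max_score: int) -> float:
    """Chance-corrected agreement that penalizes large disagreements more than small ones."""
    k = max_score - min_score + 1
    n = len(a)
    # Confusion matrix: rows are grader a's scores, columns are grader b's.
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x - min_score][y - min_score] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * hist_a[i] * hist_b[j] / n
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical grades for eight responses.
sme_scores = [1, 2, 3, 4, 5, 3, 2, 4]
model_scores = [1, 2, 3, 4, 4, 3, 2, 5]

print(f"exact agreement: {exact_agreement(sme_scores, model_scores):.3f}")
print(f"pearson r:       {pearson_r(sme_scores, model_scores):.3f}")
print(f"quadratic kappa: {quadratic_weighted_kappa(sme_scores, model_scores, 1, 5):.3f}")
```

To compare two graders, such as Saige™ and LLaMA-2, each would be scored against the same SME grades, and the one with higher values tracks the SMEs more closely. Quadratic weighted kappa is included because, unlike raw agreement, it corrects for chance and treats a two-point miss as worse than a one-point miss.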

The implications of Saige™ extend beyond improved grading accuracy. By reducing bias in grading, we’re contributing to a more equitable hiring process, in which candidates are evaluated on their merits rather than on unconscious prejudices.

Saige™’s ability to provide clear, natural-language explanations for its grades creates a level of transparency not previously seen in automated interviews. The explanations help candidates identify areas for improvement and the attributes or abilities needed for success in their careers, while giving hiring teams a deeper understanding of each candidate and greater confidence in their hiring decisions.

Possible Applications of Saige™

We’re at the start of our journey with InterviewLLM and Saige™. The opportunity to use this capability beyond assessment, to coach candidates and employees effectively at scale, is immense and exciting. Such coaching could help people understand how to improve, fostering continuous growth and professional development within organizations. As we continue to refine these generative AI tools and explore new applications of LLMs in recruitment, we remain committed to our core principles of fairness and transparency.



Mirrored diversity: why retail teams should look like their customers

Walk into any store this festive season and you’ll see it instantly. The lights, the displays, the products are all crafted to draw people in. Retailers spend millions on campaigns to bring customers through the door. 

But the real moment of truth isn’t the emotional TV ad, or the shimmering window display. It’s the human standing behind the counter. That person is the brand.


The missing link in retail hiring

Most retailers know this, yet their hiring processes tell a different story. Candidates are often screened by rigid CV reviews or psychometric tests that force them into boxes. Neurodiverse candidates, career changers, and people from different cultural or educational backgrounds are often the ones who fall through the cracks.

And yet, these are the very people who may best understand your customers. If your store colleagues don’t reflect the diversity of the communities you serve, you create distance where there should be connection. You lose loyalty. You lose growth.

We call this gap the diversity mirror.


What mirrored diversity looks like

When retailers achieve mirrored diversity, their teams look like their customers:

  • A grocery store team that reflects the cultural mix of its neighbourhood.
  • A fashion store with colleagues who understand both style and accessibility.
  • A beauty retailer whose teams reflect every skin tone, gender, and background that walks through the door.

Customers buy where they feel seen – making this a commercial imperative. 

 

How to recruit seasonal employees with mirrored diversity

The challenge for HR leaders is that most hiring systems are biased by design. CVs privilege pedigree over potential. Multiple-choice tests reduce people to stereotypes. And rushed festive hiring campaigns only compound the problem.

That’s where Sapia.ai changes the equation: every candidate is interviewed automatically, fairly, and in their own words.

  • Bias is measured and monitored using Sapia.ai’s FAIR™ framework.
  • Outcomes are validated at scale: 7+ million candidates, 52 countries, average candidate satisfaction 9.2/10.
  • Diversity can be measured: with the Diversity Dashboard, you can track DEI capture rates, candidate engagement, and diversity hiring outcomes across every stage of the funnel.

With the right HR hiring tools, mirrored diversity becomes a data point you can track, prove, and deliver on. It’s no longer just a slogan.

 

Retail recruiting strategies in action: the David Jones example

David Jones, Australia’s premium department store, put this into practice:

  • 40,000 festive applicants screened automatically
  • 80% of final hires recommended by Sapia.ai
  • 4,000 hours of screening time freed up for recruiters
  • Candidate experience rated 9.1/10

The result? Store teams that belong with the brand and reflect the customers they serve.



Recruiting ideas for retail leaders this festive season

As you prepare for festive hiring in the UK and Europe, ask yourself:

  • How much will you spend on marketing this Christmas?
  • And how much will you invest in ensuring the colleagues who deliver that brand promise reflect the people you want in your stores?

When your colleagues mirror your customers, you achieve growth and, by design, inclusion.

See how Sapia.ai can help you achieve mirrored diversity this festive season. Book a demo with our team here. 

FAQs on retail recruitment and mirrored diversity

What is mirrored diversity in retail?

Mirrored diversity means that store teams reflect the diversity of their customer base, helping create stronger connections and loyalty.

Why is diversity important in seasonal retail hiring?

Seasonal employees often provide the first impression of a brand. Inclusive teams make customers feel seen, improving both experience and sales.

How can retailers improve their hiring strategies?

Adopting tools like AI structured interviews, bias monitoring, and data dashboards helps retailers hire fairly, reduce screening time, and build more diverse teams.

 


The Diversity Dashboard: Proving your DEI strategy is working

Why measuring diversity matters

Organisations invest heavily in their employer brand, career sites, and EVP campaigns, especially to attract underrepresented talent. But without the right data, it’s impossible to know if that investment is paying off.

Representation often varies across functions, locations, and stages of the hiring process. Blind spots allow bias to creep in, so underrepresented groups may drop out long before the offer stage.

Collecting demographic data is only step one. Turning it into insight you can act on is where real change and better hiring outcomes happen.

What is the Diversity Dashboard?

The Diversity Dashboard in Discover Insights, Sapia.ai’s analytics tool, gives you real-time visibility into representation, inclusion, and fairness at every stage of your talent funnel. It helps you connect the dots between your attraction strategies and actual hiring outcomes.

Key features include:

  • Demographic filters – Switch between gender, ethnicity, English as an additional language, First Nations status, disability, and veteran status. View age and ethnicity in standard or alternative formats to match regional reporting needs.
  • Representation highlights – Identify the top five represented sub-groups for each demographic, plus the three fastest-growing among underrepresented groups.
  • Track trends over time – See month-by-month changes in representation over the past 12 months, compare to earlier periods, and connect the data back to your EVP and attraction spend.
  • Candidate experience metrics – Measure CSAT (satisfaction) and engagement rates by demographic to ensure your hiring process works for everyone. Inclusion is measurable.
  • Hiring fairness – Compare representation in your applied, recommended, and hired pools to spot drop-offs. Understand not just who applies, but who progresses — and why.
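Spotting drop-offs between the applied, recommended, and hired pools comes down to comparing each group’s share and progression rate across stages. The sketch below assumes hypothetical candidate counts for two unnamed groups and flags any group whose pass-through rate falls below four-fifths of the highest group’s rate, a widely used adverse-impact heuristic from the US Uniform Guidelines on Employee Selection Procedures. It illustrates the kind of comparison described, not how the Diversity Dashboard computes its figures.

```python
# Hypothetical funnel stages and counts; a real dashboard's data will differ.
STAGES = ["applied", "recommended", "hired"]
FOUR_FIFTHS = 0.8  # adverse-impact threshold from the four-fifths rule

Funnel = dict[str, dict[str, int]]

funnel: Funnel = {
    "Group A": {"applied": 1000, "recommended": 400, "hired": 100},
    "Group B": {"applied": 500, "recommended": 150, "hired": 36},
}


def representation(data: Funnel, stage: str) -> dict[str, float]:
    """Each group's share of all candidates at one stage."""
    total = sum(counts[stage] for counts in data.values())
    return {group: counts[stage] / total for group, counts in data.items()}


def pass_through(data: Funnel, from_stage: str, to_stage: str) -> dict[str, float]:
    """Fraction of each group that progresses from one stage to the next."""
    return {group: c[to_stage] / c[from_stage] for group, c in data.items()}


def flag_drop_offs(rates: dict[str, float], threshold: float = FOUR_FIFTHS) -> list[str]:
    """Groups whose rate is below `threshold` times the best group's rate."""
    best = max(rates.values())
    return [group for group, rate in rates.items() if rate / best < threshold]


for stage in STAGES:
    shares = representation(funnel, stage)
    print(stage, {group: round(share, 3) for group, share in shares.items()})

for from_stage, to_stage in zip(STAGES, STAGES[1:]):
    rates = pass_through(funnel, from_stage, to_stage)
    print(f"{from_stage} -> {to_stage}:", rates, "flagged:", flag_drop_offs(rates))
```

With these counts, Group B’s applied-to-recommended rate (0.30) is 75% of Group A’s (0.40), so it is flagged, while its recommended-to-hired rate (0.24 versus 0.25) is not. Looking at pass-through rates as well as shares shows where in the funnel a group drops off, not just that its representation shrinks.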

     

From insight to action

With the Diversity Dashboard, you can pinpoint where inclusion is thriving and where it’s falling short.

  • See whether candidates who speak English as an additional language are applying in high numbers but not progressing to the live interview stage.
  • Spot whether candidates with a disability report high satisfaction but receive offers at a lower rate.
  • Track the impact of targeted campaigns month-by-month and adjust quickly when something isn’t working.

It’s also a powerful tool to tell your success story. Celebrate wins by showing which underrepresented groups are making the biggest gains, and share that progress with boards, executives, and regulators.

Built on science, backed by trust

Powered by explainable AI and the world’s largest structured interview dataset, the Diversity Dashboard delivers insights that are fair, auditable, and evidence-based.

Measuring diversity is the first step. Using that data to take action is where you close the Diversity Gap. With the Diversity Dashboard, you can prove your strategy is working and make the changes where it isn’t.

Book a demo to see the Diversity Dashboard in action.


Neuroinclusion by design. Not by exception.

Why neuroinclusion can’t be a retrofit and how Sapia.ai is building a better experience for every candidate.

In the past, if you were neurodivergent and applying for a job, you were often asked to disclose your diagnosis to get a basic accommodation: extra time on a test, or perhaps the option to skip a task. That disclosure often carried risk: of judgment, of stigma, or simply of being seen as different.

This wasn’t inclusion. It was bureaucracy. And it made neurodiverse candidates carry the burden of fitting in.

We’ve come a long way, but we’re not there yet.

Shifting from retrofits to inclusive-by-design

Over the last two decades, hiring practices have slowly moved away from reactive accommodations toward proactive, human-centric design. Leading employers began experimenting with:

  • Sharing interview questions in advance

  • Replacing group exercises with structured simulations

  • Offering a variety of assessment formats

  • Co-designing assessments with neurodiverse candidates

But even these advances have often been limited in scope, applied to special hiring programs or specific roles. Neurodiverse talent still encounters systems built for neurotypical profiles, with limited flexibility and a heavy dose of social performance pressure.

Hiring needs to look different.

Insight 1: The next frontier of hiring equity is universal design

Truly inclusive hiring doesn’t rely on diagnosis or disclosure. It doesn’t just give a select few special treatment. It’s about removing friction for everyone, especially those who’ve historically been excluded.

That’s why Sapia.ai was built with universal design principles from day one.

Here’s what that looks like in practice:

  • No time limits — Candidates answer at their own pace
  • No pressure to perform — It’s a conversation, not a spotlight
  • No video, no group tasks — Just structured, 1:1 chat-based interviews
  • Built-in coaching — Everyone gets personalised feedback

It’s not a workaround. It’s a rework.

Insight 2: Not all “friendly” methods are inclusive

We tend to assume that social or “casual” interview formats make people comfortable. But for many neurodiverse individuals, icebreakers, group exercises, and informal chats are the problem, not the solution.

When we asked 6,000 neurodiverse candidates about their experience using Sapia.ai’s chat-based interview, they told us:

“It felt very 1:1 and trustworthy… I had time to fully think about my answers.”

“It was less anxiety-inducing than video interviews.”

“I like that all applicants get initial interviews which ensures an unbiased and fair way to weigh-up candidates.”

Insight 3: Prediction ≠ Inclusion

Some AI systems claim to infer skills or fit from resumes or behavioural data. But if the training data is biased or the experience itself is exclusionary, you’re just replicating the same inequity with more speed and scale.

Inclusion means seeing people for who they are, not who they resemble in your data set.

At Sapia.ai, every interaction is transparent, explainable, and scientifically validated. We use structured, fair assessments that work for all brains, not just neurotypical ones.

Where to from here?

Awareness and representation of neurodiversity are both rising. However, inclusion won’t scale unless the systems behind hiring change as well.

That’s why we built a platform that:

  • Doesn’t rely on disclosure

  • Removes ambiguity and pressure

  • Creates space for everyone to shine

  • Measures what matters, fairly

Sapia.ai is already powering inclusive, structured, and scalable hiring for global employers like BT Group, Costa Coffee and Concentrix. Want to see how your hiring process can be more inclusive for neurodivergent individuals? Let’s chat. 
