Large Language Models, or LLMs, are the engines that have driven some of the biggest technological advancements in recent years. OpenAI’s GPT-4, Google’s Bard, and Meta’s Llama are examples of LLMs that power interfaces like ChatGPT, which has changed how many of us work and manage aspects of our daily lives.
At Sapia.ai, we use responsible AI to help organizations find the people who belong with their brands. Chat Interview™ enables our customers to interview every candidate, widening their talent pools and creating a more efficient, effective, and fair recruitment process.
With the advent of LLMs with enhanced language understanding and reasoning, there is a promising opportunity to extend these models in the recruitment space. However, using LLMs for activities like interview scoring demands not only innovation but also a strong commitment to responsible use and ethics. Any use of technology to make hiring more efficient should also make it fairer and more transparent.
This blog post summarises the paper that we presented at SIOP 2024, titled “Responsible Use of Large Language Models for Response Grading and Explanations in Structured Interviews”, written by Yimeng Dai, Leo Pham, Ashlie Plants, and Buddhi Jayatilleke.
Our goal was to create an LLM that could grasp the unique nuances of interview conversations, leading to the creation of InterviewLLM. This domain-specific model was trained on a rich dataset from our Chat Interview™ platform, encompassing over 1.3 billion words from diverse industries, job roles, and geographies. The result is a model that can understand and generate interview dialogues with fewer possible stereotypes.
Building on the capabilities of InterviewLLM, we developed Saige™ (Sapia AI for Interview Grading and Explanations), a tool that takes interview grading to new heights. Saige™ is fine-tuned with a grading rubric crafted by our subject matter experts (SMEs), employing Behaviorally Anchored Rating Scales (BARS) to ensure that assessments of candidate responses are anchored in descriptions of observable behaviors. By adopting BARS, the gold standard for rubrics used by humans to score interviews, Saige™ lets human SMEs add new competencies for it to rate, a capability that most current mainstream automated interview grading approaches lack.
We didn’t stop at creating a more intelligent grading system with extensibility and explainability. Using LLMs in interviews demands a strong ethical commitment. We recognized the importance of addressing potential biases in AI, which led us to implement a de-biasing process during Saige™’s training. By combining the expertise of our human SMEs with AI feedback, we created a model that not only delivers accurate grades but does so in a manner that is fair and objective.
Saige™ also leverages the LLM to explain why a candidate received a certain score, referencing actual examples from the candidate’s response and describing how each one exemplifies, or fails to exemplify, the characteristics of the attribute as defined by the SMEs. These characteristics cover not only the many facets of each attribute but also the frequency and depth with which they are demonstrated.
The introduction of Saige™ marks a significant milestone in the recruitment industry. In testing, Saige™ has demonstrated exceptional performance compared to LLaMA-2, with a higher agreement and correlation with SME grading.
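To make "agreement and correlation with SME grading" concrete, here is a minimal sketch of two common ways such a comparison can be computed: an exact-agreement rate and a Pearson correlation. The choice of these two metrics, the function names, and the 1-to-5 grades are assumptions for illustration only; the paper may use different measures, and none of the numbers are Sapia.ai results.

```python
from math import sqrt


def exact_agreement(model_grades, sme_grades):
    """Fraction of responses where the model grade equals the SME grade."""
    if len(model_grades) != len(sme_grades) or not model_grades:
        raise ValueError("grade lists must be non-empty and the same length")
    matches = sum(m == s for m, s in zip(model_grades, sme_grades))
    return matches / len(model_grades)


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length grade lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical grades on a 1-5 scale, made up for this illustration.
sme_grades = [1, 2, 3, 4, 5, 3]
model_grades = [1, 2, 3, 4, 4, 3]

print(f"exact agreement: {exact_agreement(model_grades, sme_grades):.2f}")
print(f"pearson r:       {pearson_correlation(model_grades, sme_grades):.2f}")
```

A model that tracks SME judgement closely would score high on both measures. Agreement rewards identical grades, while correlation rewards ranking candidates in the same order even when individual grades differ slightly.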
The implications of Saige™ extend beyond just improved interview grading accuracy. By reducing biases in the grading process, we’re contributing to a more equitable hiring process, where candidates are evaluated based on their merits rather than unconscious prejudices.
Saige™’s ability to provide clear, natural language explanations for grading decisions creates a level of transparency not seen before in automated interviews. With the explanations, candidates can gain insights into areas for improvement and identify the attributes or abilities necessary for success in their careers; while hiring teams can learn more about each candidate and have confidence in their hiring decisions.
We’re at the start of our journey with InterviewLLM and Saige™. The opportunity to use this capability beyond assessment, to provide effective coaching to candidates and employees at scale, is immense and exciting. It could help them understand how they can improve, fostering continuous growth and professional development within the organization. As we continue to refine these Generative AI tools and explore new applications of LLMs in recruitment, we remain committed to our core principles of fairness and transparency.
Every day, we read stories of increased fake or AI-assisted applications. Tools like LazyApply are among the many flooding the market, driving up applicant volumes to never-before-seen levels.
As an overwhelmed hiring function, how do you find the needle in the haystack without using an army of recruiters to filter through the maze?
At Sapia.ai, we help global enterprises do just that. Many of the world’s most trusted brands, such as Qantas Group, have relied on our hiring platform as a co-pilot for better hiring since 2020.
Our Chat Interview has given millions of candidates a voice they wouldn’t have had – enabling them to share in their own words why they’re the best fit for the role. To find the people who belong with their brands, our customers must trust that their candidates represent themselves. Thus, they want to trust that our AI is analysing real human answers—not answers from a machine.
The Rise of GPT
When ChatGPT went viral in November 2022, we immediately adopted a defensive strategy. We had long been flagging plagiarised candidate responses, but now we needed to act fast to flag responses containing artificially generated content (‘AGC’).
Many companies were in the same position, but Sapia.ai was the only company with a large proprietary data set of interview answers that pre-dated GPT and similar tools: 2.5 billion words written by real humans.
That data enabled us to build a world first: an LLM-based AGC detector for text-based interviews, recently upgraded to v2.0 with 99% accuracy and a false positive rate of 1%. It is an NLP classification model built on Sapia.ai proprietary data that operates across all Sapia.ai chat interviews.
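For readers less familiar with these two metrics, the sketch below shows how accuracy and false positive rate are conventionally derived from a classifier's confusion counts, treating "flagged as AGC" as the positive class. The counts are hypothetical and chosen only so the arithmetic lands on 99% and 1%; they are not Sapia.ai's evaluation data.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all answers the detector classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no answers to evaluate")
    return (tp + tn) / total


def false_positive_rate(fp, tn):
    """Share of genuinely human-written answers wrongly flagged as AGC."""
    human_written = fp + tn
    if human_written == 0:
        raise ValueError("no human-written answers to evaluate")
    return fp / human_written


# Hypothetical counts, for illustration only.
tp = 990  # AGC answers correctly flagged
tn = 990  # human answers correctly left unflagged
fp = 10   # human answers wrongly flagged
fn = 10   # AGC answers missed

print(f"accuracy: {accuracy(tp, tn, fp, fn):.2%}")
print(f"false positive rate: {false_positive_rate(fp, tn):.2%}")
```

The false positive rate matters most to candidates, because it measures how often someone answering in their own words would be wrongly flagged.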
Full Transparency with Candidates
Because we value candidate trust as much as customer trust, we wanted to be transparent with candidates about our ability to detect artificially generated content (AGC). With an LLM-based detector, we could identify AGC in real time and warn candidates that we had detected it.
This has had a powerful impact on candidate behaviour. Since our AGC detector went live, we have seen that the real-time flagging acts as a real-time disincentive to use tools like ChatGPT to generate interview responses.
The detector generates a warning if 3 or more answers are flagged as having artificially generated content. The Sapia.ai Chat Interview uses 5 open-ended interview questions for volume hiring roles, such as retail, contact centre, and customer service, and 6 questions for professional roles, such as engineers, data scientists, graduates, etc.
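The warning rule described above can be sketched as a simple threshold check. The threshold of 3 comes from the text. The function name, the per-answer boolean flags, and the example interviews are hypothetical, and this is not Sapia.ai's implementation.

```python
WARNING_THRESHOLD = 3  # from the text: warn at 3 or more flagged answers


def should_warn(answer_flags, threshold=WARNING_THRESHOLD):
    """Return True when the number of flagged answers reaches the threshold.

    answer_flags holds one boolean per interview answer, True meaning the
    detector flagged that answer as artificially generated content.
    """
    flagged_count = sum(1 for flagged in answer_flags if flagged)
    return flagged_count >= threshold


# Hypothetical 5-question volume-hiring interview with 3 flagged answers.
print(should_warn([True, False, True, False, True]))

# Hypothetical 6-question professional interview with 2 flagged answers.
print(should_warn([True, True, False, False, False, False]))
```

Because the rule counts flags rather than proportions, the same threshold applies to both the 5-question and 6-question interview formats.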
Let’s Take a Closer Look at the Data…
We see that using our AGC detector to tell candidates, live in the interview flow, that artificial content has been detected deters them from using AI tools to generate their answers.
The rate of AGC use declines from 1 question flagged to 5 questions – raising the flag on one question is generally enough to deter candidates from trying again.
The graph below shows the number of candidates, from a total of almost 2.7m, that used artificially generated content in their answers.
Differences in AGC Usage Rate by Groups
We see no meaningful differences in candidate behaviour based on the job they are applying for or based on geography.
However, we have found differences by gender and ethnicity – for example, men use artificially generated content more than women. The graph below shows the overall completion ratios by gender – for all interviews on the left and for interviews where the number of questions with AGC detected is 5 or more on the right.
Perception of Artificially Generated Content by Hirers
We’re curious to understand how hirers perceive the use of these tools to assist candidates in a written interview. We built the detector because the majority of Sapia.ai customers wanted transparency and explainability around candidates’ use of these tools. Often they want to ensure that candidates complete their interviews in their own words, and they want to avoid wasting time progressing candidates who are not as capable as their chat interview suggests.
However, some of our customers feel that it’s a positive reflection of the candidate, showing that they are using the tools available to them to put their best foot forward.
It’s a mix of perspectives.
Our detector labels it as the use of artificially generated content. It’s up to our customers how they use that information in their decision-making processes.
This concept of having a human in the loop is one of the key dimensions of ethical AI, and we ensure that it is used in every AI-related hiring product we build.
Interested in the science behind it all? Download our published research on developing the AGC detector 👇
Read the full press release about the partnership here.
Joe & The Juice, the trailblazing global juice bar and coffee concept, is renowned for its vibrant culture and commitment to cultivating talent. From humble roots as a single store in Copenhagen to a presence in 17 markets today, Joe & The Juice has built a culture that fosters growth and celebrates individuality.
But, as their footprint expands, so does the challenge of finding and hiring the right talent to embody their unique culture. With over 300,000 applications annually, the traditional hiring process using CVs was falling short – leaving candidates waiting and creating inefficiencies for the recruitment team. To address this, Joe & The Juice turned to Sapia.ai, a pioneer in ethical AI hiring solutions.
Through this partnership, Joe & The Juice has transformed its hiring process into an inclusive, efficient, and brand-aligned experience. Instead of faceless CVs, candidates now engage in an innovative chat-based interview that reflects the brand’s energy and ethos. Available in multiple languages, the AI-driven interview screens for alignment with the “Juicer DNA” and the brand’s core values, ensuring that every candidate feels seen and valued.
Candidates receive an engaging and fair interview experience as well as personality insights and coaching tips as part of their journey. In fact, 93% of candidates have found these insights useful, helping to deliver a world-class experience to candidates who are also potential guests of the brand.
“Every candidate interaction reflects our brand,” Sebastian Jeppesen, Global Head of Recruitment, shared. “Sapia.ai makes our recruitment process fair, enriching, and culture-driven.”
For Joe & The Juice, the collaboration has yielded impressive results:
33% Reduction in Screening Time: Pre-vetted shortlists from Sapia.ai’s platform ensure that recruiters can focus on top candidates, getting them behind the bar faster.
Improved Candidate Satisfaction: With a 9/10 satisfaction score from over 55,000 interviews, candidates appreciate the fairness and transparency of the process.
Bias-Free Hiring: By eliminating CVs and integrating blind AI that prioritizes fairness, Joe & The Juice ensures their hiring reflects the diverse communities they serve.
Frederik Rosenstand, Group Director of People & Development at Joe & The Juice, highlighted the transformative impact: “Our juicers are our future leaders, so using ethical AI to find the people who belong at Joe is critical to our long-term success. And now we do that with a fair, unbiased experience that aligns directly with our brand.”
In an industry so wholly centred on people, Joe & The Juice is paving the way for similar brands to adopt technology that enables inclusive, human-first experiences that reflect a brand’s core values.
If you’re curious about how Sapia.ai can transform your hiring process, check out our full case study on Joe & The Juice here.
It’s been a year of Big Moves at Sapia.ai. From welcoming groundbreaking brands to achieving incredible milestones in our product innovation and scale, we’re pushing the boundaries of what’s possible in hiring.
And we’re just getting started 🚀
Take a look at the highlights of 2024
All-in-one hiring platform
This year, with the addition of Live Interview, we’re proud to say our platform now covers screening, assessing and scheduling.
It’s an all-in-one volume hiring platform that enables our customers to deliver a world-leading experience from application through to offer.
Supercharging hiring efficiency
Every 15 seconds, a candidate is interviewed with Sapia.ai.
This year, we’ve saved hiring managers and recruiters hours of precious time that can now be used for higher-value tasks.
Giving candidates the best experience
Our platform allows candidates to be their best selves, so our customers can find the people that truly belong with them. They’re proud to use a technology that’s changing hiring, for good.
Leading the way in AI for hiring
We’ve continued to push the boundaries in leveraging ethical AI for hiring, with new products on the way for Coaching, Internal Mobility & Interview Builders.