The Responsible Use of LLMs in Structured Interviews

 

Large Language Models, or LLMs, are the engines that have driven some of the biggest technological advancements in recent years. OpenAI’s GPT-4, Google’s Bard, and Meta’s Llama are examples of LLMs that power interfaces like ChatGPT, which has changed how many of us work and manage aspects of our daily lives.

At Sapia.ai, we use responsible AI to help organizations find the people who belong with their brands. Chat Interview enables our customers to interview every candidate, widening their talent pools and creating a more efficient, effective, and fair recruitment process. 

With the advent of LLMs with enhanced language understanding and reasoning, there is a promising opportunity to apply these models in the recruitment space. However, using LLMs for activities like interview scoring demands not only innovation but also a strong commitment to responsible use and ethics. Any use of technology to make hiring more efficient should also make it fairer and more transparent.

This blog post summarises the paper that we presented at SIOP 2024, titled “Responsible Use of Large Language Models for Response Grading and Explanations in Structured Interviews”, written by Yimeng Dai, Leo Pham, Ashlie Plants, and Buddhi Jayatilleke. 

The Beginning of InterviewLLM

Our goal was to create an LLM that could grasp the unique nuances of interview conversations. The result was InterviewLLM, a domain-specific model trained on a rich dataset from our Chat Interview™ platform, encompassing over 1.3 billion words from diverse industries, job roles, and geographies. It can understand and generate interview dialogues with fewer stereotypes.

The Gold Standard in Interview Grading

Building on the capabilities of InterviewLLM, we developed Saige™ (Sapia AI for Interview Grading and Explanations), a tool that takes interview grading to new heights. Saige™ is fine-tuned with a grading rubric crafted by our subject matter experts (SMEs), employing Behaviorally Anchored Rating Scales (BARS) to ensure that assessments of candidate responses are anchored in descriptions of observable behaviors. BARS is the gold standard for the rubrics human raters use to score interviews. Because Saige™ adopts it, human SMEs can add new competencies for Saige™ to rate, an extensibility that most current mainstream automated interview grading approaches lack.
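To make the idea of a behaviorally anchored rubric concrete, here is a minimal sketch of how one might be represented in code. It is an illustration only, not how Saige™ stores its rubric: the class names (`Anchor`, `Competency`, `Rubric`), the `add_competency` method, the 1-to-5 scale, and the "Teamwork" anchor wording are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Anchor:
    """One scale point: a score plus the observable behavior that earns it."""
    score: int
    behavior: str


@dataclass
class Competency:
    """A competency rated against behaviorally anchored scale points."""
    name: str
    anchors: list[Anchor] = field(default_factory=list)

    def describe(self, score: int) -> str:
        """Return the behavior description anchored to the given score."""
        for anchor in self.anchors:
            if anchor.score == score:
                return anchor.behavior
        raise ValueError(f"No anchor for score {score} in {self.name!r}")


@dataclass
class Rubric:
    """A collection of competencies that SMEs can extend over time."""
    competencies: dict[str, Competency] = field(default_factory=dict)

    def add_competency(self, competency: Competency) -> None:
        """Register a new competency; refuse to overwrite an existing one."""
        if competency.name in self.competencies:
            raise ValueError(f"{competency.name!r} is already defined")
        self.competencies[competency.name] = competency


# Hypothetical example: anchor wording is invented for illustration.
teamwork = Competency(
    name="Teamwork",
    anchors=[
        Anchor(1, "Describes working alone; no mention of others' contributions."),
        Anchor(3, "Describes cooperating with others on a shared task."),
        Anchor(5, "Gives a specific example of resolving conflict to reach a team goal."),
    ],
)

rubric = Rubric()
rubric.add_competency(teamwork)
```

Keeping each score tied to a written behavior is what lets a grade be traced back to something observable in the response, and adding a competency is just adding another entry with its own anchors.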

We didn’t stop at creating a more intelligent grading system with extensibility and explainability. Using LLMs in interviews demands a strong ethical commitment. We recognized the importance of addressing potential biases in AI, which led us to implement a de-biasing process during Saige™’s training. By combining the expertise of our human SMEs with AI feedback, we created a model that not only delivers accurate grades but does so in a manner that is fair and objective.

Saige™ also leverages the LLM to explain why a candidate received a certain score, referencing actual examples from the candidate’s response and describing how they exemplify, or fail to exemplify, the characteristics of the attribute as defined by the SMEs. Each attribute is characterized not only by its many facets but also by the frequency and depth with which they are demonstrated.

The Impact and Implications of Saige™

The introduction of Saige™ marks a significant milestone in the recruitment industry. In testing, Saige™ outperformed LLaMA-2, showing higher agreement and correlation with SME grading.
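For readers unfamiliar with these measures, the sketch below shows one common way to compute agreement and correlation between model grades and SME grades. It assumes exact-match agreement and Pearson correlation, which may differ from the statistics reported in the paper. The function names and the grade lists are made up for illustration and are not results from our testing.

```python
from math import sqrt


def exact_agreement(model_grades, sme_grades):
    """Fraction of responses where the model grade equals the SME grade."""
    if len(model_grades) != len(sme_grades) or not model_grades:
        raise ValueError("Grade lists must be non-empty and of equal length")
    matches = sum(m == s for m, s in zip(model_grades, sme_grades))
    return matches / len(model_grades)


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("Need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("Correlation is undefined when one sequence is constant")
    return covariance / sqrt(var_x * var_y)


# Illustrative data only: grades on a hypothetical 1-to-5 scale.
sme_grades = [1, 2, 3, 4, 5]
model_grades = [1, 2, 3, 4, 4]

agreement = exact_agreement(model_grades, sme_grades)
correlation = pearson_correlation(model_grades, sme_grades)
print(f"agreement={agreement:.2f}, correlation={correlation:.2f}")
```

Agreement rewards only identical grades, while correlation also credits a model that ranks candidates in the same order as the SMEs even when individual grades differ, which is why the two are usually reported together.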

The implications of Saige™ extend beyond just improved interview grading accuracy. By reducing biases in the grading process, we’re contributing to a more equitable hiring process, where candidates are evaluated based on their merits rather than unconscious prejudices.

Saige™’s ability to provide clear, natural language explanations for grading decisions creates a level of transparency not seen before in automated interviews. With these explanations, candidates can gain insights into areas for improvement and identify the attributes or abilities necessary for success in their careers, while hiring teams can learn more about each candidate and have confidence in their hiring decisions.

Possible Applications of Saige™

We’re at the start of our journey with InterviewLLM and Saige™. The opportunity to use this capability beyond assessment, to provide effective coaching to candidates and employees at scale, is immense and exciting. It could help them understand how they can improve, fostering continuous growth and professional development within the organization. As we continue to refine these Generative AI tools and explore new applications of LLMs in recruitment, we remain committed to our core principles of fairness and transparency.
