Question-aware outlier answer detection for fairer AI scoring of interviews

04 Apr 2023

AI-based interview scoring learns from past interview answers. As a result, it can struggle to determine whether a candidate is legitimately answering the question when the response includes context or an example rarely seen in the training data.

Moreover, AI interview platforms may be susceptible to adversarial inputs, where an irrelevant answer receives a high score. Both scenarios raise fairness concerns and can erode trust in AI job interviews (Madaio et al., 2020).

This is why identifying outliers that differ significantly from the majority of answers, and flagging them for manual review, are crucial steps toward responsible and fair use of AI interview software. Simple rule-based methods (Reiz and Pongor, 2011) can filter out some irrelevant answers based on answer length and regular expressions, but they do not take into account the context and content of the answer and the question. A candidate may describe a unique yet relevant situation in response to an interview question, and you would not want to disregard that answer.
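
A rule-based filter of the kind described above might look like the following minimal sketch. The thresholds, patterns, and the function name `rule_based_flag` are assumptions made for illustration; they are not taken from the study or from Reiz and Pongor (2011).

```python
import re

# Hypothetical thresholds and patterns, chosen only for illustration.
MIN_WORDS = 5
REPEATED_CHAR = re.compile(r"(.)\1{5,}")   # six or more of the same character in a row
NO_LETTERS = re.compile(r"[^A-Za-z]*")     # used with fullmatch: no letters at all


def rule_based_flag(answer: str) -> bool:
    """Return True if the answer trips a simple length or pattern rule."""
    text = answer.strip()
    # Rule 1: the answer is too short to be a real response.
    if len(text.split()) < MIN_WORDS:
        return True
    # Rule 2: keyboard mashing such as "aaaaaaaa".
    if REPEATED_CHAR.search(text):
        return True
    # Rule 3: the answer contains only digits, punctuation, or whitespace.
    if NO_LETTERS.fullmatch(text):
        return True
    return False


print(rule_based_flag("ok"))
print(rule_based_flag("The movie had stunning visuals but the plot dragged in the second half."))
```

A well-formed movie review passes every rule here even though it does not answer any interview question, which is the limitation that motivates a context-aware approach.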

In this study, we introduce an unsupervised, question-aware, multi-context outlier detection model that detects anomalous answers contextually and semantically. An unsupervised approach is more practical than a supervised model, which requires a large labeled dataset of outlier answers. It helps bootstrap an outlier detector that can then be enhanced through human feedback.
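
To make the idea of question-aware scoring concrete, here is a toy sketch. It is not the model from the study, which learns contextual representations. It only compares bag-of-words vectors, and the function names, example texts, and scoring rule are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import Sequence


def bow(text: str) -> Counter:
    """Lowercased bag-of-words vector for a piece of text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def outlier_score(question: str, answer: str, past_answers: Sequence[str]) -> float:
    """Higher score = less similar to the question and its typical answers."""
    # Build the context from the question plus past answers to that question
    # (a crude stand-in for "question-aware" learning).
    context = bow(question)
    for past in past_answers:
        context.update(bow(past))
    return 1.0 - cosine(bow(answer), context)


question = "Tell us about a time you resolved a conflict with a customer."
past = [
    "A customer was upset about a late delivery, so I listened and offered a refund.",
    "I calmed an angry customer by apologising and resolving the billing conflict quickly.",
]
relevant = "When a customer complained about a wrong order, I listened, apologised and replaced it."
irrelevant = "The movie had stunning visuals but the plot dragged in the second half."

print(outlier_score(question, relevant, past) < outlier_score(question, irrelevant, past))
```

Answers whose score exceeds a chosen threshold would be flagged for manual review rather than rejected. Word counts are dominated by common words and miss paraphrases, which is one reason a learned contextual representation is preferable in practice.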

We tested how well the outlier model separates 177,691 actual interview answers given by hired candidates from outliers, such as movie reviews, news articles, nonsensical text, and sentences generated using BERT (Vaswani et al., 2017) with random starting words.

Our model outperformed the baseline one-class SVM outlier detector (Li et al., 2003) in distinguishing outliers from actual interview answers. The improvement over the baseline unsupervised model can be explained by both question-aware learning and multi-context learning, which yield better contextual representations for separating outlier answers from typical interview answers.

We also conducted a human evaluation on 10,689 interview answers from candidates who were not hired and might have provided outlier answers. Our model flagged 0.16% of the answers as outliers, and only 5.9% of those flags were false positives. All of these false predictions describe contexts related to family and personal life but are relevant to the question. It is understandable that the model labels these answers as outliers, since they are contextually and semantically different from most interview answers.

While a data-driven AI interviewer can help counter flaws in human interviewers, answers that differ significantly from the training data can lead to spurious predictive outcomes. In this study, we show how a question-aware, multi-context outlier detection model can be applied to identify outlier answers. Flagging such answers for human review both enhances fairness and provides a supervised signal to improve the outlier detection model over time.

References:

Dai, Y., Qi, J., & Zhang, R. (2020). Joint recognition of names and publications in academic homepages. In Proceedings of the 13th International Conference on Web Search and Data Mining (pp. 133-141).

Li, K. L., Huang, H. K., Tian, S. F., & Xu, W. (2003). Improving one-class SVM for anomaly detection. In Proceedings of the 2003 international conference on machine learning and cybernetics (IEEE Cat. No. 03EX693) (Vol. 5, pp. 3077-3081). IEEE.

Madaio, M. A., Stark, L., Wortman Vaughan, J., & Wallach, H. (2020). Co-designing checklists to understand organizational challenges and opportunities around fairness in AI. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (pp. 1-14).

Reiz, B., & Pongor, S. (2011). Psychologically inspired, rule-based outlier detection in noisy data. In SYNASC (pp. 131-136).

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.

About Author

Laura Belfield

Head of Marketing
