Simultaneously Achieving Group Exposure Fairness and Within-Group Meritocracy in Stochastic Bandits

S Pokhriyal, S Jain, G Ghalme, S Dhamal… - arXiv preprint arXiv:2402.05575, 2024 - arxiv.org
Existing approaches to fairness in stochastic multi-armed bandits (MAB) primarily focus on exposure guarantees to individual arms. When arms are naturally grouped by certain attribute(s), we propose Bi-Level Fairness, which considers two levels of fairness. At the first level, Bi-Level Fairness guarantees a certain minimum exposure to each group. To address the unbalanced allocation of pulls to individual arms within a group, we consider meritocratic fairness at the second level, which ensures that each arm is pulled according to its merit within the group. Our work shows that a UCB-based algorithm can be adapted to achieve Bi-Level Fairness by providing (i) anytime Group Exposure Fairness guarantees and (ii) individual-level Meritocratic Fairness within each group. We first show that the regret bound can be decomposed into two components: (a) regret due to anytime group exposure fairness and (b) regret due to meritocratic fairness within each group. Our proposed algorithm, BF-UCB, balances these two regrets optimally, achieving a sub-linear upper bound on regret in terms of the stopping time. Through simulated experiments, we further show that BF-UCB achieves sub-linear regret, provides better group and individual exposure guarantees than existing algorithms, and does not incur a significant drop in reward relative to the UCB algorithm, which imposes no fairness constraint.
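The two-level scheme described above can be illustrated with a minimal sketch. This is not the paper's BF-UCB algorithm; it is a hypothetical simulation in which a group is force-pulled whenever its share of pulls falls below its exposure quota (level one), and the arm within the chosen group is selected by a standard UCB index (level two). All names, quotas, and the deficit rule are illustrative assumptions.

```python
import math
import random


def bi_level_ucb_sketch(mu, groups, quotas, horizon, seed=0):
    """Illustrative sketch of bi-level fair arm selection.

    mu:      true Bernoulli means of the arms (simulation only).
    groups:  list of lists of arm indices, partitioning the arms.
    quotas:  minimum exposure fraction per group (sum should be < 1).
    Returns per-arm and per-group pull counts after `horizon` rounds.
    """
    rng = random.Random(seed)
    k = len(mu)
    pulls = [0] * k          # pulls per arm
    sums = [0.0] * k         # cumulative reward per arm
    group_pulls = [0] * len(groups)

    def ucb(i, t):
        # Standard UCB1 index; unpulled arms get priority.
        if pulls[i] == 0:
            return float("inf")
        return sums[i] / pulls[i] + math.sqrt(2.0 * math.log(t) / pulls[i])

    for t in range(1, horizon + 1):
        # Level 1: group exposure. Force-pull the most underserved
        # group if any group's empirical share is below its quota.
        deficits = [quotas[g] - group_pulls[g] / t for g in range(len(groups))]
        g_star = max(range(len(groups)), key=lambda g: deficits[g])
        if deficits[g_star] <= 0:
            # No group is underserved: follow the globally best UCB arm.
            best = max(range(k), key=lambda i: ucb(i, t))
            g_star = next(g for g, arms in enumerate(groups) if best in arms)
        # Level 2: meritocratic choice within the group via UCB index.
        arm = max(groups[g_star], key=lambda i: ucb(i, t))
        reward = 1.0 if rng.random() < mu[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        group_pulls[g_star] += 1

    return pulls, group_pulls
```

Running the sketch with two groups and a 30% exposure quota each keeps every group's share near its quota while still concentrating pulls on the higher-mean arm inside each group, which mirrors the abstract's intuition: group-level exposure constraints at the top, merit-driven allocation underneath.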