Motivated by the phenomenon of strategic agents gaming a recommendation system to maximize the number of times they are recommended to users, we study a strategic variant of the linear contextual bandit problem, where the arms strategically misreport privately observed contexts to the learner. We treat the algorithm design problem as one of mechanism design under uncertainty and propose the Optimistic Grim Trigger Mechanism (OptGTM) that minimizes regret while simultaneously incentivizing the agents to be approximately truthful. We show that…