Language models must be adapted to understand and follow user instructions. Reinforcement learning is widely used to facilitate this — typically using fixed criteria such as “helpfulness” and “harmfulness”. In our work, we instead propose using flexible, instruction-specific criteria as a means of broadening the impact that reinforcement learning can have in eliciting instruction following. We propose “Reinforcement Learning from Checklist Feedback” (RLCF). From instructions, we extract checklists and evaluate how well responses satisfy each item – using both AI judges and specialized…
Source: Read MoreÂ