Social Choice for AI Alignment: Dealing with Diverse Human Feedback

Abstract

Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, so that, for example, they refuse to comply with requests for help with committing crimes or with producing racist text. One approach to fine-tuning, called reinforcement learning from human feedback, learns from humans' expressed preferences over multiple outputs. Another approach is constitutional AI, in which the input from humans is a list of high-level principles. But how do we deal with potentially diverging input from humans? How can we aggregate the input into consistent data about "collective" preferences or otherwise use it to make collective choices about model behavior? In this paper, we argue that the field of social choice is well positioned to address these questions, and we discuss ways forward for this agenda, drawing on discussions in a recent workshop on Social Choice for AI Ethics and Safety held in Berkeley, CA, USA in December 2023.
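
As a hypothetical illustration of the aggregation question the abstract raises (not part of the paper itself), the sketch below applies a simple Borda-style scoring rule to rankings that different annotators might give over a model's candidate outputs. The annotator rankings, candidate labels, and choice of Borda scoring are all illustrative assumptions, not a method endorsed by the authors.

```python
# Hypothetical sketch: aggregating diverging annotator rankings over
# candidate model outputs with a Borda-style scoring rule.
from collections import defaultdict

def borda_aggregate(rankings):
    """Given best-first rankings (lists of output IDs), return the outputs
    ordered by total Borda score, highest first."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, output in enumerate(ranking):
            scores[output] += n - 1 - position  # top choice earns n-1 points
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative (made-up) annotator preferences over three candidate responses.
annotator_rankings = [
    ["A", "B", "C"],  # annotator 1
    ["B", "A", "C"],  # annotator 2
    ["A", "C", "B"],  # annotator 3
]
print(borda_aggregate(annotator_rankings))  # -> ['A', 'B', 'C']
```

Other rules from social choice (pairwise majority, Copeland, and so on) could be substituted for the scoring step; the point is only that diverging individual rankings must somehow be turned into a single collective ordering before they can guide fine-tuning.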

Links

PhilArchive




Similar books and articles

Machine Learning, Functions and Goals. Patrick Butlin - 2022 - Croatian Journal of Philosophy 22 (66):351-370.


Author Profiles

Eric Pacuit
University of Maryland, College Park
Rachel Freedman
Oxford University
Wesley H. Holliday
University of California, Berkeley
and 3 more authors
