Current cases of AI misalignment and their implications for future risks

Synthese 202 (5):1-23 (2023)

Abstract

How can one build AI systems such that they pursue the goals their designers want them to pursue? This is the alignment problem. Numerous authors have raised concerns that, as research advances and systems become more powerful over time, misalignment might lead to catastrophic outcomes, perhaps even to the extinction or permanent disempowerment of humanity. In this paper, I analyze the severity of this risk based on current instances of misalignment. More specifically, I argue that contemporary large language models and game-playing agents are sometimes misaligned. These cases suggest that misalignment tends to have a variety of features: it can be hard to detect, predict and remedy; it does not depend on a specific architecture or training paradigm; it tends to diminish a system’s usefulness; and it is the default outcome of creating AI via machine learning. Subsequently, based on these features, I show that the risk of AI misalignment magnifies with respect to more capable systems. Not only might more capable systems cause more harm when misaligned, but aligning them should also be expected to be more difficult than aligning current AI.



Analytics

Added to PP
2023-10-27



Author's Profile

Leonard Dung
Universität Erlangen-Nürnberg

Citations of this work

Understanding Artificial Agency.Leonard Dung - forthcoming - Philosophical Quarterly.
