Engineering AI for provable retention of objectives over time

AI Magazine 45 (2):1-11 (2024)

Abstract

I argue that ensuring artificial intelligence (AI) retains its alignment with human values over time is critical yet understudied. Most research focuses on static alignment and neglects the retention dynamics that keep a system stable as it learns and acts autonomously. This paper examines the limitations that constrain provable retention and identifies key gaps: formalizing retention dynamics, the transparency of advanced systems, participatory scaling, and the risks of uncontrolled recursive self-improvement. I synthesize technical and ethical perspectives into a conceptual framework, grounded in control theory and philosophy, for analyzing these dynamics. Research priorities should shift towards capability modulation, participatory design, and advanced modeling to verify enduring alignment. Overall, realizing AI that remains safely aligned throughout its lifetime requires translating principles into formal methods, demonstrations, and systems that integrate technical and humanistic rigor.

Analytics

Added to PP
2024-04-14
