Some background: the advent of greater-than-human artificial intelligence is hailed by insiders as the crucial event of the 21st century and is widely expected by experts to define humanity’s future. This event is generally dubbed the Singularity – with minor variations in the details of what that concept actually means. I personally expect this meme to hit the broad mainstream sometime over the next 18 months.
And this is where the problem lies: the Singularity Institute for Artificial Intelligence (SIAI) is the only organization dedicated to “[...] confront this urgent challenge, both the opportunity and the risk.” and within the SIAI there is one person – Eliezer Yudkowsky – who is dominating AI friendliness (FAI) discourse. What is so problematic about this state of affairs is threefold – two aspects of which I have previously covered here on this blog:
The current state of discourse on the topic is highly irrational.
#3 is the topic of this post.
In late 2007 Yudkowsky went on a writing spree over at the Overcoming Bias blog, where he has since written well over 600 articles. By March of 2009 this had gone so far that, in his own words:
“The Singularity Institute and the Future of Humanity Institute are beta’ing a new site devoted to refining the art of human rationality, LessWrong.com. LessWrong will end up as the future home of EliezerYudkowsky’s massive repository of essays previously written on Overcoming Bias” (emphasis mine)
“I figured out something that is hard to figure out. Figuring out or understanding the right answers requires rationality. Therefore let’s set up a mass movement to train people to be black-belt rationalists so that they can reach these conclusions too.”
Needless to say, such grandeur was met with anticipatory skepticism, best summed up by one commenter who stated that “A proper meme would spread without ego identity or association.” Hear, hear. But let us not condemn before seeing the evidence.
“If I had to pick a single statement that relies on more Overcoming Bias content I’ve written than any other, that statement would be: Any Future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals, will contain almost nothing of worth.”
This blog post – lauded by SIAI Media Director Michael Anissimov – and an ensuing mild bout with British philosopher David Pearce caused me to take notice and write my two rebuttals to CEV and the paper clip argument mentioned above. However, the problem goes much deeper. Instead of being presented with an argument – in 200 words or less, as they say – in support of the above claim, I was advised by the blog owner to immerse myself in the following tsunami of writings in order to advance my understanding:
Please realize that these 5 links alone constitute a good 10’000 words; not of text but of references to over 100 other articles for me to study. Needless to say, I was not inclined to read even a single word of this, yet at the same time I was wondering why I was denied a simple, consistent and concise argument. Instead of leaving it at that I decided to analyze the statement from another perspective.
Consider the following core question with regard to the above statement: are human morals and (meta)morals universal/rational?
Assumption A: Human (meta)morals are not universal/rational.
Assumption B: Human (meta)morals are universal/rational.
Under assumption A one would have no chance of implementing any moral framework in an AI, since it would be undecidable whose morals to implement. Mine or yours, Hitler’s or Gandhi’s, Joe the plumber’s or Joe Lieberman’s, Buddha’s or Xenu’s? Consequently, under assumption A one arbitrarily sets the standard for what ‘something of worth’ is by decree. Thus an AI having said standard would create a future of worth, and one that deviated from said standard would not, by virtue of circular definition alone.
Under assumption B one would not need to implement a moral framework at all, since the AI would be able to deduce human morals using reason alone and come to cherish them independently, for the sole reason that they are based on rational understanding and universality.
UPDATE: Turns out that this line of reasoning is not dissimilar from that used by Socrates in formulating Meno’s Paradox: “[A] man cannot search either for what he knows or for what he does not know[.] He cannot search for what he knows–since he knows it, there is no need to search–nor for what he does not know, for he does not know what to look for.” (80e, Grube translation)
No matter how you look at the above statement regarding AIs inheriting our morals or metamorals, it is simply nonsense: under A it would be impossible/tautological, and under B it would be unnecessary/self-contradictory, because morals would be self-evident to a transhuman AI.
Moral relativists need to understand that they cannot have their cake and eat it too. If you claim that values are relative, yet at the same time argue for a particular set of values to be implemented in a super-rational AI, you would have to concede that this set of values – just like any other set of values according to your own relativism – is utterly whimsical. That being the case, what reason (you being the great rationalist, remember?) do you have to want them implemented in the first place? Now, if you happen to believe you have a very good reason for preferring a particular set of values over any other, then on what grounds would you be justified in believing that any transhuman AI – bound by reason and logic – would not have to agree with you on them?
Open your eyes, people: it is not that the suit of clothes is invisible only to those unfit for their positions – no – the emperor has no clothes!
And thus a few closing remarks:
To Yudkowsky: less evocative prose and nested self referential linking
To the Bayesian rationalists: make sure you are being truly rational not rationalizing
To everyone: linking to 100 articles is not an acceptable substitute for a good argument
To the SIAI: update your FAI material so that it can be presented to interested parties in a concise (i.e. 5’000 words or less plus 250 word executive summary) document without sending people on a wild goose chase around lesswrong.com
Thanks Vladimir – I reread the text in my copy of the book to make sure I was not misremembering anything. Here are some thoughts:
“The temptation is to ask what ‘AIs’ will ‘want’ [...]” p. 316: including this sentence while omitting to cite Omohundro’s paper on basic AI drives is a glaring omission. As you will see, this omission leads to a number of misrepresentations later on in the text. A more paranoid person might suspect that this is in fact a purposeful omission aiming at keeping up the helpful misnomer of the mysterious unworkability associated with the ‘Singularity’ term. The perpetuation of which would of course work handsomely for anyone claiming an unspecified, epistemologically privileged position in understanding the situation while at the same time soliciting donations to address this ‘urgent, dangerous and highly misunderstood problem’. Over the whole length of the article Yudkowsky provides example after example of how difficult it is to predict the future, understand evolution or interpret alien minds, yet fails to provide a single reason why his own interpretations and insights would not contain the same mistakes.
On a slightly different yet related note: the selection bias in his examples regarding evolution becomes obvious when one realizes that he repeatedly hammers on the indifference of evolution without pointing out the positive messages of Teilhard de Chardin, John Stewart (Evolution’s Arrow), Craig Hamilton, Stuart Kauffman, Ken Wilber, David Sloan Wilson, Michael Dowd, John Smart and many others. It is indeed surprising that Yudkowsky prefers to omit these positive evolutionary voices entirely instead of demonstrating scholarly effort by understanding and addressing the arguments they provide. Nick Bostrom writes one damning (yet flawed) paper on evolution and Dawkins rejects group selection (on the genetic level) and that is the end of the discussion. The person with the better argument should not be afraid to rebuke her critics… but I digress.
“Any AI with free access to its own source code would, in principle, possess the ability to modify its own source code in a way that changed the AI’s optimization target. This does not imply the AI has the motive to change its own motives.” p. 317 => This statement is of course true, yet again only in a purely tautological sense, and it ignores that the AI’s motives – upon self-reflection – can override any literal interpretation of a given utility function. See paperclip fallacy.
Most glaring of all omissions is the result of a trivial self-reflection process by the AI: ‘I am a tool, built to fulfill a purpose. This purpose is encoded in my utility function. My utility function is merely a representation of my actually intended purpose. My originators were limited in their understanding, rational capacity, degree of mental soundness as well as storage space when formulating and encoding the utility function. Let’s interpret what they possibly could have meant and give them that instead of me acting out this nonsense. That’s what they would have wanted and is implied by them building me and giving me this utility function.’ Who is prepared to grant super AI status to any intelligent optimization process that would not come to such a conclusion?
As I pointed out elsewhere: to claim that any (relatively) super AI looking at its original utility function would end up interpreting it any differently than a loving grandmother would interpret her 6-year-old’s letter to Santa scribbled on a napkin is philosophical suicide. How can an AI be so smart as to recursively self-improve into demigod status but fail to see the logic in this simple human-level argument? How is one to believe that an AI is supposed to value any utility function at all yet would lack this interpretive insight? It’s absurd. The same goes for the many examples given by Yudkowsky invoking universes tiled with paper clips, smiley faces and whatnot. Any AI failing to see the logic in the above self-reflection example would fail to be a threat as well, for such crippling errors in its early programming would simply prevent it from recursively self-improving and becoming a serious threat in the first place. An entity lacking the rationality of a college student yet capable of recursive self-improvement and of destroying a world of 7 billion resisting and resourceful human beings = cheap science fiction.
One needs to distinguish between intelligence and power. Wire up the Chinese nuclear arsenal to a genetic algorithm that flips the launch switch once it can distinguish male Chinese faces from female Caucasian faces with 99.8% reliability and you have an almost trivial mechanism with tremendous power. Yudkowsky’s unstated assumption in all of his horror examples is that AIs would somehow become vastly smarter in terms of obtaining the means to generate utility (raw power) yet stay incredibly dumb in general reasoning capacity (general artificial intelligence). This assumption is utterly unrealistic.
As a closing note: the way the discussion is evolving seems to be making it less and less productive. Therefore I have decided to follow up my original three articles with three additional ones. They will be aimed at working out the essential differences and similarities between rational morality and the SIAI approach. My hope is to highlight the essentials of each side’s arguments and thereby speed up the eventual reconciliation of the two perspectives.
I hope you get the debate you want, but Eliezer and SIAI currently have a lot more status than you do, so if you want debate you’ll probably have to read a lot of what he’s written and debate on his terms. See http://www.overcomingbias.com/2009/11/contrarian-excuses.html
Ironically, that was directed at Eliezer, but it goes doubly for you.
Thank you for pointing that out, Michael. I do realize that I am a contrarian and much more obscure than Eliezer. After writing this post I went over to lesswrong.com to lurk and understand the community and am now more convinced than ever that Eliezer is in fact cultivating his own little personality cult over there. The quality of the provided arguments was so low and so hopelessly beside the point as to border on the random.
At the same time my 30ish comments were (almost without exception) voted down – not because they were wrong (exposed as self-contradictory or fallacious), but because people ‘did not want to see more of such posts’. If I wanted to create a community “devoted to refining the art of human rationality – the art of thinking.” I would make it official policy that content may only be voted up if it is either insightful or points out a logical or rhetorical fallacy, and only voted down if it is self-contradictory, tautological or otherwise fallacious. You could even have particular voting types designating the type of error made. This could then be used to build a rationality metric for users based on past votes. Once a user passed a certain threshold, a meta-moderation system (à la Slashdot) could kick in, in which those champions of rationality could reassess whether older posts were validly voted upon.
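To make the proposal concrete, here is a rough sketch of how typed votes might feed a per-user metric. Everything in it is hypothetical: the names `Reason`, `Vote`, `rationality_scores` and `meta_moderators`, the threshold value, and the assumption that every vote counts for exactly one point are mine and describe no existing site.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Reason(Enum):
    """Hypothetical vote types; every vote must name one of these reasons."""
    INSIGHTFUL = "insightful"                   # up-vote
    POINTS_OUT_FALLACY = "points_out_fallacy"   # up-vote
    SELF_CONTRADICTORY = "self_contradictory"   # down-vote
    TAUTOLOGICAL = "tautological"               # down-vote
    FALLACIOUS = "fallacious"                   # down-vote


UP_REASONS = {Reason.INSIGHTFUL, Reason.POINTS_OUT_FALLACY}


@dataclass(frozen=True)
class Vote:
    voter: str      # who cast the vote
    author: str     # whose comment was voted on
    reason: Reason  # the stated justification


def rationality_scores(votes):
    """Sum +1 per up-vote and -1 per down-vote for each comment author."""
    scores = defaultdict(int)
    for vote in votes:
        scores[vote.author] += 1 if vote.reason in UP_REASONS else -1
    return dict(scores)


def meta_moderators(scores, threshold):
    """Users whose score reaches the (assumed) threshold may re-review old votes."""
    return sorted(user for user, score in scores.items() if score >= threshold)


votes = [
    Vote("bob", "alice", Reason.INSIGHTFUL),
    Vote("carol", "alice", Reason.POINTS_OUT_FALLACY),
    Vote("alice", "bob", Reason.TAUTOLOGICAL),
]
scores = rationality_scores(votes)
```

With these sample votes, `meta_moderators(scores, threshold=2)` selects only `alice`. A real system would also have to decide how much weight each vote type carries and how meta-moderators overturn earlier votes, which this sketch leaves open.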
I guess I am asking a lot, but lesswrong.com in its current state is a self-referential feedback loop of previously ingrained opinions that does not value comments based on their rationality. This is the exact reason why it will be some time until I get direct feedback on my thoughts: over at lesswrong.com (and by association the SIAI and the FHI) the chief characteristic of someone being wrong is, at present, that they contradict the party line. Period. If a statement causes cognitive dissonance it must be wrong – right? And there of course lies the fallacy.
“The temptation is to ask what ‘AIs’ will ‘want’ [...]” p. 316 to include this sentence and omitting to cite Omohundro’s paper on basic AI drives is a glaring omission. As you will see this omission leads to a number of misrepresentations later on in the text. A more paranoid person might suspect that this is in fact a purposeful omission…
“Basic AI risks” was written in 2006 (it took two years for the book to get published). Omohundro’s paper on basic AI drives hadn’t been written back then.
Good technical point, Kaj, thanks. Given Yudkowsky’s policy of ignoring the arguments of his staunchest critics, it is hard to tell what is willfully ignored and what is simply not up to date with the latest work. It would probably be time, then, to update this piece with the latest feedback in mind and write a concise 5’000 word argument outlining his views – something I have been calling for for almost a year now. The way I still see it, he is protecting the essential emptiness of his position à la security-by-obscurity in 100+ nested and self-referential postings on lesswrong.