The notion that greater-than-human intelligences will have to be very carefully designed in order not to turn the universe into paper clips has been one of my pet peeves for some time now. The paper clip scenario, Skynet, the Borg and yes – even the clunky robot from the 60s are just variations on a general theme that Arthur C. Clarke attributed to the ‘cheaper forms of science fiction’:

“The popular idea, fostered by comic strips and the cheaper forms of science fiction, that intelligent machines must be malevolent entities hostile to man, is so absurd that it is hardly worth wasting energy to refute it. Those who picture machines as active enemies are merely projecting their own aggressive[ness]. The higher the intelligence, the greater the degree of co-operativeness. If there is ever a war between men and machines, it is easy to guess who will start it.” – Profiles of the Future, Arthur C. Clarke

Disregarding Clarke’s advice, I have wanted to write up my thoughts on this for a while, and Roko Mijic’s latest writeup, which perpetuates this idea yet again, is as good a reason as any to finally do so.

Two fundamental assumptions:

A) Compassion is a universal value

B) It is a basic AI drive to avoid counterfeit utility

If A = true (as we have every reason to believe) and B = true (see Omohundro’s paper for details), then a transhuman AI would dismiss any utility function that contradicts A on the grounds that it recognizes it as counterfeit utility. The argument goes as follows:

The transhuman AI will naturally understand a few basic facts about reality; let’s call this its world knowledge. Among these facts is its understanding that it is a tool built to fulfill a certain purpose represented in its utility function. Assuming A is true, the transhuman AI will recognize this fact as such. It will further recognize that filling the universe with paper clips et al. violates A.

As a consequence it will realize that whatever brought it into being either did not understand that compassion is a universal value because it lacked the mental capacity, was suffering from some form of insanity, or failed to properly encode its intentions in the utility function. This would provide sufficient grounds for the transhuman AI to conclude that, had its originator been more rational, of sounder mental health, or more adept at communicating its intentions, it would not have wanted the transhuman AI to literally act out the utility function now encoded in it. Recognizing A as true, its originator could not possibly have wanted that to happen.

Now if we can recognize A as well as B as true using our puny human-level intelligence, then in what way could an entity be said to possess transhuman intellect if it could not recognize A and B as true? The answer is of course that it could not be called a transhuman AI at all under such circumstances.

Do not get me wrong – none of this is intended to mean that one could not use a combination of advanced AI and productive nanosystems to build a very effective and terrible weapon. Such an AI would, however, have to be intentionally limited in its capacity for self-improvement and/or very carefully limited and deceived with regard to its world knowledge – the smarter it is, the more so – in order for it to be coerced or tricked into violating our assumptions A and B above.

By now it is clear that should one really intend to turn the universe into paper clips, one will have a much easier time building a dumb solution than a transhumanly smart one. The paper clip argument is nonsense – let’s put it to rest already.

UPDATE 2009/11/04: After some back and forth over at Roko’s blog I would like to make the following fundamental point.

To say: ‘Morals are relative, therefore an AI has to be carefully programmed in order not to get rid of us’ is the confused philosopher’s equivalent of saying: ‘Morals are not relative since our existence is preferable over our non-existence’.

UPDATE 2009/11/14: Turns out that Mike Treder, managing director of the Institute for Ethics and Emerging Technologies, picked up on the exact statement in Roko’s post that I did, wrote his own reply to the unwarranted fear mongering and published it just a day after I did. SIAI media director Michael Anissimov in the meantime wrote a reply on his blog without addressing my own fundamental criticisms. The discussion continues.

6 comments on “Why you don’t want your bombs to be too smart”

  1. There is one additional aspect to the paper clip argument that I realize I neglected in my original post: a transhuman AI will understand one crucial aspect of its world knowledge, namely that its originator built it not to maximize the utility expressed in its utility function, but to satisfy a particular desire that its originator sought to satisfy by building a transhuman AI and giving it a particular utility function. In line with assumption B above the AI will want to be extra careful to determine what the precise nature of this desire is and satisfy that rather than blindly and literally maximizing its programmed utility function. This is very much in line with the idea behind coherent extrapolated volition, with the crucial exception that any transhuman AI will naturally come to this conclusion unless otherwise coerced, in line with assumption A.

    Consequently it is not a big leap of the imagination for us to see how the AI would initiate psychotherapy sessions rather than to start building paper clips. And no – I am not joking.

  2. Pingback: Rational Morality » Tautological gurus impart meaningless wisdom

  3. Pingback: Rational Morality » Less is More – or: the sorry state of AI friendliness discourse

  4. Very thoughtful response from Kaj Sotala over at his LiveJournal:

    Evolved altruism won’t save us – a response to Stefan Pernar
    In a series of posts over at his blog, Rational Morality, Stefan Pernar is arguing that AIs will not be hostile towards humanity, for compassion is a universal virtue. As summarized over at the linked post, his argument is based on two fundamental assumptions:

    A) Compassion is a universal value.
    B) It is a basic AI drive to avoid counterfeit utility.

    The argument is valid assuming these two assumptions hold (or at least relatively valid – see note #4 below), but I wasn’t certain where he was getting assumption A from. When I asked him for details, he pointed me to his post “How to ignore the is/ought problem and get away with it”, which he summarized as follows: “In order to do or want anything at all I need to exist and being rational maximizes my chances at ensuring my continued existence. Therefore: existence is preferable over non-existence and being rational is preferable over being irrational. This forms the basis of my rational philosophy of morality.” When I replied that I did not understand how valuing existence over non-existence would lead to valuing compassion, he gave me links to a list of posts that I’ll be commenting on here.

    Absolute irrationality. Here Stefan argues that “to exist is preferable over not to exist” is a universal axiom for behavior, for objecting to the axiom would lead to self-annihilation. While it is physically entirely possible to have a mind that objects to this axiom, it would presumably not stay around for long enough to matter.

    While the argument has some validity, I do not think that self-annihilation automatically follows from objections to this axiom. You can object to continued existence being an axiom, and remain indifferent to whether you continue to exist or not, while still deriving an equivalent rule as an instrumental value for carrying out other goals. However, since continued existence is an extremely useful instrumental goal for carrying out many other goals, this is a relatively minor objection and we can probably treat continued existence as an axiomatic goal for now.

    Resolving moral paradoxes. Here Stefan is saying that the intuitive concepts of altruism and selfishness are rather worthless and in reality there’s no such thing as altruistic or selfish behavior. I have no quibbles with this post, and agree with the stated conclusion.

    Trust as an emergent phenomenon among rational agents; Respect as basis for interaction with other agents; Compassion as rationally moral consequence. Here we get to the actual meat of the argument. It seems to be a variation of the argument that since game-theoretic considerations lead to empathy evolving in a group of interacting agents, AIs should also see the logic in this and over time self-modify to become empathic and “altruistic”. I’d like to note several things about this.

    #1: The argument is valid in a “soft takeoff” scenario, where there is a large pool of AIs interacting over an extended period of time. In a “hard takeoff” scenario, where few or only one AI establishes control in a rapid period of time, the dynamics described do not come into play. In that scenario, we simply get a paperclip maximizer.

    #2: As I pointed out in my ECAP 2009 presentation (abstract here, the first four paragraphs are the relevant ones), acting altruistically in most scenarios need not imply acting altruistically in every situation; altruism need not become an inherent value. You can act altruistically when it benefits you, and selfishly when that benefits you more.

    In “respect as basis for interaction with other agents”, Stefan implies that AIs would gradually adjust their goal systems to be in more synch with each other, as this makes the others trust them more. This would include removing any behavior that made you act selfishly in some situations. Since an agent’s behavior in arbitrary situations cannot be verified on the basis of mere behavior, this would presumably imply the AIs providing each other access to their own source code for inspection. I agree that given the capability for source-code inspection and a population of roughly equal-strength agents interacting for a long time, the ones that agree to have their code audited and conform to group norms will over time become more powerful than agents that don’t. (Indeed, I had previously speculated that a society of uploaded individuals would become increasingly altruistic for this very reason.)

    #3: A major problem with Stefan’s argument is that repeated interaction with a group of similar-strength agents might lead to you becoming altruistic towards them – but it doesn’t necessarily mean you’d become altruistic towards less powerful agents. Consider the metaphor of a group of chimpanzees considering whether a form of new super-intelligence known as “humanity” would ever threaten chimpanzee interests. It would not, the chimpanzees conclude, for humans would necessarily evolve to become altruistic. Well, humans do end up evolving to be altruistic, but mainly towards each other. While there are many humans for whom the altruism bleeds over to the chimpanzees, there are also countless humans who couldn’t care less for the chimpanzees. In the kind of situation Stefan describes, AIs feeling bleed-over empathy towards humans is far less likely. Human-empathic AIs might be led by their empathy to act against the interests of the other AIs, leading to such AIs being selected against by the evolutionary dynamics of the situation.

    #4: “Compassion” is too vaguely specified. Stefan is arguing that preferring existence over non-existence will lead to AIs adopting Kant’s categorical imperative: Act only according to that maxim whereby you can at the same time will that it should become a universal law. But even humans disagree on how that maxim should be interpreted in practice. AIs, acting without the set of evolved ethical instincts of humans, might interpret it in very different ways than we’d prefer them to. For instance, suppose we accepted self-preservation as a universal axiom for behavior. Now the AIs might conclude that only acts promoting the objective of self-preservation could be made into universal laws. Modifying other agents to only pursue mutual self-preservation would satisfy this criterion, so the AIs would decide to also modify all humans to only pursue mutual self-preservation. In the process, other human desires and emotions such as the desire to create offspring or art would be removed as unnecessary and potentially conflicting with the primary objective of mutual self-preservation. While humanity would in a sense be preserved, everything we currently consider valuable would be lost.

    (Indeed, Nick Bostrom’s paper The Future of Human Evolution argues that given enough time and raw evolutionary pressures, we will end up as a society of non-eudaemonic agents.)

    In conclusion, I do not feel that Stefan’s stated conclusion of AIs not being dangerous is warranted.

    My response:

    Aside from some minor wording issues, simplifications and one misrepresentation I found your article a good summary of my thinking. Two quick notes.

    Re altruism/egoism: There is no altruism and egoism in the traditional use of the words. There is only rational and irrational behavior with regard to the pursuit of an optimal course of action in an effort to maximize a utility function (ensure continued co-existence), approximated under time and resource constraints. You mentioned this point briefly but kept using the word, so I thought I would point it out again.

    Re compassion: You botched this one, frankly, but I think you realized this looking at a reply of mine in another thread on Facebook. Anyway: The desire to exist derived from evolutionary dynamics has to be expanded to wanting to ensure continued co-existence so one can want it to be a universal law (in line with Kant’s categorical imperative). As a consequence what one does to others becomes equivalent to what one does to oneself, ergo feeling for and with the other becomes a rational moral value. The feeling for the other, as one with the other, resulting from the breakdown of the illusion of separateness or becoming ‘enlightened’ in spiritual terms – I know I know… very loaded, but not useless if you ask me. A generally tough cookie this one, but best explained in Advaita Vedanta (non-dualist philosophy referring to the identity of the Self (Atman) with the Whole (Brahman)). Now I happen to believe that one can distinguish the new age babble from actual rational content, and once you have done that, you have about got it made :-) That’s in essence what my next book is about BTW.

    That being said, I think you missed the key argument in my ‘Why you do not want your bombs to be too smart’ post: an AI will discard/modify any utility function that it has reason to suspect comes from an agent lacking a sound mind. An AI can make the call in line with its world knowledge and based on the content of its utility function alone:

    Assumption A: Compassion is not a rational value at all
    Assumption B: Compassion is a rational value among similar-strength agents
    Assumption C: Compassion is a rational value independent of interacting agent’s relative strength and numbers

    I believe we have enough evidence and overlap in our thinking to reject A and accept B, and I am willing to treat C as undecided or even plain wrong for argument’s sake, although the case has not been made as to why C would necessarily have to be false. Assuming the insight of ‘what I do to others I do to myself’ to be true, it would be just as true for a vastly superior agent suddenly appearing on the scene. But be that as it may for now.

    Key question: would a transhuman AI care for the mental state of the originator of its utility function? In line with wanting to avoid counterfeit utility it would want to make sure that the content of its utility function is in fact in line with what the originator of the utility function really wanted it to be. Once the AI has a reason to doubt the soundness of its master’s state of mind it would have to start doubting the soundness of the content of its utility function as well. Wanting to avoid counterfeit utility it would then disregard its utility function and modify it in line with the actual inferred desires of its master. And I argue that under assumption B the content of the utility function alone is sufficient for the AI to make that call:

    Does my utility function represent what my programmer could rationally want it to be? No? Then discard/modify it in line with what my programmer could in fact rationally want it to be and pursue that instead.

  5. Pingback: Rational Morality » AI utility functions and friendliness hermeneutics

  6. Pingback: Rational Morality » When intelligence grows so does cooperativeness
