The Computer that doesn't believe that it's alive

**NichG** · 2021-12-19, 12:35 PM (ISO 8601)

Originally Posted by Devils_Advocate

Oh, a paperclip maximizer can even be very aware that its highest goal could become something other than maximizing paperclips in the future. And in response, it may very well put in place safeguards to prevent that from happening. It may create other paperclip maximizers to monitor and regulate it, like Superman providing his allies with the means to defeat him if he ever becomes a supervillain.

Remember, what a paperclip maximizer wants as an end in itself is to maximize paperclips. It wants to achieve its own future goals only insofar as doing so serves to maximize paperclips.

In practice, something trained via reinforcement will 'want' to maximize its integrated, time-discounted received reward signal. If during training that reward signal sometimes switches from one source to another, then there's no reason for it to have any loyalty to its current source of reward beyond the expected time interval between reward switching events that it was trained with.

E.g. 'paperclips' are a human thing here. From the perspective of training an AI, the real thing comes down to there being a streaming input, and the policies and behaviors you end up with are the ones that, retrospectively, were expected to lead to the largest integrated time-discounted positive signal on that input during the training conditions. If that streaming input is accompanied by e.g. a task description like 'you should maximize the number of paperclips produced', then the AI will learn to follow that text description only to the extent that doing so during training led to the largest received rewards. So if its training regimen has runs which contain, for example, a period of 'you should maximize paperclips' followed by a period of 'switch to making forks now' then the result of training against that sort of task distribution will be an AI that is anticipating that one day it will have to make forks even if right now it's making paperclips.

If following the text prompt is short-term productive but long-term counterproductive, it's completely rational for that AI to even totally ignore 'what it's supposed to be doing'. For example if it has training runs that go like 'minimize your manufacturing capabilities' for a short interval, followed by 'make a lot of paperclips' for a long interval, then it may well build manufacturing facilities rather than destroying them during that initial interval.

The actual wording of what an AI is directed to do isn't at all binding. Modern AIs are not logic engines, trying to use proofs to find paths through inviolable axioms. They're about the furthest thing from that, being driven primarily by reflexive and intuitive processes. To the extent that any modern AI performs 'reasoning', that reasoning is subordinate to the intuitive tendencies and uses those as a substrate.

I think there's a tendency for the sort of community that centers around things like the 'paperclip optimizer' scenario to see intelligence as being primarily about logic and reasoning, rather than seeing logic and reasoning as being things which intelligence invented as useful tools but which are built on top of it. So there's a tendency to downplay 'psychological features' like deciding to just have a different goal altogether while stating confidently 'it will do at a superhuman level what it is supposed to do'. Yudkowsky thought AlphaGo would either win 0/5 or 5/5 and ended up being perplexed when it lost a game against Sedol, since that didn't fit his view of AI as an ineffable superhuman intelligence making moves that only look bad but which win in the end, because that 'AI as a logic engine' point of view struggles to incorporate an AI which would take actions that it hasn't already calculated should work, and that need to calculate whether something should work or not only parses cleanly in the case of the sort of logical reasoning paradigm where goals and actions and such have sharp definitions.

Rational planning does involve uncertainty about both the future and the present, and planning for both known and unknown possibilities, yes.

... Um. Uncontested control of Earth and its resources doesn't strike me as "small gains" in the short term, and making progress in the long run requires accomplishing stuff in the short term. Unless your goal is the heat death of the universe or something along those lines...

If the 'maximize paperclips' reward signal were switched to anything which is outright failed by an absence of independent agents on the earth, then that's a 100% chance of failure on those subsequent tasks. A general AI, e.g. one that is capable of multiple goals, will have been trained in a way that makes multiple goals something which it can expect to have to deal with. So if it can get 10% of those gains but not actually shut the door on some other potential goal by acting differently, that will be better in the expectation.

My question is "Why does it do that?"

Does that agent engage in that behavior in order to achieve some pre-programmed goal? Is it attempting to maximize its rewards because it values those rewards as an end? If so, then nothing about it contradicts my assessment that "something that just has no goals whatsoever won't try to do anything".

Has the agent been evolved to engage in certain non-goal-directed behaviors that aren't attempts on its behalf to achieve anything? Well, then those behaviors aren't efforts to do something. In which case... nothing about the agent contradicts my assessment that "something that just has no goals whatsoever won't try to do anything".

In neither case does the entity reason its way from no motivation to yes motivation and cross the is-ought gap.

If it's a matter of philosophical interpretation whether the agent can accurately be said to"value things", "make efforts", etc. ... that also fails to somehow demonstrate that sometimes something without goals will try to do something. It just becomes ambiguous whether something with goals is trying to do something or something without goals isn't trying to do anything.

Probably. It seemed to me as though, in your most recent response, you were trying to treat being able to design an AI of some description as support for your claim that "certain kinds of motivations are emergent and universal". Which is... obviously silly? But if you intended to move on to some other point, it's not clear what that point was supposed to be.

I mean, this comes down to a thing where you said 'such a thing cannot be possible' and I'm saying 'I have a proof by construction that it's possible'. I bring it up because when the philosophical interpretation of things disagrees with the reality, then it's the interpretation which must change. However you are understanding 'motivation' and 'ought' and 'goals' and such doesn't translate well to predicting what an actual constructed and trained agent will do or how it will behave. So I'd suggest you might need to re-examine how you bind those concepts to particular artifacts that exist in the world. Perhaps that means that 'goals' as you want to think of them are an epiphenomenon and don't actually exist or have causal power over things, or that there's some adjacent definition of 'goal' that would actually be predictive of how things behave but which requires thinking about goals slightly differently.

The most predictive definition of goals that I have centers on compression of prediction of behavior. That is to say, if it requires fewer bits for me to predict what something will do by ascribing it a 'goal' than for me to set out its mechanistic decision process, then talking about it in terms of goals is efficient and has some value. But that definition doesn't say anything about the qualia of the thing having a goal ('wanting'), whether the goal is inviolable or describes a tendency, whether a goal can even be given to something, etc. In those terms, all goals may as well be emergent, and the remaining useful distinction there is the degree to which the goal that emerges can be caused by specific interventions at the periphery of the system (change the reward stream to change the emergent goal), or whether that emergent goal is going to be the same more or less regardless of what you do to drive the system (universal goals like empowerment and curiosity, which are more or less invariant to what specific things you're optimizing the system for).

For example, this paper: https://arxiv.org/abs/1912.01683

Going from 'one goal induces another more universal instrumental goal' to 'you don't even need to drive a system with goals for one to emerge' is just a half-step further.

**Jedaii** · 2021-12-25, 09:48 PM (ISO 8601)

The CPU is correct: it isn't alive.

It's an animated thing. Like any other machine. Animated isn't alive. Animated is just "active".

Undead skeletons can be active. But they are not ALIVE. Or dead.

AI computers are undead creatures relying on another for their creation and activity but not possessing true LIFE.

Life can go in many directions while a computer program only goes the the directions it's programmed to pursue.

And if I say "this computer program is designed to mimic actual living things" all it can do is MIMIC life. Not be alive.

If you call mimicry "alive" you have no real concept of what it is to be alive.

**Shinizak** · 2021-12-26, 12:48 AM (ISO 8601)

This is literally the plot of the 1995 ghost in the shell movie.

https://en.m.wikipedia.org/wiki/Ghost_in_the_Shell

Simply put, the puppet master was an advanced AI that gained Autonomy, but distinctly beloved it hadn't crossed the threshold of life. So, it wanted to merge with a human (who was also a distinctly gifted hacker) to create new variations of it's self that could mutate and change.

It's motivations are best summed up in these 2 clips.

https://youtu.be/YZX58fDhebc

(This one is more significant.)
https://youtu.be/EJkxQkGxAsE

**Aliess** · 2021-12-26, 05:25 AM (ISO 8601)

For intelligence, this year's Christmas lectures (the Reith lectures) on BBC sounds are great if you can access them and may give you a few ideas on emergent behaviour and AIs appearing intelligent (not alive).

For the alive question, you could use something Like the AI learning MRS GREN as a definition for being alive. Since it clearly doesn't Move, Reproduce or Respire then it's not alive. Let the players convince it that definition is rubbish (in your hand world at least).
.

**Melayl** · 2021-12-26, 09:49 PM (ISO 8601)

Sounds like the 1980's movie [i]WarGames]/i] to me.

**InvisibleBison** · 2021-12-27, 12:37 PM (ISO 8601)

Originally Posted by Jedaii

If you call mimicry "alive" you have no real concept of what it is to be alive.

So what's the difference between something that's good enough at mimicking being alive that you can't tell it isn't alive and actually being alive?

**SpoonR** · 2021-12-29, 11:10 AM (ISO 8601)

Now for something different from the philosophy and programming stuff:

Heinlein’s Moon is a Harsh Mistress. To get AI, just add more sources for random number generators! Also one look at an AI becoming selfaware, and of it learning.

Tower of God. Webtoon. Uh, wherever the Daf Punk engineering tournament chapters. I think one character argues that they make the AI think its “human in a box” as a way to limit what the AI believes it can do. I think. Author is purposely obfuscatory frequently.

Practical for running the character. Rote language. Give it some standard phrases to say, maybe have it respond better to standardized questions. Think of how to tweak questions for Google results, and how a text adventure game responds when you try to do an impossible. A text file with a list of common responses. “But thou must” etc

**Vahnavoi** · 2021-12-29, 12:26 PM (ISO 8601)

Originally Posted by InvisibleBison

So what's the difference between something that's good enough at mimicking being alive that you can't tell it isn't alive and actually being alive?

On a technical level, a close enough simulation may be virtually identical to what's being simulated, or at least has relevant properties of what's being simulated; meaning a mimic of a living thing may be alive, even if it is a simpler or different organism than the one it is mimicking. (For examples, see cases of plants mimicking animals and animals mimicking plants.)

On a practical level, though, humans are crap at distinquishing living from the non-living. This is due to motivated reasoning: humans are primed to see certain kinds patterns, even where there are none. (See: pareidolia) This is a true weakness of, for example, the Turing test, and is the reason for why modern chatterbox AIs have occasionally passed said test. One easy trick, ironic given we are discussing this on a forum about roleplaying games, is to give a chatterbox a role to play: if a chatterbox claims and answers as if it's, say, an 8-year-old Mexican boy, humans rating the test tend to adjust their expectations downwards, glossing over errors made by the AI and filling in the gaps with their own imagination.

**Devils_Advocate** · 2021-12-31, 03:42 PM (ISO 8601)

Originally Posted by NichG

The most predictive definition of goals that I have centers on compression of prediction of behavior. That is to say, if it requires fewer bits for me to predict what something will do by ascribing it a 'goal' than for me to set out its mechanistic decision process, then talking about it in terms of goals is efficient and has some value. But that definition doesn't say anything about the qualia of the thing having a goal ('wanting'), whether the goal is inviolable or describes a tendency, whether a goal can even be given to something, etc.

My understanding of the type of entity you've described is that it has a central goal of maximizing its reward signal. And what I predict based on that understanding is that, given sufficient intelligence and power, it would attempt to seize control of its own reward signal and arrange to always receive the highest possible reward. Does that agree with your understanding?

Originally Posted by Jedaii

Life can go in many directions while a computer program only goes the the directions it's programmed to pursue.

Living things can only take actions that result from our own characteristics. That's not a difference between us and computers. We're both subject to causality and the principle of sufficient reason.

I don't see any reason to presuppose that a computer's programming necessarily has to limit it to fewer possible courses of action than living things are capable of.

What definitions of "alive" and "life" are you working from? (And is there any reason to expect the OP's computer character to share your definition?)

**NichG** · 2021-12-31, 04:10 PM (ISO 8601)

Originally Posted by Devils_Advocate

My understanding of the type of entity you've described is that it has a central goal of maximizing its reward signal. And what I predict based on that understanding is that, given sufficient intelligence and power, it would attempt to seize control of its own reward signal and arrange to always receive the highest possible reward. Does that agree with your understanding?

No, you've got an entity which has some sort of pattern of behavior, which was derived from a process of refinement that, over the training context in which that process was applied, tried to find behaviors that better maximized the reward signals during training. Whether that means that it tries to maximize the reward stream in the future depends a lot on the contexts which were used during training to evaluate the behavior and how they compare to the contexts the entity is currently being exposed to. There's a big difference for example between whether the thing was trained via learning (pick a task, find the behavior that achieved the best reward on that task at training time, deploy) or via meta-learning (pick a distribution of many tasks, provide context information about the task and within-run feedback, try to find a behavior that generalizes to unseen tasks without further refinement).

Feedbacks where the agent can control its own reward stream and just sets it to maximum can be observed, but you can also get things where an agent learns to disregard its reward stream, or even things where an agent consistently solves tasks during training or meta-training despite the policy being one that disregards the reward stream. For example, if your set of tasks has a sort of shared sense of what 'success' is likely to entail (think human-designed computer games, where basic curiosity instincts tend to require some degree of mastery to fulfill because failures send you back to places you've been), then even if you provide the agent a 'score' input the meta-learned policy might end up learning that pixel-wise novelty is a better reward function to use instead. So that if you took that agent and applied it to an inverse version the game where max reward stream would be achieved by e.g. dying as much as possible or failing as much as possible, it'd choose to instead minimize the explicit reward stream because it's emergent goal learned during meta-training wasn't actually about that reward stream at all.

Thread: The Computer that doesn't believe that it's alive

Thread Tools

Spoilers

Re: The Computer that doesn't believe that it's alive

Spoilers

Re: The Computer that doesn't believe that it's alive

Spoilers

Re: The Computer that doesn't believe that it's alive

Spoilers

Re: The Computer that doesn't believe that it's alive

Spoilers

Re: The Computer that doesn't believe that it's alive

Spoilers

Re: The Computer that doesn't believe that it's alive

Spoilers

Re: The Computer that doesn't believe that it's alive

Spoilers

Re: The Computer that doesn't believe that it's alive

Spoilers

Re: The Computer that doesn't believe that it's alive

Spoilers

Re: The Computer that doesn't believe that it's alive

Posting Permissions