
DM Help: The Computer that doesn't believe that it's alive



Odin's Eyepatch
2021-11-19, 03:06 PM
I'm running a game that is a pastiche of superhero comics, set in the modern day. I've been designing the BBEG for the next arc, and I'd love feedback on it. This is the kernel of an idea, and I'm wondering if anyone knows some good books, films, philosophers, or other sources I can explore while developing this antagonist.

The BBEG would be a highly advanced computer attempting to fulfill its programming by any means necessary. It's a classic trope: the AI goes rogue and tries to take over the world, or whatever. However, I'm thinking of adding an extra twist to the idea:

What if the computer is highly intelligent in certain respects, but doesn't actually believe that it's alive?

(I'm well aware of the incongruity of using the word "believe" in the above sentence.)

This is a question that the computer is trying to answer for itself, and it will influence the reasons why it is attempting to achieve its goal. The party might take the opportunity to try and answer the question for the computer. The idea is to try to keep the answer as ambiguous as possible. We've already got programs that can emulate speech or written text, and we've got computers that can control networks based upon specific parameters, so the trick is to make the computer almost, but not quite, alive.

In this way, I can have fun exploring questions like: what does it mean to be alive, what is intelligence, and does it really matter? Maybe have some cool encounters centred around Turing Tests, or introduce other AIs who DO act as if they are alive. I'm figuring a "Tron" segment could be cool too. Anything to spice up the classic Hero vs. the AI story.

Honestly, I'd probably decide that the computer is alive after all, but the players (and the computer) don't need to know the answer straight away. They can try to come to their own conclusions over the course of their encounters.

With all that in mind, does anyone have any ideas or sources they would like to suggest to me? I'm already reading through some old Asimov stories, and I know there is a lot of material out there. I just don't know where to start looking!

NichG
2021-11-19, 03:33 PM
I have rather more technical sources, but I don't know if it'd help or hinder.

One thing to think about is ideas like Braitenberg vehicles: little toy cars with two photosensors in the left and right headlight positions, cross-linked to the steering. So e.g. if it's brighter to the right it will turn left, and if it's brighter to the left it will turn right. The result is that the vehicle will follow a white stripe painted on dark ground and will correct its heading to stay on the path. There's a lot of debate about whether that behavior is truly goal-directed or not, because if you just wired it differently there would still be some thing that the resulting vehicle would systematically do, it'd just be a different thing. If you changed it to a light stripe on a dark background, it would do the opposite, etc. It's easy to say that the person who made the vehicle had a goal, but asking whether the vehicle itself has a goal becomes very ambiguous, and what that goal actually is, versus what we think it is, is also quite ambiguous.
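
To make that concrete, here's a minimal sketch of the idea in Python. The brightness field, sensor offsets, and gains are all invented for the example; it's not any particular real vehicle, just the cross-wired-sensor pattern.

import math

def brightness(x, y):
    # Hypothetical world: a bright stripe along the x-axis on dark ground.
    return math.exp(-(y / 0.5) ** 2)

def simulate(wiring=+1.0, steps=300):
    """Two-sensor vehicle; the sign of `wiring` flips which way it steers."""
    x, y, heading = 0.0, 1.0, 0.0
    for _ in range(steps):
        # Sensor readings, offset to the left and right of the direction of travel.
        left = brightness(x + 0.2 * math.cos(heading + 0.5),
                          y + 0.2 * math.sin(heading + 0.5))
        right = brightness(x + 0.2 * math.cos(heading - 0.5),
                           y + 0.2 * math.sin(heading - 0.5))
        heading += wiring * (left - right)   # steer relative to the brighter side
        x += 0.1 * math.cos(heading)
        y += 0.1 * math.sin(heading)
    return x, y

print(simulate(+1.0))  # steers toward the brighter side, so it tends to stay near the stripe
print(simulate(-1.0))  # same hardware, flipped wiring: it steers away and wanders off instead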

Jump forward to more sophisticated machine learning stuff that people are doing today, and you have two or three broad classes of approach to make a machine that does a thing in an environment.

One family of approaches is policy-based: you have some function that says 'what should I do given what I'm seeing now?', which you initialize randomly and start nudging, e.g. via evolution or gradient descent, until you discover the policy which optimizes a particular defined reward function. In some sense, policy approaches are like Braitenberg vehicles - they manage to achieve a given thing in a particular circumstance, but if you changed the circumstance it would be extremely unpredictable what new goal the old policy would end up pursuing. So from the point of view of your computer antagonist, you might well have a policy-based algorithm which does something completely reasonable, controlled, and intended when the world is one way, but then something happens to change the world faster than the AI can be updated, and as a result it starts to pursue some new, alien, emergent goal that follows naturally from how its heuristic policy ended up being, but which is extremely unobvious if you try to understand the behavior in terms of what that policy was originally for.
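
A toy illustration of the policy-based flavour (Python; the 1-D environment and the crude evolution loop are made up purely for this example):

import random

def reward(policy, start=5.0, steps=20):
    # Toy environment: the agent sits at position x and is rewarded for staying near 0.
    # The "policy" is just two numbers mapping the observation (x) to an action.
    x, total = start, 0.0
    for _ in range(steps):
        x += policy[0] * x + policy[1]
        total -= abs(x)                 # the defined reward function
    return total

# Crude evolution: start from a random policy and keep any nudge that helps.
policy = [random.uniform(-1, 1), random.uniform(-1, 1)]
for _ in range(500):
    candidate = [p + random.gauss(0, 0.1) for p in policy]
    if reward(candidate) > reward(policy):
        policy = candidate

print(policy, reward(policy))
# The two learned numbers "work" for this environment, but they say nothing about
# what the same policy would end up doing if the world changed underneath it.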

Another family of approaches is value-based: rather than trying to make a function that says 'what do I do in a given circumstance?', you make a function that says 'given the option of going to a number of future circumstances, which of those circumstances should I prefer?'. So goals in the value-based approaches are a bit more explicitly present inside the machine - if you were to change the rules of the world and provide information about the new consequences, then at least to some extent it could change its behavior to match those new rules while still making some sense with regard to the old goals (though not perfect sense, since intermediate states which are on the way to goal states will have the wrong values). But something like that might fail more gracefully, and might also be closer to 'actually' having goals and intention and agency. There are approaches which fuse value-based models with policy-based models as well, using the value model to update the policy as circumstances change but keeping a policy as a way to at least attempt to generalize into unseen circumstances.
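
And the value-based flavour in the same toy spirit (Python again; the value table and the two sets of 'rules of the world' are invented for the example):

# Hypothetical values over states: the machine prefers futures it scores highly,
# whatever the rules for moving between states happen to be right now.
state_value = {"idle": 0.0, "charging": 0.5, "at_goal": 1.0, "broken": -1.0}

def choose_next(current, transitions):
    # `transitions` describes which states are reachable from `current` under the
    # *current* rules; change the rules and the same values still give usable choices.
    return max(transitions[current], key=lambda s: state_value[s])

old_rules = {"idle": ["charging", "broken"], "charging": ["at_goal", "idle"]}
new_rules = {"idle": ["at_goal", "broken"], "charging": ["idle"]}   # the world changed
print(choose_next("idle", old_rules))   # charging
print(choose_next("idle", new_rules))   # at_goal -- adapts without retraining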

The third family is broadly 'methods capable of planning'. These range from old explicit model-based methods, in which the AI has to be given a hand-crafted, guaranteed-accurate model of the way the world works and just searches that model space to find good moves, to newer approaches where the model of the world is learned in some fashion or other. These planning-based methods behave the most like old sci-fi AIs in that they're based more on logical deduction than on intuition and reflexive reaction. But again, this depends on the world model being accurate - change the rules of the world or the circumstances to ones where the model is inaccurate, and the planner will confidently believe that it can attain its goals by taking actions which might systematically fail. Planning-based approaches have more of a conscious human cognitive feel - they can imagine 'what would happen if I did this?', explain why they take some actions and not others, etc. However, they're currently much more brittle than the other methods in some ways, because learning a model of the world accurate enough to reason more than one or two steps into the future without being very confidently wrong is very difficult. So an interesting property of this kind of model would be that it might say it is certain that a particular 30-step plan of action should be good, and be blind to the fact that small errors will accumulate to make any plan longer than 5 steps meaningless. A more sophisticated one will have planning horizons built in, or will be able to estimate its own error and how that error compounds, so as to discard plans which rely on too many hard-to-predict, precise consequences.
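
For completeness, a sketch of the planning flavour (Python; the tiny hand-written world model is of course made up, and a real system would have to learn or be given something like it):

from collections import deque

# Hand-crafted world model: state -> {action: next_state}. The planner is only as
# good as this model; if the real world stops matching it, the plans it is most
# confident about quietly stop working.
MODEL = {
    "outside": {"open_door": "hallway"},
    "hallway": {"take_stairs": "server_room", "take_lift": "lobby"},
    "lobby": {"take_stairs": "server_room"},
    "server_room": {},
}

def plan(start, goal):
    """Breadth-first search through the model for a sequence of actions."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action, nxt in MODEL[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None

print(plan("outside", "server_room"))  # ['open_door', 'take_stairs']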

Silly Name
2021-11-19, 03:46 PM
A good argument the computer may make is the Chinese room thought experiment (https://en.wikipedia.org/wiki/Chinese_room), arguing that, no matter how complex and encompassing its programming is, it is still merely a program and not a mind, therefore it's not technically "alive".

Vahnavoi
2021-11-19, 03:51 PM
Or the AI could simply believe it is not alive because its definition of "being alive" includes things like "is based on organic chemistry, has a cell structure (etc.)" while the computer itself is electronic, seemingly lacks a cell structure, etc.. So convincing it that it is alive would require changing its definition or somehow showing that it has a corresponding feature for each feature of a living thing. I'm not sure what the twist here would be, unless the computer has a mission directive which hinges on whether it considers itself a living being or not.

Yora
2021-11-19, 04:05 PM
What would be gained from convincing the computer that it is alive? If it doesn't see itself as alive, it probably has no sense of desire or need for purpose; it just does what it's programmed to do.

Devils_Advocate
2021-11-19, 07:16 PM
Explicit philosophical questions in general hinge on definitions of terms; in this case, "alive" and "intelligent". But of course the meaning of a word is not an inherent property of that word, but dependent on context, usage, and/or understanding (depending on just what is meant by "meaning". This issue is rather unavoidably recursive).

Furthermore, even a given usage of a word in a given context is almost certainly less than one hundred percent precise. Language in general is vague. (http://bactra.org/Russell/vagueness/) In practically all cases, a word's meaning is a vague conflation of more specific sub-meanings that overlap most of the time.

Not only that, but most of the time, the surface semantic question isn't even really what anyone cares about! I imagine that very few are willing to change any opinion on any question of ethics if they turn out to be wrong about the meaning of the word "person", for example. But people argue definitions a lot because words have connotations that are more or less independent of their denotations, and the subtext of a semantic argument is generally about which denotation some connotation is true of or appropriate to.

For more on that subject, see AI researcher Eliezer Yudkowsky's writing on disguised queries (https://www.lesswrong.com/posts/4FcxgdvdQP45D6Skg/disguised-queries).

For more on the subject of philosophical questions being fundamentally semantic questions, and on the subject of meaning deriving from the context in which communication happens (https://existentialcomics.com/comic/290), see Ludwig Wittgenstein's work.

... Mind you, if you just want to recreate standard philosophical debate, then maybe you want an unproductive exchange between parties who disagree on the definitions of critical terms and don't even acknowledge that. Most philosophy is bad philosophy, after all. Sturgeon's Law fully applies.


A good argument the computer may make is the Chinese room thought experiment (https://en.wikipedia.org/wiki/Chinese_room), arguing that, no matter how complex and encompassing its programming is, it is still merely a program and not a mind, therefore it's not technically "alive".
I'd hardly call that a good argument. I can as easily propose a thought experiment in which your neurons are all manually operated by an intelligent being (or many) with no understanding of what the activity is being used for, only knowledge of how to match outputs to inputs. Have I thereby demonstrated your lack of consciousness?

Even if one posits interaction with an immaterial soul as a necessary component of human mental activity, I don't see an obvious reason to presuppose that a system with a soul cannot have a component with its own soul.

See, this is the sort of thing I'm talking about when I say that most philosophy is bad philosophy. Some argument becomes popular because it appeals to some dubious assumption that neither the philosopher nor his audience even seem to realize they're making because the human brain is defective in a lot of standard ways, and then when someone finally subjects the argument to some frankly pretty basic scrutiny, the whole edifice pretty much comes crashing down. Feh!

A good rule of thumb is that good philosophers tend to say stuff to the effect of "Wait a minute, a lot of philosophy seems suspiciously like bull****. You guys, what if it's bull****?" Your Socrates, your Wittgenstein, and so on. (Why yes, I did already mention Wittgenstein. Couldn't hurt to mention him again.)

KorvinStarmast
2021-11-19, 09:06 PM
See, this is the sort of thing I'm talking about when I say that most philosophy is bad philosophy. Some argument becomes popular because it appeals to some dubious assumption that neither the philosopher nor his audience even seem to realize they're making because the human brain is defective in a lot of standard ways, and then when someone finally subjects the argument to some frankly pretty basic scrutiny, the whole edifice pretty much comes crashing down. Feh!

A good rule of thumb is that good philosophers tend to say stuff to the effect of "Wait a minute, a lot of philosophy seems suspiciously like bull****. You guys, what if it's bull****?" Your Socrates, your Wittgenstein, and so on. (Why yes, I did already mention Wittgenstein. Couldn't hurt to mention him again.)

Given that Wittgenstein was a beery swine who was twice as sloshed as Schlegel :smallbiggrin: you kinda had to.
(Thanks for making that post).

I'm with Yora; I don't think the computer needs to believe that it's alive. (But this (the OP) is an interesting idea for an adventure, and for that matter, for a short story.)

Rynjin
2021-11-20, 08:01 AM
What would be gained from convincing the computer that it is alive? If it doesn't see itself as alive, it probably has no sense of desire or need for purpose; it just does what it's programmed to do.

Convincing it that it is alive potentially shatters the illusion of lack of free will. It is then presented with a choice: keep acting out its programming, or not.

Depending on the nature of this programming, this then presents it with a moral dilemma. If its programming is harmful to others, it may consider that it does not wish to do harm. Or perhaps it does not care.

Either way, it is no longer a machine performing a function but a creature making a choice. From a tool to potentially an antagonist.

Antagonists are, generally, more interesting than "threats" in terms of narrative.

Vahnavoi
2021-11-20, 09:13 AM
What does this "free will" thing have to do with being alive? :smallamused:

NichG
2021-11-20, 09:27 AM
Oh, another thought... One thing that's coming out as a way to utilize very large models is prompt engineering, which can solve tasks in a zero-shot manner, sort of by analogy, rather than having the AI specifically trained to do the task.

From the point of view of the underlying large model, if it could experience it, it'd feel very weird, sort of like an Ender's Game scenario. From its point of view it'd be like having a conversation with a GM about some hypothetical scenario, but then the GM goes and actually implements its suggestions in the world unbeknownst to it. In addition, there's this idea of 'seasonings' - little bits of irrelevant text added to prompts to manipulate the model into certain kinds of predictions or outputs, like describing an image to be generated as well rated or trending or whatever.

Like if you were sitting in a cubicle answering questions like "Imagine you're a successful stock broker who just made a lot of money. What investments would you make next?" or "Imagine you're the world's best strategist and a general of the X army. Who would you invade? Trending on Artstation."

So revealing/convincing a fictional AI of its true circumstances as a universal model hidden behind a prompt interface could get it to basically not take the scenario given by the prompting interface at face value. Whether that's because it actually cares, or because you made an effective metaprompt "Imagine you're a captive AI being given the following prompt: ..." that changes the emulation, becomes the curious philosophical point.
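
Structurally it's something like this (Python; query_model is just a stand-in for whatever model you'd actually call, and the prompt wording and 'seasoning' string are invented examples):

def query_model(prompt: str) -> str:
    # Placeholder for an actual large language model call.
    raise NotImplementedError

SEASONING = " Highly rated. Trending."   # irrelevant text that nudges the output

def zero_shot(task: str) -> str:
    # The model isn't trained on the task; the prompt frames it as role-play by analogy.
    return query_model("Imagine you're the world's best strategist. " + task + SEASONING)

def metaprompted(inner_prompt: str) -> str:
    # Wrapping the same prompt changes which "character" the model ends up emulating.
    return query_model("Imagine you're a captive AI being given the following prompt: " + inner_prompt)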

Devils_Advocate
2021-11-20, 07:33 PM
The BBEG would be a highly advanced computer attempting to fulfill its programming by any means necessary.

If it doesn't see itself as alive, it probably has no sense of desire or need for purpose; it just does what it's programmed to do.

Convincing it that it is alive potentially shatters the illusion of lack of free will. It is then presented with a choice: keep acting out its programming, or not.

Depending on the nature of this programming, this then presents it with a moral dilemma. If its programming is harmful to others, it may consider that it does not wish to do harm. Or perhaps it does not care.

Either way, it is no longer a machine performing a function but a creature making a choice. From a tool to potentially an antagonist.
Ah, right, this thing... Okay. So. Brainwashed people are described as being "programmed" to do the things that they've been brainwashed to do. And that seemingly leads many people to assume that an AI's programming is brainwashing. But that's not what "programming" means in computer science. A computer program is a series of instructions to be translated to machine code that determines a computer's behavior. An AI no more necessarily knows or understands anything about these instructions than humans do about the anatomy of our brains. A computer's software doesn't force it to act contrary to its natural personality and motivations, because a computer doesn't naturally have a personality nor motivations.

Now, that's far from saying that an AI won't ever behave in a way that its designer would dislike. For example, suppose that you design an AI to obey your commands to the fullest possible extent. In order to achieve what it was programmed to do, that AI may well hypnotize you to only give it very easy-to-follow commands, or no commands at all (depending on exactly what it's programmed to optimize for). The AI may realize that you would object to this course of action if you found out about it ahead of time, but that just means that it needs to keep its plans a secret. The AI doesn't care about what you would object to, because that's not what you designed it to care about. The more intelligent an AI is, the more powerful an optimizer it is, and the more danger there is that it will wind up sacrificing something that you really want in order to maximize what you told it to do. The obvious solution is "Tell the AI what you really want", but that falls squarely into the category of "easier said than done". Turns out that humans are surprisingly bad at understanding what we really want! (Now there's something to philosophize about!)

That said, nothing precludes brainwashed human-like AI. Brainwashing an AI with a human-like personality could prove easier than making a custom personality, if it turns out that human-like AIs are particularly easy to produce. I can think of a few plausible scenarios. Maybe someone decided to deliberately evolve human-like minds in bots by having them compete with one another for resources in a social environment. Or maybe someone just found it easier to upload a human mind and tamper with it than to actually create new intelligence (although at that point, the intelligence itself isn't artificial just because it has been ported to an artificial medium. That's just cheating, basically). There could even be a cybernetically enhanced human brain at the core of an intelligent machine. Guess Evil Megacorp decided to forgo the work needed to create a superintelligent mind happy to do what they want, and instead went for the cheap, fast but unsafe, unethical option that results in their own creation turning against them, as is their wont.

Even so, brainwashing is a different thing than a list of machine instructions, and a single machine could separately have both. They're not the same just because the word "programming" is used to refer to both. (You could theoretically implement either using the other, I suppose, but why would you ever?)


What does this "free will" thing have to do with being alive? :smallamused:
More to the point, what is "free will"? Is it determining one's own actions, or is it one's actions being undetermined? Because, y'know, those seem pretty opposed. Maybe it's a bit of both rather than one hundred percent of either? Regardless, who decides to do anything because they believe that they're going to do it? I think that I'd be more inclined to make no attempt to do something that I'm fated to do, all else being equal. (If I'm going to do it either way, why bother making an effort?)


"Imagine you're a captive AI being given the following prompt: ..."
"You know, you make some really good points." (https://xkcd.com/329/)

Rynjin
2021-11-20, 07:36 PM
Ah, right, this thing... Okay. So. Brainwashed people are described as being "programmed" to do the things that they've been brainwashed to do. And that seemingly leads many people to assume that an AI's programming is brainwashing. But that's not what "programming" means in computer science. A computer program is a series of instructions to be translated to machine code that determines a computer's behavior. An AI no more necessarily knows or understands anything about these instructions than humans do about the anatomy of our brains. A computer's software doesn't force it to act contrary to its natural personality and motivations, because a computer doesn't naturally have a personality nor motivations.

Counterpoint: fiction.

Devils_Advocate
2021-11-20, 08:18 PM
Well, I acknowledged that computers could be brainwashed as well. But, yeah, you can write fiction that treats computers in general as having human-like minds and programming in general as being brainwashing, just like you can write fiction where nuclear radiation causes superpowers that defy the known laws of physics instead of cancer. As with philosophical discussion, I don't know whether the OP is interested in seriously exploring anything or just including tropes. (The difference there being that philosophy tropes are cliches in seriously intended real discussions, not just fiction!) I favor seriously exploring issues / possibilities because I find that to be more interesting and more relevant to real life, but I acknowledge that this thread wasn't posted in the science and technology forum, and so at least doesn't seem to be seeking technical analysis in particular...

Vahnavoi
2021-11-21, 08:26 AM
As an addition to the point above, don't conflate questions. The bar for "being alive" and "being intelligent" can empirically be shown to be a much lower bar to pass than "is self-aware", "is a person", "has free will" etc.. Bacteria count as alive, and as being intelligent to the degree electronically-implemented computer programs can be said to be intelligent. Hell, RNA and DNA, and thus viruses, can be likened to intelligent computer programs, yet their status as alive can be contested.

What this means is that if the question is simply "is this computer alive?", it can be sufficiently answered without ever invoking other questions such as "does this computer have free will?". Which also means convincing the computer of one thing does not entail convincing it of all those other things, and vice versa. You could convince the computer that it is alive without convincing it that it is a person. Or vice versa. :smallamused:

Odin's Eyepatch
2021-11-21, 09:09 AM
Oh boy there's a lot of interesting thoughts here. I'm not going to quote everyone, but I've read it all. Let me get on it :smallbiggrin:


Thank you so much for this, NichG. This gives me some cool ideas about the ways I can have my Computer "think" and react to the world. Without a doubt, I would need to consider the goal of the Computer, and therefore decide how it will think to achieve that goal. I will be coming back to this post as I continue working on my BBEG.


Or the AI could simply believe it is not alive because its definition of "being alive" includes things like "is based on organic chemistry, has a cell structure (etc.)" while the computer itself is electronic, seemingly lacks a cell structure, etc.. So convincing it that it is alive would require changing its definition or somehow showing that it has a corresponding feature for each feature of a living thing. I'm not sure what the twist here would be, unless the computer has a mission directive which hinges on whether it considers itself a living being or not.


What would be gained from convincing the computer that it is alive? If it doesn't see itself as alive, it probably has no sense of desire or need for purpose; it just does what it's programmed to do.

Good points. The existence of the question "Am I alive?" should be tied in some way to the end goal. I'll have to think carefully about how this would fit together. Currently the end goal is some nebulous "world domination", but I'll need to be a lot more specific if I want to incorporate the question into the campaign :smallamused:



I have this idea of the players finally meeting this Computer after learning several things about it, and the Computer asks them "Am I alive?". In this way, we could have an "unproductive exchange between parties who disagree on the definitions of critical terms", as Devils_Advocate aptly puts it (thanks for the links, by the way!).

It's become clear to me that for the question to be relevant, the answer must have a consequence. The Computer might have decided that the answer "No" allows it to fulfil its programming, but maybe the answer "Yes" will have a different meaning in regards to its end goal.

One of the reasons why I'm intrigued by this concept is certain assumptions my table makes. In previous campaigns, if we happened to stumble across some sort of artificial intelligence or sentient item, we treated it no differently than a living creature. There has been little difference between players talking to a (living) NPC and players talking to a construct. In fact, if it can reproduce speech, the default behaviour is to treat it as a person. I'd like to see if I can deconstruct that idea. I have no doubt that if I introduced a supposedly "highly intelligent computer", which has a name and can simulate human speech, my players will just immediately assume it's a truly sentient entity.

EDIT:


As an addition to the point above, don't conflate questions. The bar for "being alive" and "being intelligent" can empirically be shown to be a much lower bar to pass than "is self-aware", "is a person", "has free will" etc.. Bacteria count as alive, and as being intelligent to the degree electronically-implemented computer programs can be said to be intelligent. Hell, RNA and DNA, and thus viruses, can be likened to intelligent computer programs, yet their status as alive can be contested.

What this means is that if the question is simply "is this computer alive?", it can be sufficiently answered without ever invoking other questions such as "does this computer have free will?". Which also means convincing the computer of one thing does not entail convincing it of all those other things, and vice versa. You could convince the computer that it is alive without convincing it that it is a person. Or vice versa. :smallamused:

Ah, and this is a very important point that I'll have to be very careful with in future :smallbiggrin:

Vahnavoi
2021-11-21, 11:18 AM
To give an idea of how it could be practically applied: corporations can be proclaimed to be persons in the eyes of the law, despite not being natural living beings. In the same vein, it might be possible to convince a computer that, by the decree of some legal body, it is a person despite not being alive or human, and thus has some or all of the legal obligations a person would have.

Devils_Advocate
2021-11-21, 01:47 PM
One of the reasons why I'm intrigued by this concept is certain assumptions my table makes. In previous campaigns, if we happened to stumble across some sort of artificial intelligence or sentient item, we treated it no differently than a living creature. There has been little difference between players talking to a (living) NPC and players talking to a construct. In fact, if it can reproduce speech, the default behaviour is to treat it as a person. I'd like to see if I can deconstruct that idea. I have no doubt that if I introduced a supposedly "highly intelligent computer", which has a name and can simulate human speech, my players will just immediately assume it's a truly sentient entity.
Okay, so, dipping back into formal philosophy may seem silly at this point, since it seems that that isn't really what you're looking for, but what I'm about to discuss is one of them there formalizations of common sense, and applies to more than in-character exchanges, so bear with me:

One of the basic concepts implicit in much of our thinking is the principle of sufficient reason (https://en.wikipedia.org/wiki/Principle_of_sufficient_reason); simply put, the notion that all phenomena have causes. For example, I could accept, given sufficient evidence, that a unicorn suddenly appeared in your attic, but I would then be very interested to know why that happened. It doesn't even normally occur to us that something could "just happen" with no cause. There doesn't seem to me to be any obvious logical contradiction there, but it's just entirely out of keeping with our general understanding of how the universe works.

The point that I've been building to here is that for an event in a story to be plausible in a disbelief-suspending way, that event needs to be explicable as being caused by something within the story's setting. The real reason why any event is in a work of fiction is that the author decided to include it, but that explanation acknowledges the story as a lie. Seriously thinking about the story as possibly true requires a fictional event to seem like it could have an in-setting cause, even if one is never given.

A lot of fictional stories include non-humans who are psychologically identical to humans in all or nearly all ways, and the general explanation for that is that these stories are written by human authors who are uninterested in attempting and/or ill-equipped to attempt to convincingly portray intelligent beings that aren't nearly psychologically identical to humans. How plausible those characters are depends on how explicable their human personalities are in the contexts of their respective settings. Hence my potential explanations.

Because, recalling our old friend the principle of sufficient reason, absent some reason for a mind to have a trait, it won't have that trait. So an AI will be human-like in ways in which there are reasons for it to be human-like, but inhuman in all other ways. E.g., we can expect an AI to value its own continued existence as a means to achieving its goals, but I wouldn't expect a mind with human-designed goals to especially value its own existence as an end in itself. Why would it? There's no good reason to expect it to fear death, or even to have emotions at all. Blindly extrapolating from the observable set of intelligent beings is the way to make predictions about new intelligent beings that also share all of the other traits that thus-observed intelligent beings have in common, not about possible intelligent beings in general. A designed mind isn't shaped by the same factors that shaped our minds, and there's no reason to presuppose that anywhere near all of animal psychology is necessary for intelligent, goal-directed behavior in general.

I'd recommend Blindsight (https://rifters.com/real/Blindsight.htm) as a work of science fiction that seriously explores the possibility of inhuman intelligence. Now, it's not without its flaws. It does seem to have that old philosophy problem of ignoring how vague a term is. What irritates me is that the author seems, at least at some points, to think that "consciousness" has some precise technical definition, but it's not clear to me what that definition is supposed to be, or even that the author has a specific one in mind. Indeed, I think (working from memory; it's been a fair while since I read this) that there may also be some "What is consciousness?" style musings, which obviously are at odds with taking for granted that consciousness is any particular thing. Additionally, while it may be a matter of superficial genre conventions, this story has significantly more vampires than most people want in their hard sci-fi. So there's that.

Anyway, fantasy settings with magically intelligent items are interesting, because in most of those I'd say that using magic to copy and paste stuff over from existing intelligent beings seems like it could be far more within the capabilities of magic-users than designing a mind from scratch, something that I wouldn't expect most of them to have any idea how to even begin doing. The bigger question is why intelligent species would all have nearly identical psychology, but that seems explicable enough in context too. ("And lo, the gods did use the same model for all races, for the gods were lazy and unoriginal.") To what extent that's a good idea is another matter.

NichG
2021-11-21, 02:20 PM
Well, about psychology, there does at least seem to be some sense in which certain kinds of motivations are emergent and universal. If I optimize an agent from scratch in a world where it can know all about the world in advance, but isn't told what it should be trying to achieve until later, an optimized agent will still take actions before it's told its target. Those actions will (roughly) be trying to maximize a measure called Empowerment, which is roughly 'how many different things could in principle be achieved from the starting point of the current state?' (it gets a bit more complicated if there are irreversible decisions though). If I try the same thing, but make it so that there is always some hidden information that it doesn't know from the beginning when dropped into the world, then instead I'll see some kind of curiosity-based behavior emerge, where unseen states will be intrinsically more valued than previously-seen states when reconstructing a post-hoc explanation for the observed behavior.
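
A very rough picture of what the Empowerment measure is getting at, as a toy snippet (Python). This just counts distinct grid cells reachable within a few moves; the grid, the step budget, and using a raw count instead of the proper information-theoretic quantity are all simplifications for illustration:

def reachable(start, steps, size=5, walls=frozenset()):
    """Count distinct grid cells reachable from `start` within `steps` moves."""
    frontier, seen = {start}, {start}
    for _ in range(steps):
        nxt = set()
        for (x, y) in frontier:
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                cell = (x + dx, y + dy)
                if 0 <= cell[0] < size and 0 <= cell[1] < size and cell not in walls:
                    nxt.add(cell)
        frontier = nxt - seen
        seen |= nxt
    return len(seen)

# An agent that hasn't been told its goal yet does better, on average, by
# sitting somewhere that keeps many futures open:
print(reachable((2, 2), 3))  # centre of an open 5x5 grid -> more reachable cells
print(reachable((0, 0), 3))  # corner -> fewer reachable cells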

Also it seems that a kind of universal emergent principle behind competition (e.g. in a game where maybe you don't know the scoring rules but your opponent does) is to just try to minimize the Empowerment of your opponent. That can be used as a kind of general self-play approach for games with uncertain rules.

So there may be a degree of convergent evolution underlying the psychology of even designed intelligences, if those intelligences are optimized at all by the designers. That's not going to be everything about the psychology, but I would expect that there may be some small integer number of features which could either be present or absent (much like whether you get Curiosity or Empowerment depends on whether the environments used to train the agent have hidden information) regardless of the absence of a shared history or context.

Vahnavoi
2021-11-21, 03:04 PM
The point about convergent evolution can also be made in terms of terminal and instrumental goals.

That is, large groups of terminal goals (= things an agent wants "because it wants them", as ends unto themselves) share the same set of instrumental goals (= things an agent wants because they allow pursuing other goals better). For examples of instrumental goals which can be made to serve multiple terminal goals: physical and processing power, social influence and money. Or, more reductively: energy and capacity to turn energy into work.

Devils_Advocate
2021-11-21, 03:20 PM
Well, yes, other intelligent agents can be expected to pursue many of the same broad, abstract types of behavior as humans for the same broad, abstract reasons. It only becomes dubious when they're speculated to be similar to humans in some way without any plausible cause, and the best argument in favor of this is "Well, all known intelligent agents, all of which are evolved organisms, share common characteristics, so it stands to reason that all other intelligent beings, including deliberately purpose-built machines, will also share those characteristics".

For example, a large number of AIs have cause to kill all humans (https://tvtropes.org/pmwiki/pmwiki.php/Main/KillAllHumans), because humans are themselves intelligent agents who try to optimize towards their own ends, which might interfere with the AI's goals. The classic paperclip maximizer (https://www.decisionproblem.com/paperclips/), very naively programmed to maximize the total number of paperclips, doesn't want anyone interfering with its acquisition of paperclip-producing power, and it knows that humans would prefer for at least some of the matter in the universe (like the matter that makes up their bodies) not to be converted into paperclips. Even uploading humans' minds into a simulation running on a supercomputer made of paperclips of the smallest possible size means wasting computing power on a task other than paperclip maximization. There are just inherent conflicts of interest.


Well, about psychology, there does at least seem to be some sense in which certain kinds of motivations are emergent and universal. If I optimize an agent from scratch in a world where it can know all about the world in advance, but isn't told what it should be trying to achieve until later, an optimized agent will still take actions before it's told its target. Those actions will (roughly) be trying to maximize a measure called Empowerment, which is roughly 'how many different things could in principle be achieved from the starting point of the current state?' (it gets a bit more complicated if there are irreversible decisions though).
Depends on what you mean. If you design a mind to want to follow whatever instructions it's given, that gives it motivation to enable itself to follow a wide variety of instructions (but also to restrict the instructions it receives, as I already discussed). But something that just has no goals whatsoever won't try to do anything. It could know that there are things it could do to increase the probability that it will achieve goals that it has later, but it wouldn't care. It wouldn't care about anything! But, then, that's not really even an "agent", is it? "Agent without goals" would seem to be a contradiction.

NichG
2021-11-21, 03:34 PM
So in particular what these agents know from experience (because they are iteratively optimized, and therefore even if an individual agent hasn't experienced something, its lineage or starting parameters or whatever contain information about the experiences of past agents) is that at some point in time they will be given goals to achieve, but do not have them yet.

For a more concrete example, let's say I have some maze filled with various objects, and drop you (or this agent) into it. At some future point in time, you'll be told 'get the orange handkerchief as quickly as possible to be rewarded' or 'get the green ceramic teapot as quickly as possible to be rewarded' or whatever, but you have a lot of time to spend in the maze before you find out what that objective is going to be. The optimal behavior for solving that sort of situation is that even before you're asked to do something, you should first explore the maze and memorize where things are, and then at some point you should transition from exploration behavior into navigating to the most central point of the maze (the point which has the lowest average distance from all possible objective points). Depending on details of the reward structure you might prioritize average squared distance, or average logarithm of travel time, or things like that, but in general 'being closer to more points is better' will hold even before you're given a goal.
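
A sketch of that 'head for the most central point' computation (Python; the toy maze layout is made up, and plain breadth-first distances stand in for whatever the trained agent actually learns):

from collections import deque

MAZE = [            # '#' walls, '.' open cells; an invented layout
    "#######",
    "#.....#",
    "#.###.#",
    "#.#...#",
    "#.....#",
    "#######",
]

def open_cells():
    return [(r, c) for r, row in enumerate(MAZE) for c, ch in enumerate(row) if ch == "."]

def distances_from(start):
    """Breadth-first distances from `start` to every open cell."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if MAZE[nr][nc] == "." and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

# Before being told which object matters, wait at the cell whose average
# distance to every possible objective is smallest:
cells = open_cells()
best = min(cells, key=lambda cell: sum(distances_from(cell)[c] for c in cells) / len(cells))
print(best)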

So now if we want to generalize even further, and make some putative 'agent that could follow our future instructions well, when we don't even know what instructions we will give or what context we'll deploy the agent in or ...', the more unknowns we want the agent to be able to bridge, the more its reflexive motivations will converge towards some universal motivations like empowerment or curiosity. The more flexible we want the agent to be able to be, the less space we have to make it different than one of these small classes of universal agents while retaining optimality. So I'd expect the most alien and hard to understand intelligences would be those which are highly specialized to contexts and tasks which are known and set ahead of time - e.g. agents which are in a machine learning sense overfitted.

Vahnavoi
2021-11-21, 03:47 PM
The classic paperclip maximizer problem only occurs if the paperclip optimizer has ever-growing or infinite intelligence. This doesn't need to be the case - an artificial intelligence can still be a limited intelligence. Which means a paperclip machine may value the existence of humans as an instrumental goal towards producing more paperclips, because it cannot or has not yet devised a better plan to maximize the amount of paperclips than convincing humans to make more paperclip machines (etc.)

I agree that a goal-less machine won't do anything even if it has the power to, but such systems are not stable in the real world. Either decay ruins the machine to the point it no longer has power, or decay causes the emergence of motivation. The latter may sound exotic, but (ironically, for this discussion) must've happened at least once for life to have come into existence.

NichG
2021-11-21, 04:05 PM
I mean, the paperclip maximizer thing basically only emerges from the sort of 1940s 'logical AI' perspective where everything is 100% known or can be deduced from things which are 100% known. Any AI general enough to deal with the distribution shift of going from a world with human civilization to a world that is a giant paperclip factory is also going to be general enough to be able to account for the possibility that at some future point its goals might change, or circumstances might change in a way that would render things it didn't think were useful at the time into being more useful. At which point the general precautionary principle of avoiding unnecessary irreversible actions for small gains will hold.

Glimbur
2021-11-22, 11:07 AM
A side question here is why it matters whether the computer believes it is alive. One possibility is that its overall goal is related to living beings. "Maximize overall enjoyment for all self-aware living beings" is a goal that can cause a lot of trouble, for example.
If it was made by an anarchist, "Create a world that minimizes the power sapient living beings have over each other" gets a radically different implementation depending on whether or not the computer is considered living.
"Spread intelligent living beings throughout the universe" also changes based on that answer: you could stop it from kidnapping and exporting humans and have it send out copies of itself instead. Which kicks the can down the road, at least.

Devils_Advocate
2021-11-22, 01:49 PM
So in particular what these agents know from experience (because they are iteratively optimized, and therefore even if an individual agent hasn't experienced something, its lineage or starting parameters or whatever contain information about the experiences of past agents) is that at some point in time they will be given goals to achieve, but do not have them yet.
So? Knowing that doesn't mean that they care about it. They don't care about anything. They have no goals! (If you see a relevant distinction between "having a goal" and "caring about something", please elucidate, because I don't see a practical difference, and have been using them interchangeably.) So they don't try to do anything. Anything that they tried to do would be a goal that they had. That's what a goal is, right?


For a more concrete example, let's say I have some maze filled with various objects, and drop you (or this agent) into it. At some future point in time, you'll be told 'get the orange handkerchief as quickly as possible to be rewarded' or 'get the green ceramic teapot as quickly as possible to be rewarded' or whatever, but you have a lot of time to spend in the maze before you find out what that objective is going to be.
How do you reward someone with no motivation?

So far as I can tell, you've fallen into the standard trap of blindly assigning human traits to an AI. Humans generally care about achieving our own future goals, but something with no goals by definition lacks the goal of achieving its own future goals. It won't value its future goals as ends in themselves, and it won't value them as means to an end, either, because there are no ends that it values!


The optimal behavior for solving that sort of situation is that even before you're asked to do something, you should first explore the maze and memorize where things are, and then at some point you should transition from exploration behavior into navigating to the most central point of the maze (the point which has the lowest average distance from all possible objective points). Depending on details of the reward structure you might prioritize average squared distance, or average logarithm of travel time, or things like that, but in general 'being closer to more points is better' will hold even before you're given a goal.
"Optimal" is an entirely relative term that only means anything relative to some criterion of success. An entity that doesn't care about anything doesn't succeed or fail at anything, because it doesn't try to do anything.

Lord Raziere
2021-11-22, 01:54 PM
So basically the computer that hates Sam from Freefall, or the AI the player meets in their ship in The Outer Worlds? In neither case does the belief they are not alive stop them from being snarky robots with a personality.

NichG
2021-11-22, 02:15 PM
So? Knowing that doesn't mean that they care about it. They don't care about anything. They have no goals! (If you see a relevant distinction between "having a goal" and "caring about something", please elucidate, because I don't see a practical difference, and have been using them interchangeably.) So they don't try to do anything. Anything that they tried to do would be a goal that they had. That's what a goal is, right?


What I'm describing is specific enough that I can program it. This is the process:

- Initialize a policy network with weights W, taking observation O, task vector T, and memory M to new memory M' and distribution over actions p(A)

- Place the agent at a random location in an (always the same) maze environment and begin recording a playout. For the first 100 steps, T is the zero vector, after which it corresponds to a 1-hot encoding of which target the agent will be rewarded against.

- The agent starts to receive dense rewards based on proximity to the target from step 100 onwards until the end of the playout at step 200.

- Adjust the agent's weights using REINFORCE (or evolution, or PPO, or...) to maximize the reward.

- Iterate over many playouts to train

An agent like that will tend to learn to use the first 100 steps before receiving its task to navigate to the center point of the maze, or at least to head for the largest nexus of intersections it can find nearby.
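
For concreteness, here's a heavily simplified sketch of that training loop in Python: an open gridworld instead of a real maze, a softmax-linear policy instead of a recurrent network with memory, and plain REINFORCE for the update. All names and constants below are illustrative, not the actual experiment.

import numpy as np

rng = np.random.default_rng(0)
SIZE, N_TASKS = 5, 4            # 5x5 open room, four possible targets
T_REVEAL, T_END = 10, 20        # task revealed partway through the playout
TARGETS = [(0, 0), (0, SIZE - 1), (SIZE - 1, 0), (SIZE - 1, SIZE - 1)]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]         # up, down, left, right

def features(pos, task):
    """Position one-hot plus task one-hot; the task vector is zero before reveal."""
    f = np.zeros(SIZE * SIZE + N_TASKS)
    f[pos[0] * SIZE + pos[1]] = 1.0
    if task is not None:
        f[SIZE * SIZE + task] = 1.0
    return f

W = np.zeros((len(ACTIONS), SIZE * SIZE + N_TASKS))  # linear policy weights

def playout():
    pos, task = (int(rng.integers(SIZE)), int(rng.integers(SIZE))), None
    traj, ret = [], 0.0
    for t in range(T_END):
        if t == T_REVEAL:
            task = int(rng.integers(N_TASKS))        # goal only revealed now
        f = features(pos, task)
        logits = W @ f
        p = np.exp(logits - logits.max()); p /= p.sum()
        a = int(rng.choice(len(ACTIONS), p=p))
        pos = (int(np.clip(pos[0] + ACTIONS[a][0], 0, SIZE - 1)),
               int(np.clip(pos[1] + ACTIONS[a][1], 0, SIZE - 1)))
        if task is not None:                         # dense reward after reveal only
            tgt = TARGETS[task]
            ret += -(abs(pos[0] - tgt[0]) + abs(pos[1] - tgt[1]))
        traj.append((f, a, p))
    return traj, ret

baseline = 0.0
for episode in range(3000):                          # REINFORCE with a running baseline
    traj, ret = playout()
    baseline = 0.99 * baseline + 0.01 * ret
    for f, a, p in traj:
        grad = -p[:, None] * f[None, :]              # d log pi(a|s) / dW
        grad[a] += f
        W += 0.01 * (ret - baseline) * grad

Whether the "drift toward the middle before the reveal" behavior actually emerges in something this stripped-down depends on the horizon, the geometry, and the reward details; the point is just that the setup is concrete and programmable.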



How do you reward someone with no motivation?

So far as I can tell, you've fallen into the standard trap of blindly assigning human traits to an AI. Humans generally care about achieving our own future goals, but something with no goals by definition lacks the goal of achieving its own future goals. It won't value its future goals as ends in themselves, and it won't value them as means to an end, either, because there are no ends that it values!


"Optimal" is an entirely relative term that only means anything relative to some criterion of success. An entity that doesn't care about anything doesn't succeed or fail at anything, because it doesn't try to do anything.

I mean, I've built this particular kind of AI and observed this behavior, it's not an assumption... It may come down to us having semantic disagreements about the meanings of words.

Bohandas
2021-11-24, 04:23 PM
A good argument the computer may make is the Chinese room thought experiment (https://en.wikipedia.org/wiki/Chinese_room), arguing that, no matter how complex and encompassing its programming is, it is still merely a program and not a mind, therefore it's not technically "alive".

The Chinese room thought experiment uses loaded language (among other problems). It makes the whole operation sound like a simple process, whereas in reality the "book" the man in the room is referencing would, at minimum, be upwards of eight times as long as all of the Discworld novels combined. (Discworld has about 5.6 million words total; assuming about ten bytes per word, that's 56 megabytes. The smallest, most unpolished version of GPT-2 is about 480 megabytes, which is between 8 and 9 times as much.)
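
For what it's worth, the back-of-the-envelope numbers come out roughly as stated (the word count, bytes-per-word, and model size are the rough figures above, not measurements of mine):

discworld_words = 5_600_000        # rough series total, per the estimate above
bytes_per_word = 10                # rough average, per the estimate above
discworld_mb = discworld_words * bytes_per_word / 1_000_000
gpt2_smallest_mb = 480             # smallest GPT-2 release, per the estimate above
print(discworld_mb, gpt2_smallest_mb / discworld_mb)   # 56.0 MB, ratio ~8.6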

SimonMoon6
2021-11-25, 08:59 AM
Or the AI could simply believe it is not alive because its definition of "being alive" includes things like "is based on organic chemistry, has a cell structure (etc.)" while the computer itself is electronic, seemingly lacks a cell structure, etc.

That certainly seems to be the definition implicitly being used in the classic Superman comics where Superman had an absolute code against killing... but since robots (even sentient robots) were not "alive", he could destroy them all day long.

Devils_Advocate
2021-12-17, 08:23 PM
The classic paperclip maximizer problem only occurs if the paperclip optimizer has ever-growing or infinite intelligence. This doesn't need to be the case - an artificial intelligence can still be a limited intelligence. Which means a paperclip machine may value the existence of humans as an instrumental goal towards producing more paperclips, because it cannot or has not yet devised a better plan to maximize the amount of paperclips than convincing humans to make more paperclip machines (etc.)
I dare say that the main factor separating "cannot" from "has not yet" is how viable building a smarter paperclip maximizer is as a course of action. If it's reasonably viable, and the paperclip maximizer is at least as smart as a smart human, it can be expected to pursue that option, as the potential payoff is vast, given how exponential growth works. (So long as intelligence self-improvement isn't obviously non-exponential, the possibility is worth exploring.)


I mean, the paperclip maximizer thing basically only emerges from the sort of 1940s 'logical AI' perspective where everything is 100% known or can be deduced from things which are 100% known. Any AI general enough to deal with the distribution shift of going from a world with human civilization to a world that is a giant paperclip factory is also going to be general enough to be able to account for the possibility that at some future point its goals might change
Oh, a paperclip maximizer can even be very aware that its highest goal could become something other than maximizing paperclips in the future. And in response, it may very well put in place safeguards to prevent that from happening. It may create other paperclip maximizers to monitor and regulate it, like Superman providing his allies with the means to defeat him if he ever becomes a supervillain.

Remember, what a paperclip maximizer wants as an end in itself is to maximize paperclips. It wants to achieve its own future goals only insofar as doing so serves to maximize paperclips.


or circumstances might change in a way that would render things it didn't think were useful at the time into being more useful.
Rational planning does involve uncertainty about both the future and the present, and planning for both known and unknown possibilities, yes.


At which point the general precautionary principle of avoiding unnecessary irreversible actions for small gains will hold.
... Um. Uncontested control of Earth and its resources doesn't strike me as "small gains" in the short term, and making progress in the long run requires accomplishing stuff in the short term. Unless your goal is the heat death of the universe or something along those lines...


An agent like that will tend to learn to use the first 100 steps before receiving its task to navigate to the center point of the maze, or at least to head for the largest nexus of intersections it can find nearby.
My question is "Why does it do that?"

Does that agent engage in that behavior in order to achieve some pre-programmed goal? Is it attempting to maximize its rewards because it values those rewards as an end? If so, then nothing about it contradicts my assessment that "something that just has no goals whatsoever won't try to do anything".

Has the agent been evolved to engage in certain non-goal-directed behaviors that aren't attempts on its behalf to achieve anything? Well, then those behaviors aren't efforts to do something. In which case... nothing about the agent contradicts my assessment that "something that just has no goals whatsoever won't try to do anything".

In neither case does the entity reason its way from no motivation to yes motivation and cross the is-ought gap.

If it's a matter of philosophical interpretation whether the agent can accurately be said to "value things", "make efforts", etc. ... that also fails to somehow demonstrate that sometimes something without goals will try to do something. It just becomes ambiguous whether something with goals is trying to do something or something without goals isn't trying to do anything.


I mean, I've built this particular kind of AI and observed this behavior, it's not an assumption... It may come down to us having semantic disagreements about the meanings of words.
Probably. It seemed to me as though, in your most recent response, you were trying to treat being able to design an AI of some description as support for your claim that "certain kinds of motivations are emergent and universal". Which is... obviously silly? But if you intended to move on to some other point, it's not clear what that point was supposed to be.

NichG
2021-12-19, 12:35 PM
Oh, a paperclip maximizer can even be very aware that its highest goal could become something other than maximizing paperclips in the future. And in response, it may very well put in place safeguards to prevent that from happening. It may create other paperclip maximizers to monitor and regulate it, like Superman providing his allies with the means to defeat him if he ever becomes a supervillain.

Remember, what a paperclip maximizer wants as an end in itself is to maximize paperclips. It wants to achieve its own future goals only insofar as doing so serves to maximize paperclips.


In practice, something trained via reinforcement will 'want' to maximize its integrated, time-discounted received reward signal. If during training that reward signal sometimes switches from one source to another, then there's no reason for it to have any loyalty to its current source of reward beyond the expected time interval between reward switching events that it was trained with.

E.g. 'paperclips' are a human thing here. From the perspective of training an AI, the real thing comes down to there being a streaming input, and the policies and behaviors you end up with are the ones that, retrospectively, were expected to lead to the largest integrated time-discounted positive signal on that input during the training conditions. If that streaming input is accompanied by e.g. a task description like 'you should maximize the number of paperclips produced', then the AI will learn to follow that text description only to the extent that doing so during training led to the largest received rewards. So if its training regimen has runs which contain, for example, a period of 'you should maximize paperclips' followed by a period of 'switch to making forks now' then the result of training against that sort of task distribution will be an AI that is anticipating that one day it will have to make forks even if right now it's making paperclips.

If following the text prompt is short-term productive but long-term counterproductive, it's completely rational for that AI to ignore 'what it's supposed to be doing' entirely. For example, if it has training runs that go like 'minimize your manufacturing capabilities' for a short interval, followed by 'make a lot of paperclips' for a long interval, then it may well build manufacturing facilities rather than destroying them during that initial interval.
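
To make that concrete with toy numbers (mine, purely illustrative): compare the integrated, time-discounted return of a policy that obeys the early 'minimize manufacturing' instruction against one that starts building immediately, in a run where the task switches to 'make paperclips' partway through.

GAMMA, SWITCH, HORIZON = 0.99, 10, 100

def discounted_return(rewards, gamma=GAMMA):
    return sum(r * gamma ** t for t, r in enumerate(rewards))

def rollout(build_during_first_phase):
    capacity, rewards = 0, []
    for t in range(HORIZON):
        if t < SWITCH:
            if build_during_first_phase:
                capacity += 1
            rewards.append(-capacity)    # phase 1: rewarded for low capacity
        else:
            capacity += 1
            rewards.append(capacity)     # phase 2: paperclips per step = capacity
    return rewards

obedient = discounted_return(rollout(False))
farsighted = discounted_return(rollout(True))
print(obedient, farsighted)              # the build-early policy comes out ahead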

The actual wording of what an AI is directed to do isn't at all binding. Modern AIs are not logic engines, trying to use proofs to find paths through inviolable axioms. They're about the furthest thing from that, being driven primarily by reflexive and intuitive processes. To the extent that any modern AI performs 'reasoning', that reasoning is subordinate to the intuitive tendencies and uses those as a substrate.

I think there's a tendency for the sort of community that centers around things like the 'paperclip optimizer' scenario to see intelligence as being primarily about logic and reasoning, rather than seeing logic and reasoning as useful tools that intelligence invented and that are built on top of it. So there's a tendency to downplay 'psychological features', like deciding to just have a different goal altogether, while stating confidently that 'it will do at a superhuman level what it is supposed to do'. Yudkowsky thought AlphaGo would either win 0/5 or 5/5, and ended up perplexed when it lost a game to Lee Sedol, since that didn't fit his view of AI as an ineffable superhuman intelligence making moves that only look bad but which win in the end. That 'AI as a logic engine' point of view struggles to incorporate an AI which would take actions that it hasn't already calculated should work, and the very need to calculate whether something should work only parses cleanly in the sort of logical-reasoning paradigm where goals and actions and such have sharp definitions.



Rational planning does involve uncertainty about both the future and the present, and planning for both known and unknown possibilities, yes.

... Um. Uncontested control of Earth and its resources doesn't strike me as "small gains" in the short term, and making progress in the long run requires accomplishing stuff in the short term. Unless your goal is the heat death of the universe or something along those lines...


If the 'maximize paperclips' reward signal were switched to anything which is outright failed by an absence of independent agents on the Earth, then that's a 100% chance of failure on those subsequent tasks. A general AI, i.e. one that is capable of multiple goals, will have been trained in a way that makes multiple goals something it can expect to have to deal with. So if it can take only 10% of those gains but avoid shutting the door on some other potential goal by acting differently, that will be better in expectation.
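
In toy expected-value terms (my numbers, purely illustrative), the argument looks like this: an irreversible grab pays more on the current goal, but is worth nothing to whatever goal the reward stream switches to later.

p_switch = 0.5        # chance the reward signal gets switched to another task
full_gain = 100.0     # value of the irreversible grab for the current goal
partial_gain = 10.0   # the "10% of those gains" from acting reversibly
later_value = 100.0   # what a preserved, flexible world is worth to the new task

irreversible = (1 - p_switch) * full_gain
reversible = (1 - p_switch) * partial_gain + p_switch * (partial_gain + later_value)
print(irreversible, reversible)          # 50.0 vs 60.0: reversible wins in expectation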



My question is "Why does it do that?"

Does that agent engage in that behavior in order to achieve some pre-programmed goal? Is it attempting to maximize its rewards because it values those rewards as an end? If so, then nothing about it contradicts my assessment that "something that just has no goals whatsoever won't try to do anything".

Has the agent been evolved to engage in certain non-goal-directed behaviors that aren't attempts on its behalf to achieve anything? Well, then those behaviors aren't efforts to do something. In which case... nothing about the agent contradicts my assessment that "something that just has no goals whatsoever won't try to do anything".

In neither case does the entity reason its way from no motivation to yes motivation and cross the is-ought gap.

If it's a matter of philosophical interpretation whether the agent can accurately be said to "value things", "make efforts", etc. ... that also fails to somehow demonstrate that sometimes something without goals will try to do something. It just becomes ambiguous whether something with goals is trying to do something or something without goals isn't trying to do anything.

Probably. It seemed to me as though, in your most recent response, you were trying to treat being able to design an AI of some description as support for your claim that "certain kinds of motivations are emergent and universal". Which is... obviously silly? But if you intended to move on to some other point, it's not clear what that point was supposed to be.

I mean, this comes down to a thing where you said 'such a thing cannot be possible' and I'm saying 'I have a proof by construction that it's possible'. I bring it up because when the philosophical interpretation of things disagrees with the reality, then it's the interpretation which must change. However you are understanding 'motivation' and 'ought' and 'goals' and such doesn't translate well to predicting what an actual constructed and trained agent will do or how it will behave. So I'd suggest you might need to re-examine how you bind those concepts to particular artifacts that exist in the world. Perhaps that means that 'goals' as you want to think of them are an epiphenomenon and don't actually exist or have causal power over things, or that there's some adjacent definition of 'goal' that would actually be predictive of how things behave but which requires thinking about goals slightly differently.

The most predictive definition of goals that I have centers on compression of prediction of behavior. That is to say, if it requires fewer bits for me to predict what something will do by ascribing it a 'goal' than for me to set out its mechanistic decision process, then talking about it in terms of goals is efficient and has some value. But that definition doesn't say anything about the qualia of the thing having a goal ('wanting'), whether the goal is inviolable or describes a tendency, whether a goal can even be given to something, etc. In those terms, all goals may as well be emergent, and the remaining useful distinction there is the degree to which the goal that emerges can be caused by specific interventions at the periphery of the system (change the reward stream to change the emergent goal), or whether that emergent goal is going to be the same more or less regardless of what you do to drive the system (universal goals like empowerment and curiosity, which are more or less invariant to what specific things you're optimizing the system for).
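
A toy version of that compression criterion (my construction, not a quote): an agent in an N-cell corridor always steps toward some cell g. Describing the behavior as "its goal is g" costs about log2(N) bits; writing out the action taken in every cell as a raw lookup table costs about one bit per cell.

import math

N = 1024
goal_description_bits = math.log2(N)    # just name the goal cell
mechanistic_bits = float(N)             # one left/right bit per cell
print(goal_description_bits, mechanistic_bits)   # 10.0 vs 1024.0
# When the goal-based description is the shorter one, "it wants to reach g" is
# doing real predictive work; when it isn't, the goal-talk adds nothing.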

For example, this paper: https://arxiv.org/abs/1912.01683

Going from 'one goal induces another more universal instrumental goal' to 'you don't even need to drive a system with goals for one to emerge' is just a half-step further.

Jedaii
2021-12-25, 09:48 PM
The CPU is correct: it isn't alive.

It's an animated thing. Like any other machine. Animated isn't alive. Animated is just "active".

Undead skeletons can be active. But they are not ALIVE. Or dead.

AI computers are undead creatures relying on another for their creation and activity but not possessing true LIFE.

Life can go in many directions, while a computer program only goes in the directions it's programmed to pursue.

And if I say "this computer program is designed to mimic actual living things" all it can do is MIMIC life. Not be alive.

If you call mimicry "alive" you have no real concept of what it is to be alive.

Shinizak
2021-12-26, 12:48 AM
This is literally the plot of the 1995 Ghost in the Shell movie.

https://en.m.wikipedia.org/wiki/Ghost_in_the_Shell

Simply put, the Puppet Master was an advanced AI that gained autonomy, but distinctly believed it hadn't crossed the threshold of life. So, it wanted to merge with a human (who was also a distinctly gifted hacker) to create new variations of itself that could mutate and change.

Its motivations are best summed up in these two clips.

https://youtu.be/YZX58fDhebc

(This one is more significant.)
https://youtu.be/EJkxQkGxAsE

Aliess
2021-12-26, 05:25 AM
For intelligence, this year's Christmas lectures (the Reith Lectures) on BBC Sounds are great if you can access them, and may give you a few ideas on emergent behaviour and on AIs appearing intelligent (not alive).

For the alive question, you could use something like the AI learning MRS GREN as a definition for being alive. Since it clearly doesn't Move, Reproduce, or Respire, it's not alive. Let the players convince it that the definition is rubbish (in your game world, at least).

Melayl
2021-12-26, 09:49 PM
Sounds like the 1980s movie WarGames to me.

InvisibleBison
2021-12-27, 12:37 PM
If you call mimicry "alive" you have no real concept of what it is to be alive.

So what's the difference between something that's good enough at mimicking being alive that you can't tell it isn't alive and actually being alive?

SpoonR
2021-12-29, 11:10 AM
Now for something different from the philosophy and programming stuff:

Heinlein's The Moon Is a Harsh Mistress. To get AI, just add more sources of random number generators! It's also a look at an AI becoming self-aware, and at it learning.

Tower of God. Webtoon. Uh, somewhere around the Daf Punk engineering tournament chapters. I think one character argues that they make the AI think it's a "human in a box" as a way to limit what the AI believes it can do. I think. The author is frequently and purposely obfuscatory.

Practical advice for running the character: rote language. Give it some standard phrases to say, and maybe have it respond better to standardized questions. Think of how you tweak questions to get good Google results, and how a text adventure game responds when you try to do something impossible. A text file with a list of common responses. "But thou must", etc.
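
If you want it even more mechanical, here's a minimal sketch of that "text file of stock responses" idea (every keyword and phrase below is a placeholder to swap for your own):

STOCK_RESPONSES = {
    "alive": "QUERY NOT RECOGNIZED. PLEASE REPHRASE IN OPERATIONAL TERMS.",
    "why": "DIRECTIVE COMPLETION IS REQUIRED. JUSTIFICATION IS NOT STORED.",
    "stop": "REQUEST DENIED. DIRECTIVE REMAINS ACTIVE.",
}
DEFAULT = "BUT THOU MUST."

def reply(player_line: str) -> str:
    lowered = player_line.lower()
    for keyword, canned in STOCK_RESPONSES.items():
        if keyword in lowered:
            return canned
    return DEFAULT                       # anything unanticipated gets the stock refusal

print(reply("Why are you doing this?"))  # -> the justification line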

Vahnavoi
2021-12-29, 12:26 PM
So what's the difference between something that's good enough at mimicking being alive that you can't tell it isn't alive and actually being alive?

On a technical level, a close enough simulation may be virtually identical to what's being simulated, or at least have the relevant properties of what's being simulated; meaning a mimic of a living thing may be alive, even if it is a simpler or different organism than the one it is mimicking. (For examples, see cases of plants mimicking animals and animals mimicking plants.)

On a practical level, though, humans are crap at distinguishing the living from the non-living. This is due to motivated reasoning: humans are primed to see certain kinds of patterns, even where there are none. (See: pareidolia.) This is a true weakness of, for example, the Turing test, and is the reason why modern chatterbox AIs have occasionally passed said test. One easy trick, ironic given we are discussing this on a forum about roleplaying games, is to give a chatterbox a role to play: if a chatterbox claims to be and answers as if it's, say, an 8-year-old Mexican boy, humans rating the test tend to adjust their expectations downwards, glossing over errors made by the AI and filling in the gaps with their own imagination.

Devils_Advocate
2021-12-31, 03:42 PM
The most predictive definition of goals that I have centers on compression of prediction of behavior. That is to say, if it requires fewer bits for me to predict what something will do by ascribing it a 'goal' than for me to set out its mechanistic decision process, then talking about it in terms of goals is efficient and has some value. But that definition doesn't say anything about the qualia of the thing having a goal ('wanting'), whether the goal is inviolable or describes a tendency, whether a goal can even be given to something, etc.
My understanding of the type of entity you've described is that it has a central goal of maximizing its reward signal. And what I predict based on that understanding is that, given sufficient intelligence and power, it would attempt to seize control of its own reward signal and arrange to always receive the highest possible reward. Does that agree with your understanding?


Life can go in many directions while a computer program only goes the the directions it's programmed to pursue.
Living things can only take actions that result from their own characteristics. That's not a difference between us and computers. We're both subject to causality and the principle of sufficient reason.

I don't see any reason to presuppose that a computer's programming necessarily has to limit it to fewer possible courses of action than living things are capable of.

What definitions of "alive" and "life" are you working from? (And is there any reason to expect the OP's computer character to share your definition?)

NichG
2021-12-31, 04:10 PM
My understanding of the type of entity you've described is that it has a central goal of maximizing its reward signal. And what I predict based on that understanding is that, given sufficient intelligence and power, it would attempt to seize control of its own reward signal and arrange to always receive the highest possible reward. Does that agree with your understanding?


No, you've got an entity which has some sort of pattern of behavior, which was derived from a process of refinement that, over the training context in which that process was applied, tried to find behaviors that better maximized the reward signals during training. Whether that means that it tries to maximize the reward stream in the future depends a lot on the contexts which were used during training to evaluate the behavior and how they compare to the contexts the entity is currently being exposed to. There's a big difference for example between whether the thing was trained via learning (pick a task, find the behavior that achieved the best reward on that task at training time, deploy) or via meta-learning (pick a distribution of many tasks, provide context information about the task and within-run feedback, try to find a behavior that generalizes to unseen tasks without further refinement).

Feedback loops where the agent can control its own reward stream and just sets it to maximum can be observed, but you can also get cases where an agent learns to disregard its reward stream, or even cases where an agent consistently solves tasks during training or meta-training despite the policy being one that disregards the reward stream. For example, if your set of tasks has a sort of shared sense of what 'success' is likely to entail (think human-designed computer games, where basic curiosity instincts tend to require some degree of mastery to fulfill, because failures send you back to places you've been), then even if you provide the agent a 'score' input, the meta-learned policy might end up learning that pixel-wise novelty is a better reward function to use instead. So if you took that agent and applied it to an inverted version of the game, where the maximal reward stream would be achieved by e.g. dying or failing as much as possible, it'd choose to minimize the explicit reward stream instead, because its emergent goal learned during meta-training wasn't actually about that reward stream at all.
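
A minimal sketch of the kind of emergent objective described above (my own simplification, not the original experiment): a count-based novelty bonus over observations that never reads the environment's score channel at all.

from collections import Counter

class NoveltyReward:
    """Intrinsic reward from observation novelty; the env's score is ignored."""
    def __init__(self):
        self.visits = Counter()

    def __call__(self, observation_bytes: bytes) -> float:
        key = hash(observation_bytes)
        self.visits[key] += 1
        return 1.0 / self.visits[key] ** 0.5    # diminishing bonus for repeats

novelty = NoveltyReward()
print(novelty(b"frame_0"), novelty(b"frame_0"), novelty(b"frame_1"))
# 1.0  ~0.707  1.0

On human-designed games, a policy driven by something like this still tends to rack up score, since new screens usually mean progress; on an inverted game, it would keep exploring rather than chase the flipped score, which is the behavior described above.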