
View Full Version : AI Dungeon Master



ThomasDelvus
2024-03-21, 04:12 PM
I’ve been working on teaching ChatGPT 4 how to be a decent Dungeon Master. Would love any feedback.

chat.openai.com/g/g-tyhutCVUv-d-d-r-dungeon-master

Please forgive me if this isn’t the appropriate place to post this.

TD

OldTrees1
2024-03-21, 11:55 PM
General Feedback for LLMs as AI DMs:
They literally don't care about accuracy. If you say your character does XYZ, the AI DM does not care about remembering that event or the resulting state. When the AI DM says an NPC does XYZ, the AI DM does not care about remembering that event or state either. As the game progresses, it becomes increasingly obvious to the players that there is no game.

Instead of trying to make an AI DM, consider leveraging an intelligence that does care to run an interactive game. An LLM can be useful for brainstorming, but don't try to drill holes with a hammer.



PS: Also, only those with a ChatGPT Plus subscription can see the page, so it is paywalled. Luckily this does not prevent feedback based on the categorical limitations of LLMs.

Dalinar
2024-03-23, 09:50 AM
Echoing the above. An LLM is a statistical model that predicts what text might come next in a conversation and displays it to you. There isn't really an intelligence behind it to interpret any sort of meaning from that text (calling it "AI" is largely a matter of marketing it to investors), which makes any sort of continuity unlikely. (And, in fact, the LLM will tell you as much if you try to ask it certain things, like "pick a number between one and ten.")

If you are interested in using an LLM to alleviate the longstanding DM shortage, right now the best it can do is act as a sounding board to bounce ideas off of. Maybe you can use it to generate flowery descriptions of things if you're not good at that, though I'd still aggressively edit anything along those lines (not because it's going to come out badly 100% of the time, but because you don't want to open yourself up to accusations of "oh, you just copy-pasted that from ChatGPT" if it doesn't sound like your writing/voice).

I wouldn't put one in charge of anything player-facing at this time.

Telonius
2024-03-23, 09:52 AM
If it's anything like the image generators I'd expect to be fighting lots of Rakshasas.

Zarhan
2024-03-23, 09:55 AM
On the other hand, this would probably work as a perfect GM for Paranoia, where GM fiat overrules everything...

JLandan
2024-03-23, 12:20 PM
I play with some guys from Cal Tech. They tried using an LLM to DM. They couldn't make it work. It was fantastic at description and battlemap plotting, but it couldn't run a combat without screwing up. Example: we enter a room with 4 goblins, the Barbarian rushes in and whacks one of them, end of combat, no recognition of the other 3 goblins.

Currently they're working on a proprietary AI Dungeonmaster System that they call AIDS (I know, college humor, very similar to junior high humor). It's meant as a DM tool, not a replacement, to do the heavy lifting. Still a work in progress, but they have hope.
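For what it's worth, that goblin failure is exactly the part that's trivial to get right with ordinary code instead of a language model. A quick Python sketch (all names made up) of the deterministic bookkeeping a DM tool needs, where killing one goblin can't make the other three vanish:

```python
from dataclasses import dataclass, field

@dataclass
class Combatant:
    name: str
    hp: int

@dataclass
class Encounter:
    combatants: list = field(default_factory=list)

    def damage(self, name, amount):
        # Apply damage to the named target, then drop anyone at 0 HP or less.
        for c in self.combatants:
            if c.name == name:
                c.hp -= amount
                break
        self.combatants = [c for c in self.combatants if c.hp > 0]

    def active(self):
        return [c.name for c in self.combatants]

# Four goblins enter; the barbarian whacks one of them.
fight = Encounter([Combatant(f"goblin-{i}", 7) for i in range(1, 5)])
fight.damage("goblin-1", 12)
print(fight.active())  # the other three goblins are still very much there
```

An LLM front-end could read from a tracker like this for narration, but never write to it directly; the state can't "forget" combatants the way a context window can.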

Kylar0990
2024-03-23, 09:02 PM
I’ve been working on teaching ChatGPT 4 how to be a decent Dungeon Master. Would love any feedback.

chat.openai.com/g/g-tyhutCVUv-d-d-r-dungeon-master

Please forgive me if this isn’t the appropriate place to post this.

TD

What they currently call 'AI' is mostly hype. Predictive text models are not intelligent. Being able to spit out words that correlate, but with no understanding of causation, isn't intelligence. The current models need people to clean up the output, or continued iterations degrade like photocopying a copy of a copy of a copy of a copy ...

So while it may be useful to set the scene or describe static events, it can't handle an actual interaction for any length of time.

Witty Username
2024-04-03, 09:15 AM
After a short jaunt into AI art for some MTG proxy cards, I can safely say AI is pretty limited in terms of feedback and correction.

An AI DM will likely be able to respond, but anything even slightly off will be very difficult to correct.

LudicSavant
2024-04-03, 02:16 PM
There are some TTRPGs out there that have a system for DMless play, like Gubat Banwa.

Psyren
2024-04-03, 03:43 PM
+1 to OldTrees1, your best bet is to use "AI" to supplement a proper DM by automating more tedious activities like scene-setting and NPC generation.

Having said that:


There are some TTRPGs out there that have a system for DMless play, like Gubat Banwa.

Adding to this, you could also go DM-less by running a solo adventure like Wolves of Langston or Tyrant of Zhentil Keep (they work similarly to a Choose Your Own Adventure book), with the encounter difficulty scaled up for multiple characters. The party pretty much needs to make story decisions in lockstep, though, or even just have one decisionmaker while the rest pilot warm bodies.

tchntm43
2024-04-03, 08:43 PM
To properly have an AI dungeon master, it needs to combine the LLM with a reference of hard-coded information on rules and also a place to write information it can't forget, like how many enemies are in combat or what player character names are.

In terms of hybrid models, I think there isn't anything like this yet, but it's coming. If not with GPT-5 (which is expected to be released this year), then almost certainly with GPT-6.
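To make the idea concrete, here's a rough Python sketch of that kind of hybrid, with the model reduced to a narration layer. Everything here (`call_llm`, the rules table, the memory keys) is hypothetical, just to show the shape: rules and state live outside the model and get re-injected every turn.

```python
# Sketch of the hybrid idea: the LLM only narrates; rules and state are
# kept outside it. `call_llm` is a stand-in for whatever model API you use.

RULES = {"longsword": "1d8 slashing", "goblin_ac": 15}

class HybridDM:
    def __init__(self, call_llm):
        self.call_llm = call_llm
        self.memory = {}          # facts the model must never "forget"

    def remember(self, key, value):
        self.memory[key] = value

    def narrate(self, player_input):
        # State and rules are re-injected on every turn, so continuity
        # does not depend on the model's context window.
        context = f"Rules: {RULES}\nState: {self.memory}\n"
        return self.call_llm(context + player_input)

# Dummy "model" so the sketch runs without any API.
dm = HybridDM(call_llm=lambda prompt: f"[narration for: {prompt[-30:]}]")
dm.remember("party", ["Aria", "Borin"])
dm.remember("goblins_remaining", 3)
print(dm.narrate("I attack the nearest goblin"))
```

The point of the design is that forgetting becomes impossible by construction: the model is asked to decorate known state, not to remember it.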

Hael
2024-04-03, 09:39 PM
Current LLMs are good at giving details for small things in a scene. Did the players ask about the nature of the metal chandelier (which has nothing to do with the story), and you want to make it believable? Ask ChatGPT to give you something quick.

It will be decent at making outlines of a plot you might have created, or filling in side-character details, but overall it's going to be a pretty mediocre storyteller and rules arbiter at this stage, in terms of accuracy, coherence, and quality, as well as speed (you will spend more time correcting it than if someone simply did it themselves).

That won't last, though. Eventually it will be a solid DM. But right now the context length isn't there to keep the details of the story/map in memory (newer models will eventually be able to digest several tomes' worth of information).

It will also get better at reasoning and creative storytelling. So I have no doubt that we will see at least an average AI DM at some point in the future (say, the next ten years or so).

Do I think it will ever replace Matt Colville? Probably not.

Psyren
2024-04-03, 11:10 PM
To properly have an AI dungeon master, it needs to combine the LLM with a reference of hard-coded information on rules and also a place to write information it can't forget, like how many enemies are in combat or what player character names are.

In terms of hybrid models, I think there isn't anything like this yet, but it's coming. If not with GPT-5 (which is expected to be released this year), then almost certainly with GPT-6.

I would have said automating D&D wouldn't be a priority for the engineers developing these... but who am I kidding :smalltongue:



Do I think it will ever replace Matt Colville? Probably not.

To paraphrase (and butcher) an old saying (https://www.businessinsider.com/outrunning-the-bear-2011-10): "AI doesn't need to out-DM Matt Colville, it just needs to out-DM me."

Witty Username
2024-04-04, 12:48 AM
Last I checked, AI is still sketchy telling the difference between busses and ostriches. Overall I am not all that worried in the now.

Bohandas
2024-04-04, 11:10 AM
You could use it as a DM's aid, or you could use it as a GM for a wholly unstructured game (a la what AI Dungeon already does), but not as the main DM for a structured, rule-based RPG: I don't think it can handle that level of structure, and it has the memory of a goldfish.

Jakinbandw
2024-04-04, 09:19 PM
I can't access it right now, but I've been hearing good things about Claude 3. If someone has access and would be willing to give it a shot, I'd be curious. It would probably have to be theater-of-the-mind combat, but that isn't against the rules in D&D.

Theodoxus
2024-04-05, 08:09 AM
Last I checked, AI is still sketchy telling the difference between busses and ostriches. Overall I am not all that worried in the now.

I LOL'd at this, until I went and tested it in Copilot... it started off strong, talking about how ostriches are native to Africa and are the largest living bird species, and buses are man-made vehicles that are used all over the world... and then it swapped 'emus' for 'buses' and it went downhill quickly (for the ostrich vs. bus comparison, anyway; the emu stats were on point as far as I could tell).

Kinda sad when you can't even use a rudimentary LLM for library sciences...

Bohandas
2024-04-05, 12:48 PM
My experience with AI has been like something legitimately out of sci-fi. Unfortunately, the sci-fi in question is Portal 2, and specifically the Fact Sphere.

https://www.youtube.com/watch?v=1rmp-_cT3co

OldTrees1
2024-04-05, 07:26 PM
I LOL'd at this, until I went and tested it in Copilot... it started off strong, talking about how ostriches are native to Africa and are the largest living bird species. Buses are man-made vehicles that are used all over the world... and then it swapped 'emus' for 'bus' and it went downhill quickly (for the ostrich vs bus comparison - the emus stats were on point as far as I could tell.)

Kinda sad when you can't even use rudimentary LLM for library sciences...

Why do you find it kinda sad that a rudimentary LLM would be misaligned for library sciences?

I don't find it sad that a hammer makes a poor pen, and I think LLMs and library sciences are in a similar situation. I feel LLMs are categorically misaligned for something that cares as much about accuracy as library sciences do. There was a court case where a lawyer used ChatGPT to look up case law (library science right there), but the LLM did not care about accuracy, so it "hallucinated" when it followed its priorities and the predicted next words formed fictional cases because they felt like a more natural response to the question.

Or is the "kinda sad" part that Microsoft is trying to present Copilot as a "AI-powered personal tutor" despite it being misaligned for library sciences?

Honestly, I think other types of AI (Watson playing Jeopardy in 2011, for example) are better aligned for library sciences.

Bohandas
2024-04-06, 12:42 AM
The Fact Sphere is not defective. Its facts are wholly accurate and interesting

Theodoxus
2024-04-06, 06:38 AM
Or is the "kinda sad" part that Microsoft is trying to present Copilot as a "AI-powered personal tutor" despite it being misaligned for library sciences?

This. Based on the tiny sample size that is my coworkers... let's just say, your typical American white collar worker doesn't grok what AI actually is, and based on what MS is foisting on us, their fears of a robotic overlord a la The Matrix or Terminator are massively overblown.

Bohandas
2024-04-06, 09:27 AM
This. Based on the tiny sample size that is my coworkers... let's just say, your typical American white collar worker doesn't grok what AI actually is, and based on what MS is foisting on us, their fears of a robotic overlord ala Matrix or Terminator are massively overblown.

Once again we see that drama is far less realistic than comedy


I feel LLMs are categorically misaligned for something that cares so much about accuracy as library sciences. There was a court case where a lawyer used ChatGPT to look up case law (library science right there) but the LLM did not care about accuracy, so it "hallucinated" when it followed its priorities and the predicted next words formed fictional cases because they felt like a more natural response to the question.

https://www.youtube.com/watch?v=DwhD21TO6eo

Zim: Computer! Give me all the information you have on the FBI!

Computer: The FBI is a government law enforcement agency.

Zim: Continue.

Computer: Insufficient data.

Zim: Insufficient data!?! Can't you just make an educated guess!?!

Computer: Okaaaay... Uh... Founded in 1492 by... uh... ...demons, the FBI is a crack law enforcement agency... designed to... uh... I dunno... Fight aliens?

Zim: I knew it! This is bad! This is so bad!

EggKookoo
2024-04-06, 12:07 PM
These conversations always remind me of the old Firesign Theatre bit (and I paraphrase)...

Hacker: Give me access to file C.
Computer: PRESENT QUERIES IN THE FORM OF A QUESTION
Hacker: ...give me access to file C?
Computer: ACCESS GRANTED

Anyway, yeah, GPT-4 isn't ever going to get there. GPT-5 maybe, but as many have pointed out, not without some kind of specialized access to rules info and DMing instructions that it can reference.

OldTrees1
2024-04-06, 12:38 PM
This. Based on the tiny sample size that is my coworkers... let's just say, your typical American white collar worker doesn't grok what AI actually is, and based on what MS is foisting on us, their fears of a robotic overlord ala Matrix or Terminator are massively overblown.

Yeah. LLMs are good at replying to anything, and they are optimized for the reply looking like a natural continuation of the conversation. This makes it very easy for us to trick ourselves and others into thinking it is more capable than it is. This is partially because LLMs can pass the Turing Test.

Your coworkers' fears of robotic overlords are the same concerns AI safety researchers worry about. Those are legit concerns. However, those concerns are for when Artificial General Intelligence (AGI) arrives. Aka not yet.

Early on, when there was limited information about how LLMs worked, it was easy to confuse an LLM's ability to reply for consciousness. A Google AI engineer testing LaMDA (an AI a different Google team created) mistook this ability to reply for the LLM being conscious. The more one knows about LLMs, the harder it is to trick yourself into thinking they are more capable than they are.

In the meantime, you have individuals who barely know about LLMs trying to jump on this tech fad and get rich quick. I expect most of them are honest and tricked themselves into believing the LLM is capable. Microsoft executives might not understand how bad an idea Copilot is when marketed as a tutor. Or maybe they have realized by now but have not figured out a way to pivot and salvage it?

And this means the circle of jobs executives think can be replaced by LLMs is larger than the circle of jobs that can be replaced by LLMs. So labor has to worry about a wider displacement blast radius (and consumers need to worry about degraded quality or reliability).

...

The minor detail that we can trick ourselves about LLMs means society is creating bigger problems for itself, while people are splintered and thus ill-prepared to react to those problems. Yeah, that is kinda sad.

As far as I can tell, the majority of people should be focused on the problem of execs who think LLMs are more competent than they are and try to replace labor with them. That problem is closer. AI Art is the big topic there, but AI Text will follow eventually.

Some people should be focused on the problem of AI companies having insufficient focus on AI safety research. We would be better off if OpenAI had understood ChatGPT's downsides better before it was released. At minimum they should have known about its ability to pass the Turing test and the reason it hallucinates. This would also help AI safety research in general.

AI safety researchers should continue to worry about AGI alongside current AI risks. AGI is a long ways off (I think?), but it is a problem that you can only work on before it arrives.


https://www.youtube.com/watch?v=DwhD21TO6eo

Zim: Insufficient data!?! Can't you just make an educated guess!?!

Computer: Okaaaay... Uh... Founded in 1492 by... uh... ...demons, the FBI is a crack law enforcement agency... designed to... uh... I dunno... Fight aliens?

Zim: I knew it! This is bad! This is so bad!

Very apt. :smallbiggrin:


Anyway, yeah, GPT-4 isn't going to ever get there. GPT-5 maybe, but as many point out not without some kind of specialized access to rules info and DMing instructions that it can reference.

If I were required to design an AI hybrid model where the user cared about accuracy but I was required to use a LLM as the core, it would probably follow this analogy:
Imagine a bookshelf with books.
Step 1: Randomize the bookshelf
Step 2: Validate if the books are in a valid order
Step 3: Repeat steps 1 and 2 until the books are in a valid order

Depending on the requirements we have for valid orders of the books (how much we care about accuracy), this process would be worse than the validator (the non LLM) doing all of the work itself.

In the case of DMing, I expect a hybrid GPT-N model could be used for a cRPG, but would be inefficient as a DM of a TTRPG compared to a human DM with a LLM DM assistant.
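That generate-and-validate loop is easy to demonstrate. A toy Python version of the bookshelf (purely illustrative): the "LLM" just shuffles, the validator checks the order, and with a strict enough validator the loop degenerates into bogosort, i.e. far worse than letting the validator's knowledge (sorting) do the work directly.

```python
import random

def generate_until_valid(generate, validate, max_tries=100_000):
    # Step 1: randomize; Step 2: validate; Step 3: repeat.
    for _ in range(max_tries):
        candidate = generate()
        if validate(candidate):
            return candidate
    raise RuntimeError("no valid ordering found")

books = list(range(5))

def shuffled():
    # The "generator" has no idea what a valid order is.
    return random.sample(books, len(books))

def is_sorted(b):
    # The validator holds all the actual knowledge.
    return all(x <= y for x, y in zip(b, b[1:]))

result = generate_until_valid(shuffled, is_sorted)
print(result)  # [0, 1, 2, 3, 4] after ~120 blind tries on average
```

With 5 books this takes around 120 tries on average; with a full shelf it explodes factorially, which is the point: the stricter the accuracy requirement, the more the validator might as well do the job itself.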

Bohandas
2024-04-06, 04:47 PM
There was a court case where a lawyer used ChatGPT to look up case law (library science right there) but the LLM did not care about accuracy, so it "hallucinated" when it followed its priorities and the predicted next words formed fictional cases because they felt like a more natural response to the question

AI can't help but be creative. Yet another place where "serious" science fiction got it completely upside-down.


This. Based on the tiny sample size that is my coworkers... let's just say, your typical American white collar worker doesn't grok what AI actually is, and based on what MS is foisting on us, their fears of a robotic overlord ala Matrix or Terminator are massively overblown.

I want to write a story where a rogue AI takes over some country that's already a militaristic totalitarian dictatorship and the rest of the world's panicking "aaaaugh! Rogue AI!" but the people in the dictatorship are like "actually this is the best leader we've had in 25 years"

(Also another one where the moral is that corporations are already paperclip maximizers, just with money instead of paperclips)

NichG
2024-04-06, 09:07 PM
Focusing in on the 'how to make a good hybrid' part of this to try to avoid a rant spiral...

LLMs are best as translation layers between things, rather than as the engine themselves.

If you want to be able to handle searches like 'I need an enemy that is threatening to wizard-heavy parties' or 'What sort of monster might live in a lake of blood?', you can use an LLM to parse that natural language into more formal search queries against a database, and return monsters from that database that are guaranteed to at least exist and not be hallucinations. If the LLM misinterprets the search query, the worst case is that you get no results or inappropriate results; but it's not like there are much better alternatives for indirect search questions like that if you don't personally have encyclopedic knowledge of all the monster manuals or lots of mythology. So a use like that is likely to add value above just doing it yourself, and it's sharply bounded in its ability to actually detract.
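A toy Python version of that translation-layer pattern (the monster table and the `llm_to_filter` stand-in are invented for illustration): the model only produces a structured filter, and every result comes from a real table, so the worst case is an empty or off-target result, never a fictional monster.

```python
# The LLM's only job is to turn natural language into a structured filter;
# results come from a real table, so they can be wrong but never invented.

MONSTERS = [
    {"name": "Rakshasa", "habitat": "urban",   "counters": "wizard"},
    {"name": "Golem",    "habitat": "dungeon", "counters": "wizard"},
    {"name": "Merrow",   "habitat": "lake",    "counters": "fighter"},
]

def llm_to_filter(question):
    # Stand-in for an LLM call that parses the question into a key/value filter.
    if "wizard" in question:
        return {"counters": "wizard"}
    return {}

def search(question):
    f = llm_to_filter(question)
    return [m["name"] for m in MONSTERS
            if all(m.get(k) == v for k, v in f.items())]

print(search("I need an enemy that is threatening to wizard-heavy parties"))
# → ['Rakshasa', 'Golem']  (worst case: empty or off-target, never fictional)
```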

So if you want to automate the entirety of running a game, I'd look at the specific places where you need to either interpret open-ended inputs or generate open-ended, creative outputs. Then just use LLMs in those cases to fill the gaps, and program the rest like you would program a CRPG. Don't try to get an LLM to run D&D combats; use an engine like BG3 or whatever, and just have the LLM be able to inject new objects, make spot modifications, etc., through a formal command language that it's trained to translate natural-language scene descriptions and action descriptions into. That way you're not leaning on the LLM to say what happens; you're just leaning on the LLM to e.g. generate a list of objects to spawn on a randomly generated tavern table, or a list of NPCs likely to be in the main room of the tavern, or to decide whether a given NPC is angered or cowed by a certain natural-language threat, or things like that.
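A sketch of what that formal command language could look like (hypothetical schema, Python): the LLM emits JSON commands, and the engine validates each one against a whitelist before touching game state, so a hallucinated object simply gets rejected.

```python
import json

# Engine-side whitelist: the only operations and objects that exist.
ALLOWED_OPS = {"spawn", "remove"}
KNOWN_OBJECTS = {"tankard", "stool", "candle"}

def execute(command_json, scene):
    # Parse one LLM-emitted command and reject anything outside the schema,
    # so free-text output can never directly corrupt game state.
    cmd = json.loads(command_json)
    if cmd.get("op") not in ALLOWED_OPS or cmd.get("object") not in KNOWN_OBJECTS:
        return scene
    if cmd["op"] == "spawn":
        scene.append(cmd["object"])
    elif cmd["op"] == "remove" and cmd["object"] in scene:
        scene.remove(cmd["object"])
    return scene

scene = []
# Pretend the LLM translated "a cluttered tavern table" into these commands:
for c in ['{"op": "spawn", "object": "tankard"}',
          '{"op": "spawn", "object": "candle"}',
          '{"op": "spawn", "object": "beholder"}']:  # hallucinated object
    scene = execute(c, scene)
print(scene)  # → ['tankard', 'candle']  (the hallucination was rejected)
```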

Will there still be exploits, weird inconsistencies, etc.? Sure. But you're making your job a lot easier by not trying to get a statistical model to understand rules text well enough to apply it consistently and track the HP and status of 20 NPCs. I mean, that's hard for humans too!

OldTrees1
2024-04-07, 01:52 AM
AI can't help but be creative. Yet another place where "serious" science-fiction got it completely upside-down
This depends on the type of AI. I assume the science fiction in question is talking about a different, previously known type of AI. If so, then it might be correct. For example, the AI in self-driving cars, and the science-fiction stories about an AI misevaluating something new that was not in its training environment.

I know the LLM type of AI (generative AI) is all the rage right now, but it did not replace the other types of AI.


Focusing in on the 'how to make a good hybrid' part of this to try to avoid a rant spiral...

LLMs are best as translation layers between things, rather than as the engine themselves.

Want to be able to handle searches like 'I need an enemy that is threatening to wizard-heavy parties' or 'What sort of monster might live in a lake of blood?', you can use an LLM to parse that natural language into more formal search queries against a database and return monsters from that database that are guaranteed to at least exist and not be hallucinations. If the LLM misinterprets the search query, worst case is you get no results or inappropriate results - but its not like there are much better alternatives available for indirect search questions like that if you don't personally have encyclopedic knowledge of all the monster manuals or lots of mythology. So a use like that is likely to be able to add value above just doing it yourself, and its sharply bounded as to its ability to actually detract.


Limiting the LLM to modifying the query input to a more reliable core engine, and not letting it modify the output, dramatically limits the range of the hallucinations. It can only ask for the wrong information. Providing an "inappropriate result" is still a hallucination, but it is a restricted one. That hybrid model design is a good improvement.

Bohandas
2024-04-07, 10:05 AM
This depends on the type of AI. I assume the science-fiction in question is talking about a different previously known type of AI. If so then it might be correct. For example the AI in self driving cars and the science-fiction stories about AI misevaluating something new that was not in its training environment.

What I'm talking about is that in sci-fi, whenever there's a character who's a big computer or a robot or something, and who also has a lot to say, they're usually overly serious and logical (unless the sci-fi program in question is a comedy).