Contemplating Encounter Types for Solo "Playtesting" [Archive]

PhoenixPhyre

2022-03-03, 10:58 AM

Quotes because the real play-test happens when other people get their hands on the product and it's being used in a real game. But doing mock testing can give a lot of information that bare number-crunching doesn't. At least...if it's varied and useful enough. Note: this would also serve as a set of possibly-useful challenges for the various arguments out there (which tend to focus on known-in-advance, 1-combat scenarios with all the details slanted to favor whoever is making the argument at hand).

I'm considering the following setup, please tell me if I'm missing anything:

A test party consists of 4 basic rules[1] characters + the subject of the test (SUT), each of level X (chosen per test). That is:
* Human Champion Fighter (SnB)
* Hill Dwarf Life Cleric
* High Elf Evocation Wizard
* Subject of the test, with whatever modification is necessary. Basic rules races/classes aren't assumed for this one.
using point buy with no feats or multiclassing. Higher levels (7+) can assume that weapon users have a magic weapon (but no presumption of a +X). Other than that, use the high magic starting at higher levels guidelines. For the purpose of the test, you can buy magic items at the median of the Xanathar's prices out of your starting gold, but each character acts individually (so no pooling the gold on the test character).

A test run consists of one "adventuring day" (ie no long rests). Each set has the following properties:
Length: How many combats there are in that particular "adventuring day". A number between 1 and 10, rolled randomly.
Short Rest Frequency: If length > 1, this is the number of fights between each short rest (with any shortfall ending up at the end of the day). A number between 1 and length, rolled randomly.
Encounters generated for each day are shuffled randomly.
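
The day structure above can be sketched in a few lines. This is only an illustration, assuming a short rest falls after every `frequency` fights and never after the last fight of the day (so a frequency equal to the length means no short rests). The function names and the returned dictionary layout are made up for the example.

```python
import random


def short_rest_points(length, frequency):
    """Return the fight numbers after which the party takes a short rest.

    Assumption: a rest happens after every `frequency` fights, and the
    leftover fights (the shortfall) sit at the end of the day with no
    rest after the final fight.
    """
    if length <= 1:
        return []
    return list(range(frequency, length, frequency))


def roll_day(rng):
    """Roll one artificial adventuring day (no long rests)."""
    length = rng.randint(1, 10)          # number of combats, 1 to 10
    frequency = rng.randint(1, length)   # fights between short rests
    order = list(range(1, length + 1))   # placeholder encounter ids
    rng.shuffle(order)                   # encounters run in random order
    return {
        "length": length,
        "short_rest_frequency": frequency,
        "rests_after_fight": short_rest_points(length, frequency),
        "encounter_order": order,
    }


if __name__ == "__main__":
    print(roll_day(random.Random()))
```

Passing in a seeded `random.Random` makes a given day reproducible, which helps when re-running the same day with a revised build.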

Each encounter has the following properties:
Difficulty: Chosen from the usual spectrum, with the addition of Deadly x N categories above deadly, formed by adding another bracket of width equal to Deadly on to the end N times.
Type: See spoilered list, chosen randomly

Solo: A single enemy of appropriate difficulty. Probably should be legendary and/or mythic.
Elite Squad: A small group (3-5) enemies of roughly equal CR.
Master and Minions: One strong monster and many (~5-8) much weaker minions.
Hordes: Lots (8-16) small monsters. Often goes well with waves.

Starting Conditions: One of the following, chosen after Type (because some combinations don't make sense). Close == start within 60'. Far == start 60 < x < 200'.

Straight up, Close|Far: No special conditions
Attempted Ambush, Close: One side or the other (50% chance) is stealthy (roll stealth and adjudicate surprise as normal)
Assault, Close|Far: The party is assaulting fortifications. The enemies have substantial cover to start with.
Surrounded, Close: The enemies come from multiple sides. Not compatible with Solo :smallwink:
Lair, Close: The party is entering the enemies' lair (relevant mostly for solos and creatures who have lair properties).

Terrain: One from the list below, chosen randomly.

Hostile: Lava pits, exploding geysers, cliffs, whatever you can think of. But there's environmental damage coming, possibly avoidable or exploitable.
Chokepoint: There's a natural chokepoint on the map.
Wide Open: High ceilings and distant walls, or out in the open.
Obstructed: Lots of cover (such as trees, walls, rubble, etc).
Underwater: Like Wide Open, but, well, wet.
Difficult: Instead of obstructions, there's just difficult terrain (swamps, bogs, streams, dense vegetation, whatever).

Composition: Ideally would be generated completely randomly from appropriate creatures based on CR. The goal is to have a wide variety and avoid biases.
For ease of use (despite not being really realistic), the terminal condition of these fights is always total victory.
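
A rough sketch of rolling the per-encounter properties, assuming difficulty is supplied by the tester and everything else is picked uniformly at random. The only incompatibility encoded is the one named above (Surrounded with Solo); any other "doesn't make sense" combinations are left for the tester to add. All identifiers here are invented for the example.

```python
import random

ENCOUNTER_TYPES = ["Solo", "Elite Squad", "Master and Minions", "Hordes"]

# Starting condition -> allowed ranges (Close: within 60 ft, Far: 60-200 ft)
STARTING_CONDITIONS = {
    "Straight up": ["Close", "Far"],
    "Attempted Ambush": ["Close"],
    "Assault": ["Close", "Far"],
    "Surrounded": ["Close"],
    "Lair": ["Close"],
}

TERRAINS = ["Hostile", "Chokepoint", "Wide Open", "Obstructed",
            "Underwater", "Difficult"]

# (type, starting condition) pairs that are never rolled together
INCOMPATIBLE = {("Solo", "Surrounded")}


def roll_encounter(rng, difficulty):
    """Roll type first, then a compatible starting condition, then terrain."""
    enc_type = rng.choice(ENCOUNTER_TYPES)
    allowed = [s for s in STARTING_CONDITIONS
               if (enc_type, s) not in INCOMPATIBLE]
    start = rng.choice(allowed)
    encounter = {
        "difficulty": difficulty,
        "type": enc_type,
        "start": start,
        "range": rng.choice(STARTING_CONDITIONS[start]),
        "terrain": rng.choice(TERRAINS),
    }
    if start == "Attempted Ambush":
        # 50% chance for either side to be the stealthy one
        encounter["stealthy_side"] = rng.choice(["party", "enemies"])
    return encounter


if __name__ == "__main__":
    print(roll_encounter(random.Random(), "Medium"))
```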

Things to watch for:
* Comparing the SUT to the closest relevant role of the other party members. Do they dramatically outperform (against the basic rules party this is likely, because the baseline is low)? Underperform (a warning sign because the baseline is low, when comparing to the basic rules party)?
* Endurance over long days
* Nova capability over short days
* Particular things they have difficulty with or are really good at
* ??

[1] If your table has a different set-point and you're building for a particular group, go ahead and use that. But try to keep it pretty generic and not intentionally synergistic with the SUT. The point isn't to see how far you can push the SUT, it's to compare to a generic baseline of whatever power level you're shooting for.

Magikeeper

2022-03-03, 10:51 PM

The baseline party, particularly the fighter, would be more useful if you at least gave them better PHB subclasses. For fighter, battlemaster would likely be the one to go with. Otherwise any decent homebrew is likely to at least slightly overshadow the champion fighter and you're left wondering if that's okay or not (because champion is itself overshadowed by other widely played subclasses).

--------

That said, for solo checks there are a number of spreadsheets floating around the web that calculate expected DPR for a great many non-caster class+subclass combos lv1-20 under various circumstances. You can also get a general feel for what kinds of abilities are available at what level. You should also assume both w/feat and no feat play, otherwise you might end up with a class that combos with feats in a powerful way existing classes do not.

PhoenixPhyre

2022-03-03, 11:03 PM

The baseline party, particularly the fighter, would be more useful if you at least gave them better PHB subclasses. For fighter, battlemaster would likely be the one to go with. Otherwise any decent homebrew is likely to at least slightly overshadow the champion fighter and you're left wondering if that's okay or not (because champion is itself overshadowed by other widely played subclasses).

The game's baseline is the basic rules. Sure, you can go above that, but the point of that particular check is to check the other side. If you're less functional than a champion fighter at that role, there's something really bad going on. Plus, one of the points of the rest of the party being dead simple is that you don't have to think about playing them. They're the equivalent of bot players.

That said, for solo checks there are a number of spreadsheets floating around the web that calculate expected DPR for a great many non-caster class+subclass combos lv1-20 under various circumstances. You can also get a general feel for what kinds of abilities are available at what level. You should also assume both w/feat and no feat play, otherwise you might end up with a class that combos with feats in a powerful way existing classes do not.

I DO NOT trust those spreadsheets to actually mean anything in actual play. Seriously--the assumptions made dominate the result entirely, starting with the "infinite-health training dummy in a blank room" nature of them. And this kind of play-testing is much more about things other than DPR (which honestly rarely matters much). It's about how the abilities play together in a range of circumstances.

Frogreaver

2022-03-03, 11:19 PM

The problem is that you can't run enough repeated trials to get a real feel for how these classes actually perform. Running through a mock encounter 1 time tells you nearly nothing. Running through it 4-5 tells you a bit more but still not enough for the performance to overcome the randomness.
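
To put a rough number on how slowly d20 noise shrinks, here is a small sketch using the ordinary binomial standard error. The +5 attack bonus, AC 13 target, and attack counts are illustrative assumptions, not figures from this thread.

```python
import math


def hit_chance(attack_bonus, armor_class):
    """Chance that d20 + bonus meets or beats AC.

    A natural 1 always misses and a natural 20 always hits, so the
    result is clamped to the 5%..95% range.
    """
    p = (21 + attack_bonus - armor_class) / 20
    return min(0.95, max(0.05, p))


def hit_rate_std_error(p, attacks):
    """Standard deviation of the observed hit fraction over `attacks` rolls."""
    return math.sqrt(p * (1 - p) / attacks)


if __name__ == "__main__":
    p = hit_chance(attack_bonus=5, armor_class=13)  # assumed example values
    for n in (5, 20, 100, 1000):
        print(n, round(hit_rate_std_error(p, n), 3))
```

With those assumed values, twenty attacks still leave the observed hit rate uncertain by roughly eleven percentage points, and the error only falls with the square root of the number of rolls.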

The 2nd issue is that there is no representative adventuring structure for the party or the adventuring day. So, whatever you test against doesn't reveal much more than whether a particular character in a particular party, played using certain tactics, performed well against a specific adventuring day. Change the creatures encountered, number of encounters, number of rests, the party makeup, really nearly anything, and you can walk away with drastically different results.

You mentioned whiteroom being bad because the assumptions made drive the outcome, this kind of an exercise is even worse for that!

Kane0

2022-03-04, 12:01 AM

Would this thread be helpful?
https://forums.giantitp.com/showthread.php?637936-Halloween-Tactical-Challenge

PhoenixPhyre

2022-03-04, 12:41 AM

The problem is that you can't run enough repeated trials to get a real feel for how these classes actually perform. Running through a mock encounter 1 time tells you nearly nothing. Running through it 4-5 tells you a bit more but still not enough for the performance to overcome the randomness.

The 2nd issue is that there is no representative adventuring structure for the party or the adventuring day. So, whatever you test against doesn't reveal much more than whether a particular character in a particular party, played using certain tactics, performed well against a specific adventuring day. Change the creatures encountered, number of encounters, number of rests, the party makeup, really nearly anything, and you can walk away with drastically different results.

You mentioned whiteroom being bad because the assumptions made drive the outcome, this kind of an exercise is even worse for that!

The whole point of this is that you don't do single encounters. You do, at minimum, days, consisting of random numbers of varying encounters, where most of the setup isn't decided at build time. And then you do so multiple times, varying the parameters (at least re-rolling the encounters and types and numbers and things).

Actual campaign running is best, but I'd say running (artificial) days is better than running individual encounters is better than any white room theory based on statistical measures. Because you can't really take averages; the distribution isn't large enough for that, since the target numbers aren't fixed.

Basically, the whole point of this is to partially relax the oppressive assumptions you have to make to do those spreadsheets and get something that, while it's not really real, is a heck of a lot closer.

You could do the same thing by running random dungeons. But that's lower density, so it'd take more effort for the same number of encounters. And a lot more setup and decision making.

Edit: so as an example, a single test run at level 1 might look like (using randomization):
Length = 4 encounters.
Short rest frequency = 4. So no short rests that day.
Encounters (to be run in random order):
1. Deadly. Horde. Straight up, close range. Obstructed. 7 goblins (totals adjusted to stay at difficulty). 900 total (adjusted) XP.
2. Easy. Horde. Surrounded, close range. Chokepoint in players' favor. 9 hyenas. 225 adjusted XP.
3. Medium. Elite Squad. Surrounded, close range. Underwater. 2 lizardfolk. 300 adjusted XP.
4. Medium. Elite Squad. Straight up, far. Hostile terrain (rockfalls). 2 black bears. 300 adjusted XP.
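
For anyone checking the adjusted XP figures, here is a sketch of the arithmetic. It assumes the DMG encounter multipliers for a party of three to five characters and the standard per-monster XP values (goblin 50, hyena 10, lizardfolk 100, black bear 100). Under those assumptions the hyena, lizardfolk, and bear lines reproduce the totals listed, while seven goblins come to 875 rather than the 900 quoted, so that figure presumably reflects the post's own adjustment.

```python
# XP per monster, assumed from the standard stat blocks
MONSTER_XP = {
    "goblin": 50,       # CR 1/4
    "hyena": 10,        # CR 0
    "lizardfolk": 100,  # CR 1/2
    "black bear": 100,  # CR 1/2
}


def encounter_multiplier(count):
    """DMG multiplier by number of monsters (party of 3 to 5 assumed)."""
    if count <= 1:
        return 1.0
    if count == 2:
        return 1.5
    if count <= 6:
        return 2.0
    if count <= 10:
        return 2.5
    if count <= 14:
        return 3.0
    return 4.0


def adjusted_xp(monster, count):
    """Raw XP for the group times the multiplier for its size."""
    return MONSTER_XP[monster] * count * encounter_multiplier(count)


if __name__ == "__main__":
    for monster, count in [("goblin", 7), ("hyena", 9),
                           ("lizardfolk", 2), ("black bear", 2)]:
        print(count, monster, adjusted_xp(monster, count))
```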

PhantomSoul

2022-03-04, 09:25 AM

Would this thread be helpful?
https://forums.giantitp.com/showthread.php?637936-Halloween-Tactical-Challenge

I was expecting this:

https://www.dndcombat.com/ as discussed in https://forums.giantitp.com/showthread.php?635817-I-ve-created-a-free-5e-D-amp-D-combat-simulator-website

I think days over encounters is good to test where the better balance points are for characters/classes. But it also means you effectively need even more data to make reliable conclusions (which is where a simulator could be handy). Everything's a white room when it isn't currently at your table, but if you can play with the parameters you can figure out what sorts of factors have what sorts of effects. (Sure, some is intuitive and details about which spell is cast when or how you target foes as a group are going to make a big difference, but in an ideal world those could also eventually be tested.)

Frogreaver

2022-03-04, 09:56 AM

The whole point of this is that you don't do single encounters. You do, at minimum, days, consisting of random numbers of varying encounters, where most of the setup isn't decided at build time. And then you do so multiple times, varying the parameters (at least re-rolling the encounters and types and numbers and things).

Actual campaign running is best, but I'd say running (artificial) days is better than running individual encounters is better than any white room theory based on statistical measures. Because you can't really take averages; the distribution isn't large enough for that, since the target numbers aren't fixed.

Basically, the whole point of this is to partially relax the oppressive assumptions you have to make to do those spreadsheets and get something that, while it's not really real, is a heck of a lot closer.

You could do the same thing by running random dungeons. But that's lower density, so it'd take more effort for the same number of encounters. And a lot more setup and decision making.

Edit: so as an example, a single test run at level 1 might look like (using randomization):
Length = 4 encounters.
Short rest frequency = 4. So no short rests that day.
Encounters (to be run in random order):
1. Deadly. Horde. Straight up, close range. Obstructed. 7 goblins (totals adjusted to stay at difficulty). 900 total (adjusted) XP.
2. Easy. Horde. Surrounded, close range. Chokepoint in players' favor. 9 hyenas. 225 adjusted XP.
3. Medium. Elite Squad. Surrounded, close range. Underwater. 2 lizardfolk. 300 adjusted XP.
4. Medium. Elite Squad. Straight up, far. Hostile terrain (rockfalls). 2 black bears. 300 adjusted XP.

It’s even worse that it’s randomized encounters. I could run the same party through 1 trial and have great results. I could turn around on trial 2 and have terrible results because of the specific encounters given.

Ultimately the parameters you use to set up the exercise decide the outcome. In that respect it’s no different than a white room. It also has additional issues around doing enough trials to overcome the randomness of the d20.

KorvinStarmast

2022-03-04, 10:09 AM

I'm considering the following setup, please tell me if I'm missing anything:

A test party consists of 4 basic rules[1] characters + the subject of the test (SUT), each of level X (chosen per test). That is:
* Human Champion Fighter (SnB)
* Hill Dwarf Life Cleric
* High Elf Evocation Wizard
* Subject of the test, with whatever modification is necessary. Basic rules races/classes aren't assumed for this one.

Using point buy with no feats or multiclassing.
Higher levels (7+) can assume that weapon users have a magic weapon (but no presumption of a +X).
Other than that, use the high magic starting at higher levels guidelines.
For the purpose of the test, you can buy magic items at the median of the Xanathar's prices out of your starting gold, but each character acts individually (so no pooling the gold on the test character).
My thoughts:
1. The test party consists of four characters: 3 Basic Rules characters + the subject of the test. I think that's what you meant, based on the Champion, Cleric, Wizard +SUT.
2. Wordsmithing suggestion: Subject of the Test is either ST or SotT, not sure why SUT was used.
3. Magic Items: I recommend providing a constrained/curated list, and decree that no legendary items and no artifacts are included. The variability of the power of items, on the uncommon list alone, could skew your results to the point of invalidating your play test. Things like a ring of spell storing being an obvious example of "nope" based on what your test seems to want to achieve. That curated / vetted list needs to be one of the bounds of your test scheme.
4. I like the variability of rest lengths.
5. I will suggest that the "Adjusted XP budget for the day" value be used as the encounter budget so that you have a consistency for the day, or, if that's not enough to stress the test party, DMG listed XP budget times {X} (1.5, 1.2 whatever). While it's a general guide for informing daily resource limitations, I have found it to be useful as a framework for building daily encounters.
6. I have also found the difference between 4 and 5 PCs to be enough to agree that 4 is the correct number for your test.

PhoenixPhyre

2022-03-04, 10:37 AM

It’s even worse that it’s randomized encounters. I could run the same party through 1 trial and have great results. I could turn around on trial 2 and have terrible results because of the specific encounters given.

Ultimately the parameters you use to set up the exercise decide the outcome. In that respect it’s no different than a white room. It also has additional issues around doing enough trials to overcome the randomness of the d20.

And that fact (that there's strong variance in capabilities) is exactly the point. This is not a DPR calculation exercise (which is all the simulations give you). DPR is an utterly useless fact, especially when it's calculated by hitting an infinite-health training dummy that never moves or fights back. DPR is not the important thing in the game.

And you can't just assume that the noise from the d20 will average out--a real campaign (where anything is designed to actually be used) will have random encounters (or at least varying encounters). And won't have enough encounters vs a target of the same AC and other parameters to reach statistical clarity. So those simulations? Don't actually provide any value except digital ego measurement online (look how big the numbers on my build are!). I'm saying this with experience--I've run those simulations for my various homebrew. And how they performed in real campaigns was quite different, exactly because of all the other factors. Including noise. Noise is important. The simulators give a false precision that is highly misleading.

In computer programming terms, this is fuzz testing. See what happens when you throw a huge range of variance at them. And not just generate a single summary statistic, but keep track of their performance. So this build does really well against <type & conditions> but sucks it up against <other type and conditions>. That's valuable data. And doesn't really depend on rolls. Or if it does, then that's important information as well! A build/class that is highly luck-based is a very different thing than one that's consistent. And the simulations can't say anything about that, because they ignore that and just take averages. Low probability events happen all the darn time in a real campaign. Including the time where I rolled nat 1s 5-6 consecutive times. Ie the entire combat. That monster? Did nothing the entire combat. Certainly not what the simulators would say. But that was the only appearance of that monster in the campaign, so no large numbers to balance it out. And even if it had appeared again, it'd have been a different entity in a different fight. Simulators act like you can just sum up all those into one big pool. But they're separate distributions that you can't just merge.
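
One way to keep the per-condition picture instead of a single summary number is to log every encounter and group the results afterwards. This is only a sketch: the `score` field stands for whatever rating the tester decides to record, and all names are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class EncounterResult:
    enc_type: str   # e.g. "Hordes"
    start: str      # e.g. "Surrounded"
    terrain: str    # e.g. "Chokepoint"
    score: float    # hypothetical tester-assigned rating for the SUT


def summarize(results, key=lambda r: (r.enc_type, r.start)):
    """Group results by `key` and report count, mean, spread, and extremes.

    The spread and min/max are kept on purpose: a swingy build and a
    consistent one can share the same mean.
    """
    buckets = defaultdict(list)
    for result in results:
        buckets[key(result)].append(result.score)
    return {
        bucket: {
            "n": len(scores),
            "mean": mean(scores),
            "spread": pstdev(scores),
            "min": min(scores),
            "max": max(scores),
        }
        for bucket, scores in buckets.items()
    }
```

Swapping the `key` function (for example, grouping by terrain only) gives a different slice of the same log without re-running anything.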

Now of course if you have a smart enough simulation that it can actually track round by round in a real environment, it could be used to dramatically speed this up. But the varying parameters (exploring the parameter space) is critical to actually measuring anything of worth.

My thoughts:
1. The test party consists of four characters: 3 Basic Rules characters + the subject of the test. I think that's what you meant, based on the Champion, Cleric, Wizard +SUT.
2. Wordsmithing suggestion: Subject of the Test is either ST or SotT, not sure why SUT was used.
3. Magic Items: I recommend providing a constrained/curated list, and decree that no legendary items and no artifacts are included. The variability of the power of items, on the uncommon list alone, could skew your results to the point of invalidating your play test. Things like a ring of spell storing being an obvious example of "nope" based on what your test seems to want to achieve. That curated / vetted list needs to be one of the bounds of your test scheme.
4. I like the variability of rest lengths.
5. I will suggest that the "Adjusted XP budget for the day" value be used as the encounter budget so that you have a consistency for the day, or, if that's not enough to stress the test party, DMG listed XP budget times {X} (1.5, 1.2 whatever). While it's a general guide for informing daily resource limitations, I have found it to be useful as a framework for building daily encounters.
6. I have also found the difference between 4 and 5 PCs to be enough to agree that 4 is the correct number for your test.

1. I mainly just forgot the poor rogue. He must have succeeded on his Dexterity (Stealth) check against my passive perception. I intended 5 (so you could always compare to something).
2. Sorry, switched terminology (doing too much unit testing). SUT is Subject Under Test, the "standard" term for the piece of the code you're testing.
3. There's just not enough gold in the parameters to buy legendary items or artifacts.

20,000 gp plus 1d10 × 250 gp, three uncommon magic items, two rare items, one very rare item, normal starting equipment

But I get the point.

5. Personally I've found it of little use. But that's one way to generate the difficulties. Which, to note, aren't determined randomly. That's an input of the test, to be chosen at test time and manually varied.

6. There's variance, but comparability is also valuable. You could do either one.

KorvinStarmast

2022-03-04, 10:42 AM

1. I mainly just forgot the poor rogue. He must have succeeded on his Dexterity (Stealth) check against my passive perception. I intended 5 (so you could always compare to something).
Then use daily budget times 1.3 or something like that.

2. Sorry, switched terminology (doing too much unit testing). SUT is Subject Under Test, the "standard" term for the piece of the code you're testing.
Got it.

3. There's just not enough gold in the parameters to buy legendary items or artifacts.
OK, then where's the ceiling on test parties? Level 15 or 16? (8th level spells but no 9ths).
You might want to add that as a constraint to bound the testing scheme (how big is my sandbox? is a question we asked a few times (has it really been 15 years ago?) as we put various software packages into the "how to break it" phase of development).

5. Personally I've found it of little use. But that's one way to generate the difficulties. Which, to note, aren't determined randomly. That's an input of the test, to be chosen at test time and manually varied.
Understood, but with 5 I'd add a multiple as a default. 1.3 is a SWAG, 1.5 might be a better choice.

PhoenixPhyre

2022-03-04, 10:51 AM

Then use daily budget times 1.3 or something like that.
OK, then where's the ceiling on test parties? Level 15? You might want to add that as a constraint to bound the test.
Understood, but with 5 I'd add a multiple as a default. 1.3 is a SWAG, 1.5 might be a better choice.

Even a level 20 party can't buy a legendary. And artifacts don't have Xanathar's prices. 21.375 kgp (the average for a T4 character by the guidelines) doesn't even buy you a very rare (Xanathar's average is 35,000 gp). And magic items are per person, with the prohibition on pooling cash. Are there particular items that need scrubbing? Sure. But this isn't designed to be used "in hostile terrain" (ie by people trying to prove stuff). It's a tool for designers to use to judge their own creations. And intentionally screwing the test, while it can give information (ie this build gets way stronger if given <tool X>), is not generally something I'd assume a designer would do. So it's not proofed against malicious intent.
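
The gold figure can be checked with a couple of lines, using the "20,000 gp plus 1d10 × 250 gp" starting wealth quoted earlier in the thread and the 35,000 gp very rare price cited above. The function and constant names are made up for the sketch.

```python
from fractions import Fraction

VERY_RARE_PRICE = 35_000  # gp, the average price cited in the post


def starting_gold_stats(base=20_000, die=10, per_pip=250):
    """Return (minimum, mean, maximum) for base + 1d<die> x per_pip gp."""
    lowest = base + 1 * per_pip
    highest = base + die * per_pip
    # The mean of 1d10 is (1 + 10) / 2 = 5.5; Fraction keeps it exact
    average = base + Fraction(die + 1, 2) * per_pip
    return lowest, average, highest


if __name__ == "__main__":
    lowest, average, highest = starting_gold_stats()
    print(lowest, average, highest)
    print("best roll affords a very rare:", highest >= VERY_RARE_PRICE)
```

Even the best roll on the d10 stays well short of the cited price, which is the point being made about per-character budgets.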

As for budget, I'm not constraining how someone using this decides the difficulty. If they want to distribute it based on the encounter budget (x some multiple)? That's tester prerogative. You'd learn different things with different settings. So I'd figure each person using it would pick their own baseline similar to what the baseline they use for their campaigns is.

As a note, I don't use xp budgeting at all as a DM. I don't actually calculate encounter difficulties or budgets. I just put things on the map until it feels right for the narrative so far and the location. :smallwink: Doing a hard fight earlier doesn't make the next ones any easier.

KorvinStarmast

2022-03-04, 11:06 AM

Even a level 20 party can't buy a legendary.
Copy all. So rares and below.

As a note, I don't use xp budgeting at all as a DM. I don't actually calculate encounter difficulties or budgets. I just put things on the map until it feels right for the narrative so far and the location. :smallwink: Doing a hard fight earlier doesn't make the next ones any easier.
Nor does it stop the average dragon rider from charging in, regardless. :smalltongue:

I used it a lot when I was first DMing this edition; backwards planning when I wasn't running a published module (which was most of my early DMing). I had no intuition on encounter "feel" so I used that crutch. I now have developed enough systems 'feel' to use it a lot less often. (The German term Fingerspitzengefühl is what I mean by 'feel' here; it's a term that my first piano teacher drilled into my head).

PhoenixPhyre

2022-03-04, 11:34 AM

Copy all. So rares and below.
Nor does it stop the average dragon rider from charging in, regardless. :smalltongue:

Nope. No plan survives first contact with the enemy. Although for constants like that, maybe the plan should include that as a given =).

I used it a lot when I was first DMing this edition; backwards planning when I wasn't running a published module (which was most of my early DMing). I had no intuition on encounter "feel" so I used that crutch. I now have developed enough systems 'feel' to use it a lot less often. (The German term Fingerspitzengefühl is what I mean by 'feel' here; it's a term that my first piano teacher drilled into my head).

I did as well, but it always let me down. And things never go quite how I plan (see above, except you're not my enemies), so I just wing it. I've actually run a number of fights without pre-writing stat blocks for certain enemies. I knew what their (narrative) capabilities and rough parameters were, but just translated that into mechanics on the fly.

CR and adventuring day budgets were designed, according to Mike Mearls, as a crutch for people who don't have the sense for their particular table. Not as hard-and-fast rules or even guidelines. They're helps designed to be thrown away as you dial in your party's needs. Which is a refreshing take. Encounter balance is not a science. It's an art with lots of subjective components. Some tables like the default set to "you're gonna die". Others want cinematic battles with very little actual risk, just a lot of high-flying flashy stuff. And then there are other people who want different things.

It's another reason I'm in favor of subjective measures of balance like "does this feel right" over purely objective things like DPR. DPR has always felt to me[1] like the lamppost problem[2]. Sure, it's easy to calculate and compare. But it's not actually what we want to know, most of the time.

[1] except in strongly varying things. If the DPR of one build is an order of magnitude greater than another of similar niche, there's something wrong with one or the other. But those rarely happen outside cases of pushing the limits of optimization (and anti-optimization).
[2] The drunk looking under the lamppost for his keys because it's brighter there.

zlefin

2022-03-04, 06:08 PM

For what actual uses is this aiming? While the fuzz testing idea is good; it seems like the variability is high enough that you'd need a computer actually running things to get even moderately useful tests out of this due to the high variability in each potential run of a day.

Has anyone adapted any of the old 3.x era testing systems? Some of those might work well if suitably modified for changes in creature cr and the cr system. With some digging I located one of them: the same game test.
https://dungeons.fandom.com/wiki/Dungeons_and_Dragons_Wiki:The_Same_Game_Test

DPR is far from perfect; but it remains a useful benchmarking system. Much like you describe, it's most helpful for people who haven't got a good feel for the system yet. It's also more relevant for those homebrewing rather than for play or balance discussions.

Is there any good way to incorporate non-combat encounters into your system?

PhoenixPhyre

2022-03-04, 06:50 PM

For what actual uses is this aiming? While the fuzz testing idea is good; it seems like the variability is high enough that you'd need a computer actually running things to get even moderately useful tests out of this due to the high variability in each potential run of a day.

Only if you're focusing solely on quantifiable data. Which is only a tiny fraction of what makes a class work. Which is also why DPR fails as a metric--it only matters for huge differences. Because the assumptions made in calculating it swallow any use it has.

DPR calculations other than the most basic suffer from the lamp-post[1] issue. And generally make assumptions about the whole day that are far from realistic.

The point of this is to get a feel for how something plays. For testing things like "do these mechanics work well together? Did I miss something (like bonus action contention or just fiddliness or awkwardness)?" It's not a benchmarking tool primarily, except in the "big red flags" form (ie if the class is underperforming a robotic regular human SnB Champion, something's wrong).

Has anyone adapted any of the old 3.x era testing systems? Some of those might work well if suitably modified for changes in creature cr and the cr system. With some digging I located one of them: the same game test.
https://dungeons.fandom.com/wiki/Dungeons_and_Dragons_Wiki:The_Same_Game_Test

DPR is far from perfect; but it remains a useful benchmarking system. Much like you describe, it's most helpful for people who haven't got a good feel for the system yet. It's also more relevant for those homebrewing rather than for play or balance discussions.

Any build that fails the same-game test (adapted) is unsuitable for play entirely and obviously. And requires significant anti-optimization or strong differential optimization. The gaps are way narrower in 5e (yes, even for full casters vs regular martials).

And white-room DPR actually doesn't mean anything of note for anything but internet arguments. Because it's measuring things to a fraction of a unit precision when the real noise (including statistics but mostly variance in encounters and adventuring days) is in the tens of units. It's false precision. Unless the DPR is off by a factor of 2 or more, it's within the noise.

Is there any good way to incorporate non-combat encounters into your system?

Not right now. But those would likely be handled separately, because the possible variance there is enormous and highly DM dependent. And (at least for me), most non-combat encounters are much more about the player and the character's connection to the world (background, backstory, personality, etc) than the character's mechanical properties.

[1] in this case, using a metric that's tractable to compute rather than one that actually measures any part of the system of real interest. Just because you can measure it doesn't mean it means anything.

Frogreaver

2022-03-04, 10:04 PM

And white-room DPR actually doesn't mean anything of note for anything but internet arguments. Because it's measuring things to a fraction of a unit precision when the real noise (including statistics but mostly variance in encounters and adventuring days) is in the tens of units. It's false precision. Unless the DPR is off by a factor of 2 or more, it's within the noise.

Single target DPR for ranged characters has minimal variance (especially if they have some way of still attacking normally when an enemy gets to melee ranges).

Melee often has a lot of variance though.
AOE has a huge variance.

KorvinStarmast

2022-03-04, 10:20 PM

Nope. No plan survives first contact with the enemy.
Footnote required, Murphy's Laws of Combat. :smallwink:

I did as well, but it always let me down.
It was a tool, not a Hard Truth.

They're helps designed to be thrown away as you dial in your party's needs.
Yes, dialable difficulty is a good thing.
{snip DPR digression}
As I am not a "DPR à outrance" advocate, not sure why that's coming my way.
My bard (Dil) is IMO a classic example of "DPR isn't the only measure of effectiveness" - I hope you agree, since you got to see her operate from levels 1 to 20.
(And if not, why not?)

Frogreaver

2022-03-04, 11:48 PM

For what actual uses is this aiming? While the fuzz testing idea is good; it seems like the variability is high enough that you'd need a computer actually running things to get even moderately useful tests out of this due to the high variability in each potential run of a day.

That's what I've been trying to say.

DPR is far from perfect; but it remains a useful benchmarking system. Much like you describe, it's most helpful for people who haven't got a good feel for the system yet. It's also more relevant for those homebrewing rather than for play or balance discussions.

DPR isn't perfect. But it's also not as imperfect as its detractors try to make it out - especially as a single comparison point for martials.

Magikeeper

2022-03-05, 12:56 AM

I DO NOT trust those spreadsheets to actually mean anything in actual play. Seriously--the assumptions made dominate the result entirely, starting with the "infinite-health training dummy in a blank room" nature of them. And this kind of play-testing is much more about things other than DPR (which honestly rarely matters much). It's about how the abilities play together in a range of circumstances.

If you look into how the calculations were formed, they're good for getting the gist of what kind of at-will damage is going on. The components matter as well - a class with a 2d10 weapon attack will be viewed suspiciously even if the class, taken as a whole, would prove fine in practice. Most DMs are not going to do anything as extensive as what you're suggesting (or rather virtually none of them will do that) - their first impression is going to be formed via somewhat flawed metrics like DPR. Also how the abilities are worded - always use standard phrasing if you can*.

What is the #1 goal of homebrew, at least homebrew that isn't being created for someone's personal game only? To make it into games and be enjoyable. Making it into games is why you check stuff like DPR calculations. It doesn't matter all that much at all in play, I'll agree, but it can matter a whole awful lot when you're trying to get DMs to not knee-jerk declare something OP dredge - or avoid a player getting envious. I'd say those calculations are more useful for homebrewers than optimizers.

Homebrew has to be careful about how far it strays from norms - anything unusual about it should be an important aspect of what the homebrew is setting out to do. If dealing unusually high/low DPR amounts isn't the homebrew's thing, then try to stay within the bounds. Sure, there are cases where someone might decide to ignore the norms for some reason or another, but they should still be familiar with the norms before they decide to go outside of them.

*speaking of which I need to do another phrasing pass on the Ozodrin, replace all the ' with ft/feet...