
View Full Version : Averages are a wasted argument - sample size matters



Schwann145
2022-03-23, 01:37 PM
Anyone else ever consider this?
A handful of attacks will never be a large enough sample size to find reliable averages, so why do so many D&D arguments rely on them?

Catullus64
2022-03-23, 01:44 PM
Because people really like to lend their arguments an air of indisputable objectivity, and citing averages can do that for the uncritical reader. But to do so, they almost always have to assume conditions and adjustments that no longer resemble real play.

D&D is a field science, not a laboratory one. (Rhetorical allegory - it's not a science at all.)

MoiMagnus
2022-03-23, 01:48 PM
Anyone else ever consider this?
A handful of attacks will never be a large enough sample size to find reliable averages, so why do so many D&D arguments rely on them?

Because there is a high enough sample of players for the average to be reliable at the level of the community.
Because otherwise there is not much to say on the optimisation side. Sure, theory and practice might differ because of sample size, but they're already night and day because of the GM's playstyle.
Because even on small samples, the average is a good first approximation before getting into more detail about the variance and so on.

OvisCaedo
2022-03-23, 01:57 PM
When it comes to multiple dice, probability also tends to bell-curve strongly around the average. 2d6 has an average of 7, but you'll still only actually get a result of 7 one time in six. Extend the range slightly around the average, though, and the chance of getting a 6, 7, or 8 jumps to about 44% (16 of the 36 outcomes).

This probably is not anywhere near as cleanly true when it comes to calculating for whether attack rolls will hit or miss, but there's not really anything more helpful to talk about on the numbers side than the average.
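For what it's worth, the exact 2d6 numbers are easy to check by enumerating all 36 outcomes (a quick Python sketch, not anything from this thread):

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely outcomes of 2d6.
totals = [a + b for a, b in product(range(1, 7), repeat=2)]

p_exactly_7 = Fraction(totals.count(7), len(totals))                     # 6/36 = 1/6
p_6_to_8 = Fraction(sum(1 for t in totals if 6 <= t <= 8), len(totals))  # 16/36

print(p_exactly_7, float(p_6_to_8))  # 1/6 and ~0.444
```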

Tanarii
2022-03-23, 02:03 PM
A sample size of 40 has a 95% margin of error of roughly +/- 15 percentage points.

For example, 2 attacks per round over a full adventuring day of 20 rounds of combat.

So if you know your average hit chance is 65%, your realized hit rate over such a day could plausibly land anywhere from 50% to 80%.

Of course, hit rate not actually being a fixed 65% is a whole different matter :)

Definitely true that variance is something often not considered when discussing optimization.
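The +/- 15% figure matches the standard normal-approximation formula for a proportion (a sketch; the 65% hit chance and n = 40 are the numbers from the post above):

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% normal-approximation margin of error for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# 2 attacks/round over a 20-round adventuring day = 40 attacks.
moe = margin_of_error(0.65, 40)
print(round(moe, 3))  # ~0.148, i.e. roughly +/- 15 percentage points
```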

Pildion
2022-03-23, 02:07 PM
Anyone else ever consider this?
A handful of attacks will never be a large enough sample size to find reliable averages, so why do so many D&D arguments rely on them?

That's the main thing about "white room theorycrafting": it's never that way in real game play, haha. A lot of people just like to crunch numbers, and the averages themselves are not wrong. You just can't normally get your optimal situation in real game play, so the numbers people come up with often look "wrong" when playing out a game.

Asisreo1
2022-03-23, 02:07 PM
Anyone else ever consider this?
A handful of attacks will never be a large enough sample size to find reliable averages, so why do so many D&D arguments rely on them?
Yeah, it's a fair and accurate criticism. However, when comparing similar occurrences with similar variances, they're still worthwhile to compare.

Fairly simple, but knowing how much better 3d8 is than 3d6 can be important when choosing between those types of options.

I do believe variance, covariance, and standard deviations are important measures as well.
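Those measures are all exactly computable for small dice pools by brute-force enumeration (a quick sketch using the 3d6-vs-3d8 comparison above):

```python
import statistics
from itertools import product

def pool_stats(n_dice, sides):
    """Exact mean and standard deviation of the sum of n_dice dice with `sides` faces."""
    totals = [sum(r) for r in product(range(1, sides + 1), repeat=n_dice)]
    return statistics.mean(totals), statistics.pstdev(totals)

print(pool_stats(3, 6))  # mean 10.5, sd ~2.96
print(pool_stats(3, 8))  # mean 13.5, sd ~3.97
```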

Segev
2022-03-23, 02:26 PM
I suppose a valid response question is, "what should we use instead?"

I doubt you're going to make an argument that a d4 weapon is better than a d8 weapon "in some cases" based solely on the fact that somebody might roll a 3 or 4 on every attack in a given session, while the other might roll nothing above a 3 on his d8 in that same session, just because they each only hit 4 times.

Frogreaver
2022-03-23, 02:27 PM
Anyone else ever consider this?
A handful of attacks will never be a large enough sample size to find reliable averages, so why do so many D&D arguments rely on them?

That's only true if we were trying to back into the average from trial data. We aren't typically doing that; we are typically deriving the average mathematically from the known distribution.

What this gives us is a very good metric for damage competence that can be compared across classes and builds. And while variance might be a consideration, it affects them all. There is also extremely limited control you as a player have over random variance.

Variance is one reason I say that a +1 Str mod bonus will rarely have a visible impact on outcomes in practice, even though it's a decent amount of additional individual DPR.

To me, the more important consideration than variance is party capabilities and how yours factor into the 'full party' picture.

KorvinStarmast
2022-03-23, 02:34 PM
Anyone else ever consider this?
A handful of attacks will never be a large enough sample size to find reliable averages, so why do so many D&D arguments rely on them?

Because one can pretend to have proved a point.
Definitely true that variance is something often not considered when discussing optimization.
I have brought this up on multiple occasions at GiTP during the first few years, and not been paid attention to. (Kryx/Krix comes to mind as one of the number crunchers who did not come across as accepting that point in a discussion we had; sunk cost, perhaps, given the spreadsheets he used and had posted for others to examine/peruse.)

But averages are not, I don't think, a completely 'wasted argument', since showing the mean/average of various die rolls has some usefulness for casual discussion.
It's dying in a ditch over them in the quest for how to optimize DPR that is such a waste of bandwidth.

Sigreid
2022-03-23, 02:34 PM
In those discussions, that's basically the best people have to work with.

PhoenixPhyre
2022-03-23, 02:46 PM
In those discussions, that's basically the best people have to work with.

And thus we get the origins of the lamppost issue. It's the problem with metrics--you measure what you can, but just because you can measure it doesn't mean it's actually a meaningful parameter.

And in the case of DPR, the assumptions made uniformly swallow the results. Sure, there are extreme outliers where it's useful, but as long as you're within a very broad range, it's all down to the assumptions made. And even then, doing more than looking at rough magnitudes is asking more of the data than it can provide--comparing 100 dpr to 10 dpr is probably meaningful under those assumptions, but comparing 12 to 15 isn't.

There's just a lot of noise in the system, and most of the calculations made are full of unwarranted precision.

On a separate, but related note--a cooking site I frequent has standardized on a "dip and sweep" style of measuring bulk goods (such as flour and sugar), despite that being pretty low accuracy (packing matters). Why? Because that's what people in their audience actually do, and trying to be more precise than that just causes recipes to fail or wouldn't matter at all because of all the other variables in the system. They feel they're better off providing robust recipes where the differences are less important. They've said that if they were going for professional chefs, they'd write and develop completely different recipes from the get-go.

False precision is a real big issue in many things. It's basically always just advertising/marketing copy, rarely actually meaningful. Shades of the cell phone spec wars of the 2000s....

Unoriginal
2022-03-23, 02:47 PM
Representing what adventurers go through in an actual adventure and how they perform via mathematics is both ridiculously hard and amazingly futile, so people don't do it.

Of course that means that one can't actually declare that X class is better than Y or that Z outcome is a "foregone conclusion", but that gets ignored because again, the actual calculation to prove X, Y or Z is ridiculously hard and amazingly futile.

Sigreid
2022-03-23, 02:50 PM
And thus we get the origins of the lamppost issue. It's the problem with metrics--you measure what you can, but just because you can measure it doesn't mean it's actually a meaningful parameter.

And in the case of DPR, the assumptions made uniformly swallow the results. Sure, there are extreme outliers where it's useful, but as long as you're within a very broad range, it's all down to the assumptions made. And even then, doing more than looking at rough magnitudes is asking more of the data than it can provide--comparing 100 dpr to 10 dpr is probably meaningful under those assumptions, but comparing 12 to 15 isn't.

There's just a lot of noise in the system, and most of the calculations made are full of unwarranted precision.

On a separate, but related note--a cooking site I frequent has standardized on a "dip and sweep" style of measuring bulk goods (such as flour and sugar), despite that being pretty low accuracy (packing matters). Why? Because that's what people in their audience actually do, and trying to be more precise than that just causes recipes to fail or wouldn't matter at all because of all the other variables in the system. They feel they're better off providing robust recipes where the differences are less important. They've said that if they were going for professional chefs, they'd write and develop completely different recipes from the get-go.

False precision is a real big issue in many things. It's basically always just advertising/marketing copy, rarely actually meaningful. Shades of the cell phone spec wars of the 2000s....

In project management, we constantly have to sort through lots of facts and try to figure out which ones matter and which don't. Sometimes we're wrong.

But there's a reason for the phrase "lies, damned lies, and statistics". Statistics can be used to mislead, not even intentionally: if you pick the wrong criteria, the wrong sample size, or the wrong sample, you may not get the true estimate you're looking for. You can go an entire session rolling unusually high or low, for example.

Willie the Duck
2022-03-23, 02:53 PM
That's the main thing about "white room theorycrafting": it's never that way in real game play, haha. A lot of people just like to crunch numbers, and the averages themselves are not wrong. You just can't normally get your optimal situation in real game play, so the numbers people come up with often look "wrong" when playing out a game.

That, however, is the difference between in-play experience and theoretical expectations, not the difference between average and variance.


Anyways, sample size does matter, but play experience isn't just a handful of attacks -- it is multiple attacks per round, per side of the conflict, per fight (or skill check, etc.), per game day. Dice get rolled a lot, and thus results regress toward the mean (and thus the mean is, well, meaningful). So if your character is fighting with a longsword instead of a shortsword, it can be meaningful. If you are going into the Lost Mines of ConstantSkeletonEnemies and they all have longswords instead of shortswords, yeah, it will get meaningful fairly quickly.

It will never, of course, be a pure model, and as Pildion alludes to, the compounding factors that come into play can absolutely dwarf (heh) any contribution an average might have, especially for anything less frequent than a primary-weapon damage roll. Think of overkill, or of ending the day with 15 hp instead of 13 not actually mattering because you will heal back to full regardless, or of the 36-average cone of cold pinging off the more-often-resisted Con save while the 28-average fireball lands.

PhoenixPhyre
2022-03-23, 02:58 PM
In project management we all the time have to sort through lots of facts and try to figure out which ones matter and which don't. Sometimes we're wrong.

But there's a reason for the phrase "Lies, damned lies and statistics". Statistics can be used to mislead, not even intentionally. If you pick the wrong criteria, the wrong sample size, the wrong sample; you may not get the true estimation you're looking for. You can go an entire session rolling unusually high or low, for example.

I'm coming from a teaching background, where we have to boil student performance into numerical metrics for grading. And before we get there, we have to conceptualize exactly what we're trying to measure. Then write tests to try to capture that performance. Except we can't actually measure what we want (because we don't have direct access to the students' consciousnesses/memory), so we have to deal with proxies. And usually proxies of proxies. And debate whether what we're measuring is actually a decent proxy for what we want to measure.

My conclusion was basically that you can get whatever you want out of such an exercise. The test-writer's inputs are much more important than the test-taker's inputs in most cases. Standardized tests can work, but only for a tiny slice of anything.

And before that, I was a computational physics PhD student, where the same issues popped up[1], with the same conclusions in many cases: the model is more determinative than the actual system under test. During that period I was also active in MMO theorycrafting (though minorly), with the same conclusion -- the assumptions were swallowing the signal.

So I'm horribly horribly cynical of the value of theorycraft in general.

[1] In my particular field, there were exactly two systems we could solve directly: a hydrogen atom all alone in the universe, and a single H2+ molecule (2 protons, 1 electron) all alone in the universe. Everything else, we had to either approximate the solution or approximate the problem...and usually both. A good fit between theory and experiment was one within a factor of 2, and most of the time the error bars were multiple orders of magnitude in size. Of course, everything was plotted log-log to hide that fact.

Sigreid
2022-03-23, 03:16 PM
I'm coming from a teaching background, where we have to boil student performance into numerical metrics for grading. And before we get there, we have to conceptualize exactly what we're trying to measure. Then write tests to try to capture that performance. Except we can't actually measure what we want (because we don't have direct access to the students' consciousnesses/memory), so we have to deal with proxies. And usually proxies of proxies. And debate whether what we're measuring is actually a decent proxy for what we want to measure.

My conclusion was basically that you can get whatever you want out of such an exercise. The test-writer's inputs are much more important than the test-taker's inputs in most cases. Standardized tests can work, but only for a tiny slice of anything.

And before that, I was a computational physics PhD student, where the same issues popped up[1], with the same conclusions in many cases: the model is more determinative than the actual system under test. During that period I was also active in MMO theorycrafting (though minorly), with the same conclusion -- the assumptions were swallowing the signal.

So I'm horribly horribly cynical of the value of theorycraft in general.

[1] In my particular field, there were exactly two systems we could solve directly. A hydrogen atom all alone in the universe and a single H2+ (2 protons, 1 electron, 2 atoms) molecule all alone in the universe. Everything else, we had to either approximate the solution or approximate the problem...and usually both. And a good fit between theory and experiment was one that was within a factor of 2. And most of the time, the error bars were multiple orders of magnitude in size. Of course everything was plotted log-log to hide that fact.

I can sympathize with you on all of that. It's one of the reasons I roll my eyes whenever someone says "Science says!" That's not the way science works. I've had some classes and have some friends with their PhDs, and I'm well aware that you do the best you can, knowing you're not right, and just hope that your best helps you or someone else do a little better (but still be wrong) a little farther down the line.

sithlordnergal
2022-03-23, 03:18 PM
Anyone else ever consider this?
A handful of attacks will never be a large enough sample size to find reliable averages, so why do so many D&D arguments rely on them?

So, they're used for a few reasons:

1) It creates a fairly easy baseline that lets you compare things more easily, especially when you're dealing with a large number of dice.

2) D&D is built on a dice-based system; knowing the average of a d20 can give you an idea of how likely something is to succeed.

3) While sample size does matter, you can end up with a far larger sample size than you might think. And even small sample sizes can be useful to look at.


A good example of when an average is handy is if you're trying to figure out about how much damage a max-level smite on an undead will do if you crit. At that point you're rolling 12d8 smite damage. Now yes, you do technically have a chance of rolling all 1's or all 8's on 12d8, but that's almost never going to happen. It's far more useful to know that you'll do an average of 54 damage, because then you know that you'll most likely roll somewhere around 54. It won't be exactly 54 points of damage, but usually you're going to deal something close to that.
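The 12d8 distribution doesn't even need to be sampled; it can be computed exactly by convolution, and it clusters tightly around its 54-point mean (an illustrative sketch, not from the post):

```python
from collections import Counter

def sum_distribution(n_dice, sides):
    """Exact probability distribution of the sum of n_dice dice, built by convolution."""
    dist = {0: 1.0}
    for _ in range(n_dice):
        nxt = Counter()
        for total, p in dist.items():
            for face in range(1, sides + 1):
                nxt[total + face] += p / sides
        dist = dict(nxt)
    return dist

dist = sum_distribution(12, 8)
mean = sum(t * p for t, p in dist.items())
p_within_8 = sum(p for t, p in dist.items() if abs(t - 54) <= 8)

print(round(mean, 1))  # 54.0
# p_within_8 comes out around 0.7: most rolls land within 8 points of the mean.
```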

It's also really handy when trying to figure out the chances of something succeeding. If you have a +6 to Wisdom saves, and the average roll of a d20 is 10.5, then you should be able to succeed on a DC 16 Wisdom save about half the time, but if the DC is 20, then you should succeed on the save less than half the time. It's a nice way to do a quick-and-dirty risk assessment. And the same principle can be applied to your own spells' saving throws: if you're going to cast a spell that uses a Dex save against a creature with +3 in Dex, but your DC is only 13, then you can expect the target to succeed on its save more often than not. As such, it's probably best to use a spell that doesn't target Dex.
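That quick-and-dirty save check is a two-line helper (a sketch; it ignores any auto-success/failure rules on natural 1s and 20s):

```python
def save_chance(bonus, dc):
    """Chance that d20 + bonus meets or beats dc (clamped to the d20's range)."""
    successes = max(0, min(20, 21 - (dc - bonus)))
    return successes / 20

print(save_chance(6, 16))  # 0.55 -- about half the time, as above
print(save_chance(6, 20))  # 0.35 -- less than half the time
print(save_chance(3, 13))  # 0.55 -- the Dex-save target succeeds more often than not
```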

stoutstien
2022-03-23, 03:24 PM
They're not all that useful to know as a player, but as a DM they help you set expectations where you want them. Most notably, when dealing with ability checks, it's good to have a rudimentary grasp of the probabilities involved.

ZRN
2022-03-23, 03:30 PM
Anyone else ever consider this?
A handful of attacks will never be a large enough sample size to find reliable averages, so why do so many D&D arguments rely on them?

Because "8d6" sounds like a fireball should do like 50 damage until someone tells you it should do 28.

Basically, people don't always have a good grasp of how probabilities work, so boiling everything down to one number ("DPR") is easier than trying to explain why +2 to hit is better than +2 to damage, or whatever.

PhoenixPhyre
2022-03-23, 03:34 PM
I can sympathize with you on all of that. One of the reasons I roll my eyes whenever someone says "Science says!" That's not the way science works. I've had some classes and have some friends with their PHDs and am well aware that you do the best you can, knowing you're not right and just hoping that the best you can helps you or someone else do a little better (but still be wrong) a little farther down the line.

I had a professor (teaching a computational class) say something like

"The most important part of doing numerical work is never trust your results. And if they're telling you that you have the right answer, trust them even less."

Or, shorter: "All models are wrong; some models are useful." Having strong epistemic humility ("remember, you too could be wrong!") is important, and so often neglected. People become fans of their models, taking criticism personally and applying them everywhere. And in this context, spreadsheets/averages/numbers sometimes aren't the right tools! (Shocking, I know.)

Elder_Basilisk
2022-03-23, 03:45 PM
Other people have pointed out the big misconception here: that we derive averages from actual play. No one does that--we use simple probability theory to calculate averages from known distributions. The results are predictive of trends in play at the table but naturally don't correlate exactly to anyone's actual play experience due to sample size. But that's not a problem: the battle may not always be to the strong, but that's the way to bet. And full plate is still better armor than splint mail even if your character somehow got hit more often in the ten rounds since he bought full plate than he did in the last ten rounds when he wore splint mail.

A more sophisticated--and actually accurate--criticism of averages is that they do not always measure what you want to know and therefore can obscure the significance of key breakpoints and important interactions. For example, in 3.x, average damage per attack calculations tended to predict that "all power attack all the time" was a terrible idea but while probably not optimal, the play style worked better in practice than DPR calculations predicted. There were a number of reasons for that, but most of them were very situational and required much more complex math to calculate. (Often they had to do with factors like increasing the probability of a one-hit kill resulting in additional cleave attacks that were not included in dpr calculations, or the outsized impact that removing a creature in the first round of combat has on the difficulty of a battle).

strangebloke
2022-03-23, 04:06 PM
I suppose a valid response question is, "what should we use instead?"

I doubt you're going to make an argument that a d4 weapon is better than a d8 weapon "in some cases" based solely on the fact that somebody might roll a 3 or 4 on every attack in a given session, while the other might roll nothing above a 3 on his d8 in that same session, just because they each only hit 4 times.

This.

There's something to be said about a build that has a high average but also high variance. A rogue using Sharpshooter, for example, will have pretty good average damage stats under favorable assumptions, but since you won't always be able to use Steady Aim and you only get one attack, the truth is that your damage varies wildly between ~38 and 0 in any given turn. In statistics this is called a bimodal distribution, and it's a well-known problem with summarizing probabilistic data by an average.

But simply saying "variance means your bigger numbers don't matter" is even more wrong. Everyone can have bad luck, but if my 'bad luck' is as good as your 'good luck', then overall my numbers are better, no?
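A quick simulation makes the bimodal point concrete (illustrative numbers only; the ~38-on-hit figure and the 65% hit chance are assumptions, not derived from any particular build):

```python
import random

random.seed(1)

def rogue_turn(hit_chance=0.65, damage_on_hit=38):
    """One all-or-nothing attack per turn: the result is bimodal, either 0 or ~38."""
    return damage_on_hit if random.random() < hit_chance else 0

turns = [rogue_turn() for _ in range(10_000)]
mean = sum(turns) / len(turns)
# mean lands near 0.65 * 38 = 24.7, yet no individual turn ever deals ~24.7.
```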

Kane0
2022-03-23, 04:16 PM
Would you rather we write in ranges instead? It's a little more work to say 2d6+3 is 5-15 damage instead of 10, but AD&D did it that way (often without noting the source, which could make it annoying to reverse-engineer what dice you should be rolling).

For what it's worth, I prefer doing 3 or 5 rounds' worth of attacks when looking at DPR, and sample size is one of the reasons why (another big one is setup time in terms of actions).

kazaryu
2022-03-23, 04:23 PM
Anyone else ever consider this?
A handful of attacks will never be a large enough sample size to find reliable averages, so why do so many D&D arguments rely on them?

Ignoring averages is a wasted argument...

Sure, the guy wielding a shortsword *might* get lucky and happen to do more damage during a combat than the guy with the glaive, because glaive guy kept getting 1's and 2's while shortsword guy never rolled lower than a 4. But on average, over the course of the campaign, glaive dude is gonna do more damage in more fights, because on average his weapon deals more damage (this obviously ignores things like the Dueling fighting style, which can adjust average damage).

You're not looking at a sample of just an individual combat encounter or an individual session. Individual sessions are just the data points in the 'sample' of your entire gaming career (whether that's various one-shots across several groups, or a few long-form campaigns). Averages are used because they tell us what is *likely* to happen. And when the actual results rely on RNG, the way to optimize is to find the options most likely to produce the results you want, most consistently. So we use averages to help with that.

Lunali
2022-03-23, 05:37 PM
Anyone else ever consider this?
A handful of attacks will never be a large enough sample size to find reliable averages, so why do so many D&D arguments rely on them?

Because in theory, there's no difference between theory and practice. Of course in practice, there is a difference, but we deal in theory here.

PhantomSoul
2022-03-23, 05:41 PM
Because in theory, there's no difference between theory and practice. Of course in practice, there is a difference, but we deal in theory here.

It's ok -- with enough in practices, you eventually create in theories!

PhoenixPhyre
2022-03-23, 05:58 PM
Because in theory, there's no difference between theory and practice. Of course in practice, there is a difference [...]

One of my favorite quotes. I usually shorten it to "The difference between theory and practice is that in theory, there is no difference."

Corran
2022-03-23, 06:27 PM
Anyone else ever consider this?
A handful of attacks will never be a large enough sample size to find reliable averages, so why do so many D&D arguments rely on them?
Because TPKs are rare. Even so, averages still don't tell the whole truth, as they don't show the usefulness of being able to vary your damage (e.g. why the battlemaster can still be better than a champion even on days when the champion's whole-day DPR slightly overperforms the battlemaster's). But they still provide you with useful information.

Dark.Revenant
2022-03-23, 11:20 PM
Because it's a pain in the ass to set up a statistically robust Monte Carlo analysis for the sake of forum ****posts. If you're actually trying to balance a piece of content you're selling online, sure, go ahead, but it's overkill otherwise. Simple averages based on a "common AC, common save value", etc. are so easy to compare to each other that they're sufficient for deciding which option is numerically "better" than another, most of the time.

Segev
2022-03-24, 12:34 AM
Because it's a pain in the ass to set up a statistically robust Monte Carlo analysis for the sake of forum ****posts. If you're actually trying to balance a piece of content you're selling online, sure, go ahead, but it's overkill otherwise. Simple averages based on a "common AC, common save value", etc. are so easy to compare to each other that it's sufficient for deciding which option is numerically "better" than another, most of the time.

Even then, "robust Monte Carlo analysis" won't tell you anything the basic math can't. You use Monte Carlo simulations when you've got an unknown distribution or need artificial sample sets to do something more than just compare performance in known circumstances.
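To illustrate: for a single attack against a known AC, the closed-form expectation and a Monte Carlo run converge to the same number, so the simulation buys nothing (the bonuses and AC here are assumed for illustration, with a simplified crit rule of one extra damage die):

```python
import random

random.seed(0)

# +7 to hit vs AC 15: hits on a natural 8+, crits on a natural 20.
# Damage 1d8+3; a crit adds one extra d8 (simplified for illustration).
analytic = (12 / 20) * (4.5 + 3) + (1 / 20) * (2 * 4.5 + 3)   # = 5.1 per attack

def attack():
    roll = random.randint(1, 20)
    if roll == 20:                       # crit: roll the damage die twice
        return random.randint(1, 8) + random.randint(1, 8) + 3
    if roll >= 8:                        # ordinary hit
        return random.randint(1, 8) + 3
    return 0                             # miss

sim = sum(attack() for _ in range(200_000)) / 200_000
# sim converges to `analytic` (5.1); the Monte Carlo run only adds sampling noise.
```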

Chaos Jackal
2022-03-24, 02:52 AM
The only wasted argument here is the one in the OP.

Yes, OK, averages don't describe reality. But they describe a reasonable version of it. It's D&D. It's a dice game, nothing is certain. But you can get a decent idea of its expected outcomes through those averages, and there's not really much else that will do the job any better.

Sample size might matter, but sample sizes in this case are typically arbitrarily big. To say that a longsword is better than a dagger might not be true in every situation, but the implied sample size is every longsword and dagger damage roll you'll make across one or more campaigns, and that sample will usually be in the thousands or even tens of thousands; a good enough size for something so simple and straightforward. What is wasted isn't saying that, on average, a fireball deals more damage than a glaive; it's saying that a glaive might outdamage a fireball because you can roll a 10 on the glaive while you can roll 1s and 2s on the fireball and the target can make its save on top of that. Averages aren't meant to be perfect, they're meant to be indicative.

And really, if we called wasted every idea, approach or argument that is in general reasonable and indicative of something but isn't a 100% simulation of reality, we'd still be in caves or up in trees, and that's not even hyperbole. Just because it's theorycrafting doesn't mean it's pointless. Caveats exist and need to be taken into account, but they don't translate into a wasted argument. Theory has a purpose.

And even the two parts of the premise here are disconnected; sample size playing a role doesn't make the statistic wasted by default.

Again, this is D&D. It's a dice game with relatively straightforward mechanics of dice rolling. Averages describe it pretty well, and certainly better than the majority of other approaches, unless the intention is to describe every single possibility separately, in which case have fun to whoever's doing it, but the rest of the world doesn't have the time for that. Plus, you'd probably reach the same conclusions anyway, so why not just use the average?

So, unless the OP is just a very hyperbolic and poorly described version of "don't expect to roll 7 damage in all your greatsword attacks", which is true but also painfully obvious, I don't see any point being made here.

Willie the Duck
2022-03-24, 08:04 AM
So, unless the OP is just a very hyperbolic and poorly described version of "don't expect to roll 7 damage in all your greatsword attacks", which is true but also painfully obvious, I don't see any point being made here.

For me, the part that doesn't hold is the notion that "so many D&D arguments rely on them." I don't think that's the case. Lots of D&D arguments utilize them, but I don't think they rely upon them. DPR calculations regularly take into account to-hit chance (and crit damage on 20s, etc.). That means the person making the calculation is recognizing that the damage per hit is also gated behind a d20 roll that you can't just average to 10.5 and come back with meaningful results. Or, to stay with the d20, people talk about how "swingy" a d20 roll is and regularly suggest that it should be a 2d10 or 3d6 roll instead -- that's focusing on altering the variance/number of dice rolled/effective sample size rather than the average (which stays at 10.5 or 11 regardless).
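A typical DPR calculation of that sort is just a short expectation formula (a sketch of the usual shape; the nat-1/nat-20 clamping is the standard rule, everything else is an illustrative parameter):

```python
def dpr(attack_bonus, ac, avg_damage, avg_dice_only, attacks=1):
    """Expected damage per round for simple attacks.

    avg_damage: mean damage on a hit (dice + modifiers).
    avg_dice_only: mean of the damage dice alone (the part doubled on a crit).
    """
    p_hit = max(1, min(19, 21 - (ac - attack_bonus))) / 20  # nat 1 misses, nat 20 hits
    p_crit = 1 / 20
    return attacks * (p_hit * avg_damage + p_crit * avg_dice_only)

print(round(dpr(7, 15, 7.5, 4.5, attacks=2), 2))  # 10.2: two 1d8+3 attacks at +7 vs AC 15
```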

Gignere
2022-03-24, 08:24 AM
Unsure what the OP is criticizing. Sure, DPR may not be the be-all and end-all, and yes, variation can exist game to game, but at the end of the day you just want a metric to compare two builds, and average DPR is pretty good at that.

If the OP's suggestion is that we need to go beyond average DPR and take into account the variability of a build's damage, I agree. However, I think most DPR-optimizing builds already take that into account; it's why Elven Accuracy is so highly rated, and why, although rogues don't do a ton of spike or max DPR, they are still seen as pretty competitive and reliable DPR-wise.

There are many threads providing guidelines on when to GWM / SS and when not to, to make damage more reliable.

Frogreaver
2022-03-24, 08:26 AM
If I was going to focus on DPR limitations it would be in the following areas

1. Team DPR over individual DPR (especially when talking relative increases)

2. Overkill. Damage wasted on killing blows can reduce applied DPR by up to about 20%, and the reduction is worse for more damaging attacks and abilities.

3. DPR alone makes for poor round-to-round decision analysis. Oftentimes the best action isn't the highest-DPR one but a lower-DPR action with a useful effect.
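The overkill point is easy to see in a toy simulation: swingier, bigger hits waste a larger share of their raw damage on enemies that are already nearly down (all numbers here are made up for illustration):

```python
import random

random.seed(2)

def applied_fraction(enemy_hp, n_enemies, roll_damage):
    """Fraction of raw damage actually applied when damage past 0 hp is wasted."""
    applied = raw = 0
    for _ in range(n_enemies):
        hp = enemy_hp
        while hp > 0:
            dmg = roll_damage()
            raw += dmg
            applied += min(dmg, hp)
            hp -= dmg
    return applied / raw

big = applied_fraction(15, 2000, lambda: random.randint(1, 12) + random.randint(1, 12))
small = applied_fraction(15, 2000, lambda: random.randint(1, 6) + 4)
# `big` (2d12 hits) wastes a noticeably larger fraction of raw damage than `small` (1d6+4 hits).
```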

Elder_Basilisk
2022-03-24, 09:52 AM
For exploring the limitations of DPR, I would also want to look at:

1. The ability to produce large damage spikes on demand. A slight increase in 3-round DPR (or whatever window you measure) does not capture the true value of something like Action Surge.
2. Ability to leverage situational benefits. For example, assuming the -5/+10 damage option is not consistently advantageous under normal circumstances, a level 4 great weapon master fighter with +7 to hit for 2d6+5 damage on his greatswords doesn't do much more DPR than a sword and board longsword fighter with +7 to hit for 1d8+7. But the GWM guy has the -5/+10 option that provides a lot more leverage in multiple situations: bless, battle master precise strike, bardic inspiration or advantage should produce much bigger DPR shifts with GWM than the longsword guy and the situational bonus action attack can potentially have an exponential effect on the damage output.
3. # of hits to kill and number of kills per round/rounds to kill. For example, a 25% chance to drop a monster each round is often better than a 50% chance to drop the monster at the end of round 2. Yes, the 50% in two rounds is 6.25% more chance to have dropped the monster by the end of round 2. But the 25% chance each round gives you that same 6.25% chance to have dropped two monsters by the end of round 2, and a 25% chance to have dropped the monster in round 1--which means that less damage was coming back.
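The round-2 numbers in point 3 are easy to verify by brute force. A quick sketch (Python, using the made-up 25%/50% figures from the example above):

```python
# Hypothetical profiles from the example above:
# A: 25% chance to drop the monster each round
# B: 50% chance to drop it, but only at the end of round 2
p_a = 0.25
b_by_round_2 = 0.50

# A's chance to have dropped the monster by the end of round 2:
# 1 - P(the monster survives both rounds)
a_by_round_2 = 1 - (1 - p_a) ** 2    # 0.4375

print(b_by_round_2 - a_by_round_2)   # B's edge by end of round 2: 0.0625
print(p_a * p_a)                     # A's chance of TWO kills by round 2: 0.0625
print(p_a)                           # A's chance the monster was already down in round 1: 0.25
```

The 6.25% edge B has in "dropped by end of round 2" is exactly matched by A's chance of a double kill, and A alone has a shot at round-1 removal.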

Segev
2022-03-24, 11:24 AM
For exploring the limitations of DPR, I would also want to look at:

1. Ability to produce large damage spikes on demand. A slight increase in 3-round DPR (or whatever window you measure) does not capture the true value of something like Action Surge.
2. Ability to leverage situational benefits. For example, assuming the -5/+10 damage option is not consistently advantageous under normal circumstances, a level 4 great weapon master fighter with +7 to hit for 2d6+5 damage on his greatswords doesn't do much more DPR than a sword and board longsword fighter with +7 to hit for 1d8+7. But the GWM guy has the -5/+10 option that provides a lot more leverage in multiple situations: bless, battle master precise strike, bardic inspiration or advantage should produce much bigger DPR shifts with GWM than the longsword guy and the situational bonus action attack can potentially have an exponential effect on the damage output.
3. # of hits to kill and number of kills per round/rounds to kill. For example, a 25% chance to drop a monster each round is often better than a 50% chance to drop the monster at the end of round 2. Yes, the 50% in two rounds is 6.25% more chance to have dropped the monster by the end of round 2. But the 25% chance each round gives you that same 6.25% chance to have dropped two monsters by the end of round 2, and a 25% chance to have dropped the monster in round 1--which means that less damage was coming back.

Thanks for this post! Examination of how variance actually makes a difference in a tangible, round-by-round way is very helpful.

By this analysis, you seem to be suggesting that higher-variance for the same average is always superior in terms of damage. Is this an accurate statement of your intent? If not, what is a more accurate one?

If there are cases where this analysis would still say more consistent damage is better, could you outline some of them, please?

Gignere
2022-03-24, 11:38 AM
Thanks for this post! Examination of how variance actually makes a difference in a tangible, round-by-round way is very helpful.

By this analysis, you seem to be suggesting that higher-variance for the same average is always superior in terms of damage. Is this an accurate statement of your intent? If not, what is a more accurate one?

If there are cases where this analysis would still say more consistent damage is better, could you outline some of them, please?

I would say if you can kill something without -5/+10 damage you would opt for more consistency than damage.

If you are aiming to disrupt enemies’ concentration more consistency would be superior to variability, especially if ending the effect now can save a lot of damage or remove cc from your team mates.

Attacking high AC targets you’d likely want more consistency, especially if you have any rider effects on your attacks.

Xetheral
2022-03-24, 11:46 AM
I suppose a valid response question is, "what should we use instead?"

Having a whole curve available (rather than compressing the calculation to an average) can be useful for on-the-fly decision-making. For example, knowing your probability of doing at least X damage with an attack action against a range of ACs becomes increasingly useful as a fight goes on (and you start being able to put increasingly accurate constraints on the enemy's AC and remaining HP) for figuring out the relative value of taking the attack action.
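To make that concrete, here's a minimal sketch of computing one attack's full damage curve. The build numbers (+7 to hit, 2d6+4, vs. AC 15) are purely illustrative:

```python
from itertools import product

def attack_damage_pmf(to_hit=7, ac=15, dice=(6, 6), flat=4):
    """Full probability mass function for one attack's damage.
    Illustrative only: no advantage/disadvantage; nat 1 always
    misses, nat 20 always hits and doubles the damage dice."""
    pmf = {}
    for d20 in range(1, 21):
        n_dice = dice * 2 if d20 == 20 else dice
        hit = d20 == 20 or (d20 != 1 and d20 + to_hit >= ac)
        rolls = list(product(*[range(1, d + 1) for d in n_dice])) if hit else [()]
        for r in rolls:
            dmg = sum(r) + flat if hit else 0
            pmf[dmg] = pmf.get(dmg, 0) + (1 / 20) / len(rolls)
    return pmf

pmf = attack_damage_pmf()
# P(damage >= 10): useful when you've deduced the target has ~10 HP left
p_at_least_10 = sum(p for dmg, p in pmf.items() if dmg >= 10)
print(round(p_at_least_10, 3))  # 0.483
```

From the same pmf you can read off P(damage >= X) for any X, which is exactly the kind of constraint-driven question a single DPR number can't answer.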

Elder_Basilisk
2022-03-24, 12:03 PM
Thanks for this post! Examination of how variance actually makes a difference in a tangible, round-by-round way is very helpful.

By this analysis, you seem to be suggesting that higher-variance for the same average is always superior in terms of damage. Is this an accurate statement of your intent? If not, what is a more accurate one?

If there are cases where this analysis would still say more consistent damage is better, could you outline some of them, please?

I'm not really saying that high variance damage is generally better than low variance damage. There are different ways average damage could be high variance. You could have the source of high variance on the attack roll (+2 to hit for 2d6+15 damage) or the damage roll (+7 to hit for 1d20+2 damage), and in some editions, extra damage on a critical hit could contribute meaningfully to average damage calcs (I guess 5e champion fighters get that; maybe there are other ways).

I'm not really a fan of really wide damage ranges like the d20 or damage that is dependent on the rare event. (I actually think that extra crit damage tends to make average damage calculations look better than those options can be expected to perform at the table). What I am confident in saying is that high damage on a hit is often better than average damage calculations would indicate because:

A. There are a lot of common situational ways to improve chance to hit. At least at low level, that's a lot easier than boosting damage. The guy with high base damage but low hit percentage can be helped out a lot by advantage, bless, bardic inspiration etc. The guy whose average damage comes from a high hit percentage has a lot less upside and might even start suffering from diminishing returns. If the pirate has 12 AC, you're already likely to hit. Going from 95% to 99.75% because of advantage doesn't make much difference.
B. There are a number of positive effects that come from the ability to one shot opponents or otherwise kill in fewer hits. Quickly eliminating opponents means less incoming damage, more room to maneuver, and can trigger abilities like the GWM bonus action attack.

As far as cases where low variance damage is better--one that comes up off the top of my head is when the monster you're attacking only has a few HP left. If the merrow has 2 HP left, you would much rather have the bard attack at +8 + 1d4 (bless) for 1d8+6 than the GWM fighter (we'll give him a +1 sword to even things out) at +3 + 1d4 (bless) for 2d6+16. If you could have something like an old edition magic missile: 1d4+1 damage but a guaranteed hit, that's even better. In games like the old D&D minis game, where opponent HP are a known factor, it is easier to recognize and take advantage of this, but the situation does come up. Another situation where low variance (but high hit percentage) damage might be better is when you're trying to disrupt an enemy's concentration or otherwise care more about a rider effect than the total damage. You want to stop the spell and you want to maximize your chance of stopping the spell, not maximize the damage output.
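Point A is easy to put numbers on. A small sketch (Python; the base hit chances are arbitrary):

```python
# With advantage you roll twice and hit unless BOTH d20s miss:
#   p_adv = 1 - (1 - p)^2
for p in (0.35, 0.50, 0.65, 0.95):
    adv = 1 - (1 - p) ** 2
    print(f"base {p:.0%} -> with advantage {adv:.2%} (gain {adv - p:+.2%})")
```

The absolute gain peaks at a 50% base hit chance and shrivels near 95%, which is the diminishing-returns effect described above: the guy who already hits the 12 AC pirate gets almost nothing out of advantage.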

PhoenixPhyre
2022-03-24, 12:03 PM
I'd say that the real issue isn't taking the average, it's the assumptions that go into any of the calculations that use the averages.

Even if you have large enough numbers to reduce the variance (arguendo), if your assumptions don't actually reflect play, then your metrics will be useless. Or worse, actively misleading.

This was a big thing in the MMO community last time I was active there, where a lot of the theorycraft assume(s|d) 100% uptime and an infinite health training dummy. Which leads to very very different results compared to actual bosses. In the FFXIV community, there's a key difference between rDPS (your own individual direct contribution) and aDPS (which is rDPS + the attributed effect of your party buffs). And that's just DPS, which is one of the more tractable things to measure. And in an MMO, such things are way easier to measure and calibrate because the fights are fixed entities that you can repeat over and over.

TTRPGs don't have that latter property. And there's a lot more possible variance in play style and monsters than in an MMO, plus lots of abilities that aren't easily broken into numbers. Yet basically all the spreadsheets I've seen assume training dummy-style "big bag of HP that just stands there" fights in white rooms. Sure, it includes miss chance (in an average sort of way), but doesn't include most of the actual things that play a major role even just in combat.

The assumptions swallow the results. Sure, you can calculate it and say "based on this metric, X is better than Y at Z"...but that's just like the smartphone spec wars, where none of those numbers actually mattered for user experience. Which is the only thing that really matters.

Evaar
2022-03-24, 12:12 PM
Anyone else ever consider this?
A handful of attacks will never be a large enough sample size to find reliable averages, so why do so many D&D arguments rely on them?

I don’t see how this is different from claiming statistics as a concept are useless because any given individual result may not follow the statistical average.

While the latter point is true, the former does not follow from it.

Dark.Revenant
2022-03-24, 12:29 PM
I'd say that the real issue isn't taking the average, it's the assumptions that go into any of the calculations that use the averages.

Even if you have large enough numbers to reduce the variance (arguendo), if your assumptions don't actually reflect play, then your metrics will be useless. Or worse, actively misleading.

This was a big thing in the MMO community last time I was active there, where a lot of the theorycraft assume(s|d) 100% uptime and an infinite health training dummy. Which leads to very very different results compared to actual bosses. In the FFXIV community, there's a key difference between rDPS (your own individual direct contribution) and aDPS (which is rDPS + the attributed effect of your party buffs). And that's just DPS, which is one of the more tractable things to measure. And in an MMO, such things are way easier to measure and calibrate because the fights are fixed entities that you can repeat over and over.

TTRPGs don't have that latter property. And there's a lot more possible variance in play style and monsters than in an MMO, plus lots of abilities that aren't easily broken into numbers. Yet basically all the spreadsheets I've seen assume training dummy-style "big bag of HP that just stands there" fights in white rooms. Sure, it includes miss chance (in an average sort of way), but doesn't include most of the actual things that play a major role even just in combat.

The assumptions swallow the results. Sure, you can calculate it and say "based on this metric, X is better than Y at Z"...but that's just like the smartphone spec wars, where none of those numbers actually mattered for user experience. Which is the only thing that really matters.

Again, we don't have better data to use. No one is going to write a full-grid auto-combat sim for D&D just to settle forum arguments. Doing so in a manner that would pass muster in a peer-reviewed journal would be thousands of hours of effort. So we use spreadsheet-level data instead, because otherwise we'd have no data to work with at all, and no meaningful arguments could be made.

Sure, that means you can't make claims like "Cone of Cold is objectively better than Fireball", but no one is saying that.

PhoenixPhyre
2022-03-24, 12:40 PM
Again, we don't have better data to use. No one is going to write a full-grid auto-combat sim for D&D just to settle forum arguments. Doing so in a manner that would pass muster in a peer-reviewed journal would be thousands of hours of effort. So we use spreadsheet-level data instead, because otherwise we'd have no data to work with at all, and no meaningful arguments could be made.

Sure, that means you can't make claims like "Cone of Cold is objectively better than Fireball", but no one is saying that.

"Better than nothing" is a high bar to clear. Adding bad data doesn't make for good arguments.

Sure, it's brighter under the lamp-post, but looking for your keys there won't do you much good unless you really did lose your keys there. Arguments from bad assumptions are actively worse than no data or very simple arguments (that don't require math at all) because they are fundamentally lies.

There are lots of arguments that don't rely on supposedly objective data. Data from actual experience. Stylistic arguments. It's only the "X is better and you should feel bad if you do Y" arguments that require this veneer of objectivity.

PhantomSoul
2022-03-24, 12:43 PM
Again, we don't have better data to use. No one is going to write a full-grid auto-combat sim for D&D just to settle forum arguments. Doing so in a manner that would pass muster in a peer-reviewed journal would be thousands of hours of effort.

But maybe one will eventually exist!
https://www.dndcombat.com//

However, for most things, it's not really needed in practice. For most questions, anydice or just running the numbers (it's really not too bad; the data generated don't end up that massive usually) is as good or better because either you don't know every parameter (so you need to set them) or you do know the parameters (in which case you set them).


So we use spreadsheet-level data instead, because otherwise we'd have no data to work with at all, and no meaningful arguments could be made.

And quite honestly, spreadsheet or full-distribution information is pretty great! It's somewhat unfortunate that it gets collapsed into one value for DPR in so many cases (instead of minimally DPR and something like a standard deviation or a range), but for individual attacks that's often plenty because things are normally normally (heh) distributed... well, approximately (crits being the main exception) barring active abilities (e.g. choosing to do -5/+10 giving the option of a different distribution). But even that... is effectively just picking which distribution to go from, and both distributions are pretty normal (so average and standard deviation describe them pretty well). In a rare case (none come to mind offhand but they probably exist?) where a single distribution is clearly and strongly bimodal, then it would seem quite problematic to use averages and even standard deviations... but that isn't such an issue overall.

(This ends up sounding quite a bit like I think I'm disagreeing with you since you're quoted, but rather it's expanding/adding!)
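(And as a sketch of the "picking which distribution" framing -- all numbers invented for illustration: +7 to hit, 2d6+4, vs. AC 16, crits treated as ordinary hits to keep it short:)

```python
from itertools import product

def damage_mean_std(to_hit, ac, flat, dice=(6, 6)):
    """Exact mean/std dev of one attack's damage by enumerating
    d20 x damage dice. Illustrative: nat 1 misses, no crit bonus."""
    vals = []
    for d20 in range(1, 21):
        hit = d20 != 1 and (d20 == 20 or d20 + to_hit >= ac)
        for roll in product(*[range(1, d + 1) for d in dice]):
            vals.append(sum(roll) + flat if hit else 0)
    m = sum(vals) / len(vals)
    sd = (sum((v - m) ** 2 for v in vals) / len(vals)) ** 0.5
    return m, sd

normal = damage_mean_std(to_hit=7, ac=16, flat=4)   # ~ (6.60, 5.70)
power = damage_mean_std(to_hit=2, ac=16, flat=14)   # -5/+10 mode: ~ (7.35, 10.12)
print(normal, power)
```

Each mode is reasonably well described by its (mean, spread) pair -- the -5/+10 toggle is just choosing which pair you want this turn.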


EDIT for new post:


"Better than nothing" is a high bar to clear. Adding bad data doesn't make for good arguments.

Sure, it's brighter under the lamp-post, but looking for your keys there won't do you much good unless you really did lose your keys there. Arguments from bad assumptions are actively worse than no data or very simple arguments (that don't require math at all) because they are fundamentally lies.

There are lots of arguments that don't rely on supposedly objective data. Data from actual experience. Stylistic arguments. It's only the "X is better and you should feel bad if you do Y" arguments that require this veneer of objectivity.

It's a good thing the data aren't completely terrible then (usually)! Imperfect for any given combat, but still usually informative. That's why I think it's REALLY important that assumptions and parameters be stated; you can tweak numbers to argue either side pretty easily (e.g. by tossing in advantage or disadvantage, by imagining a different target HP, by changing target AC or resistances, etc.), so those things are important to specify so people can interpret and use the information well.

Bad data can definitely be worse than no data -- and good data misapplied could even be worse than both.

Frogreaver
2022-03-24, 12:50 PM
I'd say that the real issue isn't taking the average, it's the assumptions that go into any of the calculations that use the averages.

Even if you have large enough numbers to reduce the variance (arguendo), if your assumptions don't actually reflect play, then your metrics will be useless. Or worse, actively misleading.

This was a big thing in the MMO community last time I was active there, where a lot of the theorycraft assume(s|d) 100% uptime and an infinite health training dummy. Which leads to very very different results compared to actual bosses. In the FFXIV community, there's a key difference between rDPS (your own individual direct contribution) and aDPS (which is rDPS + the attributed effect of your party buffs). And that's just DPS, which is one of the more tractable things to measure. And in an MMO, such things are way easier to measure and calibrate because the fights are fixed entities that you can repeat over and over.

TTRPGs don't have that latter property. And there's a lot more possible variance in play style and monsters than in an MMO, plus lots of abilities that aren't easily broken into numbers. Yet basically all the spreadsheets I've seen assume training dummy-style "big bag of HP that just stands there" fights in white rooms. Sure, it includes miss chance (in an average sort of way), but doesn't include most of the actual things that play a major role even just in combat.

The assumptions swallow the results. Sure, you can calculate it and say "based on this metric, X is better than Y at Z"...but that's just like the smartphone spec wars, where none of those numbers actually mattered for user experience. Which is the only thing that really matters.

But you are assuming that DPR is intended to measure actual in game damage performance regardless of external factors.

DPR is mostly there to measure your average damage contribution assuming you are able to attack something every turn. (There can be some variations).

PhoenixPhyre
2022-03-24, 12:53 PM
But you are assuming that DPR is intended to measure actual in game damage performance regardless of external factors.

DPR is mostly there to measure your average damage contribution assuming you are able to attack something every turn. (There can be some variations).

But it's used to say "X is OP, nerf X" or "Y is UP, buff Y" or "You should never X" or "You should always Y." All things that strip all the nuance out of it. And that's inevitable--providing numbers gives a veneer of objectivity and "truth" to your arguments. One that's unwarranted in 99.9999999999% of all the arguments on these forums.

The caveats matter. And the caveats get lost instantly.

And in any real play scenario, the noise swallows the signal. Because the "signal" requires a bunch of very specific, very unrealistic assumptions.

You can measure anything, but only some measurements are actually meaningful. And most of the measurements provided are actively misleading. As in worse than nothing, because they pretend to objectivity and meaning when they're bad at either one.

Asisreo1
2022-03-24, 01:08 PM
Well, we could playtest our ideas. What's funny is that people will say that there will always be notable differences in class effectiveness in combat, but doing a playtest is "unpredictable" and "not indicative of actual play" even though it's the closest thing to actual play that isn't actual play. And really, depending on how you playtest, it may as well be actual play.

Say a playtest shows that the difference between 38 and 48 HP wasn't significant. While we can't say that it will always be insignificant, we now know that there are cases where it won't make a difference.

After that, it's a matter of finding the frequency.

PhoenixPhyre
2022-03-24, 01:17 PM
Well, we could playtest our ideas. What's funny is that people will say that there will always be notable differences in class effectiveness in combat, but doing a playtest is "unpredictable" and "not indicative of actual play" even though it's the closest thing to actual play that isn't actual play. And really, depending on how you playtest, it may as well be actual play.

Say a playtest shows that the difference between 38 and 48 HP wasn't significant. While we can't say that it will always be insignificant, we now know that there are cases where it won't make a difference.

After that, it's a matter of finding the frequency.

I agree.

For me, the priority goes

Live playtest > "synthetic" play tests (ie mock battles, mock scenarios played solo) >>>>>>>> white-room theory (which includes all the spreadsheets).

I don't include simple bounds/sanity checks in the white-room theory, because those are useful throughout the creative process. Those sanity checks include things like "if your proposed 3rd level spell does 20d6 damage to your choice of targets in a 20' radius, it's probably not ok" or "if this 1st level spell does 90% of what another strong 1st level spell does, plus a bunch of other things, you might want to re-think it" or "if you only have Extra Attack 1, you need another source of damage other than a bare weapon attack to be competitive". But those only catch outliers and really don't require much in the way of assumptions.

Unoriginal
2022-03-24, 01:30 PM
Sure, that means you can't make claims like "Cone of Cold is objectively better than Fireball", but no one is saying that.

Say that to the hundreds of "martials are objectively worse than casters" threads.

Frogreaver
2022-03-24, 01:38 PM
But it's used to say "X is OP, nerf X" or "Y is UP, buff Y" or "You should never X" or "You should always Y." All things that strip all the nuance out of it. And that's inevitable--providing numbers gives a veneer of objectivity and "truth" to your arguments. One that's unwarranted in 99.9999999999% of all the arguments on these forums.

The caveats matter. And the caveats get lost instantly.

And in any real play scenario, the noise swallows the signal. Because the "signal" requires a bunch of very specific, very unrealistic assumptions.

You can measure anything, but only some measurements are actually meaningful. And most of the measurements provided are actively misleading. As in worse than nothing, because they pretend to objectivity and meaning when they're bad at either one.

Generally speaking, DPR is a great metric for talking about class strength. Especially between classes with fairly similar ranges and mobilities. It’s certainly not the only metric though.

The assumptions made around DPR do affect the final outcome somewhat, but they are fairly directional. For example: looking at 15 vs 17 target AC will lead to different DPR values for everyone, but as long as character 1 is attacking the same AC as character 2, who does better damage in real terms is probably not going to change based on which AC you calculate against. The higher-DPR character vs 15 AC is still expected to be the higher-DPR character vs 17 AC, and the higher-DPR character of the two is still expected to do more real DPR, provided the characters have similar attack ranges and mobility. It takes a really contrived situation for this to not be the case.
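As a sketch of that directionality (two hypothetical attackers; crits ignored for brevity):

```python
def dpr(to_hit, avg_dmg, ac, attacks):
    # Chance to hit, clamped to the 5%-95% window for nat 1 / nat 20
    p = min(0.95, max(0.05, (21 + to_hit - ac) / 20))
    return attacks * p * avg_dmg

for ac in (15, 17):
    a = dpr(to_hit=7, avg_dmg=11.0, ac=ac, attacks=2)  # e.g. two attacks at 2d6+4
    b = dpr(to_hit=5, avg_dmg=16.5, ac=ac, attacks=1)  # e.g. one attack at 3d6+6
    print(f"AC {ac}: A = {a:.2f}, B = {b:.2f}")
```

Both values drop as AC rises, but the ordering doesn't flip; you'd need something far more contrived than a 2-point AC shift to change who's ahead.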


Well, we could playtest our ideas. What's funny is that people will say that there will always be notable differences in class effectiveness in combat, but doing a playtest is "unpredictable" and "not indicative of actual play" even though it's the closest thing to actual play that isn't actual play. And really, depending on how you playtest, it may as well be actual play.

Say a playtest shows that the difference between 38 and 48 HP wasn't significant. While we can't say that it will always be insignificant, we now know that there are cases where it won't make a difference.

After that, it's a matter of finding the frequency.

Play tests without sufficient trials and specific behavioral assumptions don’t really tell much. And even then they only describe the scenario under very specific assumptions.

Play testing on the small scale is fun. But mostly useless as the noise there really does drown out the signal.

PhoenixPhyre
2022-03-24, 01:41 PM
Generally speaking, DPR is a great metric for talking about class strength. Especially between classes with fairly similar ranges and mobilities. It’s certainly not the only metric though.

The assumptions made around DPR do affect the final outcome somewhat, but they are fairly directional. For example: looking at 15 vs 17 target AC will lead to different DPR values for everyone, but as long as character 1 is attacking the same AC as character 2, who does better damage in real terms is probably not going to change based on which AC you calculate against. The higher-DPR character vs 15 AC is still expected to be the higher-DPR character vs 17 AC, and the higher-DPR character of the two is still expected to do more real DPR, provided the characters have similar attack ranges and mobility. It takes a really contrived situation for this to not be the case.

Except that that's not the assumption that dominates the result, and the rest of them smuggle in lots of bias.

* How many short rests?
* How many rounds of combat per combat?
* Do targets have infinite health (ie is overkill a thing)?
* Is there any target switching going on?
* How many targets? How are they arranged?
* Are there any priorities other than hitting this dummy for the biggest numbers?
* Etc.

All of these skew the applicability of the results (let alone the numbers themselves) by way more than any reasonable change to AC.

So no, I'd say that DPR isn't a useful tool for analyzing anything about a class except in extreme outlier cases (ie > 100% difference). Because the analyst's thumbs are inherently all over the scale.

Effectively, white-room DPR is a spherical cow in a vacuum approximation. And as a former physicist, one of the things you learn very quickly is that reality has very few spherical cows in a vacuum. Assumptions matter. Just because you can calculate something doesn't make it valuable or meaningful. And I've yet to see anyone defend reasonably analyzing things down to 0.01 DPR (which is a common thing) or even single-digit integer DPR differences. Yet those are held out as meaningful differences, making something "non-viable".

Frogreaver
2022-03-24, 01:49 PM
Except that that's not the assumption that dominates the result, and the rest of them smuggle in lots of bias.

* How many short rests?
* How many rounds of combat per combat?
* Do targets have infinite health (ie is overkill a thing)?
* Is there any target switching going on?
* How many targets? How are they arranged?
* Are there any priorities other than hitting this dummy for the biggest numbers?
* Etc.

All of these skew the applicability of the results (let alone the numbers themselves) by way more than any reasonable change to AC.

So no, I'd say that DPR isn't a useful tool for analyzing anything about a class except in extreme outlier cases (ie > 100% difference). Because the analyst's thumbs are inherently all over the scale.

Effectively, white-room DPR is a spherical cow in a vacuum approximation. And as a former physicist, one of the things you learn very quickly is that reality has very few spherical cows in a vacuum. Assumptions matter. Just because you can calculate something doesn't make it valuable or meaningful. And I've yet to see anyone defend reasonably analyzing things down to 0.01 DPR (which is a common thing) or even single-digit integer DPR differences. Yet those are held out as meaningful differences, making something "non-viable".

Nearly all of those considerations are directional. The higher-DPR character with similar range and mobility will nearly always remain the higher-DPR character in all but extreme outlier scenarios, as long as the characters' DPRs are more than 20-30% apart.

PhantomSoul
2022-03-24, 01:50 PM
Except that that's not the assumption that dominates the result, and the rest of them smuggle in lots of bias.

* How many short rests?
* How many rounds of combat per combat?
* Do targets have infinite health (ie is overkill a thing)?
* Is there any target switching going on?
* How many targets? How are they arranged?
* Are there any priorities other than hitting this dummy for the biggest numbers?
* Etc.

All of these skew the applicability of the results (let alone the numbers themselves) by way more than any reasonable change to AC.

So no, I'd say that DPR isn't a useful tool for analyzing anything about a class except in extreme outlier cases (ie > 100% difference). Because the analyst's thumbs are inherently all over the scale.

Effectively, white-room DPR is a spherical cow in a vacuum approximation. And as a former physicist, one of the things you learn very quickly is that reality has very few spherical cows in a vacuum. Assumptions matter. Just because you can calculate something doesn't make it valuable or meaningful. And I've yet to see anyone defend reasonably analyzing things down to 0.01 DPR (which is a common thing) or even single-digit integer DPR differences. Yet those are held out as meaningful differences, making something "non-viable".


Seems like an argument that clear assumptions and more numbers/information are needed, rather than abandoning numbers altogether! (As well as not treating the numbers as being the only thing that matters, obviously!) E.g.

* How many short rests? --> Short rest vs. long rest dependency, useful thing to highlight
* How many rounds of combat per combat? --> Sustained vs. nova (and potentially time-specific bursts, like always and only getting a boost on the first turn)
* Do targets have infinite health (ie is overkill a thing)? --> Again benefits distinguishing sustained vs. nova and/or the ability to control damage if overkill comes at a cost
* Is there any target switching going on? --> Giving damage/attack or damage/instance would be useful (and it's a known point of variation that people tend to highlight as being an inter-class difference based on the number of attacks, plus something that repeatedly comes up when talking about PAM or TWF)
* How many targets? How are they arranged? --> Ability to AoE and how effective that is, mobility, etc.
* Are there any priorities other than hitting this dummy for the biggest numbers? --> Also a thing people bring up as mitigating repeatedly (e.g. when talking about Battlemasters and Monks), but sure, hard to quantify the effects of

Amechra
2022-03-24, 01:51 PM
Honestly, the annoying thing about a lot of threads that are basically just "let's calculate DPR!" is that they tend to do their calculations against a single arbitrarily-chosen AC value.

Elder_Basilisk
2022-03-24, 02:00 PM
Honestly, the annoying thing about a lot of threads that are basically just "let's calculate DPR!" is that they tend to do their calculations against a single arbitrarily-chosen AC value.

And that could readily be improved by leveling up Excel skills and plotting a DPR vs AC graph.
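You don't even need Excel; a few lines of Python will print the same curve (the build here is a placeholder: +7 to hit, two attacks averaging 11 damage each, crits ignored):

```python
def dpr(to_hit, avg_dmg, ac, attacks=2):
    # Chance to hit, clamped to 5%-95% for nat 1 / nat 20
    p = min(0.95, max(0.05, (21 + to_hit - ac) / 20))
    return attacks * p * avg_dmg

# DPR across the whole AC range instead of one arbitrary value
for ac in range(12, 21):
    print(f"AC {ac}: DPR {dpr(7, 11.0, ac):5.2f}")
```

Feed the same loop several builds and you get the DPR-vs-AC comparison instead of a single cherry-picked AC.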

PhantomSoul
2022-03-24, 02:02 PM
And that could readily be improved by leveling up Excel skills and plotting a DPR vs AC graph.

And bonus upgrade in Excel or not-Excel: plotting it so you get distribution information by AC!

Frogreaver
2022-03-24, 02:03 PM
Seems like an argument that clear assumptions and more numbers/information are needed, rather than abandoning numbers altogether! (As well as not treating the numbers as being the only thing that matters, obviously!) E.g.

* How many short rests? --> Short rest vs. long rest dependency, useful thing to highlight
* How many rounds of combat per combat? --> Sustained vs. nova (and potentially time-specific bursts, like always and only getting a boost on the first turn)
* Do targets have infinite health (ie is overkill a thing)? --> Again benefits distinguishing sustained vs. nova and/or the ability to control damage if overkill comes at a cost
* Is there any target switching going on? --> Giving damage/attack or damage/instance would be useful (and it's a known point of variation that people tend to highlight as being an inter-class difference based on the number of attacks, plus something that repeatedly comes up when talking about PAM or TWF)
* How many targets? How are they arranged? --> Ability to AoE and how effective that is, mobility, etc.
* Are there any priorities other than hitting this dummy for the biggest numbers? --> Also a thing people bring up as mitigating repeatedly (e.g. when talking about Battlemasters and Monks), but sure, hard to quantify the effects of

Key Metrics

I started a thread recently on this topic. Almost all of this can be addressed by adding some additional damage metrics to the conversation, to further highlight the differences that straight DPR comparisons hide.

A good way to look at this might be: at-will DPR, total damage from short-rest abilities, total damage from long-rest abilities, non-prebuffed nova damage, and prebuffed nova damage.

This allows assumptions to be minimized while highlighting performance differences across different day structures.

Keravath
2022-03-24, 02:06 PM
Anyone else ever consider this?
A handful of attacks will never be a large enough sample size to find reliable averages, so why do so many D&D arguments rely on them?

... because averages do matter if you are trying to figure out what is a better option. It really is that simple.

Should you roll 4d6 or 3d8? Which is better? One is 4-24, the other is 3-24. Of course, when you actually roll those dice the numbers could be all over the place, BUT they will be drawn from a probability distribution based on the dice rolled: 4d6 averages 14 while 3d8 averages 13.5. You are equally likely to roll above or below those values, so the average gives you an idea of what to expect. It doesn't tell you what will happen when you roll the dice, but it does tell you what is more likely to happen.

However, the other aspect that you should consider is standard deviation. If you roll a d100, the average will be 50.5 but the distribution is flat: you have as much chance of rolling a 1 as a 100. If you roll 10d10, the average is 55 with a range of 10-100. That's a slightly higher average, but the odds of rolling a 10 or a 100 are 1/10^10 each ... never going to happen (and if it does, you should have bought a lottery ticket) :)

So if you need to roll only a high number then roll fewer dice since the odds of getting the large number will be greater. If you are looking for the best overall performance then look at the average. In the case of 10d10 - the distribution of results is VERY peaked around the average because you are rolling so many dice. You will never roll really low but you will also never roll really high.

However, most damage rolls involve only one or two dice, so the difference isn't as large. But compare a greatsword (2d6) to a greataxe (1d12): the greatsword has a higher average at 7 vs 6.5, but the greataxe gives much greater variation, since rolling a 1 or a 12 on it each has a 1-in-12 chance, while rolling a 2 or a 12 with a greatsword is 1/36 each. The greataxe rolls both its maximum and minimum damage three times more often than a greatsword, but over time the greatsword does higher and more consistent damage.

In reality, in a single Tier 1 game session you might make 6-12 damage rolls with your weapon, assuming two to three combats that last 3-4 rounds each. Some of those will be high, some low, but the odds of rolling high and low are incorporated into the average values. In addition, it doesn't take that many rolls before the average matters: if you make 12 damage rolls of 2d6 each, the distribution of results will already be approaching the expected one.

One last consideration: damage prevention, as with the Heavy Armor Master feat, can affect these results as well. If a certain threshold of damage is needed to do any damage at all, then the weapon that rolls high values more often might be preferred (though Heavy Armor Master is the only such damage-prevention feature I'm aware of in 5e).
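To make the greatsword vs. greataxe point concrete, here's a quick Monte Carlo sketch (Python; the 100,000-roll count is arbitrary, just enough for the averages to settle):

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so the run is reproducible
N = 100_000

# N damage rolls for each weapon (base dice only, no modifiers)
greatsword = [random.randint(1, 6) + random.randint(1, 6) for _ in range(N)]
greataxe = [random.randint(1, 12) for _ in range(N)]

print(sum(greatsword) / N)  # close to 7.0
print(sum(greataxe) / N)    # close to 6.5

# The greataxe hits its extremes about three times as often (1/12 vs 1/36)
gs, ga = Counter(greatsword), Counter(greataxe)
print(gs[2] / N, gs[12] / N)  # each near 1/36, about 0.028
print(ga[1] / N, ga[12] / N)  # each near 1/12, about 0.083
```

Cut N down to 12 (one session's worth of rolls) and rerun a few times: the per-session averages wobble quite a bit, which is exactly the sample-size point the thread started with.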

Frogreaver
2022-03-24, 02:07 PM
And that could readily be improved by leveling up Excel skills and plotting a DPR vs AC graph.

I usually calculate vs AC 11-20, but post a single DPR value; it's not easy to post 10 different DPR values.
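For what it's worth, the whole AC 11-20 sweep is only a few lines once you encode the 5e attack rules (nat 1 always misses, nat 20 always hits and crits). A sketch in Python; the +5 to hit and 2d6+3 damage below are just example numbers, not any particular build:

```python
def dpr(bonus, avg_dice, mod, ac):
    """Expected damage of one attack vs a given AC."""
    t = min(20, max(2, ac - bonus))    # minimum d20 face that hits
    p_crit = 1 / 20                    # nat 20 always hits and crits
    p_normal = (21 - t) / 20 - p_crit  # hits that aren't nat 20s
    return p_normal * (avg_dice + mod) + p_crit * (2 * avg_dice + mod)

# e.g. +5 to hit with a greatsword (2d6+3), swept across AC 11-20
for ac in range(11, 21):
    print(ac, round(dpr(5, 7, 3, ac), 2))
```

The same function also makes it easy to eyeball where two builds' DPR curves cross, which a single-AC number can't show.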

Dark.Revenant
2022-03-24, 02:08 PM
Playtesting with a small sample size only helps you tune the play experience; otherwise you'll just bias towards your particular testers and the particular situations they were in. Getting play experience right is important, but any balance issue large enough to be obvious in playtest is pretty easy to find through static analysis as well.

PhoenixPhyre
2022-03-24, 02:09 PM
Seems like an argument that clear assumptions and more numbers/information are needed, rather than abandoning numbers altogether! (As well as not treating the numbers as being the only thing that matters, obviously!) E.g.

* How many short rests? --> Short rest vs. long rest dependency, useful thing to highlight
* How many rounds of combat per combat? --> Sustained vs. nova (and potentially time-specific bursts, like always and only getting a boost on the first turn)
* Do targets have infinite health (ie is overkill a thing)? --> Again benefits distinguishing sustained vs. nova and/or the ability to control damage if overkill comes at a cost
* Is there any target switching going on? --> Giving damage/attack or damage/instance would be useful (and it's a known point of variation that people tend to highlight as being an inter-class difference based on the number of attacks, plus something that repeatedly comes up when talking about PAM or TWF)
* How many targets? How are they arranged? --> Ability to AoE and how effective that is, mobility, etc.
* Are there any priorities other than hitting this dummy for the biggest numbers? --> Also a thing people bring up as mitigating repeatedly (e.g. when talking about Battlemasters and Monks), but sure, hard to quantify the effects of

At that point, just playtest. It's less work and more realistic. And it doesn't require a PhD to try to extract the tiny shreds of meaningful data out of the sea of multi-variate numbers. Simple calculations as sanity checks, backed up by actual experience, are much more illuminating. Those DPR numbers are a test that asserts the output of the mock is exactly what you put in, which is about as useful as doing assert.isTrue(true).

Because any choice you make there will just result in quibbling about the assumptions, because it's the assumptions that drive the result. You'll never be able to actually use those numbers to make any unambiguous point, and there will always be people pointing to different numbers to support whatever argument they're making.

Numbers don't actually settle any arguments. They're just verbal weapons. Because we, as a community, aren't playing the same game. Fundamentally, the environments are very different.


Playtesting with a small sample size only helps you tune the play experience; otherwise you'll just bias towards your particular testers and the particular situations they were in. Getting play experience right is important, but any balance issue large enough to be obvious in playtest is pretty easy to find through static analysis as well.

My experience is just the opposite--the balance effects that matter aren't in the math at all, because the math assumes away their source. And numerical balance issues are minor anyway--play feel and jankiness are way more important for actual play. Which is the point of doing any of this. And biasing to your particular testers is what's important because that's your audience. I don't play at your tables. So balance concerns at your tables aren't any business of mine; when I'm homebrewing and considering balance, I'm only doing it for me. Your conditions on the ground are different, so you'll reach different results by nature. Trying to math it out obscures that, producing misleading information. Especially for new people. I'm arguing for epistemic humility.

Forum "numerical" work is the equivalent of string theory. Lots of wasted time and effort playing with math, no application to real life.

Frogreaver
2022-03-24, 02:13 PM
At that point, just playtest. It's less work and more realistic. And it doesn't require a PhD to try to extract the tiny shreds of meaningful data out of the sea of multi-variate numbers. Simple calculations as sanity checks, backed up by actual experience, are much more illuminating. Those DPR numbers are a test that asserts the output of the mock is exactly what you put in, which is about as useful as doing assert.isTrue(true).

Because any choice you make there will just result in quibbling about the assumptions, because it's the assumptions that drive the result. You'll never be able to actually use those numbers to make any unambiguous point, and there will always be people pointing to different numbers to support whatever argument they're making.

Numbers don't actually settle any arguments. They're just verbal weapons. Because we, as a community, aren't playing the same game. Fundamentally, the environments are very different.

Complaint: there’s too much noise in DPR calculations

Solution: play test with a small sample size.

Counterpoint: but there’s more noise in the small play test sample size than there ever was in the DPR calculation

Conclusion: the reason people are crapping on DPR while upholding play tests as a better solution isn’t logical.

strangebloke
2022-03-24, 02:15 PM
Nearly all of those considerations are directional. The higher-DPR character with similar range and mobility will remain the higher-DPR character in all but extreme outlier scenarios, as long as the two characters' DPRs are more than 20-30% apart.

Yeah.

Like you can have a meaningful conversation about monk vs. fighter as a melee character because the monk's mobility means it will get in melee sometimes when the fighter won't.

But on the high end, there's not really any way to dispute that a hexblade just kind of craps on everything else. One character can have 19 AC, great ranged options (among the highest in the game), great melee options (among the highest in the game), and also have cantrips and spells and invocations to spare.

PhoenixPhyre
2022-03-24, 02:15 PM
Complaint: there’s too much noise in DPR calculations

Solution: play test with a small sample size.

Counterpoint: but there’s more noise in the small play test sample size than there ever was in the DPR calculation

Conclusion: the reason people are crapping on DPR while upholding play tests as a better solution isn’t logical.

Play tests tell you different information than DPR calculations. Actually useful information, regardless of sample size. Their outputs aren't the same--you're not playtesting to find DPR. Because DPR is a proxy for something you actually care about, which play tests reveal directly. Proxies are always sub-par if you have access to the actual data you care about.

Conclusion denied.

Edit: And I'll note that the official materials agree with me--the numbers for (for instance) monster creation are presented as a 'first-pass guess' to get you into the ballpark and are intentionally not very precise. Because they go on to say that you need to playtest playtest playtest and adjust CR (and other numbers) to fit that.

Playtest is experiment. DPR numbers are theory. And theory must give way to experiment where they disagree.

PhantomSoul
2022-03-24, 02:16 PM
Complaint: there’s too much noise in DPR calculations

Solution: play test with a small sample size.

Counterpoint: but there’s more noise in the small play test sample size than there ever was in the DPR calculation

Conclusion: the reason people are crapping on DPR while upholding play tests as a better solution isn’t logical.

Plus the playtest is likely to be a bigger ordeal than running some numbers! (Obviously varies by person, but I guarantee no PhD is usually required! xD Plus you can run numbers for a lot of contexts way more easily and reliably than you can do comparable playtests, depending on the parameters you want to test.)

EDIT for new:


Play tests tell you different information than DPR calculations. Actually useful information, regardless of sample size. Their outputs aren't the same--you're not playtesting to find DPR. Because DPR is a proxy for something you actually care about, which play tests reveal directly. Proxies are always sub-par if you have access to the actual data you care about.

Conclusion denied.

If you're calculating DPRs/distributions/ranges, it's because those are something you care about comparing. So, uh, yeah, you apparently do care about it.

PhoenixPhyre
2022-03-24, 02:21 PM
If you're calculating DPRs/distributions/ranges, it's because those are something you care about comparing. So, uh, yeah, you apparently do care about it.

I can calculate the distance to Mars in terms of how long it would take an average fly to fly there. I can do lots of math. Doesn't mean that it actually has any meaning for the underlying system.

Calculate DPR all you want. Doesn't make it actually useful for anything but calculating DPR.

strangebloke
2022-03-24, 02:25 PM
Play tests tell you different information than DPR calculations. Actually useful information, regardless of sample size. Their outputs aren't the same--you're not playtesting to find DPR. Because DPR is a proxy for something you actually care about, which play tests reveal directly. Proxies are always sub-par if you have access to the actual data you care about.

Conclusion denied.

Playtests are dominated by the specific personalities and luck of your table, and they're meaningless outside of those constraints. If I roll 1s constantly, I'll feel my character is weak; if I roll 20s, I'll feel strong. My subjective experience is valid, but if I go online and tell someone "yes well, paladins are trash"... I've no real basis to say that. I'm just sharing some very off-the-cuff impressions.

Playtests can reveal some idiosyncrasies you wouldn't consider in a mathematical analysis. You might realize that you hadn't considered how bad it feels when you miss your sneak attack, or how uncanny dodge would be less good against a big monster with multiattack. But these are realizations that inform analysis.

Now, it's fair to say that you don't need to do analysis. Playing in a carefree manner and just enjoying things is good! But this isn't the same thing as saying "all analysis is inaccurate and founded on lies because there are always more factors to simulate."

Some classes and builds simply have less, and it's okay to say that. 5e is pretty well balanced overall, so this isn't that egregious, but even then there are wide gaps between a champion and a hexblade, and the hexblade comes out ahead in nearly every area.

Asisreo1
2022-03-24, 02:26 PM
Play tests without sufficient trials and specific behavioral assumptions don’t really tell much. And even then they only describe the scenario under very specific assumptions.

Play testing on the small scale is fun. But mostly useless as the noise there really does drown out the signal.
I mean, it tells us what it says. While you're not able to completely predict the future of all potential scenarios using playtests, saying there isn't anything meaningful to learn from even just a single playtest is highly inaccurate. A DM that frequently playtests their own encounters can attest to that.

A hypothesis like "A Ranger underperforms in melee," assuming adequate reasoning, can still be analyzed even with the fuzziness of dice.

For example, suppose a scenario comes up where there's a horde of enemies and the Ranger has a greatsword. You might think enemies wouldn't crowd themselves into a horde just to get Whirlwind Attacked, but not getting surrounded might be just as good for the Ranger: they give up some melee damage, but they get more opportunities to control the area without their concentration being challenged. That would likely be a consistent benefit regardless of whether the horde creatures were goblins, orcs, fiends, or celestials. And it would probably be apparent not by counting damage and attacks landed, but by how the creatures would be forced to act while being moved into optimal positions.

I think a while back I playtested some powerful monsters versus different types of parties in this forum. It was apparent how effective strategies like grapple-silence were against spellcasters, even ones who have abilities outside of only spellcasting. That's something you can't really calculate, but it was still apparent. And you can easily test it out yourself by just imitating the experiment.

PhoenixPhyre
2022-03-24, 02:31 PM
Some classes and builds simply have less, and it's okay to say that. 5e is pretty well balanced overall, so this isn't that egregious, but even then there are wide gaps between a champion and a hexblade, and the hexblade comes out ahead in nearly every area.

But you can get that information without doing DPR calculations. Just by reading the classes themselves. Doing all the fancy math doesn't actually add anything except a false impression of precision. Those two fail the basic sanity check calculations hard.

As far as playtests--all that matters to me is how it plays at my tables. So playtesting is real play. And that's what's important. Because I'm not trying to make any statements to the grander world. So what you call bias, I call "actually doing its job."

And that's my experience, and why I stopped doing any kind of detailed numerical analysis. I still do sanity checks, but that doesn't involve DPR, simply reading the classes and adding up dice.

Frogreaver
2022-03-24, 02:36 PM
I mean, it tells us what it says. While you're not able to completely predict the future of all potential scenarios using playtests, saying there isn't anything meaningful to learn from even just a single playtest is highly inaccurate. A DM that frequently playtests their own encounters can attest to that.

A hypothesis like "A Ranger underperforms in melee," assuming adequate reasoning, can still be analyzed even with the fuzziness of dice.

For example, suppose a scenario comes up where there's a horde of enemies and the Ranger has a greatsword. You might think enemies wouldn't crowd themselves into a horde just to get Whirlwind Attacked, but not getting surrounded might be just as good for the Ranger: they give up some melee damage, but they get more opportunities to control the area without their concentration being challenged. That would likely be a consistent benefit regardless of whether the horde creatures were goblins, orcs, fiends, or celestials. And it would probably be apparent not by counting damage and attacks landed, but by how the creatures would be forced to act while being moved into optimal positions.

I think a while back I playtested some powerful monsters versus different types of parties in this forum. It was apparent how effective strategies like grapple-silence were against spellcasters, even ones who have abilities outside of only spellcasting. That's something you can't really calculate, but it was still apparent. And you can easily test it out yourself by just imitating the experiment.

A single play test isn’t meaningful at all.

5 play tests aren’t meaningful either

40 might start to be - but only for the specific encounter/day you are play-testing

To really get anything meaningful you will need hundreds or thousands of play-tests for every encounter/adventuring day you want to learn about.

Outside of well-set-up computerized simulations, or distributed play testing done by 10,000 D&D groups, play testing is not a particularly useful gauge of performance.
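The "how many playtests" question can itself be sketched numerically. A toy model in Python (the 60% hit chance, 2d6+3 damage, and 20-round day are arbitrary stand-ins, not a claim about any real table) shows how the noise in an n-playtest DPR estimate shrinks roughly with the square root of n:

```python
import random
import statistics

random.seed(1)

def one_playtest(rounds=20):
    """Observed DPR from one adventuring day: 2 attacks/round,
    60% hit chance, 2d6+3 damage on a hit (true DPR = 12)."""
    total = 0
    for _ in range(rounds * 2):
        if random.random() < 0.6:
            total += random.randint(1, 6) + random.randint(1, 6) + 3
    return total / rounds

# Spread of the average DPR estimate across 200 repetitions,
# for playtest batches of different sizes
spreads = {}
for n in (1, 5, 40, 200):
    estimates = [statistics.mean(one_playtest() for _ in range(n))
                 for _ in range(200)]
    spreads[n] = statistics.stdev(estimates)
    print(n, round(spreads[n], 2))
```

In this toy model a single playtest wobbles by roughly +/- 1.7 DPR around the true value of 12, and even batches of 40 still leave visible noise, which lines up with the margin-of-error point made earlier in the thread.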

Asisreo1
2022-03-24, 02:43 PM
A single play test isn’t meaningful at all.

5 play tests aren’t meaningful either

40 might start to be - but only for the specific encounter/day you are play-testing

To really get anything meaningful you will need hundreds or thousands of play-tests for every encounter/adventuring day you want to learn about.

Outside of well-set-up computerized simulations, or distributed play testing done by 10,000 D&D groups, play testing is not a particularly useful gauge of performance.
The only thing a playtest can't meaningfully tell you is the average success rate of a tactic. However, a playtest can meaningfully tell you whether a tactic is useful and allows you to analyze why and how it was useful.

So even if your paladin's smites all roll ones, you'll still be able to see that they were capable of rolling more smites when they charged into the frontlines rather than waited in the backlines. And you'd be able to see whether they were more vulnerable in that position.

There's no need to simulate thousands of combats when simply dissecting one or two can provide useful nuggets of information.

Willie the Duck
2022-03-24, 02:43 PM
Play tests tell you different information than DPR calculations. Actually useful information, regardless of sample size. Their outputs aren't the same--you're not playtesting to find DPR. Because DPR is a proxy for something you actually care about, which play tests reveal directly. Proxies are always sub-par if you have access to the actual data you care about.
Regardless of sample size? If on one single playtest something happened, there's no capacity for it to have been an outlier?


Conclusion denied.
Declaring a position correct or incorrect, either instead of or in supplement to an actual argument, never ends well.


Edit: And I'll note that the official materials agree with me--the numbers for (for instance) monster creation are presented as a 'first-pass guess' to get you into the ballpark and are intentionally not very precise. Because they go on to say that you need to playtest playtest playtest and adjust CR (and other numbers) to fit that.
Aside from being an appeal to authority, this only speaks to whether the numbers in the monster creation rules are hard-and-fast (which everyone recognizes they aren't), not whether DPR calculations have any overall value.


Playtest is experiment. DPR numbers are theory. And theory must give way to experiment where they disagree.
Neither theory nor practical application has direct primacy. Both have limitations, and both are constrained by how well they map to the overall experience of play (which I'm assuming is the actual goal). Theory has a limit of applicability, while playtests have a limit of how many you can feasibly run before gametime. Both have the capacity to produce incorrect conclusions (DPR because the theory may not map to in-game realities, playtests because you're effectively running low-sample tests and performing, at best, nonparametric comparisons).



The only thing a playtest can't meaningfully tell you is the average success rate of a tactic. However, a playtest can meaningfully tell you whether a tactic is useful and allows you to analyze why and how it was useful.

So even if your paladin's smites all roll ones, you'll still be able to see that they were capable of rolling more smites when they charged into the frontlines rather than waited in the backlines. And you'd be able to see whether they were more vulnerable in that position.

There's no need to simulate thousands of combats when simply dissecting one or two can provide useful nuggets of information.

Both have value. In the case of the paladin smites, if you can only run a few fights' worth of playtest and you do roll a lot of ones, you might come away with the conclusion that, while you are capable of rolling more smites when you charge into the frontlines rather than wait in the backlines, you might not do so because you didn't find smiting to be good value for your spell-slot dollars, so to speak. Simulating thousands of combats (or, more practically, DPR or other statistical analysis) doesn't provide the insight about charging vs. staying back, but it will tell you the likely damage output of smites when they happen, and that too is usable and useful information.

strangebloke
2022-03-24, 02:51 PM
But you can get that information without doing DPR calculations. Just by reading the classes themselves. Doing all the fancy math doesn't actually add anything except a false impression of precision. Those two fail the basic sanity check calculations hard.

As far as playtests--all that matters to me is how it plays at my tables. So playtesting is real play. And that's what's important. Because I'm not trying to make any statements to the grander world. So what you call bias, I call "actually doing its job."

And that's my experience, and why I stopped doing any kind of detailed numerical analysis. I still do sanity checks, but that doesn't involve DPR, simply reading the classes and adding up dice.

The only real play is the play I experience. The play you experience is not real (to me). If we disagree, there's simply nothing interesting to say unless you try to take things further with a little analysis. From that discussion and the framing of it, people will be forced to dig into why their subjective experience ends up feeling the way it does. Maybe someone has more short rests or more magic items than another player, and this discrepancy led to them thinking of fighters as better than another player did. Analysis forces you to consider how the conditions of your game impact the experience of playing in your game. You'll never get perfectly precise, but you can learn a lot about how to DM from doing numerical analysis.

Far more than you can learn from just saying "look I played a rogue and felt really good." That kind of talking point, above all else, is just kind of bland and uninteresting.

Frogreaver
2022-03-24, 02:59 PM
On a side note. With D&D’s state space in a specific encounter/adventuring day being larger than the state space of a game of chess, how many play tests are needed to get statistically significant results?

D&D play test results may not even form a normal distribution. So I’m really curious about this.

Elder_Basilisk
2022-03-24, 03:17 PM
I'm not even sure what it would mean to get statistically significant results from a meaningful D&D playtest.

There are a lot of non-random elements that go into a D&D combat, so what would you want to do? Calculate all possible choices by each combatant, run each combination of possible choices through enough times that you have a statistically significant sample size, and then group the outcomes into various groups? I'm sure that would tell you a lot of things like PCs lose if the spellcasters decide to run to the front lines and grapple rather than casting spells and PCs lose if they all spend their move actions running around to provoke OAs from every monster they can, but in general, it doesn't seem like playtesting lends itself to statistically significant samples any more than regular season NFL games do. That doesn't mean playtesting is not useful--evidence doesn't all need to be statistically significant to be useful. Most people can still gather enough info from the NFL records to know not to bet on the Jaguars winning.

PhoenixPhyre
2022-03-24, 03:39 PM
On a side note. With D&D’s state space in a specific encounter/adventuring day being larger than the state space of a game of chess, how many play tests are needed to get statistically significant results?

D&D play test results may not even form a normal distribution. So I’m really curious about this.

Proxies are proxies. And are sub-par.

Statistics aren't actually valuable in and of themselves when we can actually get at the actual items of interest. Statistics are only needed when we can't do a direct measurement. And not all elements of interest are amenable to statistical analysis--in fact, very few of the things that go into a game of D&D have any real bearing on or relevance to statistical measures. Because no one is playing a statistical game of D&D--they're playing a real one.

Just because it can be measured doesn't make it meaningful. And the more abstracted from the system of interest, the less likely to be meaningful unless you have a strong model backing up why that particular measure is important. Even beyond that, metrics that become targets cease to be good metrics. Because metrics are gameable (by definition). Additionally, answering a question about balance with "the DPRs differ by X" is entirely missing the mark, unless your (subjective) definition of balance is "DPRs as calculated by Y measure differ by less than Z". Which is rather smuggling the conclusion into the definition.

In the end, the only meaningful measure is "do I have fun playing this at this table." All the calculations in the world can't tell you the answer to that. And pretending they can is hubris and scientism. It's confusing Is with Ought in a bad way. Being "data driven" is a buzzword, used to smuggle in assumptions and biases. Because, as they say, there are lies, d**n lies, and statistics.

Asisreo1
2022-03-24, 04:00 PM
Both have value. In the case of the paladin smites, if you can only run a few fights' worth of playtest and you do roll a lot of ones, you might come away with the conclusion that, while you are capable of rolling more smites when you charge into the frontlines rather than wait in the backlines, you might not do so because you didn't find smiting to be good value for your spell-slot dollars, so to speak. Simulating thousands of combats (or, more practically, DPR or other statistical analysis) doesn't provide the insight about charging vs. staying back, but it will tell you the likely damage output of smites when they happen, and that too is usable and useful information.
If you roll a lot of ones and conclude anything other than poor luck based on those ones, you aren't analyzing the data correctly. It'd be as foolish as thinking that because you won the lottery after 10 attempts, you'll win the lottery again on your 20th.

Drawing wrong conclusions from data isn't playtest/simulation exclusive. I could do that with statistical data as well. I could say that since 2d8 and 2d4+4 share an average, they are exactly identical. They are not. But that's why you show your work, so that if you get something wrong, someone else can check. The equivalent would be giving an accurate account of what happened during the playtest.

You can actually separate dice rolls from your playtest completely and get accurate conclusions. If you take the accuracy-adjusted, crit-adjusted DPR and have all attacks "hit" using that, then you'll get a simulation of how the average combat might go. Then you just analyze the tactics. Again, it's not about "when did this character die" or "did the enemy make their save?" It's about how effective the tactic is.
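The 2d8 vs. 2d4+4 example is easy to make exact: same average, very different spread. A quick enumeration sketch in Python:

```python
from itertools import product

def dist(n_dice, sides, bonus=0):
    """Exact mean, variance, and range for n_dice d(sides) + bonus."""
    totals = [sum(roll) + bonus
              for roll in product(range(1, sides + 1), repeat=n_dice)]
    mean = sum(totals) / len(totals)
    var = sum((x - mean) ** 2 for x in totals) / len(totals)
    return mean, var, min(totals), max(totals)

print(dist(2, 8))     # 2d8:   mean 9.0, variance 10.5, range 2-16
print(dist(2, 4, 4))  # 2d4+4: mean 9.0, variance 2.5,  range 6-12
```

Same mean of 9, but 2d8 has over four times the variance and a much wider range, which is exactly the information a bare average hides.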

Frogreaver
2022-03-24, 04:03 PM
Proxies are proxies. And are sub-par.

Statistics aren't actually valuable in and of themselves when we can actually get at the actual items of interest. Statistics are only needed when we can't do a direct measurement. And not all elements of interest are amenable to statistical analysis--in fact, very few of the things that go into a game of D&D have any real bearing on or relevance to statistical measures. Because no one is playing a statistical game of D&D--they're playing a real one.

Just because it can be measured doesn't make it meaningful. And the more abstracted from the system of interest, the less likely to be meaningful unless you have a strong model backing up why that particular measure is important. Even beyond that, metrics that become targets cease to be good metrics. Because metrics are gameable (by definition). Additionally, answering a question about balance with "the DPRs differ by X" is entirely missing the mark, unless your (subjective) definition of balance is "DPRs as calculated by Y measure differ by less than Z". Which is rather smuggling the conclusion into the definition.

In the end, the only meaningful measure is "do I have fun playing this at this table." All the calculations in the world can't tell you the answer to that. And pretending they can is hubris and scientism. It's confusing Is with Ought in a bad way. Being "data driven" is a buzzword, used to smuggle in assumptions and biases. Because, as they say, there are lies, d**n lies, and statistics.

There are some grains of truth in here, but they are so muddied in faulty analysis and grand proclamations that the whole thing is ironically absurd. Ironic because the accusation against DPR is that too many grand pronouncements are made from it, while you commit the same sin but worse. Absurd because your conclusions ignore how the entire world functions in pursuit of some ideological purity that will never come to pass, nor even be helpful if it did, as it amounts to 'calculate nothing'. Thank goodness our engineers don't just go on feels.




You can actually separate dice rolls from your playtest entirely and still get accurate conclusions. If you take the accuracy-adjusted, crit-adjusted DPR and treat all attacks as "hitting" for that amount, then you get a simulation of how the average combat might go. Then, you just analyze the tactics. Again, it's not about "when did this character die" or "did the enemy make their save?" It's about how effective the tactic is.

Trying to take an accuracy-adjusted average and work that into a ‘mock’ playtest as some kind of average encounter is such a bad idea. You are just compounding error upon error. You are taking the worst of both worlds.


I'm not even sure what it would mean to get statistically significant results from a meaningful D&D playtest.

There are a lot of non-random elements that go into a D&D combat, so what would you want to do? Calculate all possible choices by each combatant and run each combination of possible choices through enough times that you have a statistically significant sample size to group the outcomes into various categories? I'm sure that would tell you a lot of things, like PCs lose if the spellcasters decide to run to the front lines and grapple rather than casting spells, and PCs lose if they all spend their move actions running around to provoke OAs from every monster they can, but in general, it doesn't seem like playtesting lends itself to statistically significant samples any more than regular season NFL games do. That doesn't mean playtesting is not useful--evidence doesn't all need to be statistically significant to be useful. Most people can still gather enough info from the NFL records to know not to bet on the Jaguars winning.

I’m not the one arguing anything important can be gleaned from a single play-test.

Maybe ask them what important things they learn, and then determine how many playtests are needed to know those things are statistically true as opposed to just random variance.

PhoenixPhyre
2022-03-24, 04:27 PM
There are some grains of truth in here, but they are so muddied in faulty analysis and grand proclamations that the whole thing is ironically absurd. Ironic because the accusation against DPR is that too many grand pronouncements are made from it, while you commit the same sin but worse. Absurd because your conclusions ignore how the entire world functions in pursuit of some ideological purity that will never come to pass, nor even be helpful if it did, as it amounts to 'calculate nothing'. Thank goodness our engineers don't just go on feels.


Strawman. I've said (repeatedly) that doing smoke-test calculations is useful. But that requires a tiny fraction of the assumptions and is only looking for outliers. Which don't need statistical rigor, because we're only looking for things that dramatically breach the noise frontier. Basically just looking for huge signals, which doesn't need any kind of statistical power.

Engineers also know what can be calculated and what can't (epistemic humility). Basically, D&D, as with all games, for the most part comes down to taste. What does it mean for something to be balanced? That's a value/definition question, not a mathematical one. Is this fun to play? Again, that's a value and subjective question. Not a mathematical one. None of the interesting questions about D&D are mathematical or amenable to statistics. Because statistics (by their nature) can't tell you about the individual case. They can only provide general information about the general case. But no one is playing the general case--we're only playing at individual tables.

I used to be all in for the "lots of calculations" thing. But then I looked at what I was learning from it and realized I'd fallen prey to a strong case of physics envy and was expending tons of energy that was leading me down the wrong path, because what I was measuring wasn't a good proxy for what I wanted to know. Rather the opposite--it was an actively bad proxy. Calculate that which is meaningful to calculate. But most of D&D isn't meaningful to calculate. At least in my experience.

My games, worldbuilding, and homebrew got a whole lot better (as measured by player satisfaction and my own feelings, which is all that matters) when I gave up trying to analyze the numbers and focused on producing a good experience for those particular people in that particular case. The general case doesn't matter unless you're trying to write things for sale. In which case, you can do actual playtesting and you're the one setting the assumptions.

Doing calculations so we can argue about something on the forums is utterly pointless--there are tons of things to argue about without the false precision that these numbers provide.

Yakk
2022-03-24, 09:15 PM
For a swing with hit chance H, crit chance C, damage distribution D and crit bonus damage B, the attack's distribution is (H-C)D+C(D+B). Variance is then V(HD)+V(CB). As V(AB) = V(A)V(B) + V(A)E(B)^2 + V(B)E(A)^2 we get:

V(H)V(D)+V(C)V(B)+V(H)E(D)^2+V(D)E(H)^2+V(C)E(B)^2 +V(B)E(C)^2

Now, V(H) is h(1-h) and similarly for C. E(H)^2 is h^2, ditto.

h(1-h)V(D)
+c(1-c)V(B)
+h(1-h)E(D)^2
+h^2V(D)
+c(1-c)E(B)^2
+c^2V(B)

Factoring we get:

hV(D)
+h(1-h)E(D)^2
+cV(B)
+c(1-c)E(B)^2

Now, suppose a 50% hit rate, 5% crit rate, 1d12 damage die, +5 damage. E(D)=11.5, E(B)=6.5, V(D)=V(B)=143/12.

1/2 143/12
+1/4 23^2/4
+1/20 143/12
+19/400 169/4

Or Var is about 41.6. E is 6.075.

Now this is an SD (6.45) even a bit bigger than E. But Var is linear (it adds across attacks); SD is the square root.

At 2 attacks/round over a 10 round day, E becomes 121.5 and Var becomes about 830 (20 times the per-attack Var). The sqrt is 28.9, so the 95% confidence interval is +/-57 and change.

Over a day, this character does roughly 64 to 179 damage 19 times out of 20.

You can do this with any D&D DPR calculation you want. I'd advise against doing it manually; I did the above basically by hand (ok, sqrts and one sum of products by copy-paste into a calculator), so I probably had typos.

This character would be described as 12.2 DPR. Saying 12.2 DPR with an SD of 9.1 per round would probably be better; then you could distinguish between highly reliable choices and less reliable ones.

Here, a +20% damage and SD (imagine, say, +1d6 damage per swing) would be something like 15 DPR and 11.0 SD per round. On a 10 round adventuring day that's about 150 expected damage with a day-level SD around 35, so 19 times out of 20 it does roughly 80 to 220 damage. This overlaps with the above range: the damage difference between the two over 10 rounds is about +28 with an SD of about 45. The low damage person outdamaging the high is roughly a 0.6 sigma event (not that rare; 26% or so).

Crits, unless you get fancy with smites and fishing, don't contribute a lot to the expected value. And not that much to the variance either I think. With fishing and smites it gets more interesting.

Now, if you go from the above 20% difference to a 50% or 100% difference, suddenly the overlap vanishes.

At 100% boost - hits that do 2d12+10 - the delta is 122 with an SD in the 60s. So being outdamaged is a 2 sigma event.

A big problem is that averages are far simpler. I bet half the people in this thread can audit average calculations, and I'd guess 0-2 could audit this variance math I'm doing off the top of their head. So even if I don't make a mistake (and I probably did), the result is less convincing to people.
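If auditing the algebra is a pain, a Monte Carlo run is an easy cross-check. Here's a sketch (Python) of the same 50% hit / 5% crit / 1d12+5 attack. One caveat: the simulated variance lands a bit above 41.6 (around 45), because the derivation above treats hit and crit as independent events, when in fact a crit implies a hit.

```python
import random
import statistics

random.seed(1)  # deterministic run for reproducibility

def attack():
    """One swing: 50% hit overall, of which 5 percentage points are crits."""
    roll = random.random()
    if roll < 0.05:                # crit: 2d12 + 5
        return random.randint(1, 12) + random.randint(1, 12) + 5
    if roll < 0.50:                # ordinary hit: 1d12 + 5
        return random.randint(1, 12) + 5
    return 0                       # miss

samples = [attack() for _ in range(200_000)]
mean = statistics.fmean(samples)
var = statistics.pvariance(samples)
print(round(mean, 3), round(var, 2))  # mean ~6.08, variance ~45
```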

Frogreaver
2022-03-25, 12:30 AM
For a swing with hit chance H, crit chance C, damage distribution D and crit bonus damage B, the attack's distribution is (H-C)D+C(D+B). Variance is then V(HD)+V(CB). As V(AB) = V(A)V(B) + V(A)E(B)^2 + V(B)E(A)^2 we get:

V(H)V(D)+V(C)V(B)+V(H)E(D)^2+V(D)E(H)^2+V(C)E(B)^2 +V(B)E(C)^2

Now, V(H) is h(1-h) and similarly for C. E(H)^2 is h^2, ditto.

h(1-h)V(D)
+c(1-c)V(B)
+h(1-h)E(D)^2
+h^2V(D)
+c(1-c)E(B)^2
+c^2V(B)

Factoring we get:

hV(D)
+h(1-h)E(D)^2
+cV(B)
+c(1-c)E(B)^2

Now, suppose a 50% hit rate, 5% crit rate, 1d12 damage die, +5 damage. E(D)=11.5, E(B)=6.5, V(D)=V(B)=143/12.

1/2 143/12
+1/4 23^2/4
+1/20 143/12
+19/400 169/4

Or Var is about 41.6. E is 6.075.

Now this is an SD (6.45) even a bit bigger than E. But Var is linear (it adds across attacks); SD is the square root.

At 2 attacks/round over a 10 round day, E becomes 121.5 and Var becomes about 830 (20 times the per-attack Var). The sqrt is 28.9, so the 95% confidence interval is +/-57 and change.

Over a day, this character does roughly 64 to 179 damage 19 times out of 20.

You can do this with any D&D DPR calculation you want. I'd advise against doing it manually; I did the above basically by hand (ok, sqrts and one sum of products by copy-paste into a calculator), so I probably had typos.

This character would be described as 12.2 DPR. Saying 12.2 DPR with an SD of 9.1 per round would probably be better; then you could distinguish between highly reliable choices and less reliable ones.

Here, a +20% damage and SD (imagine, say, +1d6 damage per swing) would be something like 15 DPR and 11.0 SD per round. On a 10 round adventuring day that's about 150 expected damage with a day-level SD around 35, so 19 times out of 20 it does roughly 80 to 220 damage. This overlaps with the above range: the damage difference between the two over 10 rounds is about +28 with an SD of about 45. The low damage person outdamaging the high is roughly a 0.6 sigma event (not that rare; 26% or so).

Crits, unless you get fancy with smites and fishing, don't contribute a lot to the expected value. And not that much to the variance either I think. With fishing and smites it gets more interesting.

Now, if you go from the above 20% difference to a 50% or 100% difference, suddenly the overlap vanishes.

At 100% boost - hits that do 2d12+10 - the delta is 122 with an SD in the 60s. So being outdamaged is a 2 sigma event.

A big problem is that averages are far simpler. I bet half the people in this thread can audit average calculations, and I'd guess 0-2 could audit this variance math I'm doing off the top of their head. So even if I don't make a mistake (and I probably did), the result is less convincing to people.

I don't think many attacks over an adventuring day will yield a fully normal distribution, and doesn't the analysis you are doing require one?

I would expect it to be skewed to some degree.

Malkavia
2022-03-25, 08:17 AM
I'm seeing a lot of "Don't do any mathematical calculations before sending a rocket to space. Those calculations are just theories. What you really need to do is let engineers just build rockets and send them to space if you want to get real info."

Now, I won't deny that theory has been wrong and that actually sending rockets to space has taught us much. In fact, tragic lessons learned have often influenced theory. However, it would be completely silly to say the theory has no value.

I expect game designers to understand theory and run thorough simulations on different game builds. Fortunately, since creating a game has fewer consequences than sending people to space on rockets, game designers can also playtest things. However, the fact they will eventually playtest things does not invalidate the theory. Math and theory are the foundation of so many technological fields and advances we've seen in our lives. It's flat wrong to say that the math and theory is wasted, pointless, incorrect, what have you.

In short, if you have 2 classes with essentially the same role of hitting the mean creature with a pointy stick, and the math says that one will on average hurt the mean creature 30% more, that is meaningful and should be fixed before it ever even reaches playtesting.

PhantomSoul
2022-03-25, 08:29 AM
I don't think many attacks over an adventuring day will yield a fully normal distribution, and doesn't the analysis you are doing require one?

I would expect it to be skewed to some degree.

You're still sampling from a normal distribution for the expectations :)
(Though once you add in accuracy, it's a normal distribution plus a bunch of zeros, but that's manageable really)



I'm seeing a lot of "Don't do any mathematical calculations before sending a rocket to space. Those calculations are just theories. What you really need to do is let engineers just build rockets and send them to space if you want to get real info."

Now, I won't deny that theory has been wrong and that actually sending rockets to space has taught us much. In fact, tragic lessons learned have often influenced theory. However, it would be completely silly to say the theory has no value.

I expect game designers to understand theory and run thorough simulations on different game builds. Fortunately, since creating a game has fewer consequences than sending people to space on rockets, game designers can also playtest things. However, the fact they will eventually playtest things does not invalidate the theory. Math and theory are the foundation of so many technological fields and advances we've seen in our lives. It's flat wrong to say that the math and theory is wasted, pointless, incorrect, what have you.

In short, if you have 2 classes with essentially the same role of hitting the mean creature with a pointy stick, and the math says that one will on average hurt the mean creature 30% more, that is meaningful and should be fixed before it ever even reaches playtesting.


Exactly! Expected damage is just one factor, and how a class feels in play matters without being nicely conveyed by the math, but that doesn't mean that DPR is worthless... it's just not everything, even for damage. (And for this thread, DPR is seen as a starting point for damage, not even the only metric for damage you'd want to know, with "pro-math" people repeatedly saying that we need more information and better parameters for the comparisons, not that we should dump a valid and useful tool.)

Malkavia
2022-03-25, 08:41 AM
Exactly! Expected damage is just one factor, and how a class feels in play matters without being nicely conveyed by the math, but that doesn't mean that DPR is worthless... it's just not everything, even for damage. (And for this thread, DPR is seen as a starting point for damage, not even the only metric for damage you'd want to know, with "pro-math" people repeatedly saying that we need more information and better parameters for the comparisons, not that we should dump a valid and useful tool.)

I agree with this completely. While I think that using averages to understand DPR of a build is perfectly acceptable, I also don't think it's the only way to judge a build's overall effectiveness. If we did that, Wizards would clearly be the worst class, and we know that isn't true.

I'm personally a big nerd and use six sigma decision matrices when deciding which build to play, and DPR is only one of the columns (though I admittedly do give it a pretty heavy weight). I also include things like mobility, control, exploration pillar utility, social pillar utility, option variety, group role need, etc.
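For anyone curious what such a matrix looks like in practice, it boils down to a weighted sum of per-criterion scores. A toy sketch (Python) — every weight, build name, and rating below is an entirely made-up placeholder, not a real build assessment:

```python
# Hypothetical weighted decision matrix for comparing builds.
# Criterion weights sum to 1; scores are illustrative 1-10 ratings.
weights = {"dpr": 0.35, "mobility": 0.15, "control": 0.20,
           "exploration": 0.15, "social": 0.15}

builds = {
    "GWM Fighter": {"dpr": 9, "mobility": 5, "control": 3,
                    "exploration": 4, "social": 4},
    "Lore Bard":   {"dpr": 4, "mobility": 5, "control": 8,
                    "exploration": 7, "social": 9},
}

def score(name):
    """Weighted total score for one build."""
    return sum(weights[c] * builds[name][c] for c in weights)

ranked = sorted(builds, key=score, reverse=True)
print([(b, round(score(b), 2)) for b in ranked])
```

Changing the weight on the "dpr" column is exactly the "heavy weight" judgment call described above, which is why two people with the same scores can rank builds differently.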

Yakk
2022-03-25, 09:10 AM
I don't think many attacks over an adventuring day will yield a fully normal distribution, and doesn't the analysis you are doing require one?

I would expect it to be skewed to some degree.
Naw, the central limit theorem is gonna win by n=20 in my experience.

We can go all out and do a full counting polynomial.
This describes a single attack with a 50% hit 5% crit 1d12+5 damage:
.5 + .45 *( (x+x^2+x^3+x^4+x^5+x^6+x^7+x^8+x^9+x^10+x^11+x^12) * x^5 / 12 ) + .05 * ( ((x+x^2+x^3+x^4+x^5+x^6+x^7+x^8+x^9+x^10+x^11+x^12 )/12)^2 * x^5 )

Using wolfram alpha, we get:
0.000347222 x^29 + 0.000694444 x^28 + 0.00104167 x^27 + 0.00138889 x^26 + 0.00173611 x^25 + 0.00208333 x^24 + 0.00243056 x^23 + 0.00277778 x^22 + 0.003125 x^21 + 0.00347222 x^20 + 0.00381944 x^19 + 0.00416667 x^18 + 0.0413194 x^17 + 0.0409722 x^16 + 0.040625 x^15 + 0.0402778 x^14 + 0.0399306 x^13 + 0.0395833 x^12 + 0.0392361 x^11 + 0.0388889 x^10 + 0.0385417 x^9 + 0.0381944 x^8 + 0.0378472 x^7 + 0.0375 x^6 + 0.5
(25 terms)
which is the probability of each result times x^(damage of that result).

We can then raise this to the power 20 to describe *every single possibility*, resulting in a 576 term polynomial.


9.5367431640625*^-7 + 1.430511474609375*^-6 x^6 + 1.4437569512261284*^-6 x^7 + 1.4570024278428819*^-6 x^8 + 1.4702479044596355*^-6 x^9 + 1.483493381076389*^-6 x^10 + 1.4967388576931423*^-6 x^11 + 2.5292237599690757*^-6 x^12 + 3.580583466423882*^-6 x^13 + 4.650905360410242*^-6 x^14 + 5.740276825280837*^-6 x^15 + 6.848785244388346*^-6 x^16 + 7.976518001085445*^-6 x^17 + 8.151708745662076*^-6 x^18 + 0.0000102217 x^19 + 0.0000127955 x^20 + 0.000015886 x^21 + 0.0000195066 x^22 + 0.0000236704 x^23 + 0.0000264987 x^24 + 0.0000301384 x^25 + 0.00003476 x^26 + 0.0000405395 x^27 + 0.0000476588 x^28 + 0.0000563057 x^29 + 0.0000653328 x^30 + 0.0000749738 x^31 + 0.0000854799 x^32 + 0.0000971608 x^33 + 0.000110374 x^34 + 0.000125526 x^35 + 0.000143515 x^36 + 0.000163298 x^37 + 0.000184843 x^38 + 0.000208173 x^39 + 0.000233382 x^40 + 0.000260638 x^41 + 0.000291399 x^42 + 0.000325741 x^43 + 0.00036366 x^44 + 0.000405082 x^45 + 0.000449868 x^46 + 0.000497823 x^47 + 0.000549547 x^48 + 0.000605561 x^49 + 0.00066632 x^50 + 0.000732159 x^51 + 0.000803274 x^52 + 0.000879685 x^53 + 0.000961098 x^54 + 0.00104775 x^55 + 0.00113998 x^56 + 0.0012382 x^57 + 0.00134286 x^58 + 0.00145437 x^59 + 0.00157262 x^60 + 0.00169751 x^61 + 0.001829 x^62 + 0.00196716 x^63 + 0.00211216 x^64 + 0.00226435 x^65 + 0.00242391 x^66 + 0.00259087 x^67 + 0.00276517 x^68 + 0.00294666 x^69 + 0.00313516 x^70 + 0.00333055 x^71 + 0.00353285 x^72 + 0.00374203 x^73 + 0.00395801 x^74 + 0.00418061 x^75 + 0.00440954 x^76 + 0.00464439 x^77 + 0.00488479 x^78 + 0.00513043 x^79 + 0.005381 x^80 + 0.0056362 x^81 + 0.00589566 x^82 + 0.00615892 x^83 + 0.00642545 x^84 + 0.00669471 x^85 + 0.0069661 x^86 + 0.00723909 x^87 + 0.00751311 x^88 + 0.00778761 x^89 + 0.00806199 x^90 + 0.00833561 x^91 + 0.00860779 x^92 + 0.00887785 x^93 + 0.00914509 x^94 + 0.00940884 x^95 + 0.00966846 x^96 + 0.00992329 x^97 + 0.0101727 x^98 + 0.010416 x^99 + 0.0106525 x^100 + 0.0108816 x^101 + 0.0111026 x^102 + 0.0113149 x^103 + 0.0115179 x^104 + 0.0117112 x^105 
+ 0.0118941 x^106 + 0.0120661 x^107 + 0.0122268 x^108 + 0.0123757 x^109 + 0.0125123 x^110 + 0.0126364 x^111 + 0.0127476 x^112 + 0.0128456 x^113 + 0.0129302 x^114 + 0.0130011 x^115 + 0.0130581 x^116 + 0.0131012 x^117 + 0.0131303 x^118 + 0.0131453 x^119 + 0.0131463 x^120 + 0.0131332 x^121 + 0.0131063 x^122 + 0.0130656 x^123 + 0.0130113 x^124 + 0.0129436 x^125 + 0.0128629 x^126 + 0.0127693 x^127 + 0.0126633 x^128 + 0.0125451 x^129 + 0.0124153 x^130 + 0.0122742 x^131 + 0.0121222 x^132 + 0.01196 x^133 + 0.0117879 x^134 + 0.0116064 x^135 + 0.0114162 x^136 + 0.0112178 x^137 + 0.0110118 x^138 + 0.0107987 x^139 + 0.0105791 x^140 + 0.0103536 x^141 + 0.0101229 x^142 + 0.00988744 x^143 + 0.00964791 x^144 + 0.00940488 x^145 + 0.00915892 x^146 + 0.00891062 x^147 + 0.00866053 x^148 + 0.00840921 x^149 + 0.0081572 x^150 + 0.007905 x^151 + 0.00765314 x^152 + 0.0074021 x^153 + 0.00715233 x^154 + 0.0069043 x^155 + 0.00665841 x^156 + 0.00641507 x^157 + 0.00617466 x^158 + 0.00593753 x^159 + 0.00570401 x^160 + 0.00547442 x^161 + 0.00524901 x^162 + 0.00502807 x^163 + 0.00481181 x^164 + 0.00460045 x^165 + 0.00439417 x^166 + 0.00419313 x^167 + 0.00399747 x^168 + 0.00380731 x^169 + 0.00362274 x^170 + 0.00344385 x^171 + 0.00327067 x^172 + 0.00310325 x^173 + 0.00294161 x^174 + 0.00278574 x^175 + 0.00263562 x^176 + 0.00249123 x^177 + 0.00235252 x^178 + 0.00221943 x^179 + 0.00209189 x^180 + 0.00196981 x^181 + 0.0018531 x^182 + 0.00174165 x^183 + 0.00163536 x^184 + 0.0015341 x^185 + 0.00143775 x^186 + 0.00134618 x^187 + 0.00125925 x^188 + 0.00117681 x^189 + 0.00109874 x^190 + 0.00102487 x^191 + 0.00095506 x^192 + 0.000889165 x^193 + 0.000827032 x^194 + 0.000768512 x^195 + 0.000713456 x^196 + 0.000661715 x^197 + 0.000613145 x^198 + 0.0005676 x^199 + 0.000524939 x^200 + 0.000485024 x^201 + 0.000447717 x^202 + 0.000412886 x^203 + 0.000380402 x^204 + 0.00035014 x^205 + 0.000321977 x^206 + 0.000295797 x^207 + 0.000271486 x^208 + 0.000248935 x^209 + 0.000228038 x^210 + 0.000208695 x^211 + 0.00019081 
x^212 + 0.00017429 x^213 + 0.000159047 x^214 + 0.000144997 x^215 + 0.000132061 x^216 + 0.000120163 x^217 + 0.000109231 x^218 + 0.0000991976 x^219 + 0.0000899984 x^220 + 0.0000815731 x^221 + 0.0000738647 x^222 + 0.0000668195 x^223 + 0.0000603874 x^224 + 0.0000545212 x^225 + 0.0000491766 x^226 + 0.0000443126 x^227 + 0.0000398904 x^228 + 0.0000358743 x^229 + 0.0000322307 x^230 + 0.0000289286 x^231 + 0.0000259392 x^232 + 0.0000232357 x^233 + 0.0000207933 x^234 + 0.0000185891 x^235 + 0.0000166021 x^236 + 0.0000148127 x^237 + 0.0000132029 x^238 + 0.0000117563 x^239 + 0.0000104577 x^240 + 9.293140926254316*^-6 x^241 + 8.24997896380197*^-6 x^242 + 7.316519008577054*^-6 x^243 + 6.482112850243499*^-6 x^244 + 5.737043563492147*^-6 x^245 + 5.072455341016117*^-6 x^246 + 4.48028741291018*^-6 x^247 + 3.953211941240733*^-6 x^248 + 3.48457576671896*^-6 x^249 + 3.0683458747873583*^-6 x^250 + 2.6990584408241408*^-6 x^251 + 2.3717713084107174*^-6 x^252 + 2.0820197505284838*^-6 x^253 + 1.8257753609907625*^-6 x^254 + 1.5994079222168811*^-6 x^255 + 1.399650095467634*^-6 x^256 + 1.223564780741543*^-6 x^257 + 1.0685149955442654*^-6 x^258 + 9.321361245623013*^-7 x^259 + 8.12310395778534*^-7 x^260 + 7.071434426511068*^-7 x^261 + 6.149428165373187*^-7 x^262 + 5.34198318487006*^-7 x^263 + 4.63564024769685*^-7 x^264 + 4.018418859586919*^-7 x^265 + 3.479667850035489*^-7 x^266 + 3.00992945416228*^-7 x^267 + 2.600815864225624*^-7 x^268 + 2.2448972763826144*^-7 x^269 + 1.935600514776457*^-7 x^270 + 1.6671173705609565*^-7 x^271 + 1.4343218477352792*^-7 x^272 + 1.2326955604029727*^-7 x^273 + 1.0582605770773995*^-7 x^274 + 9.075190567634996*^-8 x^275 + 7.773990686237697*^-8 x^276 + 6.65206031989987*^-8 x^277 + 5.685792562478165*^-8 x^278 + 4.8545310066267896*^-8 x^279 + 4.140223125197978*^-8 x^280 + 3.527111380277214*^-8 x^281 + 3.001458343093102*^-8 x^282 + 2.5513024251910847*^-8 x^283 + 2.1662411173588164*^-8 x^284 + 1.8372389084920687*^-8 x^285 + 1.5564573126308905*^-8 x^286 + 
1.3171046695792045*^-8 x^287 + 1.1133036036985586*^-8 x^288 + 9.399742275234198*^-9 x^289 + 7.927313626865803*^-9 x^290 + 6.677942211834463*^-9 x^291 + 5.6190714615655485*^-9 x^292 + 4.722701540507356*^-9 x^293 + 3.96478150062318*^-9 x^294 + 3.324678071477075*^-9 x^295 + 2.784712063050503*^-9 x^296 + 2.329754332039072*^-9 x^297 + 1.9468741428391694*^-9 x^298 + 1.6250335491005577*^-9 x^299 + 1.3548221375708616*^-9 x^300 + 1.1282271195824373*^-9 x^301 + 9.384343331314253*^-10 x^302 + 7.796562358835452*^-10 x^303 + 6.469834320374607*^-10 x^304 + 5.362566888341279*^-10 x^305 + 4.4395676630379903*^-10 x^306 + 3.6710971092316374*^-10 x^307 + 3.0320555420742483*^-10 x^308 + 2.5012861455774724*^-10 x^309 + 2.060978282897372*^-10 x^310 + 1.6961573676126202*^-10 x^311 + 1.3942493370467816*^-10 x^312 + 1.1447093280238661*^-10 x^313 + 9.38705525418812*^-11 x^314 + 7.688503554368899*^-11 x^315 + 6.289722476593108*^-11 x^316 + 5.139221095812516*^-11 x^317 + 4.194094599719777*^-11 x^318 + 3.41863866651731*^-11 x^319 + 2.7831794250780506*^-11 x^320 + 2.26308681746402*^-11 x^321 + 1.8379437628380306*^-11 x^322 + 1.4908474852126417*^-11 x^323 + 1.207822792432144*^-11 x^324 + 9.773300486379445*^-12 x^325 + 7.898531276327435*^-12 x^326 + 6.375548233074845*^-12 x^327 + 5.139890725154496*^-12 x^328 + 4.1386095665790106*^-12 x^329 + 3.328268268952683*^-12 x^330 + 2.673280759142703*^-12 x^331 + 2.1445308416018926*^-12 x^332 + 1.7182272443995723*^-12 x^333 + 1.3749553674530828*^-12 x^334 + 1.0988930321202385*^-12 x^335 + 8.771627707676656*^-13 x^336 + 6.99297629493282*^-13 x^337 + 5.568012045869039*^-13 x^338 + 4.427857950564299*^-13 x^339 + 3.5167521696598517*^-13 x^340 + 2.7896106544668706*^-13 x^341 + 2.2100309137366296*^-13 x^342 + 1.7486593690833303*^-13 x^343 + 1.3818579441883907*^-13 x^344 + 1.0906165682879539*^-13 x^345 + 8.59667483571693*^-14 x^346 + 6.767649192428121*^-14 x^347 + 5.3210007833887767*^-14 x^348 + 4.178266855225572*^-14 x^349 + 3.276767412170475*^-14 x^350 + 
2.5664976846922795*^-14 x^351 + 2.0076184912623173*^-14 x^352 + 1.56843230735049*^-14 x^353 + 1.2237533353652774*^-14 x^354 + 9.535967221631139*^-15 x^355 + 7.421259177447781*^-15 x^356 + 5.7680852837777605*^-15 x^357 + 4.477403220471452*^-15 x^358 + 3.471046538167882*^-15 x^359 + 2.6874079254332724*^-15 x^360 + 2.0779969663179724*^-15 x^361 + 1.6046991074068988*^-15 x^362 + 1.2375960752640735*^-15 x^363 + 9.532351899335355*^-16 x^364 + 7.332570643355548*^-16 x^365 + 5.633090144737274*^-16 x^366 + 4.3218591286965566*^-16 x^367 + 3.3115183842422404*^-16 x^368 + 2.53405234604367*^-16 x^369 + 1.936578132536299*^-16 x^370 + 1.4780348328530114*^-16 x^371 + 1.1265842703482872*^-16 x^372 + 8.575732389870605*^-17 x^373 + 6.519381925281671*^-17 x^374 + 4.9495809063081335*^-17 x^375 + 3.75280805186755*^-17 x^376 + 2.841641616771242*^-17 x^377 + 2.148851386489075*^-17 x^378 + 1.6228062135097*^-17 x^379 + 1.2239092402956668*^-17 x^380 + 9.218347745111055*^-18 x^381 + 6.933895894870583*^-18 x^382 + 5.208598975011174*^-18 x^383 + 3.9073552671894174*^-18 x^384 + 2.9272663229482903*^-18 x^385 + 2.1900694787002797*^-18 x^386 + 1.636322164221359*^-18 x^387 + 1.220938869530597*^-18 x^388 + 9.09771055964057*^-19 x^389 + 6.769900412545019*^-19 x^390 + 5.030872028471522*^-19 x^391 + 3.7334807787176676*^-19 x^392 + 2.7668972915027795*^-19 x^393 + 2.047761671789294*^-19 x^394 + 1.513462949496859*^-19 x^395 + 1.1170405042374278*^-19 x^396 + 8.233215761002825*^-20 x^397 + 6.059994038365553*^-20 x^398 + 4.454261076557436*^-20 x^399 + 3.269478825597054*^-20 x^400 + 2.3965106562167057*^-20 x^401 + 1.7541903979989447*^-20 x^402 + 1.2822394844851456*^-20 x^403 + 9.359552896959526*^-21 x^404 + 6.822335214684362*^-21 x^405 + 4.9659394466652046*^-21 x^406 + 3.609595867654517*^-21 x^407 + 2.620009606869745*^-21 x^408 + 1.8990335435155193*^-21 x^409 + 1.3745043087763439*^-21 x^410 + 9.934397216334329*^-22 x^411 + 7.169965541334219*^-22 x^412 + 5.167385868485304*^-22 x^413 + 3.7187852203303723*^-22 
x^414 + 2.6724262841851514*^-22 x^415 + 1.9177094982850487*^-22 x^416 + 1.374137890825717*^-22 x^417 + 9.832096644468629*^-23 x^418 + 7.024706084643299*^-23 x^419 + 5.0115744918870915*^-23 x^420 + 3.570113316540782*^-23 x^421 + 2.5395063532774635*^-23 x^422 + 1.803739255030021*^-23 x^423 + 1.2792427496143703*^-23 x^424 + 9.059087856735303*^-24 x^425 + 6.405687780679738*^-24 x^426 + 4.522663522494608*^-24 x^427 + 3.188360148215044*^-24 x^428 + 2.2443074560939618*^-24 x^429 + 1.5773804638800508*^-24 x^430 + 1.106947086181906*^-24 x^431 + 7.75622981252834*^-25 x^432 + 5.426313916317755*^-25 x^433 + 3.7904132646683535*^-25 x^434 + 2.643580292054428*^-25 x^435 + 1.840854895255871*^-25 x^436 + 1.2798658608353227*^-25 x^437 + 8.88431316034395*^-26 x^438 + 6.15735714505695*^-26 x^439 + 4.260616128067777*^-26 x^440 + 2.9434350218387775*^-26 x^441 + 2.0301905495166687*^-26 x^442 + 1.3980273972296186*^-26 x^443 + 9.611412803056387*^-27 x^444 + 6.59701520956805*^-27 x^445 + 4.520561573130122*^-27 x^446 + 3.0925574858528905*^-27 x^447 + 2.112123933188634*^-27 x^448 + 1.4401007174355041*^-27 x^449 + 9.802429015550475*^-28 x^450 + 6.660967575126378*^-28 x^451 + 4.5185489795164195*^-28 x^452 + 3.0599469448934825*^-28 x^453 + 2.068602611905306*^-28 x^454 + 1.3959935942212037*^-28 x^455 + 9.4043276045205*^-29 x^456 + 6.324186780876476*^-29 x^457 + 4.2453035371993444*^-29 x^458 + 2.844685701043371*^-29 x^459 + 1.9027228197945745*^-29 x^460 + 1.2703591384785837*^-29 x^461 + 8.466056359981016*^-30 x^462 + 5.6316178566561436*^-30 x^463 + 3.739178040995043*^-30 x^464 + 2.478012220231129*^-30 x^465 + 1.639110678878634*^-30 x^466 + 1.0821405458077426*^-30 x^467 + 7.130540248990159*^-31 x^468 + 4.689399133881582*^-31 x^469 + 3.0779415697049183*^-31 x^470 + 2.016248917365368*^-31 x^471 + 1.3181368360839425*^-31 x^472 + 8.600053558036529*^-32 x^473 + 5.59960790376361*^-32 x^474 + 3.638489489062847*^-32 x^475 + 2.3592972272960122*^-32 x^476 + 1.5266269548748858*^-32 x^477 + 
9.857399567024466*^-33 x^478 + 6.35127765473641*^-33 x^479 + 4.083372258591749*^-33 x^480 + 2.619542826848528*^-33 x^481 + 1.6767560153617348*^-33 x^482 + 1.070880290422096*^-33 x^483 + 6.823813387862281*^-34 x^484 + 4.338270487624973*^-34 x^485 + 2.7516747081030256*^-34 x^486 + 1.7412287955718012*^-34 x^487 + 1.0992076622483086*^-34 x^488 + 6.922376874827756*^-35 x^489 + 4.3487884553002635*^-35 x^490 + 2.7252374585051253*^-35 x^491 + 1.7035245948828885*^-35 x^492 + 1.062147636758299*^-35 x^493 + 6.605377654267229*^-36 x^494 + 4.0970378687260914*^-36 x^495 + 2.5344545668952177*^-36 x^496 + 1.5635912763305206*^-36 x^497 + 9.619831369697976*^-37 x^498 + 5.901980072749858*^-37 x^499 + 3.6107205590777038*^-37 x^500 + 2.2025961458298863*^-37 x^501 + 1.3396732199069484*^-37 x^502 + 8.12387321994808*^-38 x^503 + 4.911384899784073*^-38 x^504 + 2.9600339157811503*^-38 x^505 + 1.7783430190175944*^-38 x^506 + 1.0649608116163369*^-38 x^507 + 6.356570545480698*^-39 x^508 + 3.781411018256971*^-39 x^509 + 2.2417950960558013*^-39 x^510 + 1.3243925972693534*^-39 x^511 + 7.796193462020704*^-40 x^512 + 4.572546055891115*^-40 x^513 + 2.671812767925846*^-40 x^514 + 1.5552040336180613*^-40 x^515 + 9.016980398247898*^-41 x^516 + 5.206958512167849*^-41 x^517 + 2.9944074091114816*^-41 x^518 + 1.7147210830653204*^-41 x^519 + 9.776451681563059*^-42 x^520 + 5.5490751335704365*^-42 x^521 + 3.135124128550289*^-42 x^522 + 1.7628826906030847*^-42 x^523 + 9.864217244607313*^-43 x^524 + 5.491668944590179*^-43 x^525 + 3.0414185266267573*^-43 x^526 + 1.675332720925059*^-43 x^527 + 9.176918404816086*^-44 x^528 + 4.997766633298889*^-44 x^529 + 2.7054828329753123*^-44 x^530 + 1.4554695250927653*^-44 x^531 + 7.779353247418055*^-45 x^532 + 4.130002186595516*^-45 x^533 + 2.1772020325485196*^-45 x^534 + 1.1393413649381201*^-45 x^535 + 5.916560181301456*^-46 x^536 + 3.0477961388660633*^-46 x^537 + 1.556792423516322*^-46 x^538 + 7.881618850188048*^-47 x^539 + 3.953078835439981*^-47 x^540 + 
1.963219568901592*^-47 x^541 + 9.648846896543735*^-48 x^542 + 4.6902497714169663*^-48 x^543 + 2.2534710447353033*^-48 x^544 + 1.0694120590993741*^-48 x^545 + 5.009066206719119*^-49 x^546 + 2.313919145403321*^-49 x^547 + 1.0533183256249055*^-49 x^548 + 4.720741092607062*^-50 x^549 + 2.08111803614089*^-50 x^550 + 9.015562742120478*^-51 x^551 + 3.8339569312719637*^-51 x^552 + 1.598753117679621*^-51 x^553 + 6.52960455564495*^-52 x^554 + 2.6086844596329712*^-52 x^555 + 1.018134598771066*^-52 x^556 + 3.8762697647744647*^-53 x^557 + 1.437387535123015*^-53 x^558 + 5.1826576678260274*^-54 x^559 + 1.8136345333498934*^-54 x^560 + 6.147323774860345*^-55 x^561 + 2.0136715013535133*^-55 x^562 + 6.358788977848527*^-56 x^563 + 1.930321921460901*^-56 x^564 + 5.615451891515952*^-57 x^565 + 1.5598447522224016*^-57 x^566 + 4.120342333255783*^-58 x^567 + 1.0300854630155115*^-58 x^568 + 2.4237304688463607*^-59 x^569 + 5.332207031461993*^-60 x^570 + 1.0882055166248965*^-60 x^571 + 2.040385343671681*^-61 x^572 + 3.472996329653925*^-62 x^573 + 5.284994414690754*^-63 x^574 + 7.046659219587672*^-64 x^575 + 8.00756729498599*^-65 x^576 + 7.448899809289294*^-66 x^577 + 5.3206427209209234*^-67 x^578 + 2.595435473619963*^-68 x^579 + 6.488588684049908*^-70 x^580
(The *^-7 notation means "times 10^-7".)

You can get the average by taking the derivative and plugging in x=1. More calculus and algebra gets the higher moments exactly: the second factorial moment is f''(1), so E[X^2] = f''(1) + f'(1) and Var = f''(1) + f'(1) - f'(1)^2. Skewness falls out of the third derivative the same way.

In any case, the odds you do exactly 173 damage over that day is the coefficient of the x^173 term: 0.00310325 x^173.
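For anyone without Wolfram Alpha handy, the same counting polynomial can be built by repeated convolution — multiplying polynomials is just convolving coefficient arrays. A sketch (Python) reproducing the 20-attack distribution above:

```python
# Exact damage distribution for 20 attacks (50% hit, 5% crit, 1d12+5),
# built by repeated convolution -- the same thing as raising the counting
# polynomial above to the 20th power, with dict keys standing in for x^k.

single = {0: 0.5}                        # miss
for d in range(1, 13):                   # ordinary hit: 1d12 + 5
    single[d + 5] = single.get(d + 5, 0.0) + 0.45 / 12
for a in range(1, 13):                   # crit: 2d12 + 5
    for b in range(1, 13):
        single[a + b + 5] = single.get(a + b + 5, 0.0) + 0.05 / 144

def convolve(p, q):
    """Distribution of the sum of two independent totals."""
    out = {}
    for i, pi in p.items():
        for j, qj in q.items():
            out[i + j] = out.get(i + j, 0.0) + pi * qj
    return out

day = {0: 1.0}
for _ in range(20):                      # 20 independent attacks in a day
    day = convolve(day, single)

mean = sum(k * p for k, p in day.items())
print(round(mean, 2), round(day[173], 6))  # mean 121.5; compare the x^173 term
```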

Frogreaver
2022-03-25, 09:19 AM
You're still sampling from a normal distribution for the expectations :)
(Though once you add in accuracy, it's a normal distribution plus a bunch of zeros, but that's manageable really)



The 0’s may be manageable but they make it into a non-normal distribution. You can’t just use normal distribution probability on a non-normal distribution.

strangebloke
2022-03-25, 09:28 AM
I agree with this completely. While I think that using averages to understand DPR of a build is perfectly acceptable, I also don't think it's the only way to judge a build's overall effectiveness. If we did that, Wizards would clearly be the worst class, and we know that isn't true.


Right, and what I think is missed here is that DPR calcs do help you to understand aspects of the game. They highlight how truly dependent on short rests monks are, even compared with classes like Warlock or fighter. DPR calcs can show how much magic items can impact game balance.

Nobody is saying it's all that matters. Heck, most of the discussions around DPR are framed from the perspective of "this will be fun to do."

But ultimately I think the question the anti-math people have to answer is: if math is a completely terrible approach to determining how a character will play, how are so many of us using it and being happy with the results? If it were a flawed approach, surely this forum would be full of people complaining about how GWM doesn't actually work, how it's really terrible and doesn't let you deal massive damage like everyone said it did...

...instead, overwhelmingly, people agree with DPR calcs as being reflective of how things are in real play. And if it isn't, they'll point out what about the model is flawed.

Frogreaver
2022-03-25, 09:35 AM
Naw, the central limit theorem is gonna win by n=20 in my experience.

We can go all out and do a full counting polynomial.
This describes a single attack with a 50% hit chance, a 5% crit chance, and 1d12+5 damage:
.5 + .45 *( (x+x^2+x^3+x^4+x^5+x^6+x^7+x^8+x^9+x^10+x^11+x^12) * x^5 / 12 ) + .05 * ( ((x+x^2+x^3+x^4+x^5+x^6+x^7+x^8+x^9+x^10+x^11+x^12 )/12)^2 * x^5 )

Using wolfram alpha, we get:
0.000347222 x^29 + 0.000694444 x^28 + 0.00104167 x^27 + 0.00138889 x^26 + 0.00173611 x^25 + 0.00208333 x^24 + 0.00243056 x^23 + 0.00277778 x^22 + 0.003125 x^21 + 0.00347222 x^20 + 0.00381944 x^19 + 0.00416667 x^18 + 0.0413194 x^17 + 0.0409722 x^16 + 0.040625 x^15 + 0.0402778 x^14 + 0.0399306 x^13 + 0.0395833 x^12 + 0.0392361 x^11 + 0.0388889 x^10 + 0.0385417 x^9 + 0.0381944 x^8 + 0.0378472 x^7 + 0.0375 x^6 + 0.5
(25 terms)
which is the probability of each result times x^(damage of that result).

We can then raise this to the power 20 to describe *every single possibility*, resulting in a 576 term polynomial.


[576-term polynomial snipped; it appears in full in the post being quoted]

You can get the average by taking the derivative and plugging in x=1: the mean is f'(1). More calculus and algebra gets the higher moments exactly; for a probability generating function f, the variance is f''(1) + f'(1) - f'(1)^2, and skewness likewise comes out of the third derivative with a bit more algebra.

In any case, the probability that you do exactly 173 damage over that day is the coefficient of the x^173 term: 0.00310325, or about 0.31%.

I just plotted it in AnyDice. I used the die {0,0,0,0,6,7,8,9,10,11} to represent the potential results of rolling 1d6+5 at a 60% chance to hit, and added 20 of those together.

The result was a distribution with a mean of 102, which is 12*8.5 (20 attacks x 60% hit chance x 8.5 average damage), as expected. That mean does correspond to the most frequent value, as expected. However, the distribution has a range of 0-220, so the mean is not in the middle of the range: the support extends farther above the mean than below it, and the distribution is not symmetrical around the mean.

So it's not actually a normal distribution, but it is fairly close.
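
That experiment is easy to reproduce exactly with a short convolution, using the same ten-face "attack die" {0,0,0,0,6,7,8,9,10,11} summed 20 times:

```python
# Exact distribution of 20 summed "attack dice" with faces
# {0,0,0,0,6,7,8,9,10,11} -- i.e. 1d6+5 with a 60% hit chance.
faces = [0, 0, 0, 0, 6, 7, 8, 9, 10, 11]

dist = {0: 1.0}                              # running total -> probability
for _ in range(20):
    nxt = {}
    for total, p in dist.items():
        for f in faces:
            nxt[total + f] = nxt.get(total + f, 0.0) + p / 10
    dist = nxt

mean = sum(d * p for d, p in dist.items())
var = sum((d - mean) ** 2 * p for d, p in dist.items())
m3 = sum((d - mean) ** 3 * p for d, p in dist.items())
skew = m3 / var ** 1.5

print(mean, min(dist), max(dist))            # 102.0 0 220
print(round(skew, 3))                        # near 0, and slightly negative
```

One wrinkle the exact numbers expose: although the support stretches farther to the right of the mean, the classical skewness actually comes out slightly negative, because the hit count is Binomial(20, 0.6) and a binomial with p > 0.5 leans left. Either way, "not normal, but fairly close" is the right summary.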

KorvinStarmast
2022-03-25, 10:27 AM
...instead, overwhelmingly, people agree with DPR calcs as being reflective of how things are in real play. And if it isn't, they'll point out what about the model is flawed.

Using the white room analysis to declare that X martial feat is OP is where my distaste for overreliance on DPR figuring begins ... and it's a symptom of a 'martials can't have nice things' attitude.

I have watched SS, as but one example, in play provide both big nova spikes of damage and be an utter disaster when the rogue/archer kept missing.
During the same session.

I do understand how advantage helps, so declaring me 'anti math' due to my distaste for the DPR zealotry is a mistake.

Here's the deal. I am a good teammate. I got with my nephew before the next session and we worked through ways I could set up advantage for his PC; and of course that made for more nova spikes, and that was an artifact of teamwork, not this empty DPR à outrance stuff. I say that again, we made it an explicit point to work together to leverage the potential for nova, not the guarantee of some average damage accrual.

Elder_Basilisk
2022-03-25, 10:51 AM
DPR analysis is very helpful for analyzing exactly the scenarios you described though:

Is the big nova DPR spike the result of teamwork or lucky rolls? How much was the average value with and without the teamwork? How does it compare to both characters just doing their best thing independently and not trying to work together?

And for the other, was the epic SS fail just the result of unlucky rolls or was SS actually a bad call in that scenario? You can compare DPR for the same actions with and without SS and see.

In actual play, sometimes you roll well and sometimes you roll poorly, that's the game. But if you want to understand whether it was a good idea that didn't work out or a bad idea that went pretty much as expected, you will probably want to apply some theory.
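
For the "with and without" comparisons, the advantage piece at least is easy to put a number on: with advantage you only miss if both d20s miss, so a hit chance p becomes 1 - (1 - p)^2. A small sketch:

```python
# Hit chance with advantage: you miss only if both d20s miss.
def adv(p):
    return 1 - (1 - p) ** 2

# The absolute gain is p * (1 - p), which peaks at p = 0.5 -- setting up
# advantage pays off most against mid-range ACs.
for p in (0.35, 0.50, 0.65):
    print(f"p={p:.2f}  with advantage={adv(p):.4f}  gain={adv(p) - p:+.4f}")
```

Multiplying the new hit chance through a DPR formula gives exactly the "same actions, with and without the teamwork" comparison described above.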

Frogreaver
2022-03-25, 11:17 AM
Using the white room analysis to declare that X martial feat is OP is where my distaste for overreliance on DPR figuring begins ... and it's a symptom of a 'martials can't have nice things' attitude.

I have watched SS, as but one example, in play provide both big nova spikes of damage and be an utter disaster when the rogue/archer kept missing.
During the same session.

I do understand how advantage helps, so declaring me 'anti math' due to my distaste for the DPR zealotry is a mistake.

Here's the deal. I am a good teammate. I got with my nephew before the next session and we worked through ways I could set up advantage for his PC; and of course that made for more nova spikes, and that was an artifact of teamwork, not this empty DPR à outrance stuff. I say that again, we made it an explicit point to work together to leverage the potential for nova, not the guarantee of some average damage accrual.

Do you know the reason teamwork isn’t typically accounted for in traditional DPR comparisons?

PhantomSoul
2022-03-25, 11:45 AM
Do you know the reason teamwork isn’t typically accounted for in traditional DPR comparisons?

People are building characters and comparing characters, rather than parties! (But even then, there are often caveats about how things are very different with a reliable source of advantage... which is implicitly a comment about teamwork potential!)

JNAProductions
2022-03-25, 11:56 AM
Using the white room analysis to declare that X martial feat is OP is where my distaste for overreliance on DPR figuring begins ... and it's a symptom of a 'martials can't have nice things' attitude.

I have watched SS, as but one example, in play provide both big nova spikes of damage and be an utter disaster when the rogue/archer kept missing.
During the same session.

I do understand how advantage helps, so declaring me 'anti math' due to my distaste for the DPR zealotry is a mistake.

Here's the deal. I am a good teammate. I got with my nephew before the next session and we worked through ways I could set up advantage for his PC; and of course that made for more nova spikes, and that was an artifact of teamwork, not this empty DPR à outrance stuff. I say that again, we made it an explicit point to work together to leverage the potential for nova, not the guarantee of some average damage accrual.

The issue with that is "Sharpshooter Rogue".

Rogues can benefit a LOT from that feat... But the -5/+10 part of it is generally not worth it.

At level five, you're better off not using Sharpshooter at AC 14+, assuming 16 Dex.
At level eleven, you're better off not using Sharpshooter at AC 10+, assuming 18 Dex.
At level seventeen, you're better off not using Sharpshooter at AC 11+, assuming 20 Dex.

Spreadsheet (https://docs.google.com/spreadsheets/d/161wJ_WmQLBz-UbhfZTx89XyUgTQdLgH5gJKJqDBn4D8/edit?usp=sharing) for my math, assumed use of a shortbow.
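Those thresholds can be reproduced in a few lines. This is a sketch, not the linked spreadsheet: it assumes one attack per round with sneak attack every round, crits doubling all dice, and hit chance clamped to [0.05, 0.95], so edge cases can land a point away from the spreadsheet depending on conventions.

```python
# Sketch of the -5/+10 breakeven for a single-attack shortbow rogue with
# sneak attack. Assumed conventions (not necessarily the spreadsheet's):
# 5% crit chance doubling all dice, hit chance clamped to [0.05, 0.95].

def dpr(to_hit, ac, dice_avg, flat):
    p_hit = min(max((21 + to_hit - ac) / 20, 0.05), 0.95)
    return p_hit * (dice_avg + flat) + 0.05 * dice_avg  # crits add the dice again

def breakeven(to_hit, dex_mod, sneak_dice):
    """Lowest AC at which skipping -5/+10 is at least as good as using it."""
    dice = 3.5 + 3.5 * sneak_dice                       # shortbow d6 + sneak d6s
    for ac in range(10, 31):
        plain = dpr(to_hit, ac, dice, dex_mod)
        power = dpr(to_hit - 5, ac, dice, dex_mod + 10)
        if plain >= power:
            return ac

print(breakeven(6, 3, 3))    # level 5  (prof +3, Dex +3, 3d6 sneak): 14
print(breakeven(8, 4, 6))    # level 11 (prof +4, Dex +4, 6d6 sneak): 10
print(breakeven(11, 5, 9))   # level 17 (prof +6, Dex +5, 9d6 sneak): 12 here;
                             # the post says 11 -- crit/cap handling differs
```

The first two thresholds match the post exactly; the level-17 one is sensitive to exactly how crits and the 95% hit cap are treated.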

Unoriginal
2022-03-25, 12:02 PM
I'm seeing a lot of "Don't do any mathematical calculations before sending a rocket to space. Those calculations are just theories. What you really need to do is let engineers just build rockets and send them to space if you want to get real info."

Now, I won't deny that theory has been wrong and that actually sending rockets to space has taught us much. In fact, lessons learned, sometimes tragically, have often influenced theory. However, it would be completely silly to say the theory has no value.

I expect game designers to understand theory and run thorough simulations on different game builds. Fortunately, since creating a game has fewer consequences than sending people to space on rockets, game designers should also play test things. However, the fact they will eventually play test things does not invalidate the theory. Math and theory are the foundation of so many technological fields and advances we've seen in our lives. It's flat wrong to say that the math and theory are wasted, pointless, incorrect, what have you.

In short, if you have 2 classes with essentially the same role (hurt mean creature with pointy stick), and the math says that one will on average hurt the mean creature 30% more, that is meaningful and should be fixed before it ever even reaches playtesting.

It's more "don't do any calculations using pre-Newtonian mathematics before sending a rocket to space.


Take your example: even if we imagine Class A and Class B as strictly being all about hurting creatures with pointy sticks, what if Class A does 60% more damage to creatures with an AC of 13 or less, while Class B does 60% more damage to creatures with an AC of 17 or more, with both of them doing equivalent damage against AC 15-16? Should it have been fixed before reaching playtesting?

Now what if Class A can only do damage to hostile creatures, and does 30% more than Class B, but Class B can impose the blinded condition as part of their attack? Should it have been fixed before reaching playtesting, too?

What if both A and B do the same damage, but A can negate one attack per turn while B is in general harder to hit but doesn't have any special protection if hit?

What if both A and B do the same damage, but A requires a Concentration effect they can do three times per long rest, while B just needs to use a two-handed weapon rather than a shield?

Etc.

Damage calculations are way more complex than one expects, and even then DPR is only a *part* of a class's toolkit. Saying that one class hurting a mean creature 30% more than its counterpart should have been fixed before playtest ignores that the rest of the toolkit exists and matters.

It also ignores that there is no "mean creature" in D&D (unless "mean" describes their mindset rather than the mathematical sense of the word), and that depending on circumstances you may be facing easily-hit-but-tough ogres, kiting goblin archers, or little para-elementals made of magma, which will all have vastly different reactions to the toolkits the different classes bring.


Do you know the reason teamwork isn’t typically accounted for in traditional DPR comparisons?

Because it's very hard to account for it?

Frogreaver
2022-03-25, 12:06 PM
People are building characters and comparing characters, rather than parties! (But even then, there are often caveats about how things are very different with a reliable source of advantage... which is implicitly a comment about teamwork potential!)

I’d say parties are mostly impossible to compare because the decision tree becomes far too complex due primarily to how magic works and we don’t actually control the other characters.

I mean, it's hard enough to get people to see the potential in the Dodge action, in-combat healing, or grappling with a tanky character, and those are relatively simple alternate actions.



Because it's very hard to account for it?

It's not really 'difficult'. It's more that there are countless variations for what characters can do. That variation leads to significantly varied results depending on the party, their spells, and their round-to-round decisions.

So when it comes to party considerations there’s just not a clear answer anymore.

MoiMagnus
2022-03-25, 12:09 PM
Do you know the reason teamwork isn’t typically accounted for in traditional DPR comparisons?

Teamwork doesn't work well if you come with the mindset of demanding that specific bonuses be granted to your character. When optimising a team, unilateral decisions tend to make you a jerk.

It makes sense that when discussing optimisation on a forum, people try to avoid accounting for external factors like:

Teamwork, both with PCs and NPCs.
Homebrew or weird GM rulings.
Magic items, to a certain degree. (Some people try to take them into account, but you can't really take them into account fully. For example, DPR comparisons rarely include the possibility of finding one of those books that permanently increase your ability scores.)

KorvinStarmast
2022-03-25, 12:15 PM
I'd say parties are mostly impossible to compare because the decision tree becomes far too complex due primarily to how magic works and we don't actually control the other characters.

We tried to put together a party optimization thread and it didn't get very far. Our problem was in trying to establish the rubric/criteria, but it was a bit of an interesting discussion.
I proposed that each optimization case needed to be bounded by party size.
One case for 3, one for 4, one for 5, but we also had to account for a number of other variables (published adventure, how much exploration pillar, how much social pillar, how much combat pillar) and, as you mention, the number of variables gets to be unwieldy.

For JNA: Rogue/ranger MC, retired at the beginning of Rogue(Scout) 5 Ranger 3(Hunter) for RP/Player preference reasons.

JNAProductions
2022-03-25, 12:18 PM
For JNA: Rogue/ranger MC, retired a Rogue 4 Ranger 3 for RP/Player preference reasons.

You can still use the spreadsheet to see what ACs are better to use it at. I get a feeling it still won't be very high.

KorvinStarmast
2022-03-25, 12:20 PM
You can still use the spreadsheet to see what ACs are better to use it at. I get a feeling it still won't be very high.

That's not very useful if you aren't fighting known ACs. Our DM didn't reveal ACs until round two or three, if ever. We had to find out by guesswork. We didn't need a spreadsheet to know that hitting is the most important thing to do, hence my deliberate search for advantage-yielding choices.

(Also, ignoring cover is a great feature of that feat, as is losing the disadvantage at long range.)

PhantomSoul
2022-03-25, 12:21 PM
That's not very useful if you aren't fighting known ACs. Our DM didn't reveal ACs until round two or three, if ever. We had to find out by guesswork.

All the better -- probably more fun that way instead of strictly just fighting a mechanic with another mechanic with no discovery or uncertainty! :)

JNAProductions
2022-03-25, 12:23 PM
That's not very useful if you aren't fighting known ACs. Our DM didn't reveal ACs until round two or three, if ever. We had to find out by guesswork. We didn't need a spreadsheet to know that hitting is the most important thing to do, hence my deliberate search for advantage-yielding choices.

Is the opponent in chain mail or better? Probably not worth using -5/+10.
Is the opponent a sluggish sponge of HP, a blob whose main defense is gobs of hit points? Use it.

Also, given how little extra damage you actually get on average (plus potential overkill) it's better to err on the side of not using it for Rogues.

I agree you generally won't have perfect info, but you can still make an educated guess.

Edit: Yeah, Sharpshooter is good for Rogues. Just not for the -5/+10 bit.

Tanarii
2022-03-25, 12:28 PM
I just plotted it in AnyDice. I used the die {0,0,0,0,6,7,8,9,10,11} to represent the potential results of rolling 1d6+5 at a 60% chance to hit, and added 20 of those together.

The result was a distribution with a mean of 102, which is 12*8.5 (20 attacks x 60% hit chance x 8.5 average damage), as expected. That mean does correspond to the most frequent value, as expected. However, the distribution has a range of 0-220, so the mean is not in the middle of the range: the support extends farther above the mean than below it, and the distribution is not symmetrical around the mean.

So it's not actually a normal distribution, but it is fairly close.
https://anydice.com/program/27d82
output 20d((1d20 >= 9)*(1d6+5))
Avg 102.00 / Std Dev 19.54
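The same numbers can be ground out exactly without AnyDice. A small Python sketch that convolves the per-attack die above with itself 20 times:

```python
from collections import defaultdict
from math import sqrt

# Per-attack die from the AnyDice program above:
# 40% miss -> 0, otherwise 1d6+5 -> 6..11 at 10% each.
single = {0: 0.4}
for dmg in range(6, 12):
    single[dmg] = 0.1

def convolve(a, b):
    """Distribution of the sum of two independent discrete variables."""
    out = defaultdict(float)
    for x, px in a.items():
        for y, py in b.items():
            out[x + y] += px * py
    return dict(out)

# Add 20 independent attacks together.
total = {0: 1.0}
for _ in range(20):
    total = convolve(total, single)

mean = sum(v * p for v, p in total.items())
sd = sqrt(sum((v - mean) ** 2 * p for v, p in total.items()))
print(round(mean, 2), round(sd, 2))  # 102.0 19.54, matching AnyDice
```

The full `total` dict also lets you read off tail probabilities directly, e.g. the chance of a 20-round day landing below 80 total damage.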

PhoenixPhyre
2022-03-25, 12:34 PM
It's not really 'difficult'. It's more that there are countless variations in what characters can do. That variation leads to significantly varied results depending on the party, their spells, and their round-to-round decisions.

So when it comes to party considerations there’s just not a clear answer anymore.

So if I model a cow as a uniform spherical mass in a vacuum...

And this is the core of the issue. Claiming that your model measures what matters while leaving out the largest sources of variation because "computationally intractable" or because there isn't a "clear answer" if you include it is the same as saying that your model doesn't accurately model the space.

And this is a major problem in the world of mathematical modeling. Knowing what things you can simplify and which you can't, which approximations you can make safely is the entire hard part. And the entire valuable part.

My view of DPR calculations is that while they give clear answers, those answers aren't actually applicable to real play. And don't provide actionable information outside of a white room. As soon as you introduce back in any of the rejected complexity, those clear answers and actionable information go away. Leaving you having wasted most of your time and gotten a false impression of the actual space at issue.

I've done the white-room analysis for a lot of my homebrew. But when it hit the table, it turned out that none of the math that relies on those artificial and constraining assumptions actually meant anything. It doesn't matter if you can prove that X does 0.5 DPR more than Y in a certain set of circumstances if
1) the natural, inescapable, non-random (ie not caused by the dice, so it's not gotten rid of by taking more samples) variation in the system produces swings of +- 20 DPR in ways that aren't correlated with X or Y
2) that set of circumstances only comes up when more DPR doesn't matter. And there are lots of cases where DPR just flat doesn't matter. Any encounter more complex than "we fight until one of us is dead" (which should be the vast majority of encounters in a good game) is only mildly sensitive to changes in DPR.

In my experience now running multiple games a week for a range of players over the last 6-7 years in 5e, cases #1 and #2 dominate.

DPR analysis can help to reject strong outliers. Cases where X does 100% more (or less) DPR with a "realistic" set of assumptions. But often you can get those same results with 1% of the work. If you like doing DPR math, more power to you. Just don't claim it is necessary. When most of what it does is tell new players that they need to follow the net-builds or they'll be bad and have a bad time.

The wonder of 5e D&D is that balance is about perception, not reality, in the ultra-vast majority of cases. And perception is individual. When the community is full of claims that X is OP (because it does N% more DPR under assumptions ABC), the parentheticals and nuance get lost really fast. All that remains is a zeitgeist of "X is OP" or "Y is UP". And that focuses the playerbase on the things that matter if and only if people put heavy subjective weight on them.

D&D is not engineering. It's closer to an art than a science. Techniques that work really well in one area need to be evaluated before being used in another. More math =/= more knowledge; not all areas of the world are reducible to numbers. And in many cases, throwing math at something that isn't reducible to such numbers in a clean way just makes it worse. It's cargo cult science--having the form thereof but denying the power. I'm speaking as someone who has a PhD in a hard science and who taught science (chemistry and physics) for a decade. It's physics envy--the idea that cloaking your subjective topic in a shroud of math makes it more rigorous by trying to borrow the prestige of the few sciences that really are reducible almost entirely to math[1]. When what it does is provide a veneer of objectivity and conceal all of the real power.

[1] even chemistry isn't really reducible to math in the same way physics is. Because once you start moving beyond the simplest systems, the interactions make the math intractable. So most of chemistry relies on empirical evidence and phenomenology rather than first-principles mathematical analysis. Chemical "laws" are more "well, when we did it, it worked this way. Except for cases X, Y, Z, ..., where it worked very differently." And going beyond chemistry, first-principles analysis is less and less valuable. Even most engineering uses established rules of thumb, heuristics, and books and books of empirically-derived numbers rather than actually doing the real math.

Frogreaver
2022-03-25, 01:05 PM
So if I model a cow as a uniform spherical mass in a vacuum...

And this is the core of the issue. Claiming that your model measures what matters while leaving out the largest sources of variation because "computationally intractable" or because there isn't a "clear answer" if you include it is the same as saying that your model doesn't accurately model the space.

DPR works best when examining fairly similar builds. That's a generally understood principle. The further the builds diverge in terms of range, mobility, survivability, etc., the less valuable straight DPR comparisons are.

To highlight this, consider a sword-and-shield Paladin vs. a PAM spear-and-shield Paladin. Most everything is the same (or can be). DPR differences should then be a very meaningful comparison point (not perfect, but useful).


And this is a major problem in the world of mathematical modeling. Knowing what things you can simplify and which you can't, which approximations you can make safely is the entire hard part. And the entire valuable part.

Or which approximations can be useful despite being imperfect.


My view of DPR calculations is that while they give clear answers, those answers aren't actually applicable to real play. And don't provide actionable information outside of a white room. As soon as you introduce back in any of the rejected complexity, those clear answers and actionable information go away. Leaving you having wasted most of your time and gotten a false impression of the actual space at issue.

I don't think you actually pay attention to actual DPR discussions. Consider a longbow-using character doing 15 DPR and a rapier-using character with 20 DPR. DPR doesn't tell us which character is better or will do more damage in a campaign, because it doesn't factor in the melee vs. range difference. This is well known and something often discussed. DPR has never been the end-all-be-all comparative tool you act like others believe it is. Even its advocates don't agree with that straw man.


https://anydice.com/program/27d82
output 20d((1d20 >= 9)*(1d6+5))
Avg 102.00 / Std Dev 19.54

Much simpler than my input. Same graph.

Tanarii
2022-03-25, 01:10 PM
Much simpler than my input. Same graph.
Yup, the first was the reason I shared it. And the second I wasn't sure about. Good to have confirmed.

ZRN
2022-03-25, 01:12 PM
Take your example: even if we imagine Class A and Class B as strictly being all about hurting creatures with pointy sticks, what if Class A does 60% more damage to creatures with an AC of 13 or less, while Class B does 60% more damage to creatures with an AC of 17 or more, with both of them doing equivalent damage against AC 15-16? Should it have been fixed before reaching playtesting?


The implication here is that it's a bad design policy to rely on average DPR as a primary metric of how "good" a class is. I mean, sure, fine. Are any professional designers I should care about actually doing this?

As a player, precisely because my own rolls are only going to give me a small and unrepresentative sample size to work from, knowing the averages can be helpful in figuring out what classes/powers/weapons/techniques/etc. are effective, and when.

Malkavia
2022-03-25, 01:14 PM
It's more "don't do any calculations using pre-Newtonian mathematics before sending a rocket to space."


Take your example: even if we imagine Class A and Class B as strictly being all about hurting creatures with pointy sticks, what if Class A does 60% more damage to creatures with an AC of 13 or less, while Class B does 60% more damage to creatures with an AC of 17 or more, with both of them doing equivalent damage against AC 15-16? Should it have been fixed before reaching playtesting?

It should be weighed against what the game designer expects the PCs to fight. If we put level 1 characters against something with 40 AC, then the Hexblade will be doing double the damage of anyone else, because they can hit on a 19-20 instead of just a 20. That's of course not what I'm suggesting. However, if the expected AC range is 12-14 at that level, then you do the math for that AC range.



Now what if Class A can only do damage to hostile creatures, and does 30% more than Class B, but Class B can impose the blinded condition as part of their attack? Should it have been fixed before reaching playtesting, too?

What if both A and B do the same damage, but A can negate one attack per turn while B is in general harder to hit but doesn't have any special protection if hit?

We don't disagree here. Comparing DPR is only a measure of DPR, not other utility. I tried to limit my comparison to two characters whose primary job is to stab things with pointy sticks.



What if both A and B do the same damage, but A requires a Concentration effect they can do three times per long rest, while B just needs to use a two-handed weapon rather than a shield?

I think a lot of mathy people here actually do a good job of accounting for this by assuming X number of encounters that last for approximately Y number of rounds with Z number of short rests per long rest. The game designers at the very least pretend that there's 6-8 encounters per day with 2-3 short rests. I'll ignore that I think they're way off with that, but that's how they'd base their math (and maybe they did, and this is the source of much of the imbalance we see today).



Damage calculations are way more complex than one expects, and even then DPR is only a *part* of a class's toolkit. Saying that one class hurting a mean creature 30% more than its counterpart should have been fixed before playtest is ignoring how the rest of the toolkit exists and matters.

It also ignores that there is no "mean creature" in D&D (unless "mean" is to describe their mindset rather than the mathematical meaning of the word), and that depending on circumstances you may be facing easily-hit-but-tough ogres or kiting goblin archers or little para-elementals made of magma, which will all have vastly different reactions to the toolkit the different classes bring.

Sorry, yes: "mean" meant their mindset. Have you looked at LudicSavant's DPR calculator? It actually gives you the ability to compare many of the things you're talking about, such as different AC, advantage, chance to hit, bless or bane, etc.

I do take your point, as I understand it, about how overly "white room" many DPR calculations can be. So, let me agree with you that game designers should test a few different scenarios like the ones you've listed here. However, I'm pretty sure that if you took a Crossbow Expert Sharpshooter Battle Master through all of the examples you just gave, you'd find it grossly outperforming most of the contenders (on DPR. Again, I'm just suggesting that DPR should be compared against utility). I think if the game designers had done any sort of DPR comparisons using those feats, they wouldn't be the powerhouses they are today.

All that to say, I think DPR comparisons using averages are an effective tool for comparing the ability of different builds to do damage. Using averages in no way makes these tools ineffective.

Amechra
2022-03-25, 01:20 PM
even chemistry isn't really reducible to math in the same way physics is. Because once you start moving beyond the simplest systems, the interactions make the math intractable. So most of chemistry relies on empirical evidence and phenomenology rather than first-principles mathematical analysis. Chemical "laws" are more "well, when we did it, it worked this way. Except for cases X, Y, Z, ..., where it worked very differently." And going beyond chemistry, first-principles analysis is less and less valuable. Even most engineering uses established rules of thumb, heuristics, and books and books of empirically-derived numbers rather than actually doing the real math.

And we mathematicians like it that way! Stop trying to drag our work into that mucky "real world" you keep telling us about!

Mathematics is more of an art than a science, anyways. :smallwink:

PhoenixPhyre
2022-03-25, 01:50 PM
And we mathematicians like it that way! Stop trying to drag our work into that mucky "real world" you keep telling us about!

Mathematics is more of an art than a science, anyways. :smallwink:

Exactly.


To be honest, part of my resistance to this whole thing comes from many years being told we were supposed to be "data driven" (and by data they meant numbers and metrics) in my teaching. Which is great. I believe in measuring things, seeing what is working, and then adjusting. Actually running (small-scale) experiments. Knowing that it wouldn't necessarily generalize--what improves my teaching, for my subjects and my classes and my students may or may not work for someone else. But if I can improve, that's great.

But what the administration[1] meant by "data driven" is "swallow whatever numbers were being pumped by the studies done by people with particular agendas and strong biases, under circumstances that didn't look anything like what I was using and that fit the administration's prior beliefs and radically upend my teaching with every new fad." And then not actually try to decide how we'd know if it worked for us. In fact, I got one of the most weaselly, BS answers I've ever gotten (and I spent lots of time in academia, where BS answers are the norm) when I asked "how will we measure if <change at hand> worked? How will we prevent endlessly changing things just to change things?" It was clear that the administration didn't care one bit for doing a better job, what they cared about was being able to say that we were "following the research" and "being at the cutting edge."

Numbers and metrics have their place. But a bad metric isn't just a waste of time, it's actively harmful. And knowing if you have a good metric is way harder than assessing whether you can calculate the metric.

To quote a famous scientist:


“Your scientists were so preoccupied with whether they could, they didn't stop to think if they should.”

strangebloke
2022-03-25, 03:40 PM
Using the white room analysis to declare that X martial feat is OP is where my distaste for overreliance on DPR figuring begins ... and it's a symptom of a 'martials can't have nice things' attitude.

I have watched SS, as but one example, in play provide both big nova spikes of damage and be an utter disaster when the rogue/archer kept missing.
During the same session.

I do understand how advantage helps, so declaring me 'anti math' due to my distaste for the DPR zealotry is a mistake.

Here's the deal: I am a good teammate. I got with my nephew before the next session and we worked through ways I could set up advantage for his PC; and of course that made for more nova spikes, and that was an artifact of teamwork, not this empty DPR à outrance stuff. I say it again: we made it an explicit point to work together to leverage the potential for a nova, not the guarantee of some average damage accrual.

I mean, SS and GWM are OP compared to other options available to martials. They're also very good for casters; casters just have better options overall, so it stands out less. I'm in favor of making other options like, say, Shield Master better, or at least going with more favorable readings of them, and I'm also in favor of giving martials more options in general.

But while I pretty much agree with you that fixating on DPR to the exclusion of all else is really dumb, that's not the same thing as saying that DPR calculation is inherently without value. If nothing else people enjoy doing it.

Yakk
2022-03-25, 03:50 PM
I just plugged it into AnyDice. I used the die {0,0,0,0,6,7,8,9,10,11} to represent the potential results of rolling 1d6+5 at a 60% chance to hit, and added that together 20 times.

The result was a distribution with a mean of 102, which is 20 × 0.6 × 8.5 as expected. That mean does correspond to the most frequent value of the distribution, as expected. However, the distribution has a range of 0-220, so the mean is not in the middle of the range, and the distribution is thus not actually symmetrical around the mean. There's a longer tail to the right of the mean than to the left.

So it's not actually a normal distribution, but it is fairly close.
Yes, the central limit theorem doesn't say "it will be a normal distribution", just "you won't care because it gets close".

(And yes, that isn't what the CLT says; but it is really, really hard to avoid it happening.)

The mental model I like to use is "how hard would it be, if you could repeat the scenario as many times as you want, to tell it wasn't a normal distribution?" I.e., how many repeats would it typically take to detect (with 2-sigma certainty) that the observed distribution (total damage done) wasn't normal.

I'd guess many, many, many 1000s of games here.

You can make it even harder to detect if all you are allowed to see is the narrative results; "Character attacked" and "monster killed" times, no "hit/miss" or even damage rolls.

That is my gold standard; how hard is it to detect the narrative impact of something?

Like, imagine you gave fighters +1d6 damage on every weapon swing at level 5. Could you tell the difference between a fight with that fighter in it, and one without it, just looking at when/if monsters die?
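A toy Monte Carlo along those lines, with all numbers invented for illustration: a 60 HP monster, two attacks per round at 65% to hit for d8+4, with and without the bonus d6:

```python
import random

# Toy simulation: judge fights only by which round the monster drops.
# All numbers invented: 60 HP monster, two attacks per round at 65% for d8+4.
def rounds_to_kill(hp, attacks=2, hit=0.65, bonus_d6=False):
    rounds = 0
    while hp > 0:
        rounds += 1
        for _ in range(attacks):
            if random.random() < hit:
                dmg = random.randint(1, 8) + 4
                if bonus_d6:
                    dmg += random.randint(1, 6)
                hp -= dmg
    return rounds

random.seed(1)
trials = 10_000
base = [rounds_to_kill(60) for _ in range(trials)]
boosted = [rounds_to_kill(60, bonus_d6=True) for _ in range(trials)]
print(sum(base) / trials, sum(boosted) / trials)
```

The averages separate by a round or so, but the two sets of single-fight results overlap heavily, so any one fight tells an observer very little about whether the bonus die was in play.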

Frogreaver
2022-03-25, 04:19 PM
Yes, the central limit theorem doesn't say "it will be a normal distribution", just "you won't care because it gets close".

The mean of each d6+5 at 60% is 5.1. The range increases by 0 to 11 each time, and the midpoint of that range is 5.5.

No matter how many times you run it, the distribution stays skewed in the same direction: the mean never sits at the midpoint of the possible range. It smooths out, but the symmetry is never exact.

The CLT smooths the curve; the skew only shrinks slowly, it never fully disappears.

Dark.Revenant
2022-03-25, 04:27 PM
Generally, in play, players notice when the imbalance of power is greater than a +/- 25% error margin. I use that 25% figure as a rule of thumb. 25% more defenses, 25% more damage, etc. (when comparing apples-to-apples) is the smallest meaningful margin in real play, since the trends will become apparent over time. I have a whole system for scaling and balancing encounters that uses that 25% metric, and it works wonderfully.

Lots of smaller factors can, in aggregate, combine to >25%. +1 attack or damage is a blip. +1 HP per level is a blip. +1 AC is a blip. All of those combined? Now you might be getting somewhere. It's hard to "notice" that sort of thing during play, not without a ton of sessions, but you can estimate the effects with modeling.
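A back-of-envelope version of the blips combining, with toy numbers (60% hit chance on both sides, 8.5 average damage per hit):

```python
# Toy numbers: 60% hit chance on both sides, 8.5 average damage per hit.
hit, avg_dmg = 0.60, 8.5

# +1 to hit and +1 damage on offense.
offense_gain = ((hit + 0.05) * (avg_dmg + 1)) / (hit * avg_dmg)

# +1 AC: the enemy's 60% hit chance drops to 55%.
enemy_hit = 0.60
defense_gain = enemy_hit / (enemy_hit - 0.05)

combined = offense_gain * defense_gain
print(round(offense_gain, 2), round(defense_gain, 2), round(combined, 2))  # 1.21 1.09 1.32
```

Each blip alone sits under the 25% threshold; multiplied together they clear it, and that's before even counting the +1 HP per level.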

strangebloke
2022-03-25, 06:06 PM
At some point this just comes back to "Don't tell people they are Playing Wrong."

DPR calcs are fun for people. People who do DPR calcs broadly know that napkin math isn't getting you a complete picture, but they like thinking about what the approximate differences are under idealized conditions anyway, and how various factors play into it. You can inspect how important short rests are to monks vs. warlocks vs. fighters. You can math out when the best time to sharpshoot is. And while play conditions make the results vary by a lot, and while variance means that even a strong build can vastly underperform (or a weak build can overperform), and while everyone knows there are loads of things that matter besides DPR (everyone agrees bards, who don't deal good damage, are strong), this doesn't mean that people, particularly those who play martials, don't enjoy optimizing and theorycrafting.

There is nobody coming to your house, beating you upside the head with an abacus, and saying "Pay more attention to DPR Math!!!" People talking about DPR math are on the internet in their own threads, and if you interact with them it's by your choice.

It's fun.

Let people have fun.

PhoenixPhyre
2022-03-25, 06:24 PM
At some point this just comes back to "Don't tell people they are Playing Wrong."

DPR calcs are fun for people. People who do DPR calcs broadly know that napkin math isn't getting you a complete picture, but they like thinking about what the approximate differences are under idealized conditions anyway, and how various factors play into it. You can inspect how important short rests are to monks vs. warlocks vs. fighters. You can math out when the best time to sharpshoot is. And while play conditions make the results vary by a lot, and while variance means that even a strong build can vastly underperform (or a weak build can overperform), and while everyone knows there are loads of things that matter besides DPR (everyone agrees bards, who don't deal good damage, are strong), this doesn't mean that people, particularly those who play martials, don't enjoy optimizing and theorycrafting.

There is nobody coming to your house, beating you upside the head with an abacus, and saying "Pay more attention to DPR Math!!!" People talking about DPR math are on the internet in their own threads, and if you interact with them it's by your choice.

It's fun.

Let people have fun.

Except much of the time, it's DPR Math people shoving their way into other threads demanding that we accept their infallible judgement that DPR proves that X is the only way to go.

If you want to play with your math, do it like everyone else does. In private.

Unoriginal
2022-03-25, 06:28 PM
At some point this just comes back to "Don't tell people they are Playing Wrong."

DPR calcs are fun for people. People who do DPR calcs broadly know that napkin math isn't getting you a complete picture, but they like thinking about what the approximate differences are under idealized conditions anyway, and how various factors play into it. You can inspect how important short rests are to monks vs. warlocks vs. fighters. You can math out when the best time to sharpshoot is. And while play conditions make the results vary by a lot, and while variance means that even a strong build can vastly underperform (or a weak build can overperform), and while everyone knows there are loads of things that matter besides DPR (everyone agrees bards, who don't deal good damage, are strong), this doesn't mean that people, particularly those who play martials, don't enjoy optimizing and theorycrafting.

There is nobody coming to your house, beating you upside the head with an abacus, and saying "Pay more attention to DPR Math!!!" People talking about DPR math are on the internet in their own threads, and if you interact with them it's by your choice.

It's fun.

Let people have fun.


Except much of the time, it's DPR Math people shoving their way into other threads demanding that we accept their infallible judgement that DPR proves that X is the only way to go.

If you want to play with your math, do it like everyone else does. In private.

Also, while it's not true for everyone's work obviously, several of those calculations are plainly misinformation.

Misinformation should be called out, even if "it's for fun".

Gignere
2022-03-25, 06:44 PM
Except much of the time, it's DPR Math people shoving their way into other threads demanding that we accept their infallible judgement that DPR proves that X is the only way to go.

If you want to play with your math, do it like everyone else does. In private.

Except there are posters pushing obviously wrong claims, like that the -5/+10 SS feature is great on a rogue.

When you tell them it's a DPR loss, they say their rogue at the table does great damage with SS. I feel this is worse than the DPR math guys just showing their work when they make a recommendation.

PhoenixPhyre
2022-03-25, 06:49 PM
Also, while it's not true for everyone's work obviously, several of those calculations are plainly misinformation.

Misinformation should be called out, even if "it's for fun".

Or are based on very biased sets of assumptions.


Except there are posters pushing obviously wrong claims, like that the -5/+10 SS feature is great on a rogue.

When you tell them it's a DPR loss, they say their rogue at the table does great damage with SS. I feel this is worse than the DPR math guys just showing their work when they make a recommendation.

Under what assumptions? Are those assumptions generalizable? Because you're narrowing the whole thing down and insisting that your concept of what is great is the only possible valid one and anyone who doesn't bow to the spreadsheet gods is playing wrong/having fun wrong.

I know a bunch of rogues who felt more powerful when they hit with that feature up. Perception, in D&D, is more powerful than reality. Because of a lot of things. Basically, the math doesn't matter for how most people play. And trying to shove that in their face and tell them they're wrong for feeling that way is exactly what I was talking about. It's the cult of (bad) math triumphant.

Gignere
2022-03-25, 07:01 PM
Or are based on very biased sets of assumptions.



Under what assumptions? Are those assumptions generalizable? Because you're narrowing the whole thing down and insisting that your concept of what is great is the only possible valid one and anyone who doesn't bow to the spreadsheet gods is playing wrong/having fun wrong.

I know a bunch of rogues who felt more powerful when they hit with that feature up. Perception, in D&D, is more powerful than reality. Because of a lot of things. Basically, the math doesn't matter for how most people play. And trying to shove that in their face and tell them they're wrong for feeling that way is exactly what I was talking about. It's the cult of (bad) math triumphant.

No, they are making a DPR recommendation based on perception, and I feel that is wrong. Especially when someone makes a thread like "how do I improve my rogue's damage?": should a poster recommend SS for the -5/+10 just because they perceive it to be better damage? I think this is wrong when someone is coming to ask for help getting better damage.

strangebloke
2022-03-25, 07:24 PM
Except much of the time, it's DPR Math people shoving their way into other threads demanding that we accept their infallible judgement that DPR proves that X is the only way to go.

If you want to play with your math, do it like everyone else does. In private.

It's a public forum, dude; people are going to talk about things they find interesting, and DPR calcs and theorycrafting are basically the foundation of discussion on this site. It's one of the few ways to move a discussion onward after people have compared anecdotes and still disagree. Usually after a few exchanges it's easy to figure out why people had such different results, because in the process of doing the math, basic assumptions like 'how many encounters' or 'how many rests' or 'how high of AC' will come up, and those assumptions will tell you what you need to know.

In some cases people are obtuse and overly confident about their points, or are bringing in DPR calcs when they're unwarranted...

But at some point this is just the remixed Stormwind fallacy. "People optimizing aren't roleplaying" -> "People talking DPR aren't weighting real play experience." It's kind of arrogant to assume that, isn't it? I play in three campaigns a week, one of which I DM, and that's been more or less normal for me since 5e came out. I do weight my play experience very highly, and if there's a discrepancy between my play experience and my math, that's interesting and I'll try to find out why. You're just arbitrarily assuming the worst of other posters. You're assuming that your anecdotes are superior to other people's anecdotes plus other people's math.

Good math is informed heavily by play experience, and math can explain to you why things played out the way they did in campaign.


Also, while it's not true for everyone's work obviously, several of those calculations are plainly misinformation.

Misinformation should be called out, even if "it's for fun".

I mean yeah, obviously, who's disagreeing with that? Nobody's saying you can't call out people who are wrong. But math is sorta how you do that? If someone says "bro, I played a champion and I was awesome" they're pretty obviously wrong, but without some math how can you actually say they're wrong?

PhoenixPhyre
2022-03-25, 07:34 PM
No, they are making a DPR recommendation based on perception, and I feel that is wrong. Especially when someone makes a thread like "how do I improve my rogue's damage?": should a poster recommend SS for the -5/+10 just because they perceive it to be better damage? I think this is wrong when someone is coming to ask for help getting better damage.
Maybe they value different things? Not everyone prioritizes average values. It's totally rational to prioritize maximum values instead, in which case -5/+10 is great. You don't do that, but that doesn't make them obviously wrong. Unless you believe there is an objective set of values. Which is exactly the problem here.

Unoriginal
2022-03-25, 07:40 PM
If someone says "bro, I played a champion and I was awesome" they're pretty obviously wrong, but without some math how can you actually say they're wrong?

I'll let someone other than me respond to that more eloquently than I could...



At some point this just comes back to "Don't tell people they are Playing Wrong."

[...]

It's fun.

Let people have fun.

strangebloke
2022-03-25, 07:45 PM
Maybe they value different things? Not everyone prioritizes average values. It's totally rational to prioritize maximum values instead, in which case -5/+10 is great. You don't do that, but that doesn't make them obviously wrong. Unless you believe there is an objective set of values. Which is exactly the problem here.

I mean people will specifically make the claim that this raises average damage, which, yeah, depending on the level may be provably wrong.

Frogreaver
2022-03-25, 07:46 PM
Maybe they value different things? Not everyone prioritizes average values. It's totally rational to prioritize maximum values instead, in which case -5/+10 is great. You don't do that, but that doesn't make them obviously wrong. Unless you believe there is an objective set of values. Which is exactly the problem here.

It's not wrong to value whatever they want to. It's wrong to believe that something like maximum values correlates better with damage performance than treating damage as an expected-value problem. That obviously wrong misconception can be pointed out without telling someone they should value one thing over the other.

PhoenixPhyre
2022-03-25, 07:54 PM
I mean people will specifically make the claim that this raises average damage, which, yeah, depending on the level may be provably wrong.

If and only if you're making the same assumptions they do about what goes into things. Context matters, and DPR calculations are notoriously context-blind on purpose. Yes, if you match scenarios exactly (and you're in a scenario the spreadsheets can handle), you can test that hypothesis. But saying it's "obviously wrong" is hubris.


It's not wrong to value whatever they want to. It's wrong to believe that something like maximum values correlates better with damage performance than treating damage as an expected-value problem. That obviously wrong misconception can be pointed out without telling someone they should value one thing over the other.

Depends on what you mean by "damage performance". That's a values/definition question. And a subjective one at that. Telling someone that they should use your definition (with all the smuggled-in assumptions about what matters) is telling someone they should value one thing over another. By its very nature.

strangebloke
2022-03-25, 08:00 PM
If and only if you're making the same assumptions they do about what goes into things. Context matters, and the DPR calculations are notoriously context blind on purpose. Yes, if you match scenarios exactly (and you're in a scenario that the spreadsheets can handle), you can test that hypothesis. But saying it's "obviously wrong" is hubris.

The bolded is only true if you're really, really bad at DPR calcs. In a good calc, assumptions will be stated and agreed upon by the various parties. If one side disagrees with those assumptions or thinks they're unfavorable or biased, they'll bring it up. The math itself is usually very simple compared to establishing the context.

Also is there a reason you're ignoring all my previous posts???

Malkavia
2022-03-25, 09:16 PM
We’re reaching a point where if someone asks “Is A or B better?” we should only reply with anecdotes and perceptions. We should keep our rude math private.

/sarcasm

Unoriginal
2022-03-25, 09:38 PM
We’re reaching a point where if someone asks “Is A or B better?” we should only reply with anecdotes and perceptions. We should keep our rude math private.

/sarcasm

Every single question of "is A or B better?" on this forum or any 5e forum has only ever been replied to with anecdotes and perceptions.

Because what is better is a perception question, and math that does not take every relevant factor into account (which, given how many factors there can be in the game, is not possible without a ridiculously wide research project) is an anecdote.

I'm sure you can find people who will argue that d8s for hit points are better than d12s, because X or Y.

Malkavia
2022-03-25, 09:52 PM
Every single question of "is A or B better?" on this forum or any 5e forum has only ever been replied to with anecdotes and perceptions.

Because what is better is a perception question, and math that does not take every relevant factor into account (which, given how many factors there can be in the game, is not possible without a ridiculously wide research project) is an anecdote.

I'm sure you can find people who will argue that d8s for hit points are better than d12s, because X or Y.

At risk of giving too simple of an example, are you suggesting that if someone asks the question “I’m a level 3 fighter with 16 strength. Would it be better for me to use a long sword or a short sword with my shield?”, that we cannot answer this question with math? That perceptions are better?

“Last session I rolled all 6’s with my short sword whereas my friend never rolled above 5 with his long sword. Short swords are the way to go!” I must be misunderstanding your point.
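For the simple question above, the math is short enough to write down. A sketch of the expected damage per attack, taking the weapons and modifiers from the example; the target AC of 14 is my own assumption for illustration:

```python
def hit_chance(attack_bonus, ac):
    """Chance a d20 attack hits: nat 1 always misses, nat 20 always hits."""
    need = max(2, min(20, ac - attack_bonus))  # minimum d20 face that hits
    return (21 - need) / 20

def expected_damage(die_avg, mod, attack_bonus, ac):
    """Average damage per attack; a nat 20 (5%) adds one extra weapon die."""
    return hit_chance(attack_bonus, ac) * (die_avg + mod) + 0.05 * die_avg

# Level 3 fighter: Str 16 (+3) and proficiency +2 give +5 to hit.
longsword = expected_damage(4.5, 3, 5, 14)   # 1d8, one-handed with a shield
shortsword = expected_damage(3.5, 3, 5, 14)  # 1d6
print(f"longsword {longsword:.3f} vs shortsword {shortsword:.3f}")
```

The longsword comes out ahead by a flat 0.65 per attack at any AC where both weapons need the same roll, which is exactly the kind of question averages answer cleanly.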

Asisreo1
2022-03-25, 10:02 PM
Every single question of "is A or B better?" on this forum or any 5e forum has only ever been replied to with anecdotes and perceptions.

Because what is better is a perception question, and math that does not take every relevant factor into account (which, given how many factors there can be in the game, is not possible without a ridiculously wide research project) is an anecdote.

I'm sure you can find people who will argue that d8s for hit points are better than d12s, because X or Y.


At risk of giving too simple of an example, are you suggesting that if someone asks the question “I’m a level 3 fighter with 16 strength. Would it be better for me to use a long sword or a short sword with my shield?”, that we cannot answer this question with math? That perceptions are better?

“Last session I rolled all 6’s with my short sword whereas my friend never rolled above 5 with his long sword. Short swords are the way to go!”
These discussions tend to forgo any reasonable nuance.

Math is good when you're making quantitative comparisons. Things like "which weapon does more damage with the same cost" or "How much more powerful am I with this ability?"

Anecdotes/playtests are good when you're looking at qualitative comparisons or abstract concepts. Things like "Which utility spell should I take" or "How should I approach this encounter?"

In general, both have their place in these discussions and neither should be completely dismissed. But it's important to use them when applicable and not try to mix them too much.

For a more real example, imagine a Bard choosing between a rapier and a mace. The bard will probably want the rapier if their Dex is the higher stat, since it does more damage and has a better to-hit. No need to playtest that.

Now imagine discussing whether bard is a better support than cleric. You can quantify some things like healing, but ultimately, if you want to know whether Font of Inspiration or the more powerful cleric support spells are better for the purposes of support, you need to see them in action and test their effectiveness.
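The rapier-versus-mace case is simple enough to check either way. Here is a quick Monte Carlo sketch; the bard's stats (Dex 16, Str 10, proficiency +2) and the AC 13 target are all my own assumptions for illustration:

```python
import random

def avg_damage(die, mod, attack_bonus, ac, trials=200_000):
    """Monte Carlo estimate of damage per attack (nat 1 misses, nat 20 crits)."""
    total = 0
    for _ in range(trials):
        roll = random.randint(1, 20)
        crit = roll == 20
        if roll > 1 and (crit or roll + attack_bonus >= ac):
            dice = 2 if crit else 1  # a crit doubles the weapon dice
            total += sum(random.randint(1, die) for _ in range(dice)) + mod
    return total / trials

random.seed(0)
print(f"rapier {avg_damage(8, 3, 5, 13):.2f}")  # finesse: Dex to hit and damage
print(f"mace   {avg_damage(6, 0, 2, 13):.2f}")  # Str-based
```

No playtest needed: the rapier wins on die size, damage modifier, and to-hit simultaneously, so the simulation only confirms what the quantitative comparison already tells you.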

Hael
2022-03-26, 05:56 AM
There is this notion going around that b/c we need to make assumptions before making a calculation, it's therefore all invalid; that it's all just pie-in-the-sky mental masturbation divorced from reality.

Nothing could be further from the truth. Just b/c we use spherical cows to make calculations simpler doesn't mean those solutions are any less useful, or that they don't capture something truly fundamental about the nature of the problem.

Some of these metrics can be evaluated exactly, with minimal assumptions. For instance, a battlemaster is going to outdamage a champion in all but the most extreme scenarios. The few exceptions are easy to identify and catalogue, and despite what people say this does tell us something about the nature of the classes' relative power. When the metrics aren't exactly solvable, well, that doesn't mean there isn't an actual answer. It just means you need to get a little more sophisticated with the mathematics, and often that's not even very hard to do.
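The battlemaster/champion comparison can be sketched in a few lines. The assumptions here are all mine: level 3, greatsword (2d6), Str 16, +5 to hit against an AC 14 target, and superiority dice treated as never running out (which flatters the battlemaster, and is exactly the kind of context that has to be stated up front):

```python
P_HIT = 12 / 20    # need a 9+ on the d20 vs AC 14 (includes the crit faces)
GREATSWORD = 7.0   # average of 2d6

def per_attack(crit_faces, rider=0.0):
    """Expected damage per attack; crits add one extra set of weapon dice."""
    return P_HIT * (GREATSWORD + 3 + rider) + (crit_faces / 20) * GREATSWORD

champion = per_attack(crit_faces=2)                 # Improved Critical: 19-20
battlemaster = per_attack(crit_faces=1, rider=4.5)  # d8 superiority die on each hit
print(f"champion {champion:.2f} vs battlemaster {battlemaster:.2f}")
```

The gap closes once the four superiority dice are spent, so the exceptions Hael mentions (very long days, very high ACs) are easy to enumerate from the same model.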

Corran
2022-03-26, 06:01 AM
Math is good when you're making quantitative comparisons. Things like "which weapon does more damage with the same cost" or "How much more powerful am I with this ability?"

Anecdotes/playtests are good when you're looking at qualitative comparisons or abstract concepts. Things like "Which utility spell should I take" or "How should I approach this encounter?"

In general, both have their place in these discussions and neither should be completely dismissed. But it's important to use them when applicable and not try to mix them too much.
Mixing them is where it gets tricky, but you definitely want to do that. (Playtesting) experience is what helps you set better parameters so that your math gets closer to what you'll get in play. Modeling and approximating things is definitely not inherently wrong; you just need to be careful, leave appropriate room for error, and know when it's time to abandon your model. People have been doing this for a long time in economics, where models try to take even human behaviour into account. The fact that a model may not handle extreme situations well does not mean it will not work well enough the rest of the time, and every failure can either inform the model (even if just by finding its limits) or give us the tools to replace it with a new and improved one.

In the example of the cleric and the bard, I can use analysis to inform my opinion. For example, I could pick 3 encounters. The first is the kind of encounter I would have in mind when thinking about the usefulness of bardic inspiration. The second is the kind of encounter I would have in mind when thinking about the usefulness of whatever cleric spell I wouldn't have on the bard and think I would miss. And finally a third encounter for which neither the bard nor the cleric has a direct counter but which is still difficult (so maybe it's testing tankiness, healing and damage). Then I run a simulation and get some results about the effectiveness of whatever it is I am looking at. Perhaps I even vary resources by pseudo-randomly removing some at the start of every encounter simulation.

Experience informs the types of encounters you'll pick, because they need to be representative of what you want them to be. Experience also informs the action sequence with which your encounters are structured, because once again it has to be representative of what you want it to be (are you modelling your actual group, or a hypothetical one; how much room for human error does your model allow?). And experience also informs how much weight you will give to each one of the results.

You don't need to live and die by the results you'll get, but you will probably draw some valuable conclusions from them. Perhaps I'll find that holy aura is not as much better than bardic inspiration as I thought it to be, or maybe I'll find that the cleric's extra healing does not amount to as much as I thought it would under the parameters set, etc.
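As a concrete (and deliberately crude) version of that idea, here is a toy encounter simulation. Every parameter is a made-up assumption chosen for illustration, not a claim about real play: one ally making 3 attacks per round for 5 rounds at +5 to hit vs AC 15, where "inspiration" lets one missed attack per round add 1d8 to the roll, and unused dice are simply lost.

```python
import random

def avg_hits(inspired, rounds=5, attacks=3, trials=20_000):
    """Average total hits over an encounter in a crude, fixed-sequence model."""
    hits = 0
    for _ in range(trials):
        for _ in range(rounds):
            die_available = inspired
            for _ in range(attacks):
                roll = random.randint(1, 20) + 5
                if roll >= 15:
                    hits += 1
                elif die_available:
                    die_available = False     # spend the die on the first miss
                    if roll + random.randint(1, 8) >= 15:
                        hits += 1             # the inspiration die rescued the miss
    return hits / trials

random.seed(1)
print(avg_hits(inspired=False), avg_hits(inspired=True))
```

Swap in a cleric spell's effect, vary the encounter shapes and resource levels, and you have the comparison described above; the conclusions are only ever as good as the parameters experience feeds in.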

Asisreo1
2022-03-26, 06:06 AM
There is this notion going around that b/c we need to make assumptions before making a calculation, it's therefore all invalid; that it's all just pie-in-the-sky mental masturbation divorced from reality.

Nothing could be further from the truth. Just b/c we use spherical cows to make calculations simpler doesn't mean those solutions are any less useful, or that they don't capture something truly fundamental about the nature of the problem.

Some of these metrics can be evaluated exactly, with minimal assumptions. For instance, a battlemaster is going to outdamage a champion in all but the most extreme scenarios. The few exceptions are easy to identify and catalogue, and despite what people say this does tell us something about the nature of the classes' relative power. When the metrics aren't exactly solvable, well, that doesn't mean there isn't an actual answer. It just means you need to get a little more sophisticated with the mathematics, and often that's not even very hard to do.
Yep. But it's important not to stray beyond the conclusions that the evidence can provide and to use the data appropriately.

If someone were to ask the question "Which subclass should I choose?" then saying "battlemaster does more damage than champion, statistically" is 100% true and fair. But that doesn't mean champion is never the answer. While it underperforms on damage, it still delivers the fighter playstyle, and it does so with simplicity. So if a player wants a high-damage fighter, they know battlemaster is a good solution. But if they want a simple fighter, then champion might be what they want.

Anecdotes are good for these cases too: someone might say "as a power gamer, I prefer battlemaster because it feels better," or "as a busy person/beginner who isn't tactically minded, champion felt really good." The person asking can then determine whom they align with more and make similar decisions.

Yakk
2022-03-26, 07:07 AM
The mean of each d6+5 attack at 60% is 5.1. The range increases by 0 to 11 each time (average 5.5).

No matter how many times you run it, this will never get closer to a normal distribution. It will smooth out. But it will never get closer to the symmetry needed. It’s skewed.

CLT is about the smoothness of the curve, not about asymmetrical things getting more symmetrical.

The average of many independent random variables converges to a normal distribution with no skew.

So yes, it is about asymmetrical things getting more symmetrical.

The skew shrinks. It never reaches 0 in a finite series of operations, but I said converges to, not reaches. And a nearly zero skew is nearly impossible to detect from observation.
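The shrinking-skew claim is easy to check numerically for the "60% hit, 1d6+5" attack. A short sketch; the 1/√N scaling in the loop is the standard result for the skewness of a sum of N iid variables:

```python
# One attack has ten equally likely results: four misses and 1d6+5 on a hit.
outcomes = [0, 0, 0, 0, 6, 7, 8, 9, 10, 11]
n = len(outcomes)

mean = sum(outcomes) / n                          # 5.1
var = sum((x - mean) ** 2 for x in outcomes) / n  # 19.09
m3 = sum((x - mean) ** 3 for x in outcomes) / n
skew1 = m3 / var ** 1.5   # third standardized moment of one attack

# For a total of N iid attacks, the skewness is skew1 / sqrt(N).
for attacks in (1, 4, 16, 64):
    print(f"{attacks:3d} attacks: skewness {skew1 / attacks ** 0.5:+.4f}")
```

A single attack is only mildly skewed (about -0.14), and by 64 attacks the skewness is under 0.02 in magnitude, far too small to notice at the table.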

Malkavia
2022-03-26, 07:40 AM
These discussions tend to forgo any reasonable nuance.

Math is good when you're making quantitative comparisons. Things like "which weapon does more damage with the same cost" or "How much more powerful am I with this ability?"

Anecdotes/playtests are good when you're looking at qualitative comparisons or abstract concepts. Things like "Which utility spell should I take" or "How should I approach this encounter?"

In general, both have their place in these discussions and neither should be completely dismissed. But it's important to use them when applicable and not try to mix them too much.

For a more real example, imagine a Bard choosing between a rapier and a mace. The bard will probably want the rapier if their Dex is the higher stat, since it does more damage and has a better to-hit. No need to playtest that.

Now imagine discussing whether bard is a better support than cleric. You can quantify some things like healing, but ultimately, if you want to know whether Font of Inspiration or the more powerful cleric support spells are better for the purposes of support, you need to see them in action and test their effectiveness.

I’m not sure that we disagree here. I disagree with the idea that “averages are wasted”, that math is useless, and that only our perceptions of what is effective matter.

Frogreaver
2022-03-26, 10:02 AM
The average of many independent random variables converges to a normal distribution with no skew.

So yes, it is about asymmetrical things getting more symmetrical.

The skew shrinks. It never reaches 0 in a finite series of operations, but I said converges to, not reaches. And a nearly zero skew is nearly impossible to detect from observation.

The mean of a single attack is 5.1. The average of the range of possible values is 5.5.

This means the mean after N attacks is 5.1*N and the average of the range is 5.5*N.

The absolute difference between the mean and the average of the range grows as 0.4N. The relative difference, 5.1N / 5.5N, stays the same.

If more attacks took away the skewness in this case, then the mean would get closer to the average of the range (it doesn't, either absolutely or relatively, as shown above). So whatever you think the CLT is telling you about this, you are incorrect. We have a clear and concrete counterexample showing the CLT doesn't produce the behavior you believe it should.

Perhaps this would help. Consider 1d6+5 with no miss chance. Repeatedly rolling this produces a mean of 8.5 and a range of 6-11 (8.5 average). This distribution is symmetrical around the mean and will produce a normal distribution. Adding in the miss chance is what skews the distribution.
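For reference, the two running quantities in this exchange tabulate like so; nothing is assumed beyond the 60%-hit 1d6+5 attack itself:

```python
per_attack_mean = 0.6 * (3.5 + 5)   # 60% to hit, average 8.5 on a hit -> 5.1
range_midpoint = (0 + 11) / 2       # one attack spans 0..11 -> midpoint 5.5

for n in (1, 2, 5, 10, 20):
    print(f"N={n:2d}: mean {per_attack_mean * n:6.1f}, "
          f"range midpoint {range_midpoint * n:6.1f}")
```

The gap between the two grows as 0.4N, which is the "absolute difference" referred to above.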

Yakk
2022-03-26, 07:24 PM
The mean of a single attack is 5.1.
When I say "skew" I mean the 3rd moment of the distribution, not the difference between the average and the average of extreme values.

Skew is a term in statistics, and when talking about statistics I will use the term from statistics.

You appear not to be using this term. That is ok, but you should know that your definition of skew isn't very useful. You are free to use it. But I won't.

...

So when I say skew goes away, I mean the average of samples from many uniform distributions has its 3rd moment converge to zero.

In practice, this means that if you observe the average (or sum) of many random distributions, you will almost certainly not be able to distinguish it from a normal distribution without an insanely large number of samples.

When you measure skew, you care not only about what values are *possible* but about what the probability of those possibilities is. And the chance your average achieves the extreme values falls off with repeated sums or averaging. What's more, the asymmetry in the odds (above/below the mean) also falls off!

The curve (of probabilities of results) converges to a skew-free normal distribution.

Typical ways to measure convergence are the L1 (integral of absolute differences), L2 (integral of squared absolute differences) and L-infinity (largest absolute difference) norms. If I recall correctly, all 3 norms give convergence to a normal distribution (almost everywhere; i.e., except on a set of measure zero).

This is 1st year university statistics if I remember right; maybe 2nd year for non-stats majors?
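The convergence described here can be eyeballed numerically: convolve the single-attack pmf with itself and measure the largest CDF gap against a normal with matching mean and variance. A rough sketch; the sup is taken only at the atoms of the discrete distribution:

```python
import math

base = {0: 0.4, **{f + 5: 0.1 for f in range(1, 7)}}    # 60% hit, 1d6+5
mu1 = sum(x * p for x, p in base.items())               # 5.1
var1 = sum(p * (x - mu1) ** 2 for x, p in base.items()) # 19.09

def convolve(p, q):
    """Distribution of the sum of two independent discrete variables."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out

def sup_cdf_gap(pmf, n_attacks):
    """Largest |F_N(x) - Phi(x)| over the atoms, Phi matched in mean/variance."""
    mu, sigma = mu1 * n_attacks, math.sqrt(var1 * n_attacks)
    cdf, worst = 0.0, 0.0
    for x in sorted(pmf):
        cdf += pmf[x]
        phi = 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
        worst = max(worst, abs(cdf - phi))
    return worst

dist = dict(base)
for n in range(2, 17):
    dist = convolve(dist, base)
    if n in (2, 4, 8, 16):
        print(f"N={n:2d}: sup gap {sup_cdf_gap(dist, n):.3f}")
```

The gap shrinks steadily as attacks accumulate, which is the "can't distinguish it from a normal distribution without an insanely large number of samples" point in numbers.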

Frogreaver
2022-03-26, 11:41 PM
When I say "skew" I mean the 3rd moment of the distribution, not the difference between the average and the average of extreme values.

Skew is a term in statistics, and when talking about statistics I will use the term from statistics.

You appear not to be using this term. That is ok, but you should know that your definition of skew isn't very useful. You are free to use it. But I won't.

I can read up and get back to that level of precision but it's likely unneeded.

Consider your next statement.


So when I say skew goes away, I mean the average of samples from many uniform distributions has its 3rd moment converge to zero.

The distribution generated by "hit 60%, deal 1d6+5 damage on a hit" is {0,0,0,0,6,7,8,9,10,11}. This is not a uniform distribution, even though a component of it was. It sounds to me like you are basing your conclusions around this premise of a uniform distribution, which isn't true in our case.

I'll let you clarify here before moving on to the rest.