Power, Frequency, Versatility, Reliability -- four ability balance metrics

**PhoenixPhyre** · 2021-08-18, 01:54 PM (ISO 8601)

Thinking about balancing abilities (read broadly, anything from basic things anybody can do to class features to abilities you can buy with points to spells to...), it seems like there're basically four major parameters at any particular "power level" (which isn't necessarily D&D-style levels):

Power -- when a character uses <ability>, how much does it do? A "deal 1d4 damage" ability is (all else equal) weaker than a "deal 10d10 damage" ability.

Frequency -- how often can a character use this ability? This might be due to a cooldown, limited resources, or just it being a niche power that only is enabled when the moon is full and the day name starts with T, during the summer, while standing on a glyph drawn in unicorn blood.

Versatility -- how many situations is this useful for? A power that can damage objects and creatures is more versatile than one that can only damage one or the other; a power that is both a dessert topping and a floor wax is more versatile.

Reliability -- when you use this power, does it <do thing> all the time? Or just some of the time. A power that requires some form of attack or defensive or casting check is less reliable than one that just happens; a power that bypasses immunities or "just works" is more reliable than one that has an element of random chance.

-----------

My basic sense is that (comparing powers of equivalent "power level"), every power should have some kind of bounded aggregate total on those four areas. If (ranking things on a 1-10 scale, higher meaning higher), one power gets 10/10/10/10 and another gets 1/1/1/1, those two either aren't of equivalent power or are unbalanced. But one that totals 20 (7/7/3/3) and another that totals 20 (5/5/5/5) are, if not totally equivalent or equal, at least in the same ballpark.
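A rough sketch of the additive version of this idea, in Python. The `Rating` type, the `in_same_ballpark` helper, and the tolerance of 2 are all made up for illustration; nothing here says a plain sum is the right way to aggregate.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    """Hypothetical 1-10 scores on the four metrics."""
    power: int
    frequency: int
    versatility: int
    reliability: int

    def total(self) -> int:
        # Plain sum of the four scores (an assumption, not a rule).
        return sum(self)


def in_same_ballpark(a: Rating, b: Rating, tolerance: int = 2) -> bool:
    """True when the two totals differ by at most `tolerance` (arbitrary cutoff)."""
    return abs(a.total() - b.total()) <= tolerance


spiky = Rating(7, 7, 3, 3)
even = Rating(5, 5, 5, 5)
maxed = Rating(10, 10, 10, 10)
floor = Rating(1, 1, 1, 1)

print(spiky.total(), even.total())     # 20 20
print(in_same_ballpark(spiky, even))   # True
print(in_same_ballpark(maxed, floor))  # False
```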

And the same thing goes for power-sets for a character--if a character's power set is full of super-powerful things, that character shouldn't also have high marks in all the other areas. You could have a powerful and versatile, but unreliable and infrequent power set, or a frequent, powerful, but limited versatility and unreliable power set.

Does this framework make sense? Are there things I'm missing?

**KorvinStarmast** · 2021-08-18, 01:59 PM (ISO 8601)

Originally Posted by PhoenixPhyre

Does this framework make sense? Are there things I'm missing?

The reliability one changes a bit if you use a different dice mechanic than the d20 system does.
As an example, 3d6 and 2d10 (and other bell curve approaches) make for more predictable results and thus lean toward higher reliability. Dice pools and exploding dice are another approach that can play havoc with reliability. (Tunnels and Trolls offering but one example). The framework looks like a good start, though how one assigns the values of 1 through 10 might be tricky to nail down.
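To put rough numbers on the "more predictable" point, here is a small Python sketch that enumerates 1d20 and 3d6 exactly. The helper names are invented for this example, and the targets of 11 and 15 are arbitrary. Both rolls average 10.5, but 3d6 has a much smaller spread, so how "reliable" a check feels depends on where the target sits relative to that average.

```python
from fractions import Fraction
from itertools import product


def distribution(num_dice: int, sides: int) -> dict[int, Fraction]:
    """Exact probability of every total for num_dice dice with `sides` faces."""
    weight = Fraction(1, sides ** num_dice)
    dist: dict[int, Fraction] = {}
    for roll in product(range(1, sides + 1), repeat=num_dice):
        total = sum(roll)
        dist[total] = dist.get(total, Fraction(0)) + weight
    return dist


def mean(dist: dict[int, Fraction]) -> Fraction:
    return sum(total * p for total, p in dist.items())


def variance(dist: dict[int, Fraction]) -> Fraction:
    m = mean(dist)
    return sum(p * (total - m) ** 2 for total, p in dist.items())


def chance_at_least(dist: dict[int, Fraction], target: int) -> Fraction:
    return sum(p for total, p in dist.items() if total >= target)


d20 = distribution(1, 20)
three_d6 = distribution(3, 6)

print(mean(d20), mean(three_d6))                                # 21/2 21/2
print(variance(d20), variance(three_d6))                        # 133/4 35/4
print(chance_at_least(d20, 11), chance_at_least(three_d6, 11))  # 1/2 1/2
print(chance_at_least(d20, 15), chance_at_least(three_d6, 15))  # 3/10 5/54
```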

I played a variant of Dungeon World (Fellowship) a while back. Applying your idea to that would be an interesting test case. The DW resolution (2d6 + mods) has 'degrees of success' where 10+ gives a wonderful result, 7-9 a success, and 6 or less a not-great result. The flat modifiers were usually 0, 1, or 2 (though they could go higher if the game went on a lot longer than ours did), and there was some in-game currency for a teammate to assist the player attempting a success.
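For a sense of how much those flat modifiers move the three result bands, here is a quick Python enumeration of 2d6 + modifier. It assumes only the thresholds described above (10+, 7-9, 6 or less); the function name is invented, and the assist currency is left out.

```python
from fractions import Fraction
from itertools import product


def outcome_bands(modifier: int) -> dict[str, Fraction]:
    """Chance of each result band on 2d6 + modifier."""
    counts = {"10+": 0, "7-9": 0, "6-": 0}
    for first, second in product(range(1, 7), repeat=2):
        total = first + second + modifier
        if total >= 10:
            counts["10+"] += 1
        elif total >= 7:
            counts["7-9"] += 1
        else:
            counts["6-"] += 1
    return {band: Fraction(count, 36) for band, count in counts.items()}


for modifier in (0, 1, 2):
    bands = outcome_bands(modifier)
    print(modifier, {band: str(chance) for band, chance in bands.items()})
# 0 {'10+': '1/6', '7-9': '5/12', '6-': '5/12'}
# 1 {'10+': '5/18', '7-9': '4/9', '6-': '5/18'}
# 2 {'10+': '5/12', '7-9': '5/12', '6-': '1/6'}
```

Going from +0 to +2 swaps the odds of the best and worst bands, which is one way to see why a small modifier matters a lot for reliability in that system.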

Not sure how that system would fit into your concept, but I think it would. I always had to think through my options.

**PhoenixPhyre** · 2021-08-18, 02:15 PM (ISO 8601)

Originally Posted by KorvinStarmast

The reliability one changes a bit if you use a different dice mechanic than the d20 system does.
As an example, 3d6 and 2d10 (and other bell curve approaches) make for more predictable results and thus lean toward higher reliability. Dice pools and exploding dice are another approach that can play havoc with reliability. (Tunnels and Trolls offering but one example). The framework looks like a good start, though how one assigns the values of 1 through 10 might be tricky to nail down.

I played a variant of Dungeon World (Fellowship) a while back. Applying your idea to that would be an interesting test case. The DW resolution (2d6 + mods) has 'degrees of success' where 10+ gives a wonderful result, 7-9 a success, and 6 or less a not-great result. The flat modifiers were usually 0, 1, or 2 (though they could go higher if the game went on a lot longer than ours did), and there was some in-game currency for a teammate to assist the player attempting a success.

Not sure how that system would fit into your concept, but I think it would. I always had to think through my options.

The actual quantification would be highly system-dependent and system-relative. So what counts as being "10/10 reliable" in D&D may be very different (mechanically) than a 10/10 reliable ability in a d100-roll-under skill-based system or an exploding dice 3d6 system.

**Xervous** · 2021-08-18, 02:15 PM (ISO 8601)

While conceptually this is nice and neat I’m not sure how you could practically apply it for design purposes as a rule. Is targeting their mental shield rather than targeting their armor worth a +1 or a +2? What’s the cap in any given category?

**PhoenixPhyre** · 2021-08-18, 02:18 PM (ISO 8601)

Originally Posted by Xervous

While conceptually this is nice and neat I’m not sure how you could practically apply it for design purposes as a rule. Is targeting their mental shield rather than targeting their armor worth a +1 or a +2? What’s the cap in any given category?

Honestly, I'm not so much worried (or interested) in the quantification (at this stage), but in the idea of balancing based on trying to consider the four metrics in aggregate. But I'm not a quantify-type person myself.

My big question is "is thinking in these terms complete? Are there pieces that heavily overlap or are redundant? Is it useful?" (that last being the biggest question).

**Xervous** · 2021-08-18, 02:43 PM (ISO 8601)

Well if you’re not quantifying then these are great baskets to lump concerns into. The trick will be asking the right questions that lead you to the appropriate baskets.

The haziest categories are either frequency of use or reliability. At least with how they relate to abilities with a limited range of valid targets. Sure you can assume there might be X every so often, but Joe’s campaign has X all day while Sally’s never does.

**PhoenixPhyre** · 2021-08-18, 03:05 PM (ISO 8601)

Originally Posted by Xervous

Well if you’re not quantifying then these are great baskets to lump concerns into. The trick will be asking the right questions that lead you to the appropriate baskets.

The haziest categories are either frequency of use or reliability. At least with how they relate to abilities with a limited range of valid targets. Sure you can assume there might be X every so often, but Joe’s campaign has X all day while Sally’s never does.

I agree on that last paragraph. It's something for DMs (assuming some form of tuning of adventures to people or vice versa) to be aware of, which could be "warn players that their anti-undead character is unlikely to face undead" or "add some undead" or "make sure that having heavy anti undead powers doesn't break things for the party".

It's ideally a framework for GMs, not just system designers. In my mind, there's always some level of "system design" that has to be done at the GM level to have a satisfying game. May only be communication, not actual changes. But having a set of buckets and the idea of balancing the total/balancing via tradeoffs is, I think, useful.

**Satinavian** · 2021-08-18, 03:14 PM (ISO 8601)

What about "Cost of use"? That is not really the same as frequency, as it is more about "what else could I do in that time/with those components".

**PhoenixPhyre** · 2021-08-18, 04:04 PM (ISO 8601)

Originally Posted by Satinavian

What about "Cost of use"? That is not really the same as frequency, as it is more about "what else could I do in that time/with those components".

Hmmm. Not sure. It's certainly a relevant factor--an ability that eats 1k gp and 1k XP each time isn't the same as one that only costs an action. Or an ability that has a chance of summoning a warp demon to eat you vs one that only costs a slot.

Maybe it fits (poorly) into frequency? As in--you can use it whenever, but instead of a resource cost or a limit on uses, it's a [money|time|risk] cost.

**Telok** · 2021-08-18, 04:19 PM (ISO 8601)

Originally Posted by Satinavian

What about "Cost of use"? That is not really the same as frequency, as it is more about "what else could I do in that time/with those components".

There's a cost to acquire abilities too, in most games. Taking power A means you didn't take power B. Sometimes it could be very, very temporary, like D&D casters who swap out different spells by day or by level. Or it can be permanent, like with a point-buy system or D&D 5e's Resilient feat.

**RandomPeasant** · 2021-08-18, 04:31 PM (ISO 8601)

Originally Posted by PhoenixPhyre

Does this framework make sense? Are there things I'm missing?

I don't think it's the right way to approach a problem. You don't want to try to balance all the abilities against each other, or even the classes against each other directly. What you want to do is think about the balance point you want, then ensure that people hit that. What is a character who has 100 Karma or 5 levels or however much of whatever your system uses to measure advancement supposed to be able to deal with? Can the characters your system outputs (at whatever level of granularity you're testing at) all handle those things?

Trying to go directly to "power" or "versatility" will confuse things. It is very, very hard to tell, a priori, a set of random abilities that don't add up to anything worth caring about from a collection of silver bullets that trivialize everything a character is going to be asked to deal with. Don't try to solve that problem. Don't try to come up with a complete list of properties abilities can have. Don't try to figure out whether "four enemies become unconscious" is better than "six enemies become nauseated" or "one enemy takes seven boxes of damage" or "two enemies become stunned" (let alone how vastly more complicated those comparisons become when you start talking about non-combat abilities). Just figure out the problems characters are supposed to solve and if the total packages of what they get add up to solving those problems.

**PhoenixPhyre** · 2021-08-18, 05:10 PM (ISO 8601)

Originally Posted by RandomPeasant

I don't think it's the right way to approach a problem. You don't want to try to balance all the abilities against each other, or even the classes against each other directly. What you want to do is think about the balance point you want, then ensure that people hit that. What is a character who has 100 Karma or 5 levels or however much of whatever your system uses to measure advancement supposed to be able to deal with? Can the characters your system outputs (at whatever level of granularity you're testing at) all handle those things?

Trying to go directly to "power" or "versatility" will confuse things. It is very, very hard to tell, a priori, a set of random abilities that don't add up to anything worth caring about from a collection of silver bullets that trivialize everything a character is going to be asked to deal with. Don't try to solve that problem. Don't try to come up with a complete list of properties abilities can have. Don't try to figure out whether "four enemies become unconscious" is better than "six enemies become nauseated" or "one enemy takes seven boxes of damage" or "two enemies become stunned" (let alone how vastly more complicated those comparisons become when you start talking about non-combat abilities). Just figure out the problems characters are supposed to solve and if the total packages of what they get add up to solving those problems.

That's ok...at a full system design level (ie if you have control over both inputs and outputs). That's not what this is for as much. If I'm homebrewing a creature, or a class, or a spell, or an item, I don't determine what the system assumptions are. But I do care "is this out of band compared to existing features/abilities/powers". Or "what level of spell should this be?" Or "what rarity should this item be?"

Or when I'm making a campaign and looking at the characters my players have submitted, I need to be able to evaluate whether the campaign as designed will work for those characters as designed.

Or when I'm building a character and trying to balance to the table (hi Quertus!). If I've taken a bunch of high-power abilities, I should probably evaluate whether they're also high reliability, versatility, and frequency. Or the inverse.

That's the primary use for this framework, because very few of us are actually ab initio game developers. But many more of us are players and GMs.

------
And if we're only looking at benchmarks, "does this hit the benchmark" is a lousy way of telling overall balance. Because there aren't many benchmark campaigns. Unless you strongly constrain the adventure or party design or make everything homogenous (so everyone has a set of abilities that does X, a set of abilities that does Y, etc). Interactions between powers and campaign elements mean that a group of characters that all meet the benchmark in a white-room scenario may either drastically overshoot (meaning trivialize, which isn't much fun) campaigns that differ or drastically undershoot. And without an analytical framework, there's no way to tell why or where the benchmarking failed. This framework intends to provide some backing for thinking about why things work or don't work, rather than just the binary works/doesn't work.

**OldTrees1** · 2021-08-18, 10:43 PM (ISO 8601)

Originally Posted by PhoenixPhyre

My basic sense is that (comparing powers of equivalent "power level"), every power should have some kind of bounded aggregate total on those four areas. If (ranking things on a 1-10 scale, higher meaning higher), one power gets 10/10/10/10 and another gets 1/1/1/1, those two either aren't of equivalent power or are unbalanced. But one that totals 20 (7/7/3/3) and another that totals 20 (5/5/5/5) are, if not totally equivalent or equal, at least in the same ballpark.

Does this framework make sense? Are there things I'm missing?

The basic framework makes sense. The way to aggregate is unlikely to be additive but the concept of 4 metrics + aggregation makes sense.

Originally Posted by PhoenixPhyre

Power -- when a character uses <ability>, how much does it do? A "deal 1d4 damage" ability is (all else equal) weaker than a "deal 10d10 damage" ability.

Frequency -- how often can a character use this ability? This might be due to a cooldown, limited resources, or just it being a niche power that only is enabled when the moon is full and the day name starts with T, during the summer, while standing on a glyph drawn in unicorn blood.

Power and Frequency have an interesting, non-intuitive relationship where we need to factor in the opportunity cost.

Say there is an at-will ability that does 4 damage and 2 limited use abilities (one does 2 damage, one does 6 damage). We can immediately tell the 2 damage limited ability is irrelevant because the opportunity cost is the baseline 4 damage. We can also tell that the 6 damage limited ability is +2 damage a limited number of times rather than +6 damage ever.
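That opportunity-cost arithmetic can be sketched in a few lines of Python. The function names, the five-round encounter, and the two uses are assumptions picked for illustration; only the 4, 2, and 6 damage figures come from the example above.

```python
def marginal_value(limited_damage: int, at_will_damage: int) -> int:
    """Extra damage per use over simply using the at-will ability instead."""
    return max(0, limited_damage - at_will_damage)


def encounter_damage(rounds: int, at_will_damage: int,
                     limited_damage: int, limited_uses: int) -> int:
    """Total damage when the limited ability is used only if it beats the at-will."""
    if limited_damage > at_will_damage:
        uses = min(rounds, limited_uses)
    else:
        uses = 0  # never worth spending the action on a weaker limited ability
    return uses * limited_damage + (rounds - uses) * at_will_damage


print(marginal_value(2, 4))  # 0
print(marginal_value(6, 4))  # 2

# Hypothetical 5-round encounter with 2 uses of the limited ability.
print(encounter_damage(5, 4, 2, 2))  # 20
print(encounter_damage(5, 4, 6, 2))  # 24
```

With these assumed numbers, the 6 damage ability adds 4 damage over the encounter (2 per use), not 12.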

Now what if you received a functional duplicate of that 6 damage limited ability. It was 6 fire damage for some N number of uses. Now you get 6 ice damage for some N number of uses. Is the second ability worth roughly the same as the first? It depends on if you can use all 2N uses. Generally you can, so generally limited abilities have small diminishing returns.

Now what if you received a functional duplicate of the 4 damage at will ability. It was 4 acid damage at will. Now you get 4 electric damage at will. Is the second ability worth roughly the same as the first? No. The second at will ability expanded the versatility but did not impact the power (outside of cases where the versatility creates power*).

Even more oddly, the overlap of at will abilities extends further than one might expect. Since you can only use one at will ability at the opportunity cost of using another at will ability, even at will abilities that are unrelated but equally applicable to the scenario (attack XOR heal ally for example) will still have this overlap.

* In cases where the at will abilities are not equally applicable, the net benefit of using ability A (the benefit of ability A - the opportunity cost of not using ability B) can be non zero. This is the effect of versatility on power.

Cost to Use is readily factored into Power and Frequency in a similar way the opportunity cost is factored into Power.

Don't worry about quantifying. The main benefit is understanding the different aspects exist and how the different aspects interplay.

**Glorthindel** · 2021-08-19, 04:02 AM (ISO 8601)

Originally Posted by PhoenixPhyre

Does this framework make sense? Are there things I'm missing?

I think this is a very solid way to look at things, and definitely would be a good approach in my opinion for a rebalance of things.

One of the things to consider is how perception can differ from reality, and how playstyle can create different results for the same value.

Power is straightforward - a player can look at different damage or bonus values and immediately get a clear impression of their value "weight". But Frequency and Reliability are more woolly, particularly at the lower end.

For example, I personally do not value 1/day abilities, even the really powerful stuff, because of "use remorse". If I have a super powerful 1/day ability, 9 times out of 10 it won't get used, because using it means I won't have it for the next encounter, when I might really need it. So even if its low Frequency is offset by high Power and Reliability, that low Frequency weighs more heavily than it should, because its Frequency is in reality far lower than expected. This was one of my issues with 4th ed - I found Daily abilities to just not work right, as I witnessed players hold on to them as long as they could, frequently overlooking ideal opportunities to use them, then as soon as one person used their Daily, everyone else unloaded theirs as it was assumed that the party would call it for a rest immediately afterwards.

Likewise, I personally value spells that require a Saving Throw much lower than an equivalent-odds spell that has a To-Hit roll. Why? Well, I control the To-Hit roll. I roll the dice, and I can arrange as best to stack the odds. Meanwhile, the DM controls the Saving Throw. He rolls the dice (in secret, usually), and any modifiers are provided by effects that are in place that I will be unaware of. And he might have Legendary Resistance anyway. So an attack that 'by book' has a 50% chance to hit, and an attack that 'by book' has a 50% chance to be saved are not equal, as the 50% save might be modified by things I am unaware of, whilst I am aware of everything modifying the 50% to hit, and even if the save is failed, the DM might apply Legendary Resistance or just straight cheat and call a save when it failed. So, despite having the same Reliability value, one will be perceived by me as more Reliable than the other.

**False God** · 2021-08-19, 06:08 AM (ISO 8601)

I swear there was an edition that did this, a four-something edition.

The versatility one is always a tricky one depending on the setting, adventure, or campaign in question. Sure, we can say an ability has one or more clear uses, but there will always be corner cases that may crop up more at any given table. So at best all you can do (from a design perspective) is see how well it applies to all aspects of the game.

Personally I'd add a very hard category: Synergy. One thing that is often overlooked, and IMO leads to power-builds, multiclassing shenanigans and questionable splat releases, is how well any given ability synergizes with any other. Even on a basic level, how well two classes work together. It requires a lot of real-gameplay theorycrafting rather than whiteroom simulation.

Your current number of categories works for any individual power on its own, but IMO it's an incomplete view without considering any given power's interaction with any other power.

**Cluedrew** · 2021-08-19, 07:39 AM (ISO 8601)

To False God: I was actually struggling with that last night and my solution to the synergy issue is a bit different: This evaluation needs to be done on sets of powers as well as individual powers. Eventually adding up to the evaluation of a character's ability set, but you can break it down so that analysis is more manageable.

**RandomPeasant** · 2021-08-19, 08:54 AM (ISO 8601)

Originally Posted by PhoenixPhyre

That's ok...at a full system design level (ie if you have control over both inputs and outputs). That's not what this is for as much. If I'm homebrewing a creature, or a class, or a spell, or an item, I don't determine what the system assumptions are.

No. But you have to work within those assumptions. So your benchmarks should be based on whatever the system you're working in uses to define appropriate challenges for characters.

Or when I'm building a character and trying to balance to the table (hi Quertus!). If I've taken a bunch of high-power abilities, I should probably evaluate whether they're also high reliability, versatility, and frequency. Or the inverse.

Once you are talking about a specific table, any kind of general rubric is of extremely marginal utility. If you want to match three other data points, just match those datapoints. Don't try to build a general framework, map those three data points onto it, then find another data point that does the same thing.

And if we're only looking at benchmarks, "does this hit the benchmark" is a lousy way of telling overall balance. Because there aren't many benchmark campaigns.

Benchmarks are the only way to tell overall balance. "How does this fit into every possible campaign" is an unanswerable question. The way to produce a balanced system is to balance it against robust benchmarks, and to understand how deviating from those benchmarks will affect balance. And you should not be benchmarking at the level of campaigns. You should not even be benchmarking at the level of adventures. You should be benchmarking at the level of challenges, so that when those challenges are composed into an adventure or campaign, balance is preserved, or at least broken in predictable ways individual DMs can compensate for.

This framework intends to provide some backing for thinking about why things work or don't work, rather than just the binary works/doesn't work.

If your test suite produces only binary feedback, it is a bad test suite.

**Quertus** · 2021-08-19, 08:49 PM (ISO 8601)

Originally Posted by PhoenixPhyre

Or when I'm building a character and trying to balance to the table (hi Quertus!). If I've taken a bunch of high-power abilities, I should probably evaluate whether they're also high reliability, versatility, and frequency. Or the inverse.

Hi

So, your main question(s) seem to be

Originally Posted by PhoenixPhyre

My big question is "is thinking in these terms complete? Are there pieces that heavily overlap or are redundant? Is it useful?" (that last being the biggest question).

Is it complete? Absolutely not.

Suppose I decided to make a sport, and made the teams perfectly balanced according to a perfect version of your rules.

But then you find that the sport is "team hunger games". And, on another team, I've placed your mom / your child / the love of your life.

You'd definitely have cause to complain about the teams, despite them being perfectly balanced.

But that's probably too abstract. Let's try a more concrete example.

Let's say you bring a character to a mid-level D&D 3e game: a highly optimized 2-weapon SA DPS skill monkey Rogue, complete with UMD. You've even had the forethought to set money aside to buy consumables to UMD as needed, in addition to starting with a few eternal wands, partially charged wands, and scrolls.

What were your categories? Oh, look - you put them in the title (kudos!): Power, Frequency, Versatility, Reliability.

Your SA is 10/10 Power, able to turn anything even remotely CR-appropriate into chunky salsa. Similarly, being usable at-will, and having tricks (like flanking with yourself, or one of several others I've seen) (and you were smart enough to have a source of flight), it's 10/10 on Frequency. It's really only useful in combat… which would be 3/10 Versatility if all pillars were equal; let's say that this group calls it 5/10. Reliability? You're highly optimized, so you'll hit most everything, but… you didn't take anything to let you use SA on plants / undead / constructs / etc, so maybe 5/10 there. Total: 30 points.

General Rogue skill monkey? Although it lets you bypass the epic challenge of the locked door, let's say that everything but your Diplomacy is only a 3/10 on the Power scale. But it's 10/10 Frequency, being at-will, and, as you can use those everywhere (you can Hide and Tumble (and more) in combat), it's 10/10 Versatility, where you're generally left wondering which skill to use, rather than having nothing to do. Reliability? You're pretty optimized to hit expected DCs (and have an Eternal Wand of Wield Skill for those hard to reach places) - let's call it 7/10. Total: 30 points.

Your UMD? Boy, it depends. You could definitely hit above your weight class with higher level spells than the party Arcanist / Divine Caster / Psion / whatever could cast / manifest / invoke / whatever. And they're generally considered the powerhouses, right? So it's hard not to give it 10/10 on Power. But it's all one-shots, with a few 2/day effects. So we'll say 1/10 on Frequency. You can do absolutely any spell/power, and you've set aside the money to do so, so it's 10/10 Versatility. But you can fail checks on a 1, or when you punch above your pay grade, so not perfect Reliability - maybe 5/10? Total: 26 points.

Great. We've got 3 main lines of abilities, worth a level-scaled 30, 30, and 26 points. What does this tell us? How does that compare to a character with one ability at 35, or another with a handful in the low 20's?

More importantly, let's say that the GM decides to run Necromancy on Bone Hill. There's no real opportunity to buy items to adapt to the scenario, and your SA is nearly useless. Looking at the adventure, there's almost no chance for you to get much spotlight time - and certainly not to show off the bits of the character that you're proudest of.

So, if your numbers look better than those of the übercharger, the Mailman, and the turning specialist Cleric, what should the GM do?

-----

Is it useful? It's a great tool for Players to have, to develop the lingo to use to explain why they aren't having fun. It's a terrible tool for GMs (game refs, scenario designers) or system designers to use, because it will fill them with false confidence that they know what's going on.

I know of no shortcut beyond evaluating "how can this character participate in each of these scenes" to accurately measure such things.

Or, if you prefer,

Originally Posted by RandomPeasant

I don't think it's the right way to approach a problem. You don't want to try to balance all the abilities against each other, or even the classes against each other directly. What you want to do is think about the balance point you want, then ensure that people hit that. What is a character who has 100 Karma or 5 levels or however much of whatever your system uses to measure advancement supposed to be able to deal with? Can the characters your system outputs (at whatever level of granularity you're testing at) all handle those things?

Trying to go directly to "power" or "versatility" will confuse things. It is very, very hard to tell, a priori, a set of random abilities that don't add up to anything worth caring about from a collection of silver bullets that trivialize everything a character is going to be asked to deal with. Don't try to solve that problem. Don't try to come up with a complete list of properties abilities can have. Don't try to figure out whether "four enemies become unconscious" is better than "six enemies become nauseated" or "one enemy takes seven boxes of damage" or "two enemies become stunned" (let alone how vastly more complicated those comparisons become when you start talking about non-combat abilities). Just figure out the problems characters are supposed to solve and if the total packages of what they get add up to solving those problems.

Originally Posted by RandomPeasant

And you should not be benchmarking at the level of campaigns. You should not even be benchmarking at the level of adventures. You should be benchmarking at the level of challenges, so that when those challenges are composed into an adventure or campaign, balance is preserved, or at least broken in predictable ways individual DMs can compensate for.

This says something balanced with my opinion (although I doubt I could compare their Power, Frequency, Versatility, and Reliability).

**Lorsa** · 2021-08-21, 02:18 PM (ISO 8601)

This is incredibly interesting and I will try to read through the thread and answer in more detail during next week (when I should be working, most likely).

If I understood correctly, you are mostly interested in this conceptually so:

In regards to the completeness, I doubt it is complete. While I can't think directly of any one category that is missing, there seems to be a lot of overlap that you have to look into.

For example, what really differentiates frequency from versatility or reliability? Perhaps frequency isn't really what you are after, but rather something like "complicatedness of use". Otherwise when trying to quantify one has to ask: "what frequency are you looking for?". Is it frequency in game time? Frequency in real time? Frequency compared to other abilities? Frequency per session?

Likewise, power seems to have quite a bit of overlap with reliability as well. Even if you clearly think of "power" in numerical terms (e.g. 1d4 vs. 10d4), the reliability metric would only turn this value into an "average score" anyway.

Unfortunately I have to stop now, but I think you really should think of only two main categories, one thing "power" and the other "versatility", where this reliability and frequency can be sub-categories (among other subcategories).

Lastly, is it useful. Most theoretical frameworks are useful to some degree. Even the wrong ones. So yes.

**OldTrees1** · 2021-08-21, 03:06 PM (ISO 8601)

Originally Posted by Lorsa

For example, what really differentiates frequency from versatility or reliability? Perhaps frequency isn't really what you are after, but rather something like "complicatedness of use". Otherwise when trying to quantify one has to ask: "what frequency are you looking for?". Is it frequency in game time? Frequency in real time? Frequency compared to other abilities? Frequency per session?

Likewise, power seems to have quite a bit of overlap with reliability as well. Even if you clearly think of "power" in numerical terms (e.g. 1d4 vs. 10d4), the reliability metric would only turn this value into an "average score" anyway.

Unfortunately I have to stop now, but I think you really should think of only two main categories, one thing "power" and the other "versatility", where this reliability and frequency can be sub-categories (among other subcategories).

Lastly, is it useful. Most theoretical frameworks are useful to some degree. Even the wrong ones. So yes.

My understanding

Frequency: How often can the character use this ability? What keeps it from being used infinitely many times in zero seconds? For example, in 5E the Attack action costs an Action and thus is usable roughly once per turn. In contrast, each of a 5E Warlock's Mystic Arcanum can be used once per day. If a gun has 6 bullets, then it has a frequency of 6 uses per reload. Which particular reference frame you are using might shift depending on context.

Reliability: When the ability is used, how often does its effect(s) occur? Is there a check that causes a chance of failure? Does it have any guaranteed outcome? For example consider an ability that does 10 Poison damage plus Con Save vs Stun. With that ability the poison damage is more reliable than the stun effect.

Versatility: How broadly applicable is the ability? Is this an ability that is always useful, or an ability that is only useful during combat?

These 3 categories are not about the "complicatedness of use" of the ability; rather, they measure the ability's projected power beyond its raw power.

Reliability vs Power:
If you want to display both Reliability and Power you would create a probability distribution. The shape of the distribution is the reliability. A sword that deals 2d6 damage is different from a sword that deals 7 damage. If you use expected value as the measurement for power, then the shape of the probability distribution is the reliability.
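To put numbers on the sword example, here is a minimal sketch that treats the mean of the damage distribution as "power" and its variance as a stand-in for "reliability". Using variance for the shape is my assumption; any other spread measure would do, and the helper names are made up for illustration.

```python
from fractions import Fraction
from itertools import product


def distribution(dice):
    """Exact probability of each total for a list of die sizes, e.g. [6, 6] for 2d6."""
    rolls = list(product(*(range(1, d + 1) for d in dice)))
    outcomes = {}
    for roll in rolls:
        total = sum(roll)
        outcomes[total] = outcomes.get(total, 0) + Fraction(1, len(rolls))
    return outcomes


def mean(dist):
    """Expected value: the 'power' reading of the distribution."""
    return sum(value * p for value, p in dist.items())


def variance(dist):
    """Spread around the mean: one possible 'reliability' reading (lower is steadier)."""
    mu = mean(dist)
    return sum(p * (value - mu) ** 2 for value, p in dist.items())


two_d6 = distribution([6, 6])   # sword that deals 2d6
flat_7 = {7: Fraction(1)}       # sword that always deals 7

print(mean(two_d6), variance(two_d6))  # 7 35/6
print(mean(flat_7), variance(flat_7))  # 7 0
```

Both swords have the same expected damage, so a pure "average score" cannot tell them apart; only the spread does.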

Versatility vs Reliability + Frequency:
One Wizard knows Fireball. Another Wizard knows "Fire or Ice ball: As per Fireball, but you can deal cold/ice damage instead". The two spells have the same Reliability (Dex Save DC 13 for half damage) and Frequency (2 times per day, for example). However, the second spell is more versatile (albeit barely).

I can see how one could fold Raw Power, Reliability, and Frequency into just "Power", however you can also fold Versatility in at the same time. Which categories make sense to be explicit/separate rather than implicit/folded in might depend on context.

**Quertus** · 2021-08-22, 07:41 PM (ISO 8601)

Originally Posted by Quertus

I know of no shortcut beyond evaluating "how can this character participate in each of these scenes" to accurately measure such things.

To expand on this a bit…

If I'm bringing an übercharger / Street Samurai / combat monster that I *know* will take center stage in most combat scenes, if I'm trying to balance to the table, I will try to design for a more passive role in other scenes / try to make sure that there are places where the other characters can shine.

Or, if I build the Sarak of diplomats, who is also a warrior, maybe he'll be a reluctant warrior, who is slow to join the fight, or takes penalties from disarming and subduing his foes, or from not being left-handed, or some such.

But it's a matter of, well, what *matters* to a given table.

At a table where a Monk is considered OP and has to be nerfed, or where the only thing that counts is damage and a high-op BFC Tainted Sorcerer is considered "not contributing", I'll build differently, with different considerations in mind, when I'm trying to balance to that table, than I would trying to balance to "the Playground", or to one of my tables.

And that's the rub. The high-op BFC Tainted Sorcerer is considered Power 0 at one table, and OP at another, because people don't measure the same things.

Or, put another way, what does it matter how many points a character's abilities are worth, if the only thing that the table cares about is getting to deliver one-liners?

**Vahnavoi** · 2021-09-03, 06:25 AM (ISO 8601)

These are all real measures, but at the highest level of abstraction, you only have the first two, power and frequency. The last two, versatility and reliability (the latter of which is effectively randomness; without a random component it reduces to versatility), are factors of frequency.

Where I disagree is measuring these on an arbitrary 1 to 10 scale. With many games, it's perfectly possible to measure them using some existing real or game unit.

**Quertus** · 2021-09-03, 11:24 AM (ISO 8601)

Originally Posted by Vahnavoi

These are all real measures, but at the highest level of abstraction, you only have the first two, power and frequency. The last two, versatility and reliability (the latter of which is effectively randomness; without a random component it reduces to versatility), are factors of frequency.

Where I disagree is measuring these on an arbitrary 1 to 10 scale. With many games, it's perfectly possible to measure them using some existing real or game unit.

Interesting.

A) do you have any examples of game/metric sets that you might use? Like… you could try to use "casting cost" to estimate "power" in MtG, but there would definitely be some outliers on the accuracy of that metric.

B) how well do these metrics measure and encapsulate what is actually important in the game?

**PhoenixPhyre** · 2021-09-03, 12:37 PM (ISO 8601)

Originally Posted by Vahnavoi

Where I disagree is measuring these on an arbitrary 1 to 10 scale. With many games, it's perfectly possible to measure them using some existing real or game unit.

I should have been clearer that the 1 to 10 scale was a throwaway scale for pure convenience in that particular abstract example. If I were to actually implement/quantify this, the scales would have to be crafted better and would almost certainly be system-specific.

The one quibble there is that since one of the fundamental principles here is that you're considering the aggregate of these, you have to use metrics that are comparable in some way. 1-10 scales are easy for that, but you could have some kind of mapping function between <power units> and <frequency units>.

As for frequency including versatility/etc, the distinction I had in my head was that frequency was a maximum in in-game time units--how frequently can you push that button in a given amount of in-game time? So something like a 1x/day ability would have a frequency of 1x/day. On the other hand, versatility measures how many different things can this one ability do? A 1x/day "deal X fire damage to one target" and a 1x/day "deal X fire OR cold damage to one target" and a 1x/day "deal X fire damage to a target AND banish target to hell for 1 minute" ability are very different, despite all being useable 1x/day and all having the same basic "use this in combat" specifier. Reliability goes to "how often does pressing the button actually produce the effect." Which could be rolled into frequency, but the variations are much faster so it seems reasonable to have it as a separate factor in the aggregate.

Note: There's no reason why the aggregate would necessarily be the sum of the scores; the scores could carry different weights (as a simple variation), or the aggregate could even be a state machine of arbitrary complexity. The only criteria for the aggregate function are that
a) it's computable reasonably simply (because people have to do it)
b) its output can be totally ordered over the ability space of the system. That is, for any two abilities supported by the system, the aggregate function has to support all of the comparison operations (==, <, >, >=, <=, !=). Most of those can be derived from the others, so basically you need equality, negation, and one of > or <. This is easiest with numerical values, but you can do it with other values as long as you can compare them.
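As a rough sketch of those two criteria, here is a weighted-sum aggregate plus a three-way comparison from which all six operators follow. The `Ability` class, the axis names as fields, and the weights are assumptions for illustration only; the 7/7/3/3 and 5/5/5/5 examples are the ones from the first post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ability:
    name: str
    power: float
    frequency: float
    versatility: float
    reliability: float


# Hypothetical weights; equal weights reproduce the plain sum of the scores.
WEIGHTS = {"power": 1.0, "frequency": 1.0, "versatility": 1.0, "reliability": 1.0}


def aggregate(ability, weights=WEIGHTS):
    """Criterion (a): simple enough to do by hand, a weighted sum of four scores."""
    return sum(w * getattr(ability, axis) for axis, w in weights.items())


def compare(a, b, weights=WEIGHTS):
    """Criterion (b): return -1, 0 or 1, so that ==, !=, <, <=, >, >= are all defined."""
    x, y = aggregate(a, weights), aggregate(b, weights)
    return (x > y) - (x < y)


spiky = Ability("spiky", power=7, frequency=7, versatility=3, reliability=3)
even = Ability("even", power=5, frequency=5, versatility=5, reliability=5)

print(compare(spiky, even))                                # 0: same ballpark
print(compare(spiky, even, {**WEIGHTS, "power": 2.0}))     # 1: power counted double
```

Changing the weights changes which abilities count as "equal", which is exactly why they would have to be tuned per system.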

**KorvinStarmast** · 2021-09-03, 01:40 PM (ISO 8601)

Originally Posted by PhoenixPhyre

This is easiest with numerical values, but you can do it with other values as long as you can compare them.

In engineering speak, the units need to match up.

**PhoenixPhyre** · 2021-09-03, 01:52 PM (ISO 8601)

Originally Posted by KorvinStarmast

In engineering speak, the units need to match up.

Or have a dimensionful transformation that makes them match.

So if power was a 1-10 scale (example only) and frequency was a uses/in-game-day scale, then you would need some function f such that [f(frequency)] == [power] or [f(power)] == [frequency] (where [x] means "the units of x"). So you could normalize to f(frequency) = c * uses/day, where c might be something like 1 power-unit per (use/day) (so something with 10 uses per day would be a 10 on the power-unit scale), etc.
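Spelled out as code, the linear mapping looks like this. It is only a sketch: the function names are invented here, and the linear form with c = 1 is just the example value from the post, not a recommendation.

```python
def frequency_to_power_units(uses_per_day: float, c: float = 1.0) -> float:
    """f(frequency) = c * uses/day, where c carries units of power-units * day / use."""
    return c * uses_per_day


def power_to_frequency_units(power: float, c: float = 1.0) -> float:
    """The inverse mapping, for when you would rather express power in uses/day."""
    return power / c


# With c = 1, an ability usable 10 times per day scores 10 on the power-unit scale.
print(frequency_to_power_units(10))  # 10.0
```

Once both axes are in the same units, they can be added or weighted without mixing dimensions.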

**Quertus** · 2021-09-04, 08:19 AM (ISO 8601)

So, this actually gets kinda weird.

Because you have to account for playstyle. Player > Class, as they say.

For some, having a 1/day ability is actually a net negative: they'll hoard it & never use it, forget about it & never use it, or use it "at the wrong time" & be filled with and distracted by regret afterwards.

For some, an "at-will" ability is a rut that they'll fall into, making them less likely to notice their other abilities.

Handing *me* an item with limited charges is different from handing it to most people. I'll sit there and hoard the charges, seemingly having forgotten about it. Then one day, I'll suddenly spam it. Maybe it's because we "needed" it. But more likely, everyone will be confused. I'll just have my character shrug, and say, "it rained yesterday, the ground was still wet - I didn't want to get my boots muddy" or some such. (Depends on the character, of course; point is, my reasoning is not Determinator approved)

Quertus, my signature academia mage for whom this account is named, is ridiculously OP from a pure power standpoint but is balanced (or even "The Load") because of his personality. And there's tables where Monks are considered OP and need to be nerfed, or where totally OP BFC Tainted Sorcerer Arcane Spellcaster builds would be considered "not contributing" because they dealt 0 damage.

Point is, I think that each individual table would need to make their own heuristic to determine exactly how powerful an individual character was.

So let's look at the simplest possible example of a real table: the table that only counts damage dealt.

Let's measure the expected damage output of some 10th level characters over, say, 40 rounds.

The totally OP BFC Tainted Sorcerer Arcane Spellcaster build (also party healer) deals 0 damage (unless you count damage he deals to himself). Well, that was easy.

Based on the AC and DR of the foes the GM has in the module, the Half-Dragon Monk… who gets Haste 10 rounds/day… based on expected AC and DR… is looking at around 700 damage.

The ½-Ogre Rogue… based on AC, DR, and SA vulnerability… if he chooses his targets to maximize his damage… is looking at maybe around 2,100 damage. In one rough patch, though, those numbers drop to about 210.

The blaster Sorcerer… who runs out of gas, so how many *days* those 40 rounds cover matters… based on expected touch AC, saves, clumping, and energy resistances… could be looking at over 4,000 damage. In a rough patch of the adventure, though, those numbers drop to about 530.

Obviously, the blaster is the strongest, and the totally OP BFC Tainted Sorcerer Arcane Spellcaster build the weakest.

Yet I'd think that the totally OP BFC Tainted Sorcerer Arcane Spellcaster build would be very effective at their job, and the blaster Sorcerer the least fun to actually play (having run out of spells and sitting there doing nothing much of the game).

So, even if this metric were right for this table, I'm concerned that a GM using it would be predisposed to Nerf the Sorcerer, rather than listen to their concerns that their character wasn't fun to play, and be disinclined to acquiesce to their request (to retrain / to add to the whitelist / whatever) to get a reserve feat to give them something to do when they're out of gas.

So, again, my suspicion and assertion is that this would make a great tool for the Players to use, to give a name to their pain. But a bad tool for a GM who will get a false sense of security that they understand the issue. And a horrible thing for system designers, who don't know what the actual table looks like or cares about.

**LibraryOgre** · 2021-09-04, 11:41 AM (ISO 8601)

Originally Posted by Xervous

While conceptually this is nice and neat I’m not sure how you could practically apply it for design purposes as a rule. Is targeting their mental shield rather than targeting their armor worth a +1 or a +2? What’s the cap in any given category?

Some of this can be managed by looking at some other systems; Ars Magica, for example, lets you grade powers on their total ability... range, duration, level of effect, and so on, resulting in a level of spell.

If you could design some reliable parameters for this, you could probably use a 0-9 scale, with the mean turning into the level of the spell in D&D.
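A minimal sketch of that "mean of the parameters becomes the spell level" idea follows. The parameter names come from the Ars Magica example above, the sample scores are made up, and rounding half up is an assumption, since the post doesn't say how fractions should be handled.

```python
def spell_level(scores):
    """Average a set of 0-9 parameter scores into a single spell level."""
    if not scores:
        raise ValueError("need at least one parameter score")
    for name, score in scores.items():
        if not 0 <= score <= 9:
            raise ValueError(f"{name} must be on the 0-9 scale, got {score}")
    average = sum(scores.values()) / len(scores)
    return int(average + 0.5)  # round half up (assumed rounding rule)


# Hypothetical scores for one spell.
print(spell_level({"range": 3, "duration": 2, "effect": 7}))  # 4
```

Note that a plain mean lets one very high parameter be offset by several low ones, which may or may not be what you want.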

**RandomPeasant** · 2021-09-04, 12:05 PM (ISO 8601)

Originally Posted by Mark Hall

Some of this can be managed by looking at some other systems; Ars Magica, for example, lets you grade powers on their total ability... range, duration, level of effect, and so on, resulting in a level of spell.

But that doesn't result in particularly balanced spells. Unless you go full effect-based for your system, it's very hard to tell the difference between a broken spell and a pointless one. Conjuring a wooden table is a modestly impressive utility effect, while conjuring a wooden cage is a quite powerful lockdown effect. Even in an effects-based system, mechanics don't always compose neatly. The range that is necessary for remote viewing to be useful is game-changing on blasting spells, but puzzlingly useless on buffs. Asking "how does this feel in comparison to other effects" can be useful, but ultimately you have to test things, and breaking stuff down too far into the details is often counter-productive because it gives a false sense of clarity.

**Telok** · 2021-09-04, 02:56 PM (ISO 8601)

Originally Posted by Mark Hall

Some of this can be managed by looking at some other systems; Ars Magica, for example, lets you grade powers on their total ability... range, duration, level of effect, and so on, resulting in a level of spell.

If you could design some reliable parameters for this, you could probably use a 0-9 scale, with the mean turning into the level of the spell in D&D.

One thing I considered doing when I cared about D&D magic was to build a lot/all/some % of the spells in a couple different point buy systems. The difference in effect valuation between the different pb systems should be relatively stable and able to be expressed as ratios, which would allow you to normalize across the systems. You'd end up with a "score" for each spell based on the point buys.
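Here is one way the ratio normalization could be sketched. Everything concrete is an assumption for illustration: the system names, spell names, and point costs are invented, and using the median ratio against a reference system is just one reasonable choice.

```python
from statistics import mean, median

# Hypothetical point costs for the same spells built in two point-buy systems.
costs = {
    "system_a": {"spell_x": 10, "spell_y": 20, "spell_z": 40},
    "system_b": {"spell_x": 30, "spell_y": 60, "spell_z": 90},
}


def scale_factors(costs, reference):
    """Per-system multiplier that maps its costs onto the reference system's scale."""
    ref = costs[reference]
    factors = {}
    for system, table in costs.items():
        shared = [spell for spell in table if spell in ref]
        # The median ratio is less sensitive to one oddly priced spell than the mean.
        factors[system] = median(ref[spell] / table[spell] for spell in shared)
    return factors


def normalized_scores(costs, reference):
    """Average each spell's rescaled cost across every system that prices it."""
    factors = scale_factors(costs, reference)
    spells = set().union(*(table.keys() for table in costs.values()))
    scores = {}
    for spell in spells:
        values = [table[spell] * factors[system]
                  for system, table in costs.items() if spell in table]
        scores[spell] = mean(values)
    return scores


scores = normalized_scores(costs, reference="system_a")
for spell in sorted(scores):
    print(spell, round(scores[spell], 2))
```

Spells where the rescaled costs disagree (here `spell_z`, 40 versus roughly 30) are the ones where the systems value the effect differently, and those are probably the interesting cases to look at by hand.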
