I'm trying to find a general range of values for the minimum change in "success rate" that the general population can notice during an activity like playing a game. In plain English: how small a % change do people notice in play? I haven't been able to find any real research on it, just one paper where it was mentioned tangentially, as an aside. I've read the intros of thirty-plus papers now, checked some of their references, tried dozens of search phrases, and come up with nothing.

I know that I, personally, don't notice a 10% change in success rate during play, but I do notice one of about 15%. I can see it on a character sheet, calculate it, model it in code... but in the game a +/-10% is basically ****-all lost-in-the-noise bupkis as far as I can tell. I know that to some extent it depends on how the game plays. Something like a d100 or blackjack d20 with no hidden modifiers is super easy, because you know you had 60% last time and 65% this time and rolled a 63. But games with limited info or hidden variables, like the D&Ds where it's d20+8 vs ???, are where the uncertainty comes in. That's where improving or "leveling up" blurs the line between knowing you have a slight statistical improvement and feeling that you've gotten better at something.

Anyone have any leads or ideas short of me running my own damn study?

You might instead look at how accurately people estimate uncertainties from experience or data. That's a wider area of research, and it should be closely related.

I feel like this depends heavily on how many rolls are made consecutively at a given success rate. For example, in the average tactical RPG, the success rate for any given action might vary from 0-100% across characters and circumstances, and over a single turn the player might take (and receive, assuming the game reveals enemy chances) actions across nearly that entire range. Sometimes even extremely low-percentage actions are worth taking if there's nothing better to do and they carry no cost. This constant shifting makes gauging differences in success rates very difficult; randomness just overwhelms everything.

By contrast, I imagine that when a character does the same thing over and over again, with many rolls each time, the player is going to notice. For example, in Disgaea it is common, when power leveling, to have a character play the same map over and over, sometimes dozens of times in a row. The steady, creeping increase in their power is noticeable even across fairly small increments, and players use it to judge when the time has come to raise enemy power by one more level.

Power level, success rate, and failure rate are all different things though. A +1 on a d20 could double your success chances.

In Disgaea, the trigger for increasing enemy level is usually some map going from a nonzero failure rate at the higher level to a zero failure rate. That's a large relative change, even if the stat difference was only 0.1%.

Here's a concrete example in D&D:
I've got a warlock with +15 to hit with eldritch blast (because stupid WotC printed caster attack-bonus stuff that stacks in D&D Beyond), rolling three attacks a round for six rounds.

During play I can't tell the difference between shooting AC 18 and AC 20, can't tell the difference between AC 21 and AC 23, and can't tell the difference between AC 19 and AC 19 with disadvantage. I can tell the difference between AC 20 and AC 20 with disadvantage.
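To put numbers on that example, here is a minimal sketch of the per-attack hit chances, assuming standard 5e attack rules (a natural 1 always misses, a natural 20 always hits, and disadvantage keeps the lower of two d20s). The function name `hit_chance` is made up for illustration.

```python
from fractions import Fraction


def hit_chance(attack_bonus, ac, mode="normal"):
    """Chance that one d20 attack hits, assuming 5e rules:
    a natural 1 always misses and a natural 20 always hits."""
    hits = sum(
        1
        for roll in range(2, 21)  # a natural 1 never hits, so start at 2
        if roll == 20 or roll + attack_bonus >= ac
    )
    p = Fraction(hits, 20)
    if mode == "disadvantage":
        # Keep the lower of two rolls: it hits only if both rolls would hit.
        return p * p
    if mode == "advantage":
        # Keep the higher of two rolls: it hits if either roll would hit.
        return 1 - (1 - p) ** 2
    return p


if __name__ == "__main__":
    bonus = 15
    cases = [(18, "normal"), (20, "normal"), (21, "normal"), (23, "normal"),
             (19, "normal"), (19, "disadvantage"), (20, "disadvantage")]
    for ac, mode in cases:
        p = hit_chance(bonus, ac, mode)
        # 3 attacks a round for 6 rounds = 18 attacks
        print(f"AC {ac} ({mode}): {float(p):.1%} per attack, "
              f"{float(18 * p):.1f} expected hits in 18 attacks")
```

By this arithmetic, the pairs that felt identical differ by 10 points (90% vs 80%, 75% vs 65%) or about 13 points (85% vs 72.25%), while the pair that felt different is 16 points apart (80% vs 64%). Over 18 attacks, 90% vs 80% is a gap of only 1.8 expected hits.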

I'm looking for actual data or research, not supposition or theorizing, on whether people can actually tell the difference between 5% and 10%, or between 90% and 95%. Stuff like that.

Here's a review article: https://www.sciencedirect.com/scienc...0169189190036Y

Evidently the keyword to search for in this field is 'calibration', though that has to do with numerical estimates of confidence versus the actual rate of success. There's also 'probability assessment tasks', e.g.: https://www.sciencedirect.com/scienc...30507380900458

You might want to try using Google Scholar to search articles on economics, the psychology of gambling, etc. along with key phrases like "unknown probability" or "unknown probability distribution" and such. I tried a few combinations and got some almost promising results in a few minutes.

You probably care enough about the topic to spend long enough trying combinations and looking through results to have a much better chance of finding something really relevant to your question.

Alas, "calibration" in these seems to refer to subjects' confidence in a decision or estimate. It'll be tomorrow before I can try the other phrases.

I think the main challenge with perceiving success rates in RPGs is not just keeping track of successes, but identifying similar situations.
It's not just the die roll that can be adjusted up or down; the target numbers can also be moved to any arbitrary value. If your rate of success goes up significantly, you don't know whether the difference comes from an increased bonus to your die rolls or a reduced target number. Often an increase in character ability also goes along with an increase in target numbers for challenges.
To make things even worse, most actions that involve a die roll aren't made very regularly. Attack rolls are probably the most regular rolls, but after that the frequency drops off sharply.

My arbitrary guess would be that a change would really only be perceptible if it's something like "I switch from less than 30% success rate to less than 30% failure rate." Or the odds moving from 1:2 to 2:1.
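The odds framing converts directly to probabilities, which makes the size of that guessed threshold concrete. A minimal sketch; the helper name is hypothetical:

```python
from fractions import Fraction


def odds_to_probability(wins, losses):
    """Convert odds of wins:losses into a probability of winning."""
    return Fraction(wins, wins + losses)


if __name__ == "__main__":
    before = odds_to_probability(1, 2)  # 1:2 odds
    after = odds_to_probability(2, 1)   # 2:1 odds
    print(f"{float(before):.0%} -> {float(after):.0%} "
          f"({float(after - before) * 100:.0f} percentage points)")
```

So 1:2 to 2:1 is roughly 33% to 67%, a swing of about 33 percentage points, while going from under 30% success to under 30% failure is a swing of more than 40 points.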

Yeah, that's a known potential issue with multiple variables. But to say whether that challenge is insurmountable, there must first be a baseline for estimation. That's what I'm searching for. Anecdotally, it seems people pick up on ~15% rate changes. As a believer in data-driven decisions, I want my ****ing data.

In useful news, what I want falls under "change point detection" in a "time series dataset". The issue is that it's a century-old algorithmic and computational problem, so weeding through the results to find anything on human accuracy is a bother.
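For a sense of what the statistics side of "change point detection" looks like, here is a minimal sketch of a Bernoulli CUSUM (cumulative sum) detector, one classic method from that literature. It is an idealized, number-crunching baseline, not a model of how humans perceive change; the pre-change rate `p0`, post-change rate `p1`, and alarm `threshold` are assumptions you would choose per situation, and the function name is hypothetical.

```python
import math
import random


def bernoulli_cusum(outcomes, p0, p1, threshold):
    """Return the index where a CUSUM alarm first fires, or None.

    Tests for a shift from success rate p0 to p1 (both strictly between 0 and 1).
    """
    # Log-likelihood ratio contributed by one success and by one failure.
    llr_success = math.log(p1 / p0)
    llr_failure = math.log((1 - p1) / (1 - p0))
    score = 0.0
    for i, success in enumerate(outcomes):
        # Accumulate evidence for the new rate, but never let it go below zero.
        score = max(0.0, score + (llr_success if success else llr_failure))
        if score >= threshold:
            return i
    return None


if __name__ == "__main__":
    rng = random.Random(1)
    rolls = ([rng.random() < 0.4 for _ in range(30)]
             + [rng.random() < 0.8 for _ in range(30)])  # true change at index 30
    print(bernoulli_cusum(rolls, p0=0.4, p1=0.8, threshold=3.0))
```

A higher threshold means fewer false alarms but a longer delay before a real change is flagged; that trade-off is the same one the human studies probe.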

It might be easier to just set up and run the experiment yourself. Make something that shows two coin flip sequences in parallel, four flips per second, for 30 seconds. The subject should click on which sequence they think had a higher probability of heads. Vary the difference in bias between the two sequences, as well as the baseline rate, and find the points where the human error rate approaches chance level vs the baseline bias.
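Before running people through that experiment, it may help to know the ceiling: how often an ideal observer who simply counts heads in each sequence would pick the right one. A minimal sketch, assuming independent flips, 4 flips per second for 30 seconds (120 flips per sequence), and ties broken by a coin toss; the function names are hypothetical, and real subjects should do worse than this.

```python
from math import comb


def binom_pmf(n, p):
    """Probability of each head count 0..n in n independent flips."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


def ideal_observer_accuracy(p_low, p_high, n_flips):
    """Chance that picking the sequence with more heads picks the p_high one.

    Ties are counted as half right (the observer guesses).
    """
    low = binom_pmf(n_flips, p_low)
    high = binom_pmf(n_flips, p_high)
    correct = 0.0
    for h, prob_h in enumerate(high):
        for l, prob_l in enumerate(low):
            if h > l:
                correct += prob_h * prob_l
            elif h == l:
                correct += 0.5 * prob_h * prob_l
    return correct


if __name__ == "__main__":
    for p_high in (0.55, 0.6, 0.7):
        acc = ideal_observer_accuracy(0.5, p_high, 120)
        print(f"0.5 vs {p_high}: ideal accuracy {acc:.3f}")
```

Where the human error rate sits relative to this curve tells you how much of the difficulty is perception rather than plain sampling noise.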

15% is a number I've also been thinking about intuitively. But since we're not talking about observing a series of coin flips from the outside, there will be a huge perception bias based on how useful the successes and how damaging the failures are in the game situation.
If you ran a series of tests having people declare whether or not they got a +15% advantage in a game they played, I would expect no consistency at all.

There should probably be a point where the ability to perceive the difference fails no matter how important the stakes are, and that level would be the one you'd want to call the 'just noticeable difference' for probabilities.

Finally starting to get decent hits with "regime change" and "subjective probability" plus some other terms. So far it looks like people are **** at detecting a changeover to a 60/40 split, and they seriously overestimate the increases.

I have only anecdotal data, but when I worked at the casino the people in charge decided to alter the odds by 1% in favor of the house to increase revenue (most of the machines have an adjustment built-in, and there is a range that they are allowed to operate in). This was unannounced (I only knew about it because I knew some of the people in charge of implementing it), but the customers all figured it out by the end of the next day.

Looks like it's not just relative and absolute signal strength (10/90 vs 5/95 and 60/40 vs 90/10) but also signal length that plays a major role. Somewhere in the '8-11 trials within a minute' range, people's detection accuracy rises sharply, but below 8-11 there's a bit of a cliff. That was actually part of what the last paper I read was testing: the relations in detection and prediction between different signal strengths, variabilities, and lengths.

And in the case of the casino, it was not just about each individual's experience. If you are sitting at a machine, there are many others around you. You are probably not paying attention to them, but you hear their machines making happy sounds with every win. Together with all the other noise, it creates an ambient background that you stop noticing after a while, as long as it stays the same. So if the frequency of wins changes, it might be like when your old clock stops ticking: you hear the silence where a sound was expected. It's subtler, since wins don't come at regular intervals like the ticking of a clock, but it might still make you feel that something is off.

Bear with me for phone posting.

Like my search for basic perception data to make a % chart for a game, this search led me back to a set of core research from the '60s and '70s that people keep referencing and building on. Unlike that search, these researchers didn't have a crapton of military peeps available as subjects, and they had limited budgets*. Also different: the civvy research almost never includes the data at the end of the papers, uses way more discipline-specific jargon, and sometimes doesn't label the axes on charts. Label your chart axes, ****wad.

So I don't have nice numbers to play with this time, just general observations.

1. People have a really hard time discriminating a 60/40 split.
2. They aren't as good as you'd expect at catching a 70-75/30-25 split or a 5% vs 60% hit rate. For stuff around 70/30, the error bars overlap with the 60/40 results.
3. People are really good at noticing a 90/10 split or 50% vs 100% hit rate.
4. People reliably estimate a 60% rate as a 50% rate because they don't get streaks right, and when asked to produce a 50% "random" set by hand, they reliably put out a 60% set with too few and too-short streaks.
5. Personal tendencies towards thinking about stuff in aggregate (all events over time and multiple people/trials) vs individually (only my own last 10 trials) can make a major difference in how rates and changes are perceived when combined with other factors.
6. A minimum of 6 to 10 trials in a short amount of time is needed to detect any change, rising to 20+ trials for some 60/40 sets. This was consistent across several studies. 5 or 6 trials were enough to be sure of a 50% vs 100% or 90/10 split; 8 to 10 were needed for a 70/30-type split.
7. Some people simply couldn't tell the difference with a 60/40 split, and it was worse closer to even, like 54/46.
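Item 4 reads like the classic "alternation" finding. Assuming the "60%" there means the rate at which consecutive outcomes switch (success to failure or back), this minimal sketch shows why such sequences have too-short streaks: the average run length is about 1 / (alternation rate), so about 2 for a truly random 50% sequence but about 1.67 at 60%. The function names are hypothetical.

```python
import random


def sequence_with_alternation(n, p_alt, seed=0):
    """Binary sequence where each outcome differs from the previous one with probability p_alt."""
    rng = random.Random(seed)
    seq = [rng.randint(0, 1)]
    for _ in range(n - 1):
        seq.append(1 - seq[-1] if rng.random() < p_alt else seq[-1])
    return seq


def mean_run_length(seq):
    """Average length of streaks of identical outcomes."""
    runs = 1
    for prev, cur in zip(seq, seq[1:]):
        if prev != cur:
            runs += 1
    return len(seq) / runs


if __name__ == "__main__":
    for p_alt in (0.5, 0.6):
        seq = sequence_with_alternation(100_000, p_alt, seed=1)
        print(f"alternation {p_alt}: mean streak length {mean_run_length(seq):.2f}")
```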

My personal generalized conclusions.

A. You need 4+ trials to actually tell any difference in rates, and they need to be fairly close in time. Minutes at most. For anything rolled once an hour you can't tell, except for massive swings (50% vs 100% or 90/10) or by direct comparison of numbers ("used to fail this roll on 17 or less and just rolled a 17 = success").
B. You need at least a 20% swing in effect for everyone in the audience** to actually tell any difference in rates if the numbers aren't in your face.
C. Systems that produce memorable success spikes (rerolls or post roll "add another die" type stuff) have a magnifying effect on people noticing rate increases, partially because they require attention & knowledge of the numbers (ref A & B caveats).

For reference, four of the more relevant & useful papers to start with.

Detecting Regime Shifts: The Causes of Under- and Over-Reaction

Detection of Change in Nonstationary, Random Sequences
Donald M. Barry and Gordon F. Pitz

Detecting Regime Shifts: The Role of Construal Levels on System Neglect
Samuel N. Kirshner

Detection of Change in Nonstationary Binary Sequences
John Theios and John W. Brelsford, Jr.

* "Do we know how good people are at spotting **** from jets?" "No. Let's send a bunch of guys out to the range in jeeps, trucks, and tanks, then spend two weeks flying fighter jets around at different altitudes to spot them." vs. "If I grab 20 students and pay them \$5, minus 7 cents per miss, and estimate misses as... hmm... ok, yeah, that comes in under budget if I can get my roommate to write the software for an \$8 pizza."

** People who know probability (especially stats students) and are looking specifically to identify rate changes notice it more. For everyone else, you want to err on the high side.

Of course you need a lot of trials to estimate probability differences of a few percent.

After four trials, we still know basically nothing about the underlying probability distribution, even if we do know there is a fixed probability of success. If the probability is 50%, the standard deviation of the observed success rate after only four trials is still 0.25, which is nowhere near enough to narrow down an estimate to the 5% level.
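That arithmetic generalizes: the standard error of an observed success rate after n independent trials is sqrt(p(1-p)/n). A minimal sketch (function names are illustrative) that reproduces the 0.25 figure and estimates how many trials it takes to shrink the error to a given size:

```python
import math


def standard_error(p, n):
    """Standard deviation of the observed success rate after n independent trials."""
    return math.sqrt(p * (1 - p) / n)


def trials_for_standard_error(p, target_se):
    """Smallest n whose standard error is at most target_se."""
    # The tiny tolerance keeps floating-point noise from rounding e.g. 100.0000001 up to 101.
    return math.ceil(p * (1 - p) / target_se ** 2 - 1e-9)


if __name__ == "__main__":
    print(standard_error(0.5, 4))                 # four trials at a 50% rate
    print(trials_for_standard_error(0.5, 0.05))   # one standard error = 5 points
    print(trials_for_standard_error(0.5, 0.025))  # two standard errors = 5 points
```

So pinning down a single rate to within about 5 points takes on the order of 100 to 400 rolls at one fixed rate; comparing two rates against each other is noisier still.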

I wonder if the field of psychophysics has any bearing on this question. The field has a history of research in just noticeable differences. I dunno if you've searched the literature there already.

I'm not sure I get your question, though. For example, players often notice critical hits. Gets a big reaction from everyone. But they notice it no matter what the odds of a critical hit are. It's a lil unclear what a "change in percent of success rate" exactly means. Change in percentage points? Relative change in the percentage?
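The distinction matters because the same change can look tiny or huge depending on the measure, as with a +1 on a d20 taking a natural-20-only roll (5%) to 19-or-better (10%). A minimal sketch, with a hypothetical helper name, that reports a change three ways: percentage points, relative change in success rate, and relative change in failure rate.

```python
def describe_change(p_old, p_new):
    """Express a change in success probability three ways.

    Returns (percentage points, relative % change in success,
    relative % change in failure). A relative change is None when
    it is undefined (old success rate of 0, or old failure rate of 0).
    """
    points = (p_new - p_old) * 100
    rel_success = (p_new - p_old) / p_old * 100 if p_old > 0 else None
    rel_failure = (p_old - p_new) / (1 - p_old) * 100 if p_old < 1 else None
    return points, rel_success, rel_failure


if __name__ == "__main__":
    for old, new in [(0.05, 0.10), (0.90, 0.95)]:
        pts, rs, rf = describe_change(old, new)
        print(f"{old:.0%} -> {new:.0%}: {pts:+.1f} points, "
              f"success {rs:+.1f}%, failure {rf:+.1f}%")
```

Both examples are the same +5 points, but 5% to 10% doubles successes while barely touching failures, and 90% to 95% barely touches successes while cutting failures in half.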

Looks like that's mostly going to be about vision/hearing stuff. My searches had been 4 words followed by fifteen excluded terms. I'll check at some point.

The whole question came about from a D&D 5e Roll20 game where I realized I couldn't tell the difference between shooting at AC 22 and AC 25 just from the success rate, at 3 attacks per turn and 1 turn every 5 minutes for a whole hour. Much earlier, I'd noticed that the halfling reroll-1s trait was super minor and totally unnoticeable if I wasn't rolling physical dice. The whole toxic 5e forum gets all white-room wankery nutso over a +2 bonus to 2 or 3 rolls a session for off-stat abilities. Yet here I am facing a 7-point difference between a fighter and a warlock on a Charisma check that gets rolled about once every 2 weeks over something like 6 months, and just from memory of successes and failures I can't notice any difference between the fighter's and the warlock's Charisma.

So I started wondering whether people can actually notice these sorts of differences in action, when they aren't being forced to do the math and read the numbers for every roll. It turns out that, without doing the numbers and math each time, most people can't tell worth **** a +6 from a +3 on a d20 roll unless they're making something like twenty-plus rolls an hour against a single fixed target. That really explains why D&D 5e characters feel to me like they're static or getting worse at stuff outside of attacks, HP, damage, and spells.
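For the halfling example, the size of the reroll-1s effect is easy to compute. A minimal sketch, assuming a single d20 roll that needs `min_roll` or better, that a natural 1 would otherwise fail, and 5e's Lucky trait (reroll a natural 1 once and use the new roll); the function names are hypothetical.

```python
from fractions import Fraction


def d20_success(min_roll):
    """Chance a single d20 shows min_roll or higher, treating a natural 1 as a failure."""
    min_roll = max(2, min(min_roll, 21))  # 21 means "can't succeed"
    return Fraction(21 - min_roll, 20)


def d20_success_lucky(min_roll):
    """Same roll, but a natural 1 is rerolled once and the new roll is used."""
    p = d20_success(min_roll)
    # Succeed on the first roll, or roll a 1 (1/20 chance) and succeed on the reroll.
    return p + Fraction(1, 20) * p


if __name__ == "__main__":
    for need in (5, 11, 16):
        gain = d20_success_lucky(need) - d20_success(need)
        print(f"need {need}+: +{float(gain) * 100:.2f} percentage points from rerolling 1s")
```

The gain is p/20, so it never exceeds 4.75 percentage points (when only a natural 1 fails), far below the roughly 15-20% swings the thread found people can notice.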

I wonder if you'd see the same thresholds if, instead of rolling to pass/fail, you rolled and then used the roll to determine how much you spend. For example, you'd flip AC/to-hit and have the defender roll, and for each point they fell short they would take an increment of damage.
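One way to see why a margin-based roll might behave differently: with pass/fail, a +1 only changes the outcome when the die lands exactly on the threshold, while with damage per point short, every +1 shifts the average result. A minimal sketch of the suggested mechanic, assuming the defender rolls 1d20 plus a defense bonus against the attacker's fixed target and takes one damage increment per point short; the names and numbers are hypothetical.

```python
from fractions import Fraction


def margin_damage(target, defense_bonus, die=20):
    """Defender rolls 1d{die} + defense_bonus against a fixed target.

    Returns (expected damage increments, chance of taking no damage),
    where damage is one increment per point the total falls short of target.
    """
    shortfalls = [max(0, target - (roll + defense_bonus)) for roll in range(1, die + 1)]
    expected = Fraction(sum(shortfalls), die)
    no_damage = Fraction(sum(1 for s in shortfalls if s == 0), die)
    return expected, no_damage


if __name__ == "__main__":
    for target in (11, 12, 13):
        expected, no_damage = margin_damage(target, defense_bonus=0)
        print(f"target {target}: {float(expected):.2f} increments on average, "
              f"{float(no_damage):.0%} chance of no damage")
```

Whether players perceive a shifting average more easily than a shifting success rate is exactly the open question; the sketch only shows that each +1 moves every non-zero outcome rather than one face of the die.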

Yeah, it's possible that psychophysics might have practically no bearing on your question. Mostly it's focused on our senses, hence the name.

I guess I'm still unclear what "these sorts of differences" are. You mentioned percentages, but here you're mentioning die bonuses, which are obviously different. I have some sense of what you're gesturing at, but I'm just not sure exactly what your question is.

You could try looking up probabilities in XCOM; there might be some formal study on the subject, given how high-profile the memes were/are. I know there has been informal stuff, but I doubt any of it is particularly scientific.

