Correct. I was told that it's an existing system that uses a normal probability distribution and can fairly easily accommodate the probabilities of different skill levels, e.g.: someone 100 Elo points above someone else has a ~64% chance of winning, 200 points is ~76%, etc. While it's normally used probabilistically as an approximate measure of skill, there's no reason you couldn't use Elo as a skill number.
Oooh, elegant. We could totally draw from probability distributions instead of die rolls--we're on computers here, so no reason not to, after all.
Apart from the whole "wanting a playtested system" thing.
I do really like the idea of using something modeled on the Elo systems used in chess, since that gives us a natural point of comparison / sanity check. (Hazou is "low non-amateur" in taijutsu, jounin are masters/grandmasters, Gai's Kasparov-level, etc.) Unfortunately, the standard Elo system models everyone as having the same variance, which does seem like a disadvantage compared to systems where more skillful people are more consistent because they roll more dice. You could easily give lower-skilled people more variance, but I'm not sure of the best way to do it while keeping the chess analogy reasonable.
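To make the "same variance for everyone" point concrete, here's a minimal sketch of the normal-distribution version. It assumes each side's performance is normal around their rating with a standard deviation of 200 points (the classic Elo assumption, as I understand it); the per-player sigma arguments are a hypothetical tweak for "less skilled means less consistent", not anything from a real rating system.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def win_prob(rating_a: float, rating_b: float,
             sigma_a: float = 200.0, sigma_b: float = 200.0) -> float:
    """P(A's roll beats B's) when each roll is normal around its rating.

    The difference of two independent normals is normal, with the
    variances added, so only the combined spread matters.
    """
    spread = math.sqrt(sigma_a ** 2 + sigma_b ** 2)
    return normal_cdf((rating_a - rating_b) / spread)

print(round(win_prob(1600, 1500), 2))  # 100-point gap: roughly 0.64
print(round(win_prob(1700, 1500), 2))  # 200-point gap: roughly 0.76
# Same 200-point gap, but the weaker side is swingier (sigma 400 is made up).
print(round(win_prob(1700, 1500, sigma_b=400.0), 2))
```

Note the catch: giving the weaker side more variance makes upsets more likely, not less, so you'd have to re-tune what a given rating gap means to keep the chess comparison.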
I'm not sure about the normal distribution specifically, though it's not unreasonable (rolling a fair number of d100s approximates normal pretty closely, anyway). There's also the possibility of using a nonstandard Elo system with something besides the normal distribution, which I'm admittedly overexcited about because it's cool--but it would also offer some real advantages.
--
Fun stats time
The general idea of an Elo rating, as I understand it, is to model an opposed roll as each side rolling once, adding their own modifier (their Elo score), and seeing which total comes out higher. Each roll comes from the same statistical distribution; Elo chose the normal distribution, which for our purposes is similar to rolling, say, 100d100--you'd get something sharply peaked around 5050 (+/- 290), with modifiers being appropriately large, maybe differing by ~500 between Jiraiya and Mari. People have found that fatter-tailed distributions are more realistic for chess; in dice terms this would be loosely like taking 5x 20d100 with the same modifiers (same 5050 average, but a spread more like +/- 650; strictly that's a wider bell rather than a fatter tail), where there's more of a chance for Mari to luck out a win.
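If you want to check the dice arithmetic yourself, here's a rough sketch. It works out the exact mean and spread of the two dice pools and then brute-forces how often the underdog wins an opposed roll. The 500-point gap is the illustrative Jiraiya/Mari number from above; the function names, trial count, and seed are all just assumptions for the example.

```python
import math
import random

def dice_sum_stats(n_dice, sides=100, multiplier=1):
    """Exact mean and standard deviation of multiplier * (n_dice d sides)."""
    mean_one = (sides + 1) / 2
    var_one = (sides ** 2 - 1) / 12  # variance of one fair die
    mean = multiplier * n_dice * mean_one
    sd = multiplier * math.sqrt(n_dice * var_one)
    return mean, sd

def roll(rng, n_dice, sides=100, multiplier=1):
    """Roll n_dice dice, sum them, and scale the total."""
    return multiplier * sum(rng.choices(range(1, sides + 1), k=n_dice))

def underdog_win_rate(gap, n_dice, multiplier=1, trials=5000, seed=0):
    """Monte Carlo estimate of how often the lower-modifier side wins."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        favourite = roll(rng, n_dice, multiplier=multiplier) + gap
        underdog = roll(rng, n_dice, multiplier=multiplier)
        wins += underdog > favourite
    return wins / trials

print(dice_sum_stats(100))               # 100d100
print(dice_sum_stats(20, multiplier=5))  # 5x 20d100
tight = underdog_win_rate(gap=500, n_dice=100)
loose = underdog_win_rate(gap=500, n_dice=20, multiplier=5)
print(tight, loose)
```

By the normal approximation, the underdog should win somewhere around 11% of the time with 100d100 and around 29% with 5x 20d100, so the simulated numbers should land in that neighborhood.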
The advantage of using probability distributions over die rolls is you can logic out what the spread of outcomes should look like--there are statistical distributions that are a natural choice depending on how you imagine things work. For example, you know how the normal distribution pops up everywhere? The reason is a wild little fact called the central limit theorem, which basically says that if you're adding together a bunch of little effects, the result will look like a normal distribution--specifically, it'll converge to a normal distribution as the number of little* effects you're adding increases. So, if you think of a contest as being decided by a bunch of similar-scale advantages that add up, maybe that's reasonable. Of course, that didn't work too well for chess. Instead, effects might be more multiplicative, and you might want to go with the log-normal distribution. (It's a distribution effective altruists sometimes use as a prior for charity cost-effectiveness, for example.) This distribution corresponds to a variable whose log is normal, so it's the natural choice for a bunch of multiplied effects, since multiplying a bunch of variables corresponds to adding their logs. Or you could imagine success coming from the number of small independent events going in your favor, in which case you'd use a Poisson.
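Here's a quick-and-dirty way to see those three stories play out. It's only a sketch: the "little effects" are made-up random factors between 0.8 and 1.25, and the counts, probabilities, and seed are arbitrary choices for illustration. Added effects should come out symmetric (normal-ish), multiplied effects should come out lopsided but with a symmetric log (log-normal-ish), and the count of rare lucky breaks should have its variance close to its mean (Poisson-ish).

```python
import math
import random
import statistics

def skewness(xs):
    """Sample skewness: about 0 for symmetric data, positive for a long right tail."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

rng = random.Random(0)
N_EFFECTS, TRIALS = 30, 5000

def effects():
    """One contest's worth of small, similar-scale effects (made-up numbers)."""
    return [rng.uniform(0.8, 1.25) for _ in range(N_EFFECTS)]

sums = [sum(effects()) for _ in range(TRIALS)]            # additive story
products = [math.prod(effects()) for _ in range(TRIALS)]  # multiplicative story
log_products = [math.log(p) for p in products]
# 200 small independent chances at 2% each: roughly Poisson with mean 4
counts = [sum(rng.random() < 0.02 for _ in range(200)) for _ in range(TRIALS)]

print("skew of sums:        ", round(skewness(sums), 2))
print("skew of products:    ", round(skewness(products), 2))
print("skew of log-products:", round(skewness(log_products), 2))
print("count mean, variance:", round(statistics.fmean(counts), 2),
      round(statistics.pvariance(counts), 2))
```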
(Or you might think there are a few specific effects at work, in which case you can try to model those and add/multiply them up--but that might be a bit of a rabbit hole unless you use a really simple model.)
This does lose the advantage of the direct analogy to chess Elo, though you can probably get scores that translate somewhat reasonably. (Especially if you use something that's already quite similar to the normal distribution, like the logistic.)
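For a sense of how close the logistic is to the normal here, this sketch compares the two win-probability curves. It assumes the usual chess-style logistic formula (base 10, 400-point scale) and a normal model where the rating difference has a standard deviation of 200 * sqrt(2); both parameter choices are my assumptions about the standard setup rather than anything specific to this thread.

```python
import math

def normal_win(diff, sd=200 * math.sqrt(2)):
    """Win probability for a rating lead of `diff` under the normal model."""
    return 0.5 * (1 + math.erf(diff / (sd * math.sqrt(2))))

def logistic_win(diff, scale=400):
    """Win probability for a rating lead of `diff` under the logistic model."""
    return 1 / (1 + 10 ** (-diff / scale))

for d in (100, 200, 400, 800):
    print(d, round(normal_win(d), 3), round(logistic_win(d), 3))

# Largest disagreement between the two curves over gaps of 0 to 1000 points
worst_gap = max(abs(normal_win(d) - logistic_win(d)) for d in range(0, 1001, 10))
print(round(worst_gap, 3))
```

The two curves stay within a couple of percentage points of each other across that range, with the logistic giving the underdog slightly better odds at big gaps.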
*Specifically, the classic version of the theorem requires all the effects to be independent and identically distributed, but for large enough numbers of independent effects, all that's practically required is for the effects not to range too widely in scale.
This has been fun stats time. Join us next week for our discussion of the heat death of the universe!
--