one thing to consider is that powerful and cheesy do not mean the same thing. the power category is there to measure how powerful you can get.

cheesy, as far as i have seen, refers to using the rules in ways that were not intended, such as using sanctum spell or precocious apprentice to get into mystic theurge early. abrupt jaunt is not cheesy. it does exactly what it's meant to do, which is make you nearly impossible to hit. dark chaos shuffle is not cheesy either. it combines two spells: one is meant to give you abyssal heritor feats, the other is meant to take them away. they even share a name and a book, so you know they were meant to be used together. both options may be too powerful for some campaigns, but in an optimization challenge, simply being powerful should never be a penalty, so long as your power makes sense.

as for criteria, you don't need specific formulas for everything. that's one way to do it, but feel free to ad hoc a bonus or penalty when it's needed. or go with a relative criterion: a wizard build is compared to straight wizard, and a fighter build is compared to straight fighter. fighter 19/wizard 1 is clearly more powerful than fighter 20, so you give it high marks in power even though it's far weaker than wizard 20, because it's not a wizard build.