Thanks for that. Yeah - to be clear, I think on the balance of probabilities Niemann is likely a cheater. He's admitted cheating, he's been called out further by Chess.com, and there are now these weird statistical anomalies. It's just that I'm naturally cautious about using data to reach stronger conclusions than it supports. Science is littered with observations which appear at first glance to be statistically significant but turn out not to be once more rigour has been applied.
To your points. If the net was drawn as widely as possible for Magnus then that's great - it must have picked up as many 90%+ games as is possible so yeah, that lessens my concerns.
I do think that 100% in longer games is a lot less defensible than in those that are just out of known theory or what have you. So again, suspicious. I just think it's probably a lot easier to play 100% against a 2600 than against a 2750. Obviously you'd have to play way above 2600 to do it, but while unlikely, it's not outside the realm of possibility that Niemann can play to those standards on occasion. I guess I'd just prefer it if analyses were done on similar up and comers with similar opponents. I guess this has already been done or is being done as I type.
A minor quibble with your first point. After a quick eyeball of Deleng's histogram I'd suggest his dataset contains far more than 278 games. At a guess I'd say 380+ games went into producing that. Not sure if it's the same dataset or what though.
Edit: This seems to be the original Niemann dataset. I counted 407 games listed (but I might be misunderstanding/miscounting).