haven't slept all night, so i only skimmed it. the analysis seems simple enough, and the deletion looks very suspicious.
i also skimmed the paper where the sequences were initially reported:
https://onlinelibrary.wiley.com/doi/full/10.1002/smll.202002169
interestingly the (old, chinese) paper (cited as wang 2020b in the preprint) also discusses the mutation T28144C talked about in the preprint ... and the chinese paper reports:
By contrast, it seems the new preprint says that it (the T28144C, or "S type") is present in 6 of 13 samples (Table 1), reversing their relative abundance.
So uhhh i don't know what's up. i think the new preprint is using less strict criteria to include or exclude sequences. but its table 1 also has fewer total sequences analysed than the old paper's table 1, even though less stringent criteria should give more samples, not fewer.
i'll probably need some time and properly awake thinking to figure out what's going on. i suspect it's either that the chinese paper includes a mix of samples from different places analysed together and the new one picks what it sees as the most relevant, or that the new one's different (less strict) cutoffs have changed the result.