# Thread: Corona Virus math and statistics

1. ## Corona Virus math and statistics

To separate theory a bit from general talk, I decided to make a new thread. (Not everyone is interested in, or likes talking about, the math of an ongoing pandemic.) I have been talking about numbers a bit on Reddit, but Reddit is a horrible format for discussion and things pretty much disappear after a day or two. So I thought I would do it here instead.

So let's start with the differences between SK and Germany. Looking at the SK data, I think it is probably pretty good (at least for hospitals that are not horribly overburdened). But Germany currently has a much lower ratio. I can't say how much of that comes from insufficient data or from differences in hospital care, so I just wanted to see how much normal statistics can account for. (Well, back-of-the-envelope math only.) With the data from the 26th, Germany has 253 deaths and 43938 total cases, for a CFR of 0.58 percent, against 131/9241=1.4% in South Korea. So 615 deaths in Germany would be the same ratio.

First age: the infected in Germany tend to be young. Here is data from SK for how likely each age bracket is to die if infected. Here are the ages of the infected in Germany. The age brackets don't match, so I assume that within Germany's age brackets the cases are evenly distributed (and hope the resulting error isn't too large) and then multiply by the SK fatality rates. I also assume that no one under 30 dies. That gives:

0.12%*(2750+4200) + 0.09%*8400 + 0.38%*8400 + 1.37%*3600 + 5.27%*3600 + 9.26%*1669=441

441/615=0.717, so I think we can account for a decent amount by adjusting for age. But 253/441=0.57, so there is still a good part left to explain.
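The age adjustment above can be reproduced in a few lines of Python. This is only a sketch of the same back-of-the-envelope arithmetic: the variable names are made up for illustration, and the (rate, cases) pairs are copied straight from the sum above.

```python
# (SK fatality rate for the bracket, estimated German cases in that bracket),
# copied from the sum above. Under-30 cases are assumed to contribute no deaths.
brackets = [
    (0.0012, 2750 + 4200),
    (0.0009, 8400),
    (0.0038, 8400),
    (0.0137, 3600),
    (0.0527, 3600),
    (0.0926, 1669),
]

germany_deaths = 253
germany_cases = 43938
sk_cfr = 131 / 9241  # South Korea's crude CFR, unrounded

# Deaths Germany would have if each bracket died at the SK rate.
expected_age_adjusted = sum(rate * cases for rate, cases in brackets)

# Deaths Germany would have at SK's crude CFR (no age adjustment).
expected_crude = sk_cfr * germany_cases

print(f"age-adjusted expected deaths: {expected_age_adjusted:.0f}")
print(f"crude expected deaths:        {expected_crude:.0f}")
print(f"share explained by age:       {expected_age_adjusted / expected_crude:.2f}")
print(f"observed / age-adjusted:      {germany_deaths / expected_age_adjusted:.2f}")
```

With the unrounded SK ratio the crude figure comes out at about 623 rather than 615, so the share explained by age is about 0.71 instead of 0.717. The difference is well inside the error of the even-distribution assumption.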

The tricky bit is adjusting for time delay: people don't die instantly. Exponential case growth stopped in SK a while ago, and South Korea's ratio has kept growing since then. It is possible that some of the new cases just have a worse ratio, but I think much of that is probably existing cases dying. I don't know the average time from detection to death or anything, so the best I can come up with is:

https://www.worldometers.info/corona...y/south-korea/ On the 9th, I would say, their exponential phase stopped. From there the total only grew from 7478 to 9241 cases, so the change in ratio since then might be a decent indicator of how the ratio changes when there isn't a constantly increasing influx of new cases. And hospital stays supposedly can be pretty long, so two and a half weeks might not be too extreme for resolving them. (Some cases are of course older, but thanks to exponential growth the majority aren't that much older: on March 2 the total was 4335, a week before that, on Feb 24, it was 833, and a week before that 30. Still, 1-2 weeks longer is a long time for it to play out.) The ratio on the 9th was 53/7478=0.71%, so their ratio then was 0.71%/1.4%=51% of what it is now. And as I said, 253/441=0.57, so if the numbers for the current cases in Germany change in a similar way, the resulting CFR would be pretty similar. Though maybe two and a half weeks is a bit much; after just one week, on the 16th, it is 75/8236=0.91%, which would just give 0.71/0.91=78%.
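The ratios in this paragraph are quick to recompute. A minimal sketch using only the death and case counts quoted above (the dictionary layout and names are just for illustration, and the dates are taken to be in March):

```python
# South Korean cumulative (deaths, cases) on the March dates quoted above.
sk = {
    9: (53, 7478),
    16: (75, 8236),
    26: (131, 9241),
}

# Crude CFR on each date.
cfr = {day: deaths / cases for day, (deaths, cases) in sk.items()}

for day, value in cfr.items():
    print(f"March {day}: CFR = {value:.2%}")

# How much of a later CFR was already visible when growth stopped on the 9th.
print(f"9th vs 26th: {cfr[9] / cfr[26]:.0%}")
print(f"9th vs 16th: {cfr[9] / cfr[16]:.0%}")
```

Without rounding the intermediate percentages, the first ratio lands at roughly 50% rather than 51%, which doesn't change the argument.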

Now the age part makes sense, I think; the time delay part is a bit weak though. I just don't know how to do that better, so if anyone has a better idea or knows how to handle statistics better, that would be nice.
Still, I think a good part of the Germany-SK difference can be explained with age and time gap. Which is too bad, because it would be nice if the true death rate were closer to the current one in Germany instead of the current one in SK.

Another made-up statistic: there is a 90% chance this thread will die quickly, since there isn't all that much statistics-wise I want to talk about and I have no idea whether somebody else has something they want to talk about in that regard.

Edit:
- COVID-19 patients average time on ventilator: 11 - 21 days (vs. 3 - 4 days for non-COVID-19 patients). "We have patients that have been 20 days 30 days on a ventilator. The longer you are on a ventilator, the more likely you are not going get off a ventilator"
From the NY gov https://twitter.com/NYGovCuomo/statu...99576183410688 as context for the delay.

Edit 2: Nothing to do with this, but the micronation of San Marino will be interesting to watch: https://en.wikipedia.org/wiki/2020_c..._in_San_Marino Super small sample size of course, but if it keeps spreading it might soon provide a decent lower bound on deadliness (though only for the really optimistic people).

As of 27 March 2020: With 223 confirmed cases out of a population of 33,344 (as of 2018), it is the country with the highest percentage of confirmed cases per capita at 0.67% – 1 confirmed case per 150 inhabitants.[2] Also, with 21 confirmed deaths, the country has the highest rate of confirmed deaths per capita at 0.063% of the total population – 1 death per 1,588 inhabitants
If they manage another doubling of deaths (and I hope they don't; a small-town-sized country is probably hit pretty hard by extra deaths), we will have passed 0.1% of their population, so flu level even if their whole nation already had it.
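As a quick check on the San Marino figures, here is the same arithmetic in Python, using only the numbers quoted above (variable names are illustrative):

```python
population = 33_344
cases = 223
deaths = 21

print(f"cases per capita:  {cases / population:.2%}")
print(f"deaths per capita: {deaths / population:.3%}")
print(f"one death per {population / deaths:.0f} inhabitants")

# Another doubling of deaths, as discussed above.
doubled = 2 * deaths
print(f"after a doubling:  {doubled / population:.3%}")
```

Dividing deaths by the whole population treats everyone as infected, which is why this only works as a lower bound.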

Edit: To not double post I shall put it into the same post. I was curious how far off the case fatality rate would be from the IFR with different disease lengths and days to death; it looks like this: https://ibb.co/BV0KqGr . I initialize with 100 cases and grow them by a set percentage per day, and 1% of the infected die exactly x days later (the number you see at the end of the curves). Caveat: in reality there is spread, and if death comes after 10±5 days, that moves the CFR in the direction of 5 days. (Because exponentially growing case numbers mean that people from a later day dying early have a bigger impact than people from an earlier day dying later.)

Ah, I should probably remind people that most places do not have perfect data; rather, they overestimate the current CFR quite a bit, so Italy isn't even higher.

Super simple Python script; I didn't bother to post it because it is super simple and not all that useful, but if someone wants it, tell me.
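For anyone who wants to play with the idea in the meantime, a model like the one described can be sketched as below. This is not the original script, just a reconstruction from the description: it assumes the 100 initial cases are the first day's new cases, and the function name, the 60-day horizon and the 20% growth rate are made up for illustration.

```python
def observed_cfr(growth, delay, days=60, ifr=0.01, initial=100.0):
    """Crude CFR (cumulative deaths / cumulative cases) for each day.

    New cases start at `initial` and grow by `growth` per day (0.2 = 20%).
    A fraction `ifr` of each day's new cases dies exactly `delay` days later.
    """
    new_cases = [initial * (1 + growth) ** t for t in range(days)]
    total_cases = 0.0
    total_deaths = 0.0
    cfr = []
    for t in range(days):
        total_cases += new_cases[t]
        if t >= delay:
            # Deaths today come from the cohort infected `delay` days ago.
            total_deaths += ifr * new_cases[t - delay]
        cfr.append(total_deaths / total_cases)
    return cfr


for delay in (5, 10, 20):
    final = observed_cfr(growth=0.2, delay=delay)[-1]
    print(f"death after {delay:2d} days: CFR = {final:.3%} (true rate 1%)")
```

Under these assumptions the observed CFR settles near `ifr / (1 + growth) ** delay` while growth continues, so the faster the growth and the longer the delay, the further the crude ratio sits below the true rate.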

2. ## Re: Corona Virus math and statistics

> Now the age part makes sense I think, the time delay part is a bit weak though. I just don't know how to do that better, so if anyone has a better idea/knows how to handle statistics better that would be nice.
> Still I think a good part of the Germany-SK difference can be explained with age and time gap. Which is too bad because it would be nice if the true death rate is closer to the current one in Germany instead of the current one in SK.

You could try fitting the existing data to various ARIMA models to see if it fits one, and use that to project things into the future. ARIMA is often used for time-series analysis, and while some of it is more random (the "MA" part is for Moving Average, which is by definition based on randomness), the Auto-Regressive and I parts (which I remember less about since I took my time series course in statistics) are probably more relevant.

Since it spread differently and at different points of time in different countries, it'd probably be good to have the country be a major classification variable. Or do models independently for each nation.

I have some R code that runs several ARIMA models on a set of data to let you see which ones fit best and which ones are just invalid... but I can't access the computer with that code right now. Or I'd share it. Basically a loop that does analysis and checks assumptions for ARIMA models from (0,0,0)* up to (3,3,3)**.
*no AR, I, or MA elements
**3 AR, I, and MA elements, respectively. You can keep adding more elements, but then it eventually gets "loose" enough that the model can fit any data as it's just several tight curves, so any data point would be somewhere near a curve
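In the absence of the R code, the shape of such a loop can be sketched in Python. This is only the search skeleton, not the R script described above: `fit_score` is a hypothetical caller-supplied function that fits one model and returns a score where lower is better (an AIC, say) and raises an exception when the model is invalid for the data. The residual and assumption checks mentioned above are not included.

```python
from itertools import product


def rank_arima_orders(series, fit_score, max_order=3):
    """Try every (p, d, q) from (0, 0, 0) up to (max_order,) * 3.

    Returns the successful fits as (score, order) pairs sorted best first,
    plus the list of orders whose fit raised an exception.
    """
    results = []
    failed = []
    for order in product(range(max_order + 1), repeat=3):
        try:
            results.append((fit_score(series, order), order))
        except Exception:
            # Treat any fitting error as "this order is invalid for the data".
            failed.append(order)
    results.sort()
    return results, failed
```

If statsmodels is installed, one possible `fit_score` would be `lambda s, order: ARIMA(s, order=order).fit().aic` with `ARIMA` imported from `statsmodels.tsa.arima.model`. Running the loop once per country keeps each nation's model independent.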

3. ## Re: Corona Virus math and statistics

The most recent, large scale study from Denmark shows a mortality rate of 0.18%.

4. ## Re: Corona Virus math and statistics

I don't think a very simple model / approach is going to give you a good idea of spread or mortality.
Both depend on a lot of factors that are hard to implement.
Population density and behavior for one are hard to model when it comes to spread, obviously.
And Corona is the kind of disease where mortality is strongly dependent on medical care (and age, and a bunch of other things, but...). Yes, you might die even if you get the best care, or you might survive with little or no care, but on a population scale this matters a lot.
E.g. between Germany and SK: while neither is perfect or horrible, SK has a stronger gradient between poor and rich, especially for 'luxuries' like health care. It seems likely that this would mean a higher mortality even if everything else were identical (which it isn't, of course).

So of course you can get some idea by comparing and contrasting numbers, but I'm not sure you'll find what I think you want to find (?)

5. ## Re: Corona Virus math and statistics

> Originally Posted by JeenLeen
>
> You could try fitting the existing data to various ARIMA models to see if it fits one, and use that to project things into the future. ARIMA is often used for time-series analysis, and while some of it is more random (the "MA" part is for Moving Average, which is by definition based on randomness), the Auto-Regressive and I parts (which I remember less about since I took my time series course in statistics) are probably more relevant.
>
> Since it spread differently and at different points of time in different countries, it'd probably be good to have the country be a major classification variable. Or do models independently for each nation.
>
> I have some R code that runs several ARIMA models on a set of data to let you see which ones fit best and which ones are just invalid... but I can't access the computer with that code right now. Or I'd share it. Basically a loop that does analysis and checks assumptions for ARIMA models from (0,0,0)* up to (3,3,3)**.
> *no AR, I, or MA elements
> **3 AR, I, and MA elements, respectively. You can keep adding more elements, but then it eventually gets "loose" enough that the model can fit any data as it's just several tight curves, so any data point would be somewhere near a curve

Thanks for the suggestion; it is interesting, though it doesn't really matter anymore by now. Numbers move too fast to look at data from two weeks ago. Germany's CFR for known cases is up to 2% or so (so similar to SK now, I guess), and there are preliminary results from random antibody testing in a town in the hardest-hit region suggesting a 0.37% IFR: https://www.land.nrw/sites/default/f...dy_gangelt.pdf (Caveat: in a town with 11634 people, 15% having been or currently being infected and an IFR of 0.37% comes down to about 6.5 deaths (they don't give the actual death number), so one more or less would move the result noticeably.)
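To put a number on "noticeably", here is a small sketch of that sensitivity. It uses only the three figures quoted above; the whole-number death counts are hypothetical, since the actual count isn't given.

```python
population = 11_634
infected_share = 0.15
reported_ifr = 0.0037

infected = population * infected_share
implied_deaths = infected * reported_ifr
print(f"implied infected: {infected:.0f}, implied deaths: {implied_deaths:.1f}")

# How the estimate shifts if the true death count were one lower or higher.
for deaths in (6, 7, 8):
    print(f"{deaths} deaths -> IFR = {deaths / infected:.2%}")
```

Each additional death moves the estimate by a bit under 0.06 percentage points, which is large next to a 0.37% headline figure.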

6. ## Aren't we looking at a first order derivative? [CV]

So can somebody explain to me why we are being told that everything is going to get/is getting better and we are "past the peak" in some places, when we are looking at the increase in cases per day going down? "Flatten the curve" I get, because that is about the number of patients that need treatment at one time, but how can things be getting better if you go from 500 new cases in a day to 400? You still have 900 new cases over the course of two days. You didn't go down by 100 cases, you still went up by 400.

I'm inclined to believe this is people not understanding how math works/what they are looking at, but I'm open to being wrong. It just seems to me that we are looking at a derivative when we should be worried about the integral...

7. ## Re: Aren't we looking at a first order derivative? [CV]

> Originally Posted by PallentisLunam
>
> So can somebody explain to me why we are being told that everything is going to get/is getting better and we are "past the peak" in some places, when we are looking at the increase in cases per day going down? "Flatten the curve" I get, because that is about the number of patients that need treatment at one time, but how can things be getting better if you go from 500 new cases in a day to 400? You still have 900 new cases over the course of two days. You didn't go down by 100 cases, you still went up by 400.
>
> I'm inclined to believe this is people not understanding how math works/what they are looking at, but I'm open to being wrong. It just seems to me that we are looking at a derivative when we should be worried about the integral...

By that logic, things can't ever get better, and will only stop getting worse when there are 0 new cases per day.

When analyzing the spread of an infectious disease, those 500 cases from the previous day are irrelevant. They're old news, and you can't change the past to make them never have gotten infected. One way or another, they will reliably be removed from the list of current cases after some time period. Trying to make sure that removal is because they recovered rather than because they died is a matter for treatment, not containment.

For stopping an epidemic, the rate of new cases per day is everything. If every day you get 10% more new cases than the day before, then you have an exponential growth curve and things are going to get very very much worse if something isn't done to change it. If instead you get 10% fewer new cases each day, then you have an exponential decay curve, and if nothing changes then the epidemic will peter out to nothing after a while.

Thinking in terms of derivatives and such, yes, the integral (total cases) is the thing we ultimately should be worrying about. However, if the derivative (new cases per day) is increasing, then the naive extrapolation of the integral diverges to infinity. When new cases per day are consistently decreasing by a steady percentage, the extrapolation of the integral indefinitely into the future converges to a finite number.

Thus, increasing new cases per day vs decreasing new cases per day is the difference between "oh crap, we are so screwed" vs "the end is in sight".
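The diverge-versus-converge point can be seen numerically with a short sketch. It assumes a constant percentage change per day, as in the 10% example above, and borrows the 500-cases-a-day starting point from the question; the function name is made up for illustration.

```python
def total_future_cases(today, daily_change, days):
    """Sum of new cases over the next `days` days.

    `daily_change` is the relative change per day: +0.10 for 10% more
    each day, -0.10 for 10% fewer.
    """
    ratio = 1 + daily_change
    return sum(today * ratio ** t for t in range(1, days + 1))


for days in (30, 60, 120):
    growing = total_future_cases(500, +0.10, days)
    shrinking = total_future_cases(500, -0.10, days)
    print(f"{days:3d} days: growing {growing:,.0f}   shrinking {shrinking:,.0f}")
```

The shrinking column approaches the geometric-series limit `today * ratio / (1 - ratio)`, which is 4500 here, while the growing column keeps multiplying as the horizon extends.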

8. ## Re: Aren't we looking at a first order derivative? [CV]

> Originally Posted by PallentisLunam
>
> So can somebody explain to me why we are being told that everything is going to get/is getting better and we are "past the peak" in some places, when we are looking at the increase in cases per day going down? "Flatten the curve" I get, because that is about the number of patients that need treatment at one time, but how can things be getting better if you go from 500 new cases in a day to 400? You still have 900 new cases over the course of two days. You didn't go down by 100 cases, you still went up by 400.

Using your example, when you have those 500 people cured, you only have 400 blocking hospital beds/needing care. Thus, the strain on your medical system is going down.
So yes, you are "over the peak" in three weeks, when you have cured those people.

9. ## Re: Aren't we looking at a first order derivative? [CV]

> Originally Posted by Douglas
>
> By that logic, things can't ever get better, and will only stop getting worse when there are 0 new cases per day.
>
> When analyzing the spread of an infectious disease, those 500 cases from the previous day are irrelevant. They're old news, and you can't change the past to make them never have gotten infected. One way or another, they will reliably be removed from the list of current cases after some time period. Trying to make sure that removal is because they recovered rather than because they died is a matter for treatment, not containment.
>
> For stopping an epidemic, the rate of new cases per day is everything. If every day you get 10% more new cases than the day before, then you have an exponential growth curve and things are going to get very very much worse if something isn't done to change it. If instead you get 10% fewer new cases each day, then you have an exponential decay curve, and if nothing changes then the epidemic will peter out to nothing after a while.
>
> Thinking in terms of derivatives and such, yes, the integral (total cases) is the thing we ultimately should be worrying about. However, if the derivative (new cases per day) is increasing, then the naive extrapolation of the integral diverges to infinity. When new cases per day are consistently decreasing by a steady percentage, the extrapolation of the integral indefinitely into the future converges to a finite number.
>
> Thus, increasing new cases per day vs decreasing new cases per day is the difference between "oh crap, we are so screwed" vs "the end is in sight".

Okay, in the long term yes, but why are people acting like it is better right now? From the news sources I have been watching, things seem to be going from "we are so screwed", skipping right over "the end is in sight", to land in "everything is fine now" territory.

> Originally Posted by Rydiro
>
> Using your example, when you have those 500 people cured, you only have 400 blocking hospital beds/needing care. Thus, the strain on your medical system is going down.
> So yes, you are "over the peak" in three weeks, when you have cured those people.

Okay, so throughput or the processing rate is important, and of course difficult to quantify, since this thing seems to put people down for anywhere from a few days to a few weeks. It still seems like the picture provided by new cases per day is incomplete.

10. ## Re: Aren't we looking at a first order derivative? [CV]

> Originally Posted by Rydiro
>
> Using your example, when you have those 500 people cured, you only have 400 blocking hospital beds/needing care. Thus, the strain on your medical system is going down.
> So yes, you are "over the peak" in three weeks, when you have cured those people.

Yeah, here we talked about being past the peak when the number of people classified as newly recovered was greater than the number of newly infected.
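That criterion is easy to illustrate with a toy model. The sketch below assumes, purely for illustration, that everyone leaves the active count exactly a fixed number of days after being reported; the case series and the 7-day figure are invented, not real data.

```python
def active_cases(new_cases, days_to_recover):
    """Number of people currently sick on each day.

    Assumes, for illustration only, that everyone is removed (recovered or
    dead) exactly `days_to_recover` days after being counted as a new case.
    """
    active = []
    current = 0
    for day, new in enumerate(new_cases):
        current += new
        if day >= days_to_recover:
            current -= new_cases[day - days_to_recover]
        active.append(current)
    return active


# Hypothetical epidemic: new cases rise by 50 a day to 500, then fall by 50 a day.
new_cases = list(range(50, 501, 50)) + list(range(450, -1, -50))
active = active_cases(new_cases, days_to_recover=7)

peak_new = new_cases.index(max(new_cases))
peak_active = active.index(max(active))
print(f"new cases peak on day {peak_new}, active cases peak on day {peak_active}")
```

In this toy series the active count keeps rising for a few days after new cases start falling, and only turns once daily removals exceed daily new cases. That lag is the gap between the "new cases per day" picture and the load on hospitals.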

11. ## Re: Corona Virus math and statistics

Public/media portrayal can be weird. Though it also depends on the area you are talking about: say, if the USA stopped increasing its deaths-per-day number now, it would be getting away relatively unhurt (except for New York). Currently the US is only at 62 deaths per million; for comparison, Spain is at 355 deaths per million and NY at 440 deaths per million, or 730 if you restrict it to NYC. Part of what is scary about fast exponential growth is that, as long as it happens, the current damage will be dwarfed in the coming weeks. So if exponential growth stops before it gets really bad, that is a relief. (Though you should of course never go by a single day that is better.)

12. ## Re: Aren't we looking at a first order derivative? [CV]

> Originally Posted by PallentisLunam
>
> Okay, in the long term yes, but why are people acting like it is better right now? From the news sources I have been watching, things seem to be going from "we are so screwed", skipping right over "the end is in sight", to land in "everything is fine now" territory.

Bear in mind that some parties involved have a vested interest in saying 'everything is all right, go back to work' and restarting the economy, even when the situation doesn't warrant it.

13. ## Re: Aren't we looking at a first order derivative? [CV]

> Originally Posted by Brother Oni
>
> Bear in mind that some parties involved have a vested interest in saying 'everything is all right, go back to work' and restarting the economy, even when the situation doesn't warrant it.

I agree. A lot of the people that are saying it's almost over are pretty transparently channeling the mayor from the movie Jaws.
