Data Science

SARS-CoV-2 (“Corona”) Data Sources and APIs

Last update: 2020-04-01

You just have to write about something, and it changes. In this case, that’s pretty good 🙂 . So here’s a list of APIs and data sources for the CoV-2 pandemic.


Data sources:

Dashboards and visualizations:



SARS-CoV-2, a.k.a “Corona Virus”, pseudo “data science” and Twitter

Some terminology (taken from – who would have thought it – and background information:

  • SARS-CoV-2 is the correct name of the virus. Everybody calls it “the Corona Virus” though, which is technically incorrect, cause coronaviruses are a whole family of viruses – a group, not a single one.
  • COVID-19 is the name of the disease. It’s an acronym made of “COrona VIrus Disease 2019”. (It’s basically the same as HIV and AIDS – one is the virus, the other the actual illness.)
  • It’s not the flu, but I think everybody has that down by now.
  • Its R0 value (how many people one infected person infects on average) is about 3 (source: RKI Germany).
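To get a feeling for what an R0 of about 3 means, here is a tiny back-of-the-envelope sketch. It assumes a completely idealized situation (one initial case, nobody immune, no countermeasures), and the function name is made up for this illustration:

```python
def cases_per_generation(r0, generations):
    """New infections in each generation of spread, starting from one case.

    Idealized assumption: every infected person infects exactly r0 others
    and nothing slows the spread down.
    """
    return [r0 ** g for g in range(generations + 1)]


# With R0 = 3, the fourth generation alone already has 81 new cases.
print(cases_per_generation(3, 4))  # [1, 3, 9, 27, 81]
```

Real outbreaks do not follow this exactly, but it shows why a value well above 1 is a problem.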

All of us are following the development of this very closely, cause we have no choice since we’re all basically locked in at home. And even if we went out, everything’s closed. That gives us a ton of time to play DOOM Eternal (me), or write Twitter bots that regularly publish the number of COVID-19 infections in Germany (me as well). As macabre as it is, this is a very good opportunity to start fooling around with data science. But to do this you need datasets, which are surprisingly hard to get. For the current case numbers I found these:

There’s probably more, but like I said – it’s surprisingly hard to find regularly updated, publicly available data sets in machine-readable form. (Also, this is the first time I go looking for this stuff, so maybe I just have no clue).

Now back to the twitter bot. What does it do exactly? Go look for yourself, but if you are too lazy:

  • It posts the current infection numbers at 8h, 12h, 16h and 20h. The last post of the day includes a graph, a 1-week forecast and a small evaluation of how we stand today in comparison to the forecast that would have been made a week ago.
  • A friend helped me greatly by providing first a Holt-Winters prediction (which is live now), and then upgraded this to a more intelligent ARIMA prediction, which is still buried in a jupyter notebook and waiting for daylight. (Maybe she will write a guest post here to explain what it does and how it works – but I haven’t asked yet).
  • As for the bot’s code – it’s not (yet) public. Which is unusual for me, really. I should remedy that.
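Since the bot’s code is not public, here is only a rough idea of what such a forecast can look like. This is not the bot’s actual model. It is a minimal sketch of Holt’s linear trend method (double exponential smoothing), which is the non-seasonal core of Holt-Winters. The function name and the alpha/beta defaults are assumptions picked for illustration:

```python
def holt_forecast(series, alpha=0.8, beta=0.2, horizon=7):
    """Forecast `horizon` steps ahead with Holt's linear trend method.

    alpha smooths the level, beta smooths the trend (both between 0 and 1).
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")

    # Start with the first value as level and the first difference as trend.
    level = series[0]
    trend = series[1] - series[0]

    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend

    # Extend the last level along the last trend.
    return [level + step * trend for step in range(1, horizon + 1)]


# Synthetic, perfectly linear series, just to show the call.
print(holt_forecast([10, 15, 20, 25, 30], horizon=3))
```

For case numbers that grow exponentially, it would probably make more sense to feed in the logarithm of the counts and convert the forecast back afterwards.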

What I learned so far:

  • It’s surprisingly easy and hard at the same time to get a Twitter developer account
  • Twitter does not permit publishing the exact same tweet multiple times in a short period of time
  • Heroku is really really nice for this, as long as you don’t need to pay for it. I would be interested in alternatives.
  • matplotlib is a lot more complicated than I expected
    • but strangely neither bokeh, nor plotly can actually export png graphics without either a separate electron app (WTF?) or a headless browser and selenium (W-T-F?!?)
  • pandas rocks, or more precisely: pandas data frames rock.
  • there does not seem to be a single properly maintained Twitter library for JavaScript, but there is at least tweepy for Python. (I mean I want to try something in JS, but honestly, if everything I find is outdated, I just stay with good old Python …)
  • The infection growth is intensely exponential – almost a straight line on a log scale plot.
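The “straight line on a log scale” observation can be turned into a number: fit a line through the logarithm of the case counts, and the slope gives the doubling time. The following is a minimal sketch with synthetic data (not real case numbers). The function name is invented for this example, and it assumes one count per day:

```python
import math


def doubling_time(counts):
    """Days needed for the counts to double, from a log-linear least-squares fit.

    Assumes one strictly positive count per day.
    """
    if len(counts) < 2 or any(c <= 0 for c in counts):
        raise ValueError("need at least two strictly positive counts")

    logs = [math.log(c) for c in counts]
    n = len(logs)
    mean_x = (n - 1) / 2  # mean of the day indices 0..n-1
    mean_y = sum(logs) / n

    # Slope of the least-squares line through (day, log(count)).
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(logs))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    slope = numerator / denominator

    if slope <= 0:
        raise ValueError("counts are not growing")
    return math.log(2) / slope


# Synthetic series that doubles every 3 days.
synthetic = [100 * 2 ** (day / 3) for day in range(10)]
print(round(doubling_time(synthetic), 2))  # 3.0
```

The closer the real numbers are to a straight line on the log plot, the better a single doubling time describes them.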

Let’s see where this goes, and I hope you all stay healthy.


Configure Python on Windows

All right, I have a Windows machine. It’s a PITA, but it’s here. And for some reason I started doing some Python testing on it. So this is how I managed to do it:


  • Install python with choco (choco install -y python)
  • Run PowerShell as Administrator
    • Execute Set-ExecutionPolicy -ExecutionPolicy Unrestricted (we’ll see why in a very short time). The narrower RemoteSigned policy is normally enough for this, too.

From here on, coding is pretty similar to *NIX:

  • Create your code folder
  • Set up a python venv (python -m venv .env)
  • In VS Code, choose this interpreter

So why the PowerShell stuff? Cause to activate the environment VS Code needs to execute a .ps1 script. Which it can’t, cause “executing scripts is disabled on this machine”, which seems to be the default setting.
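To check whether the activation actually worked, a few lines of Python are enough. This is a small sketch using only the standard library. The helper names are made up, and the script locations are the ones the standard venv module creates:

```python
import sys
from pathlib import Path


def in_virtualenv():
    """True if the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv folder,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def activation_script(venv_dir, windows):
    """Path of the activation script inside a venv folder."""
    base = Path(venv_dir)
    if windows:
        return base / "Scripts" / "Activate.ps1"  # the .ps1 that PowerShell has to run
    return base / "bin" / "activate"


print("inside a venv:", in_virtualenv())
print("activation script:", activation_script(".env", windows=True))
```

If the first line prints False in the VS Code terminal, the .ps1 script most likely did not run.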

All in all, surprisingly straightforward. And I just noticed even the *NIX keyboard shortcuts (CTRL-A, CTRL-K, for example) work in the terminal window now. Crazy.



Under macOS I use TextExpander, under Windows there’s the fantastic AutoHotkey. One of the few pieces of software I can’t live without.

This is my default configuration:

; ---------- "auto reload" ----------

FileGetTime ScriptStartModTime, %A_ScriptFullPath%
SetTimer CheckReload, 1000, 0x7FFFFFFF ; ms & priority

; from here:
CheckReload() {
    global ScriptStartModTime
    FileGetTime curModTime, %A_ScriptFullPath%
    If (curModTime <> ScriptStartModTime) {
        Loop
        {
            Reload
            Sleep 300 ; ms; only reached if the reload did not replace this instance
            MsgBox 0x2, %A_ScriptName%, Reload failed. ; 0x2 = Abort/Retry/Ignore
            IfMsgBox Abort
                ExitApp
            IfMsgBox Ignore
                Break
        } ; loops reload on "Retry"
    }
}

; ---------- actual content here ----------

; removed all my email address shortcuts ...

  ; from here:
  Send, %A_YYYY%-%A_MM%-%A_DD%

  Send, %A_YYYY%%A_MM%%A_DD%_%A_Hour%%A_Min%%A_Sec%