Data Science

SARS-CoV-2, a.k.a “Corona Virus”, pseudo “data science” and Twitter

Some terminology (taken from – who would have thought it – and background information:

  • SARS-CoV-2 is the correct name of the virus. Everybody calls it “the Corona Virus” though, which is technically incorrect, because corona viruses are a whole family – a group, not a single virus.
  • COVID-19 is the name of the disease. It’s an acronym made of “COrona VIrus Disease 19”. (It’s basically the same as HIV and AIDS – one is the virus, the other the actual illness).
  • It’s not the flu, but I think everybody has that down by now
  • Its R0 value (how many people one infected person infects on average) is about 3 (source: RKI Germany)

All of us are following the development of this very closely, because we have no choice: we’re all basically locked in at home. And even if we went out, everything’s closed. That gives us a ton of time to play DOOM Eternal (me), or write Twitter bots that regularly publish the numbers of COVID-19 infections in Germany (me as well). As macabre as it is, this is a very good opportunity to start fooling around with data science. But to do this you need datasets, which are surprisingly hard to get. For the current case numbers I found these:

There’s probably more, but like I said – it’s surprisingly hard to find regularly updated, publicly available data sets in machine-readable form. (Also, this is the first time I’ve gone looking for this stuff, so maybe I just have no clue).

Now back to the Twitter bot. What does it do exactly? Go look for yourself, but if you are too lazy:

  • It posts the current infection numbers at 8h, 12h, 16h and 20h. The last post of the day also includes a graph, a 1-week forecast and a small evaluation of how we stand today compared to the forecast made a week ago.
  • A friend helped me greatly by first providing a Holt-Winters prediction (which is live now), and then upgrading it to a more intelligent ARIMA prediction, which is still buried in a Jupyter notebook and waiting for daylight. (Maybe she will write a guest post here to explain what it does and how it works – but I haven’t asked yet).
  • As for the bot’s code – it’s not (yet) public. Which is unusual for me, really. I should remedy that.
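
To give a rough idea of what such a forecast does, here is a minimal sketch of Holt's linear trend method, a simplified relative of Holt-Winters without the seasonal part. It is not the bot's code: the function name, the smoothing parameters and the input numbers are all made up for illustration, and a straight-line trend is a crude fit for real epidemic data.

```python
def holt_forecast(series, horizon, alpha=0.5, beta=0.3):
    """Holt's linear trend method: smooth a level and a trend, then extrapolate."""
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        # new level: mix of the observation and the previous one-step forecast
        level = alpha * value + (1 - alpha) * (level + trend)
        # new trend: mix of the latest level change and the previous trend
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + step * trend for step in range(1, horizon + 1)]


# Made-up numbers with a steady increase of 5 per day (NOT real case data).
history = [10 + 5 * day for day in range(14)]
forecast = holt_forecast(history, horizon=7)
print(["%.1f" % value for value in forecast])
```

Comparing last week's forecast with today's real number is then just a subtraction, which is roughly the kind of evaluation the evening post contains.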

What I learned so far:

  • It’s surprisingly easy and hard at the same time to get a Twitter developer account
  • Twitter does not permit publishing the exact same tweet multiple times in a short period of time
  • Heroku is really really nice for this, as long as you don’t need to pay for it. I would be interested in alternatives.
  • matplotlib is a lot more complicated than I expected
    • but strangely neither bokeh, nor plotly can actually export png graphics without either a separate electron app (WTF?) or a headless browser and selenium (W-T-F?!?)
  • pandas rocks, or more precisely: pandas data frames rock.
  • there does not seem to be a single properly maintained Twitter library for JavaScript, but there is at least tweepy for Python. (I mean I want to try something in JS, but honestly, if everything I find is outdated, I just stay with good old Python …)
  • The infection growth is intensely exponential – almost a straight line on a log scale plot.
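
The "straight line on a log scale" observation can be turned into numbers: fit a line through the logarithm of the counts and read the daily growth factor and the doubling time off its slope. The sketch below uses only the standard library, and the counts are invented for illustration, not real case data.

```python
import math


def fit_exponential(counts):
    """Least-squares line through (day, ln(count)).

    Returns (daily growth factor, doubling time in days).
    """
    days = range(len(counts))
    logs = [math.log(c) for c in counts]
    n = len(counts)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
             / sum((x - mean_x) ** 2 for x in days))
    return math.exp(slope), math.log(2) / slope


# Made-up case numbers that grow by 30 % per day (NOT real data).
counts = [100 * 1.3 ** day for day in range(10)]
growth, doubling = fit_exponential(counts)
print("daily growth factor: %.2f, doubling time: %.1f days" % (growth, doubling))
```

On real numbers the points will not sit exactly on the line, and the slope will change once measures start to work, so the fit is best done on a recent window only.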

Let’s see where this goes, and I hope you all stay healthy.


Configure Python on Windows

All right, I have a Windows machine. It’s a PITA, but it’s here. And for some reason I started doing some Python testing on it. So this is how I managed to do it:


  • Install Python with choco (choco install -y python)
  • Run PowerShell as Administrator
    • Execute Set-ExecutionPolicy -ExecutionPolicy Unrestricted (we’ll see why in a very short time)

From here on, coding works pretty much like on *NIX:

  • Create your code folder
  • Set up a python venv (python -m venv .env)
  • In VS Code, choose this interpreter

So why the PowerShell stuff? Because to activate the environment, VS Code needs to execute a .ps1 script. Which it can’t, because “executing scripts is disabled on this machine”, which seems to be the default setting.
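
To check whether the terminal in VS Code really ended up in the venv, a small sketch like this can help. It relies only on the standard library; the function name is made up.

```python
import sys


def in_virtualenv():
    """True if this interpreter runs inside a venv created with `python -m venv`."""
    # Inside a venv, sys.prefix points at the venv, sys.base_prefix at the base install.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside a venv:", in_virtualenv())
```

If the second line says False, the activation script most likely did not run.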

All in all, surprisingly straightforward. And I just noticed even the *NIX keyboard shortcuts (CTRL-A, CTRL-K, for example) work in the terminal window now. Crazy.


Misc Django I – forms

Custom form errors

If you want to validate something in the view, and return with a custom error message in the same form, you can use the “Form.add_error(fieldname, errorstring)” method. And then, of course, return to the previous template.

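class MyView(View):
    def post(self, request):
        form = MyForm(request.POST)  # MyForm: placeholder for your form class
        if form.is_valid():
            data = form.cleaned_data
            res = lookup_login(data['login'])  # placeholder for your own check
            if len(res) > 0:
                form.add_error('login', "This personnel number already exists!")
        return render(request, 'my_template.html', {'form': form})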

Dynamic choice fields in forms

You want a form which fills its choice field from the database? And if the database changes and you reload the page, the form should change as well? Of course! Django has you covered.

class UserForm(forms.Form):
    def __init__(self, *args, **kwargs):
        super(UserForm, self).__init__(*args, **kwargs)
        # built on every instantiation, so the choices follow the database
        self.fields['site'] = forms.ModelChoiceField(
            label="Site",
            queryset=Site.objects.all().order_by('name'),
        )
        # 'site' now sits at the end; move the fields that belong after it
        for field in ('department', 'office', 'phone'):
            self.fields.move_to_end(field)

    login = forms.CharField(label="Login")
    email = forms.EmailField(label="Email")
    site = None # this is set in __init__() :)
    department = forms.CharField(label="Department")
    office = forms.CharField(label="Office")
    phone = forms.CharField(label="Phone")

… now, why the “for field in (‘department’ …)” line you ask?

Simple. The fields dict is an OrderedDict. Because “site = None” is not a real field, the “site” entry created in __init__() is a new key and lands at the end. So in the form the “Site” input box would be displayed last, although it makes more sense to display it where it is in the original definition.

Using “.move_to_end()” you can re-adjust this. If someone knows a better method … feel free to tell me.
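
The reordering can be tried without Django at all. In this sketch plain strings stand in for the form field objects, and the field names are the ones from the form above:

```python
from collections import OrderedDict

# Plain strings stand in for the form field objects.
fields = OrderedDict(
    (name, name.capitalize())
    for name in ("login", "email", "department", "office", "phone")
)

# Adding the new key puts "site" at the very end ...
fields["site"] = "Site"
# ... so push everything that should follow it to the end, in order.
for name in ("department", "office", "phone"):
    fields.move_to_end(name)

print(list(fields))
# → ['login', 'email', 'site', 'department', 'office', 'phone']
```

Django forms also document a field_order attribute and an order_fields() method, which might be the tidier route for the same result.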

(Sources: here)


Python & Visual Studio Code

The official Python plugin claims that Pipenv interpreters are found automatically.

They are not.

At least not on my machine.

Here’s how you set them.


JIRA and Python

I really came to hate JIRA with a passion. And now I have to create about 350 tickets.

Naturally I don’t do this by hand. But using the JIRA API is kind of … hard. There is a Python library, but its usage is rather sparsely documented, and this whole thing is just annoying as hell.

Although once you’ve done it, it’s quite simple. Here’s an example of how to create a ticket with due date and estimate set:

from pprint import pprint
from getpass import getpass

import jira


def do_shit():
    url = input("JIRA url: ")
    username = input("JIRA username: ")
    password = getpass("JIRA password: ")
    jira_inst = jira.JIRA(url, basic_auth=(username, password))
    test = {
        "project":          {"key": "TEST"},
        "issuetype":        {"name": "Task"},
        "labels":           ["deleteme", "whatisthis"],
        "summary":          "woho",
        "description":      "wohooooo",
        "duedate":          "2017-11-11",
        "timetracking":     {"originalEstimate": "4d"},
    }
    # create_issues() takes a list of field dicts, one per ticket
    created = jira_inst.create_issues([test])
    pprint(created)


if __name__ == "__main__":
    do_shit()

So simple.
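
For the CSV part, a sketch along these lines turns rows into the field dicts that create_issues() expects. The column names and the sample rows are assumptions made up for illustration; adapt them to whatever your file looks like.

```python
import csv
import io

# Hypothetical CSV layout; the column names are assumptions for illustration.
SAMPLE_CSV = """summary,description,duedate,estimate
First task,Set up the project,2017-11-11,4d
Second task,Write the docs,2017-11-18,2d
"""


def rows_to_issues(csv_file, project_key="TEST"):
    """Turn CSV rows into field dicts, one per ticket."""
    issues = []
    for row in csv.DictReader(csv_file):
        issues.append({
            "project": {"key": project_key},
            "issuetype": {"name": "Task"},
            "summary": row["summary"],
            "description": row["description"],
            "duedate": row["duedate"],
            "timetracking": {"originalEstimate": row["estimate"]},
        })
    return issues


issues = rows_to_issues(io.StringIO(SAMPLE_CSV))
print(len(issues), "issues prepared")
# With a connected client this list would go to jira_inst.create_issues(issues).
```

With a real file, replace the io.StringIO(...) part with open("tickets.csv", newline="") (the file name is again just an example).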


PyCharm, Arch Linux & Python 3.6

Love Python. Love PyCharm. Love Arch Linux.

Unfortunately Arch sneakily updated Python to 3.6. Cool, new version … but hey, why don’t my debug runs in PyCharm work any more??

ImportError: cannot open shared object file: No such file or directory

Yup, pretty confusing. It seems unable to find the shared Python 3.5 library. Well. After some cursing, it turns out the solution is pretty simple (if you know what to do):

  • get pyenv
  • use pyenv to install Python 3.5.2, but with the --enable-shared option set
  • use this python version for PyCharm projects (it does not matter if it’s in a virtualenv or not)

Like this:

$ PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 3.5.2
$ sudo $HOME/.pyenv/versions/3.5.2/bin/python "/opt/pycharm-professional/helpers/pydev/" build_ext --inplace
$ _

That solved it for me 🙂
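
To see whether a given interpreter was built with the shared library in the first place, you can ask it directly. A small sketch using the standard sysconfig module (the function name is made up; on Windows these config variables are usually not set):

```python
import sysconfig


def has_shared_libpython():
    """True if this interpreter was configured with --enable-shared."""
    return bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))


print("built with --enable-shared:", has_shared_libpython())
print("libpython name:", sysconfig.get_config_var("LDLIBRARY"))
```

Run it with the pyenv-installed python to confirm the option actually took effect.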


Really annoying thread properties

This sucks monkey ass, mainly because I didn’t think of it before: the current working directory belongs to the whole process, not to a single thread. And that’s just one example of why multi-threaded (soon to be -processing, probably) applications are hard.

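import subprocess as sp
import time
import os
from threading import Thread

class MyThread(Thread):

    def __init__(self, mydir):
        super().__init__()  # Thread.__init__() has to run before start()
        self.mydir = mydir

    def run(self):
        # assumed: chdir, then wait until the other thread has done its own chdir
        os.chdir(self.mydir)
        time.sleep(1)
        print("I'm (%s) in directory %s"
              % (str(self), os.getcwd()))

if __name__ == "__main__":
    # assumed: two threads with different target directories
    MyThread("/tmp").start()
    MyThread("/").start()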

Result is:

I'm (<MyThread(Thread-1, started 140195858716416)>) in directory /
I'm (<MyThread(Thread-2, started 140195850323712)>) in directory /
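
One way around this, sketched under the assumption that the threads only need the directory for launching external commands: leave the process-wide working directory alone and hand the directory to subprocess via its cwd argument. The function name and the temporary directories are made up for the example.

```python
import os
import subprocess as sp
import sys
import tempfile
from threading import Thread


def report_cwd(directory, results):
    """Run a child process in `directory` and store the cwd it reports."""
    out = sp.run(
        [sys.executable, "-c", "import os; print(os.getcwd())"],
        cwd=directory,            # applies to the child process only
        stdout=sp.PIPE,
        universal_newlines=True,
        check=True,
    )
    results[directory] = out.stdout.strip()


with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    before = os.getcwd()
    results = {}
    threads = [Thread(target=report_cwd, args=(d, results)) for d in (dir_a, dir_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for directory, seen in results.items():
        print("child started for %s ran in %s" % (directory, seen))
    # each child saw its own directory, the parent never moved
    matches = all(os.path.realpath(k) == os.path.realpath(v) for k, v in results.items())
    unchanged = os.getcwd() == before
    print("children in the right place:", matches, "/ parent cwd unchanged:", unchanged)
```

For plain file access inside a thread, building absolute paths with os.path.join() avoids the problem in the same way.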

OpenStack floating IP convenience

Problem: I am working in a tenant which has a couple of hosts with floating IPs assigned. I always have to look them up either manually using the command line clients (and dealing with all those UUIDs), or manually in the web GUI. Didn’t like.

Solution: Python script, which outputs FLOATING_IP -> HOST_NAME.

Here it is.

#!/usr/bin/env python

from novaclient import client
import novaclient.v2.floating_ips as os_fips
import novaclient.v2.servers as os_servers
import novaclient.v2.networks as os_networks

#from pprint import pprint as pp
import os
from sys import exit

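def error(printme):
    print("ERROR: {}".format(printme))
    exit(1)

def check_env():
    # assumed: the usual variables from an OpenStack openrc file
    for a in ("OS_USERNAME", "OS_PASSWORD", "OS_TENANT_NAME", "OS_AUTH_URL"):
        if not os.environ.get(a): error("Please set ${}".format(a))

def get_client():
    # assumed argument order: username, password, project/tenant, auth URL
    return client.Client(2, os.environ["OS_USERNAME"], os.environ["OS_PASSWORD"],
                         os.environ["OS_TENANT_NAME"], os.environ["OS_AUTH_URL"])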

if __name__=="__main__":
    nova = get_client()
    fipman = os_fips.FloatingIPManager(nova)
    servman = os_servers.ServerManager(nova)
    netman = os_networks.NetworkManager(nova)

    ips = fipman.list()
    srs = servman.list()

    # look-up table: server id -> server object
    id2server = {}
    for a in srs:
        id2server[a.id] = a

    ips = [(ip.ip, ip.instance_id) for ip in ips]
    # filter out unused floating ips (which have None as instance id)
    ips = filter(lambda x: x[1], ips)
    # create (IP, SERVER_NAME pairs)
    ips = map(lambda x: (x[0], id2server[x[1]].name), ips)
    # sort for convenience by host instance name
    ips = sorted(ips, key=lambda x: x[1].lower())

    for a in ips:
        print("{:18s} {}".format(a[0], a[1]))

Sample output:

$ tools/list_ips
ab_1695_1_dml
ab_1695_1_jump
media0
RJD1_CouchbaseA1
RJD1_JumpServer
RJD1_OpenshiftMaster
tla-centos66
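
The filter/map/sort part can be tried without an OpenStack installation. In this sketch namedtuples stand in for the objects novaclient returns; the ids, names and (documentation-range) IP addresses are all made up, and the three steps are folded into one list comprehension.

```python
from collections import namedtuple

# Stand-ins for the objects novaclient returns; all values are made up.
FloatingIP = namedtuple("FloatingIP", "ip instance_id")
Server = namedtuple("Server", "id name")


def ip_table(floating_ips, servers):
    """Return (ip, server name) pairs for assigned floating IPs, sorted by name."""
    id2server = {server.id: server for server in servers}
    pairs = [
        (fip.ip, id2server[fip.instance_id].name)
        for fip in floating_ips
        if fip.instance_id  # unassigned floating IPs carry None here
    ]
    return sorted(pairs, key=lambda pair: pair[1].lower())


servers = [Server("id-1", "media0"), Server("id-2", "JumpServer")]
fips = [
    FloatingIP("192.0.2.10", "id-1"),
    FloatingIP("192.0.2.11", None),
    FloatingIP("192.0.2.12", "id-2"),
]
for ip, name in ip_table(fips, servers):
    print("{:18s} {}".format(ip, name))
```

The comprehension does the same as the filter() and map() calls in the script; which one reads better is a matter of taste.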


Main sources: