Updates from April, 2018

  • penguin 18:10 on 2018-04-24
    Tags: gitlab

    GitLab spot runners & Puppet 

    We are on AWS with GitLab. For ease of use, and because our build hosts degenerate for some reason (network issues), we decided to use spot instances with GitLab.

    The journey was anything but easy. Here’s why.

    GitLab Runner configuration complaints

    First: The process

    To configure GitLab runner, you have to …

    • install GitLab,
    • write down the runner registration token,
    • start a runner,
    • manually run a registration command using the above token.

    That registration command will then modify the config file of the runner. That is important because you can’t just write a static, read-only config file and start the runner. This is not possible for two reasons:

    • when you execute the registration command, the runner wants to modify the config file to add yet another token (its “personal” token, not the general registration secret), so it must not be read-only
    • the runner has to be registered, so just starting it will do … nothing.
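The registration step described above can be scripted instead of typed by hand. What follows is only a minimal sketch, not the setup from this post: the function name, the example URL, the placeholder token and the default values are all assumptions made for illustration. It only builds the argument list for a non-interactive `gitlab-runner register` call and prints it.

```python
import shlex


def build_register_command(url, registration_token,
                           executor="docker+machine",
                           docker_image="alpine:latest",
                           description="spot-runner"):
    """Return the argv list for a non-interactive `gitlab-runner register` call.

    All default values are illustrative assumptions, not taken from the post.
    """
    return [
        "gitlab-runner", "register",
        "--non-interactive",                 # never prompt for missing values
        "--url", url,                        # base URL of the GitLab instance
        "--registration-token", registration_token,
        "--executor", executor,
        "--docker-image", docker_image,      # default image for jobs
        "--description", description,
    ]


if __name__ == "__main__":
    # Placeholder values only; the real token is the one written down earlier.
    argv = build_register_command("https://gitlab.example.com/",
                                  "REGISTRATION_TOKEN")
    print(" ".join(shlex.quote(part) for part in argv))
```

The list can be handed to `subprocess.run(argv, check=True)` on the runner host. The runner then still writes its own token into the config file, so the file has to stay writable.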

    That is in my eyes a huge design flaw, which undoubtedly has its reasons, but it still – sorry – sucks IMHO.

    Second: The configuration

    You can configure pretty much everything in the config file. But once the runner registers, the registration process for some reason appends a completely new config to any existing config file, so that … the state is weird. It works, but it looks fucked, and feels fucked.

    You can also set all configuration file entries using the gitlab-runner register command. Well, not all: the global parameters (for example, log_level or concurrent) cannot be set. Those have to be in a pre-existing config file, so you need both – the file and the registration command, which will look super ugly in a very short time.

    Especially if you still use Puppet to manage the runners, because then you can’t just restart the runner whenever the config file changes: it will always change, for the reasons above.

    Third: The AWS permission documentation

    Another thing is that the list of AWS permissions the runner needs in order to create spot instances is nowhere to be found. Hint: EC2FullAccess and S3FullAccess are not enough. We are using admin permissions right now, until we figure it out. Not nice.

    Our solution

    For this we’re still using Puppet (our K8S migration is still ongoing), and our solution so far looks like this:

    • Create a config file with puppet next to the designated config file location,
      • containing only global parameters.
      • The file has a puppet hook which triggers an exec that deletes the “final” config file if the puppet-created one has changed.
    • Start the GitLab runner.
    • Perform a “docker exec” which registers the runner in GitLab.
      • The “unless” contains a check that skips execution if the final config file is present.
      • The register command sets all configuration values except the global ones. As said above, the command appends all non-global config settings to any existing config file.
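The control flow of this workaround, written as plain Python rather than Puppet, might look roughly like the sketch below. It is not the Puppet code from this post. The function name, the `globals_changed` flag (standing in for Puppet's change notification) and the step that copies the global parameters into the final config file are assumptions made to keep the example self-contained.

```python
from pathlib import Path
from typing import Callable


def reconcile(globals_file: Path, final_config: Path,
              globals_changed: bool, register: Callable[[], None]) -> bool:
    """Re-create the final runner config when needed.

    Returns True if a registration was performed, False if it was skipped.
    """
    # Puppet hook: a changed globals file invalidates the final config.
    if globals_changed and final_config.exists():
        final_config.unlink()

    # The "unless" guard: an existing final config means "already registered".
    if final_config.exists():
        return False

    # Assumption: seed the final config with the global parameters, then let
    # the registration append the runner-specific settings to it.
    final_config.write_text(globals_file.read_text())
    register()
    return True
```

Here `register` would wrap the actual "docker exec ... gitlab-runner register" call. Passing it in as a callable keeps the decision logic testable without a running GitLab.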

    Some code

    Does this look ugly? You bet.

    Should this be a puppet module? Most probably.

    Did I foresee this? Nope.

    Am I completely fed up? Yes.

    Is this stuff I want to do? No.

    Does it work?

    Yes (at least … 🙂 )


    If you wonder what all those create::THING entries are – it’s this:

    We have an awful lot of those, because that way we can do a lot of stuff in the config YAMLs and don’t need to touch Puppet DSL code.

  • penguin 16:49 on 2016-02-26

    CI/CD, the status quo 

    … is quickly summed up: Working. 🙂

    So I am – naturally – a happy camper. Although, there are a couple of things I would like to change, which don’t scale or are in a state I’m not happy with for “v1.0” yet.

    Container data persistency

    Rancher does offer convoy, but I’m not sure it really fits my needs, and in any case I’m not using it yet. And it starts to hurt not to have this. Badly. (AWS Elastic File System would be exactly what I need, but that’s not going to happen, I fear.) And even though this is the part that hurts most, I’m not sure it is the one I can solve quickest.

    Pressure: 9


    It’s embarrassing, but – I don’t have host monitoring. I don’t even remember how often I needed to re-create a host because the disk went full without me noticing (and recreation is just so much easier than fixing, actually, so at least this is a good sign). The current evaluation candidate is sysdig cloud. That might change to DataDog. And don’t get me started on log management.

    Pressure: 8.5


    While I like TeamCity, it’s – like all things Java – a horribly, completely, fully over-engineered mess with billions of functions, none of which quite do what you want. In our case it’s also reduced to executing basically 3 scripts. So TeamCity must go, mid term. Replacement? No idea. Gradle, drone.io, wercker, distelli or Travis seem viable candidates. (Also found an interesting article about online CI tools which is well worth a read.)

    Pressure: 7


    It’s written by me. And it sucks, and, although it works, it must go.

    Pressure: 7


    Currently I am focused on “getting things to usable”. Security is not on that list. I need to address some of the issues which I ignored until now, mainly because they might have architecture impact. (Which I believe I planned pretty well, but who knows until you have to actually do it, right?)

    Pressure: 6


    I am still using Puppet. Might be overkill, but it’s reliable once set up. That brings me to … the setup. I am still using masterless Puppet, one environment, and pulling the repo from each host each time. Simple, robust, working, but not elegant. I see different environments coming up, and then … hm. It’s gonna be complicated.

    Pressure: 6


    Rancher is awesome. But I’m so desperately waiting for this feature right here to arrive that it’s not funny any more 🙂. I also need the same thing for host-based services (which are not inside of a container orchestration platform). I want to spawn a service somewhere, and an Amazon Route53 entry should appear automatically. Preferably based on a consul readout. So DNS management becomes a non-issue.
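To make the "consul readout to Route53 entry" idea a bit more concrete, here is a rough sketch of the translation step only. It is an illustration under assumptions, not something that exists in this setup. The function name, the zone suffix and the TTL are made up. The input is assumed to be the list of entries that Consul's catalog service endpoint returns, and the output has the shape of a Route53 change batch.

```python
def route53_change_batch(catalog_entries, zone_suffix, ttl=60):
    """Translate Consul catalog entries into a Route53 UPSERT change batch."""
    addresses_by_service = {}
    for entry in catalog_entries:
        # Consul leaves ServiceAddress empty when the service simply uses
        # the address of the node it runs on.
        ip = entry.get("ServiceAddress") or entry["Address"]
        addresses_by_service.setdefault(entry["ServiceName"], set()).add(ip)

    changes = []
    for name in sorted(addresses_by_service):
        changes.append({
            "Action": "UPSERT",  # create the record or overwrite it
            "ResourceRecordSet": {
                "Name": f"{name}.{zone_suffix}",
                "Type": "A",
                "TTL": ttl,
                "ResourceRecords": [
                    {"Value": ip} for ip in sorted(addresses_by_service[name])
                ],
            },
        })
    return {"Changes": changes}
```

The returned dictionary could then be passed as the `ChangeBatch` argument of boto3's Route53 `change_resource_record_sets` call. Fetching the catalog and deleting records for vanished services are deliberately left out.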

    Pressure: 5


    If one host breaks, the whole cloud is inoperable. Which one? The NAT host. Needed for any host to get a connection to the outside in AWS. And that’s just one breaking point of a couple. Critical? Not really. Super annoying? You bet.

    Pressure: 4


    Currently I use CloudFormation to set everything up. And although it’s an awesome product, it’s kinda limited. If a host is gone, I can’t call “cloudformation fix-it” and have this very host respawn. With tools like Terraform this is possible, but Terraform relies (or relied the last time I looked) on local state data, which is a big pffft. It also means that to really test an updated CloudFormation template I have to recreate the full thing, which is just too cumbersome. (It takes about 1 hour to completely migrate everything, if done quickly, which is awesome for a complete outage, but really bad for testing.) But maybe I just don’t know enough and my template is way too tightly written.

    Pressure: 3


    • Seth Paskin 18:11 on 2016-02-26

      If you are looking for monitoring, check out TrueSight Pulse (formerly Boundary).


      Free trial for 14 days. I’m the marketing manager for the product so if you’ve got questions or want to connect with the dev team on issues specific to what you are doing, let me know.
