CICD Puppet Snippets

GitLab spot runners & Puppet

We are on AWS with GitLab. For ease of use, and because our build hosts degrade over time for some reason (network issues), we decided to use spot instances with GitLab.

The journey was anything but easy. Here’s why.

GitLab Runner configuration complaints

First: The process

To configure GitLab runner, you have to …

  • install GitLab,
  • write down the runner registration token,
  • start a runner,
  • manually run a registration command using the token from above.

That registration command will then modify the config file of the runner. That is important because you can’t just write a static, read-only config file and start the runner. This is not possible for two reasons:

  • when you execute the registration command, the runner wants to modify the config file to add yet another token (its “personal” token, not the general registration secret), so the file must not be read-only
  • the runner has to be registered, so just starting it will do … nothing.

That is, in my eyes, a huge design flaw, which undoubtedly has its reasons, but it still – sorry – sucks IMHO.

Second: The configuration

You can configure pretty much everything in the config file. But once the runner registers, the registration process for some reason appends a completely new config to any existing config file, so the resulting state is weird. It works, but it looks fucked, and feels fucked.

You can also set all configuration file entries using the gitlab-runner register command. Well, not all: The global parameters (like, for example, log_level or concurrent) cannot be set. Those have to be in a pre-existing config file, so you need both – the file and the registration command – which starts to look super ugly very quickly.

Especially if you use Puppet to manage the runners, because then you can’t just restart the runner once the config file changes – it will always change, for the reasons above.
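For illustration, the global half that has to pre-exist is tiny. A minimal sketch (the values are examples; concurrent and check_interval are the ones we actually set):

```toml
# pre-seeded /etc/gitlab-runner/config.toml -- global parameters only,
# because these CANNOT be set via "gitlab-runner register"
concurrent = 5
check_interval = 0
log_level = "warning"
```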

Third: The AWS permission documentation

Another thing is that the list of AWS permissions the runner needs in order to create spot instances is nowhere to be found. Hint: EC2FullAccess and S3FullAccess are not enough. We are using admin permissions right now, until we figure it out. Not nice.

Our solution

For this we’re still using Puppet (our K8S migration is still ongoing), and our solution so far looks like this:

  • Create a config file with puppet next to the designated config file location,
    • containing only global parameters.
    • The file has a puppet hook which triggers an exec that deletes the “final” config file if the puppet-created one has changed.
  • Start the GitLab runner.
  • Perform a “docker exec” which registers the runner in GitLab.
    • The “unless” contains a check that skips execution if the final config file is present.
    • The register command sets all configuration values except the global ones. As said above, the command appends all non-global config settings to any existing config file.
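Stripped of Puppet, the dance of those last three bullets can be sketched in plain shell (run it in a scratch directory; file names match the Puppet code below, and the actual gitlab-runner register call is stubbed with a printf so the sketch runs anywhere):

```shell
#!/bin/sh
# sketch of the config regeneration logic; "register" is stubbed out
set -e

write_global_config() {     # stands in for the Puppet file resource
  printf 'concurrent = 5\ncheck_interval = 0\n' > config.toml.puppet.new
  if ! cmp -s config.toml.puppet.new config.toml.puppet 2>/dev/null; then
    mv config.toml.puppet.new config.toml.puppet
    rm -f config.toml       # = Exec['delete config file']
  fi
}

register_runner() {         # = Docker::Exec[register-spot-runner]
  if [ -f config.toml ]; then return 0; fi   # the "unless" check
  cp config.toml.puppet config.toml
  # the real thing runs "gitlab-runner register ..." here, which appends
  # its non-global section to the copied file; stubbed:
  printf '[[runners]]\n  name = "spot-runner"\n' >> config.toml
}

write_global_config
register_runner
cat config.toml
```

Running it twice changes nothing the second time: the global file is unchanged, so config.toml survives, so the registration step is skipped.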

Some code


# the configuration for the build runner running on the same host, which
# manages the autoscaling-based spot-instance-allocation
global::concurrent:                       '5'

registration_command::docker_image:       'ubuntu:artful'
registration_command::token:              'our-token'
registration_command::url:                'https://our.gitlab.server'

runners::concurrency:                     '1'
runners::limit:                           '10'
runners::name:                            'spot-runner'

cache::bucket_location:                   'eu-central-1'
cache::bucket_name:                       'our-cache-bucket'
cache::shared:                            'true'
cache::type:                              's3'

machine::idle_nodes:                      '0'
machine::idle_time:                       '1800'
machine::max_builds:                      '10'
machine::machine_name:                    'standard-%s'

machine::option::access_key:              "%{hiera('ops::gitlab::spotrunner::id')}"
machine::option::ami:                     'ami-44d48eaf'
machine::option::block_minutes:           '60'
machine::option::iam_profile:             'gitlab-runner'
machine::option::instance_type:           'm4.xlarge'
machine::option::private_address_only:    'true'
machine::option::private_address:         'true'
machine::option::region:                  'eu-central-1'
machine::option::secret_key:              "%{hiera('ops::gitlab::spotrunner::secret')}"
machine::option::spot_instances:          'true'
machine::option::spot_price:              '0.2'
machine::option::subnet_id:               'subnet-subbb'
machine::option::tags:                    'stage,prod'
machine::option::vpc_id:                  'vpc-veepeecee'

machine::option::other:                   >
  --machine-machine-options amazonec2-security-group=cognodev3_sg_docker_machine
  --machine-machine-options amazonec2-security-group=cognodev3-SG0-shubdu
  --machine-machine-options amazonec2-security-group=cognodev3-SG1-shalala
  --machine-machine-options amazonec2-security-group=cognodev3-SG2-shubndibndu
  --machine-machine-options engine-insecure-registry=some.internal.repo
  --machine-machine-options engine-insecure-registry=other.internal.repo:5000



  # gitlab spot runner – the docker service definition
  gitlab-spot-runner:
    image:              gitlab/gitlab-runner:latest
    pull_on_start:      true
    volumes:
      - /var/docker-apps/gitlab-spot-runner:/etc/gitlab-runner
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/run/.docker:/root/.docker

  # the puppet-managed config with ONLY the global parameters
  '/etc/gitlab-runner/config.toml.puppet':
    ensure: present
    notify: 'Service[docker-gitlab-spot-runner]'
    content: |
      concurrent = %{hiera('global::concurrent')}
      check_interval = 0

  'delete config file':
    command:     rm /etc/gitlab-runner/config.toml
    path:        /bin:/usr/bin:/sbin:/usr/sbin
    refreshonly: true
    subscribe:   File[/etc/gitlab-runner/config.toml.puppet]
    before:      Docker::Exec[register-spot-runner]

# why are we doing this?
# because gitlab WANTS to re-write the config file. and we CANNOT set "global"
# parameters (e.g. "log_level", "concurrent") in using the registration
# command line.
# so, we ...
#    * create a config "config.toml.puppet" file with ONLY global params
#    * rm the config.toml file in case of changes to the .puppet file
#    * re-execute registration using ALL other config settings when the
#      config.toml file is no longer present
# we do this because:
#    * we can't watch the toml file for changes, cause the runner WILL change
#      it
#    * but we WANT to set global parameters, and luckily the runner seems
#      to just append all non-global settings to an existing config file
#      on registration.
# THIS FUCKING SUCKS. I know. but I really have not found another way (abk)

  # this is Docker::Exec[register-spot-runner], referenced above
  register-spot-runner:
    detach:     false
    container:  gitlab-spot-runner
    tty:        true
    command: >-
      /bin/bash -c 'cp /etc/gitlab-runner/config.toml.puppet /etc/gitlab-runner/config.toml &&
      gitlab-runner register --non-interactive
      --registration-token %{hiera('registration_command::token')}
      --url %{hiera('registration_command::url')}
      --executor "docker+machine"
      --docker-image %{hiera('registration_command::docker_image')}
      --docker-volumes /var/run/docker.sock:/var/run/docker.sock
      --name %{hiera('runners::name')}
      --limit %{hiera('runners::limit')}
      --request-concurrency %{hiera('runners::concurrency')}
      --cache-s3-bucket-location %{hiera('cache::bucket_location')}
      --cache-s3-bucket-name %{hiera('cache::bucket_name')}
      --cache-type %{hiera('cache::type')}
      --machine-machine-driver amazonec2
      --machine-idle-nodes %{hiera('machine::idle_nodes')}
      --machine-idle-time  %{hiera('machine::idle_time')}
      --machine-machine-name %{hiera('machine::machine_name')}
      --machine-max-builds %{hiera('machine::max_builds')}
      --machine-machine-options amazonec2-private-address-only=%{hiera('machine::option::private_address_only')}
      --machine-machine-options amazonec2-use-private-address=%{hiera('machine::option::private_address')}
      --machine-machine-options amazonec2-region=%{hiera('machine::option::region')}
      --machine-machine-options amazonec2-access-key=%{hiera('machine::option::access_key')}
      --machine-machine-options amazonec2-ami=%{hiera('machine::option::ami')}
      --machine-machine-options amazonec2-block-duration-minutes=%{hiera('machine::option::block_minutes')}
      --machine-machine-options amazonec2-iam-instance-profile=%{hiera('machine::option::iam_profile')}
      --machine-machine-options amazonec2-instance-type=%{hiera('machine::option::instance_type')}
      --machine-machine-options amazonec2-request-spot-instance=%{hiera('machine::option::spot_instances')}
      --machine-machine-options amazonec2-secret-key=%{hiera('machine::option::secret_key')}
      --machine-machine-options amazonec2-spot-price=%{hiera('machine::option::spot_price')}
      --machine-machine-options amazonec2-subnet-id=%{hiera('machine::option::subnet_id')}
      --machine-machine-options amazonec2-tags=%{hiera('machine::option::tags')}
      --machine-machine-options amazonec2-vpc-id=%{hiera('machine::option::vpc_id')}
    unless:     test -f /etc/gitlab-runner/config.toml
    require:    Service[docker-gitlab-spot-runner]

Does this look ugly? You bet.

Should this be a puppet module? Most probably.

Did I foresee this? Nope.

Am I completely fed up? Yes.

Is this stuff I want to do? No.

Does it work?

Yes (at least … 🙂 )


If you wonder what all those create::THING entries are – it’s this:

$files = hiera_hash('create::files', {})
if ! empty($files) {
  create_resources('file', $files)
}

We have an awful lot of those, because that way we can do a lot of stuff in the config YAMLs and don’t need to touch Puppet DSL code.
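A matching hiera entry for that snippet would look like this (the file and its content are made up, it’s just the pattern):

```yaml
create::files:
  '/etc/motd':
    ensure:  present
    content: "managed by puppet\n"
```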


jq makes AWS “describe-instances” actually useful

Just so I don’t forget 🙂

aws ec2 describe-instances | \
  jq '.Reservations[].Instances[] | {IP: .PrivateIpAddress, ID: .InstanceId, Name: .Tags[] | select(.Key=="Name").Value}'
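If you want to play with the filter without an AWS account, it behaves the same on a trimmed sample of the describe-instances output:

```shell
# a heavily trimmed sample of "aws ec2 describe-instances" output
cat > sample.json <<'EOF'
{"Reservations": [{"Instances": [{
  "PrivateIpAddress": "10.0.0.5",
  "InstanceId": "i-0abc",
  "Tags": [{"Key": "env", "Value": "prod"},
           {"Key": "Name", "Value": "web-1"}]}]}]}
EOF
jq '.Reservations[].Instances[] | {IP: .PrivateIpAddress, ID: .InstanceId, Name: .Tags[] | select(.Key=="Name").Value}' sample.json
# prints one object with IP 10.0.0.5, ID i-0abc and Name web-1
```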


Docker Infrastructure Snippets

Elastic Beanstalk with Docker using Terraform

I just investigated AWS Elastic Beanstalk, and I want to use Terraform for it. This is what I’ve done, and how I got it running. I basically wrote this because the docs I found are either super-long (and still missing critical points) or super-short (and also missing critical points).

This should get you up and running in very little time. You can also get all the code from a demo github repository.

General principles

The Architectural Overview is a good page to read to get an idea of what you’re about to do. It’s not that long.

In short, Elastic Beanstalk runs a version of an application in an environment. So the process is: declare an application, define a couple of versions and environments, and then combine one specific version with one specific environment of an app to create an actually running deployment.

The environment is just a set of hosts configured in a special way (autoscaling & triggers, subnets, roles, etc.), whereas the application version is the info about how to deploy the containers on that environment (ports, env variables, etc.). Naturally, you think of having a DEV environment which runs “latest”, and a PROD environment which runs “stable” or so. Go crazy.

Prerequisites & Preparation

For the example here you need a couple of things & facts:

  • An AWS account
  • In that account, you need:
    • an S3 bucket to save your app versions
    • a VPC ID
    • subnet IDs for the instance networks
    • an IAM role for the hosts
    • an IAM service role for Elastic Beanstalk (see bottom for how to create that)
  • Terraform 🙂
  • The aws command line client

Get started

The files in the repository have way more parameters, but this is the basic set which should get you running (I tried once, then added all that stuff). The Terraform file below will create the application and an environment associated with it.

# file:

resource "aws_elastic_beanstalk_application" "test" {
  name        = "ebs-test"
  description = "Test of beanstalk deployment"
}

resource "aws_elastic_beanstalk_environment" "test_env" {
  name         = "ebs-test-env"
  application  = "ebs-test"
  cname_prefix = "mytest"

  # the next line IS NOT RANDOM, see "final notes" at the bottom
  solution_stack_name = "64bit Amazon Linux 2016.09 v2.5.2 running Docker 1.12.6"

  # There are a LOT of settings; this should be the minimally
  # required set for Docker.

  setting {
    namespace = "aws:ec2:vpc"
    name      = "VPCId"
    value     = "${var.vpc_id}"
  }

  setting {
    namespace = "aws:ec2:vpc"
    name      = "Subnets"
    value     = "${join(",", var.subnets)}"
  }

  setting {
    namespace = "aws:autoscaling:launchconfiguration"
    name      = "IamInstanceProfile"
    value     = "${var.instance_role}"
  }

  setting {
    namespace = "aws:elasticbeanstalk:environment"
    name      = "ServiceRole"
    value     = "${var.ebs_service_role}"
  }
}

If you run this, at least one host and one ELB should appear in the defined subnets. Still, this is an empty environment; there’s no app running in it yet. If you ask yourself, “where’s the version he talked about?” – well, it’s not in there. We didn’t create one yet. This is just the very basic platform you need to run a version of an app.

In my source repo you can now just use the two scripts, one after the other. You should be able to figure out how to use them, and they should work out of the box. But we’re here to explain, so this is what happens behind the scenes:

  1. create a file with the information about the service (Docker image, etc.) to deploy
  2. upload that file into an S3 bucket, packed into a ZIP file (see “final notes” below)
  3. tell Elastic Beanstalk to create a new app version using the info from that file (on S3)

That obviously was the first script. The second one does this:

  1. tell EBS to actually deploy that configuration using the AWS cli.
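Condensed into a dry-run sketch (all names are placeholders taken from the Terraform above, and the aws calls are only echoed – drop the echo to actually run them):

```shell
#!/bin/sh
# dry-run sketch of both helper scripts; nothing is executed against AWS
APP=ebs-test                  # aws_elastic_beanstalk_application name
ENV_NAME=ebs-test-env         # aws_elastic_beanstalk_environment name
VERSION=latest
BUCKET=my-test-bucket-for-ebs
ZIP="app_${VERSION}.zip"      # the zipped deployment descriptor

# script 1: upload the ZIP and register it as an app version
echo aws s3 cp "$ZIP" "s3://$BUCKET/$ZIP"
echo aws elasticbeanstalk create-application-version \
  --application-name "$APP" --version-label "$VERSION" \
  --source-bundle "S3Bucket=$BUCKET,S3Key=$ZIP"

# script 2: point the environment at that version (= deploy)
echo aws elasticbeanstalk update-environment \
  --application-name "$APP" --version-label "$VERSION" \
  --environment-name "$ENV_NAME"
```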

This is the file which describes our single-container test application:

{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "flypenguin/test:latest",
    "Update": "true"
  },
  "Ports": [
    {
      "ContainerPort": "5000"
    }
  ],
  "Volumes": [],
  "Logging": "/var/log/flypenguin-test"
}

See “final notes” for the “ContainerPort” directive.

I also guess you know how to upload a file to S3, so I’ll skip that; if not, look in the script. The Terraform declaration to add the version to Elastic Beanstalk looks like this (if you used my script, a file called app_version_<VERSION>.tf was created for you automatically with pretty much this content):

# define elastic beanstalk app version "latest"
resource "aws_elastic_beanstalk_application_version" "latest" {
  name        = "latest"
  application = "${aws_elastic_beanstalk_application.test.name}"
  description = "Version latest of app ${aws_elastic_beanstalk_application.test.name}"
  bucket      = "my-test-bucket-for-ebs"
  key         = ""   # the S3 key of the uploaded ZIP goes here
}

Finally, deploying this using the AWS cli:

$ aws elasticbeanstalk update-environment \
  --application-name test-app \
  --version-label latest \
  --environment-name test-env 

All done correctly, this should be it, and you should be able to access your app now under your configured address.

Wrap up & reasoning

My repo works, at least for me (I hope for you as well). I did not yet figure out the autoscaling, for which I didn’t have time. I will catch up in a 2nd blog post once I’ve figured it out. First tests gave pretty weird results 🙂 .

The reason why I did this (when I have Rancher available) is the auto-scaling, and the host management. I don’t need to manage any more hosts, Docker versions and Rancher deployments just to deploy a super-simple, CPU-intensive, scaling production workload which relies on very stable (even pretty conservative) components. Also, I learned something.

Finally, after reading a lot of postings and way too much AWS documentation, I am surprised how easy this thing actually is. It certainly doesn’t look that way if you start reading up on it. I tried to catch the essence of the whole process in this blog post.

Final notes & troubleshooting

  1. I have no idea what the aws_elastic_beanstalk_configuration_template  Terraform resource is for. I would like to understand it, but the documentation is rather … sparse.
  2. The solution stack name has semantic meaning. You must set something that AWS understands. This can be found out by using the following command:
    $ aws elasticbeanstalk list-available-solution-stacks 
    … or on the AWS documentation. Whatever is to your liking.
  3. If you don’t specify a security group (aws:autoscaling:launchconfiguration – “SecurityGroups”), one will be created for you automatically. That might not be convenient, because it means that on “terraform destroy” this group might not be destroyed automatically. (Which is just a guess, I didn’t test this.)
  4. The same goes for the auto scaling group scaling rules.
  5. When trying the minimal example, be extra careful when you can’t access the service after everything is there. The standard settings seem to be: Same subnet for ELB and hosts (obviously), and public ELB (with public IPv4 address). Now, placing a public-facing ELB into an internal-only subnet does not work, right? 🙂
  6. The ZIP file: According to the docs you can only upload the JSON file (or the Dockerfile, if you build the container in the process) to S3. But the docs are not extremely clear, and Terraform did not mention this. So I am using ZIPs, which works just fine.
  7. The ContainerPort is always the port the application listens on inside the container; it is not the port which is opened to the outside. That always seems to be 80 (at least for single-container deployments).

Appendix I: Create ServiceRole IAM role

For some reason this did not seem to be necessary on the first test run. On all subsequent runs it was, though. This is the way to create the role. Sorry that I couldn’t figure out how to do this with Terraform.

  • open AWS IAM console
  • click “Create new role”
  • Step 1 – select role type: choose “AWS service role”, and under that “AWS Elastic Beanstalk”
  • Step 2 – establish trust: is skipped by the wizard after this
  • Step 3 – Attach policy: Check both policies in the table (should be “AWSElasticBeanstalkEnhancedHealth”, and “AWSElasticBeanstalkService”)
  • Step 4 – Set role name and review: Enter a role name (e.g. “aws-elasticbeanstalk-service-role”), and hit “Create role”

Now you can use (if you chose that name) “aws-elasticbeanstalk-service-role” as your ServiceRole parameter.



Rancher IAM role

Rancher can create instances on EC2. If you want to define a dedicated IAM user for this, refer to the Amazon docs for a profile template.

Unfortunately the first thing you get when using those permissions in Rancher is “You are not authorized”. Great. I’ll update this once I know the correct permissions.

(Source: Rancher docs)


VPC with NAT to internet on AWS

… and other TLAs.

Anyway, as far as I remember OpenStack does not need this, so I thought I’d document it here. I, at least, was surprised.

Situation: You want a private network segment in the cloud (in my case an Amazon VPC), and you don’t want all hosts to be accessible from the internet. So you don’t assign public IPs, and you need a router/gateway.

Amazon creates a network internet gateway, but this thing does not do one thing: NAT. If your host only has a private IP, it can’t connect to “the internet”.

Solution: You actually need to launch an EC2 instance with a public IP address, which you configure to do NAT (that is, forwarding and masquerading). And you have to set routing tables which point to that instance for all subnets which should be inaccessible from the internet.

Thankfully there’s an article providing an example CloudFormation template.

Really, thanks.

Configuring the NAT instance is super-easy then. Amazon mentions in its docs that there are special Amazon Linux instances (“These AMIs include the string amzn-ami-vpc-nat in their names […]”) which come with NATting preconfigured. Just instantiate an instance using the appropriate AMI image, and you’re done. No further configuration needed.
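If you configure a NAT host by hand instead – the “forwarding and masquerading” from above – it boils down to two commands plus one AWS setting (eth0 is an assumption, adjust for your instance):

```shell
# on the NAT instance (as root):
sysctl -w net.ipv4.ip_forward=1                       # forwarding
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE  # masquerading

# and on the AWS side: disable the source/dest check, otherwise
# EC2 drops the forwarded packets:
# aws ec2 modify-instance-attribute --instance-id i-... --no-source-dest-check
```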

Longer things

My take at a CI infrastructure, Pt.1

… so far.

It might be crappy, but I’ll share it, cause it’s working. (Well, today it started doing this 😉 ). But enough preamble, let’s jump in.

The Situation

I am in a new project. Those people have nothing but a deadline, and when I say nothing I mean it. Not even code. They asked me what I would do, and I said “go cloud, use everything you can from other people, so you don’t have to do it, and you stay in tune with the rest of the universe” (read: avoid NIH syndrome). They agreed, and hired me.

The Starting Point

They really want the JetBrains toolchain, the devs use CLion. They also want YouTrack for ticketing (which doesn’t blow my mind so far, but it’s ok). Naturally they want to use TeamCity, which is the Jenkins alternative from JetBrains, and pretty all right from what I can see so far.

The code is probably 95%+ C++, and creates a stateless REST endpoint in the cloud (but load balanced). That’s a really simple setup to start with, just perfect.

Source code hosting was initially planned to be either in-house or in the bought cloud, not with a hoster. Up to now they were using git, but without a graphical frontend, which involved manual creation (by the – part time – admin) of every git repo.

The Cloud Environment

That’s just practical stuff now, and has nothing – yet – to do with CI/CD. Skip it if you’re just interested in that. Read it if you want to read my brain.

I looked around for full-stack hosted CI/CD systems, notably found only Shippable, and thought that they don’t fully match the requirements (even when we move source code hosting out). So I went to AWS, and tried ElasticBeanstalk. This is quite cool, unfortunately scaling takes about 3-5 minutes for the new host to come up (tested with my little load dummy tool in a simple setup, which I actually didn’t save, which was stupid).

Anyway, before deploying services, CI (the compilation & build stuff) must work. So my first goal was to get something up and running ASAP, and that’s bold and capitalized. Fully automated, of course.

For any kubernetes/CoreOS/… layout I lack the experience to make it available quickly, and – really – all the AWS “click here to deploy” images of those tools didn’t work out-of-the-box. So I started fairly conventional with a simple CloudFormation template spawning three hosts: TeamCity server, TeamCity agent, Docker registry, and – really important – GitLab. Since then GitLab was replaced by a paid GitHub account, all the better.

Setting up the hosts I used Puppet (no surprise, me being a Puppet “Expert”). Most of the time went into writing a TeamCity Puppet module. A quirk is that the agents must download their ZIP distribution image from a running master only, which is kinda annoying to do right in Puppet. For now TeamCity is also set up conventionally (without Docker), which I might change soon, at least for the server. The Postgres database runs in a container, though, which is super-super simple to set up (please donate a bit if you use it, even 1€ / 1$ helps, that guy did a great job!). Same went for GitLab (same guy), and Redis (again). I also used the anti-pattern of configuring the hosts based on their IP addresses.

I also wanted to automate host bootstrapping, so I did this in the CloudFormation template for each host. The archive downloaded in this script contains 3 more scripts – a distribution-dependent one which is called first; have a look to see details. Basically it’s just a way to download a snapshot of our current Puppet setup (encrypted) and initialize it so Puppet can take over. I also use “at” in those scripts to schedule a reboot and an action afterwards, which is highly convenient.

CI (finally)

… in the next post 😉


Docker registry, S3 and permissions

There are a couple of bazillion blog posts saying “yah just did my docker registry on S3”.

It’s not so easy, though. Because what if you want to limit access to a certain IAM user? Yup, you need to go deep (well, a bit) into Amazon’s policy thing. Which sounds simple, but isn’t.

I got “HTTP 500” errors from the docker registry when I first deployed. My configuration, which was wrong, looked like this:

"RegistryIAMUser" : {
  "Type" : "AWS::IAM::User"
},
"RegistryIAMUserAccessKey" : {
  "Type" : "AWS::IAM::AccessKey",
  "Properties" : { "UserName" : { "Ref" : "RegistryIAMUser" } }
},
"Bucket" : {
  "Type" : "AWS::S3::Bucket",
  "Properties" : { "BucketName" : "flypenguin.docker-registry" }
},

"RegistryPrivateAccess" : {
  "Type" : "AWS::S3::BucketPolicy",
  "Properties" : {
    "Bucket" : { "Ref" : "Bucket" },
    "PolicyDocument" : {
      "Statement" : [{
        "Effect" : "Allow",
        "Action" : [ "s3:*" ],
        "Resource" : { "Fn::Join" : ["", ["arn:aws:s3:::", { "Ref" : "Bucket" }, "/*" ]]},
        "Principal" : { "AWS" : { "Fn::GetAtt" : ["RegistryIAMUser", "Arn"] } }
      }]
    }
  }
}

Since this didn’t work really well, I googled my a** off and found a little post which used a user policy (instead of a bucket policy, which is basically the other way around), but did one thing differently. My working configuration is now … (let’s see if you can spot the difference):

[... same as above ...]

"UserPolicyRegistryPrivateAccess" : {
  "Type" : "AWS::IAM::Policy",
  "Properties" : {
    "PolicyName" : "AccessToDockerBucket",
    "Users" : [ { "Ref" : "RegistryIAMUser" } ],
    "PolicyDocument" : {
      "Version" : "2012-10-17",
      "Statement" : [{
        "Effect" : "Allow",
        "Action" : [ "s3:*" ],
        "Resource" : [
          { "Fn::Join" : ["", ["arn:aws:s3:::", { "Ref" : "Bucket" }, "/*" ]]},
          { "Fn::Join" : ["", ["arn:aws:s3:::", { "Ref" : "Bucket" } ]]}
        ]
      }]
    }
  }
}

See it?

It’s the two resources now. You need not only “resource/*” as a target, you also need “resource” itself as a target (bucket-level calls like s3:ListBucket act on the bucket ARN, not on the objects). Which makes sense if you know it and think about it. If you don’t … it’s a bit annoying. And time-consuming.