[Image: Jarrod's face]

Hi, I'm Jarrod

I work at Adzerk. I write here.

Instead of comments, let's just use @jarrodswart.

I've chosen mdBook as the vehicle for this site because it's simple to use, and I like the idea of laying information out as a book that can be updated and curated, as opposed to a blog that collects information by date.

Programming

The subsections here all relate to programming in some way, but are not necessarily connected to one another directly.

Estimation

"Most of what we do is R&D." -- An Intelligent Coworker

Lately (Fall 2018) our team at work has been refining and implementing our engineering process. One of our goals is getting better at estimation, and while there is a lot to say here, I found that as an individual, estimation remains about as hard as it has always been.

So I decided to try something odd: I've been experimenting with using the resolution mechanic from a tabletop RPG to assist me with estimation.

In the game, any time you roll the dice you first consider two factors: position and effect. Position is a statement about your approach to the situation at hand and can be controlled, risky, or desperate. Effect is a statement about the results and can be great, standard, or limited.

Our goal is to pick an arbitrary system of estimation that we can eventually rely on as a team. If we say that a certain task will be 4 points of effort, we all agree that 4 points is roughly two days of work, and we routinely complete tasks to that estimate, then we are on the right track.

So it's all arbitrary, but hopefully correct over time.

Getting approximate

With that in mind, I decided to use position and effect to help me get better at estimation personally. So what am I doing?

First we have to come up with some parameters. Note risky and standard below: they are the default position and effect in the game, and a good baseline for this purpose as well.

Position

  • Controlled: I have done this exact thing, or something 99% like it before.
  • Risky: I haven't done this before but I know a lot of ways to solve it.
  • Desperate: I have not done this before, and I do not know how to solve it.

Effect

  • Great: Solving this problem would make or save the entire business a lot of money.
  • Standard: Solving this problem would add substantial value, new functionality or fix a bug.
  • Limited: Solving this problem would automate a small task, or provide a quick solution to a small pain point.

Just like in the game, this gives us nine combinations to work with.

  • Controlled/Great
  • Controlled/Standard
  • Controlled/Limited
  • Risky/Great
  • Risky/Standard
  • Risky/Limited
  • Desperate/Great
  • Desperate/Standard
  • Desperate/Limited

Let's try it!

  • "We need you to install Wordpress and point a new domain at it."
    • Controlled/Standard
    • I've done this before, many times.
  • "The production Postgres database is deadlocking and we need to figure out why."
    • Controlled/Standard
      • Identifying the source of the deadlocks: I've done this before, but that experience really only covers the method of identifying them, not the solution.
    • Risky/Standard
      • Solving the deadlocks. I've done it, but not very often. This case could be unique.
  • "We need to process customer data and apply Machine Learning to gather insights."
    • Desperate/Great
      • I have no idea how to do this at the moment, and it seems like it could add A LOT of value to the business.

So what have we done here?

This process adds a new dimension to estimation that I wasn't considering previously: my potential aptitude at creating a good estimate. Often when attempting to estimate I look first at the task, then at the choices available for creating that estimate and pick a number.

The old process was:

  1. Look at task.
  2. Look at estimation scale/points/units.
  3. Pick a unit to fit this task.

The new process is:

  1. Look at task.
  2. Assess task using Position/Effect.
  3. Look at estimation scale/points/units.
  4. Pick a unit, based on my Position/Effect, to fit this task.

The goal is that by forcing self-awareness of the situation and the task while estimating, I can hopefully create better estimates.
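As a toy sketch (this is not part of the team's process; the multipliers are invented purely for illustration, and effect here would speak more to value and priority than to size), position can be read as a padding factor on a base estimate:

#!/bin/bash
# Toy sketch: one invented way to turn a position reading into a
# padded point estimate. Controlled is known territory, risky has
# unknowns, desperate is effectively R&D.
estimate () {
    local base=$1 position=$2
    case "$position" in
        controlled) echo "$base" ;;
        risky)      echo $(( base * 2 )) ;;
        desperate)  echo $(( base * 4 )) ;;
    esac
}

estimate 2 risky   # a 2-point task assessed as risky => 4 points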

Naming

"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton

There seem to be two schools of thought on naming in software: literal and captivating. Literal speaks for itself, but the idea of a captivating name is that it invokes the imagination to associate something unique with an abstract idea.

An illustrative example is a discussion among several of my coworkers about what to name a reporting service. For context, my company has a long history of naming large projects after Norse gods.

"What about Hel?", proposed John. "Hel? What does that mean? We should call it Metric Reporting Service.", Steve suggests as a counter example. Steve had a long history of proposing literal names and absolutely abhorring the Norse god naming scheme. "You know,.. I bet Steve named his dog 'Outside Poop Machine'.", retorts John.

It was at this time that we all burst into laughter.

Point, Counter-point

I've found that I enjoy a more captivating and abstract name for larger projects. In the same way that people name conference rooms, a unique name helps solidify and captivate the imagination around an otherwise mundane thing. There are limitations, though: over time I've found that as the number of captivating names increases, the value is lost. Perhaps this was Steve's point all along. As the newest member of the team, he was introduced to a long list of abstract names.

  • Thor
  • Hel
  • Loki
  • Rattoskr
  • Odin

... and so on. Presented all at once, so many abstract names lose their sense of captivation, whereas the rest of the team had the opportunity to grow the list slowly, picking some of the names themselves. After a while, though, once a service is finished and running in production, much of the team, especially those who didn't name or work on the project directly, will lose the captivating association as well. At this point, the literal name has the advantage.

Lit.eral

Where literal naming fails for me personally is when something's description doesn't designate what makes it different.

An example:

    var logProperties = util.getProperties(data)

This is a contrived example, but there is a range of words used in programming that I think have lost their meaning: log, data, build(er), config, properties, object, obj, and the list goes on. There is a point where literal descriptions fall back into the same pattern as the captivating scheme: meaning is lost.
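For contrast, here is a hypothetical rename (both names are invented for illustration) where the description designates what makes it different:

    var requestLogFields = httpLog.extractRequestFields(rawEntry)

Same shape of code, but now the names say which log, which properties, and from where.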

Solutions?

I have no idea; naming is still hard. Perhaps, like estimation, multiple vectors could be used to generate better names, or a thesaurus could be consulted for really important items.

Alternatively, there is a chance that there is no solution at all and naming will simply be hard to get right. Or perhaps it's only when we get rid of names entirely (lambdas, declarative systems) that the problem can be solved, by avoiding it altogether.

Testing

Localstack (mocking AWS)

When it comes to production or production-adjacent work, I have a debilitating amount of fear. Early in my career I accidentally deleted the entire MySQL database containing about 90 customers' worth of data. Luckily I had a 15-minute-old backup, otherwise I probably would have been fired. The terror I felt as my boss screamed from the next room about our app not loading was heavy. The next day I made read-only credentials.

Since then, I don't mess around.

But at times this fear can prevent me from actually getting work done. At my current company our services live on AWS, and I've found two tools that help ease this fear:

  1. Localstack
  2. CloudFormation

This post is about a simple way to get started with Localstack. Perhaps I'll discuss the other in a future post.

Setting up Localstack

Localstack is a docker container that mocks AWS. I've been using it during development, and I greatly enjoy being able to develop and experiment without having to worry about touching actual infrastructure. Working in this manner requires docker, so you'll need it installed.

Let's create a simple development environment with:

  1. Our application
  2. Postgres
  3. Localstack

Each of these will be run in their own docker containers.

Your app

You need a minimal setup to get your application code in a container. This will vary from application to application. Whatever you end up doing: give the container a good name.
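As a rough sketch only (the base image, paths, and start command are placeholders, not a recommendation for any particular stack), a Dockerfile for a node app might look like:

# Dockerfile -- minimal sketch; adjust everything for your application.
FROM node:8
WORKDIR /my-app
COPY . /my-app
CMD ["npm", "start"]

Build it with the name the run script below expects: $ docker build -t myapp-$USER . Since that script mounts your source over /my-app anyway, the COPY mostly matters when you run the image without the mount.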

The rest

We don't actually need to build the other containers ourselves; their images are on Docker Hub. Instead we need to create a script that will run our app and link everything together. I use the following.

You will of course need to tweak instances of my-app to get this script to work. It's meant as an example, not a literal template.

#!/bin/bash

# Stop the supporting containers when this script exits.
trap "docker stop postgres-$USER localstack-$USER" EXIT

# Localstack, mocking only the AWS services we need.
docker run \
    --name localstack-$USER \
    -e SERVICES='s3,dynamodb' \
    -e HOSTNAME_EXTERNAL='localstack' \
    --rm -d localstack/localstack

# Postgres, with logging to files turned on.
docker run \
    --name postgres-$USER \
    -e POSTGRES_PASSWORD=1337h4ck35 \
    --rm -d postgres:9.6 \
    -c 'log_destination=stderr' \
    -c 'logging_collector=on' \
    -c 'log_directory=pg_log' \
    -c 'log_filename=postgresql-%Y-%m-%d_%H%M%S.log'

# Our app, linked to both containers, with the source mounted in.
docker run \
    --name myapp-$USER \
    --link postgres-$USER:postgres \
    --link localstack-$USER:localstack \
    -v "$PWD:/my-app" \
    -e USER_HOME=$HOME \
    -e USER_NAME=$USER \
    -e AWS_DEFAULT_REGION=us-east-1 \
    -e AWS_ACCESS_KEY_ID=abc \
    -e AWS_SECRET_ACCESS_KEY=def \
    -e LOCALSTACK_HOST='localstack' \
    --rm -ti myapp-$USER

Then set execution permissions: $ chmod +x run-myapp-docker and start it up: $ cd my-app && ./run-myapp-docker.

The key thing is the use of --link, which ties our application to the Postgres and Localstack containers. Though --link is considered deprecated and the docker project recommends moving to docker compose, I have not found a need to do so personally. Your results may vary.

Within our app container, postgres is the hostname we use to refer to our DB, and localstack is the hostname for AWS.

The final piece in getting this to work is setting the AWS endpoints. The localstack repo has a list of the various endpoints you need to use, as well as general information on how to get started.
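For example, from inside the app container you can point the AWS CLI at localstack instead of real AWS. The port here is an assumption: older localstack releases expose one port per service (4572 for S3, 4569 for DynamoDB), while newer ones use a single edge port (4566), so check your version:

# Create and list a bucket against the mocked S3, not real AWS.
aws --endpoint-url=http://localstack:4572 s3 mb s3://my-test-bucket
aws --endpoint-url=http://localstack:4572 s3 ls

The SDKs take the same endpoint override through their client configuration.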

Overall I find this to be a really nice way to start developing a service that is going to live in AWS, while keeping everything local and under simpler control.

Notes

DynamoDB Attribute Definitions

This always trips me up, so I'm writing it down; I hope this commits it to memory.

AttributeDefinitions = KeySchema + Indexes

So if the table has a KeySchema of {KeyType: "HASH", AttributeName: "my_primarykey"} and a global secondary index whose key is {KeyType: "HASH", AttributeName: "my_index"}, then you simply add both into AttributeDefinitions and give their types, like the following.

AttributeDefinitions:
    - AttributeName: "my_primarykey"
      AttributeType: S
    - AttributeName: "my_index"
      AttributeType: S
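As a cross-check, here is the same rule expressed with the AWS CLI against localstack's DynamoDB (the table name, index name, port, and throughput numbers are all invented for illustration). Note that the only attributes declared are the two key attributes:

# AttributeDefinitions = KeySchema + Indexes: declare exactly the
# attributes used as keys, nothing else.
aws --endpoint-url=http://localstack:4569 dynamodb create-table \
    --table-name my-table \
    --attribute-definitions \
        AttributeName=my_primarykey,AttributeType=S \
        AttributeName=my_index,AttributeType=S \
    --key-schema AttributeName=my_primarykey,KeyType=HASH \
    --global-secondary-indexes 'IndexName=my-index,KeySchema=[{AttributeName=my_index,KeyType=HASH}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=5,WriteCapacityUnits=5}' \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5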

Very simple. It says this pretty much verbatim in the docs and yet I always spend five minutes futzing with it. No more!

TIL: AWS Security Groups

Today, while I was struggling to get an EC2 instance to communicate with an RDS Aurora cluster, my coworker Craig Andera taught me something very useful about AWS Security Groups.

They can be thought of as two things:

  1. An identity.
  2. A container for rules.

This first aspect was completely missing from my mental model.

In my mind, and in the AWS console, I was attempting to create the following.

EC2 Instance <--- The Security Group ---> DB Cluster

Here The Security Group was simply a set of rules, a mapping of ports: inbound and outbound 5432 open. And yet it wasn't working. This is of course where the missing element of identity came into play: I wasn't specifying who was allowed to connect on those ports, only how they were allowed to connect.

The answer was to create an additional SG for just the EC2 machine, and then use that SG to specify both who would connect to the DB Cluster and how.

With this I have created a new mental model.

EC2 Instance (as SG-1234)
    |
    v
DB Cluster (allowing: SG-1234 on port 5432)
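Expressed with the AWS CLI, the fix looks roughly like this (both group IDs are placeholders):

# On the DB cluster's security group, allow inbound 5432 from the
# EC2 instance's security group: the who (--source-group) alongside
# the how (protocol and port).
aws ec2 authorize-security-group-ingress \
    --group-id sg-dbcluster \
    --protocol tcp \
    --port 5432 \
    --source-group sg-1234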

I now understand that I need to specify who is connecting and not just how.