Automating workflows is key to setting your team up for success. We believe in building tools that help others focus on their craft rather than the nuances of process. These are the tools we rely on to improve our engineering workflow at Doximity:

  1. Setup
  2. Integration
    1. Continuous Integration
    2. Instant Gratification
    3. Code Metrics
  3. Delivery
    1. Staging and Production
    2. Deployments
  4. Monitoring

Setup

Jumping into new applications can be daunting. Learning the code and its business rules isn't enough; there are many dependencies to install and configure. Early on, we relied on README files to set up development environments. This worked, but it was a manual process that consumed too much time. As a Rails shop, we now rely on bin/setup to configure our environments.

git clone git@github.com:doximity/application.git
cd application
./bin/setup

The above three commands should be all anyone needs to bootstrap our applications. Our bin/setup handles installing Homebrew and other system libraries. Here is a trimmed example of a few items from our bin/setup:

puts "== Installing brew & friends"
system 'ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"'
system 'brew update'

# Ruby
system 'brew install rbenv'
system 'brew install ruby-build'
system 'rbenv install -s $(cat .ruby-version)'
system 'gem install bundler --conservative'
system 'bundle check || bundle install'

# Rails
puts "\n== Preparing database"
system 'bin/rake db:setup'

puts "\n== Removing old logs and tempfiles"
system 'rm -f log/*'
system 'rm -rf tmp/cache'

puts "\n== Restarting application server"
system 'touch tmp/restart.txt'

Integration

Once a team member is set up with an application, the next concern is merging commits into master. In our setup, they can rely on alerts to stay informed of errors. We encourage every new team member to deploy their code to production on day one. It’s vital that they feel comfortable with our safeguards. Let's delve into some of them.

Continuous Integration

Whether we are running RSpec or Minitest, we ensure every commit has an automated test suite run against it. On completion, the results are reported to the GitHub pull request and the respective Slack channels. Setting up continuous integration should be trivial with all the tools on the market. CircleCI is our choice, except for our mobile builds, for which we use Jenkins.

Pro Tip: parallel_tests can speed up builds while getting the most out of your CI VMs, saving money.
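
As a rough sketch of the gem's standard usage (not our exact CI configuration), parallel_tests splits spec files across the available cores:

gem install parallel_tests          # or add it to your Gemfile's development/test group
bundle exec parallel_rspec spec/    # runs one RSpec process per core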

Instant Gratification

Having good test coverage is important, but if the build takes hours to finish, it slows down your team. A fast build is a requirement. In our early days, our largest application’s test suite took 4 hours to run. Over the years, we’ve brought it down to 20 minutes with a dedicated TestOps team that improved our test infrastructure. Smaller applications don't consume as much time, finishing in under 3 minutes. Consider how much time your team waits for a build to finish. That’s time users are deprived of the next feature or a critical bug fix.

Code Metrics

CodeClimate can be an invaluable tool for keeping your code lean and secure. Ensure your repository is hooked up to report on pull requests so your team knows about issues before merging into master. Brakeman is also a good resource for analyzing security vulnerabilities.
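
For instance, Brakeman can be run as a standalone scan against a Rails application (a minimal invocation, not our exact setup):

gem install brakeman
brakeman -o brakeman-report.html   # static-analyzes the Rails app in the current directory and writes an HTML report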


Most aren't brave enough to deploy to production when their pull request looks like the one above.

Delivery

Delivering your branch to production should be trivial -- but we all know it's not always that simple. Even with green automated tests, there is still a chance your branch will cause havoc in production. Perhaps it's a missing index that goes unnoticed in development; against millions of rows in production, the impact can be drastic. Here is how we combat this.

Staging and Production

Staging environments are common for testing before a production deployment. Parity between these environments is critical: ideally, they replicate production as closely as possible, from the NGINX configuration to the data stores. We use the following tools to build hundreds of environments across various clusters.

Packer

We maintain base images, created from Packer templates with Ruby preinstalled, which speeds up bootstrapping new environments.

Terraform

The biggest wrench in our toolbox is Terraform. We configure a lot with it: environment creation, IAM policies, S3 buckets, Route 53 records, Consul key/value stores, and much more. The following is a simplified version of our Terraform configuration, with documentation:

# Providers and their configuration. In this case AWS, Consul, and Digital Ocean.
provider "aws" {}
provider "consul" {
  scheme     = "http"
  datacenter = "[redacted]"
  address    = "[redacted]"
}
provider "digitalocean" {}

# Digital Ocean droplet that provisions the new instance with Chef Server and runs chef-client.
resource "digitalocean_droplet" "web" {
  name     =  "${var.app}.${var.cluster}.${var.domain}"
  image    =  "${lookup(var.image, var.ruby_version)}"
  region   =  "${var.region}"
  size     =  "${var.size}"
  private_networking = true

  provisioner "local-exec" {
    command = "knife bootstrap ${self.ipv4_address} -N ${var.app}.${var.cluster} -y -x root --bootstrap-vault-item secrets:base -j '{\"rbenv\": {\"version\": \"${var.ruby_version}\", \"global\": \"${var.ruby_version}\"}}' -r 'recipe[app-ruby::${var.stage}]'"
  }
}

# Route 53 record for the new application pointing to the Digital Ocean droplet.
resource "aws_route53_record" "dns" {
  zone_id =  "${var.aws_route53_zone_id}"
  name    =  "${var.app}.${var.cluster}.${var.domain}"
  type    =  "A"
  ttl     =  "300"
  records =  ["${digitalocean_droplet.web.ipv4_address}"]
}

# IAM user and access key.
resource "aws_iam_user" "user" {
  name = "${var.cluster}-${var.app}"
}

resource "aws_iam_access_key" "key" {
  user = "${aws_iam_user.user.name}"
}

# S3 bucket.
resource "aws_s3_bucket" "b" {
  bucket = "${var.cluster}-${var.app}-${var.stage}"
  acl = "private"
}

# Access policy for this user to the S3 bucket.
resource "aws_iam_user_policy" "b_ro" {
  name = "${var.cluster}-${var.app}-${var.stage}-s3-access"
  user = "${aws_iam_user.user.name}"
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": ["s3:*"],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::${var.cluster}-${var.app}-${var.stage}/*",
        "arn:aws:s3:::${var.cluster}-${var.app}-${var.stage}"
      ]
    }
  ]
}
EOF
}

# AWS keys and S3 bucket names into Consul so it can be accessed by the Rails application.
resource "consul_keys" "app" {
  key {
    name = "aws_access_key_id"
    path = "${var.stage}/${var.cluster}/${var.app}/AWS_ACCESS_KEY_ID"
    value = "${aws_iam_access_key.key.id}"
  }

  key {
    name = "aws_secret_access_key"
    path = "${var.stage}/${var.cluster}/${var.app}/AWS_SECRET_ACCESS_KEY"
    value = "${aws_iam_access_key.key.secret}"
  }

  key {
    name = "s3_bucket_name"
    path = "${var.stage}/${var.cluster}/${var.app}/S3_BUCKET_NAME"
    value = "${aws_s3_bucket.b.id}"
  }
}

Adding a new environment involves the simple configuration shown below and running terraform apply:

module "APPLICATION_CLUSTER_DOMAIN_TLD" {
  app           = "APPLICATION"
  cluster       = "CLUSTER"
  size          = "2gb"
  ruby_version  = "2.2.2"
  source        = "./app"
}

Chef and Capistrano

Once Terraform finishes, Chef takes over provisioning everything that’s required by the Rails application. At that point, Capistrano handles all our deployments.
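
A minimal Capistrano sketch looks something like the following; the repository URL and paths are placeholders rather than our real configuration:

# config/deploy.rb
set :application, "application"
set :repo_url,    "git@github.com:doximity/application.git"
set :deploy_to,   "/var/www/application"

Afterwards, bundle exec cap staging deploy (or its production equivalent) deploys the configured branch to that environment.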

Deployments

Ensuring your team (engineers or not) can deploy via ChatOps is a huge time saver. It also allows us to queue up dozens of deployments.

At Doximity, we rely on a combination of Slack, Hubot, and Heaven to handle deployments. For better visibility, we've customized Heaven with a user interface that reports status checks on individual environments along with deployment history.
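
A deploy kicked off from Slack typically looks something like the following; the exact trigger phrase depends on how Hubot is configured, so treat this as an illustration rather than our exact command:

hubot deploy application/my-branch to staging
hubot deploy application to production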

Pro Tip: Lock deployments to production at Beer O’Clock.

Migrations

Table migrations are common, and they should be trivial to perform via ChatOps. With the combination of Capistrano, Hubot, and Heaven, anyone can run them. Migrations carry dangers such as locked tables and missing columns, which result in downtime or errors for users. By relying on Large Hadron Migrator, we perform live migrations without downtime for our users.
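
For instance, adding a column and an index with Large Hadron Migrator follows the gem's documented block style; the table and column names below are purely illustrative:

require 'lhm'

# assumes an ActiveRecord connection has already been established
Lhm.change_table :comments do |m|
  m.add_column :flagged, "TINYINT(1) DEFAULT '0'"
  m.add_index  [:user_id, :created_at]
end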

Monitoring

Everyone on your team should be able to react to performance problems and address them with ease. After all, It’s Not Done Until It’s Fast. We use a combination of New Relic for application reporting and Sensu for server monitoring. VictorOps alerts us when we aren’t at our computers; otherwise, notifications are delivered to the relevant Slack channel.


By combining the aforementioned tools, we help our team ship product faster and feel more confident in what they deliver.

Developer happiness is an important but often neglected subject. Once a team grows beyond a few engineers, optimizing for developer productivity is imperative. Our team caught this train a little late, but we're making strides now. Expect to see a few more articles about this topic here. Follow us for updates.

Imagine you stumble into a medical conference and decide you want to socialize with physicians. Your process might be to approach a random physician, speak to them for a bit, and then have them randomly select one of their colleagues for you to speak to next. If you continued like this forever, how much time would you spend talking to each person? You might expect to spend the most time talking with the person who has the most connections, but depending on how the data is structured, that might not be the case. Let's take a look!

Step 1: Get hold of the data you want to analyze and format it as an edge list in a CSV. An edge list has two columns, each of which represents a node in the graph. For example, a row like 103, 105 might represent that physician 103 is connected to physician 105.
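
A tiny, made-up edge list might look like this; the header row matches the header=TRUE option used in the code below:

from,to
103,105
103,110
105,103
110,105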

Step 2: Install R, as well as the igraph package.
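
From an R console, installation is a single line (assuming a CRAN mirror is configured):

install.packages("igraph")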

Step 3: Build a graph from your CSV edge list with the igraph package. The function below prompts the user to select a CSV file from the system and builds a graph.

library(igraph)

graphFromEdgeList <- function(){
  dat <- read.csv(file.choose(), header = TRUE) # choose an edge list in .csv file format
  graph.data.frame(dat, directed = TRUE)
}

Step 4: Manipulate the graph using some of the techniques from Google's PageRank paper and build a transition matrix.

Before moving to step 5, let's take a quick look at some graph theory. An adjacency matrix is a representation of a graph where an entry (for example, [103, 105]) is 1 if physician 103 is connected to physician 105 and 0 if they aren't. To get the probability of randomly transitioning from a given node to any one of its connections, divide 1 by the number of connections that physician has. If physician 103 has 5 connections, there is a 1/5, or 0.20, chance of physician 103 directing you to 105. A matrix of all these probabilities is called a transition matrix.
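
As a toy illustration (not taken from the original data), here is a three-physician graph in which 1 points to 2 and 3, 2 points to 3, and 3 points to nobody; note what happens to physician 3's row:

A <- matrix(c(0, 1, 1,
              0, 0, 1,
              0, 0, 0), nrow = 3, byrow = TRUE)
P <- A / rowSums(A) # rows 1 and 2 become probabilities; row 3 is 0/0 = NaN
P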

But what happens if physician 103 is connected to 104, but 104 is connected to nobody? Or what if 104 is only connected to 105 and 105 is only connected to 104? These two cases, known as leaf nodes and periodic subgraphs, can distort the results of a random walk because the walker might never reach the rest of the graph. When Google was working on its PageRank algorithm, they came up with some tricks to ensure the graph they analyzed would have neither property. First, if a node has no outbound connections, connect it to everything in the graph at equal probability. Second, give the walker a 20% chance of jumping to anyone at random at any time. The code to build such a transition matrix from a graph object is below:

# Takes a weighted & directed graph and returns a modified n x n transition matrix.
# The matrix is modified so that any node with no outbound edges is connected to
# every node in the graph at uniform weight. Every node is connected, so the graph
# is guaranteed not to contain hidden periodic subgraphs.
randomChatterMatrix <- function(G){
  A <- get.adjacency(G) # warning this matrix can be quite large
  N <- nrow(A)
  r <- c()

  for (i in 1:N){
    s = sum(A[i,])

    if (s == 0){
      # connect all leafs to every other node in the graph at equal probability
      # manipulating A in a loop is not performant because it will copy A every time
      # keep a vector of rows to bulk update later
      r = c(r, i)
    } else {
      # since s varies row to row perform the operation on the spot
      A[i, ] <- A[i, ] / s
    }
  }

  A[r,] <- 1/N # bulk update all zero rows
  m <- as.matrix(A)
  (.8 * m) + (.2 * 1/N)
}

Step 5: Compute the random walk probabilities for each physician. Now that we have a transition matrix to work with, we can use linear algebra to answer our original question: how much time will you spend talking to each person? It just so happens that when you're dealing with a real square matrix with positive entries, the eigenvector corresponding to its largest eigenvalue gives us exactly that information. Using the functions above, the code to compute the dominant eigenvector looks like this:

g <- graphFromEdgeList()
r <- randomChatterMatrix(g)
# the stationary distribution is the dominant eigenvector of t(r);
# it is real, so drop the zero imaginary parts that eigen() may return
eigen_vect <- Re(eigen(t(r))$vectors[, 1])
probs <- eigen_vect / sum(eigen_vect)
print(probs) # let's see what we got!

Below is a graph of a small subset of Doximity physicians, where the X axis shows the proportion of time spent talking to a physician and the Y axis shows how many physicians fell into that range. The results follow a logarithmic trend: you would randomly chat with most physicians for about the same amount of time, but a few physicians stand out as people you would spend significantly more time chatting with.
histogram of physician chatting time

And just like that, you're able to quantify exactly how much time you would spend with each physician. This new piece of information is interesting on its own, but it can also be the start of many more fun data science exercises with R!

When consumers first get acquainted with an API, they'll often turn to documentation. This is a natural starting point for most consumers, especially since APIs are so prominent in development. Why, then, do we start with writing tests for code? The short answer: we don’t.

At Doximity, we often write the documentation for our APIs before writing the first line of code. Tom Preston-Werner actually wrote about the benefits of this type of documentation in his blog post Readme Driven Development. Over time, we've found a way of streamlining this workflow to offer benefits for both developers and consumers alike.

The Workflow

First, we write a draft of the documentation, including an example request and response. After receiving edits from the consumer, our iOS team, we make the necessary revisions. This feedback loop ensures that all stakeholders, project manager included, agree on the final product. (This is similar to the process that user interface design mocks go through before development.) After this loop is complete, we write the failing tests that support and drive the code as we normally would.

Apiary

To help us with this process we use the Apiary service, which hosts our documentation. But to call them a mere host misses the point. Apiary champions a documentation format, API Blueprint, which standardizes the way documentation is written. After comparing a few other products in this area, like Swagger and RAML, we decided on Apiary. The ease of learning API Blueprint's syntax, especially for less technical people, was too appealing to pass up.
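
To give a flavor of the format, a hypothetical endpoint in API Blueprint might look like this (the resource and fields are invented for illustration):

FORMAT: 1A

# Colleagues API

## Colleague [/colleagues/103]

### Retrieve a Colleague [GET]

+ Response 200 (application/json)

        {
          "id": 103,
          "specialty": "Cardiology"
        }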

Apiary also runs a mock web server for the documented endpoints. The HTTP server responds with the JSON found in the documentation's examples. This is a powerful feature: it allows the iOS team to start developing against the API as soon as we finish the documentation. Two other tools built around the mock API are logical derivatives. The first is the proxy API, which proxies requests through Apiary to any server, such as a staging server. When debugging a request or its response, it helps to compare the expected with the actual, and to that end, Apiary provides a diff. Second, while consumers are reading an API's docs, they can send requests to a production server; by adding an authorization header, the browser sends the request straight to that server.

Final Thoughts

There are many other natural benefits to this style of driving design. Our documentation serves as a transparent boundary between our teams as we grow as a company, and our development team can stub the responses from services; Apiary can help with that. Responsibility for which team owns a bug goes from fuzzy to clear. We can even write tests for the documentation itself. The initial time spent writing documentation is well worth it if we can continue to have a meeting of the minds later on.

There is a fantastic thoughtbot article, written by Caleb, about signing commits (among other things, like emails). I presented it to our team as an excellent opportunity to provide some authenticity and ensure provenance.

If you don't have the time to follow along with Caleb, I'm going to attempt to tl;dr his article here. Either way, I highly recommend referring back to the original.

Signing a commit proves you yourself made those changes. This is advantageous for a number of reasons that you can learn about from horror stories.

To get set up, run these commands:

brew install gpg2 gpg-agent pinentry-mac
gpg2 --gen-key

Use RSA and 4096 bits. Set the key to expire in 1 year if this is your first one; that way, lost passphrases, forgotten keys, and the like all expire on their own. If you use PGP regularly, a key that doesn't expire isn't unreasonable, as long as you generate a revocation certificate and store it somewhere separate. For a first key, pick 1 year.

After you follow the prompts, generate a revocation certificate, especially if your key doesn't expire.

gpg2 --output revoke.asc --gen-revoke your@email.com

Follow the prompts and tell GnuPG you're giving no reason, since you're pre-generating the certificate. Seriously, you need this: if you lose it, you're hosed, so store it safely. Printing it as a QR code is highly recommended.

Finally, make signing automatic by adding it to your gitconfig. This is the best part and was only recently added to Git. Run gpg-agent so you only have to enter the secret key's passphrase once.
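
The relevant settings are the signing key and, on Git 2.0 or newer, automatic commit signing; the key ID below is a placeholder:

git config --global user.signingkey YOURKEYID
git config --global commit.gpgsign true
git config --global gpg.program gpg2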

If you made it this far, consider exchanging and signing each other's keys at your organization to unlock their full power.

Machine Learning Made Simple with Ruby

How can you make automatic classification work properly without resorting to external prediction services? Starting with Bayesian classification, you can use the Ruby gem classifier-reborn to create a Latent Semantic Indexer. Hands on!
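
A minimal sketch of the gem's LSI interface, with made-up training sentences:

require 'classifier-reborn'

lsi = ClassifierReborn::LSI.new
lsi.add_item "This article is about deploying Rails applications.", :devops
lsi.add_item "This article is about convolutional neural networks.", :machine_learning
lsi.classify "how we deploy our Rails apps" # expected to return :devops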

Thinking in React

Pete Hunt walks you through the process of creating a React.js application, explaining the process and how to think the React.js way.

Go and Ruby-FFI

How to write a shared library in Go that can be loaded by Ruby-FFI.

Profiling & Optimizing in Go

Transcript of a talk walking through the tools and strategies for profiling and optimizing Go programs.

Best practices for a new Go developer

Read what Gophers from across the world have to say to the question — “What best practices are most important for a new Go developer to learn and understand?”