Infrastructure Posts

On microservices and distributed architectures

Like the boiling frog, we often fail to appreciate just how significantly the infrastructure we rely on as developers has improved over the last decade. When I began working with the Rails framework in 2010, everything from the hardware we used for local development to the infrastructure on which we tested and deployed was positively anemic by today’s standards.

My personal laptop, reasonably high-end for the time, had a 5400 RPM spinning disk and 2 GB of RAM. SSDs were exotic, even on servers. Nowadays, you can get bare metal servers with 512 GB to 1 TB of RAM, two multi-core CPUs, and terabytes of fast SSD storage for a price that is perfectly reasonable for even small companies. Similarly, you can easily and cheaply launch fleets of high-spec virtual servers with providers like Amazon Web Services and DigitalOcean at a few minutes’ notice.

In many ways, it seems to me that we are often basing architectural decisions on imagined constraints. In my experience, a decision to embrace a microservices architecture should not follow primarily from concerns about scalability.

High-performance logging from Nginx to Postgres with Rsyslog

While there are excellent purpose-built solutions for general log storage and search, such as Librato and the ELK Stack, there are sometimes reasons to write log data directly to Postgres. A good example (and one we have recent experience with at Superset) is storing access log data to present to users for analytics purposes. If the number of records is not explosive, you can benefit greatly from keeping such data close to your primary application schema in PostgreSQL. If need be, you can even scale it out to a separate physical node by using something like Foreign Data Wrappers or the multi-database facilities of your application framework.

For our own application, we have a few geographically distributed Nginx hosts functioning as “dumb” edge servers in a CDN. These Nginx hosts proxy back to object storage to serve up large media files to users. For each file access, we want to record analytics information such as IP, user agent, and a client identifier (via the Nginx userid module, to differentiate unique users to the extent that it is possible).

Dedicated vs. Cloud vs. VPS vs. PaaS – a value comparison

I find myself increasingly evangelizing for awareness of infrastructure and ops practices because I believe that such awareness (or lack thereof) has cascading effects for application architecture and, ultimately, company success. Understanding the relative value of platforms can keep you on a path of rapid execution. Misunderstanding or neglecting it can get you into very dire situations.

I see many teams break out their applications into services prematurely, with immense consequences in terms of cognitive overhead and loss of velocity. Typically the decision follows from a perception that they have hit a performance ceiling, when in fact they are still on some relatively weakly provisioned PaaS.

I want to do a detailed value comparison to aid others in making informed infrastructure decisions.

How to mess up DevOps: working at the wrong level of abstraction

There is a story I’ve seen unfold enough times to find disappointing:

A tech company gets its product off the ground with a small handful of developers and a user-friendly fully hosted Platform-as-a-Service (PaaS) solution like Heroku.

The company’s product is a success. A huge one! The company raises money, scales the team, and iterates. One thing that doesn’t change is the PaaS. It’s working for them. Maybe not as well as they’d like, but well enough to keep up with the roadmap.

At some point, costs get way out of hand. The once $1k/month bill has exploded to $40k/month. On top of this, developers are sick of hacking around arbitrary constraints of the PaaS. They learn of the dramatically better performance they can achieve at lower cost if they take greater ownership of their infrastructure.