Fat frameworks and microframeworks
https://blog.supersetinc.com/2018/12/30/fat-frameworks-microframeworks/
December 30, 2018
We recently had to evaluate some development proposals on behalf of a client for a web project of moderate scope. Reviewing the proposals and considering them in relation to project priorities, we found ourselves weighing the matter of microframeworks vs. more comprehensive (what I will call “fat”) frameworks. I wanted to share some of our thoughts in hopes that they might help others in a similar place.

Standards and conventions

Microframeworks, arguably by definition, prescribe fewer conventions for achieving a given piece of functionality. Advocates often consider this freedom a virtue; however, strong conventions make it much easier for any other developer to understand your codebase and become a productive contributor to it.

Mitigating dependency risk

When using a fat framework, you generally end up relying on fewer libraries. Every dependency is a liability: you never know when something might fall out of maintenance, or when a maintainer might choose to drop compatibility with your framework version or some of your other dependencies. In practice, you end up having one major dependency to keep up with (the fat framework itself), but this is much easier to keep on top of than dozens or hundreds of third-party dependencies.

Fat frameworks like Ruby on Rails also provide mature and well-documented faculties for many peripheral application needs (think things like previewing emails, securely managing credentials, seeding the database). Every peripheral need of this sort would, with a microframework, end up necessitating either some fully custom solution or the integration of a library/tool that would be yet another dependency to document for present and future collaborators. There are also more core application requirements like caching and background job processing that have excellent first-class solutions in most fat frameworks but that you might be left to sort out for yourself if working with a microframework.
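To make the point concrete, consider one such faculty: Rails ships with mailer previews out of the box. A small class like the following (a minimal sketch – UserMailer and its welcome method are hypothetical names) makes an email viewable in the browser during development, with no third-party dependency:

# in test/mailers/previews/user_mailer_preview.rb
class UserMailerPreview < ActionMailer::Preview
  # Renders UserMailer#welcome at /rails/mailers/user_mailer/welcome
  def welcome
    UserMailer.welcome(User.first)
  end
end

With a microframework, even something this small would mean picking, integrating and documenting yet another tool.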

Deployment, hosting and tooling

One of the great things about building with a fat framework is that infrastructure dependencies are usually relatively homogeneous and well known. This means that many PaaS providers have great out-of-the-box solutions for hosting your application. You should also have a reasonably easy time sourcing human experts who can help in this area if that is what you want.

Insurance against abandonment

If you, as a non-developer, have a relatively conventional application in a fat framework like Ruby on Rails, Django or Laravel and your developer decides to disappear into the ether, you likely won’t have too hard a time finding somebody who is able to take over the project and be productive quickly. This is much less likely to be the case with a microframework. While you will certainly be able to find developers familiar with Flask or Gin or Sinatra, as well as with whatever templating and frontend libraries and tools might have been used to construct your app, any new developer will face a much higher burden in taking over the codebase and becoming productive. They may even push for a rewrite.

These are things I generally keep in mind when selecting a framework. This is not at all to say that microframeworks don’t have a place. They certainly do – for instance in narrowly-scoped services and narrowly-scoped applications exposing simple interfaces. There are also circumstances where performance, memory requirements or niche needs – like the ability to hold open tens of thousands or millions of connections on minimally-specced hardware – recommend microframeworks. For typical web applications with teams of 1-100 developers, however, I think the case for fat frameworks is strong.

On microservices and distributed architectures
https://blog.supersetinc.com/2018/09/08/microservices-distributed-architectures/
September 8, 2018
Like the boiling frog, we often fail to appreciate just how significantly the infrastructure upon which we rely as developers has improved over the last decade. When I began working with the Rails framework in 2010, everything from the hardware we used for local development to the infrastructure upon which we tested and deployed was positively anemic by today’s standards.

My personal laptop, reasonably high-end for the time, had a 5400 RPM spinning disk and 2 GB of RAM. SSDs were exotic, even on servers. Nowadays, you can get bare metal servers with 512 GB to 1 TB of RAM, two multi-core CPUs and terabytes of fast SSD storage for a price that is perfectly reasonable for even small companies. Similarly, you can easily and cheaply launch fleets of high-spec virtual servers with providers like Amazon Web Services and DigitalOcean at a few minutes’ notice.

In many ways, it seems to me that we are often basing architectural decisions on imagined constraints. In my experience, a decision to embrace a microservices architecture should not follow primarily from concerns about scalability.

Typically the burden and overhead of managing several services across several environments (development, testing, QA, production, etc) is a huge multiple of that of managing a more monolithic codebase. Furthermore, scaling a monolithic application, within most practical bounds, is actually often simpler and cheaper than scaling a more distributed app.

From a technical perspective (speaking in this instance of web apps) a monolithic application can scale very naturally. The application tier can scale horizontally to an almost infinite degree by adding more application servers. Particularly high-traffic pages with largely static content can easily be placed behind a reverse proxy cache like Varnish (or a commercially-hosted sibling like Fastly). High-traffic pages with more dynamic content can still have their performance dramatically improved with strategies like fragment caching (using a memory store like Redis or Memcached). Relational databases can scale to immense capacity either in hosted/managed forms (such as Amazon RDS) or hosted on your own hardware. Master-Slave replication schemes can allow database reads to scale in a horizontal manner similar to scaling the application tier. Only extremely write-heavy apps present any significant challenges in this area, and even these scenarios now have a multitude of purpose-built solutions such as Cassandra and Citus (this is also not something that will be overcome any more easily with a microservices solution).
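To illustrate the caching point, here is a minimal sketch using Rails’ low-level caching API (a cousin of view-level fragment caching; the method name, cache key and query are illustrative, and a Redis or Memcached cache store is assumed to be configured):

# Returns the popular-products list from the cache when possible.
# The block only runs on a cache miss; its result is serialized into
# the configured store (e.g. Redis) and served from memory thereafter.
def popular_products
  Rails.cache.fetch("products/popular", expires_in: 10.minutes) do
    Product.order(views: :desc).limit(20).to_a
  end
end

A few lines like this can take an expensive query off the hot path entirely, with no change to the application’s architecture.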

So when should you adopt microservices solutions? To me there are two especially compelling scenarios. One is what I would call the “service bridge” scenario. This is where you have a niche feature that has a significantly different traffic profile from your larger app and, more importantly, would introduce extremely awkward dependencies into your application tier.

A good example of this might be something like IP geolocation, which could require data sets of hundreds of megabytes or more (assuming something like MaxMind’s binary data files) that you may not want to shoehorn into your primary application (so as not to bloat your application server). Such a niche dependency might be better implemented as a microservice (though I would argue you would probably be better off delegating to a hosted provider with an API).

Microservices architectures are also well-suited to circumstances where you have a very large organization with many domain-focused teams that would benefit from a very high degree of autonomy. One of the organizations most visibly advocating for and implementing service oriented architectures early on was Amazon (as wonderfully documented by Steve Yegge in his famous Google Platforms Rant). It’s arguable that this vision of service oriented architecture (SOA) is more along the lines of having multiple large, monolithic applications with distinct teams and some shared data, rather than the common understanding of microservices (which is more akin to single applications composed of several small services).

When adopting microservices, be mindful of the unique challenges of the architecture, and have a plan to address them. These should not be incidental concerns but a primary focus from the outset if your team is to thrive. Things such as bootstrapping the development environment and having cohesive QA and versioning practices can be challenging with a microservices architecture. So too can logging and tracing. Many (especially in the context of smaller organizations) take an ad-hoc approach to these issues because they can still manage to make the system function, but oversights of this nature can become serious liabilities at scale.

The critical thing that I hope to convey is that microservices should not be adopted as a default solution for the problem of scaling an application. They can be a great fit for scaling teams and organizations, as well as for wrapping up functionality that it is particularly impractical to fit within your primary application’s deployment. The matter of scaling an application can be addressed extremely effectively with a monolithic codebase and traditional horizontal scale-out methods.

High-performance logging from Nginx to Postgres with Rsyslog
https://blog.supersetinc.com/2018/04/09/high-performance-logging-nginx-postgres-using-rsyslog/
April 9, 2018
While there are excellent purpose-built solutions for general log storage and search, such as Librato and the ELK Stack, there are sometimes reasons to write log data directly to Postgres. A good example (and one we have recent experience of at Superset) is storing access log data to present to users for analytics purposes. If the number of records is not explosive, you can benefit greatly from keeping such data closer to your primary application schema in PostgreSQL (even scaling it out to a separate physical node if need be by using something like Foreign Data Wrappers or the multi-database faculties of your application framework).
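For instance, a Rails application could keep its analytics tables on a separate Postgres node with something like the following (a minimal sketch – the AnalyticsRecord base class and the analytics entry in config/database.yml are hypothetical names):

# Abstract base class whose subclasses read and write against a
# separate Postgres node defined under "analytics" in config/database.yml.
class AnalyticsRecord < ActiveRecord::Base
  self.abstract_class = true
  establish_connection :analytics
end

# Models for analytics data inherit the alternate connection.
class AccessLog < AnalyticsRecord
  self.table_name = "access_log"
end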

For our own application, we have a few geographically-distributed Nginx hosts functioning as “dumb” edge servers in a CDN. These Nginx hosts proxy back to object storage to serve up large media files to users. For each file access we want to record analytics information such as IP, user agent, and a client identifier (via the Nginx userid module, to differentiate unique users to the extent that it is possible).

For the purposes of this article we will go through the manual steps of configuration (on Ubuntu Server), though you should absolutely use a configuration management tool such as Ansible to automate these steps in production, as we have.

The dependencies for this logging configuration are nginx-extras (which includes the userid module for analytics) and rsyslog-pgsql (the package for the Rsyslog Postgres output module, which is not part of the default Rsyslog install). You can install these with apt (either manually or via Ansible’s apt module):

sudo apt-get install nginx-extras rsyslog-pgsql

Ubuntu should have a top-level Rsyslog configuration file at /etc/rsyslog.conf which should end with the line:

# ...
$IncludeConfig /etc/rsyslog.d/*.conf

This instructs the Rsyslog daemon to pull in any configuration files contained in the directory /etc/rsyslog.d when it loads. We will use this momentarily to set up a special configuration that reads and forwards our formatted Nginx logs. First, let’s configure Nginx to log a JSON payload to a dedicated log file.

Ubuntu’s standard Nginx configuration keeps per-site config files in /etc/nginx/sites-available and loads those you have symlinked into /etc/nginx/sites-enabled/. For this example, we’ll assume a configuration for yoursite.com in /etc/nginx/sites-available/yoursite.com.conf:

# in /etc/nginx/sites-available/yoursite.com.conf:

log_format json_combined '{"time_local": "$time_local", '
   '"path": "$request_uri", '   
   '"ip": "$remote_addr", '
   '"time": "$time_iso8601", '
   '"user_agent": "$http_user_agent", '
   '"user_id_got": "$uid_got", '
   '"user_id_set": "$uid_set", '
   '"remote_user": "$remote_user", '
   '"request": "$request", '
   '"status": "$status", '
   '"body_bytes_sent": "$body_bytes_sent", '
   '"request_time": "$request_time", '
   '"http_referrer": "$http_referer" }';

server {    
  listen 80;
  # + SSL configuration...

  # Optional: Nginx userid module, useful for analytics.
  # (see http://nginx.org/en/docs/http/ngx_http_userid_module.html)
  userid on;
  userid_name uid;
  userid_expires 365d;

  server_name yoursite.com;
  # any additional server-level configuration such as site root, etc...
 
  location / {
    access_log /var/log/nginx/yoursite.com/access.log json_combined;
  } 
  # You will probably want to add some gzip, cache, etc standard header rules for performance...
}

And load the Nginx configuration with:

sudo service nginx reload

We now have Nginx writing a dedicated access log with a JSON-formatted payload to /var/log/nginx/yoursite.com/access.log. We can configure Rsyslog to read this log file using the imfile module. In the past imfile defaulted to a polling mode, but today it defaults to filesystem events and is very performant and effectively real-time. With the imfile module reading the log file on Rsyslog’s behalf, we can then forward the log data to Postgres using the Rsyslog ompgsql (PostgreSQL Database Output) module. The combined configuration is as follows:

# Load the imfile input module
module(load="imfile")

input(type="imfile"
      File="/var/log/nginx/yoursite.com/access.log"
      Tag="yoursite:")

# Load the ompgsql output module
module(load="ompgsql")

# Define a template for row insertion of your data.
# The template below assumes you have a table called
# "access_log" and are inserting columns named 
# "log_line" (with the log payload) and "created_at" (with the timestamp).
template(name="sql-syslog" type="list" option.sql="on") {
  constant(value="INSERT INTO access_log (log_line, created_at) values ('")
  property(name="msg")
  constant(value="','")
  property(name="timereported" dateformat="pgsql" date.inUTC="on")
  constant(value="')")
}      

# The output "action". This line instructs rsyslog
# to check if the log line is tagged "yoursite:" (a tag
# which we set with the imfile module configuration above)
# and if so to use the sql-syslog template we defined
# above to insert it into Postgres.
if( $syslogtag == 'yoursite:')  then {
  action(type="ompgsql" server="{{ postgres_host }}"
        user="{{ postgres_user }}" 
        pass="{{ postgres_password }}"
        db="{{ postgres_db_name }}"
        template="sql-syslog"
        queue.type="linkedList")
}
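The access_log table assumed by the template needs to exist before Rsyslog can write to it. You can create it however you like; here is a minimal sketch as a Rails migration (the index on created_at is an optional addition of ours):

# Creates the table assumed by the sql-syslog template above; the
# column names must match those in the INSERT statement.
class CreateAccessLog < ActiveRecord::Migration[5.1]
  def change
    create_table :access_log do |t|
      t.text :log_line, null: false        # raw JSON payload from Nginx
      t.datetime :created_at, null: false  # timestamp supplied by Rsyslog
    end
    add_index :access_log, :created_at
  end
end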

You will want to name this file something like /etc/rsyslog.d/51-yoursite.conf, since Rsyslog loads config files in alphabetical order and on Ubuntu has a default configuration file in the same directory called 50-default.conf. It probably goes without saying, but the ompgsql “action” in the configuration above uses mock templatized credentials (I can recommend Ansible Vault for managing/templating credentials such as these in production). I should also note that, as Rsyslog is a very long-lived project, it supports several different configuration file formats. The example above uses the “advanced” (a.k.a. “RainerScript”) format because I find this to be by far the most readable. Once you have saved the above configuration file, you will need to restart the Rsyslog daemon for it to take effect:

sudo service rsyslog restart

The above configuration should be pretty performant, as the “linkedList” queue.type argument supplied to the ompgsql action instructs Rsyslog to buffer and batch its writes to Postgres. You can read about the performance tweaking that is available for ompgsql in an excellent article, “Handling a massive syslog database insert rate with Rsyslog”, written by Rainer Gerhards himself (the primary author of Rsyslog).

A review of Sendy – driving the cost of maintaining your mailing list towards zero
https://blog.supersetinc.com/2018/02/24/sendy/
February 24, 2018
I’ve been using Sendy on several of my sites over the last few years and wanted to write a review of it, since most of the reviews I’ve seen out there neglect a lot of what I think are its best features.

Sendy is a sort of self-hosted Mailchimp. It is a PHP application with very few dependencies (MySQL and a PHP-compatible web server like Apache) that you install on your own server. You pay a one-time fee of $59 US for a perpetual license to your major version. Major version updates (of which there has only been one in the ~3 years I’ve held my Sendy license) are $29.

Sendy offers the sort of functionality that I consider table stakes with most email list software. Among its features:

  • Letting you manage multiple distinct brands
  • Unlimited email lists
  • Unlimited email deliveries (possibly bound by your email service provider, see below)
  • Sign users up through a form hosted in your Sendy install, an API, or through a multitude of embeddable widgets (such as Sendy Widget for WordPress)
  • Optionally require users to go through a double-opt-in flow
  • Templating system with built-in tags (e.g. “unsubscribe”, “name”, “email”, “date”, etc.) and custom tags (similar to Mailchimp’s “merge tags”)
  • Transactional email sending (…sort of – see “API and transactional emails” section below)
  • Open and click tracking
  • Reporting of deliverability-related stats like unsubscribe rate, bounces, “marked as spam”, etc
  • Integration with Amazon Simple Email Service (SES) AND any email service provider that exposes an SMTP interface (which is nearly all)

Sendy built its identity on the stellar pricing and deliverability record of Amazon’s SES (to wit, Sendy’s homepage proclaims: “Send newsletters, 100x cheaper via Amazon SES”). The full pricing for SES can be found on the AWS site, but it is around 10 cents US ($0.10) for every 1,000 emails sent. If you host your Sendy install on an EC2 instance then you also get the first 62,000 emails free each month.

Sendy campaign report

Believe it or not, I actually prefer not to use Amazon SES with Sendy. I find dealing with SES unappealingly bureaucratic: raising delivery limits (even from 0 to 100/day) must be done through a ticketing system with manual review. I now prefer Mailgun as an upstream email service provider for Sendy. Mailgun gives you 10,000 emails a month for free, then tiered pricing for volumes above that. You can even host your own Postfix server if you have the energy to do so (guides to setting up SPF and DKIM with Postfix are easy to find), but I think in most cases a paid provider will wind up being so cheap that it’s not worth the hassle.

Most email service providers have tiered pricing so it is difficult for me to do an accurate comparison matrix, but I have linked the pricing pages for several of the more popular providers below:

Click through any of those links and you should get a sense for approximately what you will end up realistically spending on email delivery at your particular volume when using Sendy.

One thing worth noting is that because hosted providers like Mailchimp, AWeber, Constant Contact, etc. work zealously on compliance, their IPs are often not just kept off blacklists but in fact whitelisted by spam filters, almost completely assuring delivery. I am not sure that you will always have this assurance with the SMTP hosts listed above, so it is something to take into account, though both they and Sendy will give you deliverability stats, and my own deliverability has been near perfect so far.

Sendy’s Features

So far we’ve focused mostly on the cost of email delivery with Sendy. The truth is, this is what most people who have taken a look at Sendy are interested in. Most people know instinctively that the pricing of fully hosted/integrated email marketing solutions like MailChimp and AWeber is high, even for subscriber counts in the low thousands. In my opinion, however, you do get great value from such platforms, in that campaigns can achieve a very professional appearance and function even with relatively non-technical administrators.

Campaign and template editor

Both MailChimp and AWeber have highly polished default templates, and template editors that are very advanced. This is something that I really think is important to have in mind when considering Sendy.

You can certainly construct highly polished and professional campaigns with Sendy, but it will require some technical knowledge. This is because there are zero supplied default templates, and the campaign/template editing interfaces are basic compared to those of hosted services. To achieve professional results, in my opinion, you will likely need to get a hold of third party email templates (such as through ThemeForest) or code your templates in some external environment (believe it or not, Dreamweaver is still an industry standard in this area).

Sendy’s campaign edit view – WYSIWYG editor

Sendy’s campaign edit view – HTML panel

Accordingly, you probably want somebody moderately technical and comfortable with HTML email templates and external editing environments involved in managing your campaigns if you decide to go with Sendy and want highly polished campaigns.

List management

Sendy allows unlimited lists and list subscribers. I find the list management functionality to be pretty good and to stack up with hosted commercial services. There is good high level reporting (with charts) as well as support for custom fields and list segmentation. You can easily export lists by CSV through the interface should you ever have a need to switch to a different service.

Sendy offers a ready-to-use embeddable form from the list administration interface. This form is similar to the embeddable Mailchimp forms that you will find widely used on the internet. It is simply a very basic HTML form pointing to an endpoint on your Sendy host, with a parameter indicating the list.

Double opt-in is supported and recommended for protecting reputation/deliverability.

Analytics and reporting

Sendy has comprehensive reporting throughout. Once you have sent a campaign, you will get a well-structured graphical report encompassing opens, specific link clicks, bounces, unsubscribes and even a geographical breakdown. Furthermore all of this data can be easily exported in CSV format.

You get similar reporting at the list level. For each of your mailing lists you get a rich graphical representation of active subscribers, subscribers who never confirmed, unsubscribes, bounces, and incidences of being marked as spam.

Sendy’s list-level stats UI

The suite of reporting in Sendy is comprehensive enough for you to keep on top of campaign success and all the info that might impact your sending account’s health/reputation.

Autoresponders

Sendy’s autoresponder functionality is basic but good. For a given list, you can create a flexible autoresponder sequence. The interface for editing individual emails in the sequence is essentially the same as the template and campaign editing interfaces:

Autoresponder editor

The sequencing interface is simple and flexible, giving you the ability to set delays in a very granular way, with intervals in minutes, hours, days, months, etc:

Adding an autoresponder in Sendy

Sendy has flexible sequence/delay configuration for autoresponders

I do feel that the autoresponder functionality built into Sendy is not quite comparable to the bonafide “marketing automation” features you might find in commercial software (e.g. the rule-based “workflows” of competing products). You can certainly hack this sort of functionality together via the API, but it won’t be a first-class feature as it is in other products.

API and transactional emails

Sendy exposes functionality via an authenticated API (see documentation here). This is very useful if you have an interest in things like creating highly-optimized sign up experiences, integrating with ecommerce frameworks, or doing advanced list management taking into account factors outside of Sendy (like registrants’ recent purchases or activity on your site).
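As an example, adding a subscriber from a Ruby application might look like the following (a hypothetical sketch using only the standard library – the endpoint and parameter names reflect Sendy’s documented subscribe call, but verify them against the API docs for your installed version):

require "net/http"
require "uri"

# POST to the subscribe endpoint of your Sendy install.
uri = URI.parse("https://sendy.yoursite.com/subscribe")
response = Net::HTTP.post_form(uri,
  "list"    => "your_list_id",          # ID of the target list
  "email"   => "customer@example.com",
  "name"    => "Jane Customer",
  "boolean" => "true"                   # request a plain-text response
)
puts response.body # a success confirmation, or an error message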

The API also allows …sort of… triggered/transactional email functionality (by which we mean having your external application trigger an email which is templated and delivered by Sendy). Unfortunately the support for transactional emails is not first-class, but the functionality can be achieved through a workaround:

  • Create a list to correspond only to the transactional email
  • Create a single autoresponder for the list with a delay setting of “Send email immediately”
  • When you intend to trigger the transactional email, add a user to the list through the API, with tags to pass any data you need to the template.

I’m honestly not sure whether this hack will work if the same email needs to be triggered multiple times for the same customer (without removing them from and re-adding them to the list) – something you might want for an order confirmation email, where a customer may purchase from you several times using the same address.

Sendy’s absence of first class transactional email functionality never bothered me because I have mostly used the tool for maintaining ordinary mailing lists. Unless you actually need consolidated reporting, it is usually much more robust to template transactional emails in the application they are coming from and deliver them directly through SMTP (this is what things like WordPress, Magento and any Ruby on Rails application utilizing the stock ActionMailer library will do by default).

Summary

I love Sendy for my own needs and I believe it could suit many others’ needs. I do think that anybody considering Sendy should take into account their priorities. If you are a commercial operation for whom the $10s-$100s (or more) a month you might spend on a more expensive alternative are a drop in the bucket, you may want to go that route, as I believe you will have an easier time achieving polished results and stellar deliverability, and you will probably save quite a lot of time overall. Similarly, I’d say that if you are relatively non-technical and have a desire for very polished campaigns, you also might be best off with tools like Mailchimp, so long as your list isn’t in the many hundreds of thousands (see Mailchimp’s pricing page for the “Growing Business” plan at various volumes – you can also use Mailchimp’s “Forever Free” plan as long as you have fewer than 2,000 subscribers).

I think Sendy is a great fit for cost-conscious startups and semi-technical people who have the wherewithal to deal with its somewhat more DIY aspects and don’t want to worry about lock-in to their email provider. Even though it is not a truly Free Open Source project, I think there is sort of an analogy that could be made to WordPress: Sendy probably appeals to the sort of people who opt to host WordPress on their own servers vs. using WordPress.com, Medium, etc.

Sendy has a live demo that you can log into and play with test campaigns. I think that when you try Sendy it becomes immediately clear whether it’s going to work for you or not. I encourage you to try the demo as the experience is really identical to the experience of using your own live Sendy install. While I wish Sendy offered something along the lines of a 30 day free trial, they do say on the site that “If it doesn’t work out, we’ll refund you”. For my purposes I have been very happy with Sendy.

Implementing flexible, Stripe-style API authentication in Ruby on Rails
https://blog.supersetinc.com/2018/01/18/flexible-stripe-style-api-client-authentication-ruby-rails-applications/
January 18, 2018
Having dealt with many uncomfortable API authentication schemes in the last decade, I wanted to share the scheme that we’ve found most sustainable for our Ruby on Rails applications. This scheme aligns with that used by services like Stripe. It allows clients to authenticate either via HTTP Basic Auth or an X-AUTH-TOKEN custom header.

What it will look like

API interactions with the authentication scheme we are implementing will look as follows:

curl https://yourapp.com/api/resource/123 \
   -u sk_test_8d7cd4bc9a8cdfe635d3881d4fc1439d:

In using the -u flag, curl is authenticating via HTTP Basic Auth. The client’s API key can also be passed via an X-AUTH-TOKEN header, which you may find more convenient for implementing clients depending on the features and conventions of the underlying HTTP library you are using.

The basics of Basic Auth

HTTP Basic Auth works via the Authorization header. This header takes the following form:

Authorization: <type> <credentials>

In the case of HTTP Basic Auth, the type component of the header value will be set to Basic. The credentials component is a Base64-encoded string containing the username and the password, separated by a colon character. In our case, we disregard the password portion and treat the user portion as the API key. By placing a colon at the end of the credentials we passed with the -u flag, we are telling curl to pass the key as the user portion of the Authorization payload and an empty string as the password portion.

When we make the request:

curl https://yourapp.com/api/resource/123 \
 -u sk_test_8d7cd4bc9a8cdfe635d3881d4fc1439d:

The server will see the following:

Authorization: Basic c2tfdGVzdF84ZDdjZDRiYzlhOGNkZmU2MzVkMzg4MWQ0ZmMxNDM5ZDo=

If we Base64-decode the credentials part, this expands to:

Authorization: Basic sk_test_8d7cd4bc9a8cdfe635d3881d4fc1439d:
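You can reproduce this round trip in a Ruby console (a quick sketch using the example key from above):

require "base64"

credentials = "sk_test_8d7cd4bc9a8cdfe635d3881d4fc1439d:" # key as user, empty password
encoded = Base64.strict_encode64(credentials)
# => "c2tfdGVzdF84ZDdjZDRiYzlhOGNkZmU2MzVkMzg4MWQ0ZmMxNDM5ZDo="

Base64.decode64(encoded)
# => "sk_test_8d7cd4bc9a8cdfe635d3881d4fc1439d:"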

Controller-level authentication

With these building blocks, we can implement a controller-level API authentication scheme. We will assume an ApiKey model with a value column containing the actual key (sk_test_8d7cd4bc9a8cdfe635d3881d4fc1439d in our example).

We will support passing the key via HTTP Basic Auth and also via an X-AUTH-TOKEN header. We can therefore implement an ApiController that performs authorization based on these headers:

class ApiController < ApplicationController
  skip_before_action :verify_authenticity_token # Disable CSRF protection
  before_action :authenticate_client

  private

  def authenticate_client
    # Prefer HTTP Basic Auth; fall back to the X-AUTH-TOKEN custom
    # header if no Authorization header is present.
    key = if request.headers['Authorization'].present?
            # "Basic <base64>" -> take the encoded part, decode it, and
            # strip the trailing colon (the empty password portion).
            Base64.decode64(request.headers['Authorization'].split(' ').last).remove(':')
          else
            request.headers['X-AUTH-TOKEN']
          end
    @api_key = ApiKey.find_by value: key
    
    # Now raise an exception if the API key does not exist.
    # We are assuming the use of the CanCanCan authorization library
    # here but you can use any method you wish.
    raise CanCan::AccessDenied if @api_key.nil?
  end
end

For every action in this controller or any inheriting controller, you can now be assured of the existence of a valid @api_key in scope. You can use this @api_key to perform further authorization (such as by tying it to a company or user and implementing a current_ability method in the case of CanCanCan) and for scoping of resources.
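To verify that both paths work, a request spec along these lines can help (a minimal sketch assuming RSpec; the /api/resource route and the ApiKey attributes are hypothetical):

require "rails_helper"

RSpec.describe "API authentication", type: :request do
  let(:api_key) { ApiKey.create!(value: "sk_test_abc123") }

  it "authenticates via HTTP Basic Auth" do
    # Encodes "sk_test_abc123:" into an Authorization header value.
    basic = ActionController::HttpAuthentication::Basic
              .encode_credentials(api_key.value, "")
    get "/api/resource/123", headers: { "Authorization" => basic }
    expect(response).to have_http_status(:ok)
  end

  it "authenticates via the X-AUTH-TOKEN header" do
    get "/api/resource/123", headers: { "X-AUTH-TOKEN" => api_key.value }
    expect(response).to have_http_status(:ok)
  end
end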

Wrap up

I hope that this article gave some high-level guidance on how to approach header-based authentication for APIs in Ruby on Rails. This approach is an excellent fit for most authenticated APIs (APIs where you can expect that client credentials will live on a secured host).

For circumstances where credentials cannot be kept private (such as when calls must be made directly from the browser, possibly in a different domain than your application, or when you are providing resource access on behalf of your users to third parties), you may require something like an OAuth scheme instead. In a future post we will be taking a look at this approach as well.

An intro to Stimulus JS: well-factored JavaScript for server-rendered applications
https://blog.supersetinc.com/2018/01/11/well-factored-javascript-server-rendered-applications-using-stimulus/
January 11, 2018
Stimulus is a JavaScript framework from Basecamp that provides consistent conventions and hooks for JavaScript that manipulates the DOM in server-rendered applications. It aims to fill some gaps that have always existed for developers who embrace a traditional server-rendered paradigm (and who may also be using libraries like Turbolinks or PJAX) but who also need to integrate one-off functionality in JavaScript. Stimulus does not at all concern itself with client-side (nor isomorphic) rendering and emphatically does not aim to be a heavy-client app framework like React, Angular or Vue.js. Instead it was created to support apps that are server-rendered first, and that rely on custom JavaScript where appropriate for UI enhancement.

In the existing paradigm, at least as concerns Ruby on Rails applications, it has been up to the developer to decide how to structure whatever custom JavaScript they have beyond UJS and SJR responses. Stimulus gives us strong conventions to replace the ad-hoc or bespoke approaches we may have previously taken to such work.

Getting started

Stimulus is distributed as an npm package and assumes that you will be using a JavaScript toolchain to load it into your application. If you are using Ruby on Rails, the conventional path here is to use Webpacker. If you’re starting a Rails 5.1+ application from scratch, you can have this done for you automatically by supplying the --webpack flag to rails new:

rails new your_app_name --webpack

If you have an existing Rails application that does not yet utilize the webpacker buildchain, you will need to manually add gem 'webpacker' to your Gemfile. Next, whether you created the app with the --webpack flag or not, you will have to run:

bundle exec rails webpacker:install

This will bootstrap the project’s package.json file with an appropriate "@rails/webpacker" client-side dependency and "webpack-dev-server" development dependency to complement the Ruby library.

Once you’ve installed and configured Webpacker in this way, you can add Stimulus as a dependency as well in package.json and then run yarn install:

{
  "name": "stimulus-demo",
  "private": true,
  "dependencies": {
    "@rails/webpacker": "^3.2.2",
    "stimulus": "^1.0.1"
  },
  "devDependencies": {
    "webpack-dev-server": "^2.11.1"
  }
}

Getting into the code

In the interest of making an example that is realistic and not contrived, I’ve decided to implement a slider-based rating widget, something that would not have fit neatly into a UJS/SJR paradigm. We’ll be building a shell of this component entirely in the DOM, and then adding the critical interactivity using a Stimulus controller.

The rating widget will consist of a custom slider component, along with a large text display of the percent-rating the user has chosen.

Note that the slider component is actually a combination of a range input (with a styled handle and hidden track) placed atop a progress-bar type element. This is done to have more control over the fidelity of the UI than the CSS track pseudo-elements (::-webkit-slider-runnable-track and its other vendor-prefixed brethren) permit. In this way it is a very realistic example of how one might use Stimulus when a high degree of UI fidelity and interactivity is required.

We’ll assume a very simple Rating domain model consisting of just an integer value and timestamps:

# schema migration
class CreateRatings < ActiveRecord::Migration[5.1]
  def change
    create_table :ratings do |t|
      t.integer :value, default: 50
      t.timestamps
    end
  end
end

class Rating < ApplicationRecord
  validates :value, presence: true, inclusion: 1..100
end

We can now create a simple erb partial for the Stimulus-backed rating widget:

<div class='rating-widget' data-controller='rating-widget'>
  <%= form_for @rating do |f| %>
    <div class='number-display' data-target='rating-widget.numberDisplay'>
      <%= f.object.value %>%
    </div>
    
    <div class='rating-bar'>
      <%= f.range_field :value, in: 1..100, 
          class: 'rating-slider', 
          data: {
            target: 'rating-widget.slider', 
            action: 'input->rating-widget#valueChanged'
          } 
      %>
      <div class='rating-bar-inner' data-target='rating-widget.innerBar' style="width: <%= f.object.value %>%;"></div>
    </div>

    <%= f.submit "Save Rating", class: 'submit-rating' %>      
  <% end %>
</div>

You will note the data-controller, data-target and data-action attributes present in this markup. These are Stimulus directives. Stimulus looks for data-controller directives within the page’s markup and uses these to bind the appropriate Stimulus controller class. By the convention of the library, if you have supplied the controller name “rating-widget” in the attribute, Stimulus will look for the controller implementation in a file named either rating_widget_controller.js or rating-widget-controller.js.

Loading the library

When we ran the webpacker:install task, Webpacker created an app/javascript/packs directory and dropped an application.js file into it. We can use this to load Stimulus:

// in your app/javascript/packs/application.js

import { Application } from "stimulus"
import { definitionsFromContext } from "stimulus/webpack-helpers"

const application = Application.start()
const context = require.context("./controllers", true, /\.js$/)
application.load(definitionsFromContext(context))

We must also load the application pack into our page using the javascript_pack_tag helper:

<!-- ...in your application layout head or footer... -->
  <%= javascript_pack_tag 'application' %>
<!-- ... -->

Note: you will also need to run the Webpack dev server, which you can do via a webpacker-provided binstub: bin/webpack-dev-server.

Creating the controller

With Stimulus’ autoloading initialized in this way and loaded into your page with Webpacker, all you need to do to load or initialize your controllers is to follow Stimulus’ naming conventions. We can now create a controller class in app/javascript/packs/controllers to bind to our rating widget:

// in app/javascript/packs/controllers/rating_widget_controller.js

import { Controller } from 'stimulus'

export default class extends Controller {
  valueChanged() {
    this.numberDisplay.textContent = this.innerBar.style.width = `${this.rating}%`;
  }

  get rating() {
    return parseInt(this.slider.value);
  }

  get slider() {
    return this.targets.find('slider');
  }

  get numberDisplay() {
    return this.targets.find('numberDisplay');
  }

  get innerBar() {
    return this.targets.find('innerBar'); 
  }  
}

What is wonderful about Stimulus is that a quick look at the original template markup adds remarkable clarity to the bindings and behavior going on in this JavaScript class:

<div class='rating-widget' data-controller='rating-widget'>
  <%= form_for @rating do |f| %>
    <div class='number-display' data-target='rating-widget.numberDisplay'>
      <%= f.object.value %>%
    </div>
    
    <div class='rating-bar'>
      <%= f.range_field :value, in: 1..100, 
          class: 'rating-slider', 
          data: {
            target: 'rating-widget.slider', 
            action: 'input->rating-widget#valueChanged'
          } 
      %>
      <div class='rating-bar-inner' data-target='rating-widget.innerBar' style="width: <%= f.object.value %>%;"></div>
    </div>

    <%= f.submit "Save Rating", class: 'submit-rating' %>      
  <% end %>
</div>

The controller binding itself occurs in the top-level element with the data-controller="rating-widget" attribute. The other elements that we access via the slider/numberDisplay/innerBar getters are clearly marked as Stimulus targets with the data-target attribute in the markup. Finally, the bit of interactivity that we have (the valueChanged function) is transparently bound using the attribute data-action="input->rating-widget#valueChanged". Note: the input-> portion of the attribute value is a directive to Stimulus to bind to the input DOM event (which fires any time the slider moves, vs. change, which would fire only when the control has been released and the value has changed).

After adding a bit of styling, we have the fully interactive component described at the start of this article.

Moving the slider causes the rating percent text to update and also moves the fat progress bar behind the slider. The bindings are crystal-clear and, save for the matter of getting the Webpack configuration set up in the first place, there is very little boilerplate.

Conclusion

This was a rather basic feature that we built, but it highlights the central concepts of the Stimulus library (controller, target and action bindings) as well as the steps necessary to fully integrate the library into a Ruby on Rails application. It is important to note that Stimulus is not an outright replacement for the asset pipeline. While it is wonderful to finally have a library that strengthens conventions around JavaScript components in our server-rendered web apps, you may still find it practical to lean on the asset pipeline, UJS and SJR for much of your app’s basic interactivity. That said, for the bits that don’t neatly fit those paradigms, it’s great to finally have something like Stimulus.

Dedicated vs. Cloud vs. VPS vs. PaaS – a value comparison
https://blog.supersetinc.com/2017/12/31/dedicated-vs-cloud-vs-vps-vs-paas-value-comparison/
December 31, 2017
I find myself increasingly evangelizing for awareness of infrastructure and ops practices because I believe that such awareness (or lack thereof) has cascading effects for application architecture and, ultimately, company success. Understanding the relative value of platforms can keep you on a path of rapid execution. Misunderstanding or neglecting it can get you into very dire situations.

I see many teams break out their applications into services prematurely, with immense consequence in terms of cognitive overhead and loss of velocity. Typically the decision is a consequence of a perception that they have hit a performance ceiling, when in fact they are still on some relatively weakly-provisioned PaaS.

I want to do a detailed value comparison to aid others in making informed infrastructure decisions.

For the purposes of this comparison, we will consider two scenarios: what I would call a mid/moderate-scale app, with 60GB of data in persistence storage and a requirement to handle 200 requests/second, and what I will call a higher-scale app, with 1 TB of storage in use and a requirement to handle 3,000 requests/second.

We will also assume a relatively monolithic 12-factor application architecture, with an application tier, an application caching tier and a persistence tier.

There is necessarily a lot of variability in the performance profile and resource usage of applications. For the purposes of this comparison, we will assume a Ruby on Rails application running atop the Passenger web application server, with each application process consuming 300MB of memory, and with at most 88% of the available memory on each application server in use (which means we would fit 3 worker processes on an application server with 1024MB of RAM).

We will also assume the application is tuned well enough to average 50ms server times, which means each active server process can roughly accommodate 20 requests/second. The mid-scale application will therefore require 10 application processes (10 processes × 20 requests/second/process = 200 requests/second), and the high-scale application will require 150 application processes.
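Spelled out in code (all figures are this article’s assumptions, not measurements):

process_rss_mb   = 300  # memory consumed per Passenger worker
usable_fraction  = 0.88 # use at most 88% of a server's RAM
reqs_per_process = 20   # 1 request / 0.050s average server time

workers_per_1gb = (1024 * usable_fraction).to_i / process_rss_mb # => 3

200 / reqs_per_process    # => 10 processes for the mid-scale app
3_000 / reqs_per_process  # => 150 processes for the high-scale app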

This makes the cost estimates in this comparison optimistic, as they do not account for sub-optimal request routing/load-balancing (particularly relevant to Heroku), nor for the fact that very few applications will have a performance profile that reliably keeps requests at a 50ms average server time.

Category 1: PaaS

Note: based on our resource utilization profile outlined above, we assume that a Standard 2x Dyno can accommodate 3 worker processes, a Performance M Dyno can accommodate 7 processes (in 88% = 2.2GB of its total 2.5GB of memory) and a Performance L Dyno can accommodate 41 processes.

Mid-scale configuration
  • Application Tier: $200-$1,250/mo
    4x Standard 2x Dynos ($200/mo)
    2x Performance M Dynos ($500/mo)
    1x Performance L Dyno ($500/mo)
  • Persistence Tier: $50-$200/mo
    Heroku Postgres Standard 0 Plan ($50/mo)
    Heroku Postgres Standard 2 Plan ($200/mo)
    Heroku Postgres Premium 0 Plan ($200/mo)
  • Caching Tier: $30-$60/mo
    Heroku Redis Premium 1 Plan ($30/mo)
    Heroku Redis Premium 2 Plan ($60/mo)
  • Total Cost: $280-$1,510/mo

High-scale configuration
  • Application Tier: $6,500-$17,750/mo
    22x Performance M ($5,550/mo)
    4x Performance L ($2,000/mo)
  • Persistence Tier: $2,000-$3,500/mo
    Heroku Postgres Standard 6 Plan ($2,000/mo)
    Heroku Postgres Standard 7 Plan ($3,500/mo)
    Heroku Postgres Premium 6 Plan ($3,500/mo)
  • Caching Tier: $200-$750/mo
    Heroku Redis Premium 5 Plan ($200/mo)
    Heroku Redis Premium 7 Plan ($750/mo)
  • Total Cost: $8,700-$22,000/mo

Category 2: VPS

For our VPS survey we will have a look at two of the leading VPS providers: Linode and DigitalOcean. Linode’s pricing is quite a lot better than DigitalOcean’s, but they have a public history of DDoS episodes that have taken major sites offline and an arguably less rich API and admin experience. I still consider Linode highly viable and think it is worth taking a close look at both providers’ offerings if you are considering moving your infrastructure to VPS.

Linode

Mid-scale configuration
  • Application Tier: $20-$80/mo
    1x 4GB Linode ($20/mo)
    2x 4GB Linode ($40/mo)
    1x 8GB Linode ($40/mo)
    2x 8GB Linode ($80/mo)
  • Persistence Tier: $40-$80/mo
    1x 8GB Linode ($40/mo)
    1x 12GB Linode ($80/mo)
  • Caching Tier: $20/mo
    1x 4GB Linode ($20/mo)
  • Total Cost: $80-$180/mo

High-scale configuration
  • Application Tier: $240/mo
    4x 16GB Linode ($240/mo)
    2x 32GB Linode ($240/mo)
  • Persistence Tier: $160-$480/mo
    1x 24GB Linode ($160/mo)
    1x 64GB Linode ($480/mo)
  • Caching Tier: $60-$120/mo
    1x 16GB High-memory Linode ($60/mo)
    2x 8GB Linode ($80/mo)
    2x 16GB High-memory Linode ($120/mo)
  • Total Cost: $460-$840/mo

DigitalOcean

Mid-scale configuration
  • Application Tier: $20-$80/mo
    1x 4GB Droplet ($20/mo)
    2x 4GB Droplet ($40/mo)
    1x 8GB Droplet ($40/mo)
    2x 8GB Droplet ($80/mo)
  • Persistence Tier: $40-$80/mo
    1x 8GB Droplet ($40/mo)
    1x 16GB Droplet ($80/mo)
  • Caching Tier: $20/mo
    1x 4GB Droplet ($20/mo)
  • Total Cost: $80-$180/mo

High-scale configuration
  • Application Tier: $320/mo
    4x 16GB Droplet ($320/mo)
    2x 32GB Droplet ($320/mo)
  • Persistence Tier: $180-$260/mo
    1x 16GB Droplet + 1TB Block Storage ($180/mo)
    1x 32GB High-CPU Droplet + 1TB Block Storage ($260/mo)
  • Caching Tier: $40-$80/mo
    1x 8GB Droplet ($40/mo)
    2x 8GB Droplet ($80/mo)
    1x 16GB Memory-optimized Droplet ($80/mo)
  • Total Cost: $540-$660/mo

Category 3: Cloud

For Cloud, we will consider configurations on Amazon Web Services. We will keep things roughly aligned with our VPS specs. It is important to note that opting for Reserved pricing can realistically lower your AWS costs by as much as 40-45%, but you must have a rough idea of what your resource utilization will be at least one year out, so we will consider both Reserved and On-Demand pricing here. We have used 40% as a rough benchmark, since the precise calculations will be unique to your specific service selection and whether you are prepaying or not.

Mid-scale configuration
  • Application Tier: ~$88-$156/mo
    1x t2.large instance (8GB memory) ($67.93/mo)
    2x t2.medium instances (4GB memory) ($67.94/mo)
    2x t2.large instances (8GB memory) ($135.86/mo)
    50GB General Purpose SSD EBS volume + snapshot storage (~$20/mo)
  • Persistence Tier: ~$148/mo
    1x db.m4.large (8GB) RDS PostgreSQL instance with 50GB storage + 100GB backup storage ($148.48/mo)
  • Caching Tier: ~$66-$133/mo
    1x cache.m3.medium (2.78GB) ElastiCache (Redis) node ($65.88/mo)
    1x cache.m3.large (6.05GB) ElastiCache (Redis) node ($133.23/mo)
  • Total Cost: ~$322-$375/mo (~$193-$225/mo with Reserved pricing)

High-scale configuration
  • Application Tier: ~$563/mo
    4x t2.xlarge (16GB) instances ($543.44/mo)
    2x t2.2xlarge (32GB) instances ($543.44/mo)
    50GB General Purpose SSD EBS volume + snapshot storage (~$20/mo)
  • Persistence Tier: ~$837/mo
    1x db.m4.2xlarge (32GB) RDS PostgreSQL instance with 1TB storage + 2TB backup storage ($837.06/mo)
  • Caching Tier: ~$266.45-$532.90/mo
    1x cache.m3.xlarge (13.3GB) ElastiCache (Redis) node ($266.45/mo)
    2x cache.m3.xlarge (13.3GB) ElastiCache (Redis) nodes ($532.90/mo)
  • Total Cost: ~$1,686-$1,953/mo (~$1,012-$1,172/mo with Reserved pricing)

* Note: we have not been very specific in calculations concerning block storage (EBS) costs at the application tier. The requirements at this tier (if you are using block storage at all) are likely to be insubstantial (~50GB). Opting for provisioned IOPS may raise your costs into the low $100s. If relying on General Purpose SSD + ephemeral storage at the instance level, you will likely pay on the order of $20 or less, as assumed here.

Category 4: Dedicated

It is not wholly straightforward to price a dedicated configuration, as dedicated servers can be so generously provisioned that at moderate scale (such as our “mid-scale” application) it often makes sense to run all services on a single host. Multi-host configurations are also possible and appropriate at larger scales or in contexts that require high availability. We will assume a high-availability, multi-host configuration for our “high-scale” application and a single-host configuration for the “mid-scale” application. We will consider both the budget host OVH (good value but a spotty reputation for support and even uptime) and the slightly costlier Liquid Web (a US-based managed host).

OVH

OVH offers good-value dedicated hosting in several datacenters around the world. Their BHS/Quebec location is most relevant for American companies. There are some big caveats to OVH’s dedicated offering, and it may not suit all applications. They have a history of leaving customers relatively on their own to sort out everything short of hardware failure. I also once experienced firsthand an episode of hours-long downtime due to a road accident that severed the datacenter’s fiber line (it is worrying that there was neither better fortification nor sufficient redundancy to keep hosts online). This one episode is the only major one I experienced, however, and it is well known that AWS services also go down, sometimes for similar periods of time, and sometimes even across availability zones (as memorably occurred with S3 in 2017).

Mid-scale configuration
  • Application Tier: $89.99/mo
    1x HOST-32L, Xeon D-1520 (4 core/8 thread), 32GB RAM, 2x480GB SSD ($89.99/mo)
  • Persistence Tier: $0/mo (utilizes the same host)
  • Caching Tier: $0/mo (utilizes the same host)
  • Total Cost: $89.99/mo

High-scale configuration
  • Application Tier: $179.98-$383.98/mo
    2x HOST-32L, Xeon D-1520 (4 core/8 thread), 32GB RAM, 2x480GB SSD ($179.98/mo)
    2x EG-64, Xeon D-1520 (4 core/8 thread), 64GB ECC RAM, 2x480GB NVMe SSD ($383.98/mo)
  • Persistence Tier: $89.99-$191.99/mo
    1x HOST-32L, Xeon D-1520 (4 core/8 thread), 32GB RAM, 2x480GB SSD ($89.99/mo)
    1x EG-64, Xeon D-1520 (4 core/8 thread), 64GB ECC RAM, 2x480GB NVMe SSD ($191.99/mo)
  • Caching Tier: $89.99-$191.99/mo
    1x HOST-32L, Xeon D-1520 (4 core/8 thread), 32GB RAM, 2x480GB SSD ($89.99/mo)
    1x EG-64, Xeon D-1520 (4 core/8 thread), 64GB ECC RAM, 2x480GB NVMe SSD ($191.99/mo)
  • Total Cost: $359.96-$767.96/mo

Liquid Web

Liquid Web offers managed hosting of dedicated servers. I include them in this comparison as a contrast to OVH, since OVH is hands-off to a degree that may not align with the requirements of some businesses or teams. You pay a premium for a hands-on managed hosting solution, but the value is still good despite this.

Mid-scale configuration
Application Tier
  • $399/mo
    1x XEON E5-2620 v4 server, 32GB RAM with 2x480GB SSD ($399/mo)
Persistence Tier
  • $0/mo
    Utilize same host
Caching Tier
  • $0/mo
    Utilize same host
Total Cost
$399/mo.

High-scale configuration
Application Tier
  • $798/mo
    2x XEON E5-2620 v4 servers, 32GB RAM with 2x480GB SSD ($798/mo)
Persistence Tier
  • $399/mo
    1x XEON E5-2620 v4 server, 32GB RAM with 2x480GB SSD ($399/mo)
Caching Tier
  • $399/mo
    1x XEON E5-2620 v4 server, 32GB RAM with 2x480GB SSD ($399/mo)
Total Cost
$1,596/mo.

The Wildcard: Colocation

At a certain scale you may find that your best fit is none of the discussed solutions but rather colocation. With colocation you rent space in a datacenter (typically by the quarter-, half- or full cabinet) and pay for power and bandwidth while supplying your own hardware. This means paying hardware costs upfront and then paying comparatively little in ongoing hosting costs. There are also potential accounting advantages to colocation (since you own the hardware, you can write down its depreciation). You will need a team, or at least team members, experienced in managing physical servers, and colocation is likely a good fit only for very high-scale projects. If you want a good overview of a colocation deployment in practice, I recommend Nick Craver’s article about Stack Overflow’s infrastructure.

Conclusion

I hope this article has provided a rough overview of the comparative costs of the various paths you might follow with your infrastructure. Be aware that it is neither exhaustive nor conclusive: there is a lot of variability in architectures and needs, and the numbers used should be considered only a rough approximation of value. It should also be understood that to utilize non-PaaS infrastructure solutions well, you will need ergonomic tooling and processes (for things such as bootstrapping new environments, database backup and restore, deployment, etc.). We have posted on this subject before (see: “How to mess up DevOps: working at the wrong level of abstraction“), so please refer to that article or elsewhere on the web for more on infrastructure management and DevOps tooling.

We provide a matrix below to summarize the data from this article:

All figures are monthly costs.

Mid-Scale Application
  • Heroku: $530 – $1,510
  • AWS: ~$322 – ~$375 (~$193 – ~$225 with reserved pricing)
  • Liquid Web: $399
  • DigitalOcean: $80 – $180
  • Linode: $80 – $180
  • OVH: $89.99

High-Scale Application
  • Heroku: $8,700 – $22,000
  • AWS: ~$1,686 – ~$1,953 (~$1,012 – ~$1,172 with reserved pricing)
  • Liquid Web: $1,596
  • DigitalOcean: $540 – $660
  • Linode: $460 – $840
  • OVH: $359.96 – $767.96

The post Dedicated vs. Cloud vs. VPS vs. PaaS – a value comparison appeared first on Superset Blog.

]]>
An intro to Encrypted Secrets in Ruby on Rails https://blog.supersetinc.com/2017/12/22/intro-encrypted-secrets-ruby-rails/ Fri, 22 Dec 2017 13:50:40 +0000 http://blog.supersetinc.com/?p=71 Rails 5.1 introduced Encrypted Secrets to help simplify the management of your application secrets (things such as service credentials and the secret_key_base). This article details the feature and its usage. Why Encrypted Secrets? Since Rails 4.1, the framework has given you the ability to centrally store secrets in the config/secrets.yml file. The glaring shortcoming of […]

The post An intro to Encrypted Secrets in Ruby on Rails appeared first on Superset Blog.

]]>
Rails 5.1 introduced Encrypted Secrets to help simplify the management of your application secrets (things such as service credentials and the secret_key_base). This article details the feature and its usage.

Why Encrypted Secrets?

Since Rails 4.1, the framework has given you the ability to centrally store secrets in the config/secrets.yml file. The glaring shortcoming of secrets.yml is that the file is in no way secure, so you cannot safely check it into version control with any production credentials in it. The convention was always to reference production credentials in secrets.yml but load their values from the host environment. Usually your secrets file would end up looking something like this:

development:
  secret_key_base: 972888f3521e5c5ec8491cd3295e51af38fc93e059c1a00e8e03804288f64d77753b66a5108baaddfe6
  some_api_key: 055ef473d6df1055

test:
  secret_key_base: 1d1be5ad7ea1e9d833e752a2de941217222fe9c6ea5467b9d63f69d38c8aa4c4219db9edc37d3b80fc4
  some_api_key: 055ef473d6df1055

production:
  secret_key_base: <%= ENV["SECRET_KEY_BASE"] %>
  some_api_key: <%= ENV["SOME_API_KEY"] %>

Rails 5.1+’s Encrypted Secrets feature means you can now keep production secrets in a second fully encrypted file (AES-256 by default), which is managed by the framework. Secrets from the encrypted secrets.yml.enc file are merged with secrets from the unencrypted secrets.yml file.

Getting started with Encrypted Secrets

Encrypted secrets is not set up by default, and in order to bootstrap it you need to run:

bin/rails secrets:setup

This will drop a few files into your project tree:

  • config/secrets.yml.key – contains the actual secret key used by the framework to AES-encrypt your secrets.
  • config/secrets.yml.enc – the encrypted form of your secrets.

It should go without saying that the config/secrets.yml.key file should be handled carefully and never checked into version control as it is all that is required to decrypt your secrets (it is accordingly gitignored by default).

To edit your secrets, invoke the command:

bin/rails secrets:edit

If you have no EDITOR variable defined in your shell environment you will need to set one. For Sublime Text, you can add the following to your .bash_profile (or similar shell configuration file).

# Export a default text editor.
# Assumes you have set up "subl": 
# https://www.sublimetext.com/docs/2/osx_command_line.html
export EDITOR="subl -w"


The secrets:edit task will decrypt your secrets and pop them open in your editor where you can make changes. When you quit the editor, the framework will re-encrypt the secrets and overwrite the existing secrets.yml.enc file.
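
Once saved, values from the encrypted file are available through the same interface as plain secrets. A minimal sketch, reusing the some_api_key entry from the earlier example (SomeService is a hypothetical client, shown only for illustration):

# Anywhere in application code; secrets from secrets.yml and
# secrets.yml.enc are merged and exposed on one object:
Rails.application.secrets.some_api_key
# => "055ef473d6df1055"

# e.g. in a hypothetical config/initializers/some_service.rb:
SomeService.api_key = Rails.application.secrets.some_api_key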

Usage in production

In production, Rails will look for the decryption key either in the environment variable RAILS_MASTER_KEY or in a local copy of the key file (config/secrets.yml.key). How you get the environment variable exposed to your application or how you inject the key file is a matter that is specific to your particular hosting and infrastructure management setup.
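
For example, on a PaaS like Heroku, one approach (a sketch, assuming the standard Heroku CLI and the default key file location) is to copy your local key into the environment:

$ heroku config:set RAILS_MASTER_KEY="$(cat config/secrets.yml.key)"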

Caveats

It is important to understand that using Encrypted Secrets over other solutions does have drawbacks. It is likely to fit best within projects that have small and very trusted teams. Because every developer who is expected to manage secrets in an application must have a local copy of the encryption key, situations like terminating an employee become somewhat complicated. More specifically you would need an efficient solution to quickly rotate your encryption key in production, and also to quickly distribute a new key to all developers.

For this reason you may want to consider another solution once your organization reaches a certain scale. The best such solution will likely come down to the details of your infrastructure management and hosting, but it will almost certainly involve exposing credentials via the ENV. PaaS solutions like Heroku, CloudFoundry and Cloud66 all provide ENV variable management faculties, and such solutions are better equipped to handle the practical security needs of larger organizations.

The post An intro to Encrypted Secrets in Ruby on Rails appeared first on Superset Blog.

]]>
Que: high performance background jobs with fewer moving parts https://blog.supersetinc.com/2017/12/21/que-performant-background-processing-rails-fewer-moving-parts/ Thu, 21 Dec 2017 12:27:57 +0000 http://blog.supersetinc.com/?p=19 Since the introduction of ActiveJob in Ruby on Rails 4.2 many years ago, it’s been a foregone conclusion that most Rails apps will need to rely on a job queue of some sort. At a very minimum you’ll probably want to use a queue to take necessary slow-running tasks like email delivery outside of the […]

The post Que: high performance background jobs with fewer moving parts appeared first on Superset Blog.

]]>
Since the introduction of ActiveJob in Ruby on Rails 4.2 many years ago, it’s been a foregone conclusion that most Rails apps will need to rely on a job queue of some sort. At a very minimum you’ll probably want to use a queue to take necessary slow-running tasks like email delivery outside of the request lifecycle.

Mike Perham’s Sidekiq is probably the most widely used option and the de facto choice for many teams. There is, however, another queueing solution that I want to highlight because, for many apps, it cuts down on moving parts while increasing both integrity and performance (with a couple of compromises that should be well understood).

Enter Que

Que is a job queue that leverages PostgreSQL’s native advisory locks to optimize interactions with the queue’s persistence store. Sidekiq and many other job queues are built atop Redis, which, though it has a level of durability (see https://redis.io/topics/persistence) and is certainly fast, is not fully durable by default (many default configurations sync to disk only once per second). Que relies on PostgreSQL as its persistence store, which means real durability guarantees in addition to one fewer dependency. With its use of advisory locks, it also provides remarkably good performance.
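
To give a flavor of the underlying mechanism, here is a simplified illustration of Postgres advisory locks in general (not Que’s actual queries):

# A worker can attempt to "claim" a job's ID as an advisory lock
# without blocking; only one session at a time may hold a given lock.
conn = ActiveRecord::Base.connection
conn.select_value("SELECT pg_try_advisory_lock(42)")
# => true for the first session to ask; concurrent sessions get false
# immediately (no blocking) and simply move on to the next candidate job.
conn.execute("SELECT pg_advisory_unlock(42)") # release once the job is worked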

There is of course a tradeoff here: if you are using your primary application database to house Que’s jobs table, you may subject it to a lot of writes when you have a high volume of jobs. This can be mitigated by using a secondary, dedicated database instance for Que’s jobs table, though at anything short of very large scale this is likely a premature optimization.

Another killer feature of Que is the ability to run your worker pool within your web process. Whether you do this in your particular case will depend on your job volume, but again, at anything short of very large scale it is an incredible benefit with respect to simplifying your infrastructure and deployment. Essentially you no longer have separate queue workers to configure and integrate with your deployment process: deploying your web application code implicitly means that your job queues also get a code update and a seamless restart.

You can find details about configuring an in-process worker pool here: https://github.com/chanks/que/blob/master/docs/advanced_setup.md

The basic configuration for Phusion Passenger is a simple matter of adding a few lines to your config.ru file:

# In config.ru
if defined?(PhusionPassenger)
  PhusionPassenger.on_event(:starting_worker_process) do |forked|
    if forked
      Que.mode = :async
    end
  end
end

Installing Que

Installing Que for your Rails app is very straightforward. Simply add it to your Gemfile:

# In Gemfile

gem 'que'

Then bundle install and, finally, use the generator supplied by the gem to create a migration that will add the que_jobs table to your database schema:

$ rails g que:install

ActiveJob support

Que’s ActiveJob support is not as well documented as I’d like, but the necessary configuration is relatively simple.

First you need to set :que as the ActiveJob queue adapter. You will also need to set config.action_mailer.deliver_later_queue_name = '' if you want Que to pick up emails enqueued with Rails’ stock deliver_later functionality; alternatively, add a Que worker configuration that watches the queue named “mailers”, which is the default queue used when deliver_later_queue_name is not overridden.

# in config/application.rb

config.active_job.queue_adapter = :que
config.action_mailer.deliver_later_queue_name = ''

You can now define your jobs as subclasses of ActiveJob::Base/ApplicationJob and they should work seamlessly with Que. Note that as Que is a first-class ActiveJob integration, GlobalID is supported, so you can pass actual ActiveRecord instances into jobs rather than IDs and trust that they will be correctly deserialized when the job is picked up. Let’s take a simple job that might invoke a method to reach out over the network and fetch a user avatar from an external service to store locally. This might look as follows:

# In app/jobs/user_avatar_job.rb
class UserAvatarJob < ApplicationJob
  def perform(user)
    user.fetch_avatar!
  end
end

You would enqueue one of these via ActiveJob’s perform_later method:

UserAvatarJob.perform_later @user

This will drop it into Que’s Postgres-backed job queue to be worked by the next available worker.
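
Standard ActiveJob options also work unchanged; for instance, delayed execution (plain ActiveJob API, nothing Que-specific):

# Run no sooner than five minutes from now
UserAvatarJob.set(wait: 5.minutes).perform_later(@user)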

Give it a try

I hope this guide has shown how quick and easy it is to get Que set up with a Ruby on Rails application. If the idea of a fast job queue with fewer moving parts and great durability appeals to you, I recommend giving Que a try.

The post Que: high performance background jobs with fewer moving parts appeared first on Superset Blog.

]]>
Caching strategies for Rails 5 applications https://blog.supersetinc.com/2017/12/21/caching-strategies-rails-5-applications/ Thu, 21 Dec 2017 08:18:04 +0000 http://supersetinc.com/?p=16 One of the tremendous benefits of building with a high-level framework like Ruby on Rails is that you are afforded both mental space and an abundance of tools to optimize your application with a thoughtful caching strategy. Caching can be done at several levels in the stack and I wanted to provide an overview of the most common caching strategies […]

The post Caching strategies for Rails 5 applications appeared first on Superset Blog.

]]>
One of the tremendous benefits of building with a high-level framework like Ruby on Rails is that you are afforded both mental space and an abundance of tools to optimize your application with a thoughtful caching strategy. Caching can be done at several levels in the stack and I wanted to provide an overview of the most common caching strategies for Rails applications and the tradeoffs inherent in each.

The Application Cache Store

Most Rails caching faculties write to and read from a cache store configured at the application level. You can inspect which cache store you are using interactively by reading the value of Rails.application.config.cache_store. For production environments you typically want a cache store that exposes its interface on the network (versus a file-based or in-process-memory store) so that cached content can be shared among multiple application processes or servers. Both Redis and Memcached are popular choices. Subjectively, I see Redis increasingly favored because it now offers comparable or better performance and more configuration options (such as a straightforward client authentication scheme).
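
As one hedged example, configuring a Redis-backed store in a Rails 5.0/5.1 application might look like the following, assuming the redis-rails gem (Rails 5.2 later added a built-in :redis_cache_store with a similar shape); the URL is illustrative, and its trailing “/cache” segment acts as a key namespace:

# In config/environments/production.rb
config.cache_store = :redis_store, "redis://localhost:6379/0/cache", {
  expires_in: 90.minutes # a default TTL for entries that don't specify one
}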

Rails.cache

Rails offers a basic interface to the cache store via the Rails.cache object. You can write to and read from the cache, intuitively, via write/read methods:

> # to write
> Rails.cache.write "a key", "a value"
=> true
> # to read
> Rails.cache.read "a key"
=> "a value"

Note that in Rails 5 caching is disabled by default in development: Rails.application.config.cache_store will be set to the no-op :null_store unless you first run rails dev:cache (which drops a file called “caching-dev.txt” into your tmp directory to indicate to the framework that caching should be enabled).

Writing to and reading from the Rails.cache object directly via the write/read methods is often less a performance optimization than a strategy for temporarily storing data that should live across several requests or invocations of a job. I have, for instance, seen the Rails.cache write/read interface used effectively for storing transient airfare and hotel rates, where the data must be split up among several pages for the user but requests to the upstream API are slow, costly and return hundreds of records.

If you are using the cache store in this way, definitely make sure that you can still safely flush the application cache store at will without breaking application behavior. If not, you may want to interface directly with the underlying data store (such as via the redis gem) and scope your Rails.application.config.cache_store down with a namespace so that you can safely flush it (via Rails.cache.clear) without also flushing out this data.
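
As a sketch of the airfare-style pattern described above (SlowFaresAPI is a hypothetical client; the key and TTL are illustrative):

# Fetch-or-compute: returns the cached value when present; otherwise runs
# the block, stores its result for 10 minutes and returns it.
fares = Rails.cache.fetch(["fares", origin, destination], expires_in: 10.minutes) do
  SlowFaresAPI.search(origin, destination) # the slow, costly upstream call
end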

Fragment Caching

Fragment Caching is Rails’ faculty for storing fully rendered snippets of HTML in the application cache store. Fragment Caching, in my experience, is one of the most common performance “quick wins” you’ll be able to achieve in a Rails codebase. Introducing Fragment Caching to a page that does not have it can easily bring a load time of 500ms down to below 50ms.

There are two basic elements to Rails’ Fragment Caching implementation: ActiveRecord::Base#cache_key and the cache view helper.

ActiveRecord::Base#cache_key composes a unique string from an ActiveRecord object’s class and its updated_at timestamp. The insight and significance of this is that a record’s updated_at timestamp is often a great proxy for when some piece of your UI should also be updated. With Fragment Caching we can effectively “freeze” a bit of dynamic content into the cache and only ever pay the price of re-rendering it when the record or records that it represents change.
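
Concretely (the exact format and timestamp precision vary a little across Rails versions, but it looks roughly like this):

user = User.find(42)
user.cache_key  # => "users/42-20171221152350000000" (class + id + updated_at)
user.touch      # bump updated_at
user.cache_key  # => "users/42-20171221161405000000" (a new key; stale fragments are simply orphaned)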

For instance, we might fragment-cache a block of content displaying a user’s profile info:

<%= cache user do %>
  <h2><%= user.full_name %></h2>
  <%= image_tag user.avatar.url(:thumbnail), class: 'img-responsive img-circle' %>
  <h4><%= user.city %></h4>
<% end %>

In this example, the HTML within the cache block will be rendered once, stored in the cache store, and then fetched from the cache store on every subsequent render. The snippet will automatically be re-rendered and re-cached any time the user is updated, since the cache helper by default keys the fragment on the cache_key of whatever record or records it is passed.

This is a nice savings, but where Fragment Caching really shines is in situations involving relational data. Suppose, for instance, that on a user’s profile page we want to display a series of posts that the user has created. Consider the following domain model:

class User < ApplicationRecord
  has_many :posts
end

class Post < ApplicationRecord
  belongs_to :user, touch: true
end

The touch: true option instructs the framework to touch the parent record (bump its updated_at timestamp) when the child is modified. The significance of this with respect to Fragment Caching is that the User record’s cache_key can now reliably be used as a top-level cache key for the whole collection of Posts that it owns. The user profile page might then look as follows:

<%= cache user do %>
  <h2><%= user.full_name %></h2>
  <%= image_tag user.avatar.url(:thumbnail), class: 'img-responsive img-circle' %>
  <h4><%= user.city %></h4>

  <ul class='user-posts'>
    <%= render user.posts %>
  </ul>
<% end %>

With a _post partial that also caches its own content:

<%= cache post do %>
  <div class='post'>
    <h4><%= post.title %></h4>
    <div class='post-body'><%= simple_format post.body %></div>
  </div>
<% end %>

On initial render, each of the rendered _post partials will be written to a fragment cache, in addition to the whole containing users/show template. On subsequent renders, the whole consolidated block can be read with a single cache hit (and most likely only one sub-millisecond database query to pull the User record by its ID). If either a post or the user is updated, the users/show template will be re-rendered, but every post fragment except the updated one (if any) can still be read from the cache rather than rendered.

Behind the scenes, the cache view helper also composes an MD5 hash of the template it is used in, and of any referencing templates, into the cache key, so that any template updates you subsequently deploy will cause cache misses and re-renders in the appropriate places. You can find details about this here, but for practical purposes it means that most of the time you can trust any updates that you make to your templates to bust the appropriate caches once deployed.

Action Caching

Action Caching has actually been formally removed from the Rails framework, but in my opinion it deserves a mention here because it’s still very much in use in the wild and still may be a fit for your application. Action Caching has now been extracted to the actionpack-action_caching gem, and you must install this if you want to use the feature.

Action Caching is similar to Fragment Caching, except that it caches entire rendered controller actions. A cache hit, in the context of an Action Cache, results in no view rendering at all. In practice this usually means a marginal performance increase over Fragment Caching, but it introduces some complexity: you must ensure either that the whole page content is fully cacheable in a single form for all users (which means being mindful about things like the csrf-param and having a strategy to bust the Action Cache whenever you deploy template or asset updates), or that the appropriate identifying information for the user you are caching a document for is factored into the cache key (via the :cache_path option to the caches_action macro).
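
Usage is a small sketch along these lines, assuming the actionpack-action_caching gem (the options shown are illustrative):

class PostsController < ApplicationController
  # Cache the fully rendered :index and :show actions for up to an hour.
  caches_action :index, :show, expires_in: 1.hour

  # When pages vary per user, fold identifying data into the key, e.g.:
  # caches_action :dashboard, cache_path: proc { |c| "dashboards/#{c.current_user.id}" }
end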

Action Caching has a close cousin in Page Caching (now in the actionpack-page_caching gem), which also functions at the level of entire documents but actually writes rendered documents to your web server’s “public” directory rather than the cache store, where these documents are subsequently served by your web server itself, skipping the Rails application stack entirely. A critical practical consideration here is that you will lose the ability to run any before_actions, authentication logic or similar.

ETags and Browser Caching

ETags are a part of the HTTP spec allowing browsers to make conditional GET requests when accessing a document more than once from the same host, pulling down the full document body only if the document has actually changed. If both host and client support conditional GET, the host sends an ETag header down with the document: a short string identifying the version of the document being sent. On subsequent requests to the same URI, the client sends an If-None-Match request header set to the value of the ETag; if the document hasn’t changed on the server, the server responds with 304 Not Modified and an empty body, and the client pulls the content it was previously sent out of its own cache to display to the user.

Rails supports ETags and conditional GET via the stale? and fresh_when controller methods. For example, to render a users#show action with support for conditional GET, we might write:

class UsersController < ApplicationController
  before_action :load_user

  def show
    fresh_when etag: @user
  end

  private

  def load_user
    @user = User.find(params[:id])
  end
end

An important consideration when implementing ETags and conditional GET in Rails is that by default ETag caches will not be busted when you deploy template changes (unless you update the stock RAILS_CACHE_ID on deploy as well, but this also often means unnecessarily clearing out your fragment caches). You should look to a library like Nathan Kontny’s bust_rails_etags, which overrides the default Rails ETag methods to also take into account an ETAG_VERSION_ID environment variable, which you can set in a way that suits your deployment scheme.

Reverse Proxy Caching

For cases where you might be considering Action Caching or Page Caching, I’d encourage you to also take a close look at reverse proxy caching. It has a similar profile with respect to the tradeoffs you’ll need to make in the content you cache (i.e. being mindful of user-specific data and the CSRF meta tags), but potentially much better real-world performance (if you go through a full-blown CDN with edge servers around the world). There are incidental benefits as well: for example, you can proxy your assets through the same host/CDN you are reverse-proxy-caching site content through, removing one more disparate piece from your infrastructure.

At a high-level, reverse proxy caching places an HTTP-speaking intermediary between your application server (“origin server”, in the vernacular of reverse proxy caching and CDNs) and the browser seeking to access your content. In the old days this HTTP-speaking intermediary was likely something like Varnish or Squid Cache running atop a server co-located in the very same data center as your application servers. Today it is more often a globally-distributed hybrid reverse proxy/CDN service like Fastly or Cloudflare.

In both cases the application-level implementation is nearly identical. Commonly with a globally distributed proxy like Fastly, you will set a long-lived TTL on content that you want to stick around in and be served up directly by edge servers in the network. Rails has a simple faculty for this with the expires_in method, which I’ll demonstrate by writing a general-purpose before_action suitable for content that you would like reverse-proxy-cached.

class ApplicationController < ActionController::Base
  private

  # Use this method as a before_action and corresponding
  # content will be served with a
  # "Cache-Control: max-age=3600, public" header
  # and stick around in your reverse proxy cache for
  # an hour or until purged. 
  def extended_ttl
    # Suppresses csrf-param - the framework's stock
    # csrf_meta_tags helper will simply render nothing.
    self.allow_forgery_protection = false
    expires_in(1.hour) unless Rails.env.development?
  end
end

Note that there do in fact exist JavaScript-based workarounds for the issues concerning CSRF. If you truly require that a CSRF token be injected into your reverse-proxy-cached content, it is achievable using a strategy that Fastly outlines in this blog post. For the purposes of this post, we will simply assume CSRF does not matter to you for the content you would like to cache via reverse proxy.

The final ingredient in a reverse proxy caching strategy is purging. Once you have deployed a new version of your application, you will likely want to purge all of the content from your caching proxies as soon as possible. With Varnish, Fastly and Cloudflare this can be done via a simple API call, and it is worth making an automated purge of your caching proxies a step of your deployment process, triggered as soon as your new code is live.
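
As one hedged example, a post-deploy “purge all” against Fastly might look roughly like this (the endpoint follows Fastly’s public purge API; FASTLY_SERVICE_ID and FASTLY_API_KEY are assumed environment variables, and you should defer to Fastly’s docs for the authoritative interface):

# e.g. lib/tasks/purge_fastly.rake or a post-deploy hook
require "net/http"
require "uri"

uri = URI("https://api.fastly.com/service/#{ENV.fetch("FASTLY_SERVICE_ID")}/purge_all")
request = Net::HTTP::Post.new(uri)
request["Fastly-Key"] = ENV.fetch("FASTLY_API_KEY")

Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
  response = http.request(request)
  puts "Fastly purge_all responded: #{response.code}"
end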

This is a good point to note why certain traditional CDNs like Amazon’s CloudFront, while perfectly suitable for caching fingerprinted assets, are actually a very poor choice for caching application content. Purging on CloudFront is eventual: it may, and often does, take as long as several minutes to purge content across the network. Varnish, Fastly and Cloudflare all offer practically instantaneous purging of cached content via simple programmatic interfaces (Fastly is in fact built atop a heavily modified version of Varnish).

What Should I Do?

There is no universally optimal caching strategy for Rails applications. Every strategy has tradeoffs and what is optimal for your own application is likely a mix of several.

If 20-50 millisecond server times are acceptable to you and network latency isn’t a significant concern, you may wish to go no further than Fragment Caching (plus perhaps a traditional CDN for static assets). If you can afford it and don’t mind the slight increase in complexity, I’ve found that combining Fragment Caching across the whole application with selective reverse proxy caching (through a service like Fastly) for your homepage, landing pages and the marketing pages that new users are likely to hit first can be a sweet spot: it maximizes user experience while keeping complexity relatively minimal.

The post Caching strategies for Rails 5 applications appeared first on Superset Blog.

]]>