High-performance logging from Nginx to Postgres with Rsyslog

While there are excellent purpose-built solutions for general log storage and search, such as Librato and the ELK Stack, there are sometimes reasons to write log data directly to Postgres. A good example (and one we have recent experience with at Superset) is storing access log data to present to users for analytics purposes. If the number of records is not explosive, you can benefit greatly from keeping such data close to your primary application schema in PostgreSQL. If need be, you can still scale it out to a separate physical node using something like Foreign Data Wrappers or the multi-database facilities of your application framework.

For our own application, we have a few geographically distributed Nginx hosts functioning as “dumb” edge servers in a CDN. These Nginx hosts proxy back to object storage to serve large media files to users. For each file access we want to record analytics information such as IP, user agent, and a client identifier (via the Nginx userid module, to differentiate unique users to the extent that this is possible).

For the purposes of this article we will go through the manual configuration steps (on Ubuntu Server), though in production you should absolutely use a configuration management tool such as Ansible to automate them, as we have.

The dependencies for this logging configuration are nginx-extras (which includes the userid module for analytics) and rsyslog-pgsql (the package for the Rsyslog Postgres output module, which is not part of the default Rsyslog install). You can install these with apt (either manually or via Ansible’s apt module):
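A minimal sketch of the manual install, assuming a stock Ubuntu Server host with sudo access. The package names are the two given above. Depending on the release, the rsyslog-pgsql package may ask configuration questions during installation.

```shell
# Refresh the package index so apt sees current package versions.
sudo apt-get update

# nginx-extras: Nginx build used in this article (provides the userid module).
# rsyslog-pgsql: the Rsyslog PostgreSQL output module (ompgsql).
sudo apt-get install nginx-extras rsyslog-pgsql
```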

Ubuntu should have a top-level Rsyslog configuration file at /etc/rsyslog.conf which should end with the line:
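On the Ubuntu releases I am aware of, that closing line is the legacy-format directive `$IncludeConfig /etc/rsyslog.d/*.conf`. A quick way to confirm it on your own host is sketched below; `show_rsyslog_includes` is a hypothetical helper name introduced only for this example.

```shell
# Hypothetical helper: print the legacy-format include directive(s) found in
# an rsyslog config file (defaults to Ubuntu's top-level config).
show_rsyslog_includes() {
  grep -F '$IncludeConfig' "${1:-/etc/rsyslog.conf}"
}

# On a stock Ubuntu install this is expected to print:
#   $IncludeConfig /etc/rsyslog.d/*.conf
show_rsyslog_includes || echo "no \$IncludeConfig directive found"
```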

This instructs the Rsyslog daemon to pull in any configuration files in the directory /etc/rsyslog.d when it loads. We will use this shortly to set up a dedicated configuration that reads and forwards our formatted Nginx logs. First, let’s configure Nginx to log a JSON payload to its own log file.

Ubuntu’s standard Nginx configuration keeps per-site config files in /etc/nginx/sites-available and loads only those you have symlinked into /etc/nginx/sites-enabled/. For this example, we’ll assume a configuration for yoursite.com in /etc/nginx/sites-available/yoursite.com.conf:
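What follows is a sketch of such a site configuration, not a definitive one. It uses the placeholder domain yoursite.com to match the log path used in the rest of the article. The format name `json_analytics`, the chosen log fields, the `./staging` directory and the `objects.example.com` origin are all assumptions made for illustration. The file is staged locally so it can be reviewed before being copied into /etc/nginx/sites-available/.

```shell
# Stage the site config in a local (hypothetical) ./staging directory.
mkdir -p staging
cat > staging/yoursite.com.conf <<'EOF'
# JSON access-log format. escape=json (nginx 1.11.8+) escapes quotes and
# control characters in variable values so each line stays valid JSON.
log_format json_analytics escape=json
  '{'
    '"time":"$time_iso8601",'
    '"remote_addr":"$remote_addr",'
    '"request_uri":"$request_uri",'
    '"status":"$status",'
    '"bytes_sent":"$body_bytes_sent",'
    '"user_agent":"$http_user_agent",'
    '"uid_got":"$uid_got",'
    '"uid_set":"$uid_set"'
  '}';

server {
  listen 80;
  server_name yoursite.com;

  # Issue a client-identifier cookie via the userid module.
  userid on;
  userid_name uid;
  userid_expires 365d;

  # Nginx does not create this directory; it must exist before reloading.
  access_log /var/log/nginx/yoursite.com/access.log json_analytics;

  location / {
    # Hypothetical object-storage origin.
    proxy_pass https://objects.example.com;
  }
}
EOF
```

The heredoc delimiter is quoted (`'EOF'`) so the shell leaves the Nginx `$variables` untouched. Because Ubuntu includes site files inside the `http` block, the top-level `log_format` directive lands in a valid context.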

And load the Nginx configuration with:
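One way to do this, sketched under the assumption that the site file lives at /etc/nginx/sites-available/yoursite.com.conf and that Nginx is managed by systemd:

```shell
# Create the per-site log directory; Nginx will not create it on its own.
sudo mkdir -p /var/log/nginx/yoursite.com

# Enable the site by symlinking it into sites-enabled (-f replaces a stale link).
sudo ln -sf /etc/nginx/sites-available/yoursite.com.conf /etc/nginx/sites-enabled/

# Validate the full configuration, and reload only if validation passes.
sudo nginx -t && sudo systemctl reload nginx
```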

We now have Nginx writing a dedicated access log with a JSON-formatted payload to /var/log/nginx/yoursite.com/access.log. We can configure Rsyslog to read this log file using the imfile module. In the past imfile defaulted to a polling mode, but today it defaults to filesystem events (inotify), which makes it efficient and effectively real-time. With imfile reading the log file on Rsyslog’s behalf, we can then forward the log data to Postgres using the Rsyslog ompgsql (PostgreSQL Database Output) module. The combined configuration is as follows:
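Below is a sketch of a combined configuration in RainerScript, staged to a local file for review. Several things in it are assumptions for illustration: the `access_log` table and its columns, the template name `sql-access-log`, the `yoursite.com:` tag, the `./staging` directory, and the `{{ ... }}` credential placeholders, which stand in for values your templating tool would fill in.

```shell
# Stage the Rsyslog config in a local (hypothetical) ./staging directory.
mkdir -p staging
cat > staging/51-yoursite.conf <<'EOF'
# Assumes a hypothetical table such as:
#   CREATE TABLE access_log (log_line jsonb, created_at timestamp);

# File input and PostgreSQL output modules.
module(load="imfile")
module(load="ompgsql")

# ompgsql expects an SQL template; option.stdsql escapes single quotes in
# property values so the JSON payload cannot break the INSERT statement.
template(name="sql-access-log" type="list" option.stdsql="on") {
  constant(value="INSERT INTO access_log (log_line, created_at) VALUES ('")
  property(name="msg")
  constant(value="', '")
  property(name="timereported" dateFormat="pgsql")
  constant(value="')")
}

# Read the Nginx JSON log and tag each line so it can be matched below.
input(type="imfile"
      File="/var/log/nginx/yoursite.com/access.log"
      Tag="yoursite.com:")

# Forward only the lines that came from the file above.
if ($syslogtag == "yoursite.com:") then {
  action(type="ompgsql"
         server="{{ postgres_host }}"
         user="{{ postgres_user }}"
         pass="{{ postgres_password }}"
         db="{{ postgres_db }}"
         template="sql-access-log"
         queue.type="linkedList")
}
EOF
```

Each log line is inserted as a string literal, which Postgres casts to `jsonb` on insert, so the fields can later be queried with the usual JSON operators.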

You will want to name this file something like /etc/rsyslog.d/51-yoursite.conf, since Rsyslog loads config files in alphabetical order and Ubuntu ships a default configuration file in the same directory called 50-default.conf. It probably goes without saying, but the ompgsql “action” in the configuration above uses mock templated credentials (I can recommend Ansible Vault for managing and templating credentials such as these in production). I should also note that because Rsyslog is a very long-lived project, it supports several different configuration file formats. The example above uses the “advanced” (a.k.a. “RainerScript”) format because I find it by far the most readable. Once you have saved the above configuration file, you will need to restart the Rsyslog daemon for it to take effect:
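A sketch of that step, assuming Rsyslog is managed by systemd as on current Ubuntu releases. The `-N1` flag asks rsyslogd to check the configuration without starting the daemon, which catches syntax errors before the restart.

```shell
# Check the configuration, then restart only if the check passes.
sudo rsyslogd -N1 && sudo systemctl restart rsyslog
```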

The above configuration should perform well, because the “linkedList” queue.type argument supplied to the ompgsql action gives the action its own in-memory queue, so Rsyslog buffers messages and writes them to Postgres in batches rather than one at a time. You can read about the performance tuning available for ompgsql in an excellent article, “Handling a massive syslog database insert rate with Rsyslog”, written by Rainer Gerhards himself (the primary author of Rsyslog).

Nicholas

Hi! I'm Nicholas. I am a software developer and the founder of Superset Inc. I keep a personal homepage at nicholas.zaillian.com.

1 Comment

Ahmer Mansoor

about 6 years ago

Quite an informative post. I am using a relatively simple approach: http://ahmermansoor.blogspot.com/2018/08/configure-central-logging-server-in-linux.html. But I will now modify my configuration according to your post.
