High-performance logging from Nginx to Postgres with Rsyslog
While there are excellent purpose-built solutions for general log storage and search, such as Librato and the ELK Stack, there are sometimes reasons to write log data directly to Postgres. A good example (and one we have recent experience of at Superset) is storing access log data to present to users for analytics purposes. If the number of records is not explosive, you can benefit greatly from keeping such data closer to your primary application schema in PostgreSQL (even scaling it out to a separate physical node if need be by using something like Foreign Data Wrappers or the multi-database faculties of your application framework).
For our own application, we have a few geographically-distributed Nginx hosts functioning as “dumb” edge servers in a CDN. These Nginx hosts proxy back to object storage to serve up large media files to users. For each file access we want to record analytics information such as IP, user agent, and a client identifier (via the Nginx userd module, to differentiate unique users to the extent that it is possible).
For the purposes of this article we will go through the manual steps of configuration (on Ubuntu Server) though you should absolutely be using a configuration management tool such as Ansible to automate these steps in a production, as we have.
The dependencies for this logging configuration are nginx-extras
(which includes the userid module for analytics) and rsyslog-pgsql
(the package for the Rsyslog Postgres output module, which is not part of the default Rsyslog install). You can install these with apt (either manually or via Ansible’s apt module):
1 |
sudo apt-get install nginx-extras rsyslog-pgsql |
Ubuntu should have a top-level Rsyslog configuration file at /etc/rsyslog.conf
which should end with the line:
1 2 |
# ... $IncludeConfig /etc/rsyslog.d/*.conf |
This instructs the Rsyslog daemon to pull in any configuration files contained in the directory /etc/rsyslog.d
when it loads. We will use this to set up a special configuration to pull-in and forward our formatted nginx logs momentarily. First, let’s configure nginx to log a json payload to a unique log file.
Ubuntu’s standard nginx configuration pulls in per-site config files from the /etc/nginx/sites-available
(and assumes you have sym-linked the configurations for sites you want to go live to /etc/nginx/sites-enabled/). For this example, we’ll assume a configuration for mysite.com in /etc/nginx/sites-available/mysite.com.conf
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
# in /etc/nginx/sites-available/yoursite.com: log_format json_combined '{"time_local": "$time_local", ' '"path": "$request_uri", ' '"ip": "$remote_addr", ' '"time": "$time_iso8601", ' '"user_agent": "$http_user_agent", ' '"user_id_got": "$uid_got", ' '"user_id_set": "$uid_set", ' '"remote_user": "$remote_user", ' '"request": "$request", ' '"status": "$status", ' '"body_bytes_sent": "$body_bytes_sent", ' '"request_time": "$request_time", ' '"http_referrer": "$http_referer" }'; server { listen 80; # + SSL configuration... # Optional: Nginx userid module, useful for analytics. # (see http://nginx.org/en/docs/http/ngx_http_userid_module.html) userid on; userid_name uid; userid_expires 365d; server_name yoursite.com; # any additional server-level configuration such as site root, etc... location / { access_log /var/log/nginx/yoursite.com/access.log json_combined; } # You will probably want to add some gzip, cache, etc standard header rules for performance... } |
And load the Nginx configuration with:
1 |
sudo service nginx reload |
We now have Nginx writing a one-off access log with a json-formatted payload to /var/log/nginx/yoursite.com/access.log
. We can configure Rsyslog to read this log file using the imfile module. In the past imfile defaulted to a polling mode but today defaults to filesystem events and is very performant and effectively real-time. With the imfile module reading the log file on Rsyslog’s behalf, we can then forward the log file data to Postgres using the Rsyslog ompgsql (PostgreSQL Database Output) module. The combined configuration is as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
# Load the imfile input module module(load="imfile") # Load the imfile input module input(type="imfile" File="/var/log/nginx/yoursite.com/access.log" Tag="yoursite:") # Load the ompgsql output module module(load="ompgsql") # Define a template for row insertion of your data. # The template below assumes you have a table called # "access_log" and are inserting columns named # "log_line" (with the log payload) and "created_at" (with the timestamp). template(name="sql-syslog" type="list" option.sql="on") { constant(value="INSERT INTO access_log (log_line, created_at) values ('") property(name="msg") constant(value="','") property(name="timereported" dateformat="pgsql" date.inUTC="on") constant(value="')") } # The output "action". This line instructs rsyslog # to check if the log line is tagged "yoursite:" (a tag # which we set with the imfile module configuration above) # and if so to use the sql-syslog template we defined # above to insert it into Postgres. if( $syslogtag == 'yoursite:') then { action(type="ompgsql" server="{{ postgres_host }}" user="{{ postgres_user }}" pass="{{ postgres_password }}" db="{{ postgres_db_name }}" template="sql-syslog" queue.type="linkedList") } |
You will want to name this file something like /etc/rsyslog.d/51-yoursite.conf
, since Rsyslog loads config files in alphabetical order and on Ubuntu has a default configuration file in the same directory called 50-default.conf
. It probably goes without saying but the ompgsql “action” line in the configuration above is using mock templatized credentials (I can recommend Ansible Vault for managing/templating credentials such as these in production). I should also note that as Rsyslog is a very long-lived project, it supports several different configuration file formats. The example above is using the “advanced” (a.k.a. “RainerScript”) format because I find this to be by far the most readable. Once you have saved the above log file, you will need to restart the Rsyslog daemon for it to take effect:
1 |
sudo service rsyslog restart |
The above configuration should be pretty performant as the “linkedList” queue.type argument supplied to the ompgsql action is instructing Rsyslog to buffer/batch its writes to Postgres. You can read about the performance tweaking that is available for ompgsql in an excellent article, “Handling a massive syslog database insert rate with Rsyslog“, which was written by Rainer Gerhards himself (primary author of Rsyslog).
1 Comment
Ahmer Mansoor
August 12, 2018Quiet informative Post. I am using a relatively simple approach http://ahmermansoor.blogspot.com/2018/08/configure-central-logging-server-in-linux.html. But I will now modify the configurations according to your Post.