Wednesday, 10 January 2018

TUSD server on Production

TUS is an HTTP-based protocol for resumable file uploads. TUSD is the official implementation of the TUS protocol.
For one of our projects, we decided to use TUSD to upload a large number of files from many locations over unreliable internet connections.
To run TUSD in production, the first thing that came to mind was how to make it secure. TUSD does not accept HTTPS connections, nor does it have any built-in layer for verification/authentication checks. Authentication can be done using TUSD's hooks system.
So we decided to move the security layer out of TUSD and let it focus on its primary task: resumable file uploads.
Accordingly, we introduced HAProxy in front of the TUSD server.

  • The TUSD server will run as a normal user on its default port 1080, and HAProxy will listen on the default HTTP/HTTPS ports and proxy the requests to the TUSD server.
  • The SSL certificate will be deployed on HAProxy, which will do the SSL offloading. TUSD will receive plain HTTP traffic from HAProxy.
  • We will enable HTTP basic authentication in HAProxy, so HAProxy authenticates incoming connections before forwarding them to the TUSD server.
  • HAProxy will proxy only the traffic for a specific URL (host name) to the backend TUSD server, not all traffic. So a connection attempt to the default HTTP/HTTPS port on the public IP will not be forwarded to TUSD.

For this setup I used Ubuntu 16.04, TUSD version 0.9.0 and HAProxy version 1.7.
Because TUSD runs behind HAProxy, we have to add the -behind-proxy flag when starting TUSD. This tells TUSD that it is running behind a proxy and should pay attention to the special headers (such as X-Forwarded-Proto) sent by the proxy.

Let’s look at the configs for this setup.

TUSD service:

[Unit]
Description=TUSD File Upload Server

[Service]
ExecStart=/bin/bash -ce "exec /app/tusd/tusd -dir /data/tusupload -hooks-dir /app/tusd/hooks -behind-proxy >> /logs/tusd/tusd.log 2>&1"
# file size
# cpu time
# virtual memory size
# open files
# processes/threads
# total threads (user+kernel)


HAProxy config file:

userlist UL1
                user httpuser insecure-password abcdefghijklmnop

global
                log /dev/log       local0
                log /dev/log       local1 notice
                chroot /var/lib/haproxy
                stats socket /run/haproxy/admin.sock mode 660 level admin
                stats timeout 30s
                user haproxy
                group haproxy

                # Default SSL material locations
                ca-base /etc/ssl/certs
                crt-base /etc/ssl/private

                # Default ciphers to use on SSL-enabled listening sockets.
                # For more information, see ciphers(1SSL). This list is from:
                # An alternative list with additional directives can be obtained from
                ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
                ssl-default-bind-options no-sslv3

defaults
                log          global
                mode    http
                option   httplog
                option   dontlognull
                timeout connect 5000
                timeout client  50000
                timeout server  50000
                errorfile 400 /etc/haproxy/errors/400.http
                errorfile 403 /etc/haproxy/errors/403.http
                errorfile 408 /etc/haproxy/errors/408.http
                errorfile 500 /etc/haproxy/errors/500.http
                errorfile 502 /etc/haproxy/errors/502.http
                errorfile 503 /etc/haproxy/errors/503.http
                errorfile 504 /etc/haproxy/errors/504.http

frontend localhost
    bind *:80
    bind *:443 ssl crt /etc/ssl/my_certs/my_cert.pem
    redirect scheme https if !{ ssl_fc }
    mode http

    acl tusdsvr hdr(host) -i   
    use_backend tus-backend if tusdsvr

#########################Backend Settings##########################
backend tus-backend
    # HTTP basic authentication check
    acl AuthOkay_UsersAuth http_auth(UL1)
    http-request auth realm UserAuth if !AuthOkay_UsersAuth

    # Setting X-Forwarded-Proto header to https to let TUSD know that it is behind an HTTPS proxy and should return https URLs.
    http-request set-header X-Forwarded-Proto "https"
    http-request add-header X-Forwarded-For %[src]

    mode     http

    server   server1 localhost:1080  check fall 3 rise 2
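To see what a client sends through this setup, here is a minimal Go sketch of the two basic TUS 1.0.0 requests: a creation POST carrying Upload-Length, and a PATCH carrying Upload-Offset with the application/offset+octet-stream content type. Both carry the HTTP basic-auth credentials that HAProxy checks against userlist UL1. The host upload.example.com, the password and the /files/ path (assumed to be TUSD's unchanged default base path) are placeholders, and the function names are hypothetical. The sketch only builds the requests; sending them (e.g. with http.DefaultClient.Do) and reading the Location header of the creation response is left out.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"strconv"
)

// tusVersion is the TUS protocol version sent in every request.
const tusVersion = "1.0.0"

// newCreationRequest builds a TUS creation request (POST) for an upload of
// size bytes. HAProxy checks the basic-auth credentials before forwarding it;
// on success TUSD answers 201 Created with a Location header for the upload.
func newCreationRequest(endpoint, user, pass string, size int64) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, endpoint, nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, pass)
	req.Header.Set("Tus-Resumable", tusVersion)
	req.Header.Set("Upload-Length", strconv.FormatInt(size, 10))
	return req, nil
}

// newPatchRequest builds a PATCH that sends chunk starting at offset.
// After a dropped connection, a client sends HEAD to the upload URL, reads
// the Upload-Offset response header and resumes from that offset.
func newPatchRequest(uploadURL, user, pass string, offset int64, chunk []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPatch, uploadURL, bytes.NewReader(chunk))
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, pass)
	req.Header.Set("Tus-Resumable", tusVersion)
	req.Header.Set("Upload-Offset", strconv.FormatInt(offset, 10))
	req.Header.Set("Content-Type", "application/offset+octet-stream")
	return req, nil
}

func main() {
	// Placeholder host and password (hypothetical values).
	req, err := newCreationRequest("https://upload.example.com/files/", "httpuser", "secret", 1024)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL, "Upload-Length:", req.Header.Get("Upload-Length"))
}
```

Because HAProxy terminates TLS, the client always talks HTTPS to HAProxy; the X-Forwarded-Proto header set in the backend is what lets TUSD, started with -behind-proxy, hand back https:// upload URLs.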

Saturday, 6 January 2018

Strange ext4 error: cannot create regular file … No space left on device !!!

Today I was copying a folder containing a large number of small files (around 125,000) into a backup partition (/dev/mapper/vg_ema_data-lv_ema_data) on my Ubuntu 16.04 VM with an ext4 file system. The folder was 250MB in size, and the backup partition had around 45GB of free space. One of its sub-directories had around 45,000 files with long file names like 3_8_25438833_11_3_4081_2017_12_08_13_07_55_2017_12_09_11_15_01_2018_01_02_13_38_42_2018_01_06_11_29_12_2018_01_06_11_37_07_2018_01_06_11_48_09_2018_01_06_13_16_01_2018_01_06_13_21_13_2018_01_06_13_52_55_2018_01_06_14_47_22.json

When I started copying, after some time the copy operation started failing with "cannot create regular file" and "No space left on device" errors on my backup partition (while copying the files with long file names). As the partition had 45GB free, running out of space was out of the question, so I thought I might have used up all the inodes in the backup partition. I checked the inodes of that partition and found that a large number of inodes were free.
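For reference, the inode check can also be done programmatically. This is a minimal Go sketch, assuming Linux (it uses syscall.Statfs); the function name inodeUsage is hypothetical. It reports the same total and free inode counts that df -i prints for the file system containing a given path.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// inodeUsage returns the total and free inode counts of the file system that
// contains path (the numbers `df -i` shows). Linux-specific: uses statfs(2).
func inodeUsage(path string) (total, free uint64, err error) {
	var st syscall.Statfs_t
	if err = syscall.Statfs(path, &st); err != nil {
		return 0, 0, err
	}
	return st.Files, st.Ffree, nil
}

func main() {
	path := "/"
	if len(os.Args) > 1 {
		path = os.Args[1] // e.g. the mount point of the backup partition
	}
	total, free, err := inodeUsage(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%s: %d inodes total, %d free\n", path, total, free)
}
```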

After a lot of searching, I found an excellent blog post about the dir_index feature of ext4. Please read that post for the details of the issue; I am not going to repeat them here.

So I decided to check, and then disable, dir_index on my backup partition (/dev/mapper/vg_ema_data-lv_ema_data).
To check whether dir_index is enabled, I used the tune2fs command as described in that blog post:

# tune2fs -l /dev/mapper/vg_ema_data-lv_ema_data | grep -o dir_index

If the above command outputs dir_index, dir_index is enabled for that partition; if it outputs nothing, dir_index is not enabled.
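If you want to script this check, a small Go program can read the tune2fs -l output on stdin and look for dir_index on the "Filesystem features:" line only, comparing whole words. This is a sketch assuming the usual one-line "Filesystem features:" format of tune2fs -l; hasFeature is a hypothetical helper name.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

const featuresPrefix = "Filesystem features:"

// hasFeature reports whether feature is listed on the "Filesystem features:"
// line of `tune2fs -l` output. Only that line is examined and only whole
// words are compared.
func hasFeature(tune2fsOutput, feature string) bool {
	sc := bufio.NewScanner(strings.NewReader(tune2fsOutput))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, featuresPrefix) {
			continue
		}
		for _, f := range strings.Fields(strings.TrimPrefix(line, featuresPrefix)) {
			if f == feature {
				return true
			}
		}
		return false // features line found, feature not on it
	}
	return false
}

func main() {
	// Usage: tune2fs -l /dev/mapper/vg_ema_data-lv_ema_data | go run checkfeature.go
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if hasFeature(string(data), "dir_index") {
		fmt.Println("dir_index is enabled")
	} else {
		fmt.Println("dir_index is not enabled")
	}
}
```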
Once I confirmed that dir_index was enabled on my /dev/mapper/vg_ema_data-lv_ema_data partition, I decided to disable it.

To disable dir_index, I used the following command:

# tune2fs -O "^dir_index" /dev/mapper/vg_ema_data-lv_ema_data

After disabling dir_index, I tried to copy the directory again and, wow, this time the copy operation completed successfully.

Wednesday, 20 September 2017

Ubuntu upstart service for my golang web application

I have a web application written in Go that I need to deploy as a service on an Ubuntu server; say the name of the app is hello.
Copy the app to some directory on the server (e.g. /app, so the application binary is /app/hello).
Create an upstart script (e.g. hello.conf) and place it in /etc/init.
In the script, we run the binary using the following line:

exec start-stop-daemon --start \
--pidfile /var/run/ \
--make-pidfile \

To send the stdout and stderr of the application to a log file (e.g. /logs/app/hello/hello.log), we can edit the start-stop-daemon command line:

exec start-stop-daemon --start \
--pidfile /var/run/ \
--make-pidfile \
--startas /bin/bash -- -c "exec $DAEMON $DAEMON_OPTS >> /logs/app/hello/hello.log 2>&1"

Now say we also want to collect garbage collection summaries and Go scheduler traces in the log file. For that we set Go’s runtime environment variable GODEBUG.
GODEBUG=gctrace=1 enables the garbage collector (GC) trace. The garbage collector emits a single line to STDERR at each collection, summarizing the amount of memory collected and the length of the garbage collection pause.
To investigate the operation of the runtime scheduler directly, and to get insights into the dynamic behaviour of the goroutine scheduler, we can enable the scheduler trace. To enable the scheduler trace we can set:

GODEBUG=schedtrace=1000

The value 1000 is in milliseconds, so this setting makes the scheduler emit a single line to standard error every second.
We can combine the garbage collection and scheduler traces as GODEBUG=gctrace=1,schedtrace=30000 (here the scheduler line is emitted every 30 seconds).

So again editing the start-stop-daemon command line:

exec start-stop-daemon --start \
--pidfile /var/run/ \
--make-pidfile \
--startas /bin/bash -- -c "exec /usr/bin/env GODEBUG=gctrace=1,schedtrace=30000 $DAEMON $DAEMON_OPTS >> /logs/app/hello/hello.log 2>&1"

A garbage collection log line looks like:
gc 56 @27.196s 0%: 0.010+3.7+0.014 ms clock, 0.010+0.80/2.6/0+0.014 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
gc 57 @27.260s 0%: 0.007+2.1+0.010 ms clock, 0.007+0.35/1.0/0+0.010 ms cpu, 4->4->0 MB, 5 MB goal, 1 P

56: the GC number, incremented at each GC
@27.196s: time in seconds since program start
0%: percentage of time spent in GC since program start
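0.010+3.7+0.014 ms clock: wall-clock times for the phases of the GC: stop-the-world (STW) sweep termination, concurrent mark and scan, and STW mark termination
0.010+0.80/2.6/0+0.014 ms cpu: CPU times for the same phases; the mark and scan time is further split into assist time, background GC time and idle GC time (0.80/2.6/0)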
4->4->0 MB: heap size at GC start (4MB), at GC end (4MB), and live heap (0MB)
5 MB goal: goal heap size
1 P: number of processors used, here 1 processor used
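To pull these numbers out of a log file, a regular expression per line is enough. This is a minimal Go sketch (parseGCTrace and the GCTrace field names are hypothetical) written against the line format shown above. The exact gctrace format has changed between Go releases (newer versions print extra fields such as stacks and globals before the P count, and the pattern tolerates those), so treat the pattern as an assumption to check against your Go version.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// gcLine matches lines such as
// gc 56 @27.196s 0%: 0.010+3.7+0.014 ms clock, 0.010+0.80/2.6/0+0.014 ms cpu, 4->4->0 MB, 5 MB goal, 1 P
// The optional (?:.*,)? group skips extra fields printed before "N P".
var gcLine = regexp.MustCompile(`^gc (\d+) @([\d.]+)s (\d+)%: ([\d.+]+) ms clock, [\d.+/]+ ms cpu, (\d+)->(\d+)->(\d+) MB, (\d+) MB goal,(?:.*,)? (\d+) P`)

// GCTrace holds the fields explained above for one gctrace line.
type GCTrace struct {
	Num         int       // GC number
	SinceStartS float64   // seconds since program start
	PercentGC   int       // % of time spent in GC since program start
	ClockMs     []float64 // wall-clock time of each GC phase, in ms
	HeapStartMB int       // heap size at GC start
	HeapEndMB   int       // heap size at GC end
	LiveHeapMB  int       // live heap
	GoalMB      int       // goal heap size
	Procs       int       // number of processors used
}

func parseGCTrace(line string) (GCTrace, error) {
	m := gcLine.FindStringSubmatch(line)
	if m == nil {
		return GCTrace{}, fmt.Errorf("not a gctrace line: %q", line)
	}
	// Integer submatches in order: Num, PercentGC, HeapStart, HeapEnd, LiveHeap, Goal, Procs.
	var ints [7]int
	for i, idx := range []int{1, 3, 5, 6, 7, 8, 9} {
		n, err := strconv.Atoi(m[idx])
		if err != nil {
			return GCTrace{}, err
		}
		ints[i] = n
	}
	since, err := strconv.ParseFloat(m[2], 64)
	if err != nil {
		return GCTrace{}, err
	}
	var clock []float64
	for _, part := range strings.Split(m[4], "+") {
		v, err := strconv.ParseFloat(part, 64)
		if err != nil {
			return GCTrace{}, err
		}
		clock = append(clock, v)
	}
	return GCTrace{
		Num: ints[0], SinceStartS: since, PercentGC: ints[1], ClockMs: clock,
		HeapStartMB: ints[2], HeapEndMB: ints[3], LiveHeapMB: ints[4],
		GoalMB: ints[5], Procs: ints[6],
	}, nil
}

func main() {
	t, err := parseGCTrace("gc 57 @27.260s 0%: 0.007+2.1+0.010 ms clock, 0.007+0.35/1.0/0+0.010 ms cpu, 4->4->0 MB, 5 MB goal, 1 P")
	if err != nil {
		panic(err)
	}
	fmt.Printf("GC #%d at %.3fs: heap %d->%d MB, goal %d MB\n", t.Num, t.SinceStartS, t.HeapStartMB, t.HeapEndMB, t.GoalMB)
}
```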

A scheduler trace line looks like:
SCHED 25137ms: gomaxprocs=1 idleprocs=0 threads=4 spinningthreads=0 idlethreads=1 runqueue=2 [98]

25137ms : Time since program start
gomaxprocs=1: Gomaxprocs is the current value of GOMAXPROCS. The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously. Starting with Go 1.5, GOMAXPROCS is set to the number of available CPUs by default.
idleprocs=0: Number of processors that are not busy. Here 0 processors are idle.
threads=4: Number of threads that the runtime is managing.
spinningthreads=0: Number of spinning threads, i.e. threads that have run out of work and are actively looking for runnable goroutines.
idlethreads=1: Number of threads that are not busy. Here 1 thread is idle (and the other 3 are in use).
runqueue=2: Length of the global run queue of runnable goroutines.
[98]: Number of goroutines in the local run queue of each processor. On a machine with multiple processors we see one value per processor, e.g. [2 2 2 3].
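Similarly, a SCHED line can be split into its key=value counters and the bracketed per-processor run queue lengths. This is a minimal Go sketch assuming the non-detailed schedtrace format shown above (parseSchedTrace and SchedTrace are hypothetical names). It keeps the counters in a map, so additional integer fields printed by other Go versions are still accepted.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// SchedTrace holds one "SCHED" line emitted with GODEBUG=schedtrace=N.
type SchedTrace struct {
	ElapsedMs int            // time since program start
	Counters  map[string]int // gomaxprocs, idleprocs, threads, spinningthreads, idlethreads, runqueue, ...
	LocalRunq []int          // local run queue length, one entry per processor (P)
}

func parseSchedTrace(line string) (SchedTrace, error) {
	lb, rb := strings.Index(line, "["), strings.LastIndex(line, "]")
	if lb < 0 || rb < lb {
		return SchedTrace{}, fmt.Errorf("no local run queue list in %q", line)
	}
	fields := strings.Fields(line[:lb])
	if len(fields) < 2 || fields[0] != "SCHED" || !strings.HasSuffix(fields[1], "ms:") {
		return SchedTrace{}, fmt.Errorf("not a schedtrace line: %q", line)
	}
	elapsed, err := strconv.Atoi(strings.TrimSuffix(fields[1], "ms:"))
	if err != nil {
		return SchedTrace{}, err
	}
	st := SchedTrace{ElapsedMs: elapsed, Counters: make(map[string]int)}
	for _, f := range fields[2:] {
		eq := strings.IndexByte(f, '=')
		if eq < 0 {
			return SchedTrace{}, fmt.Errorf("unexpected field %q", f)
		}
		n, err := strconv.Atoi(f[eq+1:])
		if err != nil {
			return SchedTrace{}, fmt.Errorf("field %q: %w", f, err)
		}
		st.Counters[f[:eq]] = n
	}
	for _, q := range strings.Fields(line[lb+1 : rb]) {
		n, err := strconv.Atoi(q)
		if err != nil {
			return SchedTrace{}, err
		}
		st.LocalRunq = append(st.LocalRunq, n)
	}
	return st, nil
}

func main() {
	st, err := parseSchedTrace("SCHED 25137ms: gomaxprocs=1 idleprocs=0 threads=4 spinningthreads=0 idlethreads=1 runqueue=2 [98]")
	if err != nil {
		panic(err)
	}
	fmt.Println(st.ElapsedMs, st.Counters["runqueue"], st.LocalRunq)
}
```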
The init script is available here

Saturday, 5 August 2017

Script to create a test mongodb sharded cluster

Many times, for testing, we need to start a MongoDB sharded cluster. It is very tedious to manually start all the config servers, shards and mongos instances, so I wrote a shell script which creates a test sharded MongoDB cluster on a single machine. The script is available on my GitHub account.

Monday, 24 July 2017

Kubernetes: Flannel Docker IP issue

I have a test Kubernetes cluster with 1 master and 2 nodes, running on CoreOS, with the Kubernetes DNS service and some test pods running in it. I faced a strange problem: in some containers DNS names were resolved and in others they were not. Checking the containers where DNS was not resolving, I found that they were unable to connect to the DNS pod. While checking the IPs of the pods with the kubectl command, I even found two pods with the same IP running on different nodes, which should never happen in a Kubernetes cluster. So there was definitely a network misconfiguration in the cluster. The pods were not getting IPs from the flannel IP range but from the local docker IP range, so clearly docker was not picking up its IP range from the flannel service.
Investigating the issue, I found that the etcd master was not starting in the proper sequence and was listening on the localhost interface instead of the Ethernet interface. Because of that, the etcd client services on each node were unable to connect to the master etcd. As a result, the flannel service also errored out and did not start on the nodes.
When the flannel service starts properly, it creates the file /run/flannel/flannel_docker_opts.env. This file contains the host system’s docker0 network interface IP (bridge IP).
When docker starts, it loads the environment variables from /run/flannel/flannel_docker_opts.env and configures itself accordingly.
When the flannel service does not start properly, the environment variables required to start docker are not added to /run/flannel/flannel_docker_opts.env. Because of that, the docker service was starting with the default bip (bridge IP).
To resolve the issue, I made changes on the master node so that the etcd master starts properly. In my cloud config file /var/lib/coreos-install/user_data, I added a unit that restarts the etcd service once my static network interface is configured:
- name: etcd2.service
  command: restart
Now after booting, etcd was properly listening on my static IP.
I also edited the docker service so that it starts after the flannel service:
systemctl edit docker.service

[Unit]
After=containerd.service docker.socket flanneld.service flannel-docker-opts.service
Requires=containerd.service docker.socket flanneld.service

Next, on each node of the Kubernetes cluster, I changed the cloud config file /var/lib/coreos-install/user_data.
First, I added a script that checks whether a port is open and, if it is not, checks again after 1 second. I use this script to check whether the etcd master service is up.
- path: /opt/bin/checkport
  permissions: '0755'
  content: |
    #!/bin/bash
    # This script waits till the port is accessible
    [ -n "$1" ] && [ -n "$2" ] && while ! curl -s http://${1}:${2} > /dev/null; \
    do sleep 1 && echo -n .; done;
    exit $?
- name: etcd2.service
command: restart
- name: 30-wait-for-server.conf
content: |
# wait for kubernetes master to be up and ready
ExecStartPre=/opt/bin/checkport 2380
The above part checks whether our etcd master is up and listening on port 2380. If the master etcd service is not up, it waits for the master service.
I also edited the docker service on each node so that it starts after the flannel service:
systemctl edit docker.service

[Unit]
After=containerd.service docker.socket flanneld.service flannel-docker-opts.service
Requires=containerd.service docker.socket flanneld.service

After restarting everything, the docker service started picking up the bridge IP from the flannel service, and the pods got the correct IPs from the flannel IP range.
To verify the network setup:
1. Use ifconfig and check if flannel and docker interface IPs are in sync.
2. Check the IP subnet ranges in etcd and whether each flannel node is using the correct subnet.
3. Check the IPs of the pods.
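Check 1 boils down to testing whether the docker0 bridge IP lies inside the subnet flannel leased to the node. A minimal Go sketch with hypothetical values: 10.2.34.0/24 stands in for the node's flannel subnet (flannel usually records it as FLANNEL_SUBNET in /run/flannel/subnet.env), and 172.17.0.1 is docker's default bridge IP, which is what a node shows when docker did not get its options from flannel. bridgeInFlannelSubnet is a hypothetical name.

```go
package main

import (
	"fmt"
	"net"
)

// bridgeInFlannelSubnet reports whether docker0IP lies inside flannelSubnet.
// flannelSubnet is in CIDR notation; a host address such as 10.2.34.1/24
// is accepted too, and its network (10.2.34.0/24) is used.
func bridgeInFlannelSubnet(flannelSubnet, docker0IP string) (bool, error) {
	_, subnet, err := net.ParseCIDR(flannelSubnet)
	if err != nil {
		return false, err
	}
	ip := net.ParseIP(docker0IP)
	if ip == nil {
		return false, fmt.Errorf("invalid IP address %q", docker0IP)
	}
	return subnet.Contains(ip), nil
}

func main() {
	const flannelSubnet = "10.2.34.0/24" // hypothetical flannel lease for this node
	for _, ip := range []string{"10.2.34.1", "172.17.0.1"} {
		ok, err := bridgeInFlannelSubnet(flannelSubnet, ip)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("docker0 %s inside %s: %v\n", ip, flannelSubnet, ok)
	}
}
```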

Tuesday, 27 June 2017

MongoDB Recipes: Change mongod’s log level

To view the current log verbosity levels, use the db.getLogComponents() method.
In my mongod instance, the global verbosity is 0 (the default informational level), and the verbosity of the individual components is -1, which means each component inherits the log level of its parent.
We can configure log verbosity levels in three ways:

  1. Method 1: Using mongod’s startup settings
  2. Method 2: Using the logComponentVerbosity parameter
  3. Method 3: Using the db.setLogLevel() method.

Method 1:

We can configure the global verbosity level using the systemLog.verbosity setting.
We can also change the verbosity level of an individual log component using the systemLog.component.<name>.verbosity setting for that component.

For example, to set the verbosity level of the network component to 0, add the following to the mongod configuration file:
systemLog:
   component:
      network:
         verbosity: 0


Method 2:

To change log verbosity level using the logComponentVerbosity parameter, pass a document with the required verbosity settings.
For example, to set the default verbosity level to 2, the verbosity level of storage to 1, and the verbosity level of storage.journal to 0:
> use admin
> db.runCommand( { setParameter: 1, logComponentVerbosity:
    {
        verbosity: 2,
        storage: {
            verbosity: 1,
            journal: {
                verbosity: 0
            }
        }
    }
} )


Method 3:

We can use the db.setLogLevel() method to update a single component’s log level.
  • Change the default verbosity level to 0:
    db.setLogLevel(0)
  • Change the storage.journal log level to 4:
    db.setLogLevel(4, "storage.journal")

Wednesday, 7 June 2017

MongoDB Recipes: Change chunk size in a sharded cluster

The default chunk size in a sharded cluster is 64MB. We can change it to a smaller or larger size (the allowed range is between 1 and 1024 MB) using either of the following methods.

Method 1:

This method works both before and after the sharded cluster is initialized, and is the recommended method.
  • Log in to any mongos of the sharded cluster.
  • To change the global chunk size, update the value field of the chunksize document in the settings collection of the config database. Here we are changing the chunk size to 5MB:
    use config
    db.settings.save( { _id:"chunksize", value: 5 } )

Method 2:

This method works only when you initialize the cluster for the first time. After the cluster is initialized, this method will not change the chunk size of the cluster.
Start the mongos with sharding.chunkSize (in the config file) or --chunkSize (command line option), set to the desired chunk size.