How to monitor the disk usage on your infrastructure

Techniques for Managing Storage Efficiently and Avoiding Outages

The most important part of disk usage monitoring is checking the utilization of each filesystem and mount point, which can reveal existing or impending problems with storage space on your infrastructure.

What Does Disk Usage (DU) Mean?

Disk usage (DU) refers to the portion or percentage of computer storage that is currently in use. It contrasts with disk space or capacity, which is the total amount of data that a given disk can store. Disk usage is a crucial metric for any computing system, because it tells you whether there is enough room not only for stored data but also for software requirements and overall operation. Although the term usually refers to a computer’s hard disk, it may also refer to external storage, such as a USB drive or compact disc (CD).
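As a rough illustration of the definition above, the following Python sketch computes the percentage in use from the used and total byte counts. The function name percent_used is made up for this example, and the path "/" is only an assumed mount point; shutil.disk_usage is part of the Python standard library.

```python
import shutil


def percent_used(used_bytes: int, total_bytes: int) -> float:
    """Return disk usage as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes * 100 / total_bytes


if __name__ == "__main__":
    # "/" is an assumed mount point; use any path on the filesystem of interest.
    usage = shutil.disk_usage("/")
    print(f"capacity: {usage.total} bytes, "
          f"in use: {percent_used(usage.used, usage.total):.1f}%")
```

Tools such as df may report a slightly different percentage, because they can account for space reserved for the system.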

How Do You Check Disk Usage?

If you are using Linux (or another Unix-based OS), the most popular way to check disk usage is the “df -H” command. Despite its name (“disk free”), df is quite versatile: it reports the size, used space, available space, percentage used, and mount point of every mounted filesystem. The -H option prints sizes in human-readable units based on powers of 1000 (use -h for powers of 1024):

user@hostname:~$ df -H
Filesystem               Size  Used Avail Use% Mounted on
udev                     3.9G     0  3.9G   0% /dev
tmpfs                    796M  584K  796M   1% /run
/dev/mapper/cm--vg-root   63G   21G   39G  36% /
tmpfs                    3.9G  344K  3.9G   1% /dev/shm
tmpfs                    5.0M     0  5.0M   0% /run/lock
/dev/sda1                470M  122M  324M  28% /boot
tmpfs                    796M     0  796M   0% /run/user/1000
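If you want to act on this output in a script rather than read it by eye, one possible approach is to parse it and flag the filesystems above a threshold. The sketch below is only an illustration: the function names and the 80% threshold are assumptions made for this example, and it expects the six-column layout shown above (as also produced by the POSIX option df -P).

```python
import subprocess


def parse_df(output: str) -> list[dict]:
    """Parse `df` text output into one dict per filesystem."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header line
        # Split into at most 6 fields so a mount point containing spaces stays intact.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        percent = fields[4].rstrip("%")
        if not percent.isdigit():  # some pseudo-filesystems report "-"
            continue
        rows.append({
            "filesystem": fields[0],
            "use_percent": int(percent),
            "mount_point": fields[5],
        })
    return rows


def mounts_over(rows: list[dict], threshold: int) -> list[str]:
    """Return the mount points whose Use% is strictly above threshold."""
    return [row["mount_point"] for row in rows if row["use_percent"] > threshold]


def local_mounts_over(threshold: int = 80) -> list[str]:
    """Run `df -P` on this machine and return the mount points above threshold."""
    result = subprocess.run(["df", "-P"], capture_output=True, text=True)
    return mounts_over(parse_df(result.stdout), threshold)
```

Calling local_mounts_over(80) on a node returns the mount points that are more than 80% full. A script like this still has to be run on every node, which is the limitation discussed later in this article.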

Alternatively, you can use the du command, which reports the disk usage of individual directories and so gives you a finer-grained view than df. Use the -h option to get human-readable output:

cm@satya-vm:~$ du -h /etc/netdata/
68K	/etc/netdata/health.d
4.0K	/etc/netdata/charts.d
12K	/etc/netdata/python.d
4.0K	/etc/netdata/ebpf.d
4.0K	/etc/netdata/custom-plugins.d
4.0K	/etc/netdata/ssl
20K	/etc/netdata/go.d
4.0K	/etc/netdata/statsd.d
176K	/etc/netdata/

Add the -s option to display only the total disk usage of a directory or filesystem:

cm@satya-vm:~$ du -hs /etc/netdata/
176K	/etc/netdata/
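To find out which directories take up the most space, the same idea can be sketched in Python. This is an approximation rather than a reimplementation of du: it sums the apparent sizes of regular files, whereas du reports allocated disk blocks, so the numbers will usually differ somewhat. The function names are hypothetical.

```python
import os
import stat


def tree_size(path: str) -> int:
    """Sum the apparent sizes, in bytes, of all regular files under path."""
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # the file vanished or is not readable
            if stat.S_ISREG(info.st_mode):
                total += info.st_size
    return total


def largest_children(path: str, top: int = 5) -> list[tuple[str, int]]:
    """Return the `top` largest immediate children of path as (name, bytes)."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((entry.name, tree_size(entry.path)))
            elif entry.is_file(follow_symlinks=False):
                sizes.append((entry.name, entry.stat(follow_symlinks=False).st_size))
    # Largest first, then keep only the requested number of entries.
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

For example, largest_children("/etc/netdata") would list the biggest subdirectories and files of the directory inspected above.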

How Do You Monitor Disk Usage Across Your Infrastructure?

We now know how to check disk usage on an individual server, but checking it from time to time and monitoring it are two completely different things. The commands above are very useful, but they are not built for 24x7 monitoring or for going back in time to find the root cause of an issue. And if your infrastructure comprises hundreds of nodes, logging into each one to run du and df is an almost impossible task.

The ideal solution is a distributed monitoring tool like Netdata, which collects a variety of disk metrics, disk usage among them. The disk.space chart is located under the “Mount Points” section, and by default it shows the disk utilization of all mount points across all the nodes in your infrastructure.

Disk Usage 1 - Mount Points

It is a good idea to group the chart by node to get a view of the disk utilization per node.

Disk Usage 2 - Mount Points

For further troubleshooting, you can also drill down to specific nodes by selecting them in the global node filter in the top right corner.

Disk Usage 3 - Mount Points

With the multiple Group by options available on the disk usage chart, you can focus on the specific aspect of disk usage that matters for the issue you are investigating.

Disk Usage 4 - Mount Points

Disk Usage 5 - Mount Points

Checking the utilization of each filesystem and mount point is what reveals existing or impending storage problems, so Netdata ships with default alerts that are triggered when utilization crosses preset thresholds. You can modify the thresholds to suit your requirements.

Disk Usage 6 - Disk space usage

# low disk space
 
# checking the latest collected values
# raise an alarm if the disk is low on
# available disk space
 
    template: disk_space_usage
          on: disk.space
       class: Utilization
        type: System
   component: Disk
          os: linux freebsd
       hosts: *
chart labels: mount_point=!/dev !/dev/* !/run !/run/* *
        calc: $used * 100 / ($avail + $used)
       units: %
       every: 1m
        warn: $this > (($status >= $WARNING ) ? (80) : (90))
        crit: $this > (($status == $CRITICAL) ? (90) : (98))
       delay: up 1m down 15m multiplier 1.5 max 1h
        info: disk ${label:mount_point} space utilization
          to: sysadmin

The alert above monitors the disk.space chart, excludes filesystems mounted at /dev, /dev/*, /run and /run/*, and calculates the utilization as used space divided by used plus available space. The thresholds depend on the current status of the alert, which gives the alert hysteresis. A Warning is raised when the disk utilization exceeds 90%; once the alert is in the Warning state or higher, it stays raised until the utilization drops to 80% or below. Similarly, a Critical alert is raised when the disk utilization exceeds 98%; once the alert is Critical, it stays Critical until the utilization drops to 90% or below. The delay line applies further hysteresis to the notifications: a notification for a status increase is delayed by 1 minute and a notification for a status decrease by 15 minutes, and the delay is multiplied by 1.5 each time the status changes again while a notification is pending, up to a maximum of 1 hour.
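To make the status-dependent thresholds in the warn and crit lines concrete, here is a small Python simulation of the two expressions. It is a simplified sketch, not Netdata's implementation: it models only three states, ignores the every and delay settings, and assumes that a true crit expression takes precedence over warn. All names in it are made up for the example.

```python
from enum import IntEnum


class Status(IntEnum):
    CLEAR = 0
    WARNING = 1
    CRITICAL = 2


def utilization(used: float, avail: float) -> float:
    """Mirror the calc line: $used * 100 / ($avail + $used)."""
    return used * 100 / (avail + used)


def next_status(value: float, status: Status) -> Status:
    """Evaluate the warn and crit expressions for one collected value."""
    # warn: $this > (($status >= $WARNING) ? (80) : (90))
    warn_threshold = 80 if status >= Status.WARNING else 90
    # crit: $this > (($status == $CRITICAL) ? (90) : (98))
    crit_threshold = 90 if status == Status.CRITICAL else 98
    if value > crit_threshold:  # assumption: crit wins over warn
        return Status.CRITICAL
    if value > warn_threshold:
        return Status.WARNING
    return Status.CLEAR


def simulate(values: list[float]) -> list[Status]:
    """Feed a series of utilization values through the alert, starting from CLEAR."""
    status = Status.CLEAR
    history = []
    for value in values:
        status = next_status(value, status)
        history.append(status)
    return history
```

Feeding the values 85, 91, 85, 79 through simulate yields CLEAR, WARNING, WARNING, CLEAR: 85% does not raise the alert from a clear state, but it keeps an existing Warning active until the utilization falls to 80% or below.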

Note: Netdata currently does not monitor directories, but this functionality is coming “soon”.

Let us hear from you

If you haven’t already, sign up now for a free Netdata account!

We’d love to hear from you. If you have any questions, complaints, or feedback, please reach out to us on Discord or GitHub. Happy troubleshooting!
