Backups

Backups are a necessity in case of failure. Whether it is disk/drive failure or software failure, you want to have, well, backups in place to make sure you are able to get back on track. One of the most popular personal backup strategies is the 3-2-1 backup method. This snippet from Seagate says it well:

3 Copies of Data – Maintain three copies of data—the original, and at least two copies.

2 Different Media – Use two different media types for storage. This can help reduce any impact that may be attributable to one specific storage media type. It’s your decision as to which storage medium will contain the original data and which will contain any of the additional copies.

1 Copy Offsite – Keep one copy offsite to prevent the possibility of data loss due to a site-specific failure.

Using this format will help you get back up and running much faster than starting off from scratch. One thing to note with backups is that they work off of a proactive approach. This means you have to not only do this once, but maintain a healthy backup lifestyle in order to prevent data loss as much as you can. You can also have your own "3-2-1-#-#-#rule", where you add your own customization to the basic strategy. You do have to take the cost of the storage devices into mind, whether it is cloud storage providers or even for a hard drive purchase.

There are multiple ways of following the rule. Here are some of the other location suggestions:

  • Backups on disk (DAS, SAN, NAS and appliances)

  • Backups on tape

  • Backups on removable storage

  • Storage snapshots

  • External storage stored at a friends/relatives house

  • External storage stored in a bank vault

There are a lot of complications to consider for backups as well:

  • Weather

  • Potential Fire

  • Electric Static (while moving the device(s))

  • Home Invasion

  • Police Confescation

  • Drive/Disk Failure

  • etc.

Using the 3-2-1 is a way of being proactive and making your data be redundant in case of any of the aforementioned complications.

File System Formats

For data backups, and even for data storage in general, it is a good idea to take a look at the properties that a file system format gives you. Keeping these in mind can help you choose what format is better for your situation.

NTFSAPFS HFS+ exFATFAT32

OS Readable

Windows (Native), macOS (Read-Only), Linux

macOS (Native)

macOS (Native)

Windows, macOS, Linux

Windows, macOS, Linux

Journaling

X

X

Permissions Tracking

X

X

X

Max File Size

16TB Default (4KB Cluster Size), 2MB Cluster Size 8 PB

8 EB

slightly less than 8 EB

16 EB

4GB

Max Volume Size

16TB Default (4KB Cluster Size), 2MB Cluster Size 8 PB

slightly less than 8 EB

64 ZB

512 MB to 16 TB

Legend

X = yes

Encryption

From the CIA triad (Confidentiality, Integrity, and Availability), encryption falls under the Confidentiality principle. This means that only authorized users are allowed to access/modify the data. In order to apply this principle to our data, we need to encrypt our data, so we can only see it. Although there are brute force attempts possible to find out a password, I will not go deep into that here. Here are the tools I use for encryption:

  • VeraCrypt (for Full Disk Encryption)

  • Cryptomator (for when I have to store files in the cloud)

There are more software located at: https://www.privacyguides.org/encryption/ if you are interested. For encryption passwords, make sure you use a password manager and create long passwords or create long passphrases. Here is an xkcd comic that might help clarify whether to choose password vs passphrase (if you have a password manager, then this might not matter too much):

Backup Lifestyle

After your data is encrypted, and your file system of choice is chosen, the last step is to actually backup the data. The first time you are backing up data, it usually is a full backup. "The primary advantage to performing a full backup during every operation is that a complete copy of all data is available with a single set of media" . This takes the longest time due to the amount of data being transferred. After the full backup is completed, the nest two backup operations become available to you: incremental backups and differential backups. Incremental backups will copy only the data that has changed since the last backup operation of any type . Incremental backups keep on adding files that are new or modified from the last backup. Incremental backups do not take into account if you previously did a full backup or an incremental backup. Differential backups on the other hand, do take that into account. A differential is similar to incremental in the first time it is run. "However, each time it is run afterwards, it will continue to copy all data changed since the previous full backup. Thus, it will store more backed up data than an incremental on subsequent operations, although typically far less than a full backup. Moreover, differential backups require more space and time to complete than incremental backups, although less than full backups". With that being said, here are some tools I use for backups:

Tools:

  • Déjà Dup Backups (for backing up Linux home directory files)

  • FreeFileSync (for syncing files from one folder to another)

After the backups are complete, you have to setup a cadence at which you will keep backing up on. For some it's daily, for others, its monthly. It is different for everybody.

Large Volume Backups

There are times where the volume or storage size of your data is big enough where it will be really costly to put it in the cloud. An example of this if you were a data archivist and were downloading TB's of data. Putting that in the cloud can be costly. See the following information (grabbed from https://www.backblaze.com/cloud-storage/pricing#calculator so it might be biased, as they are a cloud provider themselves):

Backblaze B2AWS S3AzureGoogle Cloud

Storage (TB/month)

$6

$26

$20

$23

Egress ($/GB)

Free (up to 3x average monthly data stored, then $0.01/GB)

$0.09

$0.08

$0.11

2TB Storage / year

$144

$624

$499

$552

10TB Storage / year

$720

$3120

$2496

$2760

I put the 2 TB and 10 TB on the bottom to make people aware of how much the cost can increase exponentially. Data archival can get expensive really quickly. In addition, if you want to follow the 3-2-1 protocol, then that is even more storage to consider. As such, I think it is cost-efficient to then think locally for solutions for this. If you are comfortable paying for cloud with TBs of data (or have a NAS setup), then, by all means go ahead. The following is just to have backups in the Terabytes, while also not spending too much money. If you have money to allocate for this, then cloud and/or NAS solutions would work great. If not then the following includes buying hard drives (internal and external) and also other devices to make backups. Based on my experience here are the following things you need:

Stress Testing

After purchasing a new (or even used) hard drive, it is important to test the drive to make sure it functions properly under stress before fully utilizing the drive in operation. There are 2 main things I consider here: S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology a.k.a SMART) data and stress testing outputs. S.M.A.R.T. detects and reports drive information which can potentially report hardware failures to you. It is kind of like a health monitor for a drive in a way. On Linux a great tool for this is smartctl, and on Windows there is CrytalDiskMark.

As stated previously, S.M.A.R.T. is the first main part to consider here, and stress testing is the other. I use fio for stress testing. I run fio on Linux, but this can be used on Windows as well: https://github.com/axboe/fio/releases. The stress testing allows you to push your drive to see it's read and write capacity under load for you to understand if it is performing as expected or not.

The following is my procedure (current to April 2024):

Stress Testing Procedure

/dev/sdc is the location of my drive. Yours might be different. Triple check to make sure you choose the correct drive to make these changes on. On Linux, use the Disks app to make sure or lsblk command to do so.

# 1 - Perform a Long S.M.A.R.T test with smartctl
# MAKE SURE TO RUN THE SECOND COMMAND WHILE THE FIRST IS RUNNING
# Second command makes sure to keep the device awake, otherwise your device will go to sleep. 
# I recommend terminator on Linux to create side-by-side windows
sudo smartctl -t long -C /dev/sdc # Long test; Takes 656 minutes (~11 hours) to complete 
sudo watch -d -n 60 smartctl -a /dev/sdc # This runs smartctl -a /dev/sdc every minute to check your drive; This keeps your device awake as well.
# Run the fio stress test
# Source for version of command: https://www.reddit.com/r/DataHoarder/comments/7wh4a6/comment/du2elge/
# Following runs for 7200 seconds (~2 hours)
sudo fio --filename=/dev/sdc --name=randwrite --ioengine=sync --iodepth=1 --rw=randrw --rwmixread=50 --rwmixwrite=50 --bs=4k --direct=0 --numjobs=8 --size=300G --runtime=7200 --group_reporting
# A final small S.M.A.R.T test to make sure drive is healthy
sudo smartctl -t short -C /dev/sdc

Encryption

After stress testing, you can use a software to encrypt either files in batches (ex. Cryptomator) or full-disk encrypt (ex. VeraCrypt) your whole drive. In my experience, using VeraCrypt on a 6 TB hard drive took 37 hours to complete, and was at a rate of 42 MiB/s. Currently, not sure if this took long due to encryption, size of drive, or the hard drive dock I was using.

Data Transfer

Here are a couple of tools that help transferring huge files from one location to another:

Storage

After the main data maintenance is completed, storage is the last part. Here is where the optional hard drive case, fire-proof, and water-proof containers/safes come into play. If you have a drive in your house, and it is only used to backup occasionally, you need a place to store it where the elements or other items will not come into contact with it. These containers allow a solution for that. A hard drive case allows to protect from dirt and to protect your drive from drops. A fire and water-proof safe will protect from, well fire and water, but also from the elements like sunlight. Our goal is not to make a Fort Knox in our house for hard drives, but just try to preserve it as much as we can, so we don't have to buy new drives as frequently. If you have more than one backup, then my recommendation would be to give it to a friend, store it in a bank safety deposit box, or in a 24/7 temperature-controlled storage unit. These would be away from your house, allowing a copy of your data to be safe in case an intruder takes everything or if nature takes all of your things away. These options would be more suited financially for you than cloud storage. However, if none of these are feasible for you, then I would recommend looking into cloud.

Sources:

Last updated