Why is my Object Storage bigger (or smaller) than my data?

If you have the Hostwinds Cloud Backup service, within a few days you will likely notice that the Object Storage in use is a different size than the data being backed up.

Why is my Object Storage larger than my data?

For Shared, Business, and Reseller hosting this is easily explained: each daily backup is a copy of the entire cPanel account. If you have a Shared account with 100MB of website/email/database data, your Cloud Backups would grow by 100MB each day until your retention limit is hit, then hold steady at roughly 100MB x the number of days stored. You can reduce the number of days stored if you would rather pay less for storage and don’t think you’ll need as many backups.
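As a rough illustration of that arithmetic, here is a minimal sketch. The function name and the 30-day retention value are made up for the example and are not taken from the Hostwinds backup scripts; it also assumes the account size stays constant.

```python
def full_copy_storage_mb(account_mb: float, days_elapsed: int, retention_days: int) -> float:
    """Storage used when every daily backup is a full copy of the account."""
    # Only the most recent `retention_days` backups are kept.
    stored_backups = min(days_elapsed, retention_days)
    return account_mb * stored_backups


# Hypothetical 100MB account with an assumed 30-day retention limit.
for day in (1, 7, 30, 60):
    print(f"day {day}: {full_copy_storage_mb(100, day, retention_days=30):.0f}MB stored")
```

Under these assumptions, storage climbs by 100MB per day and then levels off once the retention limit is reached.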

For a VPS or Dedicated Server, however, the answer is a little more complicated. The backup software (restic) doesn’t make a complete copy of the server each day, but its backups are also not the same as the traditional “full backup weekly, incremental backup daily” system you might be familiar with. Restic takes a ‘snapshot’ each day, but only stores de-duplicated data. If the oldest snapshot is older than the retention period (the default is 60 days, used as the example for the rest of this article), restic removes that snapshot and purges the data only it needed. This is not the same as deleting the ‘oldest full backup’; it just throws away the record of how files looked more than 60 days ago.

For example, if you have “today.txt” that is automatically updated with today’s date each day, restic will have 60 versions of it stored. When the oldest snapshot is removed, the oldest version is thrown away, but you can still restore the file as it was in any snapshot from the last 60 days. If you have “start.txt” that records the date the server was started and never changes, its data is stored once and kept, and restoring it from any snapshot will give the same data.
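The today.txt / start.txt example can be sketched with a toy content-addressed store. This is a simplification for illustration only: it hashes whole files, whereas restic de-duplicates smaller pieces of files, and none of the names below come from restic itself.

```python
import hashlib


def take_snapshot(files: dict[str, bytes], store: dict[str, bytes]) -> dict[str, str]:
    """Record each file by content hash; keep the content only if it is new."""
    snapshot = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)  # identical content is stored once
        snapshot[name] = digest
    return snapshot


store: dict[str, bytes] = {}
snapshots = []
for day in range(1, 61):
    files = {
        "start.txt": b"server started on day 0",     # never changes
        "today.txt": f"today is day {day}".encode(),  # changes daily
    }
    snapshots.append(take_snapshot(files, store))

unique_start = {snap["start.txt"] for snap in snapshots}
unique_today = {snap["today.txt"] for snap in snapshots}
print(len(unique_start), len(unique_today), len(store))  # prints: 1 60 61
```

Sixty snapshots reference start.txt, but its content is stored only once, while every daily version of today.txt needs its own entry.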

If you have a large database of products that’s not updated often, it won’t contribute much more to the backups than the size of the database itself. A database of users, forum posts, etc. that changes daily, hourly, or every minute will contribute greatly to the size of a restic backup in Object Storage, even if the overall database size doesn’t grow quickly.

Let’s take a look at a real server. These examples are for Linux but the ideas are the same for Windows. One big difference with Windows is that it takes several snapshots per day, one for each directory in C:\, so pay attention to the date of the snapshots in Windows and not the total number of them.

Here we have a fresh Linux VPS with 1.5GB used in the storage:

The main drive of the VPS has 1.5GB of data.

After taking the first backup, Object Storage shows about the same 1.5GB:

Object Storage shows 1.421GB of data

What happens if we add about 1.1GB of data and run a new backup?

A 1.1GB random text file is generated
Don’t worry about the openssl command, it’s just an easy way to generate a random file we can easily edit later

The object storage has grown by about 1.1GB:

Object Storage shows 2.512GB of data

Let’s make a simple edit to the file, replacing some of the text at the beginning (but not changing the file size):

The first line of text in the file is edited with "This is a test" and the same number of characters are deleted from the existing text.

A new backup doesn’t take up much more space, because we only made one small change. Restic breaks files into ‘blobs’ between 512KB and 8MB, so it only has to store one more ‘blob’ for this difference.

Object Storage shows 2.513GB of data
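To see why a same-size edit near the start of a file adds only one blob, here is a minimal sketch. It assumes fixed 1MiB chunks purely for simplicity; restic’s real blobs vary in size, as described above, and this is not restic’s chunking algorithm.

```python
import hashlib
import os

CHUNK_SIZE = 1024 * 1024  # assumed fixed 1MiB chunks; real restic blobs vary in size


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return the hash of each chunk."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


original = os.urandom(16 * CHUNK_SIZE)        # stand-in for the random sample file
marker = b"This is a test"
edited = marker + original[len(marker):]      # overwrite the start, same total length

stored = set(chunk_hashes(original))          # chunks already in the repository
new_chunks = [h for h in chunk_hashes(edited) if h not in stored]
print(len(stored), len(new_chunks))
```

Only the first chunk’s hash changes, so only one new chunk would need to be uploaded.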

A more complicated edit, replacing every ‘QQ’ in the file with ‘zz’, will cause a lot more new blobs to be stored, however:

The sed command is used to replace every instance of QQ with zz, but ls -l shows the same 1.1GB file size
The file is the same size
Object Storage is now 3.604GB
But the backup size has grown significantly

This changed about 250,000 of the 16 million lines in the file, but even a 1.5% change in data spread out through the whole file will greatly contribute to the number of blobs restic has to store for the change.
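A back-of-the-envelope estimate shows why scattered edits are so expensive. The sketch below assumes the changed lines are spread uniformly through the file and that blobs average about 1MB; both are assumptions made for illustration, not measured values.

```python
def expected_fraction_touched(num_blobs: int, num_edits: int) -> float:
    """Chance that a given blob contains at least one of num_edits scattered edits."""
    # Each edit misses a particular blob with probability (1 - 1/num_blobs).
    return 1.0 - (1.0 - 1.0 / num_blobs) ** num_edits


file_mb = 1100        # the 1.1GB sample file
assumed_blob_mb = 1   # assumption: average blob size of about 1MB
num_blobs = file_mb // assumed_blob_mb

for edits in (1, 100, 1_000, 250_000):
    fraction = expected_fraction_touched(num_blobs, edits)
    print(f"{edits} edits -> about {fraction:.1%} of blobs changed")
```

With 250,000 edits spread across roughly 1,100 blobs, essentially every blob is touched, which is consistent with Object Storage growing by about 1.09GB (from 2.513GB to 3.604GB) for a 1.1GB file.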

And of course, deleting the file frees up a lot of space on the drive:

rm sample.txt
The VPS now has 1.5GB used storage

But a fresh backup doesn’t shrink the Object Storage size, because the earlier snapshots still contain the file. After all, one of the big reasons to have backups is to be able to recover from accidental (or malicious) deletion of data.

We can manually ‘forget’ a snapshot and ‘prune’ the data associated with it. This is a snapshot that had one of the versions of the 1.1GB file.

restic forget --prune <snapshot id> to forget a snapshot and remove the data

And the backup storage size shrinks appropriately:

Object Storage shows 2.513GB
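The effect of ‘forget’ and ‘prune’ can be modelled with a toy repository in which each snapshot lists the blob groups it references. The group names and sizes (in MB) are rounded from the figures in this article, and the choice of which snapshot gets forgotten is an assumption that happens to match the numbers; this is not how restic stores data internally.

```python
def prune(store: dict[str, int], snapshots: dict[str, set[str]]) -> int:
    """Drop blob groups that no remaining snapshot references; return MB freed."""
    referenced = set().union(*snapshots.values())
    freed = 0
    for blob_id in list(store):
        if blob_id not in referenced:
            freed += store.pop(blob_id)
    return freed


# Blob groups and approximate sizes in MB (illustrative).
store = {"os": 1421, "file_rest": 1090, "file_head_v1": 1, "file_head_v2": 1, "file_qq_zz": 1091}
snapshots = {
    "initial":       {"os"},
    "file_added":    {"os", "file_rest", "file_head_v1"},
    "line_edited":   {"os", "file_rest", "file_head_v2"},
    "qq_to_zz":      {"os", "file_qq_zz"},
    "file_deleted":  {"os"},
}

before = sum(store.values())
del snapshots["qq_to_zz"]        # 'forget' one snapshot (assumed to be this one)
freed = prune(store, snapshots)  # 'prune' data nothing references any more
after = sum(store.values())
print(before, freed, after)      # prints: 3604 1091 2513
```

Deleting the file on the server changes nothing here; space is only reclaimed once no remaining snapshot references the data.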

The Hostwinds Cloud Backup scripts automatically ‘forget’ and ‘prune’ each time they run, keeping one snapshot per day for the number of days specified in /root/.restic_var or C:\Windows\System32\restic_repo.ps1.
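The retention rule described above (one snapshot per day, for a set number of days) can be sketched as follows. This is only a model of the policy as stated in this article; the function name is hypothetical, and the actual scripts and restic’s own retention options may select snapshots differently.

```python
from datetime import datetime, timedelta


def select_kept(snapshot_times: list[datetime], now: datetime, keep_days: int) -> list[datetime]:
    """Keep the latest snapshot of each day within the last keep_days days."""
    cutoff = (now - timedelta(days=keep_days)).date()
    latest_per_day = {}
    for ts in sorted(snapshot_times):
        if ts.date() > cutoff:
            latest_per_day[ts.date()] = ts  # a later snapshot on the same day wins
    return sorted(latest_per_day.values())


now = datetime(2024, 3, 1, 12, 0)
# 90 days of 2am snapshots, plus a second snapshot taken today at 9am.
times = [datetime(2024, 3, 1, 2, 0) - timedelta(days=i) for i in range(90)]
times.append(datetime(2024, 3, 1, 9, 0))

kept = select_kept(times, now, keep_days=60)
print(len(kept), kept[0].date(), kept[-1])
```

Everything not returned by a rule like this would be forgotten, and the data only those snapshots needed would then be pruned.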

In summary:

Action               | VPS storage size | Object Storage size
Initial              | 1.5GB            | 1.421GB
1.1GB file generated | 2.6GB            | 2.512GB
Single line changed  | 2.6GB            | 2.513GB
“QQ” -> “zz”         | 2.6GB            | 3.604GB
1.1GB file deleted   | 1.5GB            | 3.604GB
Snapshot deleted     | 1.5GB            | 2.513GB

While a few small changes won’t necessarily add much to the backup space used, lots of small changes, and of course big changes, will greatly affect the amount stored.

Why is my Object Storage smaller than my data?

There are occasions where the storage used on disk may be larger than the backup data. Our backup scripts automatically exclude directories like /tmp and /var/tmp in Linux and the Recycle Bin in Windows. If you ‘delete’ a file in Windows and it goes to the Recycle Bin, and you then don’t empty the Recycle Bin for 60 days, your Object Storage may be smaller than the space used on the C:\ drive.

I’ve placed a 260MB version of the sample.txt in /tmp in Linux, then run a backup:

A 260MB text file is put in /tmp
The VPS shows 1.8GB used space
Object Storage shows 1.47GB used for backups

Simply, the backup is smaller than the space used because not all directories are backed up.

The excluded directories in Linux are:

/dev,/media,/mnt,/proc,/run,/sys,/tmp,/var/tmp,/var/log,/backup,/home/virtfs
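If you want a quick way to reason about whether a given Linux path falls under one of these directories, a minimal sketch follows. It does a plain "is this path inside an excluded directory" check; the way the backup scripts and restic actually match exclusions may differ, and the function name is made up for the example.

```python
from pathlib import PurePosixPath

# The excluded directories listed above.
EXCLUDED_DIRS = [
    "/dev", "/media", "/mnt", "/proc", "/run", "/sys",
    "/tmp", "/var/tmp", "/var/log", "/backup", "/home/virtfs",
]


def is_excluded(path: str) -> bool:
    """Return True if path is an excluded directory or lies inside one."""
    p = PurePosixPath(path)
    return any(
        p == PurePosixPath(d) or PurePosixPath(d) in p.parents
        for d in EXCLUDED_DIRS
    )


for candidate in ("/tmp/sample.txt", "/var/www/index.html", "/home/virtfs/user/file"):
    print(candidate, "excluded" if is_excluded(candidate) else "backed up")
```

Comparing whole path components rather than string prefixes avoids treating a directory such as /tmpfiles as if it were inside /tmp.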

And in Windows, restic backs up non-hidden directories that are ‘ClientAccessable’, so directories like c:\$Recycler and files like c:\pagefile.sys don’t get backed up.

Hopefully this helps explain the discrepancies in your data vs. the size of the backups.
