If you use the Hostwinds Cloud Backup service, within a few days you will likely notice that the Object Storage in use is a different size than the data being backed up.
Why is my Object Storage larger than my data?
For Shared, Business, and Reseller hosting this is easily explained: each daily backup is a copy of the entire cPanel account. If you have a Shared account with 100MB of website, email, and database data, your Cloud Backups grow by 100MB each day until the retention limit is reached, then stay at 100MB x the number of days stored. You can reduce the number of days stored if you would rather pay less for storage and don’t think you’ll need as many backups.
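The arithmetic for full-copy backups can be sketched in a few lines. This is only an illustration: the function name and the 30-day retention value are assumptions made for the example, not Hostwinds settings.

```python
def full_copy_storage_mb(account_mb, days_elapsed, retention_days):
    """Storage used when every daily backup is a full copy of the account."""
    # Once the retention limit is reached, the oldest copy is dropped each day,
    # so the number of stored copies never exceeds retention_days.
    stored_backups = min(days_elapsed, retention_days)
    return account_mb * stored_backups

# A 100MB account with a hypothetical 30-day retention limit:
print(full_copy_storage_mb(100, 3, 30))   # 300
print(full_copy_storage_mb(100, 45, 30))  # 3000
```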
For a VPS or Dedicated Server, however, the answer is a little more complicated. The backup software (restic) doesn’t make a complete copy of the server each day, but it also doesn’t work like the traditional “full backup weekly, incremental backup daily” system you might be familiar with. Restic takes a ‘snapshot’ each day but stores only de-duplicated data: content that is already in the repository is not stored again. Once the oldest snapshot is older than the retention period (the default is 60 days, used as the example for the rest of this article), that snapshot is removed and purged. This is not the same as deleting the ‘oldest full backup’; it only discards data that no remaining snapshot references, such as versions of files from more than 60 days ago.
For example, if you have “today.txt” that is automatically updated with today’s date each day, restic will store 60 versions of it. When the oldest snapshot is removed, the version unique to that snapshot is thrown away, but you can still restore the file from any snapshot in the last 60 days. If you have “start.txt” that records the date the server was started and never changes, it is stored once, and restoring it from any snapshot gives the same data.
If you have a large database of products that’s not updated often, it won’t contribute much more to the backups than the size of the database. A database of users, forum posts, etc. that changes every day, hour, or minute will contribute greatly to the size of a restic backup in Object Storage, even if the overall database size doesn’t grow quickly.
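The “today.txt” and “start.txt” example can be modeled with a toy repository. This is a minimal sketch, not restic's implementation: it stores whole files keyed by their SHA-256 hash, whereas restic splits files into smaller blobs, and the `ToyRepo` class is a hypothetical name used only here.

```python
import hashlib

class ToyRepo:
    """Toy de-duplicating repository (a simplification, not restic itself)."""

    def __init__(self, retention):
        self.retention = retention  # number of snapshots to keep
        self.blobs = {}             # content hash -> bytes, stored once
        self.snapshots = []         # oldest first: {filename: content hash}

    def backup(self, files):
        snapshot = {}
        for name, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            self.blobs.setdefault(digest, data)  # unchanged content adds nothing
            snapshot[name] = digest
        self.snapshots.append(snapshot)

        # Remove snapshots beyond the retention limit, oldest first.
        excess = len(self.snapshots) - self.retention
        if excess > 0:
            del self.snapshots[:excess]

        # Purge data that no remaining snapshot references.
        live = {d for snap in self.snapshots for d in snap.values()}
        self.blobs = {d: b for d, b in self.blobs.items() if d in live}

repo = ToyRepo(retention=60)
for day in range(1, 91):
    repo.backup({
        "today.txt": f"day {day}".encode(),       # changes every day
        "start.txt": b"server started on day 1",  # never changes
    })

print(len(repo.snapshots))  # 60
print(len(repo.blobs))      # 61
```

After 90 daily backups the toy repository holds 60 snapshots and 61 stored items: 60 versions of today.txt plus a single copy of start.txt.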
Let’s take a look at a real server. These examples are for Linux but the ideas are the same for Windows. One big difference with Windows is that it takes several snapshots per day, one for each directory in C:\, so pay attention to the date of the snapshots in Windows and not the total number of them.
Here we have a fresh Linux VPS with 1.5GB used in the storage:
After taking the first backup, Object Storage shows about the same 1.5GB:
What happens if we add about 1.1GB of data and run a new backup?
The object storage has grown by about 1.1GB:
Let’s make a simple edit to the file, replacing some of the text at the beginning (but not changing the file size):
A new backup doesn’t take up much more space, because we only made one small change. Restic breaks files into ‘blobs’ between 512KB and 8MB, so it only has to store one more ‘blob’ for this difference.
A more complicated edit, replacing every ‘QQ’ in the file with ‘zz’, causes many more new blobs to be stored, however:
This changed about 250,000 of the 16 million lines in the file, but even a 1.5% change in data spread out through the whole file will greatly contribute to the number of blobs restic has to store for the change.
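The difference between the two edits can be reproduced with a small model. Note the assumptions: this sketch cuts the data into fixed 4096-byte chunks so the numbers are easy to follow, whereas restic uses content-defined chunking with variable-size blobs, and the generated file is made up for the example. Both edits keep the file size unchanged, as in the walkthrough.

```python
import hashlib

CHUNK_SIZE = 4096  # toy value; restic's real blobs are far larger and variable-size

def chunk_hashes(data):
    """Hash each fixed-size chunk of the data."""
    return [hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(data), CHUNK_SIZE)]

def new_chunks(old, new):
    """Count chunks of `new` that are not already stored from `old`."""
    known = set(chunk_hashes(old))
    return len(set(chunk_hashes(new)) - known)

# 65,536 lines of 16 bytes each (1 MiB); every 64th line contains "QQ",
# which is about 1.5% of the lines.
original = "".join(
    f"line {i:07d} {'QQ' if i % 64 == 0 else 'ab'}\n" for i in range(65536)
).encode()

small_edit = b"LINE" + original[4:]           # one change at the start
spread_edit = original.replace(b"QQ", b"zz")  # small changes throughout

print(len(chunk_hashes(original)))        # 256
print(new_chunks(original, small_edit))   # 1
print(new_chunks(original, spread_edit))  # 256
```

In this model the single edit adds one new chunk, while the spread-out edit touches every chunk, so nearly the whole file has to be stored again.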
And of course, deleting the file frees up a lot of space on the drive:
But a fresh backup doesn’t shrink the Object Storage size, because the earlier snapshots still contain the file. After all, one of the big reasons to have backups is to be able to recover from accidental (or malicious) deletion of data.
We can manually ‘forget’ a snapshot and ‘prune’ the data associated with it. This is a snapshot that had one of the versions of the 1.1GB file.
And the backup storage size shrinks appropriately:
The Hostwinds Cloud Backup scripts automatically ‘forget’ and ‘prune’ each time they run, keeping one snapshot per day for the number of days specified in /root/.restic_var or C:\Windows\System32\restic_repo.ps1.
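The relationship between deleting a file, forgetting a snapshot, and pruning can be sketched as follows. This is a simplified model and not how the Hostwinds scripts are written: the snapshot and blob names are hypothetical, sizes are rounded to whole MB, and each version of the large file is treated as one blob even though real versions may share some blobs.

```python
# Each snapshot references a set of blobs; sizes are in MB.
blob_sizes = {"base_system": 1500, "big_file_v1": 1100, "big_file_v2": 1100}

snapshots = {
    "first_backup": {"base_system"},
    "file_added": {"base_system", "big_file_v1"},
    "qq_to_zz": {"base_system", "big_file_v2"},
    "file_deleted": {"base_system"},
}

def forget(snapshots, snapshot_id):
    """Remove one snapshot from the list; no stored data is freed yet."""
    return {sid: blobs for sid, blobs in snapshots.items() if sid != snapshot_id}

def prune(snapshots, blob_sizes):
    """Drop blobs that no remaining snapshot references."""
    live = set().union(*snapshots.values())
    return {blob: size for blob, size in blob_sizes.items() if blob in live}

def repo_size(blob_sizes):
    return sum(blob_sizes.values())

print(repo_size(blob_sizes))  # 3700: deleting the file on disk freed nothing

snapshots = forget(snapshots, "qq_to_zz")
blob_sizes = prune(snapshots, blob_sizes)
print(repo_size(blob_sizes))  # 2600: the forgotten version has been purged
```

The other version of the file stays in storage because the "file_added" snapshot still references it.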
The steps of the walkthrough are summarized here:

| Action | VPS storage size | Object Storage size |
|---|---|---|
| 1.1GB file generated | 2.6GB | 2.512GB |
| Single line changed | 2.6GB | 2.513GB |
| “QQ” -> “zz” | 2.6GB | 3.604GB |
| 1.1GB file deleted | 1.5GB | 3.604GB |
While a few small changes won’t necessarily add much to the backup space used, lots of small changes, and of course big changes, will greatly increase the amount stored.
Why is my Object Storage smaller than my data?
There are occasions where the storage used on disk may be larger than the backup data. Our backup scripts automatically exclude directories like /tmp and /var/tmp in Linux and the Recycle Bin in Windows. If you ‘delete’ a file in Windows so that it goes to the Recycle Bin, and then don’t empty the Recycle Bin for 60 days, your Object Storage may be smaller than the space used on the C:\ drive.
I’ve placed a 260MB version of the sample.txt in /tmp in Linux, then run a backup:
Simply put, the backup is smaller than the space used on disk because not all directories are backed up.
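A short sketch shows how exclusions make the backed-up size smaller than the size on disk. Only /tmp and /var/tmp are taken from this article; the other paths and sizes are hypothetical, and the real scripts may exclude more directories than these two.

```python
EXCLUDES = ("/tmp", "/var/tmp")  # the two Linux exclusions named in this article

def is_excluded(path, excludes=EXCLUDES):
    """True if the path is an excluded directory or lies inside one."""
    return any(path == ex or path.startswith(ex + "/") for ex in excludes)

def sizes(files):
    """Return (size on disk, size backed up) for a {path: size} mapping."""
    on_disk = sum(files.values())
    backed_up = sum(size for path, size in files.items() if not is_excluded(path))
    return on_disk, backed_up

files_mb = {
    "/etc/hosts": 1,              # hypothetical file
    "/home/user/site.tar": 500,   # hypothetical file
    "/tmp/sample.txt": 260,       # the 260MB sample placed in /tmp
}

print(sizes(files_mb))  # (761, 501)
```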
The excluded directories in Linux include /tmp and /var/tmp, as mentioned above.
And in Windows, restic backs up non-hidden directories that are ‘ClientAccessable’, so directories like c:\$Recycler and files like c:\pagefile.sys don’t get backed up.
Hopefully this helps explain the discrepancies in your data vs. the size of the backups.