Data validation

There are three types of integrity verification in Comet:

Referential integrity

Referential integrity means that for each snapshot, all of its matching chunks exist; that all chunks are indexed; and so on. This is verified client-side every time the app runs a retention pass, so you should ensure that retention passes run successfully from time to time.

Data file integrity at rest

Data file integrity ensures that each file in the Storage Vault is readable and has not been corrupted at rest (e.g. hash mismatch / decrypt errors).

Comet stores files inside the Storage Vault data location as opaque, encrypted, compressed files. The filenames are the SHA256 hash of the file content. Comet automatically verifies file integrity client-side every time a file is accessed during backup and restore operations (i.e. not an exhaustive check), by calculating the SHA256 hash of the content and comparing it to the filename.

Corruption of files at rest is a rare scenario; it's unlikely you need to worry about this, unless you are using local storage and you believe your disk drives are failing. However, for additional peace of mind, you can verify the integrity of the files on disk at any time, by comparing each filename to the SHA256 hash of its content.

Example data validation commands

The following equivalent commands recursively read all files in the current directory, take the SHA256 hash of each, and compare it to the filename.

These commands exclude the config file, as it is known to be safe for other reasons.

These commands do not exclude other temporary files (e.g. a /tmp/ subdirectory, or ~-named files) that some storage location types use for temporarily uploaded data. Such temporary files will almost certainly cause a hash mismatch, but they do not interfere with normal backup or restore operations.

On Linux, you can use the following command:

find . ! -name 'config' -type f -exec sha256sum '{}' \; | awk '{ sub("^.*/", "", $2) ; if ($1 == $2) { print $2,"ok" } else { print "[!!!]",$2,"MISMATCH",$1 } }'

On Windows, you can use the following PowerShell (4.0 or later) command:

Get-ChildItem -Recurse -File | Where-Object { $_.Name -ne "config" } | ForEach-Object {
    $h = (Get-FileHash -Path $_.FullName -Algorithm SHA256)
    if ($_.Name -eq $h.Hash) { echo "$($_.Name) ok"; } else { echo "[!!!] $($_.Name) MISMATCH $($h.Hash)"; }
}

Cloud storage

Taking hash values of files in this way requires fully reading each file from the storage location. If the storage location is on cloud storage, this is equivalent to fully downloading the entire contents of the storage location, which may result in significant network traffic. In this case, we recommend relying on Comet's normal verification that happens automatically during backup and restore operations.

Data file integrity at generation-time

It is possible that a malfunctioning Comet Backup client would generate bad data, and then save it into the Storage Vault with a valid hash and valid encryption. For instance, this could happen in some rare situations where the Comet Backup client is installed on a PC with malfunctioning RAM.

In this situation, Comet would attempt a future backup or restore job, load data from the Storage Vault, and fail to parse it, producing a couldn't load tree [...] hash mismatch error message or a Load(<index/...>): Decode [...] invalid character \x00 error message.

In this case, it is possible to recover the Storage Vault by removing all the corrupted data; the remaining data is restorable. However, it's not possible to identify the corrupted data using the data validation commands above.

Data validation steps

Different methods are available to identify the corrupted files.

  • Use the "Deep verify Vault contents" feature

    • This feature is available in Comet via the Comet Management Console live connected device actions dialog when the "Advanced options" setting is enabled. It is not exposed to the client in the Comet Backup app.
    • This will cause the client to download parts of the Storage Vault and perform a deeper type of hash checking than is possible via the existing data validation steps. It should alert you to which data files are corrupt, and the Storage Vault can then be repaired following the existing documented steps.
    • The "deep verify" feature downloads only index/tree parts of the Storage Vault, and caches temporary files to reduce total network roundtrips.
  • Files mentioned in error message

    • Data files (e.g. couldn't load tree [...] hash mismatch error message)
      • The couldn't load tree error message will also indicate the exact corrupted pack file, if possible.
      • You can then delete the file from the /data/ subdirectory, and run a retention pass to validate the remaining content, as described below. This may assist with repairing the Storage Vault.
      • However, this only detects the corrupted directory trees that were immediately referenced by a running backup job; other past and future backup jobs may still be unrestorable.
    • Index files (e.g. Load(<index/...>): Decode [...] invalid character \x00)
      • The index files contain only non-essential metadata to accelerate performance. Index files can be safely regenerated via the "Rebuild indexes" option on a Storage Vault. This is a relatively fast operation.
    • Compared to the "Deep verify Vault contents" feature, repairing single files in this way does avoid the immediate bandwidth-intensive step of downloading the entire vault content; however, it is not a guarantee that all data in the Vault is safe. Use of this method should be coupled with a (bandwidth-equivalent) complete test restore.
  • Files by modification date

    • Because backup jobs only ever add new files to the Storage Vault, another possible way to repair the Storage Vault is to assume that all files written after a given point in time are affected.
    • This is only an option if your Storage Vault type exposes file modification timestamps (e.g. local disk or SFTP, and a limited number of cloud storage providers).
    • Specifically:
      1. Ensure that no backup, restore, or retention operations are currently running against the Storage Vault
      2. The corrupted data was created by the job prior to the one in which errors were first reported; find the start time of that prior job
      3. Delete all files in the Storage Vault with a modification time greater than when that prior job started (see the sketch after this list)
      4. If you used the "Rebuild indexes" option or ran a retention pass since the errors began, the contents of the /index/ directory may have been consolidated into fewer files. If there are no files in the /index/ subdirectory, you should then initiate a "Rebuild indexes" operation
      5. Initiate a retention pass afterward, to ensure referential integrity of the remaining files, as described below
  • The other alternative is to start a new Storage Vault.
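
For step 3 of the modification-date approach, a minimal sketch on Linux might look like the following. It assumes GNU find, that the command is run from the Storage Vault's data location, and that the prior known-good job started at the hypothetical time 2024-01-15 03:00:00 (substitute the actual start time from your own job history):

find . -type f ! -name 'config' -newermt '2024-01-15 03:00:00' -print

Review the resulting list carefully; once you are satisfied that it only contains files written after the last known-good job, re-run the command with -delete in place of -print to remove them.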

Recovering from file corruption

If you encounter a hash mismatch error, a data file has been corrupted inside the Storage Vault. Data has been lost.

If the issue only occurred recently, it's highly likely that most backed-up data is safe, and that only recent backup jobs are affected.

Because Comet is a deduplicating backup engine, future backups may silently depend on this corrupted data. You should immediately take the following steps to re-establish data integrity of the remaining data in the Storage Vault:

  1. Identify and delete all corrupted files from the Storage Vault's data location
    • You can use the example data validation commands above to find all hash mismatches; a deletion-oriented sketch follows this list
  2. Run a retention pass for the Storage Vault
    • This will check referential integrity (as mentioned above)
    • The job will fail, with a number of warnings in the format
      • <snapshot/ABCDEFGH> depends on missing [...] or
      • Packindex 'AAAA' for snapshot 'BBBB' refers to unknown pack 'CCCC', shouldn't happen
  3. Delete snapshots that are missing content data
    • Delete files from the /snapshots/ subdirectory in the Storage Vault's data location that match the corruption warnings from the backup job report
  4. Repeat step 2 until no issues occur
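
The following is a minimal sketch of steps 1 and 3 on Linux, run from the Storage Vault's data location. The first command is a mismatch-only variant of the earlier Linux validation command; it prints the path of every file whose SHA256 hash does not match its filename (excluding the config file), so you can review the list before deleting anything. The snapshot name ABCDEFGH in the second command is a placeholder for whichever identifier appears in the warnings of your own job report:

find . ! -name 'config' -type f -exec sha256sum '{}' \; | awk '{ name=$2; sub("^.*/", "", name); if ($1 != name) print $2 }'

rm ./snapshots/ABCDEFGH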

Some data has been lost, but by carefully removing the corrupted data and everything that references it, the remaining backup snapshots will all be restorable. Future backup jobs should be safe.

Missing files from Storage Vault

Please take care to ensure that files do not go missing from the Storage Vault. The loss of any file within the Storage Vault compromises the integrity of your backup data and is likely to leave you in a data-loss situation.

More specifically:

If there are missing files from the snapshots subdirectory, Comet will not know that there is a backup snapshot available to be restored. There is no solution for this, other than ensuring all the files are present. This is unfortunate, but should only have a limited impact.

If there are missing files from the packindex subdirectory, some operations will be slower until Comet runs an optimization pass during the next retention pass. This is not a significant problem.

If there are missing files from the locks subdirectory, Comet may perform a dangerous operation, such as deleting in-use data during a retention pass. This is potentially a significant problem.

If there are missing files from the index subdirectory, Comet will re-upload some data that it could have otherwise deduplicated. This is unfortunate, but should only have a limited impact. Excess data will be cleaned up by a future retention pass.

If there are missing files from the keys subdirectory, Comet will be entirely unable to access the Storage Vault. There is no solution for this, other than ensuring all the files are present. This could be a significant problem. However, because the files in this directory do not change often, it's very likely that their replication is up-to-date.

If there are missing files from the data subdirectory, Comet will be unable to restore some data. When a retention pass next runs, Comet should detect this problem, and alert you to which backup snapshots are unrecoverable. You can then delete the corresponding files from the snapshots subdirectory and continue on with what is left of the Storage Vault.

For further information about specific error messages, see Recovering from Unsafe Unlock Operations.