Our backup system has worked flawlessly for many years. On the 8th of October, however, it failed in an unusual way: it still created backups of everything except our Survival server.
Usually I check our backups about once a month, and the last time I checked them was at the start of October. Yesterday Plinkster lost his inventory, and when I went into the backups to retrieve it for him I discovered there were no backups of the Survival server for an entire month.
It would appear that the backup hard drive became corrupted in some way, so the backup system was unable to move the Survival server into the backup drive directory for archiving. (It first moves the servers over, then compresses them into an archive file.)
Thankfully the issue was discovered before anything catastrophic happened to our server's SSD, which could have resulted in us losing 30 days of work.
I'm telling you about this because I believe in full disclosure. This was a very serious issue, and I intend to make sure it won't happen again. To that end I have written a piece of software that monitors our backup archives, verifies that they are the right size (always 34GB+ when compressed), and emails me if it detects that an archive has fallen below that size. I will also be manually checking the backups every 10 days; I've set a reminder on my phone to make sure I don't forget.
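For anyone curious, a size check like the one described above could look roughly like the sketch below. This is not the actual monitoring software, just a minimal illustration under some assumptions: the function names, the `*.tar.gz` file pattern, the email addresses, and the local SMTP server are all hypothetical, and the 34 GB threshold is taken to mean binary gigabytes.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path

# Threshold from the post: archives should be 34 GB or more when compressed.
# Treating "GB" as 1024**3 bytes is an assumption.
MIN_ARCHIVE_BYTES = 34 * 1024**3


def find_undersized_archives(backup_dir, min_bytes=MIN_ARCHIVE_BYTES, pattern="*.tar.gz"):
    """Return the archives in backup_dir that are smaller than min_bytes, sorted by path."""
    return sorted(
        p for p in Path(backup_dir).glob(pattern)
        if p.is_file() and p.stat().st_size < min_bytes
    )


def build_alert(undersized, sender, recipient):
    """Build a warning email listing each undersized archive and its size."""
    msg = EmailMessage()
    msg["Subject"] = f"Backup warning: {len(undersized)} undersized archive(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        "\n".join(f"{p.name}: {p.stat().st_size} bytes" for p in undersized)
    )
    return msg


def check_and_alert(backup_dir, sender, recipient, smtp_host="localhost"):
    """Send an email only when at least one archive is below the threshold."""
    undersized = find_undersized_archives(backup_dir)
    if undersized:
        # Assumes an SMTP server that accepts mail on smtp_host (hypothetical setup).
        with smtplib.SMTP(smtp_host) as smtp:
            smtp.send_message(build_alert(undersized, sender, recipient))
    return undersized
```

Something like `check_and_alert("/path/to/backups", "backup@example.com", "admin@example.com")` could then be run on a schedule after each backup. A size check alone will not notice an archive that is missing entirely, so a fuller version would probably also want to alert when no recent archive is found.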
As of today a new backup has been made, the corrupted folder issue on our backup array has been corrected, and everything is working correctly. No data has actually been lost, of course: this only affected our backups and not the main files that the server uses.
Thanks for reading
