A couple of nights ago I was trying to set up a new WordPress site for a family member and failing miserably. After a while I figured that I had a problem with a rather subtle kind of database corruption, caused by me taking a few shortcuts when transferring some databases from an old server to this one without updating the tables to match the slightly newer version of the database engine on the new system (my first mistake of many). No problem, I thought: this is not my first database rodeo. Being the cautious kind, despite having two quite independent backup systems seemingly running, I used rsync to make a copy of all the (at this point working) databases before proceeding. You can never be too careful.
Inevitably, things didn’t go according to plan.
While trying to repair it, I managed to destroy almost every database on the system, including the all-important MySQL database itself. Turns out MySQL’s repair tools are a bit vicious and/or I don’t know enough about how to use them. Anyway, no problem, I could just restore my fresh backup and try again.
Nope.
In fact, MySQL itself would not even start. I can only guess that tables that are in current use are skipped by rsync, because a few were OK but most were missing. This is something I need to investigate further because it has worked for me before: maybe I missed a flag or something.
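For what it’s worth, copying the data files out from under a running server can produce an inconsistent copy, whereas a logical dump taken through the server itself avoids touching those files at all. Here is a minimal sketch of that alternative in Python, assuming `mysqldump` is on the PATH and that credentials live in an option file. The function names and the paths in the usage note are hypothetical, and I have not run this against the server in this story.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_dump_command(defaults_file: str) -> list[str]:
    """Build a mysqldump invocation that dumps every database.

    defaults_file is an assumed option file holding the user and password,
    so that credentials never appear on the command line.
    """
    return [
        "mysqldump",
        # Option-file flags have to come before any other option.
        f"--defaults-extra-file={defaults_file}",
        "--all-databases",       # includes the mysql system database
        "--single-transaction",  # consistent snapshot for InnoDB tables only
        "--routines",            # stored procedures and functions
        "--events",              # scheduled events
    ]


def dump_all(defaults_file: str, out_path: Path) -> None:
    """Run the dump and write the SQL to out_path; raises if mysqldump fails."""
    with out_path.open("wb") as out:
        subprocess.run(build_dump_command(defaults_file), stdout=out, check=True)
```

Something like `dump_all("/root/.backup.cnf", Path("/var/backups/all.sql"))` would then produce a single SQL file. Note that `--single-transaction` only guarantees consistency for transactional tables, so older MyISAM tables could still change while the dump runs.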
No problem. I had two alternative restore options. I’m a belt, braces, and elasticated waist kind of a guy.
My first-line backup turned out never to have saved all the database tables, though it had saved some of them, and almost everything else. That took a while to figure out. I guess the reason might be similar to the issue with rsync but it is another thing I need to investigate.
No worries, I had my second line backup ready, created by a smart WordPress plug-in that exports everything on a nightly basis in friendly zipped up SQL format to a totally different server.
At least, it did do that until late 2018.
And it didn’t include the MySQL system tables.
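A small freshness check, run from cron on the receiving server, would probably have flagged that the nightly export had quietly stopped. This is a rough sketch only, assuming the exports land in one directory as `.zip` files. The function names, the file pattern, and the 36-hour threshold are all illustrative choices of mine, not anything the plug-in provides.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age_hours(backup_dir: Path, pattern: str = "*.zip",
                            now: Optional[float] = None) -> Optional[float]:
    """Age in hours of the most recently modified matching file.

    Returns None when the directory holds no matching files at all.
    """
    mtimes = [p.stat().st_mtime for p in backup_dir.glob(pattern) if p.is_file()]
    if not mtimes:
        return None
    current = time.time() if now is None else now
    return (current - max(mtimes)) / 3600.0


def backup_is_stale(backup_dir: Path, max_age_hours: float = 36.0) -> bool:
    """True if there is no backup, or the newest one is older than the limit."""
    age = newest_backup_age_hours(backup_dir)
    return age is None or age > max_age_hours
```

The threshold is deliberately longer than a day so that a nightly job running a little late does not raise a false alarm. A check like this only shows that files are still arriving, not that they contain everything they should.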
Not wanting to lose well over a year and a half’s worth of changes, I went through an increasingly desperate range of potential fixes, using the in-built tools, playing with removal of various system tables (some of which can be recreated when needed, some not, as it turns out), updating MySQL, and culminating in reinitializing the MySQL database itself. That got the database management system up and running again so I set about recreating the users and permissions – luckily not a huge issue as I had sensibly recorded the relevant info in a separate file. I then tried many ways to combine the backups I had to fix the errant databases, but all failed.
But at least I had my 2018 backup. So I restored that to the site – which worked fine, and I even remembered to allow for database version changes – and then resyndicated the last 20 months or so of posts from the Landing, which is normally where I post things first.
Oops.
I am sorry, Twitter followers, for the dozens of notifications you then received in the middle of the night (did I mention that it was considerably after midnight by this time?) because I had forgotten to switch off the WordPress tool that automatically tweets new posts. I don’t know whether it helped that I then deleted all of those tweets from my Twitter account. Probably not too much: the notifications had already been sent.
I eventually found a previously forgotten proper SQL backup I’d made back in January this year, as a precaution some weeks before moving the sites to their new home. I was then able to resyndicate at least most of my posts that had originated on the Landing since then, remembering to switch off the automated tweeting tool this time.
I have lost a fair number of tweaks and pages that I had created or modified on the site itself over the past few months, and I expect I’ll be finding quite a few of these over the next little while, but they are mostly not too troublesome to recreate. I’m going to need to work out how to deal with media files (that were unaffected but that have become divorced from WordPress) but I think that shouldn’t be too painful. A few comments from visitors have been permanently lost, but otherwise all seems (touch wood) to be more or less OK now. Fortunately, the owners of the other sites I host on my server haven’t been using them much since January so, though a few comments may have been lost, they are not too cross.
Have I learned anything useful from this?
Probably not. Not enough, anyway, and likely not persistently. Technical skills are very transient and fade fast when you don’t use them. This time round I had to look up many things that used to be second nature to me. Even if I had remembered well, the technologies I was using had evolved. When you develop the skills to be part of a machine they are of limited use when the machine itself changes. This, as it happens, had a lot to do with the problem I was trying to fix in the first place. As in all things technological, including all things educational, the tools matter far less than the assembly and orchestration of them.
I have managed database management systems for approaching 30 years. In fact, if you count DBaseII/III/IV systems on personal computers, it’s closer to 35 years. I have suffered as a result of carelessness, stupidity, bugs, malice, bad software companies and natural decay enough times to know that you should regularly restore your backups and go through the rest of your recovery procedures, and certainly do so after every significant change (a version upgrade, for instance). In fact, I teach database management at graduate level and I enthusiastically correct students who forget it. But managing actual databases is very much an occasional hobby for me now, and I don’t have the time, money, or patience to do what I know should be done. A full restore is a big job that carries its own risks, so I try to avoid it, and take risks rather than lose a good night or two of work.
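Short of a full restore, even a cheap sanity check on each dump would catch some of the failures described here, such as a backup that silently leaves tables out. The sketch below assumes zipped `.sql` files in the usual mysqldump style, with backtick-quoted names in their `CREATE TABLE` statements. The function names and the list of expected tables are hypothetical, and this is no substitute for actually restoring a backup.

```python
import re
import zipfile
from pathlib import Path
from typing import Iterable, Set

# Matches statements such as: CREATE TABLE `wp_posts` (
CREATE_TABLE = re.compile(r"CREATE TABLE(?: IF NOT EXISTS)? `([^`]+)`", re.IGNORECASE)


def tables_in_dump(zip_path: Path) -> Set[str]:
    """Names of all tables created by the .sql files inside a zip archive."""
    found: Set[str] = set()
    with zipfile.ZipFile(zip_path) as zf:
        bad = zf.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            raise ValueError(f"corrupt member in archive: {bad}")
        for name in zf.namelist():
            if name.lower().endswith(".sql"):
                # Reads each file fully into memory; fine for small sites.
                text = zf.read(name).decode("utf-8", errors="replace")
                found.update(CREATE_TABLE.findall(text))
    return found


def missing_tables(zip_path: Path, expected: Iterable[str]) -> Set[str]:
    """Expected table names that no CREATE TABLE statement in the dump mentions."""
    return set(expected) - tables_in_dump(zip_path)
```

For a WordPress site with the default table prefix, calling `missing_tables(path, ["wp_posts", "wp_options", "wp_users"])` should return an empty set. The check looks at table names only, so tables with the same name in different databases are not distinguished.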
I’m contemplating containerization so that I only have to deal with big self-contained chunks, though (with my current setup and limited experience of containers) that is a daunting option. I’ve considered paying someone else but, as a lot of the reason for doing this is not just the ability to control my own systems but to learn how to do so (I hate teaching things when my own skills are rusty), that’s a bit of a cop-out. I am vaguely wondering about setting up a cluster, though that’s a great deal more effort and only protects against a limited, if important, range of problems (it would have helped here, because I would have switched off mirroring before doing this, and could have reversed it when things went wrong). I welcome any thoughts anyone might have on the subject! Any rock solid (Linux) backup tools that are more straightforward than Bacula? Any smart strategies for keeping systems safe without major effort or skill needed?