Crumpled paper

Drupal 7 introduced the brilliant feature of letting users cancel their own account and with it various options for what to do with content they've created when they are cancelled. One of these options is to:

Delete the account and its content.

Which can prove somewhat problematic if used incorrectly.

You see, Drupal is very good at the latter part: deleting all the content created by the user. It's not very good at warning someone that they are about to delete potentially a lot of important content.

The scenario

Let me set the scene for you. Someone had an account on a Drupal site and did a lot of work, making pages etc. Then they left the organisation. Someone else comes along and after a while thinks: I should clean up all these old user accounts and delete them, we don't need them any more.
Unfortunately they use the aforementioned Delete the account and its content option.

A few days pass and then they notice that the cookie policy page has gone missing. And they are sure that the FAQ section had more than 3 questions in it.
Oh dear.

They now face a serious problem. They have two 'easy' options to resolve it:

  1. Restore a database backup from before they deleted the user to recover all the lost content.
  2. Attempt to manually re-create all the content that was deleted.

However, they've been using the site in the interim and have changed lots of content. So have other users of the site. They can't simply restore a database backup from before all the content was deleted because they'd lose all the changes since then. But they also size up the volumes of missing content, and they simply aren't sure what content has gone missing, but know that it's hundreds of pages. Also the references between content have been broken, content that still exists on the site is trying to reference content that isn't there. So now not only do they need to re-create content but they have to go around fixing all the other site content that references that content. Oh my.

The third option

There is another way:

  1. Automatically re-create all the content that was deleted.

But how?

If you've got a decent backup from before the deletion happened then you contact your friendly ComputerMinds and we'll help you out by following something along the lines of the below. If you don't have a decent backup, then you're toast: Learn your lesson and start making backups of your data that you can restore from!

But you've got that backup, right? Ideally from as close as possible to, but not after, the account and content being deleted. So let's see what you/we do with it:

We're going to repeat the deletion and work out how to put it all back.

Begin by restoring the code, files and database from your backup to a development machine.
Load up the site in your browser and get ready to perform the exact same operation that caused the problem in the first place, but don't perform it yet!

Now, identify tables that contain changes that you don't really care about, the more the merrier. I'm thinking the watchdog table, any cache_* tables etc. You might need expert knowledge of the site to make this list as long as possible, it'll help later because you can really cut down the amount of noise and work you'll have to do later.

Once you've done that you want to make a 'pre-delete' database dump. Something like this:

drush sql-dump --structure-tables-list='sessions,cache,watchdog' > pre-delete.sql

Now, go back to your browser and cancel the account in the same way that was done before, so: Delete the account and its content.

Once the deletion has happened we want to run the same drush command as before, but save the results to another file.

drush sql-dump --structure-tables-list='sessions,cache,watchdog' > post-delete.sql

Now we essentially have two database snapshots, the difference between the two is all the content that was deleted. So we'll aim to produce a set of SQL queries to restore all that to the production database.

I had very mixed results with trying to get two MySQL dump files that would diff easily in a way that would leave the correct INSERT/UPDATE statements to put all the content back. Comparing the two dump files pre-delete.sql and post-delete.sql directly just didn't seem to work.

Percona to the rescue!

There's a tool in the Percona suite called pt-table-sync that will diff two databases and produce a set of SQL statements that would make the data consistent between the two, i.e. the SQL 'diff'.

There's a final wrinkle that means that you actually need another database server at this point, because pt-table-sync can only sync from one server to another, not between two databases on the same server. However, in the age of Vagrant or Docker getting multiple MySQL servers running on your machine is no big issue. I'm going to suppose you have two database servers running on ports 3306 and 3307 on your local machine.

Restore each of the SQL dump files from before to an identically named database on the servers respectively. Then you can get pt-table-sync to produce the magic:

pt-table-sync --print --databases=db_name h=127.0.0.1,P=3306 h=127.0.0.1,P=3307 > content-restore.sql

To make the diff go the right 'way' make sure the server with the post-delete.sql file is listed first in the command line. And you may need to adjust the command to get it to connect to your servers correctly.

Once you've done that content-restore.sql should contain a set of SQL commands that you could run on the production server to restore all the deleted content. However, I'd recommend doing one final manual look through the file and making sure that nothing is going to run against tables that don't really matter or that can't be recovered in other ways.
It's a text file so review it line by line and understand what each line is going to do and make sure they are the expected changes!

Once you've done all that you can execute the content-restore.sql file on your production server and that should restore everything that was deleted from the database!

Wrap up

So we've done this twice now, for different clients. We were happy that we were able to recover their content and not force them to either lose all other changes made or have to re-create a lot of pages.
We learnt so much the first time we did this, that the second time it was actually a fairly smooth process that didn't take very long at all despite having to restore thousands of pieces of content. We've also taken steps to stop people from using this particularly dangerous option when cancelling a users account.

Obviously all of the above relies on having backups of your database, and being able to retrieve a point-in-time, not just the 'latest' one. If you don't have this in place already, go now and get that sorted!
If you have backups, maybe bookmark this page so that if you ever need to recover a large amount of accidentally deleted content you'll know a (fairly) easy way that works well.