An Unusual Postgres Upgrade

At the end of 2020 we decided to migrate from postgres 9.6 to postgres 12 at my company. We had been talking about it since 2018, but since everything was working we kept it running. The long wait was mostly because our postgres 9.6 had been compiled with the flag --disable-integer-datetimes, and thus we couldn't use the pg_upgrade tool. No idea who decided to do that; it was way before I started working there, and very likely it was compiled that way to keep upgrades to newer versions simpler, until it wasn't. The initial idea was to migrate to postgres 10 first, but I decided to go further and started my tests directly with postgres 12.

The first step was to try tools like Slony and pglogical, but we had no success with them. The only viable option was to perform a dump + restore, which would mean downtime. That was acceptable under our SLA, since we notified all the customers in advance.

Spawning a new server from the latest snapshot

In order to test the upgrade we spawned another server from the latest snapshot. NEVER test such things on your production database! With that, we had our 9.6 server ready. We also set up another clean server with postgres 12; we decided not to run different postgres versions on the same machine to reduce the chance of mistakes.

The idea was very simple: dump the database on the 9.6 machine; send it to the 12 machine; restore the dump. However, the database was 640GB and under normal circumstances that would take a lot of time, especially the transfer step. So I came up with the idea of using an EBS volume that would be attached to the source machine to store the dump. Once the dump was ready we would detach the volume and attach it to the target machine :noice:!

Then we realized that standard EBS volumes are slow and we needed a boost, so we decided to use an io1 volume with 5000 provisioned IOPS to improve the speed. And now we were ready to test it!
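
For reference, creating and attaching such a volume with the AWS CLI looks roughly like this. It's only a sketch: the size, availability zone, volume ID, instance ID and device name are placeholders you'd replace with your own.

aws ec2 create-volume --volume-type io1 --iops 5000 --size 1000 --availability-zone us-east-1a # create the io1 volume in the same AZ as the source instance
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0123456789abcdef0 --device /dev/sdf # attach it; NVMe instances expose it as /dev/nvmeXn1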

Dumping the database

The dump process is pretty straightforward. First we attached the new volume to the instance and mounted it at /pg_dump.

sudo mkdir /pg_dump
sudo mount /dev/nvme2n1p1 /pg_dump
sudo chown -R postgres:postgres /pg_dump # grant permissions to the postgres user that will perform the dump
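
One detail worth mentioning: a brand-new EBS volume has no filesystem yet, so if that's your case you need to create one before the first mount. A sketch, assuming you format the whole device with ext4 (the mount command above uses a partition, /dev/nvme2n1p1, so adjust the device name to whatever lsblk shows):

sudo lsblk # identify the device name of the new volume
sudo mkfs.ext4 /dev/nvme2n1 # only on a brand-new, empty volume!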

In order to preserve everything, we first dump the globals, since pg_dump does not save those objects (roles, tablespaces and other properties). This step is very quick. Then we dump the data, and to speed things up we used all the available processor cores, in our case 16. In order to dump in parallel we need to use the directory format, so each table is dumped to a different file.

sudo su - postgres # switch to postgres user or any other user that will perform the dump
pg_dumpall -g --file=/pg_dump/globals.sql # first we dump the globals
pg_dump -j 16 -Fd --file=/pg_dump/data.dump database_name # then we dump the data
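
Before going any further, it's a good idea to sanity-check the dump. pg_restore can list the archive's table of contents without restoring anything:

pg_restore -l /pg_dump/data.dump # print the dump's table of contents as a quick sanity check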

At this point you might face some issues, especially if your database stores old data. In our case we found some corrupted entries and, even though the process didn't fail, we had to fix them before the restore. Every time I hit a warning or error I stopped the process, fixed the data and started over. This is important because sometimes one error can shadow another, and if you only try to fix them at the end you may get a surprise during the restore. Jot down every error you hit: you will need to fix them before the real dump on the production database!

Sometimes it's possible to fix that data, sometimes it's not, and you have to decide what to do in order to continue. In our case we decided to delete the corrupted entries, since they were stored in a table that was meant to hold temporary data only. After that we had to reindex the TOAST table and the table itself. We also ran a VACUUM FULL VERBOSE ANALYZE on that table, but I'm not sure it was needed at all; I did it just to be sure the data was ok.
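
As a rough sketch of that cleanup (the table name, the ids of the corrupted rows and the TOAST table name are hypothetical; yours will differ):

DELETE FROM temporary_data WHERE id IN (123, 456); -- drop the corrupted entries; acceptable here because the table only held temporary data
REINDEX TABLE pg_toast.pg_toast_12345;             -- rebuild the TOAST table's index
REINDEX TABLE temporary_data;                      -- rebuild the table's own indexes
VACUUM FULL VERBOSE ANALYZE temporary_data;        -- probably overkill, but rewrites the table and refreshes statistics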

Once your data is ok, you are ready to test the restore process, but first unmount and detach the disk.

sudo umount /pg_dump
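
Detaching can be done from the AWS console or with the CLI; the volume ID below is a placeholder:

aws ec2 detach-volume --volume-id vol-0123456789abcdef0 # only after umount has finished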

Restoring the database

Before restoring the database we need to prepare the new server. We installed the desired version, postgres 12, and configured postgresql.conf. I found this old post that was really helpful, and my config ended up as follows:

shared_buffers = 16GB # 1/2 of what you'd usually set (postgres recommends 25% of total memory, so 32GB during normal operation on our 128GB machine; 16GB during pg_restore)
maintenance_work_mem = 2GB
fsync = off
synchronous_commit = off
wal_level = minimal
full_page_writes = off
wal_buffers = 64MB
max_wal_senders = 0
wal_keep_segments = 0
archive_mode = off
autovacuum = off
work_mem = 32MB

Don’t forget to restart postgres after changing the config file and don’t forget to write down the previous values. Once the server is running with the new configuration, attach and mount the disk with the dump and then switch to the postgres user.
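
Restarting the service can be as simple as the line below, though this is a sketch assuming a systemd-managed install; the unit name varies between distributions and packages:

sudo systemctl restart postgresql # on Debian/Ubuntu the unit may be called postgresql@12-main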

sudo mkdir /pg_dump
sudo mount /dev/nvme2n1p1 /pg_dump
sudo su - postgres

Now we need to restore the globals first, so that the roles and other global configurations are created. In our case we also had to create the database again before loading the dump.

psql < /pg_dump/globals.sql
psql # enter the postgres console

Also, create the database that will receive the data and make your application user its owner, if necessary.

CREATE DATABASE database_name WITH TEMPLATE = template0;
ALTER DATABASE database_name OWNER TO app_user; -- use your application's role here

Now the time has come! Let’s restore the database!

pg_restore -j 16 --disable-triggers -Fd -d database_name /pg_dump/data.dump

We used a flag to disable triggers, which also disables any referential integrity checks like foreign keys during the data load. This gave us a boost, since our data was supposed to be intact and no validation was required.

While restoring, you may notice some warnings or errors. We had a few, probably because of outdated indexes, and we had to fix them on the fly.

Once the restore process was finished and we had all the necessary data, we put together our execution plan for the production upgrade. These were the key learnings from this exercise:

  • Keep track of the time needed to perform the dump and the restore. Use the time command to measure it (see the sketch after this list);
  • Double-check all the services that use the database. In our case there were 3 different services, and we had to stop them during the upgrade;
  • If you are using a replica database, and you should, you will need to create a new snapshot from the newly restored database;
  • Once you have the estimated time needed to perform the upgrade, double it. It's better to have enough room to deal with unexpected issues;
  • Notify your customers in advance, according to your SLA;
  • Schedule a maintenance window when the system has low activity. We decided to perform the upgrade on a Saturday morning.
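
Wrapping the long-running commands with time is enough to get those numbers, for example:

time pg_dump -j 16 -Fd --file=/pg_dump/data.dump database_name # prints the wall-clock duration when the dump finishes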

Production Execution

In this section I'll detail my experience executing the postgres upgrade in production.

We started the upgrade process at 7:30am and put our systems into maintenance mode. We stopped our services so no new data would be inserted during the dump, and we double-checked everything before moving on. At 8:05am we started the dump process; it finished at 9:35am with no warnings or errors.

We attached the dump disk to the new server and started the restore process. We got some warnings/errors while doing it and fixed them on the fly; the issues were related to duplicated ids or missing references on very old records. The restore process finished at 11:36am, including the prewarm and analyze of some big tables. We then configured postgres with the optimal settings again, overriding the restore settings we had previously set. At 12:05pm all the services were running. While creating the replica server we faced an issue that took me some time to figure out: the streaming replication setup had changed from 9.6 to 12. In postgres 9.6 it was enough to set the main server to accept all connections from the replica in pg_hba.conf, but in postgres 12 you need to explicitly tell the main server to accept replication connections. That said, read the changelog carefully!
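
In practice that means a dedicated replication entry in pg_hba.conf on the main server, something along these lines (the role name and the replica's address are placeholders):

host  replication  replicator  10.0.1.5/32  md5  # allow the replica to open replication connections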

Now everything is good, right? No. I had created the snapshot of the main server while it was still configured with wal_level = minimal, and with that setting the WAL doesn't carry the information needed for replication. I had to create a new snapshot, and then I was able to properly spawn the new replica server. Remember the extra time I mentioned before? 😆
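
So, before taking the snapshot that will seed the replica, make sure the restore-time settings have been reverted to your normal ones. A sketch of the values we went back to; most of them are the postgres 12 defaults, plus the 25%-of-RAM shared_buffers mentioned earlier, so your numbers may differ:

shared_buffers = 32GB # back to ~25% of total memory
fsync = on
synchronous_commit = on
wal_level = replica # required for streaming replication and WAL archiving
full_page_writes = on
max_wal_senders = 10
autovacuum = on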