Matrix Booking - disruption to service

Incident Report for Matrix Booking

Postmortem

During the release process on the afternoon of Wednesday, June 15th 2022, we encountered an issue with the Production system that resulted in Matrix Booking’s core system being unavailable for 1 hour. This impacted all applications using the core booking platform.

Fault

  • During the normal release process, the deployment attempted an update to the database schema. This caused lock contention on the database, which blocked all reads and writes.
  • It was also discovered that the automated deployment process had successfully added the new overnight job to the scheduler; however, this job should have been removed when we reverted to the previous version of the software.

Recovery

The system was restarted and rolled back to the previously deployed release.

Root Cause

The ALTER TABLE statement required a table lock on the Organisation table. Because long-running processes were in progress, the statement had to wait for that lock, and while it waited it blocked all other reads and writes on the table. The Organisation table is involved in so many interactions with the database that this blocking caused the incident.
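The queueing effect described above can be illustrated with a small simulation. This is a simplified sketch, not a model of the actual database engine (which this report does not name): it assumes lock requests are served first-come, first-served, treats ordinary reads and writes as "shared" with respect to the table's schema, and treats the ALTER TABLE as "exclusive". The request names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    name: str
    mode: str  # "shared" (ordinary reads/writes) or "exclusive" (schema change)


def compatible(a: str, b: str) -> bool:
    # Only two shared requests may hold the table at the same time.
    return a == "shared" and b == "shared"


def grant(held: list[Request], arrivals: list[Request]) -> tuple[list[Request], list[Request]]:
    """Process arrivals in order; return (granted, waiting) under FIFO queueing."""
    granted = list(held)
    waiting: list[Request] = []
    for req in arrivals:
        # A new request queues behind earlier waiters as well as current holders.
        blockers = granted + waiting
        if all(compatible(req.mode, other.mode) for other in blockers):
            granted.append(req)
        else:
            waiting.append(req)
    return granted, waiting


# A long-running process already holds the table when the schema change arrives.
held = [Request("long_running_process", "shared")]
arrivals = [
    Request("alter_organisation_table", "exclusive"),
    Request("booking_read", "shared"),
    Request("booking_write", "shared"),
]
granted, waiting = grant(held, arrivals)
print("granted:", [r.name for r in granted])
print("waiting:", [r.name for r in waiting])
```

In this model the schema change waits for the long-running process, and every read or write that arrives afterwards waits behind the schema change, even though it would have been compatible with the lock currently held.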

Because the Quartz job scheduler had already been updated before the deployment failed, subsequent errors were encountered the following day.
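One general way to avoid leaving a partial change behind is to pair each deployment step with an undo action and run the undo actions in reverse order when a later step fails. The sketch below illustrates that idea only; it is not the Matrix Booking deployment tooling and does not use the Quartz API. The step names, the `scheduled_jobs` set, and the simulated failure are all hypothetical.

```python
from typing import Callable

# (name, apply, undo): every step carries its own compensating action.
Step = tuple[str, Callable[[], None], Callable[[], None]]


def deploy(steps: list[Step]) -> list[str]:
    """Apply steps in order; on failure, undo the completed steps in reverse."""
    log: list[str] = []
    done: list[tuple[str, Callable[[], None]]] = []
    try:
        for name, apply, undo in steps:
            apply()
            done.append((name, undo))
            log.append(f"applied {name}")
    except Exception as exc:
        log.append(f"failed: {exc}")
        for name, undo in reversed(done):
            undo()
            log.append(f"undid {name}")
    return log


# Hypothetical stand-in for the scheduler's job store.
scheduled_jobs: set[str] = set()


def add_job() -> None:
    scheduled_jobs.add("new_overnight_job")


def remove_job() -> None:
    scheduled_jobs.discard("new_overnight_job")


def alter_schema() -> None:
    # Simulated failure of the schema change.
    raise TimeoutError("schema change timed out waiting for table lock")


def revert_schema() -> None:
    pass  # nothing to revert in this sketch


log = deploy([
    ("register overnight job", add_job, remove_job),
    ("alter schema", alter_schema, revert_schema),
])
print(log)
print(scheduled_jobs)
```

With this structure, the scheduler entry added in the first step is removed when the schema step fails, so the job store matches the release that is actually running.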

Posted Jun 24, 2022 - 09:20 UTC

Resolved

We are pleased to inform you that the Matrix Booking applications are now all fully restored and running as usual as of 17:37 BST. All applications should automatically recover and reconnect to Matrix Booking, but if they do not, please raise a ticket with the support desk at https://support.matrixbooking.com
Apologies again for the inconvenience. We will provide more information once it is available.
Posted Jun 15, 2022 - 16:41 UTC

Investigating

We are currently experiencing an issue with the Matrix Booking database and, as a result, you may no longer be able to access the system. We are investigating this urgently and will post an update as soon as we have one. This issue started at approximately 16:45 (BST) and is affecting all the applications listed below.
Posted Jun 15, 2022 - 16:16 UTC
This incident affected: Web applications, Mobile applications, Email delivery service, API service, Matrix Welcome apps (inc Digital Signage), Support portal, Public website, and Matrix Booking - Sensors.