During the release process on the afternoon of Wednesday June 15th 2022, we encountered an issue with the Production system that resulted in Matrix Booking’s core system being unavailable for 1 hour. This impacted all applications using the core booking platform.
Fault
Recovery
The system was restarted and rolled back to the previously deployed release.
Root Cause
The ALTER TABLE statement required a table lock on the Organisation table. Because long-running processes were already using the table, the statement had to wait for the lock, and while it waited it blocked all other reads and writes to that table. The Organisation table is involved in so many interactions with the database that this blocking was the cause of the incident.
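The blocking behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: it assumes the database grants table locks in arrival order (a fair queue), and the class `TableLockQueue` and the session names are hypothetical, not part of the Matrix Booking system or any database API.

```python
from collections import deque


class TableLockQueue:
    """Toy model of a fair (FIFO) table lock queue. Hypothetical, not a real database API."""

    def __init__(self):
        self.holders = []       # (name, mode) pairs currently holding the lock
        self.waiting = deque()  # (name, mode) pairs queued in arrival order

    def _compatible(self, mode):
        # Shared locks coexist with each other; an exclusive lock coexists with nothing.
        if mode == "exclusive":
            return not self.holders
        return all(held == "shared" for _, held in self.holders)

    def request(self, name, mode):
        # Grant only if nothing is queued ahead and the mode fits the current holders.
        if not self.waiting and self._compatible(mode):
            self.holders.append((name, mode))
            return "granted"
        self.waiting.append((name, mode))
        return "waiting"

    def release(self, name):
        self.holders = [(n, m) for n, m in self.holders if n != name]
        # Grant queued requests in order until one is incompatible.
        while self.waiting and self._compatible(self.waiting[0][1]):
            self.holders.append(self.waiting.popleft())


locks = TableLockQueue()
events = [
    locks.request("long_running_process", "shared"),  # already using the table
    locks.request("alter_table", "exclusive"),        # must wait for the process
    locks.request("booking_read", "shared"),          # queues behind the ALTER
]
print(events)

# Once the long-running process finishes, the ALTER runs and the read still waits.
locks.release("long_running_process")
after_release = (list(locks.holders), list(locks.waiting))
print(after_release)
```

In this model the ordinary read is compatible with the long-running process, yet it still waits because the exclusive request is queued ahead of it. That is the mechanism by which a single waiting schema change can stall unrelated traffic on a heavily used table.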
Because the update to the Quartz job scheduler had already taken place before the deployment failed, subsequent errors were encountered the following day.