Brief loss of service for some customers and reduced performance for others

Incident Report for Matrix Booking

Postmortem

On the afternoon of the 9th of August, an infrastructure change was applied to the Matrix Booking backend, which resulted in a temporary reduction in service availability and performance while the system corrected itself.

Fault

The infrastructure change, although fully tested, resulted in behaviour that had not been accounted for.

Recovery

The Matrix Booking platform recovered on its own.

Root Cause

The change to the application load balancer caused a reset of the target groups, which temporarily disconnected the load balancer from the application pods.

Follow-up

We have made changes to our internal processes to safeguard against this happening again.

Timeline

15:52 - Application was unavailable

15:54 - System was available but slow

16:02 - System back to normal

Posted Aug 15, 2023 - 07:58 UTC

Resolved

After a scheduled change, some customers experienced very slow performance. This was followed by a very brief period during which the application was unavailable.
This impacted all the major applications using Matrix Booking.

The Support portal, Marketing website and Sensor solutions continued working as normal.
Posted Aug 09, 2023 - 16:02 UTC