System unavailable during planned release

Incident Report for Matrix Booking

Postmortem

During an update to our EKS cluster, an issue arose that had not been encountered in any other environment. Specifically, our Production environment contained an unexpected condition within the Karpenter Custom Resource Definitions.

When the update was applied, an error was returned which resulted in nodes began disconnected from the Kubernetes cluster, causing backend pods to terminate unexpectedly.

This has now been resolved with a configuration change to the infrastructure..

Posted Feb 14, 2025 - 16:02 UTC

Resolved

During a scheduled update to the Matrix Booking application infrastructure, an issue arose which caused the system to be unavailable to some users for 9 minutes.
Posted Feb 14, 2025 - 14:59 UTC