System Outage
Incident Report for Interseller
Postmortem

At around 15:31 EDT on May 11, 2021, we began receiving alerts that our service was unresponsive. Our engineering team was notified shortly afterwards and began investigating.

Service was restored automatically at 15:36 EDT by our self-healing infrastructure. However, at 16:01 EDT, the service became unresponsive again. Once again, service was restored automatically.

Our investigation isolated the cause to the code that calculates the approximate time when a message will next be sent to a contact. When the input to this calculation was set to a high value, our web servers computed an infeasible time and eventually crashed. When the request was repeated, our entire web server fleet became unresponsive, and the service as a whole responded with 503 errors.
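As an illustration only, the kind of guard that prevents this class of failure can be sketched as below. This is not our production code: the function names, the day-based delay, and the `MAX_DELAY_DAYS` limit are all hypothetical and chosen just to show the idea of validating the input before computing a send time.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical upper bound on how far ahead a message may be scheduled.
MAX_DELAY_DAYS = 365


def next_send_time(now: datetime, delay_days: int) -> datetime:
    """Return the next send time, rejecting delays outside a sane range."""
    if not 0 <= delay_days <= MAX_DELAY_DAYS:
        # Fail fast with a clear error instead of computing an unusable time.
        raise ValueError(
            f"delay_days must be between 0 and {MAX_DELAY_DAYS}, got {delay_days}"
        )
    return now + timedelta(days=delay_days)


def safe_next_send_time(now: datetime, delay_days: int) -> Optional[datetime]:
    """Wrapper for request handlers: return None rather than raising."""
    try:
        return next_send_time(now, delay_days)
    except ValueError:
        return None


if __name__ == "__main__":
    now = datetime(2021, 5, 11, 15, 31, tzinfo=timezone.utc)
    print(safe_next_send_time(now, 3))
    print(safe_next_send_time(now, 10**12))
```

With a check like this, a request handler could map the `None` result to a client error for that single request, so that a repeated bad request does not take down a whole server.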

This issue was first observed on May 10, 2021 at approximately 10:07 EDT and is also related to a previous incident on May 10 at 18:25 EDT.

Our team patched that code and released it into production at around 18:07 EDT.

We know that outages can cause frustration during the work day. Please know that our team is working diligently to investigate how we can better handle these issues and prevent them from occurring in the future.

Posted May 11, 2021 - 18:26 EDT

Resolved
This incident has been resolved.
Posted May 11, 2021 - 18:12 EDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted May 11, 2021 - 18:07 EDT
Update
A fix is being deployed to prevent further issues.
Posted May 11, 2021 - 17:44 EDT
Update
We are continuing to work on a fix for this issue.
Posted May 11, 2021 - 17:21 EDT
Identified
We've found the root cause of the issue and we're working on a fix.
Posted May 11, 2021 - 17:05 EDT
Update
Service is continuing to drop at times. We're continuing to investigate.
Posted May 11, 2021 - 17:00 EDT
Update
We are continuing to investigate this issue.
Posted May 11, 2021 - 16:35 EDT
Update
We are continuing to investigate this issue.
Posted May 11, 2021 - 16:17 EDT
Update
We are continuing to investigate this issue.
Posted May 11, 2021 - 15:56 EDT
Update
Service has been restored and we're taking a look into the underlying cause. We'll update this page every 15 minutes.
Posted May 11, 2021 - 15:41 EDT
Investigating
We are currently investigating this issue.
Posted May 11, 2021 - 15:31 EDT
This incident affected: Website & API.