Investigating Data Connectivity Degraded Service
Incident Report for Hologram
Postmortem

Summary

On 2019/04/23 00:20 UTC Hologram internal network monitors detected an unplanned outage experienced by one of our global packet gateway providers while they were engaged in routine maintenance. As a result, signalling traffic failed for Hologram and other providers worldwide, and affected SIMs were unable to attach to towers and open new sessions at the time. The outage was resolved by the upstream provider and service was restored to full operation as of 02:00 UTC.

Cause of Failure

At 2019/04/23 00:00 UTC the gateway provider commenced regular maintenance, which involved an offload migration of signalling traffic. Under normal maintenance or fault conditions, this occurs seamlessly, without affecting existing or new connections. During this upgrade, changes being made to offline signalling services erroneously also affected their online counterparts. As a result, HLR/HSS services became unusable and signalling requests failed.

Resolution and Recovery

Automated monitoring systems at Hologram and at the upstream provider both reported failures occurring for signalling requests. These reports of service disruption were escalated to Hologram's engineers, CTO, and Product Lead. Upstream providers were notified and resolution efforts were coordinated, and Hologram Status Page was updated. Upstream provider identified and fixed the issue to restore signalling traffic as of 02:00 UTC. Affected devices, including Hologram automated monitoring devices, automatically reconnected successfully following resolution. After internal monitoring systems and tests were confirmed as clear, Hologram team verified normal system performance before clearing incident status as Resolved. Upstream provider has since coordinated with equipment vendor(s) and has successfully tested a revised upgrade plan to ensure isolation in the event of any possible upgrade interruption, and has communicated changes to Hologram.

Posted May 01, 2019 - 02:30 UTC

Resolved
Incident resolved and network connectivity restored to normal service levels. Hologram team has scheduled debrief with upstream network provider on 4-23-19 and will report post-mortem when available here.
Posted Apr 23, 2019 - 03:47 UTC
Monitoring
Network connectivity health monitoring at normal service levels. Hologram team continuing to monitor.
Posted Apr 23, 2019 - 02:44 UTC
Update
Network health restoring for certain affected devices. Hologram team continuing to monitor for all clear and update from network partner resolution.
Posted Apr 23, 2019 - 01:50 UTC
Identified
Degraded service affecting certain customer devices. Network team continuing to investigate and isolate. Hologram has already escalated investigation to network partners who have team dispatched to resolve.
Posted Apr 23, 2019 - 01:03 UTC
Investigating
We are currently investigating reports of degraded service for network connectivity. Network team actively investigating impact.
Posted Apr 23, 2019 - 00:40 UTC
This incident affected: Cellular Networking (Global Cellular Data Network, SMS over cellular - Device Terminated).