Reports of high packet loss

Incident Report for Hologram

Postmortem

SUMMARY

On 2019/01/07 at 20:30 UTC, a Distributed Denial of Service (DDoS) attack degraded service at a data center operated by one of our global packet gateway providers. The resulting traffic saturation caused elevated packet loss, which degraded service for certain Hologram customers. The upstream provider resolved the outage, and service was restored to full operation as of 2019/01/08 03:00 UTC.

CAUSE OF FAILURE

The DDoS attack began on 2019/01/07 at 20:30 UTC at a data center operated by one of our global packet gateway providers, and saturated traffic between a core network switch and a firewall. Packet loss was observed to spike to more than 80% during the attack. Because the failure was tied to specific IP addresses rather than to individual systems or hardware, the redundant systems that would otherwise fail over could not address the root cause (an external attack, not an internal failure).

RESOLUTION AND RECOVERY

Reports of degraded service were escalated to the Hologram engineering team, and an initial investigation of network health indicated elevated packet loss on certain network nodes. Upstream providers were notified, resolution efforts were coordinated, and the Hologram Status Page incident was updated. The upstream provider identified the DDoS attack at a firewall and gateway site, and restored data service as of 2019/01/08 03:00 UTC by diverting affected sessions and connections away from the targeted IP addresses and onto a different site. Once internal monitoring systems and tests came back clear, the Hologram team confirmed normal system performance and marked the incident as Resolved.

Posted Jan 15, 2019 - 19:58 UTC

Resolved

Issue resolved by network provider. The issue degraded service for providers as well as for Hologram's link to the network gateway. RFO to follow and be posted here when available.

Posted Jan 08, 2019 - 03:26 UTC

Monitoring

Provider engineering team is monitoring the resolution of the network issue. Packet loss rates on the network are improving toward normal levels.

Posted Jan 08, 2019 - 03:04 UTC

Identified

An upstream connection is confirmed to be experiencing degraded performance. Upstream engineers are currently working to resolve the issue. A subset of our customers are affected.

Posted Jan 07, 2019 - 23:33 UTC

Update

Issue escalated and now under active investigation with gateway provider engineering team.

Posted Jan 07, 2019 - 22:28 UTC

Investigating

Internal monitoring identified unexpectedly high packet loss rates. We are currently investigating and will update here.

Posted Jan 07, 2019 - 21:50 UTC

This incident affected: Cellular Networking (Global Cellular Data Network).