At 11:15am CST on 4/15/17, one of our global packet gateway providers experienced a critical failure that prevented SIMs from registering new packet-switched (data) connections for the duration of the incident. Existing active cellular sessions were unaffected. The upstream provider resolved the outage, and service was fully restored as of 12:35pm CST.
Cause of Failure
At 11:15am CST on 4/15/17, the gateway provider's primary registration server failed after exhausting its memory while buffering requests that had become undeliverable following a connection disruption. The secondary registration server then failed in the same way, with 1.8M requests waiting to be synced.
Resolution and Recovery
Initial reports of service disruption were escalated to the Hologram CTO and Engineering Lead. After internal monitoring systems and tests were confirmed clear, upstream providers were contacted, and the Hologram engineering team reproduced the outage conditions with new connection attempts, isolating the issue to registration/authentication. At 12:11pm CST, the gateway provider notified the Hologram team that it was investigating the incident, and the provider resolved it at 12:35pm CST. The Hologram team then confirmed full system operation before setting the incident status to Monitoring.
Fixes and Next Steps
1. Follow-up call with gateway provider management (Complete 4/18/17)
2. Technical call with provider engineering team (Next week)
3. Additional Hologram monitoring and watchdogs for gateway connections (In progress)
Apr 18, 18:45 CDT
We are investigating an incident that occurred briefly early this Saturday afternoon, in which some users experienced degraded performance when their devices attempted to attach (connect) to a tower. Already-connected devices were not affected. Customers should no longer be experiencing difficulty, but we are continuing to monitor with network core engineers to ensure there are no further issues, and we are looking into the underlying cause. If you notice any issue, please contact Hologram Support.
Apr 15, 12:57 CDT