On Jan 27, 2010, at noon, our servers began experiencing unexpected load. When we dug into the issue, we discovered that a remote attack against our service was causing a fault in one of the core modules on our primary server. We were forced to perform emergency maintenance that brought down our hosting service for a number of hours.
Our hosting service has been growing steadily for the last 12 months, and our infrastructure has handled this growth with ease. On Thursday, when the attack occurred, the APC module that handles in-memory caching began consuming all free memory on the server, leaving it unresponsive to new requests. We were forced to shut down our core web server immediately to diagnose the problem.
The quick fix was to increase the memory on that specific server so it could handle the extra load, and to decrease usage of our APC module across all sites. It took so long to get back online because resizing a server with our hosting provider took longer than expected.
We are now updating our infrastructure so that this problem won’t happen again. We will be implementing these changes in the coming weeks, and the associated downtime will be minimal. We will notify all hosting customers as the maintenance window for the upgrades approaches.
We sincerely apologize for this unexpected downtime, and we appreciate your understanding as we work to prevent future attacks from causing similar problems.