The story of Blogdrive Data Center
As always, Michael makes up good stories about his experiences working on Blogdrive. I'm afraid it might disappear one day like the other stories, so I decided to post it here too. You can head over to the forum and read them all there. This is also an update on their progress in moving data centers.
(Short version)
I apologize for the downtime we've been experiencing. Things are looking quite good now. After performing a system upgrade to power hardware on 8/26/06, we were faced with an unexpected challenge that caught us off guard. It took a while to identify the cause, which is why it was difficult, if not impossible, to give an ETA to Blogdrive members during this time. We did attempt to place info on the home page; however, this was not always viewable. Thank you for your patience. We will do our best to make certain this does not happen again.
(Long version)
The day the world went away
The events of 8/26/06 explained:
It was morning. Another sunny day here in Southern California, filled with magic and wonder. Nothing out of the norm. Little did we know tragedy was just around the corner for Blogdrive.
We were adding some hardware to the Blogdrive array of servers: line conditioners, rebooters, etc. We had to shut down all of Blogdrive to do this, which is not unusual, as we shut down servers often to perform repairs and such. Shutting down and starting up servers is done in a specific sequence. We are schooled in this and did not anticipate any difficulties. However, after the reboot something went terribly wrong.
Tim is our main server programmer. Besides being a great programmer, he is also an awesome drummer. (He insisted that I put the word "awesome" in there.) We have the servers set up to send Tim pages via his cell phone when they are in trouble. They were in trouble. Tim's cell phone sounded like an unstoppable mini Star Wars battle in his pocket.
The "data center" is where all the Blogdrive servers are housed (there is also an outside backup location). The data center occupies an entire floor of a very large building in Los Angeles. The security there is very tight. There are guards with monitors, a security desk, an area to place your entire hand for identification and several cards and keys to open doors and elevators. There are rows of racks and cages. It is kept very cold inside to keep the thousands of servers safe from heat damage. It is also very loud in the data center due to the monster ACs and the thousands of server fans running. Tim is often heard yelling "Speak up!" and "Why is everyjuan friggin' whispering on me?!"
Many of the Blogdrive servers are set up in a redundant array, each with multiple drives duplicating data and exchanging load and requests. This way, if a drive fails or explodes or moves on to the next life, we can change it out for a new juan, re-sync it to the others and begin replicating data again. This means no data is lost and no downtime is experienced.
Needless to say, when the pages started flooding in we all panicked; in a calm, inwardly maintained, programmer-like panic. All of the servers started shutting down and rebooting on their own in a crazy Ghost in the Machine kinda way. Tim, with a link via computer, tried to intervene and connect a minimum of resources to gain some control of the situation. They would not have it. I, along with Jeff and Rick, started pulling servers out of the cages and connecting them to other power sources. I held juan of them in my arms as it was convulsing. I looked over at the guys and screamed, "Is this how it ends? Is this how it ends?!" I set the dying server down and placed my hands to my head, slowly stepping back, then started screaming uncontrollably like a little girl. Or so they tell me.
It seemed there was nothing we could do to put an end to this madness. We checked everything: rebooters, switches, routers, etc. There was nothing. We then really started to panic, a "what do you mean we're out of coffee?!" kinda panic. We pulled all the servers offline. Blogdrive went silent.
We brought many of the servers home with us to repair any damage that occurred to the database and/or hardware. As I drove down the street I thought about all the servers I was carrying, the Blogdrivers and all their blogs, all their stories. I also thought about my driving skills, the fact that I was almost out of gas and had juan headlight out. Over the period of a few days we repaired damage and were able to put minimum systems online. We continued to work on the servers at my home. We slept when we could; me in my bedroom and Tim on the sofa, no funny business here. We assumed that we had experienced a strange power spike. We posted what we could to Blogdrive members. Although we did not have an ETA, after much thought we knew we were close to an answer.
After further inspection of the servers, Tim felt the answer must be in the power source; something fluctuating in power, something undetected by the other equipment. It was juan of the most improbable possibilities. It would mean that the actual power source coming to the Blogdrive cages in the data center was at fault. This is like blaming a wall outlet. Very unlikely.
We brought the servers back to the data center. With new meters we tested the power source to the cages. It was true! There were severe power fluctuations in the line. It apparently caused damage to the equipment used to protect us from such things. Quickly we re-routed the power to a clean source. We mounted the servers and began syncing them. They were very happy. Blogdrive went back online with full functionality in the early hours of 8/30/06. It will take a week or so to get everything completely up to snuff, but this should not show much influence on the system.
I'd like to apologize to anyjuan affected by this challenge. We had never experienced such an outage before. We will do our best to keep such events from ever happening again.
I'd like to thank the following people:
Veronica over at Fry's Electronics - You got it going on baby!
The folks at Dunkin' Donuts – Remember, it's all about the Maple Bars.
Tracy at the security desk – Thanks for locking me out twice!
My girlfriend Erin, who said – "Did you know there's a guy sleeping on our sofa?"
I'd also like to thank all the Modulators including JFZ, Ang, Anthony and Sinj for their support. You guys rock me!
Btw,
During the wee hours of the morning at the data center I saw ghosts roaming through the various server cages and hallways. I tried to track them down, you know, to speak with them, but they kept disappearing. They seemed to be dancing at times, wearing nice clothing and such. I was tired. Sounds like a Made For TV movie to me.
Michael
Blogdrive