MR SERVERS: Our Plans Started by: Squishy on Jan 12, '11 03:25

After our server crash, a few things became obvious. We were not properly prepared for this. The game has reached a level where downtime and loss of work are no longer excusable. We recognize this and take full responsibility for our downtime. We have outlined a plan to augment our hardware and backup practices so that it would take multiple failures in multiple areas for the game to go down the way it just did.

The server that failed had a single hard drive, which contained all of our data and backups. Offsite backups were done only on a weekly basis, to my home network. This is why we had to resort to a January 2nd, 17:30 backup rather than the 1.5-hour-old backup that died with the server.

The server we just installed contains a total of 14 hard drives: 2 are reserved for the OS and miscellaneous storage, 8 are used to host the DB (4 drives, each mirrored to its own backup drive), and the remaining 4 are designated as hot spares, so if any of the mirrored drives fails, a spare will instantly go live on its own.
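
To make the drive math concrete, here is a rough Python sketch of that layout. It is only an illustration: the drive names and the `DriveArray` class are made up for the example, and the real promotion of a spare happens in the storage controller, not in code like this.

```python
from dataclasses import dataclass


@dataclass
class MirroredPair:
    primary: str
    mirror: str


@dataclass
class DriveArray:
    pairs: list   # the 4 mirrored DB pairs (8 drives)
    spares: list  # the 4 standby spare drives

    def fail(self, drive):
        """Swap the next spare in for a failed drive in a mirrored pair."""
        for pair in self.pairs:
            if drive in (pair.primary, pair.mirror):
                if not self.spares:
                    raise RuntimeError("no spare drive left")
                spare = self.spares.pop(0)
                if pair.primary == drive:
                    # The surviving mirror takes over; the spare becomes the new mirror.
                    pair.primary, pair.mirror = pair.mirror, spare
                else:
                    pair.mirror = spare
                return spare
        raise ValueError(f"{drive} is not part of a mirrored pair")


def build_array():
    """Build the layout from the post: 2 OS drives, 4 mirrored pairs, 4 spares."""
    os_drives = ["os0", "os1"]  # hypothetical names
    pairs = [MirroredPair(f"db{i}", f"db{i}m") for i in range(4)]
    spares = [f"spare{i}" for i in range(4)]
    return os_drives, DriveArray(pairs, spares)


os_drives, array = build_array()
total = len(os_drives) + 2 * len(array.pairs) + len(array.spares)
print(total)  # 2 + 8 + 4 = 14
```

Each pair can lose one drive and keep running, and with 4 spares the array can absorb up to 4 such failures before anyone has to touch the hardware.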

This gives us a new level of redundancy as far as data goes; however, there are still a few single points of failure that can bring us down.

We will be repairing our dead server and reintroducing it into the datacenter to replicate the existing DB, which will keep a live, up-to-the-second copy of the database on a second server that can act as a failover if the first should fail. We will also be introducing multiple daily backups to two other locations outside of the datacenter for offsite storage.

If either of the two boxes goes down, the other will take its place with minimal downtime, and the game will be back up live as soon as the IPs can be switched out.
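
As a minimal sketch of that failover rule, assuming each box simply reports whether it is healthy: the `Server` class and `choose_active` function below are hypothetical names for illustration, and the actual switch is the IP change described above, not a script.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool


def choose_active(primary, replica):
    """Return the box that should serve the game, or None if both are down."""
    if primary.healthy:
        return primary
    if replica.healthy:
        # The replica holds the live copy of the DB, so it can take over.
        return replica
    return None


main_box = Server("main", healthy=False)       # hypothetical: the main box has died
repaired_box = Server("replica", healthy=True)
active = choose_active(main_box, repaired_box)
print(active.name if active else "game is down")
```

The point of the second box is the last branch: the game is only fully down when both servers fail at the same time.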

I want to take even further precautions, which involve buying an exact duplicate of the server that died, as well as three new drives: one to replace the dead one, one exact clone to go in a future server, and a ready-to-go hot swap that can be installed in only the time it takes to drive to the datacenter.

I am currently in negotiations with the datacenter we colo with, in the hope that we can greatly increase our current bandwidth allotment. That would allow us to do even more regular offsite backups, possibly including live replication to an offsite location, which would mean that we have a live, up-to-the-second copy of the DB at any given point in time.

With proper saving, we should be able to do this in a few months. Please see the (OOC's CREDIT SALE: Perks, Toys, Tylers) thread on how you can help.


Not entirely the same topic but closely related since you're talking about DR etc.

Do you use any kind of live-clone DB for releasing your software changes and regression testing? It has always come across as though any code changes or new features were introduced on the fly to the main game/server. That's obviously a glaring risk if you are serious about the game being at a level where downtime is no longer acceptable (I don't just mean server downtime, I mean functionality). I know running a parallel copy uses more resources and getting things thoroughly tested can be a pain, but knowing the MR community, I'm sure you could still have a sizeable chunk of users mucking around in any test environment for you.
