Monday, 1 August 2011

Developing a Messenger Client - Part 1

A Messenger Client
Are you looking to develop a messenger client for your company, or for friends? A client and server seem simple from the outside, but once the service becomes popular, there are some things you really need to consider; otherwise your server WILL become overloaded. Take my word for it: it happened to me.

At Baselan 20, lanHUB was used for the first time as the official tournament software. It had lots of the bells and whistles that gamers and tournament organizers wanted: organizers had a complete solution for organizing and notifying attendees, and gamers had a "player" list they could use to chat with each other. There are tons of other features, but that's a story for another day.

The server ran on an AMD Athlon II X2 3.2 GHz processor with 4 GB of Corsair DDR2 on a Mini-ITX board, and it went without a hitch... almost.

How it Failed
I built lanHUB around the idea that "everyone should know immediately who is online, away, playing games, or signed off," which is the holy grail of instant messaging. I used RemObjects SDK's brilliantly implemented event push technology to make this happen. However, as I stared at the server's CPU monitor showing 100% utilization right before a large StarCraft tournament, I realized the technology alone couldn't get me all the way to that goal. I needed to slim down my product.

When I arrived, the server was already as slow as molasses. When I asked the tournament organizer how it had become so slow, he said that "it was fine, but then it got slower... and slower... then it just stopped."

Hmm.

Behind the Scenes
When a client calls the server, the call is nicely threaded: every call gets its own thread and is squared away without a problem. However, if part of that call needs to peek at the player list, that thread has to finish looking at the list before the rest of the server can continue. As long as the player list comes back quickly and lets the app continue, everything should go smoothly.

When someone signs in, the server sends that person the list of available players and broadcasts to everyone else that the person is now online. There was a bit of a bump in CPU when clients logged in, but not enough to be a concern.

The problem was that I was not there when people signed up.

At the time, whenever someone signed up, everyone who was signed in received a fresh copy of the entire player list. If 10 people signed up, the server was sending hundreds of player lists around the event. The poor soul who signed up first would receive 9 updated player lists! Clearly, that's not efficient.
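To put rough numbers on this, here is a small Python sketch (not lanHUB's actual code) that counts the traffic when every sign-up pushes the full list to every signed-in client, compared with a hypothetical alternative where only the newcomer gets the full list and everyone else gets a one-entry "X is online" notice. The `already_online` count is an assumption; the post doesn't say how many people were signed in at the time.

```python
from typing import Tuple


def full_list_cost(already_online: int, signups: int) -> Tuple[int, int]:
    """Each sign-up pushes the complete player list to every signed-in client.

    Returns (messages sent, player entries sent).
    """
    messages = entries = 0
    for k in range(1, signups + 1):
        online = already_online + k       # the newcomer is online now too
        messages += online                # one full list per signed-in client
        entries += online * online        # and each list names every player
    return messages, entries


def delta_cost(already_online: int, signups: int) -> Tuple[int, int]:
    """Hypothetical alternative: the newcomer gets the full list,
    everyone else gets a one-entry "X is online" notice.

    Returns (messages sent, player entries sent).
    """
    messages = entries = 0
    for k in range(1, signups + 1):
        online = already_online + k
        messages += online                # same number of messages...
        entries += online + (online - 1)  # ...but only the newcomer's is large
    return messages, entries


if __name__ == "__main__":
    for already in (0, 20):
        print(already, full_list_cost(already, 10), delta_cost(already, 10))
```

With nobody else online, 10 sign-ups already mean 55 full lists carrying 385 player entries, versus 100 entries with one-entry notices. With a hypothetical 20 players already online, the full-list scheme sends 255 lists and 6,585 entries, versus 500 entries. The message count is the same either way; what grows quadratically is the payload.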

And to top it off, the player list stayed locked for the whole time it was being broadcast, so the rest of the server application was stuck waiting until every copy had been sent.
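The difference between holding the lock for the entire broadcast and holding it only long enough to copy the list can be sketched with Python's `threading.Lock`. This illustrates the general pattern only; it is not RemObjects SDK or lanHUB code, and `PlayerRegistry` and the `send` callback are hypothetical stand-ins for the server's player list and its push mechanism.

```python
import threading
from typing import Callable, Dict

# Hypothetical stand-in for a network push: (recipient, player list) -> None
Send = Callable[[str, Dict[str, str]], None]


class PlayerRegistry:
    """Shared player list (name -> status) guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._players: Dict[str, str] = {}

    def set_status(self, name: str, status: str) -> None:
        with self._lock:
            self._players[name] = status

    def broadcast_while_locked(self, send: Send) -> None:
        # The lock stays held for every send, so any other call thread
        # that needs the player list waits until the whole broadcast ends.
        with self._lock:
            for recipient in self._players:
                send(recipient, dict(self._players))

    def broadcast_from_snapshot(self, send: Send) -> None:
        # Hold the lock only long enough to copy the list, then do the
        # (potentially slow) sends with the lock released.
        with self._lock:
            snapshot = dict(self._players)
        for recipient in snapshot:
            send(recipient, dict(snapshot))
```

In the first method, one slow client connection stretches the wait for every other thread that touches the player list; in the second, the lock is held only for the copy, so a slow send delays that broadcast alone.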

It was the perfect storm.

The Big Night
So, let's look back at the event. On night one, I noticed 15% CPU usage, but since it was the first large-scale run, I had no data to compare against, and 15% wasn't bad. However, the next day, right before the main StarCraft tournament, 30 people signed up within a span of 2 minutes. You guessed it: it went down faster than... well, it was pretty gradual (according to the tournament organizer).

The service is designed to restart itself if multiple errors occur within a minute, to prevent a single error from stopping all services. So, as everyone signed up, the server slowed down, and the services restarted automatically (as a failsafe). Everyone auto-logged back in, the system slowed down again, rinse and repeat. It appeared that people signing in was the catalyst that locked up the server. During the event, I disabled the "send broadcast when someone logs in" feature, and the rest of the event went without a hitch, albeit with limited player visibility.
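The post doesn't show how the restart rule is implemented. A minimal Python sketch of "restart after multiple errors within a minute" could keep a sliding window of error timestamps; the `RestartPolicy` name and the threshold of 3 errors are assumptions for illustration. The sketch also makes the loop described above visible: a restart clears the error history, but nothing in the rule stops auto-reconnecting clients from bringing the same load straight back.

```python
import time
from collections import deque
from typing import Callable, Deque


class RestartPolicy:
    """Hypothetical rule: signal a restart once `max_errors` errors
    have occurred within the last `window_s` seconds."""

    def __init__(self, max_errors: int = 3, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_errors = max_errors
        self.window_s = window_s
        self.clock = clock                   # injectable so tests can fake time
        self._errors: Deque[float] = deque()

    def record_error(self) -> bool:
        """Record one error; return True if the service should restart."""
        now = self.clock()
        self._errors.append(now)
        # Forget errors that have slid out of the window
        # (the newest entry is always kept, so the deque never empties here).
        while now - self._errors[0] > self.window_s:
            self._errors.popleft()
        if len(self._errors) >= self.max_errors:
            self._errors.clear()             # fresh start after the restart
            return True
        return False
```

With the defaults, errors at 0, 10 and 20 seconds trigger a restart, while errors spaced more than a minute apart never do.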

The Drawing Board
My biggest concern was how to address this "signing in" issue. The first thing I did was create a macro application that opens the client, auto-creates an account, and then signs in. All I need to do is specify how many, and it opens that many clients on the same computer. Here were the results of my first test:

Attempted: 50
Number of clients opened in 30 seconds: 15
Notes: Workstation slowed to a crawl after 10 clients opened

Excellent! I finally have this bug reproduced! It's exactly as he said: after several clients created an account and signed in, the system slowed to a crawl! Now, to find and squish this bug...
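The macro application itself isn't shown. As a rough in-process analogue, here is a Python sketch that starts N simulated clients at once and counts how many finish within 30 seconds, mirroring the "attempted / opened in 30 seconds" numbers from the first test. `client_session` is a hypothetical callback standing in for "open the client, create an account, sign in"; the original tool launched real client instances rather than threads.

```python
import threading
import time
from typing import Callable


def run_load_test(attempted: int,
                  client_session: Callable[[int], None],
                  deadline_s: float = 30.0) -> int:
    """Start `attempted` simulated clients at once and return how many
    completed `client_session` before `deadline_s` elapsed."""
    completed = 0
    lock = threading.Lock()

    def worker(client_id: int) -> None:
        nonlocal completed
        client_session(client_id)  # hypothetical: create account, then sign in
        with lock:                 # only reached if the session did not raise
            completed += 1

    threads = [threading.Thread(target=worker, args=(i,), daemon=True)
               for i in range(attempted)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        remaining = deadline_s - (time.monotonic() - start)
        if remaining <= 0:
            break                  # deadline passed; stop waiting
        t.join(remaining)
    with lock:
        return completed
```

Daemon threads are used so that clients still stuck past the deadline don't keep the harness from exiting; calling it with `attempted=50` and a real `client_session` would produce numbers in the same shape as the test above.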

Stay tuned for Part 2!
