
Mining the World of Warcraft Armory

Mining is hard work! (thanks RipTen)

Many people have mined the World of Warcraft Armory for all sorts of information. Until recently, most of the data mined from the Armory has been what I call “descriptive statistics”: statistics that are interesting in and of themselves, but don’t require much more than counting up instances of labels. How many Orc Hunters are there? Are there more Death Knights or Warlocks? Which talents are most popular? Deeper data mining, joining these statistics together or applying more advanced techniques, could yield exciting insights into the game’s design.

I say “until recently” because Zardoz and Darush, whom I covered previously, have begun some of that exciting, deeper data mining. I’ve been working on similar problems concurrently, and now that my paper has been accepted to Foundations of Digital Games 2010, I can talk about them.

Additionally, I have open-sourced my Armory crawler, which I call WoWSpyder.

Now, without further ado, let’s look at some of those results.

About the data

I collected 136,047 characters from the US and Europe between 16th April 2009 and 19th August 2009, so the collection began just after patch 3.1 and ended just after patch 3.2. A breakdown of the number of players by server type is in the paper.

Time to Level 80 by class

Figure: Chronological time to level

Let’s begin with everyone’s favorite: which class levels fastest? When we ask this, we usually mean “Which class levels fastest in play time?” (i.e. what you see when you type /played). That information is not available in the Armory. Instead, I use the time between earning the Level 10 achievement and the Level 80 achievement, so I am measuring chronological time rather than play time. In this sample, 6,244 characters leveled from 10 to 80.
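A minimal Python sketch of this measurement, assuming each crawled character is a dict holding its class and a mapping from achievement names to the dates they were earned. That record layout and the function names are hypothetical, not WoWSpyder’s actual format. Because the sketch subtracts dates, the result is whole days of calendar time, not /played time.

```python
from datetime import date
from typing import Optional


def days_to_80(character: dict) -> Optional[int]:
    """Chronological days between the Level 10 and Level 80 achievements.

    Returns None when either achievement date is missing, so only characters
    seen leveling the whole 10-80 range contribute.
    """
    earned = character["achievements"]  # hypothetical: achievement name -> date earned
    start, end = earned.get("Level 10"), earned.get("Level 80")
    if start is None or end is None:
        return None
    return (end - start).days


def leveling_days_by_class(characters: list) -> dict:
    """Group the 10-80 leveling times (in days) by character class."""
    by_class = {}
    for character in characters:
        days = days_to_80(character)
        if days is not None:
            by_class.setdefault(character["class"], []).append(days)
    return by_class


# Made-up example records, not data from the study.
characters = [
    {"class": "Hunter", "achievements": {"Level 10": date(2009, 4, 20), "Level 80": date(2009, 6, 1)}},
    {"class": "Rogue", "achievements": {"Level 10": date(2009, 5, 2), "Level 80": date(2009, 7, 15)}},
    {"class": "Mage", "achievements": {"Level 10": date(2009, 5, 2)}},  # not yet 80: excluded
]
print(leveling_days_by_class(characters))  # {'Hunter': [42], 'Rogue': [74]}
```

Dropping characters that lack either achievement mirrors the restriction to characters who leveled the full 10-80 range inside the collection window.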

Unfortunately, previous research from PlayOn@PARC shows that players of certain classes play more hours than players of others, so we can’t necessarily assume that chronological time tells us about play time. However, that research found that players of the most-played class (Rogue) played only three hours more per week than players of the least-played class (Warlock). That difference shouldn’t introduce errors within our 24-hour boundary, so I am going to assume that all classes get roughly the same play time.

What’s interesting about this graph is that it is not interesting. At all. It all looks just about the same. Nothing jumps out. Every class has about the same outliers, percentiles, medians and averages. It certainly appears that your class has no bearing on your leveling speed. Sorry Hunters, the bragging has to stop!

Number of deaths by class on the way to 80

Figure: Number of deaths by class

Using the same 6,244 characters from the 10-80 sample, I looked at how many PvE deaths occurred along the way, excluding all PvP and dungeon deaths. Unfortunately, I was not able to filter out deaths that occurred at Level 80, so take this with a pinch of salt. A sample restricted to levels 10-79 was not large enough to generate any useful results.

This chart, however, does show some variation. Paladins, Druids and Shaman all survive better in PvE than other classes: they have a low median and tight interquartile ranges (showing that there isn’t much variation around that median). I’d take a stab and say this is the hybrid classes at work: able to deal enough damage to get through the leveling process, but also able to heal out of trouble. Warriors having the most deaths is not a surprise, but the squishy Warlock is a bit of an eyebrow-raiser. Perhaps this reflects the class being slightly harder to play than others, or maybe Warlocks are more rash than other classes (they do have Soulstones).
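The box-plot quantities discussed here, median and interquartile range, can be computed per class as in the following Python sketch. It assumes hypothetical counter names (`total_deaths`, `pvp_deaths`, `dungeon_deaths`) for the scraped death statistics and subtracts the PvP and dungeon counts from the total; these are not the Armory’s actual field names. Quartiles come from the standard library’s `statistics.quantiles` with `method="inclusive"`, and the software behind the original chart may have used a different quartile convention.

```python
from statistics import median, quantiles


def pve_deaths(stats: dict) -> int:
    """Open-world PvE deaths: total deaths minus PvP and dungeon deaths.

    The keys are hypothetical names for scraped lifetime counters, so deaths
    after reaching Level 80 are still included.
    """
    return stats["total_deaths"] - stats["pvp_deaths"] - stats["dungeon_deaths"]


def box_summary(values: list) -> dict:
    """Median and interquartile range (Q3 - Q1) of a list of counts."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"median": median(values), "iqr": q3 - q1}


def deaths_summary_by_class(characters: list) -> dict:
    """Summarize PvE deaths per class; each class needs at least two characters."""
    by_class = {}
    for character in characters:
        by_class.setdefault(character["class"], []).append(pve_deaths(character["stats"]))
    return {cls: box_summary(deaths) for cls, deaths in by_class.items()}


# Made-up example records, not data from the study.
sample = [
    {"class": "Paladin", "stats": {"total_deaths": 15, "pvp_deaths": 3, "dungeon_deaths": 2}},
    {"class": "Paladin", "stats": {"total_deaths": 14, "pvp_deaths": 0, "dungeon_deaths": 0}},
    {"class": "Warlock", "stats": {"total_deaths": 50, "pvp_deaths": 10, "dungeon_deaths": 10}},
    {"class": "Warlock", "stats": {"total_deaths": 45, "pvp_deaths": 0, "dungeon_deaths": 5}},
]
print(deaths_summary_by_class(sample))
# {'Paladin': {'median': 12.0, 'iqr': 2.0}, 'Warlock': {'median': 35.0, 'iqr': 5.0}}
```

A small IQR, as in the made-up Paladin rows, is what the chart shows as a tight box around the median.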

Classifying classes by the items they’re wearing

This one doesn’t have a fancy chart. I built a Naive Bayes classifier that correctly predicted a character’s class from their items 94.7% of the time. This means that most players put the same sorts of gear on a given class, which shouldn’t be surprising: WoW’s itemization is almost always targeted at a single class. When the classifier failed, it was usually within an armor type, so Warlocks, Mages and Priests, who all wear caster cloth, were most commonly confused. Sometimes a player was just doing something out of the ordinary, like a Rogue carrying a Venomstrike, which led the classifier to think he or she was a Hunter.
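This post doesn’t say how items were encoded for the classifier, so the following is only a minimal Python sketch of one common variant: a multinomial Naive Bayes that treats each character’s gear as a bag of item names, with add-one (Laplace) smoothing. The class name `ItemNaiveBayes`, the helper `accuracy`, and the item names and training data are all made up for illustration; this is not the classifier used in the paper.

```python
import math
from collections import Counter, defaultdict


class ItemNaiveBayes:
    """Multinomial Naive Bayes that predicts a character's class from its items.

    Each character is a bag of item names. Add-one (Laplace) smoothing keeps
    an item never seen with a class from ruling that class out entirely.
    """

    def fit(self, gear_lists, classes):
        self.class_counts = Counter(classes)
        self.item_counts = defaultdict(Counter)  # class -> item -> count
        self.vocab = set()
        for items, cls in zip(gear_lists, classes):
            self.item_counts[cls].update(items)
            self.vocab.update(items)
        self.total_items = {cls: sum(self.item_counts[cls].values()) for cls in self.class_counts}
        return self

    def log_posterior(self, items, cls):
        """Unnormalized log P(class) + sum of log P(item | class)."""
        score = math.log(self.class_counts[cls] / sum(self.class_counts.values()))
        denom = self.total_items[cls] + len(self.vocab)
        for item in items:
            if item in self.vocab:  # items never seen in training are ignored
                score += math.log((self.item_counts[cls][item] + 1) / denom)
        return score

    def predict(self, items):
        return max(self.class_counts, key=lambda cls: self.log_posterior(items, cls))


def accuracy(model, gear_lists, classes):
    """Fraction of characters whose class is predicted correctly."""
    hits = sum(model.predict(items) == cls for items, cls in zip(gear_lists, classes))
    return hits / len(classes)


# Made-up training data, not items from the study.
gear = [
    ["mail helm", "bow", "mail boots"],
    ["mail helm", "bow", "mail gloves"],
    ["leather helm", "dagger", "leather boots"],
    ["leather helm", "dagger", "leather gloves"],
]
labels = ["Hunter", "Hunter", "Rogue", "Rogue"]
model = ItemNaiveBayes().fit(gear, labels)
print(model.predict(["leather helm", "leather boots"]))  # Rogue
print(model.predict(["bow", "mail boots"]))  # Hunter
```

Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow when a character wears many items. Because each item votes independently, a distinctive item that the training data ties to one class, like the bow here, pulls the prediction toward that class, which is the kind of confusion described with the Venomstrike-wielding Rogue.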

This is one of those times when scientific results match the intuitive guess, so it all sounds very obvious. However, nothing in science is obvious! If everything we thought we knew were true, Mythbusters would be a pretty boring show. Strong empirical evidence about how a game design operates is very important. Remember that not even game designers really know how a game operates without data; that’s why we’re seeing more and more advanced betas for game balancing.

WoW is just one game!

I’ve presented empirical evidence for or against some common assumptions about WoW’s design. This matters to all sorts of people, from game designers and game studies scholars to social scientists. I could go on trawling through the data I’ve got, but finding the really interesting questions requires the expertise of people in other academic fields.

WoW is not the only game we could do this for. We could query the Lord of the Rings Online API, or crawl Bungie’s extensive Halo stats. More and more games publish stats simply because they can (Battlefield: Bad Company 2 comes to mind), so these mining techniques will become more and more valuable. My hope is that, in the future, we won’t have to perform this kind of data mining at all, because game companies will open up their data for querying by interested scientific parties, allowing research to progress at a faster pace. However, Sony Online Entertainment was brave enough to try that, and look where it got them: gamers upset that their privacy had been invaded. I’m not going to get into that debate here, but the short story is that I wouldn’t really call play data private, especially if it’s anonymized and can’t be traced to actual individuals.

If you want to find out more, check out the paper. If you’re interested in seeing how I mined the Armory, or are brave enough to try it yourself, the source code for the crawler is available on GitHub.

This entry was posted in Academics, Deconstructions and Gaming Culture.