
Retrospective on the CIG 2010 Level Design Competition

At the recent 2010 Computational Intelligence in Games conference in Copenhagen, Denmark, there were competitions for making race car controllers, human-like FPS bots, and Ms. Pac-Man players, among others. The competition that drew my interest, however, was the Mario level design competition, which challenged entrants to create procedural level generators that could produce fun and interesting levels based on information about a particular player’s style. The restrictions on entries (use only fixed numbers of gaps, coin blocks, and Koopas) meant that the winner would have to be cunning: it wouldn’t be easy to just estimate player skill and make the level more or less difficult by adding or removing obstacles; you would instead have to find placements that made the same obstacles more or less difficult. And because entries would be played by players at different skill levels, they would have to be flexible and adjust their output over a broad range of difficulties to get a high score. Finally, because the score was based on the audience’s relative ratings of enjoyment between pairs of levels, there would be no way to game the system by optimizing some fixed set of metrics without making truly enjoyable levels. Between these constraints, the winning entry should have been a demonstration of the power of procedural content generation to adapt to players of different skill levels, which is one of several reasons that PCG is useful in games. Unfortunately, the competition design may have been a bit too clever.

I say this because the winning entry (submitted by my fellow EIS student Ben Weber) didn’t have any of the desirable properties mentioned above (you can read about its algorithm in the previous post). Ben hacked together a simple generator in a couple of hours of spare time, which built levels by scattering things about randomly over the course of a few passes. Ben’s entry ignored the data about the player, and rather than trying to work within the constraints, it simply added a final processing pass that made sure they were satisfied (which my entry did as well, to be honest). Thus rather than showcasing smart PCG that adapts to the player in order to provide an enjoyable experience, the competition seemed to give evidence that a good randomized (or even static) level could beat the best efforts at adaptation.
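
To make the contrast concrete, here is a rough sketch of that general “scatter, then repair” style of generator: elements are placed at random over a few passes, and a final pass trims things back so the element budgets are met. This is only an illustration of the approach described above, written in Python with made-up names, numbers, and data structures; it is not Ben’s actual code or the competition framework.

```python
import random

# Illustrative only: a toy "scatter-then-repair" generator in the spirit of the
# approach described above. The level is just a dict mapping element names to
# x-positions; none of this comes from any actual competition entry.

LEVEL_WIDTH = 200                                     # columns in the level
BUDGETS = {"gap": 6, "coin_block": 10, "koopa": 8}    # hypothetical element limits


def scatter_pass(level, element, count, rng):
    """Randomly scatter `count` instances of `element` across the level."""
    for _ in range(count):
        x = rng.randrange(5, LEVEL_WIDTH - 5)         # keep the start and end clear
        level.setdefault(element, []).append(x)


def repair_pass(level, budgets, rng):
    """Final pass: drop surplus elements so every budget is satisfied."""
    for element, limit in budgets.items():
        placed = level.get(element, [])
        if len(placed) > limit:
            level[element] = sorted(rng.sample(placed, limit))


def generate_level(seed=None, player_metrics=None):
    """Build a level in a few random passes. Note that `player_metrics` is
    accepted but ignored, mirroring the non-adaptive strategy described above."""
    rng = random.Random(seed)
    level = {}
    for element, limit in BUDGETS.items():
        # Over-scatter on purpose, then let the repair pass trim things back.
        scatter_pass(level, element, limit * 2, rng)
    repair_pass(level, BUDGETS, rng)
    return level


if __name__ == "__main__":
    print(generate_level(seed=42))
```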

So what does this tell us about the competition? Did it have a flawed design? Is the best that considerable research into procedural content generation has to offer worse than a few hours of hacking?

Personally, I think that there are a couple of things to take away from this. First and foremost, the competition was a success because it resulted in six practical, functioning generation systems. One of the main goals of any academic competition is to motivate the creation of working systems from research ideas, and this one succeeded at that. Second, I think that the design of the competition could have used some more work. It focused on adaptation, which is pretty difficult to measure, and the evaluation framework wasn’t quite up to the task. A really great competition not only provides motivation for system building, but also becomes a means of comparing the practical results of different approaches to a problem, and I don’t think that the CIG 2010 results were strong enough to be valid in this regard.

On the other hand, Ben’s victory should be taken seriously. It’s evidence that a strong level design (embedded in his algorithm, in this case) can have a large influence on fun, and that adaptation (at least as done by the other competition entries) may pale in comparison. Of course, I don’t think that the competition results are quite strong enough to say that conclusively, but it certainly is worth investigating the tradeoffs between raw level design and adaptation. Perhaps future competitions should pick a different aspect of PCG to encourage, or at least think hard about how they are designed as experiments. Then again, part of the problem may lie with the entrants: only one of the entries from EIS stressed adaptation, since our work in this lab is more focused on having a large and interesting output space for our generators, and on working with humans during the design process. If other entrants treated the issue similarly, it may simply be that there weren’t any entries with strong adaptation techniques in the competition.

Despite mixed results, the competition was fun: I got to show off my own generation algorithm (which unfortunately broke during the competition itself) and meet other entrants to learn about their approaches. And even though the experiment design wasn’t ideal in practice, the constraints motivated me to address the issue of adaptation within my system, which wouldn’t have happened without them. When I gave a talk explaining my entry, the response was pretty positive, and Alex Champandard from AiGameDev.com mentioned the competitions prominently in his review of the conference. If the level generation competition is repeated next year at CIG 2011 in Seoul, it’s likely to see continued participation from EIS, and maybe we’ll even become perennial favorites.

(For the curious, the slides from my talk at CIG and the paper that presents the algorithm I used for my entry are available from my website at http://www.cs.hmc.edu/~pmawhorter/research.html)


About the author: Peter is a 2nd year PhD student interested in most of what goes on in the lab. He's done some work with StarCraft and level generation, and is working on joint generation of levels with stories right now.



4 Comments

  1. Rune
    Posted September 4, 2010 at 3:19 PM

    I would argue that the competition did well in measuring player enjoyment rather than measuring successful player adaptation. I’d say that player adaptation is a means to achieving player enjoyment, not a goal in itself. And furthermore, it’s more of a hypothesis than a fact that player adaptation increases player enjoyment at all (although I’m sure it has been shown to under certain circumstances), or at least that it’s a particularly important factor in player enjoyment.

    So rather than seeing adaptation as a goal in itself, it should be tested and experimented with, among other parameters, as a means to achieving player enjoyment. If adaptation truly does matter, it should show up in the enjoyment rankings, and incorporating player adaptation would be an important part of a winning strategy.

    Naturally, increased enjoyment from adaptation may be overshadowed by even bigger gains in enjoyment from other factors that another entry implements. I think the competition could gain a lot from being iterative. After round 1, everyone could steal from each other’s entries and try to further improve upon them – and someone could try to take Ben’s winning entry and augment it to adapt to the player. The new ratings could then reveal how that would compare relative to the original version.

    In conclusion, I don’t think there was any inherent problem with the competition (this is all based on what I’ve read in this blog post), but rather that the competition would need to be run many times before any interesting conclusions could be drawn. It’s sort of like how a genetic algorithm needs to run for many iterations before you can say anything about which traits are the actual winning ones.

  2. Posted September 4, 2010 at 5:34 PM
  3. Posted September 5, 2010 at 3:18 PM

    I’m pretty sure I just broke it. I tried for quite a while to get to 0.9 but only managed 0.84, and at some point I started getting levels without any enemies in them. Maybe something to do with garbage collection of sprites?

    The main thing that I would change is to use mushroom availability (or, as a proxy for that, frequency) to adjust difficulty. The “easier” levels have fewer enemies, but they also seem to have fewer powerups. You could make the low end of the scale much easier by providing more powerups (a rough sketch of what I mean is below).

    Of course, I also think you should eliminate the glitchy enemies (and maybe add bullet towers). Getting killed by invisible sliding bullets isn’t much fun, and the fact that you can jump on the sliding piranha flowers isn’t obvious. You might also consider not using the flying goombas, because their hitboxes are messed up somehow.
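
    To sketch the powerup idea (purely illustrative; the names and numbers are made up and nothing here comes from any entry's actual code): the generator could map its difficulty setting to a powerup-placement probability, so the low end of the scale gets noticeably more mushrooms.

    ```python
    # Illustrative sketch: scale powerup frequency inversely with difficulty.
    # The parameter names and values are invented for this comment.

    def powerup_probability(difficulty, max_prob=0.30, min_prob=0.05):
        """Chance that any given coin block hides a mushroom.

        `difficulty` is assumed to lie in [0, 1]; easy levels (low difficulty)
        get close to `max_prob`, hard levels get close to `min_prob`.
        """
        difficulty = min(max(difficulty, 0.0), 1.0)   # clamp out-of-range values
        return max_prob - difficulty * (max_prob - min_prob)


    # An "easy" level hides mushrooms in ~28% of blocks, a "hard" one in ~8%.
    print(powerup_probability(0.1), powerup_probability(0.9))
    ```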

  4. Posted September 5, 2010 at 3:25 PM

    The problem is that the question “Can we make enjoyable levels using PCG?” isn’t a very interesting one. The answer is trivially “yes”, but knowing that doesn’t benefit us particularly. A more interesting question would be “What can we do with PCG that we can’t with current level authoring techniques?” Most research focuses on a particular answer to this question (like “adaptation”), and those bits of knowledge are informative.

    It’s certainly true that whether adaptation contributes to player enjoyment isn’t necessarily settled, but I think that an experiment aimed at that question should try to isolate adaptation from other factors (like the differing baseline fun of the various competition entries, as judged by the audience on average). Unfortunately the best way of isolating adaptation is to run a comparative study between adaptive and non-adaptive versions of the same generator, which isn’t possible in a competition format.

    (Thinking about it, you could actually give a prize for adding adaptation code to a specific generator, but that wouldn’t encourage diverse approaches to generator design.)