In my previous post, I outlined a classification system for machinima complexity. The idea of tiers arose from watching a lot of machinima and following production trends in the community. Alongside the existence of tiers, I also hypothesized that machinima production tools grew more complicated over time and that simple machinimas significantly outnumber technologically sophisticated ones. To quantify this trend, I wanted to classify machinimas by their tiers and then plot how many existed in each tier as a stacked bar chart over their release years. This graph never came to fruition because gathering the data would be quite challenging, but I did learn a bit about computational film analysis while researching how to accomplish the project, so I wanted to share my plans anyway. In this technical post, I’m going to explain how I planned to gather data on machinima tiers.
Step 1: Collecting Raw Video Data
Firstly, I need raw video to analyze. Machinima tiers are a label I contrived, so no video metadata will indicate what tier a video is; the only way to get a machinima’s tier is to analyze the video myself. Unfortunately, it’s hard to assemble a machinima collection nowadays. Back when I conceived this theory, there were websites dedicated to hosting gaming content, and it was easy to find and download machinima because it wasn’t mixed with non-gaming content. We all assume the internet is forever, but my journey into internet archeology has shown me that’s not guaranteed. Like early motion pictures, I suspect many pioneering works are lost to the sands of time. Some common reasons I had difficulty finding works are as follows:
- The website that hosted the video is now defunct
- YouTube took down the work due to copyright infringement
- The video is unindexed or has very low search relevance due to its age
Fortunately, I do have some records of these works. Being an avid machinima fan, I kept a catalog of creators I’ve encountered. Some machinimists create more than just machinima, so other types of content may be mixed into their filmographies. Thus, to assemble a clean data set, I would need to manually filter through their published works and select the machinimas. It’s tedious work, but there usually isn’t that much to sift through for any given machinimist: machinima is a labour-intensive hobby, so most machinimists don’t produce many videos over the course of their careers. Using my shortlist heavily biases the data set towards popular English-speaking creators using mainstream games. It’s not representative of the entire community, but it’s all I have to work with. This bootstrapped data set could then be used to train a bot on the keywords and metadata to target in YouTube searches. Creating the bot, let alone validating it, is a lot of work, but let’s say we overcame this obstacle. How do we process this video data?
Step 2: Video Analysis Tools
We need to gather the tools for video analysis. We can leverage existing computer vision libraries to help us number-crunch pixels. Most of my analysis will strive to use content-agnostic processes that can run without needing to do object recognition on game objects. For example, one primary operation I will rely on is frame differencing, where we subtract the pixels of one frame from those of a neighbouring frame to show what is moving.

[Figure: original footage shown alongside the same footage with frame differencing applied]
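As a minimal sketch of the operation, here is how frame differencing might look with OpenCV in Python (the threshold value is an arbitrary guess, not a tuned parameter):

```python
import cv2

def frame_difference(video_path, threshold=25):
    """Yield binary motion masks by differencing consecutive grayscale frames."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Pixels that changed between frames light up; static pixels stay black.
        diff = cv2.absdiff(gray, prev)
        _, mask = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
        yield mask
        prev = gray
    cap.release()
```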
Building on top of this operation, we can divide videos into shots using shot detection algorithms, which will let us analyze footage in terms of its cinematic technique; a rough sketch of one such detector follows below. We can use AI to classify types of cinematic shots, but we may need to train it ourselves if models trained on live action data do not generalize to animation. Investigating cinematic technique is important because I theorize that technically advanced films use a larger variety of shot types than simple films. Assembling these tools will greatly help us scale up our machinima tier pipeline.
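One simple, content-agnostic approach compares colour histograms of neighbouring frames and flags sharp drops in similarity as cuts. The sketch below works under that assumption; dedicated libraries like PySceneDetect implement more robust versions of the same idea, and the cut threshold here is a guess:

```python
import cv2

def detect_shot_boundaries(video_path, cut_threshold=0.5):
    """Flag frames whose colour histogram differs sharply from the previous frame."""
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Correlation near 1.0 means similar frames; a sharp drop suggests a cut.
            score = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if score < cut_threshold:
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries
```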
Step 3: Identifying Tier Attributes
Each tier has certain characteristics that can be programmatically identified. All of these features are untested and derived solely from my experience with machinima. If a video’s characteristics match multiple tiers, I will classify it as the highest observed tier because high tier tools build on top of low tier ones.
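That decision rule is simple enough to state in a line of Python (the tier numbers are the ones from my classification post):

```python
def classify_tier(matched_tiers):
    """Return the highest tier whose characteristics were observed.

    matched_tiers: the set of tier numbers whose attribute checks fired,
    e.g. {0, 1, 2} for a video showing gameplay UI, in-game camera work,
    and chroma key artifacts.
    """
    return max(matched_tiers, default=None)  # None if nothing matched
```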
Tier 0: Gameplay Footage
This is the easiest tier to identify because we can use frame differencing alone. Gameplay footage will have UI elements such as health bars, inventory, timers, etc., on screen for long durations of the video. We can look for the outlines of these UI elements by examining the persistent pixels from frame differencing. This will also pick up other persistent features like watermarks, and we must account for UI elements that are partially animated or scattered in various places on the screen. If these persistent pixels are found throughout the video, we can classify the video as gameplay footage.
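A sketch of the persistent-pixel idea: count how often each pixel changes across the video, then measure how much of the frame almost never changes. The stride and thresholds below are guesses that would need tuning:

```python
import cv2
import numpy as np

def persistent_pixel_ratio(video_path, sample_stride=5, change_threshold=10):
    """Estimate the fraction of the frame that almost never changes (UI candidates)."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return 0.0
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY).astype(np.float32)
    change_counts = np.zeros_like(prev)
    samples = 0
    while True:
        for _ in range(sample_stride):  # skip frames to keep this cheap
            ok, frame = cap.read()
            if not ok:
                break
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        change_counts += (np.abs(gray - prev) > change_threshold)
        prev, samples = gray, samples + 1
    cap.release()
    if samples == 0:
        return 0.0
    # Pixels that changed in under 1% of sampled frames count as "persistent".
    return float((change_counts / samples < 0.01).mean())
```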
Tier 1: In-Game Shoots
To tell if a video was filmed in-game, we’ll need to employ some more sophisticated tools to analyze player movements and camera movements. There is a limited palette of motions available to puppeteers. Most games have a set of emote animations the player can use; beyond that, you are stuck with basic motions like running, jumping, and head bobbing. Trying to recognize emotes through humanoid pose estimation would take a lot of effort, so I think we should focus on something less game specific. Since the camera operator is also a player, we can expect to see camera movement and angles that are normally found in-game. For example, most games have player cameras that cannot do close-up shots; one must use an asset extractor to obtain such a shot. It’s unclear whether shot detection algorithms would work on game footage, but if they did, we could expect to see a lot of medium to long shots. The lack of close-ups is a characteristic of tier 1.
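If we also had some hypothetical upstream detector that boxes the on-screen character, shot scale could be approximated by how much of the frame the subject fills. The cutoffs below are invented for illustration, not taken from film theory:

```python
def shot_scale(subject_height_px, frame_height_px):
    """Rough shot-scale label from how much of the frame the subject fills.

    Assumes a (hypothetical) character detector supplies the on-screen
    subject's bounding-box height. The cutoffs are guesses.
    """
    ratio = subject_height_px / frame_height_px
    if ratio > 0.8:
        return "close-up"
    elif ratio > 0.4:
        return "medium"
    return "long"
```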
The other aspect of camera work that might be noteworthy is camera movement. I’ve noticed tier 1 videos tend to use a lot of static shots compared to other tiers because they’re easy to shoot. When the camera does move, however, there isn’t much variability in speed: it is just the character’s running speed. Moreover, player cameras don’t follow the smooth animation paths that digitally scripted ones do. If camera solving shows a consistent movement speed, sharp changes in velocity, or any jerkiness, then it’s likely we’re looking at a player-controlled camera and can conclude the video is an in-game recording. As we move into higher tiers, we gain access to more shot types.
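Full camera solving is heavy machinery, so as a cheaper stand-in, dense optical flow can give a per-frame estimate of global motion. This sketch computes a motion speed profile whose statistics (near-constant speed within moves, sharp jumps between them) could then be tested against the player-camera hypothesis:

```python
import cv2
import numpy as np

def camera_motion_profile(video_path):
    """Estimate per-frame global motion magnitude with dense optical flow."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return np.array([])
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    speeds = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        # Median flow magnitude approximates how fast the whole frame is moving.
        mag = np.linalg.norm(flow, axis=2)
        speeds.append(np.median(mag))
        prev = gray
    cap.release()
    # A player-controlled camera might show low variance within a move
    # (speeds.std()) punctuated by sharp jumps (np.abs(np.diff(speeds)).max()).
    return np.array(speeds)
```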
Tier 2: Asset Compositing
Extracting and compositing assets into a video gives the director more camera angles and shots to work with. We could examine shots again, but it might be easier to just detect chroma key artifacts. I think the simplest way to define tier 2 is the presence of tier 1 attributes combined with the use of chroma key: if a video satisfies the tier 1 filter and uses chroma key, then it can be classified as a tier 2 machinima.
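One plausible artifact to hunt for is green spill: saturated green pixels hugging strong edges where a keyed layer meets the background. This sketch scores a single frame that way; the hue band and edge parameters are guesses:

```python
import cv2
import numpy as np

def green_fringe_score(frame_bgr):
    """Score how much saturated green sits on strong edges (possible key spill)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Hue ~35-85 roughly covers green-screen green; bounds are guesses.
    green = cv2.inRange(hsv, (35, 80, 80), (85, 255, 255))
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    # Dilate edges so a fringe a pixel or two off the boundary still counts.
    edge_band = cv2.dilate(edges, np.ones((5, 5), np.uint8))
    overlap = cv2.bitwise_and(green, edge_band)
    return overlap.mean() / 255.0
```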
Tier 3: 3D Animation Suites
Directors have a lot of available tools at this tier. Expanding on the importance of camera work, we will set aside the character animations again and focus on the cinematography for clues. In tier 3, all shot types and camera angles are available because the camera can be placed anywhere. I expect to see more close-ups and fewer eye-level shots now that the camera is detached from the player. This also means more variation in camera motion; for example, directors can now do slow pans, arc shots, and change focus distance. Lastly, real-time ray-tracing is not common (yet) in games, so finding evidence of ray-tracing is a good indicator that a video was not rendered in-game. Detecting ray-tracing artifacts may become less indicative as more games adopt next generation graphics, but I still expect cinematographic variety to be a good indicator for tier 3.
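“Cinematographic variety” needs a number if we want to threshold on it. One conventional way to quantify it is the Shannon entropy of the shot-type distribution, assuming the shot classifier from step 2 has already labeled each shot:

```python
from collections import Counter
import numpy as np

def shot_variety(shot_labels):
    """Shannon entropy of the shot-type distribution; higher = more variety."""
    counts = np.array(list(Counter(shot_labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# e.g. shot_variety(["long", "long", "medium"]) is lower than
#      shot_variety(["long", "medium", "close-up", "arc", "pan"])
```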
Tier 4: Experimental Tools and Mods
This is the most nebulous and rare category. Since I consider motion capture an experimental tool, we can look for its usage by running pose detection and examining the character animation. Multiple body parts moving in unison with natural acceleration and deceleration is a good sign that motion capture is being used. To detect mods, we would have to go through the absurdly difficult task of training an object recognition bot on labeled game asset data, like they do for self-driving cars, to find objects that look like they are part of the game but have a low confidence score. Given how rare tier 3 and 4 machinimas are, it might be best to just defer to human judgement and have a reviewer manually label tier 4 machinimas.
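Assuming some off-the-shelf pose estimator has already produced per-frame keypoints, “natural acceleration and deceleration” could be approximated by measuring jerk (the derivative of acceleration) across joints. A minimal sketch:

```python
import numpy as np

def motion_smoothness(keypoints):
    """Mean per-joint jerk from a (frames, joints, 2) array of pose keypoints.

    Assumes a pose estimator already produced the keypoints. Low jerk across
    many joints moving at once hints at motion capture; canned emote loops
    tend to repeat the exact same curves instead.
    """
    velocity = np.diff(keypoints, axis=0)      # frame-to-frame displacement
    acceleration = np.diff(velocity, axis=0)
    jerk = np.diff(acceleration, axis=0)
    return float(np.linalg.norm(jerk, axis=2).mean())
```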
Step 4: Plotting the Data
I wanted to draw a chart plotting the growth of machinima communities over time. As time advances, I expect high tier machinima to emerge after the community hits a certain size. If I graphed the tiers of machinimas created by a community over time, I think the chart would look something like this:
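[Figure: hypothetical stacked bar chart of machinima counts per tier by release year]

Since the mock-up is easier to describe in code than in words, here is a matplotlib sketch that would render that kind of chart. The counts are invented purely to illustrate the shape I hypothesized, with low tiers dominating and higher tiers emerging later as the community grows:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical counts per tier per release year -- illustrative, not real data.
years = np.arange(2004, 2012)
counts_by_tier = {
    "Tier 0": [5, 9, 14, 20, 24, 26, 25, 23],
    "Tier 1": [1, 3, 6, 10, 14, 16, 17, 16],
    "Tier 2": [0, 0, 1, 3, 5, 7, 8, 9],
    "Tier 3": [0, 0, 0, 1, 2, 3, 5, 6],
    "Tier 4": [0, 0, 0, 0, 0, 1, 1, 2],
}

bottom = np.zeros(len(years))
for tier, counts in counts_by_tier.items():
    plt.bar(years, counts, bottom=bottom, label=tier)  # stack each tier on top
    bottom += np.array(counts)
plt.xlabel("Release year")
plt.ylabel("Number of machinimas")
plt.legend()
plt.show()
```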

Project Abandoned
As time progressed in my own life, other projects took precedence, so instead of building this out, I left it as a write-up. I set out to show that my proposed tiers can describe a trend of increasing production complexity in the machinima community. In doing this research, the thought exercise prompted me to think about analyzing videos programmatically. Many movie recommendation algorithms work on film metadata as a proxy for the films’ contents, not the actual video data itself. I haven’t heard of any recommendation models using computational film analysis, as I suspect metadata alone already provides good performance. Nonetheless, there are many interesting questions about film trends that could be answered by using computer vision techniques to quantify and scale up film analysis. Though I don’t expect anyone to take up my classification system, I hope this post has given you some ideas about computational film analysis.
