In my previous post, I outlined a classification system for machinima complexity. The idea of tiers arose from watching a lot of machinima and following production trends in the community. Alongside the existence of tiers, I also hypothesized that machinima production tools grew more complicated over time and that simple machinimas significantly outnumber technologically sophisticated ones. To quantify this trend, I wanted to classify machinimas by their tiers and then plot how many existed in each tier as a stacked bar chart over their release years. This graph never came to fruition because gathering the data would be quite challenging, but I did learn a bit about computational film analysis while researching how to accomplish the project, so I wanted to share my plans anyway. In this technical post, I’m going to explain how I planned to gather data on machinima tiers.
Step 1: Collecting Raw Video Data
Firstly, I need raw video to analyze. Machinima tiers are a label I contrived, so no video metadata will indicate what tier a video is; the only way to get a machinima’s tier is to analyze the video myself. Unfortunately, it’s hard to assemble a machinima collection nowadays. Back when I conceived this theory, there were websites dedicated to hosting gaming content, and it was easy to find and download machinima because it wasn’t mixed with non-gaming content. We all assume the internet is forever, but my journey into internet archeology has shown me that’s not guaranteed. Like early motion pictures, I suspect many pioneering works are lost to the sands of time. Some common reasons I had difficulty finding works are as follows:
- The website that hosted the video is now defunct
- YouTube took down the work due to copyright infringement
- The video is unindexed or has very low search relevance due to its age
Fortunately, I do have some records of these works. Being an avid machinima fan, I kept a catalog of creators I’ve encountered. Some machinimists create more than just machinima, so other types of content may be mixed into their filmographies. Thus, to assemble a clean data set, I would need to manually filter through their published works and select the machinimas. It’s tedious work, but there usually isn’t that much to sift through for any given machinimist: machinima is a labour-intensive hobby, so most machinimists don’t produce many videos over the course of their careers. Using my shortlist heavily biases the data set towards popular English-speaking creators using mainstream games. It’s not representative of the entire community, but it’s all I have to work with. This bootstrapped data set could then be used to train a bot on the keywords and metadata to target in YouTube searches. Creating the bot, let alone validating it, is a lot of work, but let’s say we overcame this obstacle. How do we process this video data?
Step 2: Video Analysis Tools
We need to gather the tools for video analysis. We can leverage existing computer vision libraries to help us number-crunch pixels. Most of my analysis will strive to use content-agnostic processes that can run without needing to do object recognition on game objects. For example, one primary operation I will rely on is frame differencing, where we subtract the pixels of one frame from those of a neighbouring frame to show what is moving.

[Figure: original footage shown alongside the same footage with frame differencing applied]
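As a minimal sketch of the operation, here is how frame differencing might look with OpenCV in Python (the threshold value is an arbitrary guess, not a tuned parameter):

```python
import cv2

def frame_difference(video_path, threshold=25):
    """Yield binary motion masks by differencing consecutive grayscale frames."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Pixels that changed between frames light up; static pixels stay black.
        diff = cv2.absdiff(gray, prev)
        _, mask = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
        yield mask
        prev = gray
    cap.release()
```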
Building on top of this operation, we can divide videos into shots using shot detection algorithms, which will let us analyze footage in terms of its cinematic technique; a rough sketch of one such detector follows below. We can use AI to classify types of cinematic shots, but we may need to train it ourselves if models trained on live action data do not generalize to animation. Investigating cinematic technique is important because I theorize that technically advanced films use a larger variety of shot types than simple films. Assembling these tools will greatly help us scale up our machinima tier pipeline.
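One simple, content-agnostic approach compares colour histograms of neighbouring frames and flags sharp drops in similarity as cuts. The sketch below works under that assumption; dedicated libraries like PySceneDetect implement more robust versions of the same idea, and the cut threshold here is a guess:

```python
import cv2

def detect_shot_boundaries(video_path, cut_threshold=0.5):
    """Flag frames whose colour histogram differs sharply from the previous frame."""
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Correlation near 1.0 means similar frames; a sharp drop suggests a cut.
            score = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if score < cut_threshold:
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries
```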
Step 3: Identifying Tier Attributes
Each tier has certain characteristics that can be programmatically identified. All of these features are untested and derived solely from my experience with machinima. If a video’s characteristics match multiple tiers, I will classify it as the highest observed tier because high tier tools build on top of low tier ones.
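That decision rule is simple enough to state in a line of Python (the tier numbers are the ones from my classification post):

```python
def classify_tier(matched_tiers):
    """Return the highest tier whose characteristics were observed.

    matched_tiers: the set of tier numbers whose attribute checks fired,
    e.g. {0, 1, 2} for a video showing gameplay UI, in-game camera work,
    and chroma key artifacts.
    """
    return max(matched_tiers, default=None)  # None if nothing matched
```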
Tier 0: Gameplay Footage
This is the easiest tier to identify because we can use frame differencing alone. Gameplay footage will have UI elements such as health bars, inventory, timers, etc., on screen for long durations of the video. We can look for the outlines of these UI elements by examining the persistent pixels from frame differencing. This will also pick up other persistent features like watermarks, and we must account for UI elements that are partially animated or scattered in various places on the screen. If these persistent pixels are found throughout the video, we can classify the video as gameplay footage.
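A sketch of the persistent-pixel idea: count how often each pixel changes across the video, then measure how much of the frame almost never changes. The stride and thresholds below are guesses that would need tuning:

```python
import cv2
import numpy as np

def persistent_pixel_ratio(video_path, sample_stride=5, change_threshold=10):
    """Estimate the fraction of the frame that almost never changes (UI candidates)."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return 0.0
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY).astype(np.float32)
    change_counts = np.zeros_like(prev)
    samples = 0
    while True:
        for _ in range(sample_stride):  # skip frames to keep this cheap
            ok, frame = cap.read()
            if not ok:
                break
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        change_counts += (np.abs(gray - prev) > change_threshold)
        prev, samples = gray, samples + 1
    cap.release()
    if samples == 0:
        return 0.0
    # Pixels that changed in under 1% of sampled frames count as "persistent".
    return float((change_counts / samples < 0.01).mean())
```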
Tier 1: In-Game Shoots
To tell if a video was filmed in-game, we’ll need to employ some more sophisticated tools to analyze player movements and camera movements. There is a limited palette of motions available to puppeteers. Most games have a set of emote animations the player can use; beyond that, you are stuck with basic motions like running, jumping, and head bobbing. Trying to recognize emotes through humanoid pose estimation would take a lot of effort, so I think we should focus on something less game specific. Since the camera operator is also a player, we can expect to see camera movement and angles that are normally found in-game. For example, most games have player cameras that cannot do close-up shots; one must use an asset extractor to obtain such a shot. It’s unclear whether shot detection algorithms would work on game footage, but if they did, we could expect to see a lot of medium to long shots. The lack of close-ups is a characteristic of tier 1.
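If we also had some hypothetical upstream detector that boxes the on-screen character, shot scale could be approximated by how much of the frame the subject fills. The cutoffs below are invented for illustration, not taken from film theory:

```python
def shot_scale(subject_height_px, frame_height_px):
    """Rough shot-scale label from how much of the frame the subject fills.

    Assumes a (hypothetical) character detector supplies the on-screen
    subject's bounding-box height. The cutoffs are guesses.
    """
    ratio = subject_height_px / frame_height_px
    if ratio > 0.8:
        return "close-up"
    elif ratio > 0.4:
        return "medium"
    return "long"
```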
The other aspect of camera work that might be noteworthy is camera movement. I’ve noticed tier 1 videos tend to use a lot of static shots compared to other tiers because they’re easy to shoot. When the camera does move, however, there isn’t much variability in speed: it is just the character’s running speed. Moreover, player cameras don’t follow the smooth animation paths that digitally scripted ones do. If camera solving shows a consistent movement speed, sharp changes in velocity, or any jerkiness, then it’s likely we’re looking at a player-controlled camera and can conclude the video is an in-game recording. As we move into higher tiers, we gain access to more shot types.
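Full camera solving is heavy machinery, so as a cheaper stand-in, dense optical flow can give a per-frame estimate of global motion. This sketch computes a motion speed profile whose statistics (near-constant speed within moves, sharp jumps between them) could then be tested against the player-camera hypothesis:

```python
import cv2
import numpy as np

def camera_motion_profile(video_path):
    """Estimate per-frame global motion magnitude with dense optical flow."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return np.array([])
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    speeds = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        # Median flow magnitude approximates how fast the whole frame is moving.
        mag = np.linalg.norm(flow, axis=2)
        speeds.append(np.median(mag))
        prev = gray
    cap.release()
    # A player-controlled camera might show low variance within a move
    # (speeds.std()) punctuated by sharp jumps (np.abs(np.diff(speeds)).max()).
    return np.array(speeds)
```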
Tier 2: Asset Compositing
Extracting and compositing assets into a video gives the director more camera angles and shots to work with. We could examine shots again, but it might be easier to just detect chroma key artifacts. I think the simplest way to define tier 2 is the presence of tier 1 attributes combined with the use of chroma key: if a video satisfies the tier 1 filter and uses chroma key, then it can be classified as a tier 2 machinima.
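One plausible artifact to hunt for is green spill: saturated green pixels hugging strong edges where a keyed layer meets the background. This sketch scores a single frame that way; the hue band and edge parameters are guesses:

```python
import cv2
import numpy as np

def green_fringe_score(frame_bgr):
    """Score how much saturated green sits on strong edges (possible key spill)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Hue ~35-85 roughly covers green-screen green; bounds are guesses.
    green = cv2.inRange(hsv, (35, 80, 80), (85, 255, 255))
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    # Dilate edges so a fringe a pixel or two off the boundary still counts.
    edge_band = cv2.dilate(edges, np.ones((5, 5), np.uint8))
    overlap = cv2.bitwise_and(green, edge_band)
    return overlap.mean() / 255.0
```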
Tier 3: 3D Animation Suites
Directors have a lot of available tools at this tier. Expanding on the importance of camera work, we will set aside the character animations again and focus on the cinematography for clues. In tier 3, all shot types and camera angles are available because the camera can be placed anywhere. I expect to see more close-ups and fewer eye-level shots now that the camera is detached from the player. This also means more variation in camera motion; for example, directors can now do slow pans, arc shots, and change focus distance. Lastly, real-time ray-tracing is not common (yet) in games, so finding evidence of ray-tracing is a good indicator that a video was not rendered in-game. Detecting ray-tracing artifacts may become less indicative as more games adopt next generation graphics, but I still expect cinematographic variety to be a good indicator for tier 3.
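“Cinematographic variety” needs a number if we want to threshold on it. One conventional way to quantify it is the Shannon entropy of the shot-type distribution, assuming the shot classifier from step 2 has already labeled each shot:

```python
from collections import Counter
import numpy as np

def shot_variety(shot_labels):
    """Shannon entropy of the shot-type distribution; higher = more variety."""
    counts = np.array(list(Counter(shot_labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# e.g. shot_variety(["long", "long", "medium"]) is lower than
#      shot_variety(["long", "medium", "close-up", "arc", "pan"])
```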
Tier 4: Experimental Tools and Mods
This is the most nebulous and rare category. Since I consider motion capture an experimental tool, we can look for its usage by running pose detection and examining the character animation. Multiple body parts moving in unison with natural acceleration and deceleration is a good sign that motion capture is being used. To detect mods, we would have to go through the absurdly difficult task of training an object recognition bot on labeled game asset data, like they do for self-driving cars, to find objects that look like they are part of the game but have a low confidence score. Given how rare tier 3 and 4 machinimas are, it might be best to just defer to human judgement and have a reviewer manually label tier 4 machinimas.
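Assuming some off-the-shelf pose estimator has already produced per-frame keypoints, “natural acceleration and deceleration” could be approximated by measuring jerk (the derivative of acceleration) across joints. A minimal sketch:

```python
import numpy as np

def motion_smoothness(keypoints):
    """Mean per-joint jerk from a (frames, joints, 2) array of pose keypoints.

    Assumes a pose estimator already produced the keypoints. Low jerk across
    many joints moving at once hints at motion capture; canned emote loops
    tend to repeat the exact same curves instead.
    """
    velocity = np.diff(keypoints, axis=0)      # frame-to-frame displacement
    acceleration = np.diff(velocity, axis=0)
    jerk = np.diff(acceleration, axis=0)
    return float(np.linalg.norm(jerk, axis=2).mean())
```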
Step 4: Plotting the Data
I wanted to draw a chart plotting the growth of machinima communities over time. As time advances, I expect high tier machinima to emerge after the community hits a certain size. If I graphed the tiers of machinimas created by a community over time, I think the chart would look something like this:
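[Figure: hypothetical stacked bar chart of machinima counts per tier by release year]

Since the mock-up is easier to describe in code than in words, here is a matplotlib sketch that would render that kind of chart. The counts are invented purely to illustrate the shape I hypothesized, with low tiers dominating and higher tiers emerging later as the community grows:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical counts per tier per release year -- illustrative, not real data.
years = np.arange(2004, 2012)
counts_by_tier = {
    "Tier 0": [5, 9, 14, 20, 24, 26, 25, 23],
    "Tier 1": [1, 3, 6, 10, 14, 16, 17, 16],
    "Tier 2": [0, 0, 1, 3, 5, 7, 8, 9],
    "Tier 3": [0, 0, 0, 1, 2, 3, 5, 6],
    "Tier 4": [0, 0, 0, 0, 0, 1, 1, 2],
}

bottom = np.zeros(len(years))
for tier, counts in counts_by_tier.items():
    plt.bar(years, counts, bottom=bottom, label=tier)  # stack each tier on top
    bottom += np.array(counts)
plt.xlabel("Release year")
plt.ylabel("Number of machinimas")
plt.legend()
plt.show()
```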

Project Abandoned
As time progressed in my own life, other projects took precedence, so instead of building this out, I left it as a write-up. I set out to show that my proposed tiers can describe a trend of increasing production complexity in the machinima community. In doing this research, the thought exercise prompted me to think about analyzing videos programmatically. Many movie recommendation algorithms work on film metadata as a proxy for the films’ contents, not the actual video data itself. I haven’t heard of any recommendation models using computational film analysis, as I suspect metadata alone already provides good performance. Nonetheless, there are many interesting questions about film trends that could be answered by using computer vision techniques to quantify and scale up film analysis. Though I don’t expect anyone to take up my classification system, I hope this post has given you some ideas about computational film analysis.
