“Everything is a remix” – Kirby Ferguson (and probably a bunch of other people, too).
I’ve produced a lot of content for YouTube since graduating college in 2007. Driver Digital was the first company I worked for, where I produced a few thousand videos for moms and kids. When I worked at Frederator Networks as the VP of Audience Development, I oversaw all non-animation production and programming for our YouTube channels. At Little Monster, my YouTube production and consulting agency, we’ve produced hundreds of videos for clients.
And until about two years ago, when I began to develop The Taxonomy of Digital Video, I’d find myself in a room with an entire team of 20 or more people all asking, “What should we make?” It looked a lot like this:
My teams and I would sit around the conference table pitching ideas on different shows we could make and most of the time it was relatively fruitless. People would either pitch stuff that had been done to death, slight variations on what we were already doing, or stuff that might do okay on TV, but would never have a chance of success in digital. Occasionally a bolt of lightning would strike – à la 107 Facts on Frederator’s Leaderboard or Cartoon Conspiracy on Frederator’s main channel – and we’d create a show that generated millions of views.
After enough of these meetings – and admittedly a few shows that did not generate millions of views – I realized I needed a framework with which to understand YouTube content. The problem my teams and I had wasn’t that our ideas were bad; it was that we didn’t have a box in which to develop content. There was no structure. No framework. We thoughtfully wandered into good ideas, and that led to the hit-and-miss nature of what we produced.
I looked on blogs and in bookstores for writings on digital video formats, structures, programming, anything(!), but I couldn’t find any substantive works and nothing at all about how to develop content using this knowledge. And of course I couldn’t find anything. Digital video is a relatively new medium still in its infancy – or at least early childhood.
So, I decided to do what I had done in the past with my writings here on Tubefilter — I’d just make the thing I wanted to read my own damn self. What I’ve developed is The Taxonomy of Digital Video.
The Taxonomy is a structure. It’s a way of understanding YouTube content that boils mysterious “X factors” down into easily perceivable, and repeatable, processes or Formats.
This will allow you to go to your creative teams, your companies, your businesses, your studios, etc. with an understanding and way of analyzing what content is currently doing well in your vertical, what’s missing from your vertical, and how the content you make can stand out, feel completely original, and generate millions of views.
Essentially, it’s a guide to developing unique content for YouTube.
Furthermore, I’ll show why the understanding of these core Formats is key to building a long-term, sustainable audience on YouTube.
[One quick disclaimer before we dive in: Every brand, creator, or show I mention in this presentation is mentioned because I am a big fan of what they’ve done. They are all incredibly talented and creative people and have succeeded for many reasons beyond what I mention here. I’m simply trying to show what lies beneath the surface and demystify a little of why their content is popular.]
Let’s start with the more basic stuff first. When thinking about classifying a YouTube channel, series, or video or creating your own, we typically start at the “Vertical” category – as in what is the general area of interest (automotive, beauty, etc.). That’s followed by Format, Style, Length, Personality, and then Topic.
With that in mind, I believe the classification model of a YouTube video might look like this:
Don’t get me wrong, this is not a Matter of Importance chart (e.g. Vertical is no more important than Style). It’s just a systematic way of classifying and developing content. Personally, I think the personalities and characters are the most important thing to any media brand, show, or video. But they sit where they do in this Classification Model because different archetypes or personalities are better suited to, and span, different types of content.
Most of these elements are pretty self-explanatory. However, the category of Formats is where many YouTube series and popular creators really distinguish themselves and make content that feels fresh and unique.
The 8 Formats
First, let’s establish how we determined these Formats. We strip away all of the stylistic elements of a video, and ask what the shared or primary structural characteristics of each video are. For example, the primary structural characteristic of a Listicle video is a list of things. Essentially, we can classify videos in the same way that we classify plants and animals: by their shared primary characteristics.
These Formats are the Listicle, Explainer, Commentary, Interview, Music Video, Challenge, Reaction, and Narrative. These eight formats comprise the vast majority and potentially all of the popular formats on YouTube.
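To make the Classification Model and the eight Formats concrete, here is a minimal Python sketch that records one video at each level of the model. The class and field names are our own assumptions, not part of the Taxonomy itself, and the example values are placeholders. Storing `formats` as a set lets one record describe either a single base Format or a Hybrid Format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Format(Enum):
    """The eight base Formats named in the Taxonomy."""
    LISTICLE = "Listicle"
    EXPLAINER = "Explainer"
    COMMENTARY = "Commentary"
    INTERVIEW = "Interview"
    MUSIC_VIDEO = "Music Video"
    CHALLENGE = "Challenge"
    REACTION = "Reaction"
    NARRATIVE = "Narrative"


@dataclass(frozen=True)
class VideoClassification:
    """One video described at each level of the Classification Model, in order."""
    vertical: str               # general area of interest, e.g. "automotive" or "beauty"
    formats: FrozenSet[Format]  # one base Format, or several for a Hybrid Format
    style: str                  # e.g. "direct to camera"
    length_minutes: float
    personality: str            # the talent or archetype on screen
    topic: str

    def is_hybrid(self) -> bool:
        """True when the video mixes elements of more than one base Format."""
        return len(self.formats) > 1


# Illustrative only: every field value below is a hypothetical placeholder.
example = VideoClassification(
    vertical="comedy",
    formats=frozenset({Format.COMMENTARY, Format.NARRATIVE}),
    style="direct to camera",
    length_minutes=6.0,
    personality="creator playing multiple characters",
    topic="placeholder topic",
)
print(example.is_hybrid())  # True
```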
You may be thinking things along the lines of “vlogging / let’s plays / beauty tutorials aren’t a format?!” and you wouldn’t be wrong for thinking that.
Let’s think about an example though. Is Lilly Singh a vlogger? Are people who do “trying things” videos also vlogging as they’re often talking directly to the camera and giving commentary on something just like a vlog? I think the answer to both of these questions is no. Vlogging is a combination of the commentary format and the direct to camera style. Trying videos are typically a challenge or reaction video – or a combination of both.
Let’s start with an easy one. Everyone should be familiar with the Listicle format. We’ve all read listicles whether it’s on Buzzfeed, Cracked, or any of the thousands of other sites that pump them out. This format is as old as time – ever read the Ten Commandments? That’s just a listicle.
The Listicle format is familiar to all of us and that’s one of the reasons why this format works so well on YouTube. The theory basically goes that if an audience understands what they’re watching from a structure standpoint, they are more likely to enjoy and continue watching that content.
Essentially, if your content meets viewers’ expectations in format, they will be far more likely to be “sticky” and watch for extended periods of time.
For example, if a film bills itself as an action movie, you know that the format will basically be: We’ll start with seeing the hero in their everyday life, an inciting incident will set them out on their “hero journey”, they’ll have to overcome some adversities, and then they’ll take on some bad dudes and ultimately win or not.
If a film bills itself as an action movie and instead you get a romantic drama you would likely walk out of the theater pretty quickly.
Similarly, if an audience clicks on a YouTube video expecting a Listicle, and it’s a basic makeup tutorial, they’re probably going to click away pretty quickly. Some great examples of Listicle videos can be found on the channels WatchMojo, Dark 5, and Matthew Santoro.
However, some content on YouTube disguises the Listicle component. For example, what if I told you Cinema Sins, with its 8.2 million subscribers and over 2.6 billion video views, is just a Listicle? Here, take a look:
They are literally just listing the “sins” of the movie, albeit quite humorously.
Expanding upon each format, these are their definitions and common components:
Listicle Video: A video that lists or ranks items.
Top ### Video
Things you don’t know
Many compilation videos
Ranking and providing commentary as to why
Usually only 1 – 2 minutes per list item
Reading off of Wikipedia
Playing with or against the audience’s expectations / knowledge
Direct to camera with over the shoulder images / videos
Cutaways to video
V.O. on top of images / videos
Primary Format Example:
Music Video: A video where a song or music plays, and it’s also the primary purpose of the video.
Official music video
Telling a story
Over-the-top costumes / situations
Primary Format Example:
Narrative: A video that depicts fiction or fictionalized events.
Clips from film / tv
Parodies / Sketch comedy
Dress Up Play
Characters / Props / Sets
Primary Format Example:
Interview: A video where questions are asked of a subject or interviewee.
1 on 1 interview
Answering pre-written questions
Q&A with fans
Interviewer / Interviewee
Away from camera
Primary Format Example:
Explainer: A video that explains or teaches a topic, or in some instances answers a question.
The video poses a question to the audience it then answers
Simplifying complex ideas
Direct to camera
Direct to camera with over the shoulder images/videos
Cutaways to video
V.O. on top of images / videos
Overhead of hands
Primary Format Example:
Challenge Video*: A video where one or more subjects are challenged to perform a task, be it physical, mental, or a competition between two or more people.
Try not to laugh
What’s in the box
Eating “gross” things
I tried XYZ
Victory / Loss conditions
Things normal people don’t do
Things a category of person doesn’t normally do
Direct to camera
Away from camera / to another person
Primary Format Example:
Reaction*: A video where the primary purpose is to show reactions to an event.
What’s in the box
Multiple things being reacted to
Shock / Gross-out factor
Direct to camera
Primary Format Example:
*Most Challenge and Reaction videos these days are hybrids taking elements from both formats.
Commentary: A video that comments or provides opinion on a topic.
Analysis/Commentary of tv / film / sports / books / etc.
Sitting in a room in a house
Direct to camera
V.O. on top of images / video
Primary Format Example:
If we only follow this model, we’ll just be making the same thing that thousands (millions?) of other people have already made.
To make something that at least feels fresh and unique, we have to create Hybrid Formats.
This is what some of the biggest channels on YouTube have done. They’ve created shows or hybrid formats that feel unique, new, or original and audiences have rewarded them for it. You can apply this same exact formula, the mixing and matching of format elements, on your channels and at your companies.
These are some of the best examples of Hybrid Formats, from extremely popular YouTube channels:
Commentary / Narrative Hybrid
A great example of someone creating a Hybrid Format is Lilly Singh or Superwoman. Lilly has one of the largest followings on YouTube and every video she posts does millions upon millions of views. There’s no doubt that she’s incredibly talented and funny. But I’d argue that her true genius, or at least the spark that set her career ablaze, is in the unique Format concept she developed (or at least she was the first to really succeed with it).
From a strictly Format perspective, all she’s done is take the Commentary format in its primary style (direct to camera) and added parts of the Narrative format, specifically sketch comedy, through the characters she portrays and the sketches.
So, when we boil it down to its base elements we see that while this feels incredibly unique upon a quick view, it’s really just two incredibly popular Formats woven together.
Listicle / Explainer Hybrid
Here’s an example of the blending of an Explainer video and a Listicle from 5 Minute Crafts, one of the most viewed channels in 2018.
Challenge / Reaction Hybrid
Rhett & Link make a lot of content in a lot of different Formats. One of their most popular Formats is a Challenge and Reaction hybrid.
Music Video / Challenge / Reaction Hybrid
One area where we’ve seen little innovation is in music videos. If you go back a few years there was a group called CDZA, which did some really amazing work. In this video they combine the Music Video with a Challenge and Reaction video.
Narrative / Multiple-Formats Hybrid
One of my favorite examples right now is Miranda Sings. Miranda also plays with a lot of different Formats such as the Explainer format, the Challenge format and so on, but always mixing in an element of the Narrative format in the form of sketch comedy through her character.
The examples above are great at showing how various creators and media brands have taken standard base Formats, and added elements from other formats to make a Hybrid Format. These hybrid formats have helped them stand out significantly on YouTube. It makes their content feel fresh and unique, and can help drive more audience.
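One way to put the mixing and matching into practice is to enumerate every pairing of the eight base Formats and strike out the ones that are already common in your vertical. The sketch below is a hypothetical helper, not a tool from the Taxonomy; the `ALREADY_COMMON` set is seeded with three hybrids from the examples above only to show the mechanics. In practice you would fill it from an audit of the channels in your own vertical.

```python
from itertools import combinations

BASE_FORMATS = [
    "Listicle", "Explainer", "Commentary", "Interview",
    "Music Video", "Challenge", "Reaction", "Narrative",
]


def hybrid_candidates(formats, size=2):
    """Every unordered combination of `size` distinct base Formats."""
    return [frozenset(combo) for combo in combinations(formats, size)]


# Hypothetical audit of hybrids you already see a lot of in your vertical.
ALREADY_COMMON = {
    frozenset({"Commentary", "Narrative"}),
    frozenset({"Listicle", "Explainer"}),
    frozenset({"Challenge", "Reaction"}),
}

pairs = hybrid_candidates(BASE_FORMATS)
unexplored = [pair for pair in pairs if pair not in ALREADY_COMMON]
print(len(pairs), len(unexplored))  # 28 25
```

Passing `size=3` covers three-way hybrids such as Music Video / Challenge / Reaction. The list only tells you what is open; whether a pairing actually works still comes down to execution and talent.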
Question: Why Do Hybrids Matter?
Beyond not wanting to make content that feels stale before it’s even uploaded, Hybrid Formats can drive huge and sustainable audiences. Let’s take one of the worst performing base Formats in YouTube history, the Interview, as an example.
Many people have tried to make an Interview show on YouTube successful, some pouring millions of dollars into it, some featuring huge celebrities, and some not. But regardless of the various components and budgets, 99 times out of 100 they’ve failed.
However, if we look at what Complex did with Hot Ones and what Condé Nast has done across multiple channels and multiple shows, it’s truly phenomenal. They’ve managed to take one of the oldest and worst performing formats on YouTube – the Interview – and make incredibly successful shows.
First, let’s talk about Hot Ones. Hot Ones is a show on the First We Feast channel. The basic concept is a standard Interview: the interviewer asks questions of the interviewee. However, the brilliance and success of this show lies in the slight adjustment they made to the Format. They added elements of the Challenge format.
During each interview, the guest eats progressively hotter wings until reaching the hottest one. That part of the show is essentially just a Challenge video.
So two Formats – Interview and Challenge – married together make this successful show.
Let’s put a pin in that for now and come back to it in one second.
Next, let’s look at 73 Questions (73 Qs). 73 Questions marries the Listicle with the Interview. That alone may have been enough to make the show incredibly successful, but they went further.
First, they change the style of the standard interview from two people talking to each other with a static camera or multiple cameras, and instead have the subject speak directly into the camera – which we know is the most successful style on YouTube.
Second, in each video they go on a house or office tour, where the viewer gets to see where the subject lives or works. This is essentially the Format of a classic Commentary video: the “room / dorm / house tour.” Here’s what it looks like when it’s all put together:
Answer: The Algorithm and View Velocity
Other than the fact that these shows married two Formats to make something stale feel very new, what do these shows have in common?
Both of these shows give the AUDIENCE a reason to watch that has absolutely nothing to do with the TOPIC. This is incredibly important for long term sustainable growth because it massively contributes to View Velocity.
View Velocity – the rate at which a NEW UPLOAD gains viewership – is incredibly important for how many views that video will ultimately get, as illustrated in my previous research “Reverse Engineering the YouTube Algorithm” and “Cracking YouTube.” View Velocity is essentially a product of how many impressions your Title and Thumbnail get, the Click Through Rate on those impressions, and how quickly that happens. The greater your View Velocity, the greater the chance that YouTube’s algorithm will put your video in front of a broader audience (by way of YouTube’s Suggested and Recommended Videos sections, search results, and more).
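As a rough illustration of that definition, the sketch below treats early views as impressions multiplied by Click Through Rate and divides by the hours since upload. The class, field names, and numbers are hypothetical; YouTube does not publish a View Velocity formula, so this is only one reasonable way to express the idea.

```python
from dataclasses import dataclass


@dataclass
class UploadSnapshot:
    """Hypothetical snapshot of a new upload's early performance."""
    impressions: int            # times the title and thumbnail were shown
    click_through_rate: float   # fraction of impressions that became views (0.0 to 1.0)
    hours_since_upload: float

    def views(self) -> float:
        return self.impressions * self.click_through_rate

    def view_velocity(self) -> float:
        """Views gained per hour since the video went live."""
        if self.hours_since_upload <= 0:
            raise ValueError("hours_since_upload must be positive")
        return self.views() / self.hours_since_upload


# Same impressions, different CTR and timing (placeholder numbers).
slow = UploadSnapshot(impressions=100_000, click_through_rate=0.04, hours_since_upload=48)
fast = UploadSnapshot(impressions=100_000, click_through_rate=0.08, hours_since_upload=12)
print(round(slow.view_velocity()), round(fast.view_velocity()))  # 83 667
```

Doubling the Click Through Rate and collecting those views in a quarter of the time multiplies the rate by eight, which is why Title and Thumbnail matter so much in the first hours.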
So, if View Velocity is an essential component for the success of my channel, the next rational question is, “How do I get the most View Velocity?” Well, the real questions you’re asking are, “What makes someone make the choice to click and watch a video? What are the reasons?”
Well I think our Taxonomy explains the possible reasons. They either:
Like the Format and/or Style
Are interested in the topic
Like the talent
or a combination of the above.
(I’m using “like” here to mean the viewer gets their desired emotion from watching, be it happiness, sadness, anger, etc. This doesn’t mean the user clicks a heart on the website.)
So let’s go back to our Hybrid Interview Formats. Again, both of these shows give the AUDIENCE a reason to watch that has absolutely nothing to do with the TOPIC.
Not interested in Bill Burr? Well you can still enjoy the video to see what happens when he eats that super spicy hot wing. Couldn’t care less about Kendall Jenner? Well you can still enjoy seeing how a multi-millionaire lives.
The effect of having fans of your show or channel as a whole is incredibly powerful for View Velocity. You’re no longer topic- or talent-dependent for views. You have real fans that will watch every episode, not just the episodes/videos they’re interested in at a topical level.
For example, if you have a channel that covers a large number of different topics, and there isn’t an underlying talent or format reason for the audience to watch, you will have a segmented audience (like the one on the left in the image below). In this scenario, your channel will have significant trouble growing and may eventually enter a YouTube death spiral, because no individual video will generate enough View Velocity to tell YouTube’s algorithm it should be shown to a wide audience.
Conversely, if you have fans of a Format (or talent), they’ll watch just about anything you upload (like the circle on the right, which generates far more View Velocity).
In closing, this is the best piece of Algorithm or audience development advice I can give you: Make a hybrid-format, in a style endemic to the platform, with good talent. If you do that, you’ll be way ahead of the vast majority of YouTube channels.
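A toy calculation makes the contrast between the two audiences concrete. It assumes, purely for illustration, that topic-only viewers never cross over to other topics and that every interested group watches a new upload at the same rate; real audiences are messier, and the numbers are hypothetical.

```python
def expected_views_per_upload(topic_segments, format_fans, watch_rate):
    """
    topic_segments: sizes of audience groups that only watch their own topic
    format_fans: viewers who watch new uploads regardless of topic
    watch_rate: assumed share of an interested group that watches a new upload
    Returns the expected views for a new upload on each topic.
    """
    return [(segment + format_fans) * watch_rate for segment in topic_segments]


# The same 30,000 viewers, split two different ways.
segmented = expected_views_per_upload([10_000, 10_000, 10_000], format_fans=0, watch_rate=0.5)
fans = expected_views_per_upload([0, 0, 0], format_fans=30_000, watch_rate=0.5)
print(segmented)  # [5000.0, 5000.0, 5000.0]
print(fans)       # [15000.0, 15000.0, 15000.0]
```

With the same total audience, each upload on the segmented channel reaches only a third of the people who would have shown up for it on the channel with Format fans.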
[This research was initially published on Tubefilter.com in June 2017]
YouTube’s promotional algorithms have changed drastically over the years. In the beginning, the algorithm was largely reliant on metrics that were easily manipulated, like views, clicks, likes, and comments. And the goal was primarily to drive more views.
In 2013 that changed in a big way. YouTube shifted the primary goal of its algorithm to reward “Watch Time”, or time spent on the YouTube platform.
In a previous study here on Tubefilter, we discussed what metrics YouTube considered on the publisher side to calculate Watch Time. We then wrote a follow-up article last year in June, “Reverse Engineering The YouTube Algorithm (Part I),” which shared a number of insights we found in determining the metrics that drive Watch Time. Then, in the fall of 2016, Google released a white paper, “Deep Neural Networks for YouTube Recommendations,” which led to Reverse Engineering The YouTube Algorithm (Part II), further shedding light on what powers the YouTube algorithm.
Now, we’re thrilled to share with you our latest research into the YouTube promotional algorithms in our presentation, “Cracking YouTube in 2017.”
Some of the big highlights from the presentation include:
Videos that are between 7 and 16 minutes perform up to 50% better than videos that are shorter or longer.
Videos with an average view duration of 5 – 8 minutes receive the most views.
There is no correlation between views and length of title, number of tags, or length of description.
There is a strong correlation between number of tags and number of creator suggested videos in that creator’s suggested video column.
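The presentation does not spell out which correlation measure was used. If you want to check a claim like “no correlation between views and number of tags” against your own channel, one common choice is Pearson’s r, sketched below as an assumption rather than as the method behind these findings. A correlation on one channel’s data says nothing about causation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


# Usage (hypothetical per-video columns exported from your analytics):
# r = pearson(tag_counts, view_counts)   # values near 0 suggest no linear relationship
```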
But that’s not all! You can check out the entire presentation, replete with interesting data points and tangible takeaways, right here:
[Editor’s Note: You can read Reverse Engineering the YouTube Algorithm: Part I right here. You don’t need to read it before reading Part II, but you should check it out at some point. It’s excellent.]
[This paper was originally published on Tubefilter.com in February of 2017]
A team of Google researchers presented a paper in Boston, Massachusetts on September 18, 2016 titled Deep Neural Networks for YouTube Recommendations at the 10th annual Association for Computing Machinery conference on Recommender Systems (or, as the cool kids would call it, the ACM’s RecSys ’16).
This paper was written by Paul Covington (currently a Senior Software Engineer at Google), Jay Adams (currently a Software Engineer at Google), and Emre Sargin (currently a Senior Software Engineer at Google) to show other engineers how YouTube uses deep neural networks for machine learning. It gets into some pretty technical, advanced stuff, but what this paper ultimately illustrates is how the entire YouTube recommendation algorithm works(!!!). It gives a careful and prudent reader insight into how YouTube’s Browse, Suggested Videos, and Recommended Videos features actually function.
An Engineering Paper On The YouTube Algorithm For Dummies
While it was not necessarily the intent of the authors, it is our belief the Deep Neural paper can be read and interpreted by and for YouTube video publishers. The below is how we (and when I say we, I mean me and my team at my shiny new company Little Monster Media Co.) interpret this paper as a video publisher.
In a previous post I co-wrote here on Tubefilter, Reverse Engineering The YouTube Algorithm, we focused on the primary driver of the algorithm, Watch Time. We looked at the data from our videos on our channel to try to gain insight into how the YouTube algorithm worked. One of the limiting factors to this approach, however, is that it’s coming from a video publisher’s point of view. In an attempt to gain some insight into the YouTube algorithm we asked ourselves and then answered the question, “Why are our videos successful?” We were doing our best with the information we had, but our initial premise wasn’t ideal. And while I stand by our findings 100%, the problem with our previous approach is primarily twofold:
Looking at an individual set of channel metrics means there’s a massive blind spot in our data, as we don’t have access to competitive metrics, session metrics, and clickthrough rates.
The YouTube algorithm gives very little weight to video publisher-based metrics. It’s far more concerned with audience and individual-video-based metrics. Or, in layman’s terms, the algorithm doesn’t really care about the videos you’re posting, but it cares a LOT about the videos you (and everyone else) are watching.
But at the time we wrote our original paper, there had been nothing released from YouTube or Google in years that would shed any light onto the algorithm in a meaningful way. Again, we did what we could with what we had. Fortunately for us though, the paper recently released by Google gives us a glimpse into exactly how the algorithm works and some of its most important metrics. Hopefully this begins to allow us to answer the more pertinent question, “Why are videos successful?”
Staring Into The Deep Learning Abyss
The big takeaway from the paper’s introduction is that YouTube is using Deep Learning to power its algorithm. This isn’t exactly news, but it’s a confirmation of what many have believed for some time. The authors make the reveal in their intro:
In this paper we will focus on the immense impact deep learning has recently had on the YouTube video recommendations system….In conjunction with other product areas across Google, YouTube has undergone a fundamental paradigm shift towards using deep learning as a general-purpose solution for nearly all learning problems.
What this means is that, increasingly, no humans are likely to be making algorithmic tweaks, measuring those tweaks, and then implementing them across the world’s largest video-sharing site. The algorithm is ingesting data in real time, ranking videos, and then providing recommendations based on those rankings. So, when YouTube claims they can’t really say why the algorithm does what it does, they probably mean that very literally.
The Two Neural Networks
The paper begins by laying out the basic structure of the algorithm. This is the authors’ first illustration:
Essentially there are two large filters, with varying inputs. The authors write:
The system is comprised of two neural networks: one for candidate generation and one for ranking.
These two filters and their inputs essentially decide every video a viewer sees in YouTube’s Suggested Videos, Recommended Videos, and Browse features.
The first filter is Candidate Generation. The paper states this is determined by “the user’s YouTube activity history,” which can be read as the user’s Watch History and Watch Time. Candidate Generation is also determined by what other similar viewers have watched, which the authors refer to as Collaborative Filtering. This algorithm decides who’s a similar viewer through “coarse features such as IDs of video watches, search query tokens, and demographics”.
To boil this down, in order for a video to be one of the “hundreds” of videos that make it through the first filter of Candidate Generation, that video must be relevant to the user’s Watch History and it must also be a video that similar viewers have watched.
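The paper implements Candidate Generation as a neural network trained on watch and search histories, which is far beyond a blog-sized example. As a much simpler stand-in for the collaborative-filtering idea, the sketch below scores unwatched videos by how much viewers with overlapping watch histories watched them. Jaccard similarity, the `min_similarity` threshold, and `k` are our assumptions, not details from the paper.

```python
from collections import Counter


def jaccard(a: set, b: set) -> float:
    """Overlap between two watch histories, from 0.0 (none) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def generate_candidates(user, histories, k=100, min_similarity=0.1):
    """
    histories: dict mapping user id -> set of watched video ids.
    Returns up to k video ids the user has not watched, ordered by how much
    similar viewers watched them (a simplified stand-in, not YouTube's model).
    """
    mine = histories[user]
    scores = Counter()
    for other, theirs in histories.items():
        if other == user:
            continue
        similarity = jaccard(mine, theirs)
        if similarity < min_similarity:
            continue  # not a "similar viewer"
        for video in theirs - mine:
            scores[video] += similarity
    return [video for video, _ in scores.most_common(k)]


histories = {
    "ana": {"a", "b", "c"},
    "ben": {"a", "b", "d"},  # overlaps heavily with ana
    "cai": {"x", "y"},       # no overlap with ana
}
print(generate_candidates("ana", histories))  # ['d']
```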
The second filter is the Ranking filter. The paper goes into a lot of depth around the Ranking Filter and cites a few meaningful factors of which it’s composed. The Ranking filter, the authors write, ranks videos by:
…assigning a score to each video according to a desired objective function using a rich set of features describing the video and user. The highest scoring videos are presented to the user, ranked by their score.
Since Watch Time is YouTube’s top objective for viewers, we have to assume it’s the “desired objective function” referenced. Therefore, the score is based on how good a video, given the various user inputs, is likely to be at generating Watch Time. But, unfortunately, it’s not quite that simple. The authors reveal there’s a lot more that goes into the algorithm’s calculus.
We typically use hundreds of features in our ranking models.
How the algorithm ranks videos is where the math gets really complex. The paper also isn’t explicit about the hundreds of factors considered in the ranking models, nor about how those factors are weighted. It does, however, cite the three elements mentioned in the Candidate Generation filter (Watch History, Search History, and Demographic Information) and several others, including “freshness”:
Many hours worth of videos are uploaded each second to YouTube. Recommending this recently uploaded (“fresh”) content is extremely important for YouTube as a product. We consistently observe that users prefer fresh content, though not at the expense of relevance.
One interesting wrinkle the paper notes is that the algorithm isn’t necessarily influenced by the very last thing you watched (unless you have a very limited history). The authors write:
We “rollback” a user’s history by choosing a random watch and only input actions the user took before the held-out label watch.
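In code, that rollback amounts to picking a random point in a viewer’s chronological watch history, holding that watch out as the label, and keeping only what came before it as input. This is a minimal sketch of how such a training example could be built; the function name and signature are hypothetical.

```python
import random


def rollback_example(watch_history, rng=None):
    """
    watch_history: chronologically ordered list of video ids.
    Picks a random watch as the held-out label and returns only the watches
    that happened before it as inputs (never anything after the label).
    """
    if len(watch_history) < 2:
        raise ValueError("need at least two watches to hold one out")
    rng = rng if rng is not None else random.Random()
    i = rng.randrange(1, len(watch_history))  # keep at least one input watch
    return watch_history[:i], watch_history[i]


history = ["v1", "v2", "v3", "v4", "v5"]
inputs, label = rollback_example(history, random.Random(0))
```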
In a later section of the paper they discuss clickthrough rates (aka CTR) on video impressions (aka Video Thumbnails and Video Titles). It states:
For example, a user may watch a given video with high probability generally but is unlikely to click on the specific homepage impression due to the choice of thumbnail image….Our final ranking objective is constantly being tuned based on live A/B testing results but is generally a simple function of expected watch time per impression.
It’s not a surprise that clickthrough rates are called out here. In order to generate Watch Time, a video has to get someone to watch it in the first place, and the most surefire way to do that is with a great thumbnail and a great title. This gives credence to many creators’ claims that clickthrough rates are extremely important to a video’s ranking within the algorithm.
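The paper describes its ranking objective only as “generally a simple function of expected watch time per impression.” A simplified reading, which is our assumption rather than the paper’s exact model, splits that into the chance an impression is clicked times the watch time expected after a click. The made-up numbers below show why a high clickthrough rate alone does not guarantee the top slot.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    video_id: str
    p_click: float                     # predicted chance this impression is clicked
    expected_minutes_if_clicked: float  # predicted watch time after a click

    def expected_watch_per_impression(self) -> float:
        return self.p_click * self.expected_minutes_if_clicked


def rank(candidates):
    """Order candidates by expected watch time per impression, highest first."""
    return sorted(candidates, key=lambda c: c.expected_watch_per_impression(), reverse=True)


clickbait = Candidate("clickbait", p_click=0.12, expected_minutes_if_clicked=1.0)  # hypothetical
solid = Candidate("solid", p_click=0.06, expected_minutes_if_clicked=6.0)          # hypothetical
print([c.video_id for c in rank([clickbait, solid])])  # ['solid', 'clickbait']
```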
YouTube knows that CTR can be exploited so they provide a counterbalance. This paper acknowledges this when it states the following:
Ranking by click-through rate often promotes deceptive videos that the user does not complete (“clickbait”) whereas watch time better captures engagement [13, 25].
While this might seem encouraging, the authors go on to write:
If a user was recently recommended a video but did not watch it then the model will naturally demote this impression on the next page load.
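The paper says the model learns this demotion from its inputs; it does not publish a formula. To show the effect, the sketch below uses a hand-written penalty instead: each recent impression the viewer ignored multiplies the video’s score by a hypothetical `decay` factor. Both the rule and the 0.8 default are illustrative assumptions.

```python
def demoted_score(base_score, ignored_impressions, decay=0.8):
    """
    Hypothetical rule: shrink a video's score by `decay` for each recent
    impression of it that this viewer saw but did not click.
    """
    if not 0 < decay <= 1:
        raise ValueError("decay must be in (0, 1]")
    if ignored_impressions < 0:
        raise ValueError("ignored_impressions cannot be negative")
    return base_score * decay ** ignored_impressions


# A video scored 0.36 keeps that score until the viewer starts passing it over.
fresh = demoted_score(0.36, 0)
after_three_skips = demoted_score(0.36, 3)
```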
These statements support the idea that if viewers are not clicking a certain video, the algorithm will stop serving that video to similar viewers. There is evidence in this paper that this happens at the channel level as well. It states (with my added emphasis):
We observe that the most important signals are those that describe a user’s previous interaction with the item itself and other similar items… As an example, consider the user’s past history with the channel that uploaded the video being scored – how many videos has the user watched from this channel? When was the last time the user watched a video on this topic? These continuous features describing past user actions on related items are particularly powerful…
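The two example questions in that quote translate into simple features over a viewer’s watch log. The sketch below computes them; the `Watch` record and function names are hypothetical, and the real model feeds features like these into a neural network alongside hundreds of others.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Watch:
    video_id: str
    channel_id: str
    topic: str
    watched_at: datetime


def channel_watch_count(history: List[Watch], channel_id: str) -> int:
    """How many videos has the user watched from this channel?"""
    return sum(1 for w in history if w.channel_id == channel_id)


def hours_since_topic(history: List[Watch], topic: str, now: datetime) -> Optional[float]:
    """Hours since the user last watched a video on this topic (None if never)."""
    times = [w.watched_at for w in history if w.topic == topic]
    if not times:
        return None
    return (now - max(times)).total_seconds() / 3600


now = datetime(2017, 2, 1, 12, 0)
history = [
    Watch("v1", "c1", "cars", now - timedelta(hours=5)),
    Watch("v2", "c1", "cars", now - timedelta(hours=2)),
    Watch("v3", "c2", "makeup", now - timedelta(hours=1)),
]
print(channel_watch_count(history, "c1"), hours_since_topic(history, "cars", now))  # 2 2.0
```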
In addition, the paper notes all YouTube watch sessions are considered when training the algorithm, including those that are not part of the algorithm’s recommendations:
Training examples are generated from all YouTube watches (even those embedded on other sites) rather than just watches on the recommendations we produce. Otherwise, it would be very difficult for new content to surface and the recommender would be overly biased towards exploitation. If users are discovering videos through means other than our recommendations, we want to be able to quickly propagate this discovery to others via collaborative filtering.
Ultimately though, it all comes back to Watch Time for the algorithm. As we saw at the beginning of the paper when it stated the algorithm is designed to meet a “desired objective function,” the authors conclude with “Our goal is to predict expected watch time,” and “Our final ranking objective is constantly being tuned based on live A/B testing results but is generally a simple function of expected watch time per impression.”
This confirms, once again, that Watch Time is what all of the factors that go into the algorithm are designed to create and prolong. The algorithm is weighted to encourage the greatest amount of time on site and longer watch sessions.
That’s a lot to take in. Let’s quickly review.
YouTube uses three primary viewer factors to choose which videos to promote. These inputs are Watch History, Search History, and Demographic Information.
There are two filters a video must get through in order to be promoted by way of YouTube’s Browse, Suggested Videos, and Recommended Videos features:
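The Candidate Generation Filter uses the viewer inputs, plus what similar viewers have watched, to narrow millions of videos down to hundreds.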
The Ranking Filter uses the viewer inputs, as well as other factors such as “Freshness” and Clickthrough Rates.
The promotional algorithm is designed to continually increase watch time on site by continually A/B testing videos and then feeding that data back into the neural networks, so that YouTube can promote videos that lead to longer viewing sessions.
Still Confused? Here’s An Example.
To help explain how this works, let’s look at an example of the system in action.
Josh really likes YouTube. He has a YouTube account and everything! He’s already logged into YouTube when he visits the site one day. And when he does, YouTube assigns three “tokens” to Josh’s YouTube browsing sessions. These three tokens are given to Josh behind the scenes. He doesn’t even know about them! They’re his Watch History, Search History, and Demographic Information.
Now is where the Candidate Generation filter comes into play. YouTube takes the value of those “tokens” and combines it with the Watch History of viewers who like to watch the same kind of stuff Josh likes to watch. What’s left over is hundreds of videos that Josh might be interested in viewing, filtered out from the millions and millions of videos on YouTube.
Next, these hundreds of videos are ranked based on their relevancy to Josh. The algorithm asks and answers the following questions in fractions of a second: How likely is it that Josh will watch the video? How likely is it the video will lead to Josh spending a lot of time on YouTube? How fresh is the video? How has Josh recently interacted with YouTube? Plus hundreds of other questions!
The top-ranked videos are then served to Josh in YouTube’s Browse, Suggested Videos, and Recommended Videos features. Josh’s decision on what to watch (and what not to watch) is sent back into the Neural Network so the algorithm can use that data for future viewers. Videos that get clicked and keep the user watching for long periods of time continue to be served. Those that don’t get clicks may not make it through the Candidate Generation filter the next time Josh (or a viewer like Josh) visits the site.
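The two-stage flow in Josh’s example can be sketched in a few lines. This is a toy built on heavy assumptions, not the production system (which uses deep neural networks): candidate generation here is a simple co-watch heuristic (videos watched by users who share at least one watched video with Josh), and ranking is a hypothetical score that multiplies predicted watch minutes by a freshness decay. All names, fields, and the seven-day half-life are illustrative.

```python
import math


def generate_candidates(user, histories):
    """Collect unseen videos watched by users whose history overlaps the user's."""
    seen = histories[user]
    candidates = set()
    for other, history in histories.items():
        if other != user and seen & history:
            candidates |= history - seen  # only videos the user hasn't watched yet
    return candidates


def rank_candidates(candidates, predicted_minutes, age_days, half_life_days=7.0):
    """Hypothetical score: predicted watch minutes x exponential freshness decay."""
    def score(video):
        freshness = math.exp(-math.log(2) * age_days[video] / half_life_days)
        return predicted_minutes[video] * freshness
    return sorted(candidates, key=score, reverse=True)


histories = {
    "josh": {"v1", "v2"},
    "ana": {"v1", "v3", "v4"},
    "ben": {"v2", "v5"},
    "cat": {"v9"},  # no overlap with Josh, so v9 never becomes a candidate
}
predicted_minutes = {"v3": 6.0, "v4": 2.0, "v5": 8.0, "v9": 20.0}
age_days = {"v3": 1, "v4": 0, "v5": 7, "v9": 0}

pool = generate_candidates("josh", histories)
print(sorted(pool))                                       # ['v3', 'v4', 'v5']
print(rank_candidates(pool, predicted_minutes, age_days))  # ['v3', 'v5', 'v4']
```

Here v5 has the highest predicted watch time but ranks below v3 because it is a week older. The feedback step (Josh’s clicks and skips becoming training data) is left out of this sketch.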
Deep Neural Networks for YouTube Recommendations is a fascinating read. It’s the first real glimpse into the algorithm, directly from the source(!!!), that we’ve seen in a very long time. I hope we continue to see more papers like it so publishers can make better choices about the content they create for the platform. And that’s ultimately why I write these blogs in the first place. Making content suited to the platform means creators will generate more views, and therefore more revenue, which ultimately means we can make more and better programming and provide more entertainment for the billions of viewers who rack up significant Watch Time on YouTube every month.
By Matt Gielen
[Editor’s Note: You can read Reverse Engineering the YouTube Algorithm: Part II right here. You don’t need to read it after reading Part I, but you should check it out at some point. It’s excellent.]
[Originally published on Tubefilter.com in June 2016]
If you’re a creator who makes content for any kind of distribution (whether it be a feature film, a theatrical play, a TV program, or some kind of online video), the success or failure of that content can depend on the mechanics of how it is distributed. For example, if you’re making a TV show and you want it to succeed, you ideally want to know when to put in ad breaks, how to promote the program, which channel your show will appear on, how many homes that channel reaches, and so on and so forth.
If you’re distributing videos on YouTube, however, the most valuable knowledge you can have about that distribution point is how the YouTube algorithm works. But, as with everything algorithm-related, that’s hard to figure out.
YouTube doesn’t make the variables that factor into its algorithm public. So, to figure out how it works, we must peer into a very big and very dark black box with very limited data. There are also factors at play for which we have absolutely no data whatsoever. These data points (such as thumbnail and title impressions, user viewing history and behavior, session metrics, etc.) would shed a lot of light on the algorithm. But, alas, they aren’t available to us.
Despite these limitations, we still have an obligation to try to figure out as much as we can with the data available to us. This is why my former colleague Jeremy Rosen and I (FYI, I recently left Frederator to explore other opportunities) spent six months examining data from Frederator’s owned-and-operated channels to learn as much as we could about the YouTube algorithm.
One quick note before we get started. Throughout this post we will refer to the multiple YouTube promotional algorithms (Recommended, Suggested, Related, Search, MetaScore, etc.) simply as “the YouTube algorithm.” There are many differences between them, but they generally share the same principle: they’re all optimized for “Watch Time.”
First things first. “Watch Time” DOES NOT mean minutes watched. As we discussed before, Watch Time is a combination of the following:
Essentially, each of these items relates to how well and how often your channel and its videos get people to start a viewing session and stay on the platform for an extended period of time.
In order to accrue any sort of value in the algorithm, your channel and videos first need to get views. And for a video to be “successful” (success being defined as viewership equal to or greater than 50% of the subscriber base in the first 30 days), you need to get a lot of views in the first minutes, hours, and days after a video’s release. We refer to this as View Velocity.
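The success definition above, and View Velocity as a share of subscribers, are easy to express as code. A minimal sketch, assuming you can count views from subscribers in an early window (such as the first 48 hours) and know the channel’s subscriber count at upload; the function names are hypothetical.

```python
def view_velocity_pct(subscriber_views_in_window: int, subscribers: int) -> float:
    """Percent of the subscriber base that watched within the early window (e.g. 48 hours)."""
    return 100.0 * subscriber_views_in_window / subscribers


def is_successful(views_first_30_days: int, subscribers: int) -> bool:
    """'Successful' per the definition above: 30-day views >= 50% of subscribers."""
    return views_first_30_days >= 0.5 * subscribers


# Hypothetical channel with 200,000 subscribers.
print(view_velocity_pct(12_000, 200_000))  # 6.0
print(is_successful(95_000, 200_000))      # False (needs 100,000)
print(is_successful(140_000, 200_000))     # True
```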
Views and View Velocity
When analyzing Frederator’s view velocity, we found that the average life-to-date viewership of a video increased exponentially as the percent of subscribers who watched in the first 48 hours increased:
Average viewership for videos that received this percentage of subscriber views in the first 48 hours.
Seeing this, we dug a bit deeper and found that we could predict, with nearly 92% accuracy, whether a video would perform well for us based on its View Velocity. Essentially, there was a direct correlation between the percentage of subscribers who viewed in the first 72 hours and a video’s life-to-date viewership.
Trendline of viewership as it relates to the percentage of subscribers who watched in the first 72 hours.
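One simple way to test a claim like “92% accuracy” is to pick a View Velocity cutoff, predict “will perform well” for every video at or above it, and count how often that prediction matches what actually happened. The text doesn’t say which method Frederator used; the sketch below assumes a plain threshold classifier, and the sample data are made up for illustration.

```python
def threshold_accuracy(videos, cutoff_pct):
    """Share of videos where (velocity >= cutoff) matches the actual outcome.

    videos: list of (velocity_pct_first_72h, performed_well) tuples.
    """
    correct = sum((velocity >= cutoff_pct) == performed_well
                  for velocity, performed_well in videos)
    return correct / len(videos)


def best_cutoff(videos, cutoffs):
    """Try each candidate cutoff and keep the most accurate one."""
    return max(cutoffs, key=lambda c: threshold_accuracy(videos, c))


# Made-up sample: (percent of subscribers who watched in 72 hours, did it perform well?)
sample = [(2.0, False), (3.5, False), (4.0, True), (6.0, True),
          (7.5, True), (9.0, True), (1.0, False), (6.5, False)]

cutoff = best_cutoff(sample, cutoffs=[3.0, 4.0, 5.0, 6.0])
print(cutoff, threshold_accuracy(sample, cutoff))  # 4.0 0.875
```

On real data you would choose the cutoff on one set of videos and measure accuracy on a separate set; measuring on the same videos used to pick the cutoff overstates accuracy.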
These graphs and correlations show that Views and View Velocity have direct and significant impacts on the overall success of a video and a channel. We also found evidence that the reverse is true: poor View Velocity has a negative impact on that video, the following videos, and previous videos.
This graph shows that if Frederator’s previous uploads had poor View Velocity (defined as less than 5% of subscribers) in the first 48 hours, our next uploads would be impacted negatively as well:
The percentage of subscribers who watched the next video versus the average percentage of subscribers who watched the 2 previous videos.
This data supports Matthew Patrick’s theory, outlined in this video, which suggests that if one of your videos is not clicked on by a large number of subscribers, YouTube will not serve your next upload to a significant portion of your subscriber base.
Alternatively, it’s possible that because the previous upload did poorly, there is less viewership on the channel, which leads to fewer viewers passing through organically. But the result is the same regardless of the “why.”
There is also evidence that poor View Velocity on a new upload harms the viewership of your library of videos. In the first graph below, the blue line shows the seven-day rolling average % of subscribers who viewed in the first 48 hours, plotted against overall channel viewership. The second graph shows the overall percentage of subscribers who watched a video that day versus overall channel viewership.
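The relationship in this graph suggests a simple early-warning check. A sketch, assuming you track the 48-hour subscriber view percentage for each upload; the 5% line is the “poor View Velocity” definition above, while the function name and the idea of flagging the next upload are our own illustration, not anything YouTube exposes.

```python
POOR_VELOCITY_PCT = 5.0  # "poor" = under 5% of subscribers in the first 48 hours


def next_upload_at_risk(velocity_history_pct, lookback=2):
    """Flag the next upload if the average of the last `lookback` uploads is poor."""
    if len(velocity_history_pct) < lookback:
        return False  # not enough history to judge
    recent = velocity_history_pct[-lookback:]
    return sum(recent) / lookback < POOR_VELOCITY_PCT


print(next_upload_at_risk([8.0, 4.0, 4.5]))  # True  (average of 4.0 and 4.5 is 4.25)
print(next_upload_at_risk([3.0, 6.0, 7.0]))  # False (average of 6.0 and 7.0 is 6.5)
```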
The 7-day rolling average of subscriber views versus total viewership for Channel Frederator.
Essentially, these graphs show that as the percentage of your subscriber base that views new uploads and/or your library videos goes down, so does overall channel viewership. To us, this says that through the algorithm, YouTube actively promotes channels that appeal to their core audience, while actively punishing channels that do not.
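The rolling averages behind these graphs are straightforward to compute. A minimal sketch, assuming one value per day (for example, the percentage of subscribers who viewed that day); the seven-day window matches the first graph, and the daily figures are made up.

```python
from collections import deque


def rolling_average(values, window=7):
    """Trailing mean over the last `window` values (fewer values at the start)."""
    buf = deque(maxlen=window)  # oldest value drops out automatically
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out


daily_sub_pct = [6, 6, 5, 5, 4, 4, 3, 3, 2]  # made-up, steadily declining
print([round(x, 2) for x in rolling_average(daily_sub_pct)])
```

Calling the same function with `window=5` gives the five-day rolling average used in the algorithm theory graphs.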
The next biggest metric we found to have a significant impact on the algorithm is View Duration.
View Duration is how long a viewer spends watching an individual video. This metric carries a lot of weight, and our data suggest there’s an obvious tipping point. On Channel Frederator this year, videos with an average View Duration of over eight minutes brought in over 350% more views on average in the first 30 days than those under five minutes. The following graph shows the average life-to-date views for Channel Frederator’s videos versus the average view duration of those videos.
Average Life Time Views versus aggregated Average Life View Duration. *Note: we have limited data points on videos with view durations greater than eight minutes.
We also found that longer videos performed better. This graph shows the average first seven-day views for videos under five minutes (1), five to 10 minutes (5), and 10 minutes or longer (10):
7 Day Average Views versus aggregated Average View Duration.
This graph shows the same, but with life-to-date views instead.
Life To Date Average Views versus aggregated Average View Duration.
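The length buckets above, labeled 1, 5, and 10, can be reproduced with a small grouping function. A sketch, assuming each video record has a length in minutes and a seven-day view count; the records are made up, and the boundaries follow the text (under 5 minutes, 5 to under 10 minutes, 10 minutes or longer).

```python
from collections import defaultdict


def length_bucket(minutes: float) -> int:
    """Bucket labels used in the graphs: 1 (<5 min), 5 (5 to <10 min), 10 (10+ min)."""
    if minutes < 5:
        return 1
    if minutes < 10:
        return 5
    return 10


def average_views_by_bucket(videos):
    """videos: iterable of (length_minutes, seven_day_views) pairs."""
    totals = defaultdict(lambda: [0, 0])  # bucket -> [sum of views, video count]
    for minutes, views in videos:
        t = totals[length_bucket(minutes)]
        t[0] += views
        t[1] += 1
    return {bucket: s / n for bucket, (s, n) in sorted(totals.items())}


videos = [(3, 10_000), (4.5, 14_000), (7, 20_000), (12, 45_000), (10, 35_000)]
print(average_views_by_bucket(videos))  # {1: 12000.0, 5: 20000.0, 10: 40000.0}
```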
Adding to these findings, we have anecdotal evidence to suggest that simply making videos longer will improve viewership performance. A channel that Frederator works with in the kids space was uploading three to four videos per week of varying lengths (three minutes, 10 minutes, 30 minutes and 70 minutes). We noticed that the 70-minute videos were receiving far more viewership in the first two days than the other videos, despite being mainly repurposed library videos. On top of this, the 70-minute videos had the same average view duration as any other video of any length on this channel.
We recommended that they cut their uploads to just the 70-minute video each week. In the six weeks since they adopted this strategy, the channel’s average daily viewership has increased by 500,000 views, even though it is uploading 75% fewer videos. Crazy, I know.
Session Starts, Session Duration, and Session Ends
A great deal of this research builds on the work done for my previous post, WTF Is Watch Time?!
For a quick recap, Session Starts is essentially how many people start their YouTube viewing session with one of your videos. This speaks volumes about why the first 72 hours of viewership from your subscribers are so important. Subscribers are the people most likely to watch your video in its first days live. They are also the most likely to click on one of your thumbnails, since they are familiar with your brand.
Session Duration is how long your content keeps people on the platform as they are watching your video, as well as after they’ve watched your video. There’s little to no hard data here other than Average View Duration and Unique Views, which is a shoddy metric at best.
Session Ends relates to how often someone ends a YouTube session while or after watching one of your videos. It counts against you in the algorithm, and it’s a metric for which there is literally no data available to us.
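Since creators can’t see session-level data, here is a sketch of how Session Starts, Session Ends, and a rough Session Duration could be counted if that data existed. The log format (each session as an ordered list of (channel, minutes watched) entries) is entirely hypothetical, and “minutes from the channel’s first video onward” is only one possible reading of keeping people on the platform “as well as after they’ve watched your video.”

```python
def session_metrics(sessions, channel):
    """Session starts, session ends, and minutes watched from the channel's first video on.

    sessions: list of sessions, each an ordered list of (channel_id, minutes_watched)
    tuples. This log format is hypothetical; creators have no access to such data.
    """
    starts = ends = 0
    minutes_from_first_video = 0.0
    for session in sessions:
        channels = [ch for ch, _ in session]
        if channel not in channels:
            continue  # session never touched this channel
        if channels[0] == channel:
            starts += 1  # session began with one of the channel's videos
        if channels[-1] == channel:
            ends += 1  # session ended on one of the channel's videos (a negative signal)
        first = channels.index(channel)
        minutes_from_first_video += sum(m for _, m in session[first:])
    return {"session_starts": starts, "session_ends": ends,
            "minutes_from_first_video": minutes_from_first_video}


sessions = [
    [("mine", 8), ("other", 12)],  # starts with us, ends elsewhere
    [("other", 5), ("mine", 10)],  # ends on us
    [("other", 3), ("other", 4)],  # never touches us
    [("mine", 6)],                 # starts and ends with us
]
print(session_metrics(sessions, "mine"))
```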
An Algorithm Theory:
YouTube’s algorithm is designed to PROMOTE CHANNELS, NOT INDIVIDUAL VIDEOS. However, it uses VIDEOS to promote INDIVIDUAL CHANNELS.
The algorithm uses a combination of video specific data and channel aggregate data to determine which videos to promote. However, the end goal is to build that CHANNEL’S audience.
YouTube does this because they want to promote channels that:
Make people come back to the platform often.
Keep them on the platform for an extended period of time.
Here are three graphs that support this theory.
The first graph is the 48-hour subscriber views % vs. the seven-day viewership for individual videos. It shows that if you start a lot of sessions, your video is going to get a lot of views. Once you pass a certain threshold, the growth becomes exponential:
Average 7 Day Views of videos that reached a certain percentage of subscribers in the first 48 hours.
The second graph shows the average daily views vs. the rolling five-day % of subs viewership for the channel.
This means that if you CONSISTENTLY get a large number of subscribers to start sessions (five-day rolling average) the algorithm increases the daily views it sends to the channel’s entire video library.
The final graph is the average daily views as a percentage of subscribers vs. rolling five-day percentage of subs viewership for the channel.
Daily viewership as a percentage of Channel Frederator’s total subscriber base versus the five-day rolling subscriber viewership percentage.
We believe this shows a correlation between a channel’s consistency and exactly how many views, as a percentage of your subscribers, YouTube will drive to your videos.
So, let’s say you’re a gaming channel with 100,000 subs, you upload once daily, and you get 5% of your subs to watch each video. Your rolling average would be a consistent but modest 5%. This means you would be generating roughly 30% of your subscriber count in views on a given day (or 30,000/day, which is roughly 900,000/month). Now let’s say you have 1 million subs. Those numbers would look more like 300,000 daily views and 9,000,000 monthly views.
We think that math checks out pretty well. Essentially, this means that YouTube selects channels to promote based on certain performance metrics and then drives exactly as many views to each channel as its algorithms determine.
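The arithmetic in this example can be written out directly. The 30% daily-views-to-subscribers ratio is the figure from the example, not a fixed rule, and a 30-day month is assumed; the function name is hypothetical.

```python
def projected_views(subscribers: int, daily_views_pct_of_subs: float, days_per_month: int = 30):
    """Daily and monthly views implied by a daily-views-to-subscribers ratio."""
    daily = subscribers * daily_views_pct_of_subs / 100
    return daily, daily * days_per_month


for subs in (100_000, 1_000_000):
    daily, monthly = projected_views(subs, 30)
    print(f"{subs:,} subs -> {daily:,.0f}/day, {monthly:,.0f}/month")
# 100,000 subs -> 30,000/day, 900,000/month
# 1,000,000 subs -> 300,000/day, 9,000,000/month
```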
But that’s just a theory!
An Algorithm Score
Here we have taken a crack at recreating these algorithms. Using 15 signals and our best estimate of their weights we’ve created an Algorithm Score. Here are the factors we used to figure it out:
And here are the graphs putting our factors into action.
Trendline of correlation between the 3-day rolling average Algorithm Score and Views.
Trendline of correlation between the Algorithm Score and Views.
We’ve gotten it pretty close here:
The 3 Day Rolling Average Algorithm Score versus Daily Views
If you’re curious, this is our (very) rough view of how the algorithm is weighted:
Algorithm Weighting Factors
Weighting for Watch Time metrics.
Algorithm Weighting for non-Watch Time Metrics
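Since the exact 15 signals and their weights aren’t reproduced in the text, the sketch below only shows the general shape of a weighted score like this one: scale each signal, multiply by a weight, sum, then smooth with a 3-day rolling average as in the first Algorithm Score graph. The three signal names, their scales, and the weights are placeholders chosen for illustration, not Frederator’s actual factors or estimates.

```python
# Placeholder signals and weights: NOT the actual 15 factors or their estimated weights.
WEIGHTS = {
    "sub_view_pct_48h": 0.5,  # View Velocity
    "avg_view_minutes": 0.3,  # View Duration
    "uploads_per_week": 0.2,  # upload consistency
}
# Values at which each signal counts as 1.0 (also placeholders).
SCALES = {"sub_view_pct_48h": 20.0, "avg_view_minutes": 10.0, "uploads_per_week": 7.0}


def algorithm_score(signals: dict) -> float:
    """Weighted sum of scaled signals, each capped at 1.0."""
    return sum(w * min(signals[name] / SCALES[name], 1.0) for name, w in WEIGHTS.items())


def rolling_mean(values, window=3):
    """Trailing mean over up to `window` most recent values."""
    out = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        out.append(sum(recent) / len(recent))
    return out


days = [
    {"sub_view_pct_48h": 10, "avg_view_minutes": 5, "uploads_per_week": 7},
    {"sub_view_pct_48h": 4, "avg_view_minutes": 3, "uploads_per_week": 7},
    {"sub_view_pct_48h": 20, "avg_view_minutes": 12, "uploads_per_week": 3.5},
]
scores = [algorithm_score(d) for d in days]
print([round(s, 3) for s in rolling_mean(scores)])
```

With real data, the weights would be fitted against observed views rather than set by hand.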
However, without more data, we can’t be sure which type of regression best fits these correlations, so we can only say that we have strong correlations for most signals. Plus, we’re still just YouTube algorithm enthusiasts.
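To see why the choice of regression matters, compare the Pearson correlation of a score against raw views and against log-transformed views: if views grow exponentially with the score, the log version will correlate more strongly. A sketch with made-up numbers; the `pearson` helper is written out by hand rather than taken from a library.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up data in which views grow exponentially with the score.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
views = [1_000 * 3 ** (10 * s) for s in scores]

r_linear = pearson(scores, views)
r_log = pearson(scores, [math.log(v) for v in views])
print(round(r_linear, 3), round(r_log, 3))
```

Here the straight-line correlation is strong but imperfect, while the log-linear one is essentially 1.0. Distinguishing between those two shapes reliably is the kind of question more data would help settle.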
The Ramifications of YouTube’s (Current) Algorithms
The data we found suggests 6 main takeaways:
YouTube algorithmically determines exactly how many views each video and channel will get.
Successful channels focus on one very specific content type/idea.
Channels should rarely experiment once they’ve established a single successful content type.
High-dollar content producers will never be successful on the YouTube platform and therefore will never fully embrace it.
Personality driven shows/channels will always be the dominant content type on the platform because they are the “very specific content type” people are watching for.
New channels that have no access to their own audience outside of YouTube will struggle for a long time to grow.
In conclusion, it is our view that the algorithm is designed to promote channels that are capable of uploading videos that get and keep a large swath of their niche audience watching. If you want to be successful on YouTube the best advice we can give you is to focus on one very specific niche interest and make as many 10-minute or longer videos as you can about that singular topic.
On a personal note, I’d like to mention that YouTube catches a lot of flak for its algorithms, and I hope they don’t interpret this post as a negative look at the algorithm. Throughout this research process, I have gained an even deeper appreciation for YouTube and the engineers who oversee and design the algorithms. They are, after all, trying to entertain a billion people a month across the entire world, with vast and varied interests. When you take a step back and look at it as a whole, it’s an astounding thing of beauty designed unbelievably well to achieve YouTube’s business goals and prevent people from abusing the system. My hat is off to them.