Distributors of content tend to be evaluated on the basis of their content. But focusing on Netflix's "quality" is a mistake. Not only does the notion barely exist, it mistakes Netflix's "job to be done."

The first lesson you learn when you start working in Hollywood is that everyone you know has an opinion to offer on the content you make (or choose not to make). It’s unlikely that Netflix’s development executives offer advice on the company’s encoding technologies, or that HBO’s casting directors have suggestions about AT&T’s 5G rollout, or that Apple’s line producers offer derivative trading strategies to their banker friends. But the reverse is almost always true. Few people know about encoding, 5G or derivatives – but everyone has opinions on TV and film; it’s the most intensely and widely consumed medium on the planet. And given the opportunity, few can resist weighing in.

This universal love for visual media is often inspiring for those who work to produce and release it, but this same affection has also led to a strange obsession with “quality” in OTT video competition. Historically, the idea of the average quality of a video service was always abstracted and thus largely ignored. When consumers bought a Pay-TV subscription, they knew they were getting access to 100+ brands and thousands of programs. Accordingly, households never sat down and asked themselves “what’s the average quality of this subscription?” This didn’t mean these homes liked “paying for” all content they didn’t watch (the average household rarely watched more than 15 networks), but they didn’t judge the value of Pay-TV service based on the average quality of everything it had to offer.

This isn’t true for SVOD services. The press, Wall Street and consumers can’t help talking about the “average quality” of Netflix, or Hulu, or HBO, and what that means competitively for these services. This fixation is understandable, but it’s also foolish because “Quality” doesn’t matter. That’s not exactly true – it does, in a sense, matter – but just not really in a way that’s objectively and externally observable, significant or intuitive.

To this end, there are three core issues affecting the idea of “quality” in the OTT era. The first concerns its role, the second focuses on how it’s assessed, and the third is its implications for strategy.


What do we want? Quality! When do we want it? Sometimes!

The most important place to start when considering quality is TV’s “job to be done”. The average American watches nearly five and a half hours of TV per day – or nearly one-third of their waking life. When this fact is reported, few believe it could be possible. But it’s possible for the very reason TV’s “job to be done” is misunderstood. Less than half of daily TV time is spent on the living room couch in front of the TV, let alone alongside loved ones. It’s instead spent while cooking dinner, running on a treadmill, tidying up, playing bridge or attending to a crying infant. Here, TV is background entertainment; it’s mostly just helping to pass the time. And even when “dedicated” TV time occurs, its purpose is often to help an audience member relax and unwind after a long day or week. They didn’t turn on the TV to be challenged by it, but to be indulged. Put another way, most TV consumption is actually not “lean back”, it’s “lean out”.

Accordingly, there’s a large disconnect between the press’ focus on creative achievement and audience preferences. The vast majority of consumers don’t want The Crown or Mad Men or Game of Thrones most of the time. Case in point: close to two-thirds of all primetime TV viewership is unscripted content like Duck Dynasty and The Masked Singer. Not only are these titles not Emmy winners, they’re not even potential nominees.

The heterogeneity of the audience’s interests in quality is critical to understanding competition in SVOD as different providers are focused on sating different portions of viewers’ appetites. HBO, for example, has historically targeted the most intensive and immersive TV viewing. This is shown not just by their perennial Emmy hauls or creative sensibilities, but by when they release their shows: Sunday nights from 9-11PM. By focusing on filling the most prioritized TV timeslot with the highest possible quality of content, HBO was able to generate industry-leading revenue per hour watched and record profits. But it also capped its viewer base at roughly 30% of all Pay-TV homes and 1% of total viewing time. When you target only a portion of consumption, you get only a portion of the audience.

At Amazon, original content serves as an anchor for Prime and the overall Amazon Video ecosystem, which also offers virtually all other video content ever produced via its digital store and Channels partners (e.g. HBO, Showtime, Sundance Now). Specifically, this means trying to fulfill unmet content wants for audiences and, in doing so, giving them a reason to centralize their digital video time and spend on Amazon. Many assume this means focusing on “quality,” but it doesn’t mean quality per se – it means offering content that viewers can’t get elsewhere. Sometimes this means narrative, other times it’s the specificity of the story or the spectacle or its budget. Jack Ryan, put another way, isn’t Homeland, nor should it be. And it’s certainly not trying to be Bridge of Spies. Apple is working a similar angle to Amazon, though with a broader global focus on even splashier global titles with mega-watt stars and IP. And again, this doesn’t mean Quality.

Netflix meanwhile is quite open about the fact they are going after all time spent watching video (and as much non-video free time as they can get, including “enjoying a glass of wine with [your] partner”). As a result, the company’s closest competitor is no longer top-shelf content producer HBO – and hasn’t been for some time – but Pay-TV overall. And a lot of Pay-TV content is “bad” (no judgment intended). As a result, it’s also Netflix’s job to make “bad content” (again, no judgment). This may mean releasing shows which pull down the network’s “average”, but this is by design and in accordance with viewer preferences. And just as with a Pay-TV subscription, a Netflix subscriber doesn’t judge the service’s average Quality by what they don’t watch or what doesn’t appeal to them. They just care about what they watch. Analysts should stop pretending otherwise.

Relatedly, the prodigious Quantity of Netflix’s content and relative scarcity of its “Quality” programming (or its supposedly lackluster “Average” programming) also means that the streamer’s creative achievements tend to be overlooked or forgotten. In 2016, Netflix had 1 of 7 nominations for “Outstanding Drama” at the Emmys and 2 of 7 Comedies, for a combined 21% share of total “Outstanding Series” nominations. In 2017, it had 3/7 dramas and 2/7 comedies (36%). In 2018, 2/7 and 2/7 (29%). The only network to outperform Netflix is HBO – and their nominees represent nearly half of their entire original content offering. (And for what it’s worth, Netflix’s year-over-year decline in Emmy nominees was likely caused by Kevin Spacey; the only House of Cards season not to receive a nomination for Best Drama came after his scandal broke). Imagine judging Pay-TV quality to be lackluster because 90% of its shows never get award nominations.
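For readers who want to check the share figures, here is a minimal sketch of the arithmetic, assuming (as the text implies) that each year's combined share is simply Netflix's drama and comedy nominations divided by the 14 total slots. The variable and function names are illustrative, not drawn from any official source.

```python
from fractions import Fraction

# (Netflix nominations, total nominees) for drama and comedy, as stated in the text.
nominations = {
    2016: ((1, 7), (2, 7)),
    2017: ((3, 7), (2, 7)),
    2018: ((2, 7), (2, 7)),
}

def combined_share(categories):
    """Netflix nominations divided by all nomination slots across the categories."""
    netflix = sum(n for n, _ in categories)
    slots = sum(total for _, total in categories)
    return Fraction(netflix, slots)

shares = {year: combined_share(cats) for year, cats in nominations.items()}

for year, share in shares.items():
    # Prints each year with its share rounded to a whole percentage.
    print(year, f"{float(share):.0%}")
```

Under that assumption the three years come out to 3/14, 5/14 and 4/14, which round to the 21%, 36% and 29% quoted above.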

To thrive in the content business, every network needs “great” shows, and “classic shows”, and arguably monocultural hits, too. Netflix certainly has all of these. But the bulk of its budget is spent in pursuit of the bulk of its viewership: non-Emmy winners. Also notable is the fact that many of the Emmy nominees that don’t come from Netflix are nevertheless distributed by Netflix and branded by Netflix outside of the United States, such as Better Call Saul and American Crime Story. In most markets, Netflix holds a 50% share in “Outstanding Series nominees”. By this measure, the US catalogue has the service’s lowest “quality”, yet it is almost always the only one used to assess Netflix’s overall “quality”.

In short, Netflix isn’t always hired to show “great TV”. They’re not always trying to “make great TV”. In fact, most TV is “bad”. If Netflix made more The Crowns and less overall content, it would be less popular, less watched, and slower growing.


Who Watches the Watchmen’s Watchmen?

Part of the general obsession with SVOD quality stems from the overall lack of publicly available data in the SVOD sector. Given we don’t know much about what Netflix, Amazon and Hulu viewership looks like, we seek out alternative forms of comparison. And reviews, Google Trends, and social chatter represent the bulk of publicly available Data. Again, while this decision is understandable, it leads to irrelevant assessments, noise and a strange air of judgment.

In 2018, Quartz declared that Netflix was the “King of Mediocre” after Streaming Observer found that Netflix ranked last among Amazon, Netflix and Hulu in “average” Rotten Tomatoes Scores and Metacritic Scores, and in the bottom third when competitors such as Starz, AMC, HBO, and USA were added. There are numerous problems here which serve as a good entrée to the use of consumer or critical review Data in assessing Quality.

Chief among the issues is the topline statistical irrelevance of the findings. What does it matter that HBO is a 75, Amazon a 72 and Netflix (an apparently pitiful) 70? Not only is the materiality of this differential unclear (is ±1 a lot?), but its impact on viewership is unknown. Furthermore, these findings seem to rebut the idea that quality matters – with every player so tightly bunched together, competition is naturally focused elsewhere (e.g. pricing, distribution, brand, etc.). But behind the scenes, these scores become even more useless and the “data” more frivolous.

Despite its popularity, Rotten Tomatoes is very poorly understood. The “Tomatometer” calculates what percentage of critics gave a title a score greater than (but not equal to) 6/10. Not only is this a low threshold, but it reveals nothing about how great a title is – just what share of critics think it’s not “bad”. This frame is significant. As an example, Wonder Woman, Black Panther, and Shazam! all received Rotten Tomatoes scores in the 90s (93%, 97%, 91%). This means only about 22 of an average 380 critics thought these films were no better than a 6/10. At the same time, Black Panther has an average score of 8.4, Wonder Woman 7.7 and Shazam! a 7.3. Similarly, Black Panther grossed $700MM in the US and received seven Academy Award nominations (including Best Picture), Wonder Woman grossed $413MM and was seen as a potential Oscar nominee but received none, and Shazam! will likely peak at $150MM and has no chance at an Oscar nod. That’s a lot of variation in quality and commercial appeal for three films targeting the same audiences and achieving roughly the same Rotten Tomatoes score.
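To make the gap between a threshold share and an average concrete, here is a small sketch. It assumes the Tomatometer works as described above (the share of reviews strictly above 6/10), and every review score in it is invented for illustration; none comes from Rotten Tomatoes.

```python
def tomatometer(scores, threshold=6.0):
    """Share of reviews strictly above the threshold, per the description in the text."""
    positive = sum(1 for s in scores if s > threshold)
    return positive / len(scores)

def average(scores):
    """Plain arithmetic mean of the review scores."""
    return sum(scores) / len(scores)

# Hypothetical scores out of 10 for two made-up films with 20 reviews each.
widely_liked = [7.0] * 19 + [5.0]   # nearly every critic thinks it is fine
widely_loved = [9.0] * 19 + [5.0]   # nearly every critic thinks it is great

for name, scores in [("liked", widely_liked), ("loved", widely_loved)]:
    # Same threshold share for both films, very different average score.
    print(name, f"{tomatometer(scores):.0%}", round(average(scores), 1))
```

Both hypothetical films land on the same 95% share even though their average scores are almost two points apart, which is the pattern the three superhero films show.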

Metacritic is even stranger. While the service uses actual scores, it doesn’t weight all reviewers/outlets equally (though we don’t know which outlets or journalists matter more or less, or to what degree). And when a given outlet doesn’t provide a literal rating, such as The New York Times, a Metacritic editor will try to guess what the reviewer’s exact rating would have been. It’s sort of like Inception; the review site has to review the value of individual reviewers as they relate to the value of other reviewers that reviewed that same title, and then review both reviewers’ reviews.

And beyond the scoring approach of the major services, TV reviews are a statistician’s nightmare. While most major film releases collect hundreds of reviews, even the most popular TV shows struggle to earn more than a few dozen. The buzziest show of 2017, Stranger Things, for example, has only 34 reviews on Metacritic. This is a fair sample size, but it tends to be the ceiling. Hulu’s 2017 series Cardinal has only seven reviews. Amazon’s much-marketed 2019 release Hanna season 1 has but 13. The validity of these comparisons is dubious, especially given that the reviewers change from series to series – which also means that the value of individual reviewers changes not just on a percentage basis, but in Metacritic’s weighting, too.

And as series age, their review counts drop precipitously. For all of its monocultural significance, Game of Thrones Season 8 has only 12 reviews on Metacritic, down from 26 in its premiere season. The Walking Dead, which remains cable TV’s most popular drama, had only four reviews in Season 9, down from 25 in Season 1. As a result, some shows’ scores go up over time purely because the critics who hated them stopped reviewing. When you consider variations across network sizes, branding policies, a series’ popularity, and age, and then individual reviewers, the sample bias is immense.
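The drop-out effect can be sketched in a few lines. The critics and scores below are entirely hypothetical, and the sketch assumes the simplest case: critics who scored the show below 50 stop reviewing, and everyone else repeats the same score.

```python
def average(scores):
    """Plain arithmetic mean of a collection of scores."""
    scores = list(scores)
    return sum(scores) / len(scores)

# Hypothetical critic scores (0-100) for a show's first season.
season_one = {
    "critic_a": 90, "critic_b": 85, "critic_c": 80,
    "critic_d": 40, "critic_e": 35, "critic_f": 30,
}

# Assumption: the critics who disliked the show never review it again,
# and the remaining critics give exactly the same scores as before.
later_season = {name: score for name, score in season_one.items() if score >= 50}

print(round(average(season_one.values()), 1))    # average across all six critics
print(round(average(later_season.values()), 1))  # average across the three who stayed
```

In this toy case the average climbs from 60 to 85 without a single critic changing their mind; the sample changed, not the show.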

Despite their volume, consumer reviews are also suspect and incredibly noisy. Some viewers rate their shows after the pilot, others after the first season, some halfway through the third, others after a bad season turns them off, while a purist might wait until it’s over. This timing is incredibly distortive, as is the reason they viewed the show in the first place (e.g. because of an actor, compelling marketing, cultural buzz, incidental boredom, etc.). As IMDb’s own demographic data shows, its user ratings are far from representative (there’s also ample evidence of shows, such as those focused on women, being relentlessly review-bombed). This issue of self-selection (or rather, intent-driven rating) is even more pronounced when considering episode-by-episode reviews – the “average” person never rates a show on IMDb to begin with, let alone every one of its episodes.

Stated preferences are also famously difficult to use in content. What’s the takeaway when someone gives a show 3/5, but then binges the entire season in a weekend? Or reviews Season 1 of LOST as “3/5” because Jack and Locke didn’t actually go down the Hatch in the finale?


Programming for Whom

Regardless of methodological issues, “averages” obfuscate five core programming realities and strategies.

One is the relationship between averages and volume. Netflix airs many more TV series than any of its competitors. And so, while its average might be less than HBO’s, it also has far more shows that rank at the top. Which matters more to an audience member, the number of great shows or the ratio of great shows to okay shows? At what scale?
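A toy comparison shows how the two measures can point in opposite directions. The catalogues, scores and the 85-point cutoff for a “great” show are all hypothetical and chosen only to illustrate the arithmetic.

```python
# Hypothetical critic scores (0-100); neither list describes a real service.
small_catalogue = [90, 88, 85, 82, 80]
large_catalogue = [92, 90, 88, 86, 85, 84] + [60] * 14

def summarize(scores, great_cutoff=85):
    """Return the catalogue's average score and its count of 'great' shows."""
    return {
        "average": sum(scores) / len(scores),
        "great_shows": sum(1 for s in scores if s >= great_cutoff),
    }

small = summarize(small_catalogue)
large = summarize(large_catalogue)

print("small:", small)
print("large:", large)
```

Here the larger catalogue has the lower average but more shows above the cutoff, so a ranking by average and a ranking by number of great shows disagree.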

To a related end, one also needs to consider library accumulation. As the stored (i.e. library or catalogue) output of Amazon, Netflix, and Hulu grows, it becomes mathematically harder to change this average score. Similarly, each new year’s “quality” matters less, too. A new subscriber to HBO doesn’t need dozens of 9/10 shows in 2020; they have years of old 9/10 output to watch and catch up on – sometimes of the same shows still airing new seasons.

Third, individual shows and ratings are not of equal value. HBO’s Game of Thrones is probably worth ten Ballers and five VEEPs; it makes no sense to weight them equally, or even in accordance with their relative viewership.

“Averages” also imply that lower-rated shows are “bad” decisions. Was Fuller House, one of Netflix’s most-watched shows, a “bad” decision because it pulled down the network’s average review? Does a “bad” show even matter if those not targeted by the show never see it thanks to algorithmic recommendations? Even HBO doesn’t focus its programming efforts on “high scoring” shows. That’s not to say they don’t go for the “best” version of a category, but shows like Ballers and True Blood wouldn’t exist in a world where competition is based on having the highest average critical review. The focus on average scores implies this, even though it’s not what determines viewership or value in the eyes of the audience.

Fifth and most important is the fundamental ideal of quality. There are two important English terms that derive from the Latin word “Quale”: quality and qualia. The former, at least in its general application, presumes some universal truth and importance: Mad Men Is Art. But in truth, media adheres more closely to the idea of “qualia” – the internal and subjective component of sense perceptions, arising from stimulation of the senses by phenomena. Hollywood might have tastemakers, but it’s ultimately not in the business of telling audiences what’s good, it’s about entertaining them. We love to say Disney makes the very best content (and their performance here breaks economic gravity), but the company routinely makes supposedly “bad” films that delight audiences to the tune of $1B (see 2019’s Aladdin and 2010’s Alice in Wonderland). What else matters? Pay-TV and Netflix SVOD’s “job to be done” is to provide positive qualia, not optimal quality. Ultimately, all that matters is that a viewer decides something has quality – and that’s shown by engagement.

This concept connects with the aforementioned challenges of using stated user reviews in assessing quality. The Crown isn’t a hammer or television set; averages work when a product or service is a plain utility. They don’t work with narratives. What matters is whether the viewer likes a show, not whether others do. The question of “quality” is best left to professional critics – if not philosophers altogether; it has little relevance in programming mass media. To this end, it should be no surprise that Netflix decided to remove written reviews, average ratings, and the five-star rating scale and replace them with a percentage recommendation and Tinder-like thumbs up and thumbs down. Between a film that 80% of people give a B and one that’s split 50/50 between As and Cs, the second film matters more – it just needs to be delivered to the right viewer and hidden from the wrong one.
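The closing comparison can be worked through as a rough sketch. The grade-point mapping (A=4, B=3, C=2), the ten-viewer audiences, and the assumption that the remaining 20% of the first film's viewers give it a C are all illustrative choices, not figures from Netflix.

```python
GRADE_POINTS = {"A": 4, "B": 3, "C": 2}  # assumed mapping, for illustration only

consensus = ["B"] * 8 + ["C"] * 2    # 80% give a B; assume the other 20% give a C
polarizing = ["A"] * 5 + ["C"] * 5   # split 50/50 between As and Cs

def mean_points(grades):
    """Average grade points across everyone who saw the film."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)

def share_who_love_it(grades):
    """Share of viewers who would give the film an A."""
    return grades.count("A") / len(grades)

def mean_if_targeted(grades, hidden_grade="C"):
    """Average when viewers who would give `hidden_grade` are never shown the film."""
    shown = [g for g in grades if g != hidden_grade]
    return mean_points(shown)

for name, grades in [("consensus", consensus), ("polarizing", polarizing)]:
    print(name, mean_points(grades), share_who_love_it(grades), mean_if_targeted(grades))
```

With perfect targeting the polarizing film is an A for everyone who sees it, while the consensus film can never do better than a B. Real recommendation systems are far from perfect, so this is best read as an upper bound on what targeting could achieve.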

And this is the important point around “Quality”. Every network needs to maximize the amount of content its audience says is “great”. It doesn’t matter if these titles truly are great, if critics agree, or even if most viewers are referring to the same titles when they say Network [X] makes great shows. They just need to love what they choose to watch. Perhaps Netflix isn’t that different from traditional TV after all.


You can reach Matthew Ball at mb.ball@gmail.com or @ballmatthew

Part 1 of this series explained that Netflix spends far more on content than is typically reported. Part 2 explained how (and why) Netflix uses product and technology to economically outspend its competitors. Part 3 explained why Netflix risks so much. Part 4 explained why the term ‘Original Series’ is often a lie – and how Netflix uses this fact to beat its competitors. Part 5 explained why 2019 and 2020 don’t represent significant threats to Netflix despite the volume of new entrants and their impact on Netflix’s library. Part 7 explains why Netflix has been so resilient over the past decade – and why this is likely to continue even as competition intensifies.