Future of TV Briefing: How generative AI tools are speeding up video production processes

This Future of TV Briefing covers the latest in streaming and TV for Digiday+ members and is distributed over email every Wednesday at 10 a.m. ET.

This week’s Future of TV Briefing looks at how generative AI tools like OpenAI’s DALL-E 2 and Stability AI’s Stable Diffusion are being used to produce videos.

  • Lights, camera, AI
  • Upfront advertisers’ price pushback, Amazon’s Prime Video ad plans, Twitch’s branded content debacle and more

Lights, camera, AI

It’s probably only a matter of time before artificial intelligence technology becomes a standard part of the video production workflow. And the reason for that seeming inevitability is also a matter of time: generative AI is the latest tool, in the lineage of digital cameras, non-linear editors and green screens, being adopted not only to speed up production workflows but also to free up creative teams to concentrate on the parts of production that most require human handling.

For Coca-Cola’s “Masterpiece” commercial, the team at Electric Theatre Collective had eight weeks to create the visual effects that would turn subjects from famous paintings, such as Edvard Munch’s “The Scream,” J.M.W. Turner’s “The Shipwreck” and Johannes Vermeer’s “Girl With a Pearl Earring,” into animated characters. 

“We had to look to AI for this project because there was no other solution,” said Ryan Knowles, head of 2D and VFX supervisor at Electric Theatre Collective. 

To be clear, it’s not that Electric Theatre Collective’s crew of animators and VFX specialists couldn’t have created the campaign’s effects using traditional filmmaking tools. In fact, many of the campaign’s visual effects were created using human-drawn animations and actors shot on stylized sets and against green screens. But to meet the deadline and stay under budget, the VFX shop needed to add Stability AI’s image-generating AI tool Stable Diffusion to its toolset.

“You’ve got the gamut of creativity. You’ve got part of our team who is literally just drawing things and animating frame by frame. And then at the other end, we’re creating these massive [computer-generated] worlds and then pumping them into Stable Diffusion, getting them back out and then layering them on top of each other,” said Knowles. He added, “It’s not just an AI creation. It’s not just an animation piece. It’s kind of everything thrown in there, at speed.”

The team at Electric Theatre Collective primarily used Stable Diffusion to take actual shots filmed with a camera and live actors on a set and have the image-generating AI tool apply the corresponding painting’s texture to the frames.

For example, one shot features a woman in a portrait by Stefania Tejada flying out of the painting to throw a Coke bottle across the room. “When she’s flying through the air and she’s frozen, that’s a hand-painted painting of her face that’s then projected and combined with Stable Diffusion layers to give it a tiny bit of oil,” said Knowles. But for other shots, like one in which a character from one painting falls into another, he said, “it’s just pure, unadulterated Stable Diffusion takeover.”
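
Electric Theatre Collective hasn’t disclosed its exact settings or software stack, but the frame-by-frame texturing Knowles describes lines up with Stable Diffusion’s image-to-image mode, in which a filmed frame is passed in alongside a text prompt and a low “strength” value preserves the original performance while adding painterly texture. Here is a minimal sketch of that mode using the open-source diffusers library; the model, prompt, file names and parameter values are illustrative assumptions, not the studio’s actual pipeline.

```python
# Illustrative sketch of Stable Diffusion's image-to-image mode, not
# Electric Theatre Collective's actual pipeline. The model, prompt,
# strength and file paths are assumptions for demonstration only.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Load a publicly available Stable Diffusion checkpoint onto the GPU.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A single live-action frame ("plate") exported from the edit.
frame = Image.open("plate_0001.png").convert("RGB").resize((768, 512))

painted = pipe(
    prompt="expressionist oil painting, thick visible brushstrokes, cracked varnish",
    image=frame,
    strength=0.35,       # low strength: keep the filmed performance, add texture
    guidance_scale=7.5,  # how closely the output follows the text prompt
).images[0]

painted.save("plate_0001_painted.png")
```

In a production setting this would run frame by frame, and the generated images would be composited back over the live-action plates and CG elements, which squares with Knowles’ description of layering Stable Diffusion output on top of everything else.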

Could all of that have been done exclusively using a traditional VFX workflow sans generative AI tools? Sure. But it would’ve cost a lot more money and taken a lot more time.

“I try and think about some of the setups, some of the vignettes, trying to do them without any machine learning or AI. And it’s a timeline that’s maybe five times as long and a much bigger team,” said Knowles. Thanks to the use of Stable Diffusion, Electric Theatre Collective’s high-end effects simulation specialists could “concentrate on the close-ups while we use machine learning for the wides,” he added.

The same can be said of “The Frost,” a 12-minute animated short film about a team of explorers in a snowy mountain range. “We don’t have the resources or anything like that to get out there on the mountain and create a piece like this,” said Stephen Parker, head of creative at Waymark, the company that created the short film and that also develops AI tools businesses can use to create video ads.

Instead of spending the better part of a year filming on location in the Alps with a hundred-plus-person crew, a core team of seven Waymark employees spent three and a half months taking a script written by Waymark executive producer Josh Rubin, prompting DALL-E 2 to create each shot, animating characters’ facial expressions using D-ID’s Creative Reality Studio and then editing it all together using traditional post-production tools, including Adobe’s After Effects, which Waymark’s animators used to animate the still images created by DALL-E 2.

Saying that every frame of “The Frost” was created by DALL-E 2 is like saying that every frame of “Top Gun: Maverick” was created by Sony Venice. No, the Sony Venice was the camera that the film’s director Joseph Kosinski and cinematographer Claudio Miranda used to make the movie, and, similarly, DALL-E 2 was the generative AI tool that Waymark’s team used to make the short film. Just as Kosinski and Miranda needed to conceptualize and frame up their shots, Parker et al. needed to conceptualize theirs and write them up as prompts for DALL-E 2.

The prompts followed “a modular structure. So it reads like a sentence,” said Parker. “The first part is something like shot type. The next section is the actual scene, its components: description of a person performing action in location. Then there’s usually some variable for the climates or the weather, the lighting, the temperature: snow, snowy mountains, blustery snow. And then close it off with a note about direction.”
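
As a rough illustration of that modular, sentence-like structure, a prompt can be assembled from its parts before being submitted to DALL-E 2 for a given shot. The field names and example values below are hypothetical stand-ins, not Waymark’s actual prompts.

```python
# Hypothetical sketch of the modular, sentence-like prompt structure Parker
# describes. Field names and example values are illustrative, not Waymark's
# actual prompts.
def build_prompt(shot_type: str, scene: str, weather: str, direction: str) -> str:
    """Join the modular sections in the order Parker outlines:
    shot type, scene, weather/lighting, then a note about direction."""
    return f"{shot_type} of {scene}, {weather}, {direction}"

prompt = build_prompt(
    shot_type="Wide establishing shot",
    scene="an explorer pitching a tent on a ridge in a snowy mountain range",
    weather="blustery snow, cold overcast light",
    direction="cinematic, 35mm film still",
)
print(prompt)
# Wide establishing shot of an explorer pitching a tent on a ridge in a snowy
# mountain range, blustery snow, cold overcast light, cinematic, 35mm film still
```

Each assembled prompt then went to DALL-E 2 shot by shot, with the resulting stills animated downstream in D-ID’s Creative Reality Studio and After Effects, as described above.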

Even the most extreme examples of AI-generated videos still require some human intervention. Take, for example, a series of one- to two-minute AI-generated trailers for “Star Wars,” “Lord of the Rings” and “Avatar” in the style of Wes Anderson created by production studio Curious Refuge. The scripts and shot lists were written by OpenAI’s ChatGPT; the shots were created by Midjourney; the animations were done by D-ID’s Creative Reality Studio; and the characters were voiced by Eleven Labs, a text-to-speech generative AI tool.

“The ‘Star Wars’ trailer took about 20 hours to put together from start to finish, which — compared to if I had to traditionally go and shoot all this stuff and get actors — 20 hours is amazing. But it still takes time,” said Curious Refuge CEO Caleb Ward. It also still took a human (Ward) to direct the production in the way that a TV showrunner or a movie director oversees everything from the script to the shoot to the editing to the sound design.

“I could type in ‘write the script,’ and [ChatGPT] would give me a script. I could change the script based on my recommendations in the same thread. … And so at each step of this process, you’re injecting your humanity [in the form of] your curation sense into the project,” said Ward. 

For example, Ward would draft upwards of 40 different prompts to submit to Midjourney to create a given shot for the trailer and then sift through the results to pick the one that best fit the image he had in mind. That process was a lot quicker than if he had to go out and shoot each shot — finding a location or designing a set, setting up lights, dialing in the camera settings, blocking out the scene with actors, etc. — but it still required Ward to not only helm the project but to do so in a hands-on way.

“People think, ‘Oh, AI is just this automatic thing that automatically does everything so much better than a human.’ It can do some things automatically, but really it comes down to that prompting, and you need to have taste in order to create those good prompts,” he said.

For all the utility that video producers and VFX specialists are finding in generative AI tools, as covered above, they are also encountering limits in what the tools can do and how they can be used. We’ll cover that in next week’s edition of this newsletter.

What we’ve heard

“If you include user-level information [in a streaming ad bid request], then you can’t include show-level information [because of the Video Privacy Protection Act]. I heard from one client that said they don’t care about user-level data and they care more about show-level [data].”

Ad tech executive

Numbers to know

74%: Percentage share of U.S. households that have at least one smart TV or connected TV device.

21%: Percentage increase in certain streaming residuals secured by the Directors Guild of America in its new contract.

-39%: Percentage decrease in CNN’s ad revenue for the period from January 2023 through April 2023 compared to the prior year.

74%: Percentage share of surveyed Instagram users who said they use Reels.

-2.4%: Forecasted percentage decline year over year in U.S. TV ad spending in 2023.

What we’ve covered

With TikTok star Keith Lee, Pepsi hopes to draw Gen Z to Black-owned restaurants:

  • Lee will promote the Pepsi Dig In program through his TikTok account.
  • Through the program, Pepsi will pick Black-owned restaurants to receive funding and access to consulting services.

Read more about Pepsi here.

How a new data-driven TV ad product from Blockgraph aims to harmonize varied data feeds:

  • Blockgraph’s GraphPort aims to map out third-party identity graphs.
  • The tool will access audience data from Comcast/NBCUniversal, Charter Communications and Paramount.

Read more about Blockgraph here.

What we’re reading

Amazon preps Prime Video ads tier:

Amazon plans to join Netflix and Disney+ by adding an ad-supported tier to its previously ad-free service (though Prime Video does include ads via its integration of Freevee, much to my mom’s consternation), according to The Wall Street Journal.

Advertisers push back on upfront prices:

Advertisers are angling to get TV network owners to reduce their rates in this year’s upfront ad deals after years of price increases, according to Variety.

Advertisers push back against Amazon’s NFL pitch:

Amazon kinda fumbled the kickoff to its NFL ad sales effort by asking advertisers to pay up for its streaming broadcasts and then under-delivering on audiences, according to The Information.

Twitch streamers push back on branded content rules:

Amazon-owned Twitch quickly reversed course on a policy update that would have prohibited streamers from baking ads into their broadcasts, after the change was immediately met with resistance, according to The Verge.

Netflix pushes into live sports:

Netflix is looking to stream a live celebrity golf tournament this fall featuring stars from two of its sports documentary series, “Drive to Survive” and “Full Swing,” according to The Wall Street Journal.

Post-peak TV:

Traditional TV was a cash cow for programmers and producers and, well, kinda everyone. Streaming, by contrast, has yet to hit on an economic model that’s lucrative for, well, anyone, according to Vulture.

