Why botched election predictions don’t herald the end of data journalism

On Tuesday night, newsrooms around the world got a rude reminder of how much good data matters.

Before polls opened for the U.S. presidential election, Nate Silver and his colleagues at FiveThirtyEight put the likelihood of a Hillary Clinton victory north of 70 percent. The New York Times put it higher still, above 80 percent. And less than a week before the election, the startup Morning Consult and Politico published research suggesting that "shy Trump voters," while a real phenomenon, were not numerous enough to tilt the odds in his favor.

Then Tuesday happened, and as it unfolded, people began to ask whether it might be time for the media to reconsider its recent love affair with data journalism.

That love affair, sparked when Silver turned his blog into a massive traffic magnet for the New York Times, spawned data teams across the world, set off an arms race for data scientists and drew a surge of interest across the media. Over the past five years, attendance at Investigative Reporters and Editors' annual conference on computer-assisted reporting has more than doubled, reaching more than 1,000 attendees this past spring, according to IRE executive director Doug Haddix.

Interest in data has also spread to other corners of publishers' offices, where it has become a key piece of audience-development and monetization strategies.

But, as with many journalism fads, there have been growing pains. Silver, still data journalism's most prominent figure, has admitted that FiveThirtyEight didn't always apply the greatest rigor to its prognostications, especially in its early coverage of Trump's run for the Republican nomination. "We were basically acting like pundits, but attaching numbers to our estimates," Silver wrote in a long mea culpa published earlier this year.

Data-driven efforts sometimes suffer from a lack of good data or, worse, an inability to tell good data from bad. For example, most of the election models that came up short Tuesday relied on poll data gathered by calling telephone landlines, a method that has grown dramatically less effective in the mobile era. Landline data still has its uses, but many publishers that are new to working with data might not understand how to combine it with other sources, said Michael Ramlet, Morning Consult's co-founder and CEO. "They don't have the technical capability," he said.

For all that, Tuesday's misadventures are not representative of what most publishers use data journalism for. "Predictive analytics based on polling always has been a challenge, with a lot of variables and uncertainty," Haddix said. "Most data journalism focuses on what tangibly has happened already that can be tracked and analyzed," such as test scores or government agency data.

And, increasingly, newsrooms have gotten smarter about how to monetize that work. ProPublica, just weeks after announcing the launch of its own data store, is already fielding calls from other news organizations about partnering to sell data, whether to academic institutions, commercial interests or even other publishers. Rather than breaking into new territory, it is building on a core competency.

"It's not that they're creating these data sets because they have commercial viability," said Celeste LeCompte, ProPublica's director of business development. "This is what newsrooms do already."