Online TV Archive Preserves History of Politics Coverage


Roger Macdonald, director of the TV Archive, at the headquarters of the nonprofit Internet Archive in the Richmond District of San Francisco. (Noah Arroyo // Public Press)

‘Audio fingerprinting’ makes it possible to track political ads.

For this look-back at the November 2015 election, the Public Press spent many hours tracking down televised political ads and related news coverage of local campaigns.

Once we found them, we assessed the accuracy of their claims (read “The Most Misleading Ads of 2015” on pages B4 and B5); figured out which races were the targets of the most ads; and compared the total number of minutes that stations ran ads and election-related news coverage (page B2).

We crafted these analyses using a database created by the Internet Archive. The pioneering San Francisco-based nonprofit is probably best known for its Wayback Machine, which catalogs previous versions of webpages.

Through the organization’s TV News Archive, users can search for phrases that appear in the closed captions of television broadcasts. They can filter those searches by the dates those terms were spoken, the stations and shows that featured them, the programs’ language and other criteria. All search results are shareable on social media. The archive records 36 TV channels from the major local stations in the Bay Area, Philadelphia and Washington, D.C. The project has compiled more than 1 million clips since 2009.

This publicly accessible tool lets journalists (and anyone else who’s interested) perform, for the first time, data-driven analyses of the content found on television, “our most pervasive, and it’s also our most persuasive, medium,” said the TV Archive’s director, Roger Macdonald.

“We think it’s important that everybody’s able to reflect on what the messaging is, who the messenger is, and what the propagation of that message is,” Macdonald said. “In the case of political ads, here people are spending enormous amounts of money, and local television stations are raking in tons of money to broadcast these ads.”

Political ads have proved difficult to track. The Federal Communications Commission does not require advertisements to be closed-captioned if they run five minutes or less, and the archive’s team found that about 70 percent of political ads lacked captions and were therefore not searchable.

The ads that ran for San Francisco’s November 2015 election were no exception. For the Public Press to track them, the Internet Archive used a tool under development since 2014 that records “audio fingerprints” of specific ads and then locates other instances when those exact sound sequences play.

The team has fine-tuned audio fingerprinting and scaled up its use. Today it is applied to all the television that is recorded, and Macdonald said it can track almost all types of TV content.
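
For readers curious about the general idea, here is a toy sketch of fingerprint matching in Python. It is not the Internet Archive’s tool, whose internals this article does not describe. It assumes audio arrives as a plain list of integer samples, and the names `FRAME`, `fingerprint` and `find_airings` are hypothetical. Each clip is reduced to a string of bits recording whether loudness rose from one short frame to the next, and the ad’s bit pattern is then looked up inside a longer broadcast.

```python
from typing import List

FRAME = 4  # samples per frame; a toy value chosen for illustration only


def frame_energies(samples: List[int]) -> List[int]:
    """Sum of absolute amplitudes in each non-overlapping frame."""
    return [
        sum(abs(s) for s in samples[i:i + FRAME])
        for i in range(0, len(samples) - FRAME + 1, FRAME)
    ]


def fingerprint(samples: List[int]) -> List[int]:
    """One bit per frame pair: 1 if energy rose into the next frame, else 0."""
    energies = frame_energies(samples)
    return [1 if later > earlier else 0
            for earlier, later in zip(energies, energies[1:])]


def find_airings(ad: List[int], broadcast: List[int]) -> List[int]:
    """Frame offsets in the broadcast where the ad's fingerprint appears."""
    needle = fingerprint(ad)
    haystack = fingerprint(broadcast)
    n = len(needle)
    return [i for i in range(len(haystack) - n + 1)
            if haystack[i:i + n] == needle]


# A made-up five-frame "ad" aired twice within a mostly silent broadcast.
ad = [1, 0, 0, 0, 5, 0, 0, 0, 2, 0, 0, 0, 8, 0, 0, 0, 3, 0, 0, 0]
silence = [0] * FRAME
broadcast = silence * 3 + ad + silence * 4 + ad + silence * 2

print(find_airings(ad, broadcast))  # [3, 12]: the ad starts at frames 3 and 12
```

A four-bit pattern like this one would produce many false matches on real audio, and the sketch only finds ads that begin exactly on a frame boundary. Production systems typically derive far longer fingerprints from the audio’s frequency content and tolerate small differences between recordings.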

Many organizations and analysts have used the archive’s raw metadata to tell stories and create visualizations. Data scientist Kalev Leetaru has illustrated news stations’ editorial choices with an animated world map on which international locations twinkle each time coverage mentions them.

In another visualization, he graphed the mentions of 2016 presidential candidates’ names in the news. “You can see this incredible differential between Trump and the rest of the field, where he’s mentioned so often,” Macdonald said. Journalists have repeatedly used this tool to show that Trump gets outsized free publicity from news stations eager to snag viewers.

Macdonald said that when the TV News Archive first went online in 2012, journalists didn’t flock to it.

“If you build it, they will come?” he asked. “Nah.”

The database is somewhat nonintuitive, he said, and his team has been actively contacting news organizations and offering to train them to use its interface and the underlying raw data. As a result, journalists are using it more often, working its data into an increasing number of articles and infographics.

Roger Macdonald (Noah Arroyo // Public Press)

Visit the TV Archive at