This weekend, I built a prototype for a kind of news aggregation I’ve had on my mind for a while, which I’ve called Panorama.
The idea is fairly straightforward: look for all possible articles on a specific story, across many different sources, then put all the headlines in one place, in chronological order. That’s pretty much it.
For my proof of concept, I picked Michael Flynn, for a few reasons:
1. His brief tenure made it easy to scope down the time range
2. It’s not an ongoing controversy (Turkey lobbying aside), and a relatively self-contained story with a beginning and an end is easier to grok
3. Responses to the scandal are mostly polarized
The notion is that, by placing all news sources together, trends will emerge. It should be possible to identify an agenda, and for bias to become (more) self-evident when placed in the context of the news as a whole.
In terms of methodology, some things worth making note of:
- This was all collected by hand. There are certainly errors.
- I did a lot of googling and did my best to find every relevant article on the Washington Post, Breitbart, and the New York Times from the election until now. Other sources are just articles I found along the way.
- I did not include syndicated articles – AP, Reuters, UPI, etc. These articles don’t really represent what I’m interested in. Also, it turns out that Breitbart buys almost every syndicated article, no matter how redundant, so there are hundreds of these articles just for the last month of news on Flynn. I assume this is for SEO purposes, since it makes Breitbart much more likely to show up on any given search on a topic.
- Just getting reliable publication times can be a huge pain. For all of these, I had to open up the source for a timestamp – sometimes I could get an ISO string, other times I’d have to convert from a plaintext time.
- Some articles just don’t have accurate publication times recorded, e.g. there are articles about Flynn’s resignation showing up before he’s actually resigned.
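The timestamp cleanup described above could be sketched roughly like this in Python. This is only an illustration, not the process actually used for the prototype: the function name `to_utc`, the list of plaintext formats, and the assumed UTC-5 offset for timestamps that carry no zone are all hypothetical choices, and real sites will need their own formats.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical plaintext byline formats; each site would need its own entry.
PLAINTEXT_FORMATS = [
    "%B %d, %Y %I:%M %p",   # e.g. "February 13, 2017 11:05 PM"
    "%b %d, %Y, %I:%M %p",  # e.g. "Feb 13, 2017, 11:05 PM"
]


def to_utc(raw, assumed_offset_hours=-5):
    """Parse an ISO string or a plaintext time and return an aware UTC datetime.

    Timestamps with no zone information are assumed (hypothetically) to be
    at a fixed offset of `assumed_offset_hours` from UTC.
    """
    raw = raw.strip()
    # Older Pythons reject a trailing "Z" in fromisoformat, so normalize it.
    iso_candidate = raw[:-1] + "+00:00" if raw.endswith("Z") else raw
    parsed = None
    try:
        parsed = datetime.fromisoformat(iso_candidate)
    except ValueError:
        # Not ISO: fall back to the known plaintext formats, in order.
        for fmt in PLAINTEXT_FORMATS:
            try:
                parsed = datetime.strptime(raw, fmt)
                break
            except ValueError:
                continue
    if parsed is None:
        raise ValueError(f"Unrecognized timestamp: {raw!r}")
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone(timedelta(hours=assumed_offset_hours)))
    return parsed.astimezone(timezone.utc)


# Illustrative values only: three spellings of the same moment.
samples = [
    "2017-02-14T04:05:00Z",
    "2017-02-13T23:05:00-05:00",
    "February 13, 2017 11:05 PM",
]
normalized = [to_utc(s) for s in samples]
```

Once everything is in UTC, sorting headlines chronologically is a plain `sorted()` call. Articles that appear to predate the event they report on still need checking by hand.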
Deeper thought on this experiment to come later.
So this is going to sound lame, but the indentation in your post (for the numbered and bulleted list) is really wigging me out.
Interesting idea with Panorama. Why not just use something like Feedly, subscribe to all-the-news-sources-I-care-about, and filter by topic? They come with timestamps.
Hm, I hadn’t considered upgrading to a paid account. $18 a month for access to their API might be worth the trouble.
The presentation and concept are pretty different from what you can get in Feedly, but I might be able to use their data to power this viz.
Or just use some generic Python (or whatever makes you happy) scripting for RSS feed parsing? If you really want the visualization you came up with, which I agree is compelling, some kind of RSS feed automation should be doable. I’ve never tried to do anything like it, but there’s usually a fully-featured library for everything these days…
https://wiki.python.org/moin/RssLibraries?
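For what it's worth, here is a rough sketch of the RSS idea using only Python's standard library rather than any of the libraries on that page. It assumes a plain RSS 2.0 feed with `title`, `link`, and `pubDate` on each item. The sample feed, its URLs, and the function name `headlines_for_topic` are made up for illustration, and a real script would fetch each source's feed over HTTP instead of using an inline string.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Made-up RSS 2.0 feed standing in for a real source's feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example Source</title>
<item>
  <title>Later story on Flynn</title>
  <link>https://example.com/later</link>
  <pubDate>Tue, 14 Feb 2017 09:00:00 GMT</pubDate>
</item>
<item>
  <title>Unrelated story</title>
  <link>https://example.com/unrelated</link>
  <pubDate>Mon, 13 Feb 2017 12:00:00 GMT</pubDate>
</item>
<item>
  <title>Earlier story on Flynn</title>
  <link>https://example.com/earlier</link>
  <pubDate>Mon, 13 Feb 2017 18:30:00 -0500</pubDate>
</item>
</channel></rss>"""


def headlines_for_topic(feed_xml, keyword):
    """Return (published, title, link) tuples whose title mentions keyword,
    sorted from oldest to newest."""
    root = ET.fromstring(feed_xml)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        published = item.findtext("pubDate")
        # Skip items that are off-topic or have no publication date at all.
        if keyword.lower() not in title.lower() or not published:
            continue
        link = item.findtext("link", default="")
        # pubDate uses the RFC 822 date format, which email.utils can parse.
        matches.append((parsedate_to_datetime(published), title, link))
    return sorted(matches)


for published, title, link in headlines_for_topic(SAMPLE_FEED, "flynn"):
    print(published.isoformat(), title, link)
```

Matching on the title alone is crude, and feeds usually only hold recent items, so this would help with collecting stories going forward rather than with backfilling months of coverage.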