06 February 2012

Aggregation tests traditional notions of sourcing and accountability

 “A thousand bloggers all talking to each other doesn’t get you a report from a war zone. Somebody’s got to take a real risk... [and] gather that news originally.”
- Wikipedia founder Jimmy Wales,
in Page One documentary

Media blogger Jim Romenesko gained a Twitter following of more than 45,000 people (and growing) by being a maverick paraphraser, a sensei in the art of finding interesting news and distilling it to its essence. His success germinated while he was working at the Poynter Institute, where he helped pioneer an aggregation style that now permeates the web. He was responsible for hundreds of thousands of hits to other websites, giving them free advertising through his enticing summaries.

In late 2011, Romenesko's aggregation techniques became the subject of heated debate and derision when his boss, Julie Moos, published a blog post that challenged his practices and claimed his attributions were incomplete. She cited examples in which Romenesko failed to put quotation marks around certain phrases in his aggregated stories.

Romenesko clearly indicated that his information came from another source, but he failed to put some (not all) direct quotes in quotation marks. He had followed this practice for more than a decade, and, according to Moos, not a single person he had 'plagiarized' came forward to complain during his lengthy tenure at Poynter. Why? Because Romenesko was sending traffic to their websites. Strange how two pairs of tiny lines can determine whether someone is plagiarizing.

Should the people whose work is plagiarized be the ones who decide whether plagiarism occurred? Or should the organization and its editors decide? If the plagiarized 'victim' benefits from the 'theft,' does it really matter?

If a substantial portion of a story is scraped, even with links back to the original source, readers have little reason to visit the original post. It's the difference between scraping a little and sending 5,000 readers to an outside site, or scraping a lot and sending 50. Sure, that's 50 more readers than you would have had without the theft, but is that a benefit?

Aggregation often mixes with regular news stories, making it difficult to know who wrote what. Most news appears in feeds, or blocks, with a headline, maybe a graphic, and a short passage of text summarizing the story. Techmeme, a popular aggregator, presents very brief descriptions of stories that link directly to outside sites. But aggregators increasingly create content of their own, and distinguishing between the two can be difficult.

Poynter, for example, used to link directly to outside sources, but now it links to a more in-depth aggregated story on its own website, which in turn links to the original source. Poynter is also starting to generate more original content and pair it with its aggregation. Main Street Connect and Patch mix in aggregated stories, often without any external links, and in MSC's case, attribution is given to “published reports” instead of a specific source. (I used to work for MSC.)

The emperor of aggregation, Huffington Post, eventually hired well-heeled journalists to produce original content, which mingles with its aggregation. The New York Times also publishes aggregated stories, but it posts them in a separate section of its website called The Lede, which includes “information gleaned from the Web or gathered through original reporting.” That is essentially a hybrid model, but at least it is clear about its intentions.

What if we all begin mixing aggregation with original reporting, or aggregating aggregation? Aaron Wall of SEOBook.com notes the conundrum of scraping a scrape: “At some point... [these] loops start feeding back into themselves & make a near-infinite cycle.”

Organizations such as Huffington Post often over-aggregate by summarizing so much of the original work that readers have little reason to visit the original source. New York Times columnist Bill Keller singled out Arianna Huffington as the “queen of aggregation.”

HuffPo is infamous for these practices. Simon Dumenco, a blogger for Ad Age, wrote a post about Apple announcing big product news on the same day that former New York Congressman Anthony Weiner was outed for risqué tweets. HuffPo aggregated the story, as did Techmeme. Dumenco writes:

“Huffpo closed out its post with ‘See more stats from Ad Age here’ -- a disingenuous link, because Huffpo had already cherrypicked all the essential content. HuffPo clearly wanted readers to stay on its site instead of clicking through to AdAge.com.

“So what does Google Analytics for AdAge.com tell us? Techmeme drove 746 page views to our original item. HuffPo -- which of course is vastly bigger than Techmeme -- drove 57 page views.”

Former HuffPo blogger Ryan McCarthy wrote a piece for Reuters in which he acknowledged that HuffPo and many others were guilty of over-aggregating. McCarthy's story lays out several examples of aggregators going beyond what could be considered fair use. In one example, HuffPo rewrote an Associated Press story about Census data that generated 4,600 hits.

Business Insider, a finance website, not only took a story from the tabloid site TMZ but also lifted its quotes and every pertinent detail about a former NFL star who was living with his parents. The only attribution comes in the second paragraph, yet the story carried a byline.

“This seems to go against the basic principles of fair use — it diminishes the source article, and neither piece is transformative or adds any new information,” McCarthy writes. “With a little bit more effort the writer could have made observations about the larger context of these stories.”

Of course, AOL's $315 million purchase of Huffington Post would seem to indicate there is value in questionable aggregation. The same could be true of AOL's other investment, Patch. A former Patch salesman recently advocated that the company hire people whose sole job would be to scrape stories: professional plagiarists.

Aggregation stories don't appear in a vacuum. As Jimmy Wales said, somebody has to gather the news originally. At an Intelligence Squared debate in 2009, New York Times writer David Carr defends the traditional news media model against a number of 'new media' fans, including Vanity Fair columnist Michael Wolff. To prove his point, Carr holds up a sheet of paper containing a screenshot from Newser.com, a site Wolff created. The first sheet shows 32 graphics representing 32 aggregated stories from mainstream media sites.

Carr then holds up another sheet showing the same page, except that 25 of the 32 stories have been cut out, revealing how heavily aggregators still rely on content makers to populate their sites.

But, as McCarthy notes in his Reuters piece, “When media companies are asked to grow at a meteoric pace... the line between original content and borderline theft gets awful blurry. The editorial mission quickly transforms from ‘What can I link to?’ to ‘How much can I take?’”

So how much can you take? Is plagiarism becoming an artifact of the Enlightenment? Should it be forgotten as the postmodern age challenges the concept of intellectual property? Or should there be a new standard, one that reinforces longstanding ethical traditions?
