OK, I’d seen this article and graph kicking around Twitter for a day or two before I finally looked at it, and I’m both glad and not glad I did.

[The article's impressive-looking graph appeared here.] You've seen it, right?

For anyone who hasn’t already seen it, or who (like me) gave it only a cursory weekend glance, the graph is based on an analysis done by Semantic Visions, “a risk assessment company based in Prague” who “conduct…big data (meaning non-structured, large data requiring serious calculations) analyses with the aid of open source intelligence, on the foundation of which they try to identify trends or risk factors.” They also use a “private Open Source Intelligence system, which is unique in its category and enables solutions to a new class of tasks to include geo-political analyses based on Big Data from the Internet.”

OK, cool.

The gist in this case: Semantic Visions had algorithms read hundreds of thousands of online sources, including 22,000 Russian ones, searching for different trends.

OK…though as someone who, for some reason, chose to suffer through a media content analysis as a thesis, I have a number of methodology-related questions I’ll try not to harp on too much (e.g., how is the algorithm actually designed to determine positive/negative stories the way a human would? How were the online sources chosen?). A little transparency here would go a long way, the proprietary nature of the algorithms notwithstanding.
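To show why that first question matters, here’s a toy lexicon-based tone scorer. To be clear: this is my own sketch, not Semantic Visions’ actual (proprietary) method, and the word lists are invented. It just illustrates how much rides on design choices a human reader would handle differently.

```python
# A toy lexicon-based tone scorer -- my own invention, NOT Semantic
# Visions' method. Word lists are made up for illustration.

NEGATIVE = {"crisis", "conflict", "threat", "corruption"}
POSITIVE = {"agreement", "cooperation", "growth", "reform"}

def tone(headline: str) -> int:
    """Return +1 (positive), -1 (negative), or 0 (neutral) by word counts."""
    words = set(headline.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return (score > 0) - (score < 0)

# A human would call "conflict ends" good news; the scorer only sees
# the word "conflict" and flags the headline as negative.
print(tone("conflict over gas prices ends peacefully"))  # → -1
```

Multiply a quirk like that across hundreds of millions of articles and the shape of the trend line can shift, which is exactly why the “how does it decide?” question isn’t nitpicking.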

What gets me is the conclusion they’ve drawn based on the data they’ve gathered and present here in this article.

The article says “the number of Russian articles with a negative tone on Ukraine [from February 2012] started to show a gradual and trend-like increase – while no similar trend can be found in English-language media.”

Yes, your data does show that. Got no problem there.

But it’s this (my emphasis in bold):

“Therefore, based on hundreds of millions of articles the possibility that the actual events in Ukraine could themselves be the reason for the increasing combativeness of Russian-language articles can be excluded. Moreover, the strongly pro-Russian President Yanukovych was still in government at the time and the similarly Eastern-oriented Party of Regions was in power. The explanation is something else: the Putin administration was consciously preparing for military intervention and the Kremlin’s information war against Ukraine started two years before the annexation of Crimea to turn Russian public opinion against Ukrainians…”

How can someone possibly draw that conclusion based solely on the numbers presented here? Are you privy to other data or analyses that aren’t public? Because, based on the data presented here, I see absolutely no justification for concluding that the Kremlin “was consciously preparing for military intervention.”

Consider:

  • A big part of the explanation for any apparent increase in negative coverage would be the EU Association Agreement being initialed in March 2012, right?
  • Why start the analysis at June 2011? I’d want to see the tone of coverage compared to the last bit of Yushchenko’s presidency through the beginning of Yanukovych’s – maybe the increase over 2012-2013 isn’t so much an increase as a return to “normal” negative coverage of Ukraine.
  • (OK, I lied about no more methodology questions) What about positive stories? Were negative stories about Ukraine taking up a greater share of overall coverage, or did the overall number of articles itself increase? Not being transparent on methodological nerdish issues like this really, really doesn’t help, guys.
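On that last bullet, the count-versus-share distinction matters more than it sounds. A quick illustration with numbers I’ve invented (these are not from the Semantic Visions data):

```python
# Hypothetical monthly figures (invented for illustration only) showing
# how the raw count of negative articles can double while the negative
# SHARE of coverage stays flat -- i.e., the outlets simply published
# more of everything.
coverage = {
    "Feb 2012": {"negative": 300, "total": 3000},
    "Feb 2013": {"negative": 600, "total": 6000},
}

for month, c in coverage.items():
    share = c["negative"] / c["total"]
    print(f"{month}: {c['negative']} negative articles, {share:.0%} of coverage")
# Feb 2012: 300 negative articles, 10% of coverage
# Feb 2013: 600 negative articles, 10% of coverage
```

A graph of the raw counts here would slope impressively upward, yet the tone mix hasn’t budged. Without knowing which quantity the chart plots, the “gradual and trend-like increase” is ambiguous.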

Please – no more divining of Kremlinological intentions from incomplete, unclear sets of numbers.
