Friday, July 25, 2014

New algorithm identifies data subsets that will yield the most reliable predictions by Larry Hardesty

Much artificial-intelligence research addresses the problem of making predictions based on large data sets. An obvious example is the recommendation engines at retail sites like Amazon and Netflix.

But some types of data are harder to collect than online click histories: information about geological formations thousands of feet underground, for instance. And in other applications, such as trying to predict the path of a storm, there may just not be enough time to crunch all the available data.

Dan Levine, an MIT graduate student in aeronautics and astronautics, and his advisor, Jonathan How, the Richard Cockburn Maclaurin Professor of Aeronautics and Astronautics, have developed a new technique that could help with both problems. For a range of common applications in which data is either difficult to collect or too time-consuming to process, the technique can identify the subset of data items that will yield the most reliable predictions. So geologists trying to assess the extent of underground petroleum deposits, or meteorologists trying to forecast the weather, can make do with just a few targeted measurements, saving time and money.

Levine and How, who presented their work at the Uncertainty in Artificial Intelligence conference this week, consider the special case in which something about the relationships between data items is known in advance. Weather prediction provides an intuitive example: Measurements of temperature, pressure, and wind velocity at one location tend to be good indicators of measurements at adjacent locations, or of measurements at the same location a short time later, but the correlation grows weaker the farther out you move either geographically or chronologically.

Tuesday, July 22, 2014

Emotional Contagion on Facebook? More Like Bad Research Methods by JOHN M. GROHOL, PSY.D.

A study (Kramer et al., 2014) was recently published that showed something astonishing — people altered their emotions and moods based upon the presence or absence of other people’s positive (and negative) moods, as expressed on Facebook status updates. The researchers called this effect an “emotional contagion,” because they purported to show that our friends’ words on our Facebook news feed directly affected our own mood.

Never mind that the researchers never actually measured anyone’s mood.

And never mind that the study has a fatal flaw, one that other research has also overlooked, making all these researchers’ findings a bit suspect.

Putting aside the ridiculous language used in these kinds of studies (really, emotions spread like a “contagion”?), these kinds of studies often arrive at their findings by conducting language analysis on tiny bits of text. On Twitter, they’re really tiny — less than 140 characters. Facebook status updates are rarely more than a few sentences. The researchers don’t actually measure anybody’s mood.

So how do you conduct such language analysis, especially on the status updates of 689,003 users? Many researchers turn to an automated tool for this, something called the Linguistic Inquiry and Word Count application (LIWC 2007). This software application is described by its authors as:

    The first LIWC application was developed as part of an exploratory study of language and disclosure (Francis, 1993; Pennebaker, 1993). As described below, the second version, LIWC2007, is an updated revision of the original application.

Note those dates. Long before social networks were founded, the LIWC was created to analyze large bodies of text, like a book, article, scientific paper, an essay written in an experimental condition, blog entries, or a transcript of a therapy session. Note the one thing all of these have in common: they are of good length, at minimum 400 words.

Why would researchers use a tool not designed for short snippets of text to, well… analyze short snippets of text? Sadly, it’s because this is one of the few tools available that can process large amounts of text fairly quickly.

Who Cares How Long the Text is to Measure?

You might be sitting there scratching your head, wondering why it matters how long the text is that you’re trying to analyze with this tool. One sentence, 140 characters, 140 pages… Why would length matter?

Length matters because the tool actually isn’t very good at analyzing text in the manner that Twitter and Facebook researchers have tasked it with. When you ask it to analyze positive or negative sentiment of a text, it simply counts negative and positive words within the text under study. For an article, essay or blog entry, this is fine: it’s going to give you a pretty accurate overall summary analysis of the article, since most articles are more than 400 or 500 words long.

For a tweet or status update, however, this is a horrible analysis tool to use. That’s because it wasn’t designed to recognize when a negation word reverses the meaning of a sentence (and in fact, it can’t).1

Let’s look at two hypothetical examples of why this is important. Here are two sample tweets (or status updates) that are not uncommon:
    “I am not happy.”
    “I am not having a great day.”
An independent rater or judge would rate these two tweets as negative — they’re clearly expressing a negative emotion. That would be +2 on the negative scale, and 0 on the positive scale.

But the LIWC 2007 tool doesn’t see it that way. Instead, it would rate these two tweets as scoring +2 for positive (because of the words “great” and “happy”) and +2 for negative (because of the word “not” in both texts).
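
The scoring described above can be sketched in a few lines of Python. This is only an illustration of plain word counting: the two word lists are hypothetical toy lexicons built from the words named in this article, not the actual LIWC 2007 dictionary, and the function name is made up for this example.

```python
import re

# Hypothetical toy lexicons for illustration only; NOT the LIWC 2007 dictionary.
POSITIVE_WORDS = {"happy", "great"}
NEGATIVE_WORDS = {"not"}  # counted as negative, following the article's description


def count_sentiment_words(text):
    """Return (positive, negative) counts using plain word matching."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(word in POSITIVE_WORDS for word in words)
    negative = sum(word in NEGATIVE_WORDS for word in words)
    return positive, negative


tweets = ["I am not happy.", "I am not having a great day."]

# Add up the per-tweet counts into [total positive, total negative].
totals = [sum(pair) for pair in zip(*(count_sentiment_words(t) for t in tweets))]
print(totals)  # [2, 2]
```

Because each word is scored on its own, "not happy" contributes one positive and one negative hit instead of a single negative one, which is the gap between the tool's score and a human rater's.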

That’s a huge difference if you’re interested in unbiased and accurate data collection and analysis.

And since much of human communication includes subtleties such as this — without even delving into sarcasm, short-hand abbreviations that act as negation words, phrases that negate the previous sentence, emojis, etc. — you can’t even tell how accurate or inaccurate the resulting analysis by these researchers is. Since the LIWC 2007 ignores these subtle realities of informal human communication, so do the researchers.2

Perhaps it’s because the researchers have no idea how bad the problem actually is. Because they’re simply sending all this “big data” into the language analysis engine, without actually understanding how the analysis engine is flawed. Is it 10 percent of all tweets that include a negation word? Or 50 percent? Researchers couldn’t tell you.3

Even if True, Research Shows Tiny Real World Effects

Which is why I have to say that even if you believe this research at face value despite this huge methodological problem, you’re still left with research showing ridiculously small correlations that have little to no meaning to ordinary users.

For instance, Kramer et al. (2014) found a 0.07% decrease (that’s not 7 percent, that’s roughly 1/14th of one percent!!) in negative words in people’s status updates when the number of negative posts on their Facebook news feed decreased. Do you know how many words you’d have to read or write before you’ve written one less negative word due to this effect? Probably thousands.

This isn’t an “effect” so much as a statistical blip that has no real-world meaning. The researchers themselves acknowledge as much, noting that their effect sizes were “small (as small as d = 0.001).” They go on to suggest it still matters because “small effects can have large aggregated consequences,” citing a Facebook study on political voting motivation by one of the same researchers, and a 22-year-old argument from a psychological journal.4
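
To put the 0.07% figure in concrete terms, here is a rough back-of-the-envelope calculation in Python. It rests on one assumption that is not spelled out in the text: that the 0.07% decrease means 0.07 percentage points of all words written. The variable names are made up for this example.

```python
# Assumption: 0.07% is read as 0.07 percentage points of all words written.
decrease_percent = 0.07
decrease_fraction = decrease_percent / 100

# Words written, on average, per one fewer negative word.
words_per_fewer_negative_word = round(1 / decrease_fraction)

# The same figure expressed as "one Nth of one percent".
fraction_of_one_percent = round(1 / decrease_percent, 1)

print(words_per_fewer_negative_word)  # 1429
print(fraction_of_one_percent)        # 14.3
```

Under that reading, a person would write well over a thousand words before the effect removed a single negative word.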

But they contradict themselves in the sentence before, suggesting that emotion “is difficult to influence given the range of daily experiences that influence mood.” Which is it? Are Facebook status updates significantly impacting individuals’ emotions, or are emotions not so easily influenced by simply reading other people’s status updates?

Despite all of these problems and limitations, none of it stops the researchers in the end from proclaiming, “These results indicate that emotions expressed by others on Facebook influence our own emotions, constituting experimental evidence for massive-scale contagion via social networks.”5 Again, no matter that they didn’t actually measure a single person’s emotions or mood states, but instead relied on a flawed assessment measure to do so.

What the Facebook researchers clearly show, in my opinion, is that they put too much faith in the tools they’re using without understanding — and discussing — the tools’ significant limitations.6


Kramer, ADI, Guillory, JE, Hancock, JT. (2014). Experimental evidence of massive-scale emotional contagion through social networks. PNAS.

  1. This according to an inquiry to the LIWC developers who replied, “LIWC doesn’t currently look at whether there is a negation term near a positive or negative emotion term word in its scoring and it would be difficult to come up with an effective algorithm for this anyway.”
  2. I could find no mention of the limitations of the use of the LIWC as a language analysis tool for purposes it was never designed or intended for in the present study, or other studies I’ve examined.
  3. Well, they could tell you if they actually spent the time validating their method with a pilot study to compare against measuring people’s actual moods. But these researchers failed to do this.
  4. There are some serious issues with the Facebook voting study, not the least of which is attributing changes in voting behavior to one correlational variable, with a long list of assumptions the researchers made (and that you would have to agree with).
  5. A request for clarification and comment by the authors was not returned.
  6. This isn’t a dig at the LIWC 2007, which can be an excellent research tool when used for the right purposes and in the right hands.

Monday, July 14, 2014

io9: Anti-Obamacare Ads Backfired, Says A New Statistical Analysis by Mark Strauss

Opponents of the Affordable Care Act have spent an estimated $450 million on political ads attacking the law, outspending supporters of Obamacare 15-to-1. But a state-by-state comparison of negative ads and enrollment figures suggests the attack ads actually increased public awareness of the healthcare program.

Niam Yaraghi, a Brookings Institution expert on the economics of healthcare, based his analysis on recently released data that tallies how much money was spent on anti-Obamacare ads in each state.

He then examined Affordable Care Act (ACA) data to determine enrollment ratios. Although more than 8 million Americans have signed up to purchase health insurance through the marketplaces during the first open enrollment period, that number masks the tremendous variation in participation across states. For instance, while the enrollment percentage in Minnesota is slightly above 5%, in Vermont, close to 50% of all eligible individuals have signed up for Obamacare.

After controlling for other state characteristics, such as low per capita income population and average insurance premiums, Yaraghi observed a positive association between anti-ACA spending and enrollment:

This implies that anti-ACA ads may unintentionally increase the public awareness about the existence of a governmentally subsidized service and its benefits for the uninsured. On the other hand, an individual's prediction about the chances of repealing the ACA may be associated with the volume of advertisements against it. In the states where more anti-ACA ads are aired, residents were on average more likely to believe that Congress will repeal the ACA in the near future. People who believe that subsidized health insurance may soon disappear could have a greater willingness to take advantage of this one time opportunity.