EXCUSE THE PUN, but the polling world is all “atwitter” about Indiana University professor Fabio Rojas’ claim that the social media platform Twitter will “undermine the polling industry” and that analyzing social media conversations will put campaign pollsters “out of work.”
Rojas and his research partners counted the tweets mentioning candidates’ names during the 2010 Congressional elections. They report a relationship between a candidate’s “Tweet share” and the percentage of the vote that candidate won in November. “It turns out that what people say on Twitter or Facebook is a very good indicator of how they will vote,” Rojas writes in the Washington Post.
Pushback from the polling world has been swift and pointed, largely focused on Rojas’ apparent overselling of his own research in the article. Several pollsters and others reviewed various aspects of the authors’ claims and concluded that, interestingly, tweets did appear to have some predictive power in the 2010 Congressional elections. But any relationship between tweets and votes is small compared to well-known factors like incumbency and the partisanship of the district. In other words, tweets just don’t add much to our existing understanding of who is going to win these sorts of elections, as Stu Rothenberg summarizes in very readable terms. For example, since about 90 percent of incumbents win reelection to Congress, simply knowing which candidate is the incumbent takes you most of the way there when forecasting who will win.
To what has been written so far, I will add a few of my own observations.
- Pollsters would likely have read the research with more interest and less hostility if not for the overblown description in the op-ed. True, if Rojas had stuck to the facts in his paper and stayed closer to his model, he might not have found a mainstream audience for his thoughts. But I would venture that he would have had a ready audience in the polling world, since his method does appear to reveal data with interesting relationships to outcomes pollsters want to understand.
- Social media changes too fast to give me much confidence that models that worked in 2010 will work in 2016 or 2020. The paper was based on tweets from 2010, which will almost certainly differ from tweets in 2016, and from the content on whatever social media platforms we are all using in 2020. Different people join and leave each platform, how each is used changes, and, with each change, the structure and meaning of the data coming from the platform also change. Each time this happens, research (including the research by Rojas) needs to catch up to the new dynamics and data introduced by the new platforms.
- The polling world is entirely aware of how fast the world is changing and will not be caught off guard. Rojas appears to believe that pollsters are dogmatic practitioners of a dying religion, unable to change and clinging to outmoded ideas like probability sampling as the only way to salvation. The reality is quite different. The polling industry’s annual conferences these last few years have been dominated by presentations on the uses of social media data, online surveys, and other new “non-probability” methods. AAPOR (the American Association for Public Opinion Research, the leading professional association for pollsters) recently released a 125-page report reviewing a variety of these methods.
Far from defending exclusive use of traditional methods, the polling industry is working hard to figure out where new methods fit in and how they supplement, challenge, or replace traditional polling methods. By misunderstanding both Twitter’s impact on polling and the polling world’s reaction to Twitter, Rojas did his own research a disservice and unnecessarily antagonized a group that is actively trying to figure out what “digital democracy” means for its science.