Journalism Professor Ken Blake was into data long before it became cool.
Ken, who holds bachelor’s and master’s degrees from Marshall University, became hooked on data analysis during his days as a newspaper reporter covering local government, when he began using spreadsheets to analyze campaign contributions and voting patterns. He left journalism to earn a doctorate in mass communication at the University of North Carolina, Chapel Hill, where he studied statistics and public opinion polling.
A member of the MTSU faculty since 1996, he serves as director of the Office of Communication Research and is co-founder of the MTSU Poll, which operated from 1998 until 2018.
In addition to his research, he maintains free online video courses in data journalism and Excel-based statistical analysis. He’s the founder and editor of The Data Reporter, a demonstration site for various data journalism techniques.
He spoke to Beverly Keel about coding, fake news and his work in data journalism.
You can’t mention your work here without mentioning the MTSU Poll, which received tremendous press coverage when its results were released. Why did the poll end and where can we find the data from now on?
Most people in the college who know anything about my scholarship probably know that I co-founded the twice-annual MTSU Poll in 1998 with former faculty member Dr. Bob Wyatt and ran it for two decades with the help of Dr. Jason Reineke, the poll’s associate director. The poll ended in 2018, due chiefly to the increasing difficulty and expense of producing accurate poll results amid rising cell phone use and declining response rates, but Dr. Reineke and I published the poll’s last nine datasets and associated analyses in the prestigious iPoll archive at Cornell University’s Roper Center for Public Opinion Research, where the data will be available to scholars for years to come. See: https://ropercenter.cornell.edu/middle-tennessee-state-university-poll.
After the poll ended, you began focusing on your longstanding interest in data journalism. What can you tell us about that work?
In 2019, Dr. Jason Reineke and I published Data Skills for Media Professionals, a book about producing journalism-oriented data analysis, data visualizations, interactive maps, and inferential statistics using tools including Google Sheets, XLMiner, Google My Maps, and QGIS. All are intuitive, open-source and/or free tools that anyone can use. On my web site, drkblake.com, I have published a number of free, video-driven tutorials about the skills and techniques the book covers. Also, the book has become the basis for a new SOJSM course, JOUR 3841, “Data Skills for Media Professionals,” approved just last year. A little inside joke: 3841 may seem like an odd choice for a course number, but I chose it deliberately. One of the first things you learn how to do in a basic stats course is perform a chi-squared test on a two-by-two contingency table. The critical value for rejecting the null hypothesis in such a test at the conventional .05 significance level, a number every stats student encounters early and often, is 3.841. That’s where the course’s number came from. For me, it’s like sneaking a little nerd graffiti into the MTSU course catalog.
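For readers curious where that 3.841 comes from: with one degree of freedom (the case for a two-by-two table), a chi-squared variable is simply the square of a standard normal variable, so the .05 critical value can be recovered from the normal distribution alone. A quick check in Python, using only the standard library:

```python
from statistics import NormalDist

# A chi-squared variable with 1 degree of freedom is the square of a
# standard normal variable, so the .05 critical value is the square of
# the normal quantile at 0.975 (two-tailed).
z = NormalDist().inv_cdf(0.975)   # about 1.95996
critical_value = z ** 2

print(round(critical_value, 3))   # prints 3.841
```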
Tell us about learning coding in Python.
With time on my hands during the first summer of the pandemic, I intensified the pursuit of my ambition to learn coding. I started with Python, the ubiquitous coding language named not after the snake, as it might seem, but after Monty Python’s Flying Circus, which the language’s “how to” guides constantly reference. One of my first accomplishments was to perfect a script that can connect to the Twitter Application Programming Interface (“API,” for short), find tweets that match specified search criteria, and download each one in spreadsheet format along with “metadata” such as the date and time the tweet was posted, its source, and the number of “favorites” and “retweets” it received. The script can be run at a single point in time or set to run iteratively, grabbing tweets as often as the user likes, for however long the user wants it to keep going. I published the script on my web site at https://drkblake.com/tweepy-tweet-scraper-2-o/, and the Society of Professional Journalists’ online “Journalist’s Toolbox” site has since linked to the page (see https://www.journaliststoolbox.org/2022/01/30/social-media-scraping-and-analytics-tools/). I also have a script that will scrape the most recent 3,000 tweets posted by a given Twitter user. Dr. Matt Taylor and I are using that script in a study we are collaborating on involving how nonprofits use social media to promote engagement and donation. More recently, I developed a script for working with the Global Database of Events, Language and Tone (GDELT), which comprehensively scrapes articles published online by news outlets from around the world, including NYTimes.com, WashingtonPost.com, APNews.com, and more. The script (see https://drkblake.com/gdeltheadlinescrape/) can connect to GDELT’s API, search for articles that contain specified keywords, and download headlines, URLs, and publication dates for the articles found.
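To give a flavor of what such a GDELT script does, here is a minimal sketch. The endpoint and parameters follow GDELT’s publicly documented DOC 2.0 API, but the helper functions are illustrative, not Dr. Blake’s published code:

```python
import json
import urllib.parse
import urllib.request

GDELT_DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"

def build_gdelt_url(keywords, max_records=75):
    """Build a GDELT DOC 2.0 API query URL for articles matching keywords."""
    params = {
        "query": keywords,
        "mode": "ArtList",          # return a list of matching articles
        "format": "json",
        "maxrecords": str(max_records),
    }
    return GDELT_DOC_API + "?" + urllib.parse.urlencode(params)

def fetch_headlines(keywords, max_records=75):
    """Fetch matching articles; return (title, url, seendate) tuples."""
    with urllib.request.urlopen(build_gdelt_url(keywords, max_records)) as resp:
        data = json.load(resp)
    return [(a["title"], a["url"], a["seendate"])
            for a in data.get("articles", [])]

# Example (requires network access):
# for title, url, seen in fetch_headlines('"vaccine rollout"', 10):
#     print(seen, title)
```

The API returns each article’s headline, URL, and a “seendate” timestamp, which is what makes the kind of over-time coverage analysis described here possible.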
Recent Media Communication M.S. graduate Christian Antonacci and I used the script to gather data for his thesis, “Trump’s Pandemic: A Content Analysis of Attribute Agendas in U.S. News Media Coverage of the COVID-19 Outbreak Through November 2020.” Drawing from second-level agenda-setting theory, the thesis showed that, during the first year of the COVID-19 pandemic, national U.S. news media coverage of the virus mentioned then-President Donald Trump more often than any other aspect of the story, including some that would seem as important, if not more so, such as the number of deaths attributed to the virus, the virus’s impact on the economy and schools, the importance of wearing masks, and progress toward developing vaccines. I have attached a .pdf of a key graphic from the thesis. It shows coverage volume, over time, for each of the 10 most-mentioned attributes in COVID-19 coverage by the national news outlets covered in the study. The findings help explain, perhaps, why attitudes toward the pandemic split so early and so persistently along partisan political lines. The script has considerable potential for investigating similar patterns in national news coverage of any recent major story. I have collected data, for example, showing when coverage of the Black Lives Matter movement began, peaked, and tapered off, and how its coverage volume interacted with the coverage volume of other stories unfolding at the same time. My goal is to refine the tool further and begin promoting it as a resource available through the college’s Office of Communication Research, which I head, and to make my Twitter API scripts available through the SOJSM’s forthcoming social media insight lab.
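The counting behind a coverage-volume graphic like that one is conceptually simple: for each time period, tally how many headlines mention each attribute keyword. A toy illustration (the headlines, keywords, and counts here are invented for demonstration, not drawn from the thesis data):

```python
from collections import Counter

# Invented (month, headline) pairs; in practice the month would come
# from a GDELT-style publication timestamp.
headlines = [
    ("2020-03", "Trump touts response as deaths climb"),
    ("2020-03", "Economy reels as lockdowns spread"),
    ("2020-04", "Trump spars with governors over masks"),
    ("2020-04", "Vaccine trials begin"),
    ("2020-04", "Trump rally draws criticism"),
]

attributes = ["trump", "deaths", "economy", "masks", "vaccine"]

# Tally, per month, how many headlines mention each attribute keyword.
volume = Counter()
for month, text in headlines:
    lowered = text.lower()
    for attr in attributes:
        if attr in lowered:
            volume[(month, attr)] += 1

print(volume[("2020-04", "trump")])  # prints 2
```

Plotting those per-period tallies as one line per attribute yields exactly the kind of coverage-volume-over-time chart the thesis graphic presents.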
You have also learned coding in R. Tell us what that is.
Late this summer, I expanded my coding skills to include R, an open-source programming language focused, more so than Python, on high-level statistical computing, data visualization, and mapping. R is rapidly replacing IBM SPSS as the go-to software package for statistical analysis, chiefly because, unlike SPSS, it is completely free to use. As such, it is a far more practical choice for students, especially those who will graduate into non-academic careers, where the deep academic discounts on SPSS licensing fees are unavailable. I introduced R into the Media Communication graduate program’s empirical media theory and methodology course this past fall. I also am adding a unit on R this semester to my data-journalism-focused undergraduate course in reporting, and I am including R instruction in the “Integrating Practical Data Skills Into the Classroom” Faculty Learning Community I am co-directing this academic year with Dr. Sally Ann Cruikshank. The FLC includes faculty from SOJSM as well as from a range of departments across campus.
What can you tell us about the two projects you are launching this spring?
The first involves working with my former MTSU Poll colleague, Dr. Jason Reineke, to acquire academic access to Version 2.0 of the Twitter API and develop Python and/or R scripts for using it. The new API, and academic-level access to it, together offer substantial advantages over general access to the original API. In particular, we will be able to access any amount of Twitter content from any time period in Twitter’s history, back to the very first Tweet. We also will be able to take advantage of a new option for retrieving all non-deleted tweets that are replies to, retweets of, or otherwise connected to some original tweet. Imagine, for example, being able to retrieve and analyze all tweets that reference the original “#MeToo” tweet, or the original “Black Lives Matter” tweet. The second project involves launching – again, with Dr. Jason Reineke – a recurring Qualtrics survey measuring belief among adult U.S. users of the World Wide Web in a rolling series of “fake news” assertions, and developing and testing theoretical models that explain and predict such belief. Dr. Reineke and I see this project as critically important in the current environment, where political actors and others in the public sphere often seem simply unmoored from objective reality, and in ways that are undermining democratic institutions, public health, and other fundamental public goods.
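Under the v2 API’s publicly documented search syntax, retrieving such a thread amounts to a full-archive search on the conversation_id: operator, which matches the original tweet and the non-deleted replies in its thread. A minimal sketch (the helper name is illustrative; full-archive search requires the academic-level access described here):

```python
import urllib.parse

SEARCH_ALL = "https://api.twitter.com/2/tweets/search/all"

def build_conversation_query(conversation_id, max_results=100):
    """Build a v2 full-archive search URL for every tweet in one thread.

    The conversation_id: operator matches the original tweet and the
    replies that share its conversation ID.
    """
    params = {
        "query": f"conversation_id:{conversation_id}",
        "max_results": str(max_results),
        "tweet.fields": "created_at,public_metrics,author_id",
    }
    return SEARCH_ALL + "?" + urllib.parse.urlencode(params)

# The request itself would carry an academic-access bearer token, e.g.:
# urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
print(build_conversation_query("20"))
```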