Big data: what we know vs. all the rest
Before the media took notice of the evolving field of data analytics, Dr. Gary King ’80 (Political Science) often had trouble explaining his profession to those who asked.
“With such an abundance of data types and various analysis in practice, it was often confusing beyond clarification,” said King. “But now, when you say ‘big data’ it resonates with the public and fortunately helps convey the importance of this advancing field.”
Big data is a media-coined term that describes the large volume of information – both structured and unstructured – that inundates us on a regular basis. Every digital process and social media conversation produces it. Systems and mobile devices transmit it.
The data in big data can exist as unstructured text in the form of emails, speeches, social media updates, scholarly literature and product reviews. It also includes information generated by credit cards, sales transactions, cell phones, satellite imagery and various online behaviors. But according to King, this vast amount of data alone does nothing; the current revolution is that we now know what to do with it.
“The goal for most scientific purposes isn’t the data set itself,” said King. “Data alone is not transformative. We must make it actionable.”
King is the Albert J. Weatherhead III University Professor at Harvard University – one of 24 with Harvard’s most distinguished faculty title – and Director of the Institute for Quantitative Social Science. There, he develops and applies empirical methods in many areas of social science research, focusing on innovations that span the range from statistical theory to practical application.
In 2016, the alumnus joined a panel of industry leaders and scholars at SUNY New Paltz to discuss big data with students interested in learning more about the explosion of educational and professional opportunities within the field of data science.
“Students seeking a profession in data science need to understand the power of what they can do with access to new forms of data,” said King. “For example, the recent increase in human expression via social media is nearly useless without some type of analytic capacity. Every day there are billions of publicly available social media posts, but without assistance from automated text analysis, no one person has the ability to understand what billions of others are saying.”
Understanding social media’s potential reach is just one aspect of King’s highly regarded work. His research on legislative redistricting has been used in most American states as well as by the U.S. Supreme Court. He has led an evaluation of the Mexican universal health insurance program, which included the largest randomized health policy experiment to date. He has reverse-engineered Chinese censorship and worked on a wide range of other projects that utilize the power of big data analytics. He is also a founder of, and an inventor of the original technology for, Learning Catalytics (acquired by Pearson), Crimson Hexagon, Perusall and others.
“The increasing availability of digitized text and the diversity of data now available presents enormous opportunities for social scientists and students pursuing the field,” said King. “The amount of new data is enormously more informative, but the revolutionary power is in what to do with these unusual measurements, assets and content types.”