Finding the Story in the Data

Posted on 07/06/17 by Andrew Schrock


Data science is often a world of fast-paced “boot camps” and speculation. A dizzying array of platforms and companies promise improved efficiency and new insights. In the rush, local communities and their stories can get left out, which is a loss for everyone. How can we treat data as an opportunity for learning about civic issues that affect us all?

Cities and companies generate data that can be mined for new insights to guide public policy and citizen action. Data can reveal aspects of a community that remain hidden to mere observation. This creates an opportunity to combine data literacies with real-world problem-solving around civic issues. Forward-thinking journalism professors and city employees in Los Angeles County have seized this opportunity by using open data in education.

Dana Chinn is a lecturer at the Annenberg School for Communication and Journalism at the University of Southern California. She teaches students how to find new stories in data. Her data journalism courses start with the issues. Students research crime, gun violence, and water use. She takes an issues-first approach because “the subject matter context is important to learn before you even look at the data.” Students download data sets from public websites and use basic statistics to give visibility to the issue. Data journalism can be less flashy than students expect. It can take time to start to see trends in the data.

“Collecting data should be part of your ‘beat’,” she says, referring to particular topics that journalists specialize in. Chinn prefers Microsoft Excel to more complex data analysis packages. She describes data literacies as one tool in a “critical thinking toolbox.” Another tool is realism. “Applying math and using it to put together a story is different than taking a test.” For each story she asks students, “why is this important to the public?” Her final test is challenging students to describe how they came to their conclusions, so anyone doing follow-up research can understand their analysis.

Similarly, students in Gwen Shaffer’s Cal State Long Beach class Enterprise Reporting in Diverse Communities are challenged to find the story in the data. She starts by presenting vibrant examples of data journalism. “I expose students to some of the amazing data-driven stories created by journalists at the New York Times, the Los Angeles Times, and the Atlantic.” Students are encouraged to follow a topic that they are passionate about. In one memorable project in Shaffer’s class, a student analyzed the lack of African American males enrolled in teacher credential programs. They found a lack of representation, which supported the idea that increasing the number of black men in teaching could help students in the school district. Data helped strengthen the story about representation in local schools, making the case for a much-needed change.

Open data matters to Shaffer, who also helped write Long Beach’s open data policy. “In order for democracy to thrive, the public must be informed,” she said. “If a reporter wants to dig deep they need access to data.” Traditionally journalists have relied on Freedom of Information Act (FOIA) requests to obtain data from government. But these requests can be troublesome. They can take months to complete and take time out of local government employees’ busy schedules. And government doesn’t have every piece of data you might need for your analysis. A more sensible route for all involved is to make raw data available for download from multiple sources.

Civic data education often stretches far beyond the classroom. “Civic hackathons” are events where anyone can help tackle a pressing social issue. Participants at these events learn about data analysis and how to help their community. The City of Los Angeles and LA County recently collaborated on a “datathon” at the Reef. Artists, students, and activists streamed through frosted double doors into a clean, breezy space with white tables. They were welcomed with a message of inclusion and respect. A “Guide for Engagement” posted on the wall encouraged people to listen actively to the perspectives of others. Each table featured a small pamphlet, “Guide to Spreadsheets for the Spreadsheet-Phobic.” The warm chatter of people meeting and catching up filled the room.

Speakers were excited to talk about the importance of arts access to Angelenos. The event even garnered participation from federal agencies. Sunil Iyengar, Research & Analysis Director at the National Endowment for the Arts (NEA), described increasing participation in the arts as a noble goal. “We want to empower citizens to run their own queries,” he said. Iyengar described data analysis as essential for revealing local trends. The data sets presented that day were detailed down to the level of individual murals, museums, and historic buildings.

Bronwyn Mauldin, director of research and evaluation at the LA County Arts Commission, reinforced that data access was a right. Her message of empowerment segued seamlessly into a hands-on workshop with data sets. Groups divided up to brainstorm questions that might be answered by the data. They were invited to download data sets on Los Angeles’ community centers, public art, and libraries. Were there gaps in access? Who might be unintentionally excluded?

Mauldin summed up the goal of the day as a form of public service. “Part of what we want to do is help people become aware of the data that is available and get them smarter about using arts data in developing programming and providing services around the county.” At the end of the day participants presented their analyses to judges. The group I was in found a correlation between gentrifying neighborhoods and nonprofit arts organizations. Another made the case for arts education creating local leaders. The Arts Datathon was just one of several datathons organized around important civic topics like declining fisheries, prisoners’ rights, and air quality.

The lesson of data journalism and datathons is that there is no substitute for caring about local issues. It doesn’t make sense to be data literate but culture blind. Data analysis needs to start with subject matter expertise and understanding where the numbers came from. And we live in an era where it pays to be critical of findings. Used without caution, data can lead you to conclusions that don’t end up being true. In one well-known “spurious correlation,” the number of films Nicolas Cage starred in is statistically related to the number of people who drowned by falling into a pool. There is nothing connecting the two; the math just works out that way. So a little domain knowledge goes a long way.
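To see how the math can “just work out that way,” here is a minimal Python sketch. It assumes nothing about the real film or drowning figures: every number in it is random noise, and the function names (`pearson`, `strongest_chance_match`) and parameters are made up for illustration. It compares one random ten-point series against a thousand other random series and reports the strongest correlation it stumbles on.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_match(n_points=10, n_candidates=1000, seed=0):
    """Compare one random series against many unrelated random series.

    All values are synthetic noise (an assumption of this sketch), so any
    strong correlation found here is a coincidence by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        r = pearson(target, candidate)
        if abs(r) > abs(best):
            best = r  # keep the strongest relationship seen so far
    return best


if __name__ == "__main__":
    r = strongest_chance_match()
    print(f"Strongest correlation found by chance: {r:.2f}")
```

With only ten data points and a thousand candidates to choose from, the strongest match tends to look impressive even though nothing connects the series. The more unrelated data sets you compare, the more likely one of them lines up by accident, which is why knowing the subject matter counts for more than the size of the correlation.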

Analyzing data doesn’t require a degree in computer science or statistics. In fact, looking at data education in LA shows how low the barriers are to analyzing data. While Hadoop and Tableau are helpful for big data analysis, they are not great starting points for beginners. Chinn and Shaffer start students off with descriptive statistics in Microsoft Excel. Mauldin made a pamphlet for the datathon that was cheekily titled “A Guide to Spreadsheets for the Spreadsheet-Phobic.” You probably have the tools for data analysis on your computer right now. How you use them to make an impact in your community is up to you. But it will always pay to start with a solid grounding in questions that arise from local issues.
