Posted on 09/10/18 by LA Counts
Nina has been a Systems Analyst at the Los Angeles County Auditor-Controller’s Office for over 11 years. She works on web applications to improve the county’s internal operations. Three years ago, she learned of civic tech and open data and very quickly discovered this was her passion. She is now a Co-Captain of Hack For LA, a local non-profit that empowers its tech-focused volunteers to improve their communities by building and creating in conjunction with other non-profit agencies and government departments.
How’d you get into data?
I’ve been a data nerd at heart for as long as I can remember, but I mostly generated and visualized data to improve my own life. I didn’t think about data at a professional level until Los Angeles County released its Open Data Portal. From that moment on, I started researching open data, APIs, civic tech, etc., and I was immediately hooked on becoming an advocate for open data.
In your opinion, what gives a dataset value?
For me, a dataset is most valuable when it can be combined with other datasets. The benefit of combining datasets is that you gain insights beyond what you would see with only a single dataset. To be able to do that you need a contextual understanding of a raw dataset - things like data dictionaries and the kinds of insights that come from experts with intimate knowledge of the data.
Oftentimes, data portals only provide aggregated data, or data that has been combined or filtered behind the scenes. Privacy is the usual justification, but I suspect that it’s often not strictly necessary and that the data could be provided in a different, suitably anonymized way. Aggregation shows us data through someone’s predefined lens, and any meaning we gain from it is still filtered through that lens. For instance, Los Angeles County has an employee demographics dataset, but it only shows counts by gender, ethnicity, and job category. Is it really necessary to aggregate this data? And if so, the portal provides no context to help the user understand it. I believe a dataset’s value comes down to whether we can glean anything useful from it, and combining multiple datasets or providing context through data stories are just a few ways of showing that value off.
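The kind of cross-dataset insight described above can be sketched with two record sets joined on a shared key. This is a minimal illustration; the dataset names and fields below are invented for the example, not actual LA Counts datasets:

```python
# Hypothetical sketch: joining two open datasets on a shared key to get
# an insight neither shows alone. All names and numbers are invented.

# Dataset 1: building permits per neighborhood (rows as dicts).
permits = [
    {"neighborhood": "Echo Park", "permits_issued": 120},
    {"neighborhood": "Van Nuys", "permits_issued": 45},
]

# Dataset 2: median rent per neighborhood, from a separate portal.
rents = [
    {"neighborhood": "Echo Park", "median_rent": 2400},
    {"neighborhood": "Van Nuys", "median_rent": 1700},
]

# Index the second dataset by the join key, then combine row by row.
rent_by_hood = {row["neighborhood"]: row["median_rent"] for row in rents}

combined = [
    {**row, "median_rent": rent_by_hood.get(row["neighborhood"])}
    for row in permits
]

for row in combined:
    print(row["neighborhood"], row["permits_issued"], row["median_rent"])
```

Note that the join only works because both datasets share a well-defined key; this is exactly where a data dictionary and contextual documentation earn their keep.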
What issue in Los Angeles do you think has the most potential for a data-driven solution?
I want to say housing. Housing is one of the biggest problems we have in Los Angeles, and I believe it is a driver for many other problems Angelenos face, like homelessness and transportation issues. But it’s not just about the quantity of available housing; it’s about having enough housing that makes sense for people, financially and geographically.
If the only affordable housing I can find is 30+ miles away from my workplace or the places I frequent the most, I am contributing to vehicular traffic and I am losing time spent with family. I become too tired to care about anything beyond my own needs, and contributing back to my community becomes difficult too. Without enough housing that “makes sense” for people, housing becomes another issue that piles on to the mountain of issues Angelenos face daily. I’d like to see data driving regional policies, incentives, and campaigns that encourage denser housing or match people with housing they can afford, in areas where it makes sense for them to live.
Share and walk through an example of your work related to data.
My work responsibilities revolve around building web applications and systems for the County of Los Angeles. As it relates to data, my work is very involved in defining the data being stored and then implementing the systems that gather and store that data.
My biggest project right now has to do with improving the information we have about LA County’s contracts. Every department handles its own contracting, so it’s been historically difficult to get a big-picture view. Specific details about contract terms are often gathered manually because that data is not stored digitally anywhere. My role on the team is to work with two sets of key users. First, I work with the departments that will use our system to run reports (i.e., potential end users). Second, I work with the people who input the contract data into our system (i.e., data providers). In working with both sets of users, I get the chance to determine the best way of collecting and storing the data while designing the best experience for our end users.
What’s your favorite “data-story”?
The first thing that comes to mind is a data visualization called built:LA (http://cityhubla.github.io/LA_Building_Age/) that was created by a friend and fellow MaptimeLA organizer, Omar Ureta.
This visualization maps out data about building age from the Los Angeles County Assessor. It creates a striking visual by assigning individual colors to each building shape according to the building’s age. The map turns into a spectrum of colors laid out across the landscape of Los Angeles County. It’s perfect in its balance of complexity and simplicity, making it approachable for data experts and non-data experts alike.
Very quickly, you can look at this map and identify interesting areas you want to explore. As an Angeleno looking at this map, I can draw so much meaning by connecting it to my own experience of LA. Maps are so great at creating that strong personal connection with the viewer.
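The core technique behind a map like this is simple: bucket each building’s construction year into a color, then paint the building footprint with it. Here is a minimal sketch of that idea (this is illustrative only, not built:LA’s actual code; the year cutoffs and hex palette are invented):

```python
# Illustrative sketch of a color-by-building-age scheme, in the spirit of
# built:LA. The era boundaries and hex colors below are made up.

def age_color(year_built):
    """Map a construction year to a hex color bucket (invented palette)."""
    if year_built < 1900:
        return "#2c003e"  # pre-1900: darkest
    elif year_built < 1940:
        return "#7b2d8b"  # early 20th century
    elif year_built < 1980:
        return "#e05fa0"  # postwar boom
    else:
        return "#ffd166"  # recent construction: brightest

# Hypothetical building records keyed by an assessor parcel number.
buildings = [
    {"ain": "5423-001-001", "year_built": 1923},
    {"ain": "5423-001-002", "year_built": 1999},
]

for b in buildings:
    print(b["ain"], age_color(b["year_built"]))
```

Rendered over an entire county’s parcel shapes, this one small mapping function is what produces the spectrum-across-the-landscape effect the visualization is known for.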
What advice do you have for someone looking to start using LA Counts datasets to tell their own stories?
No matter where you’re getting your data, the very first thing I’d recommend is to keep an open and curious mind. Don’t start with preconceptions and narratives that could blind you to what the data actually says. Instead, I highly recommend playing with datasets with an intention to discover a narrative or glean insights based on what the data reveals and then start your research from there.
I’d also add that as data users we must be cautious about the data we’re analyzing, because datasets can have limitations we aren’t immediately aware of. More specifically, there might have been limitations in the way the data was gathered, how it was stored, or other aspects that can greatly affect the fullness and truthfulness of the dataset. Here’s an example related to contracts: what if I want to know the total sum a department has budgeted for its contracts this year? You would think you could just sum the budget amounts for each contract. However, if you did that in our current system, your sum would be incorrect. Some contracts don’t have an individual budget; they share a pool of money where a single contract could receive the whole amount or none of it. However, the system can only store one budgeted amount per contract, and there’s no way to indicate which contracts are part of a pool. People have worked around this by splitting the amount equally across the contracts, by duplicating the whole amount on each contract, or by putting the whole amount on a single contract and leaving the others blank. None of these options provides an accurate view. If the collection or storage method was limited, how can we treat this as truthful or useful data?
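The pooled-budget pitfall above is easy to demonstrate in a few lines. This is a hypothetical reconstruction of the three workarounds described, with invented contract names and amounts, showing why a naive sum can’t be trusted without knowing which convention the data provider used:

```python
# Hypothetical illustration of the pooled-budget pitfall: two contracts
# share a single $100,000 pool, but the system stores exactly one budget
# field per contract, so data providers improvise.

POOL = 100_000

# Workaround 1: duplicate the pool amount on every contract in the pool.
duplicated = {"contract_a": POOL, "contract_b": POOL}

# Workaround 2: split the pool equally across the contracts.
split = {"contract_a": POOL / 2, "contract_b": POOL / 2}

# Workaround 3: put the whole amount on one contract, leave the rest blank.
single = {"contract_a": POOL, "contract_b": None}

def naive_total(budgets):
    """Sum budgets the way a report writer might, skipping blanks."""
    return sum(v for v in budgets.values() if v is not None)

print(naive_total(duplicated))  # 200000 -- double-counts the shared pool
print(naive_total(split))       # 100000 -- total is right, per-contract is not
print(naive_total(single))      # 100000 -- right total, only by convention
```

Only one of the three conventions overstates the department total, but all three misrepresent the individual contracts, and nothing in the stored data tells you which convention was used. That’s exactly the kind of hidden limitation worth asking about before trusting an aggregate.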
My advice for someone looking at LA Counts datasets remains the same. By being inquisitive and looking at the data first, you’ll begin to ask yourself questions like: “What am I looking at? What is this data trying to tell me?” And at the same time, be cautious about taking any dataset as truth without understanding its limitations.