Data Storytelling: How to Interrogate and Unlock Databases
- David Moore
- May 7
Updated: May 9
Few of us have the kind of access to personal data that Facebook founder Mark Zuckerberg or Amazon founder Jeff Bezos has.
But we can leverage large troves of data to advance our work. One way I do that – and help others do the same – is to use software to work with numbers, facts and databases to visualize larger or different stories than what everyone else is telling.
Here's a link to my previous blog post for more about why data-driven stories are superior to many other forms of storytelling.
Using data storytelling, I've been able to:
identify the biggest DUI hot spot in Texas (near Dallas Cowboys' AT&T Stadium, in the story linked above);
locate some of the worst subsidized housing in Dallas; and
determine the most prolific patent litigation attorneys (and judges) in U.S. federal courts.
Who cares about telling stories with these often gargantuan datasets? To name a few: personal injury lawyers (who earn billions every year in the U.S.), housing advocates (who receive a cut of $6 billion in annual housing subsidies) and patent litigators (who generate an average of $2 billion annually).
Attorneys, housing industry experts, consumer product safety groups, etc., are all fighting for their respective place in the marketplace of ideas. Why wouldn't they want to know more than their competitors? Why wouldn't they want to be seen as leaders in their respective fields, discussing intel that was derived from a trusted-but-obscure database? Why wouldn't they want to use available data to their advantage?
As a data journalist, I used to burn through databases left and right. Every Memorial Day, I'd trot out Ye Olde Boat Accident Database.
This is about more than telling stories.
Now, I'm looking at multiple uses for datasets – the same database that will help trucking firms analyze their lapses in safety practices can be used by tire companies looking for buyers for their tires (unsafe tires show up on federal truck inspections). This data – when properly filtered and analyzed – contains actionable information that companies and organizations can use to increase their business or advance their missions.
For example, right now, I'm mapping the most treacherous roads for wintertime driving in an East Coast state.
OK, enough background. Now let's talk about how to find the databases that might benefit you, and how to interrogate them as an effective data storyteller to discern whether they have the answers you want.
Ask yourself these questions before tackling data storytelling:

1) What Data Do You Already Have for Data Storytelling?
Sometimes, a database might be right under your nose. Recently, I found out a charity I work with – Safe Healthy Playing Fields – possessed a database of artificial playing surfaces in our area. This was perfect for the nonprofit, which advocates for natural playing surfaces (and opposes synthetic fields). Turns out, the Sierra Club had forwarded that database to Safe Healthy Playing Fields, and I was able to use it to formulate a Facebook campaign that would use data storytelling to inform people who live near these artificial surfaces of the hazards synthetic fields pose. (Few people know that synthetic surfaces last about a decade before they must be landfilled.) The Sierra Club database – which was gathering dust – could play a key role in educating the public, if given the chance. The database was hand-built, and relatively small. But it had enormous potential.

2) What Stories or Data Does Your Organization or Client Care About?
Here's where your powers of observation, reading comprehension, and listening skills kick in. What is important to your client or partner organization? How do they make their money or serve their clients? Likely, there's a database that relates to it (obesity? The rights of cyclists? Affordable housing?).
I'm getting to the point where I can now say "Show me your cause/story, and I'll find you a database." That's because there are so many databases available for public access. If your client handles personal injury cases and wants to expand their customer base through media interviews, this is a slam dunk, once you've asked a few questions.

3) Can Your Database be Trusted for Data Storytelling?
Just because you've found a database doesn't mean you can trust it. How can an inanimate object NOT be trustworthy? Here's how: I recently set out to map DUI crashes with a database, only to realize it was giving me a bunch of bad answers. Every piece of data I got from the database looked like a DUI crash. When I saw that there were large numbers of DUI crashes away from urban areas, I knew something was amiss. Upon further review, I noticed there were as many DUI crashes as there were total traffic reports. But I didn't discard the entire database. I looked at other data categories – such as accidents caused by slippery or icy conditions – and recognized that category of data was solid. The same goes for fatal accidents. Now, I'm mapping the road segments with the most ice- and snow-related accidents. If that one flies, I'll map the fatal accidents (the deadliest stretches of roads). Maybe I’ll look at motorcycle crashes next. When it comes to data storytelling with federally required vehicle crash data, the road’s the limit.
Though I was annoyed at first, at least I hadn't gone public with the bad data, and embarrassed anyone. And now I've got a new database to plumb.
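The check that caught my DUI problem can be sketched in a few lines of Python. This is only a rough illustration, not the tool I actually used: the column names (`dui_flag`, `icy_road_flag`), the "Y"/"N" values, the sample rows and the 95 percent ceiling are all assumptions made up for the example. The idea is simply to ask what share of records carries each flag, and to distrust any flag that shows up on nearly every record.

```python
import csv
import io

# Hypothetical sample data: these column names and values are invented
# for illustration and do not come from any real crash database.
SAMPLE_CSV = """crash_id,dui_flag,icy_road_flag
1,Y,N
2,Y,Y
3,Y,N
4,Y,N
"""


def flag_share(rows, column, true_value="Y"):
    """Return the fraction of rows where `column` equals `true_value`."""
    if not rows:
        return 0.0
    hits = sum(1 for row in rows if row[column] == true_value)
    return hits / len(rows)


def suspicious_flags(rows, columns, ceiling=0.95):
    """List the flag columns that are set on nearly every record.

    The 0.95 ceiling is an arbitrary assumption; pick one that makes
    sense for the category you are checking.
    """
    return [c for c in columns if flag_share(rows, c) >= ceiling]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(suspicious_flags(rows, ["dui_flag", "icy_road_flag"]))
```

In this made-up sample, every crash is marked as a DUI crash, so `dui_flag` gets reported as suspect, while the icy-road flag (set on one record in four) passes. A flagged column is a reason to dig further, not proof the whole database is bad.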

4) What Story Do You Want to Tell with Your Data?
This might be the biggest challenge for budding data storytellers: how can I make a story out of a big pile of data? Is there a story? To be certain, not all databases are created equal. I've seen some that were hopelessly, fatally flawed. But once you're confident that your data is OK, and you know what your organization/client is interested in, it's time to look into the soul of your database: the record layout and/or the readme file (if there is one).
A record layout (hopefully) tells you what a database tracks, at a granular, record-by-record level. Hopefully, now that you know what your team cares about, you'll be able to tell if there's anything that might interest them enough to look at trends. Sometimes, there’s no change in totals, year after year. Is that a story? Probably not. Well, at least you tried.
4 quick things data storytellers do when they get a database:
A) Check data categories (also known as column headers) and the record layout for things that catch their attention (or give them a "Holy Crap!" reaction). If it's a column that displays a trucking company's safety score, they might sort that, to see which operation has the lowest score. Personally, I would check to see if there's a "comments" header that discloses what got them into trouble. If they’re the worst, others are probably doing the same thing.
B) If the data includes latitude/longitude information, that opens up the realm of mapping specific occurrences. Other times, they'll include just street addresses, or maybe city names, etc. People love to look at maps. It could be someplace you've never been before. But plot data on a map, with a bit of analysis (the most unsafe daycares), and people will beat a path to your map.
C) Determine if there's a measurable increase or decrease of events in a category of interest. For example, is your city seeing more teardowns due to the construction of McMansions? Are there more smartphone-related car crashes? (Good luck with that one – few people will admit to fiddling with their phones and causing a wreck that way.) One of my favorite mapping projects involved comparing truck wrecks year-over-year, and determining that fracking for natural gas in the Eagle Ford Shale between Austin and San Antonio, Texas, had spurred an enormous leap in truck accidents. Later, I mapped alcohol sales over the same period, and saw a huge spike in that same area. Of course, now I wish I had overlaid those two datasets to show an interesting (and horrifying) trend. But that's all data past the bridge now.
D) Make sure they have the right software skills before making any eye-bulging claims (for all caveats, read my prior blog post).
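For readers who like to see the moves spelled out, steps A through C can be roughed out in plain Python. Treat this as a sketch under stated assumptions rather than my actual workflow: every record, field name (`safety_score`, `comments`, `year`), coordinate and carrier name below is invented for illustration, and rounding coordinates to two decimal places is just one crude way to group nearby points before you reach for real mapping software.

```python
from collections import Counter

# --- Step A: sort on a column of interest, worst first ---------------------
# Hypothetical carrier records; names, scores and comments are made up.
carriers = [
    {"name": "Carrier A", "safety_score": 82, "comments": ""},
    {"name": "Carrier B", "safety_score": 41, "comments": "worn tires"},
    {"name": "Carrier C", "safety_score": 67, "comments": "logbook gaps"},
]
worst_first = sorted(carriers, key=lambda r: r["safety_score"])
print(worst_first[0]["name"], "-", worst_first[0]["comments"])

# --- Step B: group latitude/longitude points into rough map cells ----------
# Hypothetical crash coordinates, as (latitude, longitude) pairs.
points = [
    (32.7473, -97.0945),
    (32.7512, -97.0921),
    (32.7489, -97.0903),
    (30.2672, -97.7431),
]


def hot_spots(coords, decimals=2):
    """Count points per cell, where a cell is a pair of rounded coordinates."""
    return Counter((round(lat, decimals), round(lon, decimals))
                   for lat, lon in coords)


print(hot_spots(points).most_common(1))

# --- Step C: year-over-year change in the number of events -----------------
# Hypothetical crash records; only the year matters for this count.
crashes = [{"year": y} for y in
           (2010, 2010, 2011, 2011, 2011, 2012, 2012, 2012, 2012, 2012)]


def yearly_change(records, year_key="year"):
    """Return {year: change in count versus the previous year on record}."""
    counts = Counter(r[year_key] for r in records)
    years = sorted(counts)
    return {cur: counts[cur] - counts[prev]
            for prev, cur in zip(years, years[1:])}


print(yearly_change(crashes))
```

Note that `yearly_change` compares each year with the previous year that appears in the data, so a missing year would quietly widen the gap. That is exactly the kind of thing to check before making any eye-bulging claims.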
You're now free to commit random acts of data storytelling at will!