Earlier this year, we launched a data-driven local wire service that’s distributing thousands of semi-automated local stories across the U.S. every month.
Since then, we’ve expanded the scope of the project, incorporating additional datasets and designing templates to generate more locally relevant news and information across a growing swath of cities and regions.
Want to know why and how we’re doing it? Read on for a deep-dive into our process, methods, and motivation.
Why the focus on data?
With gaps in local news coverage widening across many parts of the country, and local publications laying off staff or shutting down altogether, there’s a pressing need for creative solutions to tackle the challenges faced by today’s news industry.
In many cases, it’s no longer feasible or cost-effective for individual reporters to investigate and write a local news story that may only be read by a small number of readers. But as the cacophony of information in today’s digital age has made it no easier, and perhaps much harder, to figure out what’s happening around us, local news and information remain as relevant and valuable as ever.
The digital age also presents new opportunities, including the growing availability of public and private datasets that can provide locally relevant information on everything from businesses and real estate to jobs, crime, transit, and more. Using this kind of data, we have the opportunity to systematically identify developments of interest to local communities, and increase story volume and geographic coverage in efficient and economical ways. Our current data partners include Yelp, Zumper, Zillow, Crunchbase, SpotCrime, MaxPreps, Fandango, Groupon, Petfinder, Saildrone and more.
If you’re a company or organization with locally relevant data that could help to create local news stories, we want to hear from you! Please email us at firstname.lastname@example.org.
What’s all this about automation?
Hoodline got its start as a neighborhood news site featuring original reporting in San Francisco and Oakland. But the majority of Hoodline’s content today—thousands of stories published across 50+ cities around the U.S.—is mostly or fully automated based on locally relevant data.
We distribute this content to local partner publications to supplement their own content, providing stories that traditional newsrooms may not have the resources to cover, and that are designed to complement rather than replace human reporting.
Hoodline’s content represents a collaboration between experienced local reporters and data scientists and engineers, combining current computational methods and tools with journalistic insight, news judgment, and thoughtful design to develop a new form of news reporting. We use machine learning and natural language processing, along with custom engineering pipelines and carefully constructed narratives that are replicable yet flexible.
Our goal is to produce meaningful stories that capture important local happenings, that vary with each new development, that are delivered in a timely manner, and that fill existing news gaps and add real, practical value to local communities and everyday lives.
How does it all work?
Our journalists design templates for each type of story that we produce. A template is like a blueprint for a story: it contains conditional logic that decides which passages of language to include depending on the values in the data, along with placeholders for numbers, words, and phrases that either come directly from the data sources or that we extract, calculate, and generate from trends and insights in that data.
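To make the idea of a template concrete, here is a minimal sketch in Python. It is not Hoodline's actual template format: the rent story, the wording, the field names (`city`, `rent`, `prior`), and the function `render_rent_lede` are all hypothetical. It shows the two ingredients described above: conditional logic that picks which sentence to use based on the data, and placeholders filled with values that are either taken directly from the data or calculated from it.

```python
# Hypothetical template variants for a rent-trend story; the condition
# on the data decides which one is used.
TEMPLATES = {
    "up": ("The median one-bedroom rent in {city} rose {pct:.1f}% last month "
           "to ${rent:,}, up from ${prior:,}."),
    "down": ("The median one-bedroom rent in {city} fell {pct:.1f}% last month "
             "to ${rent:,}, down from ${prior:,}."),
    "flat": "The median one-bedroom rent in {city} held steady at ${rent:,} last month.",
}


def render_rent_lede(city: str, rent: int, prior: int) -> str:
    """Fill a hypothetical template with values from one data row."""
    if prior <= 0:
        raise ValueError("prior rent must be positive to compute a change")
    # Conditional logic: choose the variant that matches the data.
    if rent > prior:
        key = "up"
    elif rent < prior:
        key = "down"
    else:
        key = "flat"
    # Calculated placeholder: percentage change derived from the raw values.
    pct = abs(rent - prior) / prior * 100
    # str.format ignores keyword arguments a variant does not use.
    return TEMPLATES[key].format(city=city, rent=rent, prior=prior, pct=pct)


if __name__ == "__main__":
    print(render_rent_lede("Oakland", 2100, 2000))
```

Running it prints "The median one-bedroom rent in Oakland rose 5.0% last month to $2,100, up from $2,000." A production system would likely use a fuller templating engine, but plain `str.format` keeps the sketch dependency-free.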
For the data and computational aspects of the process, we’ve developed our own library of tools to transform datasets into tables that populate each template, along with our own schema for the structure of data components that make up a narrative.
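The transformation step can be pictured as turning raw records into one tidy row per story. The sketch below is an assumption for illustration only and is not Hoodline's tool library or schema: the input is a hypothetical list of rental listings with `city`, `month`, and `rent` keys, and `build_rent_table` aggregates them into per-city rows whose keys match the placeholders a rent template might expect.

```python
from collections import defaultdict
from statistics import median


def build_rent_table(listings, month, prior_month):
    """Aggregate hypothetical raw listings into one row per city.

    Each output row has the keys a rent-trend template might need:
    city, rent (current median), and prior (previous month's median).
    Cities missing either month are skipped, since a trend story
    cannot be written for them.
    """
    # city -> month -> list of rents
    by_city = defaultdict(lambda: defaultdict(list))
    for rec in listings:
        by_city[rec["city"]][rec["month"]].append(rec["rent"])

    rows = []
    for city, months in sorted(by_city.items()):
        # Membership tests on a defaultdict do not create new keys.
        if month in months and prior_month in months:
            rows.append({
                "city": city,
                "rent": round(median(months[month])),
                "prior": round(median(months[prior_month])),
            })
    return rows


if __name__ == "__main__":
    sample = [
        {"city": "Oakland", "month": "2024-01", "rent": 2000},
        {"city": "Oakland", "month": "2024-02", "rent": 2100},
        {"city": "Oakland", "month": "2024-02", "rent": 2300},
        {"city": "Berkeley", "month": "2024-02", "rent": 2500},
    ]
    print(build_rent_table(sample, "2024-02", "2024-01"))
```

Here Berkeley is dropped because it has no prior-month data. Skipping incomplete rows, rather than guessing at values, is one simple way to keep flawed inputs from reaching a draft.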
The articles generated from the transformed data tables and the pre-written templates become drafts available for editing and publishing in our proprietary content management system. Human editors review all drafts before they are finalized, identifying and resolving issues and filling in any remaining qualitative elements that may require manual effort.
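The editorial gate can also be sketched in code. This is a hypothetical model, not Hoodline's content management system: it assumes that generated drafts mark the qualitative gaps they leave for editors with an `[EDITOR: ...]` marker, and that a draft cannot be approved until an editor has resolved every marker. The `Draft` class, its status values, and the marker convention are all illustrative assumptions.

```python
import re
from dataclasses import dataclass

# Hypothetical convention: generated text flags gaps for a human editor.
EDITOR_MARKER = re.compile(r"\[EDITOR:[^\]]*\]")


@dataclass
class Draft:
    headline: str
    body: str
    status: str = "needs_review"  # every generated draft starts unreviewed

    def open_items(self):
        """Return the editor markers that still need manual work."""
        return EDITOR_MARKER.findall(self.body)

    def approve(self):
        """Mark the draft approved only once no editor markers remain."""
        remaining = self.open_items()
        if remaining:
            raise ValueError(f"unresolved editor items: {remaining}")
        self.status = "approved"


if __name__ == "__main__":
    draft = Draft("Oakland rents rise",
                  "Rents rose 5.0% last month. [EDITOR: add context on new housing]")
    print(draft.open_items())
    draft.body = "Rents rose 5.0% last month as several new buildings opened."
    draft.approve()
    print(draft.status)
```

Keeping approval as an explicit step, rather than publishing generated text directly, mirrors the principle that a human editor reviews every draft before it is finalized.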
These editors also provide feedback to the template designers, as we seek to automate as many tedious or repetitive components as we can to reduce costs and shift human effort to more valuable contributions. We also collect feedback from publishing partners and continuously seek to identify new datasets that would be valuable to incorporate, new story ideas worth covering, and improvements that can be made to existing stories.
Automation in the local news space is a young and evolving field, and there’s still much to learn about the possibilities and pitfalls it presents. Our team includes experienced journalists as well as scientific researchers motivated by social challenges, and we share a determination to report news as accurately as possible, understand the shortcomings in our data, and be thorough in validating the information we publish.
As this project evolves, we want your feedback to make it better. What’s missing in our reporting on your local area? Which types of stories are falling short, and what would make them more interesting or useful? Which of our stories are most engaging or valuable to you?
Do you have suggestions for other datasets or data sources that we might explore, to open up new areas of coverage, or to validate, complement, and enhance existing coverage? Or any other thoughts on what we’re doing?
Let us know what you think, right here.