Houston 3-1-1 Data: Initial Impressions

The City of Houston recently launched the Open Houston 3-1-1 Data Challenge: a month-long “civic hacking” event in which teams and individuals pair with the city to develop applications and data visualizations that make practical use of its 3-1-1 data.

…And what is 3-1-1 data, you may ask?

3-1-1 is a central, non-emergency telephone number that provides quick access to municipal services in many communities throughout Canada and the United States. When a resident calls 3-1-1, the request is logged in a database and tracked until it is completed. The general rule of thumb for calling in: “Burning building? Call 9-1-1. Burning question or request? Call 3-1-1.”

In large cities such as Houston, the number of calls is enormous: in just the past month, our city has logged over 20,000 unique 3-1-1 reports!

Imagine the value of this data for politicians, real estate agents, and concerned citizens. If you’d like to purchase a home, you probably want to ensure that it’s in an area where pipelines aren’t leaking, where stray animals aren’t running rampant, and where there are few reports of vagrants and noise complaints. If you’re a politician, you probably want to know about the problems concerning your constituents. If you’re a citizen, you want to ensure that the broken fire hydrant on your block will be fixed in a timely manner. The potential insight into public issues is endless, and it’s for this reason that sites like Open311 have been launched.

A (very) preliminary look at last month’s Houston 3-1-1 data shows:

[Image: pie chart of 3-1-1 call percentages by district]

Districts C and D seem to account for the largest percentage of calls, by far, even though each of the eleven districts represents just under 10% of the Houston populace (2010 census data) and covers a similar land area:

Percentage of Houston population by district (2010)


* * * * *

Houston City Council districts
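
To put a number on how over-represented a district is, one option is to divide its share of calls by the share it would have if calls were spread evenly across the eleven districts. The snippet below is only a rough sketch of that idea: the function name is made up for illustration, and it assumes a district-to-count mapping like the tally built in the pie chart script.

```python
def over_representation(counts, n_districts=11):
    """Return each district's call share divided by an even 1/n share.

    counts is a mapping of district label -> number of calls (assumed input).
    A value of 1.0 means the district gets exactly its even share of calls;
    2.0 means it gets twice that share.
    """
    total = float(sum(counts.values()))
    expected_share = 1.0 / n_districts
    return {
        district: (count / total) / expected_share
        for district, count in counts.items()
    }
```

An even split is a simplification that roughly matches the "just under 10% each" population figures; swapping in the actual census share per district would give a fairer comparison.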


There also seems to be a trend, by department, in service requests finished early and late, not to mention some obviously erroneous data. Somehow it doesn’t seem likely that 20% of tasks could be completed 180 days ahead of schedule…
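
One way to start on the clean-up would be to flag any request that was closed implausibly far ahead of its due date. The sketch below is not based on the actual extract layout: the function names, the timestamp format, and the idea that each record carries a due date and a closed date as text are all assumptions to be adjusted against the real file.

```python
from datetime import datetime

# Assumed timestamp format; change this to match the real extract.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def days_ahead_of_schedule(due, closed, fmt=DATE_FORMAT):
    """Return how many days before the due date a request was closed.

    Positive means finished early, negative means finished late.
    """
    due_dt = datetime.strptime(due, fmt)
    closed_dt = datetime.strptime(closed, fmt)
    return (due_dt - closed_dt).total_seconds() / 86400.0


def is_suspicious(due, closed, threshold_days=180, fmt=DATE_FORMAT):
    """Flag requests closed at least threshold_days ahead of schedule."""
    return days_ahead_of_schedule(due, closed, fmt) >= threshold_days
```

The 180-day default simply mirrors the cut-off mentioned above; records flagged this way would still need a manual look before being dropped.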

The major challenges are going to be data clean-up and extracting meaning from the large amount of information we already have, in addition to the thousands of entries streaming in each night. I’m also excited to start playing with the METRO public transportation and hydrology .shp files available via the Code for Houston website.

* * * * * * * * * *

METHODOLOGY:

  • Downloaded the current month’s 3-1-1 data, saved as 311-Public-Data-Extract-monthly-clean.txt
  • awk-ed out the DISTRICTS column, saved as 311_districts.txt (see one-liner below)
  • Plotted a short and messy matplotlib pie chart (see script below)

== AWK ONE-LINER ==
awk -F"|" '{print $4}' 311-Public-Data-Extract-monthly-clean.txt > 311_districts.txt
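
The same extraction could also be done in Python, which skips the intermediate file and tallies the column in one pass. This is a minimal sketch rather than what was actually run: the function name is invented, and it assumes the first line of the extract is a header row and that the district sits in the fourth pipe-delimited field, as in the awk command.

```python
import collections


def tally_field(lines, field_index=3, sep="|", skip_header=True):
    """Count the values found in one delimited column.

    field_index=3 is the fourth field, matching awk's $4.
    Rows that are too short or have an empty value are ignored.
    """
    counts = collections.Counter()
    for line_number, line in enumerate(lines):
        if skip_header and line_number == 0:
            continue  # assumed header row
        parts = line.rstrip("\n").split(sep)
        if len(parts) > field_index:
            value = parts[field_index].strip()
            if value:
                counts[value] += 1
    return counts


# Usage (file name taken from the methodology above):
# with open('311-Public-Data-Extract-monthly-clean.txt') as f:
#     counts = tally_field(f)
```

Unlike the awk one-liner, this drops the header and blank values, so the counts may differ slightly from a tally of 311_districts.txt.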

== PIE CHART ==
#!/usr/bin/env python

import collections
import re

import matplotlib.pyplot as plt

# Tally every word in the districts file, most common first.
# Note: a header line, if present, is tallied as a label too.
with open('311_districts.txt') as f:
    words = re.findall(r'\w+', f.read().lower())
tally = collections.Counter(words).most_common()

plt.figure(1, figsize=(8, 8))
ax = plt.axes([0.1, 0.1, 0.8, 0.8])

labels = []
fracs = []

for label, count in tally:
    labels.append(label)
    fracs.append(count)

# Pull out the second-largest wedge; explode must have one entry per wedge,
# so build it from the data instead of hard-coding 13 values.
explode = [0.5 if i == 1 else 0 for i in range(len(fracs))]

plt.pie(fracs, explode=explode, labels=labels, colors=['b', 'y', 'g', 'b', 'w', 'r', 'y', 'g', 'b', 'w', 'r', 'y', 'g'], autopct='%1.1f%%', shadow=True)

plt.title('311 data percentages, via district', bbox={'facecolor': '0.8', 'pad': 8})

plt.show()