Boundary Estimation – Siege Analytics

(TL;DR)

We make shapefiles from your voter files. They are accurate. They are standards-compliant. They work.

Axioms

Elections are inherently demographic and geographic in that

- they concern the behaviours of large numbers of people within designated spaces,
- they are conducted by convincing large numbers of people to move from several addresses to a smaller set of addresses, wait in line and take an action,
- they ask those people to make decisions about where they live.

There is no single United States Geographical Services Board to provide the spatial data needed for geographic analysis of elections. Instead, the data that exist are

- produced piecemeal by several different government entities,
  - on different timelines,
  - with different methodologies,
  - with different restrictions and availabilities, such that
  - even if we were to spend the time tracking down what data are produced, they still would not approximate national coverage in all the jurisdictions we need.

Why do we need estimated boundaries?

Although the problem that we are solving is inherently geographical and the data are spatial, without estimated boundaries, we cannot engage the problem geographically. We can do any other kind of analysis, but not meaningful geographical analysis. We have partial data available, in the form of a specific set of boundaries provided by the Census, and we have coördinates available for every household in America.

This allows us to do some things, but not everything that would be useful. For example, we are not able to take advantage of a fundamental tool of political science: precinct-level election results. As an accident of history, elections are conducted precinct by precinct and, therefore, reported as such. Because precinct-level boundaries are handled differently from state to state and county to county, there is no way to do a national analysis of precincts.

Similarly, we are not able to look at smaller, less-attended districts, such as school districts, school board districts, utility zones, city council districts, town council districts, etc.

As such, we are forced to think of these things one level of abstraction away from what they really are, which is places and spaces that have aggregate properties, lie on top of each other, connect in complex ways, etc.

How do we estimate boundaries?

The short answer is that we do the world’s most tedious trigonometry to estimate boundaries. The longer answer is that we take a voter file that is geographically referenced, make use of all the listed address fields, geographical fields and Census data to provide an estimation of the boundaries in question, one at a time.
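To make the idea concrete, here is a minimal sketch of one way to turn geocoded voters into a rough district outline. It is not the method described above, whose details are not given here. It swaps in a deliberately naive stand-in: group voter coordinates by a district field and take the convex hull of each group. The function names and the dictionary keys `lon`, `lat` and `precinct` are assumptions made for illustration.

```python
from collections import defaultdict


def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def estimate_boundaries(voters, district_field):
    """Group voter coordinates by a district field and hull each group.

    `voters` is an iterable of dicts with hypothetical keys "lon" and "lat"
    plus the named district field.
    """
    groups = defaultdict(list)
    for voter in voters:
        groups[voter[district_field]].append((voter["lon"], voter["lat"]))
    return {district: convex_hull(pts) for district, pts in groups.items()}
```

A convex hull overstates concave districts and lets neighbouring districts overlap, so it is best read as a first approximation that shows why the district assignments and coördinates in the voter file are enough to start drawing shapes.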

In our space, what we call a voter file is a tabular data structure in which each row is a record and each column is an attribute of that record, with each record being an individual voter.

The voter file is our input. Without this file, nothing else can go forward. We require a voter file that is geographically referenced, meaning that each row should have values for:

- the full address information,
- district information,
- coördinates in the form of (lon, lat),
- and the precision score of the coördinates.

If coördinates and precision score are not available, then we are able to provide those (“geocode”) on our own, but this slows down the process and our resources are simply not as robust as those available to other actors.
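The input requirements above can be sketched as a small loader. This is an illustration under stated assumptions, not a description of our actual pipeline: the column names (`address`, `city`, `state`, `zip`, `lon`, `lat`, `precision`) and both function names are hypothetical, and a real voter file will carry its own headers. The sketch checks that every row has address and district values, then separates rows that already carry coördinates and a precision score from rows that would still need geocoding.

```python
import csv
import io

# Hypothetical column names; a real voter file will use its own headers.
ADDRESS_FIELDS = ("address", "city", "state", "zip")
COORD_FIELDS = ("lon", "lat", "precision")


def load_voter_file(handle, district_fields):
    """Read a CSV voter file into a list of dicts, one per voter.

    Raises ValueError if a row lacks any address or district value.
    """
    rows = list(csv.DictReader(handle))
    required = sorted(set(ADDRESS_FIELDS) | set(district_fields))
    for i, row in enumerate(rows):
        missing = [f for f in required if not (row.get(f) or "").strip()]
        if missing:
            raise ValueError(f"row {i} is missing {missing}")
    return rows


def split_by_geocoding(rows):
    """Separate rows that carry coordinates from rows that need geocoding."""
    ready, todo = [], []
    for row in rows:
        has_coords = all((row.get(f) or "").strip() for f in COORD_FIELDS)
        (ready if has_coords else todo).append(row)
    return ready, todo
```

Rows in the second list are the ones that would slow the process down, since they have to be geocoded before any boundary work can begin.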

This is our sole data requirement for input. We use a combination of open source GIS software supported by The Open Source Geospatial Foundation to do our computations.