Open data

Why voter data should be open to every campaign

The legacy vendor model puts a $30,000 price tag on the same public records a freedom-of-information request can produce. We think that's a structural problem, and this piece lays out the thesis.

ConstituencyData Research · 7 min read
open data · thesis · policy

Every state election authority in the United States maintains a voter file. Every state makes it available, in some form, to campaigns, parties, journalists, researchers, and, in most states, ordinary citizens. The statutes that govern this access (California Elections Code § 2194, Florida Statutes § 97.0585, Nevada Revised Statutes § 293.440, and forty-seven peers) all rest on the same premise: voter rolls are a public record, because elections are a public trust.

And yet, if you are a first-time candidate for a state house seat, the practical experience of trying to use this data is indistinguishable from trying to license a proprietary product. The raw files are formatted for archivists, not operators. A handful of well-resourced vendors — everyone in the industry knows the names — ingest the rolls, clean them, join them to consumer data and turnout models, and license access back for somewhere between $10,000 and $60,000 per cycle, depending on the state and the tier.

This is not a conspiracy. The vendors do real work. Cleaning 50 state voter files into a single queryable system is genuinely difficult, and a working product deserves to be paid for. The problem is that the price point has calcified at a level that presumes a certain kind of campaign — one with a finance team, a vendor budget, and at minimum a modest congressional-race war chest.

It leaves out almost everyone.

The missing 95%

There are roughly 519,000 elected offices in the United States. Only 537 of them are federal. About 7,400 are state legislative. The remaining 511,000 or so are the ones that actually shape the daily texture of American life: school boards, city councils, county commissions, sheriffs, judges, water districts, library boards.
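
For readers who want to check the arithmetic, here is a minimal sketch using only the approximate figures quoted above. The function and constant names are ours, chosen for illustration. On these numbers the local share comes out nearer 98.5%, so "95%" is a conservative round figure.

```python
# Approximate figures quoted in the paragraph above.
TOTAL_OFFICES = 519_000
FEDERAL_OFFICES = 537
STATE_LEGISLATIVE_OFFICES = 7_400


def local_office_breakdown(total, federal, state_legislative):
    """Return (count, share) of offices that are neither federal nor state legislative."""
    local = total - federal - state_legislative
    return local, local / total


local_count, local_share = local_office_breakdown(
    TOTAL_OFFICES, FEDERAL_OFFICES, STATE_LEGISLATIVE_OFFICES
)
print(f"{local_count:,} local offices ({local_share:.1%} of the total)")
# prints: 511,063 local offices (98.5% of the total)
```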

Almost none of those races can afford a legacy vendor contract. Many of them have a campaign budget smaller than the setup fee. The result is predictable: at the local level, political data is effectively a luxury good, and campaigns that don’t have it run blind. They canvass neighborhoods they’d win anyway, call donors they’ll never convert, spend money on mail that reaches the wrong voters, and lose races they could have won.

It’s not just bad for the candidates. It’s bad for representative democracy. When only the well-funded candidate can see the electorate clearly, the well-funded candidate tends to win, and the people who would have made the best local officials — teachers, nurses, small-business owners, veterans — never run.

What “open” actually means

We want to be precise. “Open” does not mean “free for all uses.” Voter files carry real use restrictions under state law, and most of those restrictions exist for good reason. You cannot — and should not be able to — use voter data to solicit commercial sales, to intimidate voters, or to compile a profiling dossier to sell to the highest bidder.

What “open” means, in our view, is:

  1. The base data is accessible. A candidate with a valid affidavit of permitted use can query the file without signing a $30,000 contract.
  2. The schema is documented in public. Anyone can see what fields exist, how they’re derived, and where they came from — no NDA required.
  3. The methodology is auditable. If we build a turnout model or a donor propensity score on top of the data, we publish how we built it. Campaigns deserve to know what the model is actually saying.
  4. Commercial extraction is bounded. We do not re-sell voter-file-derived audiences to marketers. Full stop. The data is for civic work.

That last point is the critical commercial question, and we think it separates legitimate open-data infrastructure from the broker ecosystem. The moment your business model requires extracting voter data for commercial resale, your incentives diverge from the civic purpose of the file. The entire point of the legal framework is that this data exists because elections require it. It doesn’t exist so that somebody can enrich their third-party marketing graph.

Why we can afford to do this

The obvious objection: if this data is so hard to process, how are we pricing Pro at $99 a month?

Two reasons. First, the infrastructure cost of cleaning and normalizing a voter file, amortized across every campaign that uses it, is a small number per campaign. The legacy vendors are not charging $30,000 because it costs $30,000 to service you. They’re charging what the market will bear, and the market — until recently — had no alternative.

Second, AI collapsed the interface cost. A huge part of what you used to pay the vendors for was the humans on the other end who could write you a VAN query or pull a target universe. That work now costs cents. A campaign staffer with no SQL training can ask Civitas for a list in English and get it in three seconds. The staffing layer of the old model was expensive because it was labor. It isn’t anymore.

We’re a platform business. The data catalog is the commons. We make money on the platform that makes the data legible — the AI queries, the routing, the donor scoring, the message stress test. That’s the right alignment. A campaign gets stronger the more of the platform it uses; the data itself stays available to anyone who just wants to browse.
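
The amortization argument (the first of the two reasons above) can be sketched in a few lines. Only the $99 and $30,000 prices come from this piece. The pipeline cost below is a hypothetical placeholder, not a real figure from our books, and the sketch ignores per-campaign marginal costs such as support and compute.

```python
import math

# Prices quoted in this article.
PRO_PRICE_PER_MONTH = 99
LEGACY_CONTRACT_PRICE = 30_000

# HYPOTHETICAL placeholder: assumed yearly cost of cleaning and normalizing
# the voter files. Swap in a real number before drawing any conclusions.
ASSUMED_ANNUAL_PIPELINE_COST = 1_000_000


def amortized_cost(fixed_cost, campaigns):
    """Fixed cost spread evenly across every campaign that uses the data."""
    if campaigns <= 0:
        raise ValueError("campaigns must be positive")
    return fixed_cost / campaigns


def campaigns_to_cover(fixed_cost, revenue_per_campaign):
    """Smallest number of paying campaigns whose revenue covers the fixed cost."""
    return math.ceil(fixed_cost / revenue_per_campaign)


pro_price_per_year = PRO_PRICE_PER_MONTH * 12  # 1,188

for n in (50, 500, 5_000):
    per_campaign = amortized_cost(ASSUMED_ANNUAL_PIPELINE_COST, n)
    print(f"{n:>5} campaigns -> ${per_campaign:,.0f} each")

print(campaigns_to_cover(ASSUMED_ANNUAL_PIPELINE_COST, LEGACY_CONTRACT_PRICE))
print(campaigns_to_cover(ASSUMED_ANNUAL_PIPELINE_COST, pro_price_per_year))
```

Whatever the true fixed cost is, the shape is the same: the per-campaign share falls in direct proportion to the number of campaigns served, which is why a broad base of small races can support a price that a narrow base of large ones cannot.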

What this unlocks

Here is what we expect to happen, at scale, as this kind of infrastructure becomes normal:

  • More first-time candidates run, because the data ceiling drops out of the way.
  • Races below the congressional level become more contested, because challengers can see the electorate as clearly as incumbents.
  • Small-dollar donors become findable for candidates who aren’t already nationally famous.
  • Volunteer time compounds, because canvass routes stop wasting Saturdays on doors that won’t open.
  • Journalists and academics build a richer record of how American democracy actually works at the level where most of it happens.

None of this is utopian. The data has been public the whole time. What was missing was a credible, affordable, non-extractive layer between the raw records and the people who have a legitimate reason to use them.

We’re building that layer. If you’re running a race — or thinking about one — start with the free account and let us know what’s missing. The catalog is going to get richer. The pricing is going to stay where it is.

Elections belong to everyone. It’s time the data matched.
