In our deliberations about how the Count Love app should work, we've given much thought to the role of protests and the act of being counted as forms of political speech. We want to let individual participants declare their presence and preserve that declaration as historical record. Yet we are not naive about the reality that making oneself known at a protest comes with no guarantee of either privacy or safety.
With the goal of recording political dissent while respecting individual privacy, we designed a technical solution that counts individuals privately by limiting our own ability to answer the question, “Have we seen this individual before?” We store enough information to say with certainty when we have never counted a protester before. However, we do not store enough information to link any individual with a particular submission record. This key feature allows us to know when we should count a new submission while also preventing us from placing a specific person at a demonstration.
The puppy matching conundrum
As a simple example, consider the following puppy matching game. You are allowed to adopt one Labrador puppy, but you are blindfolded and cannot see any of them (truly, this is not more ridiculous than any other contemporary “matching” reality show). You are allowed to pet the puppies as many times as you want, and someone will tell you the color of the puppy that you've just petted. The first puppy that you pet is a black Labrador (seems to like to lick hands). The second puppy that you pet is a yellow Labrador (very soft, but a bit aloof). The third puppy that you pet is a black Labrador, but is this the same as the first puppy? You only know for certain that you have petted at least two and at most three puppies. Asked if you've met one of the chocolate labs yet, you can answer “no” with complete certainty. However, asked if you've petted Logan the black Labrador yet, you can only answer, “I don't know.” This puppy counting conundrum illustrates how a Bloom filter works, and the Bloom filter is the key technology that allows us to count submissions anonymously.
In this blog post, we describe our Bloom filter implementation in detail along with other techniques that we use to encrypt and anonymize submissions. We share these technical details because we believe in privacy through peer review, mathematical uncertainty, and computational intractability. We welcome feedback and suggestions for improvement.
Our software stack for counting submissions consists of an iOS app that transmits location and demonstration information and a server application that determines uniqueness and aggregates event details. We wrote the iOS client in Swift and the server application in PHP.
Constructing and transmitting the payload
When a user submits an event, the app prepares a payload using iOS's unique device ID, a serialized version of the phone's location, and an event description. Because the iOS device ID is unique, we never include it in its raw form. Instead, we combine the ID with the current date and then take the SHA-256 hash, producing an identifier specific to that phone on that day. We never transmit the payload unencrypted, and we never store the identifier hash on our servers. We believe these practices should make it quite difficult to obtain a hash and reverse it back to the original device ID.
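To make the identifier step concrete, here is a minimal sketch in Python (chosen for brevity; it is not the app's actual Swift code). The function name `daily_identifier`, the `|` separator, and the ISO date format are assumptions made for illustration, since the post does not say exactly how the ID and the date are combined.

```python
import hashlib
from datetime import date


def daily_identifier(device_id: str, day: date) -> str:
    """Hash a device ID together with a date using SHA-256.

    Assumption: the ID and the ISO-formatted date are joined with "|".
    The real app may combine them differently.
    """
    material = f"{device_id}|{day.isoformat()}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


# The same phone yields the same identifier all day, and a different,
# unlinkable-looking identifier on the next day.
today = daily_identifier("EXAMPLE-DEVICE-ID", date(2017, 3, 1))
tomorrow = daily_identifier("EXAMPLE-DEVICE-ID", date(2017, 3, 2))
```

Because the date is part of the hashed material, two submissions from the same phone collide only when they happen on the same day, which is the property the duplicate check relies on.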
All communications between the iOS app and countlove.org occur over HTTPS. To transmit its payload, the app makes an HTTPS POST request using Apple's URLSession class. Our SSL certificate comes from Let's Encrypt, and to minimize the possibility of a man-in-the-middle attack, we've pinned the issuing certificate authority so that the app refuses to communicate with HTTPS endpoints whose certificates come from any other authority. For more motivation on why everyone should pin their certificates, read this fascinating story about Gmail and certificate authorities.
Server-side Bloom filtering and data aggregation
When our server receives a submission, we extract the hashed device ID and pass it to the Bloom filter to determine submission uniqueness. You can learn more about Bloom filters here and here.
Our current Bloom filter implementation creates eight different hashes based on the hashed device ID. We treat these hashes as extremely large numbers, reduce each one modulo the size of the Bloom filter (which varies depending on the number of unique submissions that we want to be able to count), and then check whether all eight remainders already exist in our table of previously seen remainders. If at least one remainder is new, we know that it is a unique submission. If every remainder already exists in the table, we count the submission as a potential duplicate.
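The check described above can be sketched as follows. This is an illustrative Python version, not our PHP server code: the post does not say how the eight hashes are derived, so salting SHA-256 with an index `i` is an assumption, as are the names `bloom_indices` and `is_definitely_new`.

```python
import hashlib

NUM_HASHES = 8  # the post states that eight hashes are used


def bloom_indices(hashed_id: str, table_size: int) -> set[int]:
    """Derive up to eight table positions from a hashed device ID.

    Assumption: each hash is SHA-256 over an index prefix plus the ID.
    """
    indices = set()
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{hashed_id}".encode("utf-8")).digest()
        # Treat the digest as a very large integer and take the remainder.
        indices.add(int.from_bytes(digest, "big") % table_size)
    return indices


def is_definitely_new(hashed_id: str, seen: set[int], table_size: int) -> bool:
    """True if at least one remainder is missing from the table."""
    return not (bloom_indices(hashed_id, table_size) <= seen)


seen: set[int] = set()
first_time = is_definitely_new("abc123", seen, 10_000)    # empty table
seen |= bloom_indices("abc123", 10_000)                   # record it
second_time = is_definitely_new("abc123", seen, 10_000)   # all present
```

A `False` result means only "possibly seen before": a different ID whose remainders all happen to be present would get the same answer, which is exactly the uncertainty the puppy game describes.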
For a specific event, this gives us a range of counts. We know that the minimum number of attendees that we can count is the number of unique submissions, and the maximum number of attendees that we can count is the number of unique submissions plus the number of potential duplicates. After determining uniqueness, we discard the hashed ID, update the Bloom filter table with any new remainder values from the last set of modulus operations, and archive the location and event description data for aggregation.
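As a rough illustration of how the two counters produce a range, the sketch below tallies submissions from their already-computed remainders. The class name `EventTally` is hypothetical, and keeping one table per tally is a simplifying assumption; the post does not specify how the table is shared across events.

```python
from dataclasses import dataclass, field


@dataclass
class EventTally:
    seen: set = field(default_factory=set)  # table of previously seen remainders
    unique: int = 0
    potential_duplicates: int = 0

    def record(self, remainders: set) -> None:
        """Classify one submission, then update the table if it was new."""
        if remainders <= self.seen:
            # Every remainder is already present: possibly a repeat.
            self.potential_duplicates += 1
        else:
            self.unique += 1
            self.seen |= remainders  # only the remainders are kept, never the ID

    def attendance_range(self) -> tuple:
        """Return (minimum, maximum) attendees consistent with the tally."""
        return self.unique, self.unique + self.potential_duplicates


tally = EventTally()
tally.record({1, 2, 3})  # new
tally.record({3, 4})     # new, because 4 was not in the table
tally.record({1, 4})     # both present, so a potential duplicate
low, high = tally.attendance_range()  # (2, 3)
```

The result mirrors the puppy game: at least two and at most three distinct attendees.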
Each night, we reset the Bloom filter by emptying the table and then inserting a few random values back into the table so that the first few submissions after a reset cannot be identified by their set of remainders.
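A minimal sketch of such a reset, again in Python for illustration only. The function name `reset_table` and the default of 16 decoy values are assumptions; the post says only that "a few" random values are inserted.

```python
import secrets


def reset_table(table_size: int, decoys: int = 16) -> set[int]:
    """Return a fresh table seeded with random positions.

    Assumption: decoys are drawn uniformly from the valid index range.
    Collisions may leave slightly fewer than `decoys` distinct values.
    """
    return {secrets.randbelow(table_size) for _ in range(decoys)}


table = reset_table(10_000)
```

Without the decoys, the table after the first submission of the day would contain exactly that submission's remainders, so anyone who could read the table and guess a device ID could confirm the match.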
Through the use of encryption, SSL pinning, hashing, and minimal data collection and storage, we've attempted to reduce the ways that someone could obtain an unencrypted payload and then use that payload to conclusively link a submission record to a particular person. These techniques allow us to record counts of individual protesters without jeopardizing their privacy. We welcome any feedback that helps us strengthen these safeguards.