Someone from MIT also contributed to this study in the same vein: http://www.nature.com/articles/srep01376. From the abstract: "[I]n a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier's antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals."
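To make the "four points are enough" idea concrete, here is a rough sketch of how one might measure that kind of uniqueness on toy data. This is not the paper's method or data: the `unicity` function, the trace format (sets of `(hour, antenna_id)` pairs) and the three users are all made up for illustration, and it enumerates every k-point sample instead of drawing random ones.

```python
from itertools import combinations

def unicity(traces, k):
    """Fraction of k-point samples (taken over all users) that match exactly one trace."""
    unique = total = 0
    for user, trace in traces.items():
        for sample in combinations(sorted(trace), k):
            total += 1
            # Which users' traces contain every point of this sample?
            matching = [u for u, t in traces.items() if set(sample) <= t]
            unique += (matching == [user])
    return unique / total if total else 0.0

# Toy traces: sets of (hour, antenna_id) points. Entirely hypothetical.
traces = {
    "alice": {(8, "A"), (9, "B"), (18, "C")},
    "bob":   {(8, "A"), (9, "B"), (18, "D")},
    "carol": {(8, "E"), (9, "B"), (18, "C")},
}

for k in (1, 2, 3):
    print(k, round(unicity(traces, k), 2))
```

Even in this tiny example, the share of samples that pin down a single person climbs quickly as k grows, which is the qualitative effect the abstract describes.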
Those projects are related, but definitely different. The first is about email metadata. The second is about re-identifying cellphone location data. This paper is about telephone call and text metadata, which is what the NSA collects.
> The law currently treats call content and metadata separately and makes it easier for government agencies to obtain metadata, in part because it assumes that it shouldn’t be possible to infer specific sensitive details about people based on metadata alone.
This quote from the Stanford News article is incorrect. The reason metadata is carved out is that there is Supreme Court precedent carving out metadata: https://en.wikipedia.org/wiki/Smith_v._Maryland. And that case has nothing to do with what can or cannot be inferred from metadata.[1] It distinguishes call content from call metadata because the latter is routinely recorded and used by phone companies for various purposes:
> First, we doubt that people in general entertain any actual expectation of privacy in the numbers they dial. All telephone users realize that they must "convey" phone numbers to the telephone company, since it is through telephone company switching equipment that their calls are completed. All subscribers realize, moreover, that the phone company has facilities for making permanent records of the numbers they dial, for they see a list of their long-distance (toll) calls on their monthly bills.
[1] Because that's totally irrelevant to the 4th amendment.
I know this is not how the law and precedents work in the USA, but for the sake of common sense, someone needs to call this out:
The quoted line of reasoning is absolutely disingenuous when considered in a modern and realistic setting.
What is being wilfully ignored is the change in quality of the information gleaned from the analysis of data as it is being done in bulk.
Nobody in 1979 could have foreseen the sort of information that can be extracted from the unimaginably large fire-hose of metadata we generate today.
It's a completely different thing if you had a list of who-called-who-when in 1979. How was this data kept? Well, for starters it probably wasn't centralized. Was it even digital? Probably, yes? Even if it was digital, the computers of that day could only handle trivial amounts of data. Factor in the ubiquity of phones and phone-usage today versus back then, to even consider the concept vaguely comparable is ridiculous.
It's the difference between getting records from the electricity company so that you know which parts of your house were illuminated when, versus getting "records" from all the individual CCD elements of various cameras installed in your house so that you know the same thing, which (tiny) parts of your house were illuminated when. That's the same thing right? It's just a tiny bit more fine-grained (/s).
Just because people might be okay with the former (say because you can see what lights are on from behind the curtains, on the street), doesn't mean they'd be fine with the latter.
The quality, and therefore the privacy-expectations, of the information extracted from the data changes as you blow up the number of records by some orders of magnitude. Almost nobody from 1979 could have imagined what that would mean. Almost nobody could even have fathomed the amount of computational power a desktop computer could throw at it. Hell, not even most people today can grasp that.
So there's "innocent" metadata that via some unfathomable process can be transmuted into some rather more detailed and revealing information. It's really quite hard to get a proper perspective on it (our brains aren't made for reasoning about graphs this size). I think it's fairer to call this process "magic" than anything else. And in that case it really doesn't matter where the information came from: supposing magic works, does it matter whether the government knows all the details of your life from divining tea leaves or from divining phone metadata?
Now add to this, that thanks to having sufficiently-advanced our technology, it is also possible, using similar techniques of "magic", for the phone companies to keep records for billing purposes in a really clever (magic/encrypted) way to shield the data from those kinds of divination while still being able to do their billing. The only reasonable expectation I have is for these paradigm shifts to be applied on both sides, equally.
Five years or so ago I was on a jury in a case concerning sexual misconduct, and spent a good hour or so of the deliberation scanning the submitted call records, looking for gaps in correspondence between the two relevant phone numbers. It was pretty easy to identify moments when the two parties were together (that's when they stopped texting), and I checked that they matched up with all of the testimonies. They did, and it solidified a guilty verdict with my fellow jurors (sans one, on one count, but we had 7 already).
That hinges strongly on the meaning of the word 'all' in your comment, if all is < 5 then that's not a very strong argument at all, if all is > 100 then it probably is (but it would still need analysis of a much larger volume of calls of pairs of phones to determine how often that situation would occur due to chance).
I should mention that it wasn't the only piece of evidence we looked over. We deliberated for around 6 hours, I believe. Forgive me if I wasn't clear; I didn't intend to make it seem like I was the lone seeker of truth and made a grand argument to my peers; there was a lot discussed, by everybody. Also keep in mind that when you're on a jury, you only have so much to go on, and few resources with which to analyze. That's why the phrasing is "beyond reasonable doubt."
I think if "only so much" is too little, and "few resources" are not enough, then you have no means to reach "beyond reasonable doubt", and you should say so.
I was convinced beyond reasonable doubt, having phone records that matched rather well with testimonies. I also might have fixated on it and ignored chance, but that's why there are 12 jurors.
> That's why there is no jury in all countries that I know of except America.
The UK uses juries in much the same way as the US does, as do many other countries whose legal systems derive from that of the UK (which are a fair number, because colonialism).
Other countries definitely use juries, just not as universally as the US. They more often just have a judge or a panel of judges, or sometimes a sort of jury that includes judges.
I think electing judges might also be a uniquely American thing, and less than half the country does it. It's a much weirder practice than using juries. Obviously you can still say a judge was appointed by people elected with lobbyist money, but it's not that direct.
Might be my lack of knowledge in the subject, but I've never heard of electing judges. Isn't it just a study and a background check or so, at least in western Europe?
Not only do some states have elections, but they have partisan elections, which is just obviously a dumb idea. I have no idea how it works in other countries, but in most US states judges are either elected by the legislature or appointed by the governor.
Suppose somebody's alibi is "I was driving between these two cities at the time, and then I spent the night working."
But their car-odometer didn't change, and their home-electricity usage dropped to zero that day.
Both of those observations are surely "evidence". Similarly, a digital signal still carries information, even though the zeroes are an "absence" of voltage.
Odometers increment on driven cars. The odometer did not increment. The car was driven. Unsatisfiable.
People having sex are together. People who are together don't text each other. During a certain hour a person didn't text 68 contacts. At this certain hour, this certain person had sex with 68 people. Satisfiable.
It's more like: these two numbers are in constant contact all other hours of the day. At this hour midday, when others, including one of the relevant parties, say those two numbers are together, there is no correspondence. Later, the correspondence picks up as before.
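For what it's worth, the pattern described here (steady contact, a silent stretch, then contact again) is easy to pull out of a list of timestamps. A minimal sketch, assuming the records are just message times between the two numbers; the `find_gaps` helper, the one-hour threshold and the sample times are all made up for illustration, not taken from the case:

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, min_gap):
    """Return (start, end) pairs where consecutive messages are at least min_gap apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a >= min_gap]

# Hypothetical message times between two numbers on one day.
day = datetime(2016, 1, 1)
messages = [day + timedelta(hours=h, minutes=m) for h, m in
            [(9, 0), (9, 20), (9, 45), (10, 5), (13, 10), (13, 30), (13, 50)]]

gaps = find_gaps(messages, min_gap=timedelta(hours=1))
for start, end in gaps:
    print(f"silent from {start:%H:%M} to {end:%H:%M}")
```

A gap found this way only says the two numbers were quiet. Whether that means the people were together still depends on the other evidence, as discussed in the rest of the thread.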
You seem to be aware, at least on a subconscious level, of the larger point - as you've demonstrated by including additional evidence. The larger point being that the absence of phone records can, at best, disconfirm. There are far too many alternative explanations for it to be used alone or as confirmation.
Remember, in court we're not talking about logical proof (eliminating every alternative but the desired conclusion), but statistical proof (eliminating alternative conclusions with 51% or 95% certainty). Given an assumed or established statistical model of the probability of related events (e.g. knowing that a secretary at a business makes a log entry 99.9% of the time when she sends out a mailing), a non-happening (say the absence of a log entry) can easily establish a conclusion to the desired level of certainty.
Darn, I already replied to this point moments ago. There is a difference in the application of this logic in the court room where it can be challenged, and the juror room where it cannot - right?
I see your point of high probability they might have been together. But I don't see evidence for "sexual misconduct" during the time they were together. They could have just talked.
Anyway, the whole charge is disgusting. Leave them alone dammit.
There is nothing logically wrong with that. Evidence is anything that makes a conclusion more or less likely (in the Bayesian sense). All else being equal, knowing of a gap in an otherwise constant stream of communication makes an in-person meet-up more likely than it is without that knowledge.
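As a rough illustration of what "more likely in the Bayesian sense" means, here is Bayes' rule applied to the gap. All three probabilities are invented for the example; in particular, the chance of seeing a gap when the two people did not meet is exactly the "how often would this happen by chance" figure raised earlier, and the answer is only as good as that estimate.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule: P(H | E) from P(H), P(E | H) and P(E | not H)."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

# Hypothetical numbers, chosen only for illustration.
prior = 0.5          # P(they met) before looking at the records
p_gap_if_met = 0.9   # P(gap in texting | they met)
p_gap_if_not = 0.3   # P(gap in texting | they did not meet), i.e. chance gaps

updated = posterior(prior, p_gap_if_met, p_gap_if_not)
print(round(updated, 2))
```

If gaps were just as common when the two were apart (`p_gap_if_not` equal to `p_gap_if_met`), the posterior would equal the prior and the records would tell you nothing.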
Not every piece of evidence (in the Bayesian sense) is admissible in every court though, polygraphs being an obvious example. That is why courts get all pissy when jurors go off the reservation and start doing their own research. I'd be a little surprised if the judge in this case enthusiastically endorsed this sort of analysis going on in jury deliberation, but given your background - I'm interested to hear if I'm mistaken.
Evidence can be excluded if it's not relevant (in the Bayesian sense), or unreliable. You might have some argument that the phone records don't make the conclusion sufficiently more likely to warrant admitting it. But the bar for relevance is not high.
Moreover, this is kind of what juries do best: inferring what people did from other things they did or didn't do. While juries are subject to a lot of biases, I think humans have a good statistical model of the behaviors of other humans.
I guess I've been mistaken in my understanding of the whole process. I was under the impression that all the controls are in place specifically to restrict the juror's natural inclination to infer, because humans really suck at logical inference (and not just because of bias). I'd always assumed that an innovation in natural language processing, in combination with formal logic, would be welcome in the court room, but it doesn't sound like that would be the case if empiricism is preferred to rationality.
This is just not true -- there are tons of highly informative, relevant kinds of Bayesian evidence that are excluded. For example, remaining silent, illegally-obtained evidence, hearsay, opinion of experts (in many cases).
A case built on circumstantial evidence is the very definition of reasonable doubt and should be tossed out! I don't understand why so many people think the purpose of court is to convict people! Your question could literally be reduced to "But how are we supposed to find them guilty if we don't have evidence?"
Ultimately, what is "reasonable doubt" is up to the jury.
I can't say I agree with the use of circumstantial evidence, but that's how the court works. Additionally, many people do get off based on lack of clear evidence?
It's off-topic, but I'm deathly curious - what kind of sexual misconduct is so serious that two people consensually meeting warrants a jury trial, complete with subpoenaed phone logs?
Odd that the phone records were needed. If I thought the victim was telling the truth about being abused then the records wouldn't matter, and if I had reason to doubt them then I don't see matching phone records satisfying the doubt.
> what kind of sexual misconduct is so serious that two people consensually meeting warrants a jury trial, complete with subpoenaed phone logs?
Their meeting up may have been consensual, but what happened while they were together might not have been. Also, the parent doesn't say both parties wanted to meet up. So, the meeting-up could have not been consensual. It could have been a stalking situation or maybe they just happened to run into each other by chance.
I am reminded of an interesting observation by the mathematician Terence Tao about how our anonymity on the Internet and in a connected world is so fragile [1]. Basically, because there are only 3 billion internet users, every person can be uniquely identified by a 31 bit number. The uncovering of each bit gets one closer to the identity of the person. Seen in this light, one would expect metadata to uncover quite a few of the 31 bits!
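The arithmetic behind the "bits of identity" framing can be checked in a few lines. This is only a sketch: the function names are made up, and the attribute fractions are hypothetical. One caveat on the figures quoted above: by this calculation 31 bits distinguishes about 2.1 billion people, so 3 billion users would need 32, though that doesn't change the point.

```python
import math

def bits_to_identify(population):
    """Smallest whole number of bits that can give everyone a distinct label."""
    return math.ceil(math.log2(population))

def bits_revealed(fraction_sharing_attribute):
    """Bits learned from an attribute shared by the given fraction of people."""
    return -math.log2(fraction_sharing_attribute)

print(bits_to_identify(2**31))          # 31
print(bits_to_identify(3_000_000_000))  # 32

# Hypothetical attributes: one shared by half of all users,
# one shared by 1 user in 1024 (say, a small town).
print(bits_revealed(0.5))       # 1.0
print(bits_revealed(1 / 1024))  # 10.0
```

Independent attributes add up, so a handful of facts like home city, employer and a few frequent contacts can plausibly account for most of the bits needed.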
Thanks for this. I would have been happier if the authors had phrased their findings in terms of differential privacy rather than the effectiveness of the algorithms they were able to achieve.
When setting policy, it is better to have theoretical mathematical results rather than empirical effectiveness, since you can bet the technological frontier of privacy violation is a moving one. As with cryptography, you want solid foundations in unbreakable maths -- not 'we can't break this cipher with what we know today'. Probably, someone can.
The real problem with issues like this is that, while I have chosen to remove myself from certain social media (I have a Facebook account, but it's poisoned data), friends and family aren't as paranoid or knowledgeable about privacy implications as I am, so I have to remind people not to tag me, not to upload photos of me, etc.
Friends and family are reporting on their friends and family without even understanding the implications of what they are doing.
It may seem benign at the moment, but given the nature of the turn-key totalitarian state, it's when that key gets turned and the cat starts getting walked back that this sort of information leakage from unexpected sources really becomes an issue.
It does, hence why I suggest people manage their own but spike the data... and be sure to use tor or similar to connect. (Which I only check every six months or so, just to make sure facebook doesn't try to pull any more "oh hey, we made your entire profile public" stunts again.)
> in part because it assumes that it shouldn’t be possible to infer specific sensitive details about people based on metadata alone.
> One of the government’s justifications for allowing law enforcement and national security agencies to access metadata without warrants is the underlying belief that it’s not sensitive information.
Is this true? I didn't think this was an actual part of the argument for using metadata, but that metadata wasn't covered under current laws, and was therefore easier to get.
I was working under the assumption that it was an unintentional oversight, not an intentional hole in legislation.
In England the government say (when they want to expand laws to make it easier for them to get metadata) that it doesn't contain any content, it's just about who you call or who calls you and for how long. They never say that individuals cannot be identified from this - information about identified individuals is the point of gathering the data.
So they seem clear about the difference between content and meta, and that metadata will identify people.
They're less clear about the further de-anonymisation aspects of "just" metadata, and it's hard to know if that's because they don't know or don't care.
This paper will be read by two of the authors in the Stanford EE Computer Systems Colloquium, EE380, http://ee380.stanford.edu. EE380 is a public lecture--anyone is welcome to attend or watch the live stream video. The talk video will be posted to YouTube the day following the presentation. For details, see the announcement at http://ee380.stanford.edu/Abstracts/160518.html.
There was an interesting and informative blog post along similar lines (that I think was linked here not that long ago), on, "Using Metadata to find Paul Revere".
You can draw social graphs from who calls who, just like Facebook can? No shit. (And Facebook is scary right?)
You can find a person's city of residence in 57% of the cases? That's pretty bad, I'd almost feel oddly relieved my data is saying so little, but I'm afraid the NSA would do a better job.
You can predict who is pregnant? And who owns a rifle? Alright now we are getting somewhere. The article didn't mention how many cases succeed here so I'm not sure if I should be impressed, since the rifle hotline or a licensing agency thing (I don't know how that works) would be pretty obvious.
I feel the media has taken a useful word in the technology world and made it almost useless for general usage. I'm scared to even mention "metadata" to people even in relevant technical context as the word has become politicised and loaded, just like I can't call myself a "hacker" any more.
> In combination with independent reviews that have found bulk metadata surveillance to be an ineffective intelligence strategy, our findings should give policymakers pause when authorizing such programs.
If metadata has such power, why do they say that it is an "ineffective intelligence strategy"?
I think they mean that it's not as useful in discovering and preventing unknown threats, but it's great for tracking a particular person (so it's a better tool for surveillance of citizens).