Thursday, October 17, 2013

Thoughts on using metadata to detect conspiracies

The word for today is combinatorics.

When you are trying to track down spies and conspirators, it seems like an excellent idea to find out who they contact and look for the clusters of connections that suggest that "here be the leaders." If you can find strong connections between known enemies and Joe X, he becomes a person of interest.

The problem with this is that anybody with any tradecraft is going to use cutouts and dead drops, and generally make things as muddy as possible. The old hollow tree is pretty low tech, but it works. I suppose (not being in the business myself) that bulletin boards and comment sections would make very good sites for leaving coded messages.

But, says you, that's just what the metadata searching should help with. If NSA knows Muhammad Smith's favorite web sites, they can look at who else frequents those sites and also has a history of visiting jihadist web sites.

How many is that? Suppose Muhammad Bombmaster posts sporadically on 20 different web sites, including the Minneapolis Bayou. Jason Wannabe used to visit jihadist sites, but after a few long talks with his new partners he uses different computers for different jobs. He visits 10 different web sites including the Bayou. Jackson Nobody visits 15, including the Bayou, as do about 5000 other people. Of those 5000, most live in Minneapolis and maybe 5% have wondered what the heck an al Qaeda web site looked like.

So Jason Wannabe is 1 out of about 250 (5% of 5000). Presumably the non-Minneapolis sites have lower percentages of visitors who've seen a jihadist web site, so maybe the rates are 1 out of 10 instead. So you are watching about 400 different people. Not bad so far. But if each of them is in contact (Twitter, Facebook, etc.) with 50 other people, you now have of the order of 20,000 people in the second level, and a million in the third level. OK, say a quarter million--there'll be overlap. In fact, overlaps are just the things you'll be looking for to locate conspirators. Most overlaps will be easy to understand, given time; accidental ones are harder to understand because there isn't a reason. (Put 30 random people in a room. What are the odds that at least 2 share a birthday? Over 70%.)
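For anyone who wants to check the arithmetic, here is a rough sketch in Python. The function names are mine, the fan-out ignores overlap entirely (so it is an upper bound, not an estimate), and the birthday calculation assumes 365 equally likely birthdays.

```python
from math import prod


def fan_out(seed, contacts_per_person, levels):
    """Upper bound on head count at each level, assuming no overlap at all."""
    counts = [seed]
    for _ in range(levels):
        counts.append(counts[-1] * contacts_per_person)
    return counts


def shared_birthday_probability(people, days=365):
    """Chance that at least two of `people` share a birthday.

    Assumes birthdays are uniform over `days` and independent.
    """
    if people > days:
        return 1.0
    p_all_distinct = prod((days - i) / days for i in range(people))
    return 1.0 - p_all_distinct


# 400 people watched, 50 contacts each, two more levels out
print(fan_out(400, 50, 2))                        # [400, 20000, 1000000]
print(round(shared_birthday_probability(30), 3))  # 0.706
```

The million at the third level is the no-overlap ceiling; the quarter million is my guess at what is left after overlap, not something the code can tell you.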

If your criteria for suspicion are loose, you have a lot of false positives on your hands. If they're tight, a little tradecraft will flummox you. Trails peter out in a sea of noise.

I figured that if they're tracing phone calls, there are only 6, maybe 5, links between me and the Boston bomber. I dunno whether he shared any plans with his imam: all the links may be innocent; I know they are from this end...

The initial claim was that this kind of tracking prevented several dozen terrorist attacks. Then the number was rounded down to 1 or 2. I don't expect to hear honest answers--it might give away useful secrets, or it might reveal that they can't do what they say. I'd not be surprised if the number was 0.

On the other hand it is very much easier to do ex post facto analyses and try to find co-conspirators. If A, B, and C are involved you can look at the sites they had in common and the links they shared and sort through only a few hundred people looking for suspicious activity. That's orders of magnitude easier. But it is also a little late.
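The after-the-fact search is basically set intersection. A toy sketch, with everything in it made up for illustration: the people, the site names, the `min_shared` cutoff, and the function names are all hypothetical, and I have no idea what the real tools look like.

```python
def sites_in_common(sites_by_person, known):
    """Sites that every known conspirator visited."""
    return set.intersection(*(set(sites_by_person[p]) for p in known))


def candidates(sites_by_person, known, min_shared=2):
    """Other people who visited at least `min_shared` of those common sites."""
    shared = sites_in_common(sites_by_person, known)
    return sorted(
        person
        for person, sites in sites_by_person.items()
        if person not in known and len(shared & set(sites)) >= min_shared
    )


# Hypothetical browsing records, invented for this example
sites_by_person = {
    "A": {"bayou", "site1", "site2"},
    "B": {"bayou", "site1", "site3"},
    "C": {"bayou", "site1", "site4"},
    "D": {"bayou", "site1"},
    "E": {"bayou"},
    "F": {"site5"},
}

print(sorted(sites_in_common(sites_by_person, ["A", "B", "C"])))  # ['bayou', 'site1']
print(candidates(sites_by_person, ["A", "B", "C"]))               # ['D']
```

Starting from known names shrinks the haystack before you start looking, which is why this direction is so much easier.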

It remains to be proven that this vast a tool can be used for the announced purpose.

We were told that some of the engineers used the system to "spy" on their spouses/live-ins. That seems like an obvious way to test the system; try to see if it works with someone you know. If you find that it misses calls you know about, you know the system isn't working. And if there are phone calls you didn't know about--you drill down a bit and find out if they're real, and if they are then there is domestic discord.

Of course there's more than just metadata--Snowden wasn't telling us anything particularly new. Nobody can handle all the raw data for everybody, but they can drill down for an individual suspect, and presumably get a lot of information--online purchases, emails, phone calls, and so on. From this raw material you can assemble the detailed links for a legal case, or for further monitoring.

This is also a useful tool if the individual is suspected of conspiring to be a Congressman of the wrong political party. Politics demands some deal making that constituents might not be happy to read a transcript of, and power attracts groupies--after a few years I'd guess that quite a few people would have some dirty laundry. Maybe not even anything illegal (like forgetting to pay sales tax on online purchases?), but enough to worry them into going along. It wouldn't take large numbers.

We just saw the Republicans go down to dramatic defeat and get nothing for their pains--a few got pork for the home district but there's no budget and no debt limit any more (unless there's a supermajority to vote against raising it), which used to mean that the king could do whatever he pleased. Probably still does.

If I'm right about the spy system we put together, we'll see more of this sort of curious business in years to come, because the system is much easier to use against individuals than in tracking spies. There will always be a temptation to use it against political foes "for the greater good" and we don't typically elect leaders on the basis of their integrity.
