Friday, July 4, 2008
Privacy and Data Mining, and false positives
James Wimberly has an interesting post up at RBC about the possibility of large-scale data mining as an intelligence-gathering tool, and its costs and benefits. I'm less sanguine than he is about the prospects of any such large-scale operation, but he's absolutely correct to note that we need to be having this discussion in the open, rather than relying on the "lawless-unitary-executive-knows-best" model that we've been running on so far.
The problem with any such operation is dealing with the false positives, the things the algorithm says are suspicious that turn out to be nothing. Let's say his "third-degree" assumption is roughly accurate and we're targeting about one million people nationwide for data surveillance. Further assume that there are as many as 1,000 truly dangerous terrorist organizers in the US--determined, competent, well-financed. I'm not talking about janitors with fantasies of blowing up airports, I'm talking about people who have figured out how an airport could be sabotaged, and have access to the means to carry it out, and are motivated to do so.
First of all, we note that 1,000 / 1,000,000 = 0.1% of our targets are actually dangerous. The other 999,000 are not dangerous--not motivated, incompetent, don't have the means, whatever.
Now suppose we have a screening method that can detect 95% of the bad guys and screen out 99% of the non-bad-guys. This is, of course, MUCH better than any actual method can do. But run the numbers:
We find 950 out of 1,000 terrorists (true positives), leaving 50 dangerous people at large (false negatives--people we think are harmless, who really aren't).
We also round up 999,000 * 0.01 = 9,990 people who aren't dangerous but weren't screened out--false positives.
Meaning we round up a total of 9,990 + 950 = 10,940 people, of whom 9,990 (about 91.3%) aren't dangerous.
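For the spreadsheet-averse, here's the same arithmetic as a few lines of Python. Everything in it comes straight from the assumptions above (a million targets, 1,000 bad guys, a 95%/99% screen); change the inputs and watch the ratio move:

```python
# The scenario above, as code. All inputs are the post's assumptions.
targets = 1_000_000        # people flagged for data surveillance
dangerous = 1_000          # truly dangerous organizers among them
sensitivity = 0.95         # fraction of bad guys the screen catches
specificity = 0.99         # fraction of harmless people it screens out

true_positives = dangerous * sensitivity                     # 950 caught
false_negatives = dangerous - true_positives                 # 50 still at large
false_positives = (targets - dangerous) * (1 - specificity)  # 9,990 wrongly flagged

flagged = true_positives + false_positives
print(f"people rounded up:  {flagged:,.0f}")                  # 10,940
print(f"harmless fraction:  {false_positives / flagged:.1%}") # 91.3%
print(f"bad guys missed:    {false_negatives:.0f}")           # 50
```

Push the sensitivity to 99% and the harmless fraction barely budges (still about 91%): the result is driven by the base rate, not by how good the screen is.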
This isn't as much of a needle-in-a-haystack problem as we started with, I'll grant. But what happens when we tell investigators to go through a list of people with suspicious data traffic, but to remember that most of them are probably completely innocent?
Well, we know that about 5% of Americans cheat on their taxes. But IRS auditors, who spend all day dealing with tax fraud, estimate that 30% of Americans cheat. When you deal with something out of the ordinary all day long, you forget how out of the ordinary it is--deal with tax cheats all day, and you'll overestimate the prevalence of tax cheats.
Good luck getting your investigators going through the list of "data-based suspects" to remember that 90% are probably innocent or harmless, or both.
We can't just sit back and do nothing. But we should also avoid falling into the trap laid out in Yes, Minister:
We must do something.
This is something.
Therefore, we must do this.
Monday, August 13, 2007
Sims R Us
An interesting update on the idea familiar to every reasonably bright 11-year-old: What if our world and everything in it, including me, is just part of someone's dream? How would we know? Would we know? Would it make a difference?
We couldn't tell we were part of a simulation unless very subtle errors were introduced and clues left to point us toward them. But what sort of inconsistency or error in our observed world would lead to the inescapable conclusion "the universe is actually a simulation"? Some oddness in the binary expansion of pi, as hypothesized in Contact? No, not really... if pi's digits are essentially random (as they appear to be, though no one has actually proven that pi is normal), then follow them out far enough and sooner or later you're going to get some whopping big coincidences--in fact, never hitting a whopping big coincidence would itself be extremely unlikely. Including something that could be interpreted as ASCII if you looked at it right.

From the article:

Dr. Bostrom assumes that technological advances could produce a computer with more processing power than all the brains in the world, and that advanced humans, or “posthumans,” could run “ancestor simulations” of their evolutionary history by creating virtual worlds inhabited by virtual people with fully developed virtual nervous systems.
Some computer experts have projected, based on trends in processing power, that we will have such a computer by the middle of this century, but it doesn’t matter for Dr. Bostrom’s argument whether it takes 50 years or 5 million years. If civilization survived long enough to reach that stage, and if the posthumans were to run lots of simulations for research purposes or entertainment, then the number of virtual ancestors they created would be vastly greater than the number of real ancestors.
There would be no way for any of these ancestors to know for sure whether they were virtual or real, because the sights and feelings they’d experience would be indistinguishable. But since there would be so many more virtual ancestors, any individual could figure that the odds made it nearly certain that he or she was living in a virtual world.
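That "nearly certain" is just a ratio of headcounts. Here's the shape of the arithmetic; the figures are invented for illustration, not taken from Bostrom's paper:

```python
# Bostrom's counting argument, reduced to one ratio.
# Both inputs are made up purely to show how the argument works.
real_ancestors = 100e9      # rough order of humans who have ever lived
ancestor_sims = 1_000       # suppose posthumans run this many full simulations

simulated_minds = ancestor_sims * real_ancestors
p_simulated = simulated_minds / (simulated_minds + real_ancestors)
print(f"P(a randomly chosen mind is simulated) = {p_simulated:.4f}")  # 0.9990
```

However many simulations you plug in, as long as it's a lot, the probability crowds up against 1.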
And that's the issue. Strictly speaking, it's not a scientific hypothesis, since there's no way to disprove it. It's an interesting bit of logic, but ultimately one possibility among many:
But there are a couple of alternative hypotheses.... One is that civilization never attains the technology to run simulations (perhaps because it self-destructs before reaching that stage). The other hypothesis is that posthumans decide not to run the simulations.

To draw a comparison: Suppose I spend the evening playing Sims. To whatever extent they're "real," would knowing they're in a simulation make their immediate wants any less "real" to them? If they knew they were just data files and onscreen renderings, would they have a crisis of faith? Suicidal despair? Not likely.
If I turn out to be a sim in someone else's game...what difference does it make when I get up in the morning? What do I do differently, knowing I'm a sim? It may only be a simulated job and a simulated apartment, but it beats sleeping on the simulated streets.
Update: Someone's already been giving some thought to how to live as a simulation.
Tags: AI, games, philosophy, science
Wednesday, August 1, 2007
Cooperation as an evolutionary strategy
Interesting article over at NYT about how cooperation can lead to more adaptive behavior across a population, even (if I'm not reading too much into the article) when it sometimes leads to individuals being taken advantage of. Add in reputation effects and the effect becomes more pronounced: you get small, tightly-knit 'communities' of cooperators, even if there's a lot of noncooperation going on around them.
It's an interesting idea, and it certainly makes sense. (It also explains, to a degree, how cooperation breaks down during periods of strife. The risk of being taken advantage of is greater, and the payoff doesn't increase correspondingly. Thus you arrive at Hobbes' state of nature.)
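Out of curiosity, here's a toy version of the idea in Python--emphatically my own sketch, not the researchers' model. "Discriminators" cooperate only with partners whose last move was cooperation (a bare-bones reputation signal), while defectors always defect:

```python
import random

# Toy iterated prisoner's dilemma with a crude reputation mechanism.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def run(strategies, rounds=5000):
    last_move = {i: "C" for i in strategies}   # everyone starts out trusted
    score = {i: 0 for i in strategies}
    agents = list(strategies)
    for _ in range(rounds):
        a, b = random.sample(agents, 2)
        # Discriminators mirror the partner's reputation; defectors defect.
        move_a = "D" if strategies[a] == "defector" else last_move[b]
        move_b = "D" if strategies[b] == "defector" else last_move[a]
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score[a] += pay_a
        score[b] += pay_b
        last_move[a], last_move[b] = move_a, move_b
    return score

strategies = {i: ("discriminator" if i < 15 else "defector") for i in range(20)}
score = run(strategies)
for kind in ("discriminator", "defector"):
    members = [i for i in strategies if strategies[i] == kind]
    print(kind, round(sum(score[i] for i in members) / len(members)))
```

Even this crude reputation signal is enough: defectors get found out after one encounter and spend the rest of the run in mutual defection, while the discriminators mostly cooperate with each other and come out ahead. Make the discriminators unconditional cooperators instead, and the defectors win.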
Fascinating stuff.
Saturday, July 21, 2007
Checkers solved
This article illustrates, I suspect, what happens when a journalist either doesn't fully understand the subject (and thus isn't sure what questions to ask, especially followup questions) or isn't familiar enough with it to anticipate readers' questions.
The Chinook program can play a perfect game of checkers once the number of pieces on the board drops to a certain point (10 or fewer checkers). The article says "It can't lose" after that, based on an exhaustive database: it has literally looked at every position with 10 or fewer checkers still on the board. But is every such position a draw? Obviously not. I think--and I'm speculating, since I haven't read the full paper--that what they're getting at is this.
There must be some combinations of pieces in which one side can force a win; with, say, 9 white checkers against 1 black, I doubt Chinook could force a draw playing black. (And in checkers a player with no legal moves loses rather than draws, so black couldn't even hope for a stalemate; a draw would have to come from something like chess's threefold-repetition rule, if checkers has one.)
But--if earlier play were strong enough to ensure that such grossly unbalanced positions don't come up in play--the discussion is academic.
And given a large enough database, good enough heuristic rules and position evaluations, and enough time to 'back up' the results farther up the game tree (this position is a loss for black, so a position that turns into this one is also a loss for black), it may be possible--duh, apparently it is--to keep material balanced enough, and positioning strong enough, that forced-loss positions can be avoided by the time the exhaustive database kicks in.
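Here's a minimal sketch of that backing-up rule, on an invented five-position game rather than actual checkers. (Real retrograde analysis also works backwards from the endgame database rather than recursing forward, but the propagation logic is the same.)

```python
from functools import lru_cache

# Hypothetical game graph: position -> positions reachable in one move.
MOVES = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],    # side to move has no legal moves
    "E": [],
}

@lru_cache(maxsize=None)
def value(pos):
    """Value for the side to move: +1 win, -1 loss, 0 draw."""
    if not MOVES[pos]:
        return -1   # as in checkers: if you can't move, you lose
    # The best you can do is the worst you can hand your opponent; any
    # successor that's a loss for the opponent makes this a win for you.
    return max(-value(nxt) for nxt in MOVES[pos])

for pos in MOVES:
    print(pos, {1: "win", -1: "loss", 0: "draw"}[value(pos)])
```

(Draws never arise in this little graph; in real checkers they come from repetition and blocked play, which a toy acyclic graph can't represent.)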
Solving chess is going to take longer, of course. And at one time the ability to play chess was generally regarded as prima facie evidence of artificial intelligence--until, of course, the first chess-playing programs were demonstrated, and it was obvious that what they were doing was nothing like what people do when playing chess. But why should they? They're not people; they have different strengths and abilities. They're very good at number crunching--thus, computer chess involves lots of number crunching. They're good at database searching--thus, computer checkers (and chess) involves lots of database searching.
Okay, this is what happens when I blog first thing in the morning... I'm rambling.