Sunday, October 21, 2007

Math Vs. Gut in World Series Prediction (Yes, It's Boston...)

If you look at the prediction, which uses full-season numbers (runs scored, runs against, and Log5), the Red Sox have an overwhelming advantage. I know what many people are saying: how can the Rockies have such a low chance? Well, a season is more than the last 22 games of the regular season and playoffs. When someone is making a business decision, do they choose the business that has performed the best for the longest, or the one that has performed the best only for the last month? Not too many decision-makers take the temporarily hot pick; they take the reliable pick.

And that's what the numbers say: Boston is the reliable pick, and quite handily. Colorado doesn't have much of a chance, although you might like their make-up and the fact that they've managed an amazing feat to reach the playoffs and then sweep so far in the playoffs. But Math says that if the Red Sox and Rockies played 100 series, the Red Sox would win almost 70 of them. The Red Sox should win their second World Series in 4 years.
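One way to picture "playing 100 series" is to simulate them. What follows is only a sketch under assumptions: the function names, the seed, and the flat 0.6 per-game probability are invented for illustration and are not the per-game numbers behind the prediction.

```python
import random


def simulate_series(game_probs, wins_needed, rng):
    """Play one series game by game; return True if our team gets there first."""
    wins = losses = 0
    for p in game_probs:
        if rng.random() < p:
            wins += 1
        else:
            losses += 1
        if wins == wins_needed:
            return True
        if losses == wins_needed:
            return False
    raise ValueError("game_probs is too short to decide the series")


def estimate_series_win_rate(game_probs, wins_needed, trials, seed=2007):
    """Fraction of simulated series won (seeded so the run is repeatable)."""
    rng = random.Random(seed)
    won = sum(simulate_series(game_probs, wins_needed, rng) for _ in range(trials))
    return won / trials


# Hypothetical: a team with a flat 60% chance in each game of a best-of-seven.
flat_probs = [0.6] * 7
print(estimate_series_win_rate(flat_probs, wins_needed=4, trials=100_000))
```

With only 100 simulated series the answer bounces around quite a bit from run to run, which is the point of the gut-versus-math argument: the 70% describes the long run, not any single October.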

Unless you believe in Colbert logic: math is only as accurate as the gut that produces it...

(Cross-posted in Big Shoulders Sports)

Monday, October 8, 2007

A Rocky Mountain High and a Boston Tea Party

I guess $200,000,000 plus payrolls just don't go as far as they used to anymore.

The probable winners in the LCS's will be the Rockies and the Red Sox. Of course, that's not to say that they will win. As I often find myself explaining to people, the statistics show the most probable world, not the way the world will be. That's also why I don't take this information and bet on it. It's entirely possible that the Cleveland Indians can stun Boston (37% chance is pretty high). But it is unlikely. It should be a good series, though, that's for sure.

And in the NL, in the ratings bonanza that is two teams formed in the Nineties meeting in the baseball mecca of this country, the D-backs certainly could overcome the incredible home-field advantage that is Coors Field, but it's unlikely.

On a side note, as to method: I'm still on board with log5, and the same with pythagorean, but I'm thinking the better way to slice and dice the runs is perhaps not just home and away runs/runs against, but to look at them only against teams that are better than .500. In a short series, which is a crapshoot anyway, the real way to examine teams is to look at how well they performed against their real competition, rather than against the dregs of their own division. If I had the stats to do that, I would use that method instead. For instance, the Cubs totally fattened their pythagorean record on their bad division, whereas the Arizona record didn't get properly weighted for playing against basically a whole division of playoff contenders (minus San Francisco--and even they weren't pushovers!). We'll see what we can do next year...
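If the game-by-game stats were available, the "only against winning teams" idea might look something like the sketch below. Everything in it is an assumption for illustration: the record layout, the function names, and the four-game log are made up and are not real 2007 results.

```python
def pythagorean(runs_scored, runs_allowed, exponent=2.0):
    """Bill James' Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythagorean_vs_winning_teams(games, threshold=0.500):
    """Pythagorean record using only games against opponents above the threshold."""
    kept = [g for g in games if g["opp_win_pct"] > threshold]
    if not kept:
        raise ValueError("no games against opponents above the threshold")
    runs_for = sum(g["runs_for"] for g in kept)
    runs_against = sum(g["runs_against"] for g in kept)
    return pythagorean(runs_for, runs_against)


# Made-up game log: a blowout over a weak opponent inflates the overall record.
games = [
    {"opp_win_pct": 0.550, "runs_for": 5, "runs_against": 3},
    {"opp_win_pct": 0.450, "runs_for": 10, "runs_against": 1},
    {"opp_win_pct": 0.600, "runs_for": 2, "runs_against": 4},
    {"opp_win_pct": 0.500, "runs_for": 7, "runs_against": 7},
]

overall = pythagorean(
    sum(g["runs_for"] for g in games),
    sum(g["runs_against"] for g in games),
)
print(overall, pythagorean_vs_winning_teams(games))
```

The filtered figure would then take the place of the home/away percentages fed into log5.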

(Editor's Note: I changed the image for two reasons from the original post: 1. there was an error in the formula for the home series in both LCSs; 2. I added the current, as of Oct. 13, changes in the odds based on the games played.)

Sunday, October 7, 2007

So Sad About Us...

As The Who sang so many years ago, it fits Cubs' fans today. Even when the odds seem to favor them, the "odd" seems to take hold, and off goes another season ending to the sad refrain of "wait 'til next year..."

The Phillies Phans are sad as well, and most likely so will be those of LA and NY. It's possible that all four series might end in sweeps today--now that's a weird one (chance: 3%).

I will post LCS odds after all contenders are in.

Friday, October 5, 2007

Bronx Bombers Become Bug Bombers, Boston Brilliant

Sorry--I love alliteration.

In any case, the Yankees now have less of a chance to beat Cleveland than the Cubs do of beating the Snakes, even though they both play the next two (if there are two) at home.

A note while I watch the Red Sox play the Angels: Steve Stone is the best analyst in the game. I remember back in Chicago, listening to him carry Harry Caray during all those years--when Caray was cute, but inarticulate. I've played baseball, as a semi-professional, and thought I was pretty knowledgeable about the game. But in almost every single game, Stone would point something out that I either didn't know, had forgotten, or didn't think of--it was educational and entertaining because he was a very intelligent and articulate announcer. It made me completely forget about the great Brickhouse/Boudreau years as I was growing up (and, boy, were those guys good!). Listening to him announce these games was as good as it gets--it truly puts those Faux (Fox?) announcers to shame; it sometimes seems that the more words that issue from their mouths, the more it shows how little they know about the game (thanks TM for that one).

In any case, the question to be asked is what is more accurate now? The odds from before the two losses for these teams, or the odds as they stand after them? Here are the chances for these teams to sweep the next 3 games:
Cubs, 18.4%
Yankees, 17.6%
Phillies, 10.2%
Angels, 8.6%
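For anyone wondering where numbers like these come from: the chance of winning out is just the product of the chances of winning each remaining game. The sketch below is hedged accordingly. The function names are mine for illustration, and the flat per-game figure it backs out is an assumption, since the real games mix home and away probabilities.

```python
from math import prod


def win_out_probability(remaining_game_probs):
    """Chance of winning every remaining game, assuming the games are independent."""
    return prod(remaining_game_probs)


def implied_flat_game_probability(win_out_chance, games_left):
    """The single per-game probability that would produce a given win-out chance."""
    return win_out_chance ** (1.0 / games_left)


# Backing a flat per-game chance out of the Cubs' 18.4% (illustrative only).
cubs_flat = implied_flat_game_probability(0.184, 3)
print(round(cubs_flat, 3))

# Hypothetical mix: two home games at 60% each, then a road game at 50%.
print(win_out_probability([0.6, 0.6, 0.5]))
```

Read that way, an 18.4% chance to sweep three games corresponds to roughly a 57% chance per game if every game were equally likely to be won.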

Now you might think: "them's not good odds"--and you'd be right. But of course, that's the same chance that the Rockies and Phillies had of winning their playoff spots a week ago...

You might also be asking: what are the odds that all four series would be 2-0? Of course I have the number for you: 2.9%, or less than 3 out of 100 (I aim to please).

Here are the updated odds after tonight's games.

Thursday, October 4, 2007

Cubs Snakebit, Phillies Phoiled and Yankees? Yikes!

Looks like the Cubs are going to have to put it all together in the last 3 games of this series; they don't look at all like the team that overcame adversity and battled all year. Soriano, in particular, looks like the second coming of a young Shawon Dunston, haplessly flailing away at curves out of the strike zone. The Phillies now look like a total fluke that had no business making the post-season.

The Yankees have only one problem, pitching. Their hitting has been so good that it masked their pitching during the season, but it looks like it might be exposed in the playoffs. I still stand by my stats, but all of these teams have seen their probability of reaching the LCS drop to 10-20%. But do keep in mind that probabilities are subtractive as well as additive, so a win by any of these underdogs (now) will reset the probabilities again.

Wednesday, October 3, 2007

The Odds Have Changed, Even for the Cubs

Two of the favorites tonight have held serve, but the Cubs did not. Even though the odds overall were in the Cubs' favor, most knowledgeable fans knew that beating Brandon Webb in his home park was a tall order. Although the Cubs' odds of winning have gone down with this loss, I still think that they'll probably win the next two of three games, as originally predicted.

Boston and Colorado now have commanding odds of winning their series, while we wait for the Yankees to do their bit tomorrow.

In the image, a win in the first game of each series received a 1, and a loss a 0 (in place of the chances of winning, since those results are now definite).
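Treating a played game as a 1 or a 0 is the same as recomputing the series odds from the current score. Here is a minimal sketch of that update, assuming independent games; the function name and the flat 60% per-game figure are hypothetical and not the numbers in the table.

```python
def series_win_from_state(wins, losses, remaining_probs, wins_needed):
    """Chance of taking the series given the score so far and per-game chances left."""
    if wins == wins_needed:
        return 1.0
    if losses == wins_needed:
        return 0.0
    if not remaining_probs:
        raise ValueError("not enough games left to decide the series")
    p, rest = remaining_probs[0], remaining_probs[1:]
    return (
        p * series_win_from_state(wins + 1, losses, rest, wins_needed)
        + (1.0 - p) * series_win_from_state(wins, losses + 1, rest, wins_needed)
    )


# Hypothetical best-of-five with a flat 60% chance in every game.
before = series_win_from_state(0, 0, [0.6] * 5, wins_needed=3)
after_loss = series_win_from_state(0, 1, [0.6] * 4, wins_needed=3)
after_win = series_win_from_state(1, 0, [0.6] * 4, wins_needed=3)
print(before, after_loss, after_win)
```

The pre-series number is always the weighted average of the two post-game numbers, which is why a single result can swing the odds so sharply in a short series.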

Monday, October 1, 2007

Maybe for the Next Four Weeks I Call This Illumi-Baseballrati...

This will be a cross-post with Big Shoulders Sports.

Two years ago on Big Shoulders Sports, I blogged about the postseason chances of the White Sox. Re-reading it prior to this post, I was surprised that I was so accurate (I predicted a Sox victory)--perhaps I need to call a bookie!

In any case, my method is to use a combination of Bill James' Pythagorean Theorem and his Log5 Theorem for one-game win probability. My little wrinkle is using home/away numbers instead of totals, and then totaling up the probabilities of all the scenarios that add up to 3 wins (win-win-win, win-loss-win-win, win-loss-loss-win-win, etc.). Because the scenarios are mutually exclusive, their probabilities are additive: the chances of each scenario can just be added up (and across both teams the total equals 100%).
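For readers who want to try this at home, here is a minimal sketch of the calculation in Python. It rests on assumptions: the exponent of 2 is the classic Pythagorean form, the run totals are invented placeholders, and the home-home-away-away-home order is my reading of the division series format. It is not necessarily the exact set of formulas behind the image.

```python
from itertools import product


def pythagorean(runs_scored, runs_allowed, exponent=2.0):
    """Bill James' Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def log5(a, b):
    """Log5: chance that a team with win pct a beats a team with win pct b."""
    return (a - a * b) / (a + b - 2 * a * b)


def series_win_probability(game_probs, wins_needed):
    """Add up every win/loss sequence that reaches wins_needed victories.

    Enumerating all full-length sequences gives the same total as stopping
    each scenario at the clinching game (win-win-win, win-loss-win-win, ...).
    """
    total = 0.0
    for outcome in product((True, False), repeat=len(game_probs)):
        if sum(outcome) >= wins_needed:
            p = 1.0
            for won, game_prob in zip(outcome, game_probs):
                p *= game_prob if won else 1.0 - game_prob
            total += p
    return total


# Placeholder home/road run totals for two hypothetical teams, A and B.
a_home = pythagorean(420, 350)
a_road = pythagorean(380, 370)
b_home = pythagorean(400, 380)
b_road = pythagorean(360, 390)

p_a_at_home = log5(a_home, b_road)  # A hosting B
p_a_on_road = log5(a_road, b_home)  # A visiting B

# Assumed order for a best-of-five with A holding home field: H H A A H.
game_probs = [p_a_at_home, p_a_at_home, p_a_on_road, p_a_on_road, p_a_at_home]
print(series_win_probability(game_probs, wins_needed=3))
```

Running the same function with the other team's per-game chances (one minus each entry) gives the remainder, so the two series probabilities total 100%.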

So what do the numbers show?


In the AL, the Yankees hold an advantage over Cleveland, and Boston looks very tough for the Angels to beat. In the NL, the Cubs should win the first playoff series over the Diamondbacks, and the Rockies have a slight edge over the Phillies.

Now, I know what you're saying: how can your numbers show the Cubs having such an advantage over the team with the best record in the league? It's simple: the D-backs have a below-.500 pythagorean record, which is almost unheard of in the playoffs. The Cubs have a decent home-field advantage and are far better on the road--this will translate into an NLDS victory for the North Siders.

As the games are played, I will update this table and post it here (and on Big Shoulders), since the odds will change with each victory and loss.

A Quick One About My Baseball Blog Posts...

As you may have seen in one of my earlier posts, I also do a Stats-type Baseball Posting on an Ex-Pat Chicago Sports Web site.

It's always nice to be appreciated: my Season Predictions proved highly accurate, and my fellow blogger posted about that.

I love Bill James' Sabermetric work, and I used his Win-Shares formula to see how the off-season would add up for the Cubs. Turns out I was only off by one game (I predicted the Cubs would win only 84 or 85 games).

My mid-season prediction was eerily similar to my pre-season prediction, and to the eventual outcome. I should say that it was the Pythagorean Theorem that was particularly accurate; I didn't devise it, Bill James did. My only contribution was a little programmatic application: including the season as it unfolds and the dwindling number of opportunities, along with the OPI (over-performance index). This prediction also made my Yankees-loving friends extremely happy. Most interesting in this prediction was the fact that I had the Cubs winning the division even though their performance at that time had resulted in an under-.500 record and Milwaukee was winning at a high clip.

I will be cross-posting my play-off predictions here and on Big Shoulders Sports.

Sunday, August 19, 2007

Caveat Rector: Wannabe President Beware--You Will Be Found Out!

Since a friend pointed me to the story of WikiScanner early last week, I've been reminded of my previous post of helping politicians technologically: If you've done something stupid, it will catch up to you. And so caveat rector, literally, ruler beware, but this will apply more to the wannabe rulers out there. Add WikiScanner as an additional nightmare for politicians and their PR people, along with current nightmares of YouTube and Viral Videos.

What kind of information will WikiScanner divulge about Wiki entries? Will we learn that Hillary's camp was "correcting" Obama's name to Osama? Will we discover that Rudy Giuliani has edited the postings on transgender information? It's a brave new world, isn't it?

What Virgil Griffith has done is not in fact new: most IT people and otherwise technologically-capable Internauts knew that you could Whois an IP or DNS fairly easily. But Griffith has just democratized a little slice of the Web, and particularly, the already democratic Wikipedia. Now, not only can more people look at this little-used information, but it will be easier to read and compile for everybody, and it will allow various other Web mash-ups to take this information and create even more useful links: like anyone who made a Wikipedia edit and to whom they've contributed money (WikiScanner + OpenSecrets.org?).

Since the news broke, we've learned that the CIA doesn't have anything better to do than to troll Wikipedia and occasionally make edits; that Diebold employees helped out their CEO with some revisionist history; and that a South African government worker apparently believes that HIV/AIDS is not a problem in his country.

I have a feeling that as people start using WikiScanner, we will be hearing about a lot of different conflicts of interest, slander, etc.

Saturday, June 30, 2007

iPhony or iPhonetastic? An Early Adopter Explains Why He'll Wait on This One Until Gen2.0


Last night, The Wife and I were near the Apple Store on 5th Avenue around midnight. We thought we would check out the latest electronic gadget. It would not surprise anybody to know that the writer of a blog titled what this one is would be what is called in the business an Early Adopter. Not only am I an early adopter, I'm also an Early Proselytizer, because if I like the gadget, everyone I know certainly will know about it.

So did I buy the iPhone?

The answer, surprisingly, is no. Why?

The iPhone is beautiful, simply put. The technology packed into this tiny, thin, lightweight device is alluring, in a digital way, of course. But I'm not going to buy it. These are some reasons why:

  • I want to listen to music and I have a big collection, over 30GB of mp3s. Why would I want an iPod that only holds 4GB or 8GB? I like being able to put my entire collection on there and listening to anything I feel like. This does not replace my 2nd generation iPod (which replaced a 1G iPod of course).


  • I can't use it with the really fast Verizon network I'm currently enjoying with my Motorola Q (which I early adopted and proselytized to many, including my boss at GMHC). Web pages looked great on the iPhone, way better than on my Q, but they were a lot slower-loading than I'm used to.


  • I use my Q for work email, which means it has to work with Exchange Server. Also, my Q syncs with Outlook (and Entourage). There is no mention anywhere of a seamless synchronization with these important workplace technologies.


  • Even though my current contract with Verizon finished just this month, and I did not have to pay a cancellation fee, and even though I already pay $99/m for my plan, it is really not worth paying $600 for a new phone or iPod, or iPhone plus iPod.



This morning I read the review in the New York Times by David Pogue who seemed to be dead-set against the iPhone. His main arguments were the battery life, lack of a physical keyboard, and the AT&T plan. I think the battery life problem is really not a big deal, because if it's a phone, who's going to keep it more than two years, really?

Mossberg also mentioned the physical keyboard, as did Michael Robertson, formerly of Linspire and MP3.Com, in one of his self-serving broadcast emails ostensibly trying to be "helpful" to potential iPhone purchasers. Having used it, I don't think it will be much of a problem--people will get used to it. Also, I believe that it will become far less of a liability once voice dial becomes a feature. Not only that, it's probably too short-sighted to claim something like this will be the "downfall" of the iPhone--user-interface design is changing user interaction rapidly, and Apple has been at the forefront since the beginning of the digital revolution, so I don't buy this argument.

Cost, though, might do it for some people. I'm not exactly frugal, but a purchase has to be life-changing for me: allowing me to listen to a large music collection on the go, check my email, check that I'm still leading my fantasy baseball team (thank you, Q!), use Google Maps and gmail, and put it all in my regular phone. That will do it, and even my Q at $149 was somewhat expensive. But I really can't justify $600 plus all that New York City tax to go to a slower plan.

You know what I will buy, though? An 80GB iPod that looks exactly the same as the iPhone, without the phone features, but with 802.11g Wireless technology. For $350.

And next year's 2nd gen iPhone, when my Q becomes two years old!

NOTE: first image by Mary S. Butler, Q photo, on msb_nyc Flickr.com; second image by me using the iPhone at the Apple Store

Tuesday, June 12, 2007

Yes, I Like Baseball...

I do not have a Technology post, but here is one of my seemingly never-ending blogs on another blog about baseball. Go to Big Shoulders Sports to see this post utilizing Bill James' Pythagorean Wins theorem...

Saturday, May 12, 2007

Politicians Caught in the Web

It wasn't very long ago that a politician could derail his campaign with a badly-timed photo op (or paparazzi shot): Gary Hart on the boat with a jeans model, Dukakis in a tank (I see this one in my nightmares), or even very recently with Kerry windsurfing. You think to yourself, "Knowing the scrutiny they were getting in the middle of a campaign season, how could they let themselves be caught in so unflattering a manner?"

Well, now it is 2007. A politician should be so lucky as to have a gaffe caught only in a picture in a newspaper. Post-George Allen, it is a Brave New World. Not only will a pol have to be "on" (as they say in acting) all the time, but they will have to be vigilant of who is in the vicinity in their least guarded moments (take heed John Edwards, and lay off the spittle, Wolfowitz!). And in the wake of the Hillary Big Brother (Sister?) viral video, controlling the message just got a lot more difficult. I offer this as a primer for the technologically inept politicians on how they can try to do their best at controlling their image (and words!).

"Macaca" Meltdown Moderation:

  • Assume you are being taped at all times. Why?

    • Digital Video camcorders are ubiquitous and tiny--they fit in pockets

    • Most telephones now have digital video/audio capabilities and nearly everyone has one

  • Resist the urge to try to be funny; remember: you're a politician, not a comedian

  • If you're making a TV appearance, remember there's a camera in every room



Control the Message:

  • When a viral video breaks, make sure you have a response: just like the old publicity saw goes, "any publicity is good publicity"--you need to use it to your advantage.

  • If the video of you serenading a coat rack at an orgy becomes a huge hit on YouTube, well then you might have to get creative (I don't know how to get out of that one, maybe hire Mel Gibson's agency!)



Online Polls:

  • Online polls are the worst sort of polls--there is no scientific sampling whatsoever: NEVER QUOTE THESE

  • Never put these on your site, either

  • On the other hand, when you're on a sympathetic Web site, you definitely need to have a high total, because it does mean that Internauts are paying attention; this information might help in fundraising and knowing where to put ad money



A final point: your history is pretty much publicly available. As you have no doubt seen in the last few months, the list of charities, political groups, and organizations to which you've contributed is freely available on the Web, and "mash-ups" (so called because they mash together information from many disparate sources into one, after the hip-hop/dance mash-ups that put different songs together while still forming one beat) are really changing the political environment. They have caused Rudy Giuliani, to his credit, to run a basically honest campaign--that is, he at least can't lie about his support for gay rights and pro-choice (the record on terrorism: the jury is still out!).

So if you're going to run, you should definitely 'fess up about any cyber-skeletons in the cyber-closet, because otherwise, the Blogosphere will own you.

Wednesday, April 25, 2007

"Missing" or "Lost" Karl Rove of-the-RNC Emails

When I first heard about this a couple of weeks ago, the politics of the situation were the most intriguing element of this story: Did Karl Rove, adviser to the President, try to subvert the Presidential Records Act, which requires that official White House communications be preserved, by using his RNC-funded and maintained email for his secret, and potentially illegal, work as an employee of the American people?

For most people, that alone is intriguing. For me, though, I also liked the fact that this little debacle has brought to a more public light some little known facts:
  • In cyberspace, there is almost always a server with something that's passed through it, with traces and back-ups of email.
  • While email could certainly be completely lost (any NetAdmin will tell you that files can get corrupted, either in Microsoft Exchange Server or with any Linux flavor, or be lost in between backup times...), it is pretty unlikely.
    • The sender has to delete it from "sent items", then delete it from "deleted items"
    • The receiver has to delete it from "received items", then delete it from "deleted items"
    • The best practice in the IT industry is to create archive folders for a user to personally retrieve old emails, so most users actually just archive their email. So to delete it from there, you have to perform another delete action (that's three, so far!).
    • Depending on the business, some backup servers also back up hard drives on users' workstations, so it would have to be deleted from there, as well.
    • Then, on the server side, the typical time for data to reside on an email server is 30 days, but that is backed up every day, so theoretically, depending on the back-up practices of the company, you might have several years' worth of backups on tape.
    • Current practice due to Sarbanes-Oxley is to archive every month of data going back 7 years in "easily retrievable formats" which has been taken by many CIOs and IT Directors to be DVD-ROM discs, so any publicly traded company should have these at least back to 2004. Other companies are adopting these as best practices for IT and accounting purposes.
    • The receiving server also keeps data at least for 30 days. Depending on the company practices of these backups, these might also exist.
  • It is pretty easy to willfully delete emails if there is a strict policy of doing so on the sender's side, but not so easy on the receiver's side, so more than likely, copies of these emails could be retrieved. I've heard of IT departments of law firms charging approximately $2 per retrieved email as a standard cost, but I'm sure it could be more expensive depending on the above-mentioned standards implemented.

It is interesting that 35 years ago, tapes were erased by a secretive White House that did not want the Public (or even just Congress) to hear what it had said. If the RNC and Rove did their jobs well, and did indeed erase any of the evidence that illegal things were done in the Executive Branch of the United States, then we might never be able to answer some of the questions about how we were led to war, what the political reasons were for firing U.S. attorneys, or what happened in the energy commission chaired by Cheney & Rove.

But I'm willing to bet that we CAN retrieve this information, that is, if it is legally allowable, because technology allows this capability.

Tuesday, April 24, 2007

Welcome to My Blog!

And I'm sure I mean that literally, considering that the average readership of a blog is 1!

Despite that, I will try to post weekly, but if topics come up that are of interest and immediacy, I will at least attempt a quick note.

Who am I and why should you bother to read my blog? I can answer the first, and give inducements to the second, but ultimately you will decide whether your time is well spent here. Hopefully there will be lively discussions and a minimum of A**holes.

Answering the first question:
I am Dave Tainer, Director of Information Systems at GMHC, a New York City-based Non-Profit dedicated to the national fight against AIDS. In my role, I support the great staff at GMHC by providing them with the tools and the technology to do the best possible job. I offer strategic technological vision for the organization in its fight. Most importantly, I offer technological solutions to the myriad projects in organizing, testing, administering, counselling and record-keeping that are necessary on a day-to-day basis.

I've been involved in technology, directly or indirectly, since the dawn of the personal computer era, working for the government (Dept. of Energy, FEMA, both at Argonne National Labs), for-profit businesses, and teaching in the computer science department as adjunct faculty at DePaul University. Because I've worked in many different areas of IT (networking, programming, project management, strategy, vision, management, operational), I have a well-rounded view of all that technology can do, as well as what it can't.

I am also very interested in politics and can therefore speak of how technology is affecting our political culture and our political process. What has really sparked me to start this blog are the latest tech-related political flaps, namely, the following:
  • Karl Rove's and the RNC's "lost" emails (hardware/software technology)
  • the "1984" Hillary/Obama ad (viral video)
  • The Giuliani campaign Website SQL flaw (Web/Internet)
  • Voting Machines (hardware/software)
These all made me think that technology and politics are fast becoming like technology and business, heavily intertwined and inseparable. This is already ushering in a change in how campaigns are executed, financed, and viewed culturally. I will try to treat these topics in a journalistic fashion, that is, to tell the truth without distortion from my personal view of things. For the record, I'm socially liberal, but fiscally conservative, which, in these days, the loudmouths of the right refer to as "radical". In any case, the blogosphere will act as my ethics board!

Let's hope this is the beginning of a long friendship (just hope it's not with myself only)!