Using online ads to build a massively distributed computing system
March 11th, 2007
So let’s take a totally theoretical problem. You have a very complex set of mathematical calculations to do. There’s relatively little data involved in these calculations, but they’re extremely computationally (CPU) intensive. The math is also set up in such a way that you can easily slice the problem into millions of parallel calculations. I can’t think of a concrete example, but I’m sure there are many cases in which this is totally possible. Just for fun, let’s say you need 48,000 CPU hours, e.g. 1000 CPUs for 48 hours, or 2000 CPUs for 24 hours, and so on.
So what do you do? You can’t run this on your own machine; you need power. Let’s walk through some options!
- Build a computer cluster, perhaps a Beowulf?
- Try to get CPU time on a shared data center, e.g. amazon.com now offers this service
- Build your own network a la SETI@Home
- Use an ad network!
Building your own computer cluster is a pain, especially if you only need to do this once. Your own cluster involves hiring people who have the expertise to manage the hardware, installation, management, etc. Maybe worth it if your task isn’t too complex and you expect to continue to use the cluster, but not ideal. If you build a 32 computer cluster
Shared time is actually quite nice. Some very rough research shows that you pay between $0.10 and $1.00 per CPU hour. Using shared computing resources, the 48,000 CPU hours would cost you between about $5,000 and $50,000. Probably closer to $50,000 since this isn’t a long-term engagement, just a single calculation. (Note this is total back-of-the-envelope math.)
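To make the back-of-the-envelope math explicit, here is a small sketch of the rental cost range. The rates are the rough $0.10 and $1.00 per CPU hour figures quoted above, not real price quotes, and the function name is just for illustration.

```python
# Rough rental cost for the whole job, using the post's assumed rates.
CPU_HOURS = 48_000   # total work, e.g. 1000 CPUs for 48 hours
LOW_RATE = 0.10      # assumed dollars per CPU hour, cheap end
HIGH_RATE = 1.00     # assumed dollars per CPU hour, expensive end


def rental_cost(cpu_hours, rate_per_hour):
    """Total cost in dollars of renting cpu_hours at a flat hourly rate."""
    return cpu_hours * rate_per_hour


low = rental_cost(CPU_HOURS, LOW_RATE)
high = rental_cost(CPU_HOURS, HIGH_RATE)
print(f"Shared time: ${low:,.0f} to ${high:,.0f}")
```

That gives $4,800 at the cheap end and $48,000 at the expensive end, which rounds to the $5,000 to $50,000 range.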
Building your own network: this is obviously cool. SETI@Home has over 3 million machines in its network. But realistically, it ain’t happening. Especially not for a one-time thing.
So how about the last option, an ad network? Wtf? Mike, are you smoking crack? Well, think about it. When you place an ad on a site, you essentially get to execute some code for a limited period of time. Let’s say that on average an ad is shown for a minute on a page. Also, let’s assume you can efficiently use 25% of a computer’s CPU with an ad. I think this is actually reasonably realistic, since Flash can already execute some pretty complex code and interact with a third-party server. Imagine your ‘ad’ is a simple Flash file that loads a tiny bit of data from our server, crunches through it, and spits back a response within a minute. Every minute the ad is on the page, it goes through another set of calculations and spits the results back to our server.
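The fetch, crunch, report loop described above can be sketched roughly as follows. This is only a simulation in Python, not Flash: `WorkServer`, `crunch`, and `run_ad` are hypothetical names, the server is an in-memory stand-in rather than a real network service, and summing squares is a placeholder for the actual math.

```python
from collections import deque


class WorkServer:
    """Hypothetical in-memory stand-in for the coordinating server."""

    def __init__(self, total, chunk):
        # Slice the range [0, total) into small independent work units.
        self.pending = deque(
            (start, min(start + chunk, total)) for start in range(0, total, chunk)
        )
        self.results = {}

    def get_work_unit(self):
        """Hand out the next unit, or None when nothing is left."""
        return self.pending.popleft() if self.pending else None

    def submit_result(self, unit, value):
        self.results[unit] = value


def crunch(unit):
    """Placeholder calculation: sum of squares over the unit's range."""
    start, stop = unit
    return sum(i * i for i in range(start, stop))


def run_ad(server, minutes_on_page):
    """Simulate one ad impression: one work unit per minute on the page."""
    done = 0
    for _ in range(minutes_on_page):
        unit = server.get_work_unit()
        if unit is None:
            break
        server.submit_result(unit, crunch(unit))
        done += 1
    return done


server = WorkServer(total=1000, chunk=100)
while server.pending:
    run_ad(server, minutes_on_page=1)  # each impression lasts about a minute

total = sum(server.results.values())
print(len(server.results), "units done, combined result:", total)
```

A real version would also have to cope with ads that close before reporting back, so the server would need to hand the same unit out again after a timeout. The sketch leaves that out.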
Now, we really don’t care if our users interact with the ad, or even see it for that matter. This means we can buy extremely cheap ad space. We can buy tiny little Flash buttons that are 120×90 pixels in far corners of sites and still efficiently execute our code. So, let’s assume that we can easily buy ads that show for at least a minute for $0.05 CPM (cost per thousand ad impressions). Also, let’s just assume that client CPUs are just as fast as Amazon’s CPUs, to make life easier.
Ok, so $0.05 now buys us 1000 minutes of 25% CPU usage, or 250 minutes of 100% CPU usage. 250 minutes is a little over four hours of CPU usage, or in other terms, a bit over a penny per CPU hour! Now, sadly, here we need some infrastructure to handle inputs and outputs from our ‘math ads’, which would of course cost some money, but still, it’s a pretty cool idea! I could even see websites “donating” CPU usage by placing a simple bit of code at the bottom of their page.
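Worked through explicitly, the ad math looks something like this. Every input is one of the assumptions from the post ($0.05 CPM, one minute per impression, 25% CPU, client CPUs as fast as rented ones), and the variable names are made up for the sketch. Server infrastructure costs are not included.

```python
# Cost per CPU hour when buying compute through ad impressions.
CPM_PRICE = 0.05             # assumed dollars per 1000 impressions
IMPRESSIONS_PER_CPM = 1000
MINUTES_PER_IMPRESSION = 1   # assumed average time the ad stays on a page
CPU_FRACTION = 0.25          # assumed share of the client CPU we can use
CPU_HOURS_NEEDED = 48_000

# 1000 impressions * 1 minute * 25% CPU = 250 full-CPU minutes.
cpu_hours_per_cpm = (
    IMPRESSIONS_PER_CPM * MINUTES_PER_IMPRESSION * CPU_FRACTION / 60
)
cost_per_cpu_hour = CPM_PRICE / cpu_hours_per_cpm
total_cost = CPU_HOURS_NEEDED * cost_per_cpu_hour
impressions_needed = CPU_HOURS_NEEDED / cpu_hours_per_cpm * IMPRESSIONS_PER_CPM

print(f"CPU hours per CPM:  {cpu_hours_per_cpm:.2f}")
print(f"Cost per CPU hour:  ${cost_per_cpu_hour:.3f}")
print(f"Total for the job:  ${total_cost:,.0f}")
print(f"Impressions needed: {impressions_needed:,.0f}")
```

Under these assumptions the whole 48,000 CPU hour job comes to roughly $576 in ad spend across about 11.5 million impressions, at about 1.2 cents per CPU hour.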