Archive

You are currently browsing the The WING Blog blog archives for July, 2011.

Jul

28

The online advertising ecosystem

By John Byers

Advertising back in the pre-Internet days was pretty simple.  Ad agencies schmoozed with advertisers (think “Mad Men”), designed ad copy for their clients, and negotiated with publishers to buy ad spots.  But recently, and especially in the last five years, the landscape of online advertising has quietly been transformed.  Most people think of Google and maybe DoubleClick when it comes to new business models, but the reality is that a staggeringly complex ecosystem (graphic by a VC firm) has emerged.

Not only are there hundreds of firms depicted in this figure, but there are probably over a dozen distinct business models, most of them predicated on hard-core computer science.  For example, on top of the ad exchanges are DSPs (demand-side platforms), like Turn, that use blackbox optimization and behavioral and demographic targeting to drive ad buys across exchanges, often using real-time bidders.  Another interesting model is DMPs (data management platforms), like BlueKai, that do massive-scale data analytics and data mining based on historical advertiser performance to optimize campaigns.  These companies are pushing the envelope both with respect to systems design, since the throughput and latency requirements of matching ad slots to users in real-time on such a massive scale is daunting;  as well as in data analytics, where mined datasets are running into tens of TB or more.  Another aspect is a silent erosion of privacy, as some firms are cookie-ing users with attributes like “in-market for a new car”, others are buying impressions based on these cookies (cookie retargeting), and still others are computing joins of separate observations to build large databases of user information.

Once Congress figures out what to do about the debt ceiling, they’ll eventually turn their attention back to online advertising practices, so all of this technology as well as the privacy implications will be back in the news.  Also, research in this area, especially as relates to privacy, is still in the very early stages, so it could be a worthwhile venue to investigate.

(Full disclosure:  the post-er is a director at a “Data Optimization” online advertising company (that is still successfully flying a little too stealthily to be on the ecosystem chart :)   )).

Jul

22

Is it clean-slate or more of the same?

By matta

The July IEEE Communications magazine has a special issue on Future Internet Architectures (FIA). The first article by Raj Jain et al. gives a survey of some FIA research projects in the US, Europe and Asia:

ieeexplore.ieee.org/iel5/35/5936142/05936152.pdf?arnumber=5936152

From reading this article, you get a sense that we’re still doing things backwards ;) The authors write that step 1 is coming up with innovations, then step 2 putting them into an overall network architecture. Really! Why should we believe that these individual innovations would “fit” together. Remember, the NAT was an “innovation”, and all sorts of patches we did to the Internet over the last 30-40 years. Do they fit? If so, the research community wouldn’t have rallied behind “clean-slate” FIA!

The government is funding research that proposed design without a theory – but it proposed to come up with “an underlying theory supporting the design”. Hmm, shouldn’t theory come first? :)

And people talk about “data are named instead of their location (IP addresses)”.  Well, in CS, names have always been structured to find where things are. So, a real theory would think about the similarities and differences between names and addresses, e.g. that addresses are just shorter names to make it more efficient to carry them in packet headers…

And people keep confusing storage management (a la DTN) with communication… We need a theory that tells us when to “store” vs. just forward.

The article ends with “even those collaborative ones like in FIA program, put more emphasis on a specific attribute or a specific set of problems. It seems to be a tough problem to handle many challenges in a single architecture design.” And we need a “comprehensive theory in the research process rather than designing based only on experiences.” I agree. The community can’t seem to be able to think really clean-slate.

Based on our research so far, it seems that (really!) clean-slate thinking would give you a very simple architecture with one recursive block that has only two protocols. Meet RINA: http://csr.bu.edu/rina/  :)

Jul

18

Google’s “locksmith” problem

By Mark Crovella

Here is an interesting NYT article about “Search Engine Optimization” (SEO) applied to Google. It seems that certain service categories like local locksmiths are getting flooded by bogus websites that are fronts for phone banks.    So an unsuspecting customer who searches for “locksmith boston” will get a large number of hits that essentially all go to the same service in the end.

There are a number of research questions here, for example:

  1. For how many categories of services is this a problem?
  2. For any given category, how can one sort the “real” from the “fake” sites?

The nice thing about these questions is that you can do the research just by typing google queries and looking at the results.   The main observation I would start from is that any attempt to overwhelm search results must rely heavily on automation, and therefore incorporate simple patterns that can be detected.

For example, “boston locksmith” yields top hits with domain names bostonlocksmiths.net, bostonlocksmith.com, bostonlocksmith.org, boston-locksmiths.us, and quite a few more following that pattern.   Similarly, doing a search for “dc locksmith” yields domains like “dclocksmith.org”, etc.

Another example is the HTML content of web pages.  For example take a look at http://www.minneapolis-locksmith.us/ and  http://www.bostonlocksmith.us/ and http://www.chicagolocksmith.us/.   The similarities here should be easily detected.

Finally, Google can help you directly.   Google image search has come a long way in allowing “query by example”.   Searching for the graphic on the left hand side of http://www.bostonlocksmith.us/ finds the same image used on locksmith sites in about a dozen cities. (It also finds the original image which was appropriated for this graphic — coming from a professor in Manchester England!)

Could be a neat project to “reverse-engineer” these SEO strategies!