How the heck does Facebook do it?

One of the most interesting challenges of developing this site is that Facebook has set the gold standard for how people expect to interact with social media sites. Although we could reinvent the wheel, it doesn’t make sense to, considering that one of the most basic UI design principles is to put things where users expect them.

So when developing our status update box, I decided to mirror as many Facebook UI features as possible (work in progress).

So the question kept coming up for me: how the heck does Facebook do it?

Detecting URLs

A feature I greatly admire in Facebook’s status update box is the ability to detect typed URLs and display a blurb about them as you go.


Detecting the URL was easy, but the important part was determining when the person had stopped typing it. I couldn’t use sentence punctuation as the delimiter, since a good portion of punctuation is valid in URLs. Eventually I landed on the same choice Facebook did: wait for a space!
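To make the "wait for a space" rule concrete, here is a minimal Python sketch. It is only an illustration, not the code running on the site: the pattern and the function name `completed_urls` are assumptions, and it only looks for `http://` and `https://` links.

```python
import re

# Hypothetical pattern: an http(s) URL immediately followed by whitespace.
# The trailing whitespace is the signal that the user has finished typing it.
COMPLETED_URL = re.compile(r"(https?://\S+)\s")


def completed_urls(text):
    """Return the URLs in `text` that have been terminated by whitespace."""
    return COMPLETED_URL.findall(text)


# Still typing: no trailing space yet, so nothing is reported.
print(completed_urls("look at http://example.com/a?b=1,c"))
# The space after the URL marks it as finished, punctuation and all.
print(completed_urls("look at http://example.com/a?b=1,c "))
```

Because the pattern accepts any non-space character, commas, periods and question marks inside the URL are kept rather than treated as the end of it.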

But how does Facebook grab such pertinent information, and such a relevant image as well?

That’s a good question. I went through a few iterations of ideas. First I tried going through all the images on the linked page and grabbing the one with the largest file size. That seemed easy to do, since an image’s file size can be read from its HTTP headers without downloading the whole image. But the problem was that, with most of these sites being as optimized as they are, the picture with the biggest file size wasn’t always the picture with the biggest resolution, and I often grabbed logos or advertisements for other posts.
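As a rough sketch of the headers-only approach, the Python below sends a `HEAD` request for each image and compares the reported `Content-Length`. The function names and the `size_of` parameter are made up for this example, and it assumes the remote server answers `HEAD` requests with a `Content-Length` header, which not every server does.

```python
import urllib.request


def image_size_bytes(url, timeout=5):
    """Return the Content-Length reported for `url`, or None if it is absent."""
    # A HEAD request asks for the headers only, so the image body is not sent.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None


def largest_image(urls, size_of=image_size_bytes):
    """Return the URL with the largest reported size, or None if none report one."""
    sizes = {}
    for url in urls:
        try:
            size = size_of(url)
        except (OSError, ValueError):
            size = None  # unreachable host, timeout, HTTP error or bad header
        if size is not None:
            sizes[url] = size
    return max(sizes, key=sizes.get) if sizes else None
```

Passing the size lookup in as `size_of` is just a convenience so the selection logic can be tried out with made-up sizes instead of real network calls.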

I considered comparing actual resolution, but that would require the script to download and inspect each image.

Another feature I considered was letting the user scroll through potential thumbnails and choose the relevant one, something I’ve seen on Facebook. But that brought up another problem: to keep our site secure (SSL), we need to download any content we display and serve it from our own server. We can’t hotlink, since pulling content from other servers into our pages would undermine the encryption. If we added this feature, our web server would quickly fill up with junk, and the feature would be too slow.

Ultimately, I decided to stay with the largest (in KB) picture. But that didn’t sit right with me, since Facebook seemed to have a magic key for knowing which image to grab.

Enter the Open Graph standard.

It didn’t take much research to find that Facebook was using a set of HTML meta tags called Open Graph, which serve meta information within each page: easy-to-use details such as the title, description, and image associated with each post. This was how Facebook did it. We had a solution.
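For illustration, Open Graph tags look like `<meta property="og:title" content="...">` in a page's head. The sketch below pulls them out using Python's standard `html.parser`; the class and function names are assumptions for this example, and it is not the parser the site actually uses.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        content = attributes.get("content")
        if name.startswith("og:") and content is not None:
            # Keep the first value seen if a property is repeated.
            self.properties.setdefault(name, content)


def open_graph(html):
    """Return a dict of the Open Graph properties found in `html`."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


page = """
<html><head>
  <meta property="og:title" content="Hello" />
  <meta property="og:image" content="http://example.com/a.png">
  <meta name="description" content="not an Open Graph tag">
</head></html>
"""
print(open_graph(page))
```

The title, description, and image then come from the `og:title`, `og:description`, and `og:image` keys when the page provides them.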

The hybrid

Since not every site uses Open Graph, I’ve decided to use a hybrid. First, the script looks for Open Graph information. If it’s there, we’re done. If not, it cycles through the other images it finds on the page, selects the largest image file (in KB), and copies it over. Nine times out of ten, it’s a relevant image. Sometimes it’s not, but we have to compromise just a little for performance.

As it stands, the process of loading the URLs is a bit sluggish, so I’m sure I’ll revisit this one again soon. But there you have it: the status box is that much better!
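The decision itself is small enough to sketch in a few lines of Python. This is a simplified illustration with a made-up function name; it assumes the Open Graph properties and the image file sizes have already been collected into dictionaries.

```python
def choose_thumbnail(og_properties, image_sizes):
    """Pick a thumbnail URL: the Open Graph image first, largest file second.

    og_properties: dict of Open Graph properties, e.g. {"og:image": "..."}
    image_sizes:   dict mapping image URL -> file size in bytes
    """
    og_image = og_properties.get("og:image")
    if og_image:
        return og_image  # the page told us which image to use
    if not image_sizes:
        return None  # nothing to fall back on
    # Fallback: the largest file, which is usually (not always) relevant.
    return max(image_sizes, key=image_sizes.get)
```

For example, `choose_thumbnail({}, {"logo.png": 900, "photo.jpg": 52000})` falls back to `"photo.jpg"`, while any page that supplies `og:image` skips the size comparison entirely.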
