Monday, June 08, 2009

Yariv’s Blog » Blog Archive » Parallel merge sort in Erlang

Yariv’s Blog » Blog Archive » Parallel merge sort in Erlang


Parallel merge sort in Erlang
Posted by Yariv on March 09, 2009

I’ve been thinking lately about the problem of scaling a service like Twitter or the Facebook news feed. When a user visits the site, you want to show her a list of all the recent updates from her friends, sorted by date. It’s easy when the user doesn’t have too many friends and all the updates are on a single database (as in Twoorl’s case :P). You use this query:


"select * from update where uid in ([fid1], [fid2], ...) order by creation_date desc limit 20"


(After making sure you created an index on uid and creation_date, of course :) )

However, what do you when the user has many thousands of friends, and each friend’s updates are stored on a different database? Clearly, you should fetch those updates in parallel. In Erlang, it’s easy. You use pmap():