Posted on August 1, 2007
This post by Todd Hoff gives an overview of the YouTube architecture and offers some useful ideas on large-scale web application architecture.
* Using a cluster means:
– More disks serving content which means more speed.
– Headroom. If a machine goes down others can take over.
– There are online backups.
* Servers use the lighttpd web server for video:
– Apache had too much overhead.
– Uses epoll to wait on multiple file descriptors (fds).
– Switched from a single-process to a multi-process configuration to handle more connections.
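To make the epoll point concrete, here is a minimal sketch of waiting on several connections with one blocking call. It is not lighttpd's code: it uses Python's standard `selectors` module, which picks epoll on Linux, and the helper names `wait_for_readable` and `demo` are made up for illustration.

```python
import selectors
import socket


def wait_for_readable(socks, timeout=1.0):
    """Return the sockets in `socks` that have data ready to read."""
    # DefaultSelector chooses the most efficient mechanism the platform
    # offers (epoll on Linux, kqueue on BSD/macOS, select elsewhere).
    with selectors.DefaultSelector() as sel:
        for s in socks:
            sel.register(s, selectors.EVENT_READ)
        # One call waits on every registered file descriptor at once.
        ready = sel.select(timeout)
        return [key.fileobj for key, _events in ready]


def demo():
    """Simulate three client connections where only one sends data."""
    # Connected socket pairs stand in for real client connections.
    pairs = [socket.socketpair() for _ in range(3)]
    try:
        pairs[1][0].sendall(b"GET /video")
        servers = [server for _client, server in pairs]
        ready = wait_for_readable(servers)
        # Report the positions of the connections that became readable.
        return [servers.index(s) for s in ready]
    finally:
        for a, b in pairs:
            a.close()
            b.close()
```

The process never polls each connection in turn. It sleeps until the kernel reports that at least one descriptor is ready, which is why a single process can handle many mostly idle connections.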
* Most popular content is moved to a CDN (content delivery network):
– CDNs replicate content in multiple places. There’s a better chance of content being closer to the user, with fewer hops, and content will run over a more friendly network.
– CDN machines mostly serve out of memory because the content is so popular that there’s little thrashing of content into and out of memory.
* Less popular content (1-20 views per day) uses YouTube servers in various colo sites:
– There’s a long tail effect. Each video may get only a few plays, but a great many such videos are being played.
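The split between CDN and colo servers, and the long tail effect, can be sketched with a little arithmetic. This is an illustration rather than YouTube's actual routing logic: the cutoff of 20 views per day is an assumption taken from the "1-20 views per day" range above, the catalog numbers are invented, and `choose_backend` and `traffic_share` are hypothetical names.

```python
# Assumption: anything above the "1-20 views per day" range counts as popular.
CDN_THRESHOLD_VIEWS_PER_DAY = 20


def choose_backend(views_per_day):
    """Pick where a video is served from, based on its daily views."""
    if views_per_day > CDN_THRESHOLD_VIEWS_PER_DAY:
        return "cdn"
    return "colo"


def traffic_share(daily_views):
    """Sum the total daily plays handled by each backend."""
    totals = {"cdn": 0, "colo": 0}
    for views in daily_views:
        totals[choose_backend(views)] += views
    return totals


# Invented catalog: 2 hit videos and 5,000 long-tail videos.
catalog = [10_000] * 2 + [5] * 5_000
print(traffic_share(catalog))
```

In this made-up catalog the two hits account for 20,000 plays, while the long tail adds up to 25,000 plays spread over 5,000 different files. That is why the tail cannot be ignored even though no single video in it is popular.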
I have noticed significantly more playback delays (videos taking 10-20 seconds to start playing) on YouTube in the last few months.