This post by Todd Hoff gives an overview of the YouTube architecture and, with it, some interesting ideas on large-scale web application architecture.
Each video is hosted by a mini-cluster, so each video is served by more than one machine.
* Using a cluster means:
- More disks serving content which means more speed.
- Headroom. If a machine goes down, others can take over.
- There are online backups.
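
As a rough illustration of the headroom point, here is a minimal Python sketch of a mini-cluster in which any healthy machine can serve a video. The `MiniCluster` class, its methods, and the machine names are hypothetical and are not taken from the original post.

```python
import random


class MiniCluster:
    """Hypothetical model of a group of machines that all hold the same videos."""

    def __init__(self, machines):
        # Map machine name -> True while the machine is considered healthy.
        self.health = {name: True for name in machines}

    def mark_down(self, name):
        # Record that a machine has failed and should not receive requests.
        self.health[name] = False

    def pick_machine(self, rng=random):
        # Any healthy replica can serve the request, so a failure only
        # shrinks the pool instead of making the video unavailable.
        healthy = [name for name, up in self.health.items() if up]
        if not healthy:
            raise RuntimeError("no healthy machine left in the mini-cluster")
        return rng.choice(healthy)


cluster = MiniCluster(["video-a", "video-b", "video-c"])
cluster.mark_down("video-a")
served_by = cluster.pick_machine()  # one of the two remaining machines
```

Spreading requests randomly over the healthy machines is only one possible policy; the post does not say how requests are actually balanced.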
* Servers use the lighttpd web server for video:
- Apache had too much overhead.
- Uses epoll to wait on multiple file descriptors (fds).
- Switched from a single-process to a multi-process configuration to handle more connections.
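
To make the epoll point concrete, here is a small Python sketch of one process waiting on several file descriptors at once. It uses the standard `selectors` module, whose `DefaultSelector` is backed by epoll on Linux. This is not lighttpd's code (lighttpd is written in C), and the socket pairs and `conn-N` labels are stand-ins invented for the example.

```python
import selectors
import socket

# DefaultSelector picks the most efficient mechanism the platform offers,
# which is epoll on Linux.
sel = selectors.DefaultSelector()

# Two socket pairs stand in for two client connections.
pairs = [socket.socketpair() for _ in range(2)]
for i, (server_side, _client_side) in enumerate(pairs):
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ, data=f"conn-{i}")

# Only the second "client" sends a request.
pairs[1][1].sendall(b"GET /video")

# A single call reports every descriptor that is ready to read,
# so idle connections cost no per-connection thread or process.
received = {}
for key, _events in sel.select(timeout=1.0):
    received[key.data] = key.fileobj.recv(1024)

sel.close()
for server_side, client_side in pairs:
    server_side.close()
    client_side.close()
```

A real server would run the `select` call in a loop and accept new connections on a listening socket; the sketch stops after one round to stay short.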
* Most popular content is moved to a CDN (content delivery network):
- CDNs replicate content in multiple places. There’s a better chance of content being closer to the user, with fewer hops, and the content will travel over a friendlier network.
- CDN machines mostly serve out of memory because the content is so popular there’s little thrashing of content into and out of memory.
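
The memory point can be illustrated with a toy simulation. The sketch below assumes an LRU (least recently used) in-memory cache, which is a guess made for illustration and not something the post states about CDN machines. The request streams and the capacity of 10 are made-up numbers.

```python
from collections import OrderedDict


class LRUCache:
    """Toy in-memory cache that evicts the least recently used video."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, video_id):
        if video_id in self.items:
            # Served from memory; mark the video as recently used.
            self.items.move_to_end(video_id)
            self.hits += 1
            return
        # Miss: pretend to load from disk, then evict the oldest entry if full.
        self.misses += 1
        self.items[video_id] = True
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)


def hit_rate(requests, capacity):
    cache = LRUCache(capacity)
    for video_id in requests:
        cache.get(video_id)
    return cache.hits / len(requests)


# 5 hot videos requested over and over: the whole set fits in memory.
hot_set_rate = hit_rate([i % 5 for i in range(1000)], capacity=10)

# 50 videos requested in rotation: the set does not fit, so entries are
# evicted just before they are needed again (thrashing).
thrashing_rate = hit_rate([i % 50 for i in range(1000)], capacity=10)
```

With these inputs the hot set misses only on the first request for each of the 5 videos, while the rotating set of 50 never hits at all.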
* Less popular content (1-20 views per day) uses YouTube servers in various colo sites.
- There’s a long tail effect. A video may have a few plays, but lots of videos are being played.
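
The split between CDN and colo servers, and the long tail effect, can be sketched with some simple arithmetic. The 20-views-per-day cut-off comes from the range quoted above, but using it as a hard routing threshold is an assumption, and the `Video` type, `choose_origin` function, and catalog numbers are all invented for the example.

```python
from dataclasses import dataclass

# Assumed cut-off: the post describes 1-20 views per day as "less popular".
CDN_THRESHOLD_VIEWS_PER_DAY = 20


@dataclass
class Video:
    video_id: str
    views_per_day: int


def choose_origin(video):
    # Popular videos go to the CDN, everything else to the colo servers.
    if video.views_per_day > CDN_THRESHOLD_VIEWS_PER_DAY:
        return "cdn"
    return "colo"


# Made-up catalog: two hits and a long tail of rarely watched videos.
catalog = [Video("hit-1", 50_000), Video("hit-2", 30_000)]
catalog += [Video(f"tail-{i}", 5) for i in range(100_000)]

plays_by_origin = {"cdn": 0, "colo": 0}
for video in catalog:
    plays_by_origin[choose_origin(video)] += video.views_per_day
```

In this invented catalog each tail video gets only 5 plays a day, yet together the tail accounts for 500,000 daily plays against 80,000 for the two hits, which is the long tail effect in miniature.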
I have noticed a large increase in significant delays (taking 10-20 seconds to start playing) with YouTube in the last few months.
August 2nd, 2007 at 6:22 pm
Mmh..lighttpd seems to be all the hype.. a lot of new web2.0 projects I know are using it..
Never tried it.. I guess I’ll have to take a look at it.