Question: Scalable cluster architecture design - need an opinion or advice

Hi experts!

I need to design the architecture for a big web application: a large news portal plus media hosting (pictures, video, audio). It needs to hold 400k active sessions (up to 600k at peak), 50 GB of video uploads daily, 10 GB of photos, etc. Check the attached picture. I'm doing something like this for the first time, so if you see any flaw here, please give me a hint.
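As a quick sanity check on these figures, here is a small sketch (Python is used for all examples in this thread) that converts the daily upload volume into average bandwidth and storage growth. It assumes, hypothetically, that the 10 GB of photos is also a daily figure, that uploads are spread evenly over the day, and that nothing is ever deleted; real peaks will be well above the average.

```python
# Back-of-envelope capacity check for the figures in the question.
# Assumptions (hypothetical): uploads spread evenly over 24 hours,
# 1 GB = 10**9 bytes, 1 TB = 1000 GB, nothing is ever deleted.

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_GB = 10**9


def avg_ingest_mbit_per_s(gb_per_day: float) -> float:
    """Average upload bandwidth in megabits per second."""
    return gb_per_day * BYTES_PER_GB * 8 / SECONDS_PER_DAY / 10**6


def storage_growth_tb(gb_per_day: float, days: int) -> float:
    """Raw storage consumed after `days` days, in terabytes."""
    return gb_per_day * days / 1000


daily_gb = 50 + 10  # daily video + (assumed daily) photo uploads
print(f"average ingest: {avg_ingest_mbit_per_s(daily_gb):.1f} Mbit/s")
print(f"after 30 days:  {storage_growth_tb(daily_gb, 30):.1f} TB")
print(f"after 365 days: {storage_growth_tb(daily_gb, 365):.1f} TB")
```

Under these assumptions the average ingest is about 5.6 Mbit/s, but stored data grows by roughly 1.8 TB per month, which is close to the >2 TB of storage listed for the cluster.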

Thanks.

The cluster:

Going to use VMware ESX; the physical servers are 24 Xeon quad-core blade servers.

Storage: a SAN connected to the physical servers via FC HBAs and to the virtual servers via VMFS; stores the media files, >2 TB

front servers: nginx, connected to shared storage to read static data

app servers: apache+php, serving dynamic HTML, authorization, media upload, connected to shared storage

memcache: memcached servers for session caching, heavy calculation caching, etc.
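A minimal sketch of the cache-aside pattern this tier would serve: check the cache, and only on a miss run the heavy calculation and store the result with an expiry. The `TTLCache` class is a hypothetical in-process stand-in so the example runs on its own; a real deployment would talk to the memcached servers through a client library instead. The key name and TTL in the usage are made up for illustration.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a memcached node (hypothetical, illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)


def get_or_compute(cache: TTLCache, key: str, ttl: float,
                   compute: Callable[[], Any]) -> Any:
    """Cache-aside: return the cached value, or compute it, store it, return it.

    A computed value of None is indistinguishable from a miss,
    so it is recomputed on every call.
    """
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, ttl)
    return value
```

For example, `get_or_compute(cache, "front_page", 60, render_front_page)` would run the (hypothetical) `render_front_page` at most once a minute per cache node.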

MySQL Cluster: going to run ndb + mysqld + ndb manager on each server, behind load balancers
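In MySQL Cluster every mysqld (SQL node) reads the same tables from the NDB data nodes, so a load balancer can send any query to any SQL node. A minimal round-robin sketch of that idea, assuming hypothetical host names and leaving health checks to the caller via `mark_down`/`mark_up`; real load balancers add health probing, weighting and connection handling on top.

```python
import itertools
from typing import Iterator, List, Set


class RoundRobinBalancer:
    """Hand out SQL-node addresses in turn, skipping nodes marked as down."""

    def __init__(self, nodes: List[str]) -> None:
        if not nodes:
            raise ValueError("need at least one SQL node")
        self._nodes = list(nodes)
        self._down: Set[str] = set()
        self._cycle: Iterator[str] = itertools.cycle(self._nodes)

    def mark_down(self, node: str) -> None:
        self._down.add(node)

    def mark_up(self, node: str) -> None:
        self._down.discard(node)

    def next_node(self) -> str:
        # Try each node at most once per call, so a fully-down cluster
        # raises instead of looping forever.
        for _ in range(len(self._nodes)):
            node = next(self._cycle)
            if node not in self._down:
                return node
        raise RuntimeError("all SQL nodes are down")


# Hypothetical host names for the mysqld (SQL) nodes.
lb = RoundRobinBalancer(["sql1:3306", "sql2:3306", "sql3:3306"])
print([lb.next_node() for _ in range(4)])
# prints ['sql1:3306', 'sql2:3306', 'sql3:3306', 'sql1:3306']
```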

vconv: video converting, ffmpeg, connected to shared storage
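A sketch of how a vconv worker might build its ffmpeg invocation. The H.264/AAC MP4 target, the CRF value and the mount points are assumptions for illustration, not part of the original design; the option names are those of current ffmpeg releases.

```python
import shlex
from pathlib import Path
from typing import List


def build_transcode_cmd(src: Path, out_dir: Path, crf: int = 23) -> List[str]:
    """Build (but do not run) an ffmpeg command converting one upload to H.264/AAC MP4."""
    dst = out_dir / (src.stem + ".mp4")
    return [
        "ffmpeg",
        "-i", str(src),              # uploaded file on the shared storage
        "-c:v", "libx264",           # H.264 video
        "-preset", "medium",         # encoding speed vs. compression trade-off
        "-crf", str(crf),            # constant quality: lower = better and larger
        "-c:a", "aac",               # AAC audio
        "-movflags", "+faststart",   # index at the front so playback can start early
        str(dst),
    ]


# Hypothetical mount points on the shared storage.
cmd = build_transcode_cmd(Path("/mnt/media/incoming/clip.avi"),
                          Path("/mnt/media/video"))
print(shlex.join(cmd))
```

The worker would then run it with `subprocess.run(cmd, check=True)`, which raises `CalledProcessError` if ffmpeg exits with an error, so failed conversions can be retried or flagged.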

admin: will be the only one allowed to access all the other servers via SSH, plus a few other things.


Make me think I did everything right.

Thanks.

Answer: Scalable cluster architecture design - need an opinion or advice

Squid in front of Apache can handle a heap of TCP connections and keeps the private systems off the public wires.
Apache can use sendfile() to serve static files off the NAS at very little CPU cost.
PHP connection pooling, with careful programming, will keep the SQL backend cluster happy.
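`sendfile()` lets the kernel copy file data straight to the socket without passing it through the server process, which is why it is cheap on CPU. Apache enables it with the `EnableSendfile` directive (nginx has `sendfile on;`). This Python sketch exercises the same system call through `socket.sendfile()`; the temp-file round trip exists only so it runs on its own. Apache's documentation warns that sendfile can misbehave on some network-mounted filesystems, so it is worth testing against the actual NAS.

```python
import os
import socket
import tempfile


def serve_file(conn: socket.socket, path: str) -> int:
    """Send a whole file over a connected stream socket; return bytes sent.

    socket.sendfile() uses os.sendfile() (the kernel copies the data, with no
    user-space buffer) where the platform supports it, and falls back to a
    read()/send() loop elsewhere.
    """
    with open(path, "rb") as f:
        return conn.sendfile(f)


def round_trip(payload: bytes) -> bytes:
    """Write payload to a temp file, serve it over a local socket pair, read it back.

    Keep the payload small: this single-threaded demo only starts reading
    after sending finishes, so the data must fit in the socket buffer.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        server, client = socket.socketpair()
        with server, client:
            sent = serve_file(server, path)
            server.shutdown(socket.SHUT_WR)  # tell the reader no more data is coming
            chunks = []
            while True:
                chunk = client.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
        if sent != len(payload):
            raise IOError(f"sent {sent} of {len(payload)} bytes")
        return b"".join(chunks)
    finally:
        os.remove(path)
```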
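Pooling caps how many connections the web tier opens against the SQL nodes. PHP's own options are persistent connections (`mysql_pconnect()`, or `PDO::ATTR_PERSISTENT` with PDO), which keep a connection per Apache process rather than one shared pool. This sketch shows the general fixed-size pool idea in Python; `fake_connect` is a hypothetical stand-in for a real driver's connect call.

```python
import itertools
import queue
from contextlib import contextmanager
from typing import Any, Callable, Iterator


class ConnectionPool:
    """Fixed-size pool: at most `size` connections are ever opened to the database."""

    def __init__(self, factory: Callable[[], Any], size: int) -> None:
        self._idle: "queue.Queue[Any]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # open every connection once, up front

    @contextmanager
    def connection(self, timeout: float = 5.0) -> Iterator[Any]:
        # Blocks while all connections are checked out; raises queue.Empty
        # after `timeout` seconds instead of opening an extra connection.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand it back, even if the caller raised


# Hypothetical stand-in for a real DB driver's connect() call.
_ids = itertools.count(1)


def fake_connect() -> str:
    return f"conn-{next(_ids)}"


pool = ConnectionPool(fake_connect, size=2)
for _ in range(5):
    with pool.connection() as conn:
        pass  # run queries with `conn` here
```

Five requests were served with only two connections opened; under load, extra requests wait for a free connection instead of piling new ones onto the cluster.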

I see no purpose for memcache: network interconnects run at about 800 MB/s (8 Gbps FC) or 100 MB/s (1GbE), while local memory runs at about 8 GB/s, so you are better off adding more RAM to the servers than running dedicated cache machines.
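The arithmetic behind this argument, as a small sketch using the answer's ballpark throughputs: the time to move one cached object of a hypothetical 100 KB over each path.

```python
# Ballpark sustained throughput of each path, in MB/s, as quoted in the answer.
# These are rough figures, not measurements; per-request latency is ignored.
LINKS_MB_PER_S = {
    "1GbE": 100,
    "8Gbps FC": 800,
    "local RAM": 8000,
}


def transfer_ms(size_kb: float, mb_per_s: float) -> float:
    """Milliseconds to move `size_kb` KB at `mb_per_s` MB/s.

    With 1 MB = 1000 KB, KB divided by MB/s comes out directly in milliseconds.
    """
    return size_kb / mb_per_s


for name, speed in LINKS_MB_PER_S.items():
    print(f"{name:>9}: {transfer_ms(100, speed):.4f} ms for a 100 KB object")
```

By raw bandwidth alone, local RAM is about 80 times faster than 1GbE for the same object. Per-request network round trips are not counted here and would add further to the two network rows.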

PostgreSQL or Firebird have more functionality built into the database to serve as a backend.

Sharing the same FC NAS space between the databases and lots of other I/O can slow them down.
If you have a multi-controller SAN, you are fine.

Databases can be sped up by putting the transaction logs on solid-state disks; the smallest of the fastest (e.g. two in RAID 0) are the best choice.

Gentoo is known for getting the most out of the CPU, so it suits the vconv part; for the rest, just about anything will do with some kernel tuning.

VMFS does not share files between machines; it only shares the system images so that live migration works.

To summarize: the frontend Squid(s) do only network processing, use no storage, keep no secrets at the end of the wire, and can do logging; Gentoo is a good fit there. For the rest, a normal system can be tuned to spec.
The quad-core Xeon actually uses Hyper-Threading, and you have to use processor binding in VMware.
Why not a dual quad-core processor setup?
Something is strange with your assumptions: 10 GB of photos is at most about 10,000 files, which is what a normal digital camera can produce running non-stop.
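The 10,000-file estimate corresponds to roughly 1 MB per photo, and the count moves a lot with the assumed average size. A quick sketch with hypothetical average sizes:

```python
def approx_file_count(total_gb: float, avg_file_mb: float) -> int:
    """Approximate number of files of `avg_file_mb` MB that fit in `total_gb` GB (1 GB = 1000 MB)."""
    return round(total_gb * 1000 / avg_file_mb)


# Hypothetical average sizes: camera original, ~1 MB JPEG, web-resized image.
for size_mb in (3.0, 1.0, 0.1):
    print(f"{size_mb:>4} MB/photo -> ~{approx_file_count(10, size_mb):,} photos")
```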

Hope I raised some doubts; if you have more, feel free to share.