How does a website handle millions of users?

Load Balancer

A load balancer is needed when there are more requests than one web server can handle. A single web server can typically handle 10-15k requests per second for a dynamic website, but this depends entirely on the complexity of the website or web application. A load balancer sits in front of multiple web servers and forwards each incoming request to one of them, distributing the load.
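The forwarding step can be sketched with a round-robin rotation, which is only one of several possible distribution strategies. This is a minimal sketch: the class name and the backend addresses are hypothetical, and a real load balancer would also track server health and open connections.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backends in a fixed rotation, one per incoming request."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(list(backends))

    def pick(self):
        # Return the backend that should receive the next request.
        return next(self._rotation)


# Hypothetical backend addresses, for illustration only.
balancer = RoundRobinBalancer(["web1:8080", "web2:8080", "web3:8080"])
assignments = [balancer.pick() for _ in range(6)]
print(assignments)
```

With three backends and six requests, each backend receives two requests in turn.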

Separate CDN server

To serve static content (JS, CSS, images), set up a CDN server that is optimized for serving static files. This reduces the load on the web server.
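One way an application can send static requests to the CDN is to rewrite asset paths when it renders a page. The sketch below assumes a hypothetical CDN host (cdn.example.com) and a hypothetical list of static file extensions; neither comes from the text above.

```python
from urllib.parse import urljoin

# Hypothetical CDN host and extension list, for illustration only.
CDN_BASE = "https://cdn.example.com/"
STATIC_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".gif", ".svg")


def asset_url(path):
    """Return a CDN URL for static files and leave other paths unchanged."""
    if path.lower().endswith(STATIC_EXTENSIONS):
        return urljoin(CDN_BASE, path.lstrip("/"))
    return path


print(asset_url("/static/app.js"))
print(asset_url("/checkout"))
```

Dynamic paths such as /checkout still go to the web server, so only the static traffic moves to the CDN.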

Web Server

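Tune the web server's configuration for your use case: set the number of threads, the number of connections, the network buffer size, the open file descriptor limit and so on. Different servers have different configuration files for tuning performance. As an example, Apache Tomcat's HTTP connector documents the following attributes: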

maxThreads The maximum number of request processing threads to be created by this Connector, which therefore determines the maximum number of simultaneous requests that can be handled. If not specified, this attribute is set to 200. If an executor is associated with this connector, this attribute is ignored as the connector will execute tasks using the executor rather than an internal thread pool.

acceptCount The maximum queue length for incoming connection requests when all possible request processing threads are in use. Any requests received when the queue is full will be refused. The default value is 100.

maxConnections The maximum number of connections that the server will accept and process at any given time. When this number has been reached, the server will accept, but not process, one further connection. This additional connection will be blocked until the number of connections being processed falls below maxConnections, at which point the server will start accepting and processing new connections again. Note that once the limit has been reached, the operating system may still accept connections based on the acceptCount setting. The default value varies by connector type. For BIO the default is the value of maxThreads unless an Executor is used, in which case the default will be the value of maxThreads from the executor. For NIO the default is 10000. For APR/native, the default is 8192.
Note that for APR/native on Windows, the configured value will be reduced to the highest multiple of 1024 that is less than or equal to maxConnections. This is done for performance reasons.
If set to a value of -1, the maxConnections feature is disabled and connections are not counted.
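To get a feel for how the thread limit and the accept queue interact, the rough model below splits a burst of simultaneous connections into processed, queued and refused. It is a deliberately simplified sketch: the function name is hypothetical, it ignores maxConnections and timing, and it assumes every connection arrives at the same instant. The defaults of 200 and 100 are the ones quoted above.

```python
def classify_connections(concurrent, max_threads=200, accept_count=100):
    """Split simultaneous connections into processing, queued and refused.

    Simplified model: ignores maxConnections and request duration.
    """
    processing = min(concurrent, max_threads)
    queued = min(concurrent - processing, accept_count)
    refused = concurrent - processing - queued
    return {"processing": processing, "queued": queued, "refused": refused}


print(classify_connections(250))
print(classify_connections(500))
```

Under this model a burst of 250 connections fits entirely (200 processing, 50 queued), while a burst of 500 leaves 200 connections refused.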


Database Server

Every web server must serve the same content, so they all have to talk to the same database. If many web servers talk to one database server, the database becomes the bottleneck. Even with a single web server, the database server can sometimes be the bottleneck. Many database servers have a scalable architecture. MySQL, for example, supports master-slave and master-master configurations.
  1. In master-slave, one server is the master, where data is written, and the data is replicated to multiple slave servers. Writes go to the master and reads go to the slaves. This is useful when the database sees few writes but many reads, typically less than 10-15% writes, though this depends on the use case.
  2. Master-master is similar, but every server is a master: data written to any server is replicated to the others, and reads and writes can be done on any server. This is useful for applications with a high volume of writes.
The above describes MySQL, but other database servers support similar kinds of scalability.
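The master-slave split between writes and reads can be sketched as a small router in application code. This is an illustrative sketch only: the class name and server names are hypothetical, and classifying a statement by its first keyword is a simplification that ignores transactions and schema changes.

```python
from itertools import cycle


class ReplicatedRouter:
    """Sends writes to the master and spreads reads across the slaves."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master for reads when no slave is configured.
        self._readers = cycle(list(slaves) or [master])

    def route(self, sql):
        # split() with no separator also discards leading whitespace.
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._readers)


# Hypothetical server names, for illustration only.
router = ReplicatedRouter("db-master", ["db-slave1", "db-slave2"])
print(router.route("INSERT INTO users (name) VALUES ('a')"))
print(router.route("SELECT * FROM users"))
print(router.route("SELECT * FROM orders"))
```

Because replication takes some time, a read sent to a slave immediately after a write may not yet see that write, which is worth keeping in mind when choosing what to read from slaves.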

Caching Server

Reading from disk (the database) is expensive. This is where a caching server helps: it keeps frequently read data in memory. If the data is in the cache (a hit), the disk read is avoided; if it is not (a miss), the data is read from disk and stored in the cache for next time. A hit ratio of 70-80% means most reads never reach the database, which helps with scaling.
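The hit/miss flow described above can be sketched in a few lines. This is a minimal in-process sketch, not a real caching server: the class name and the slow_lookup stand-in for a database read are hypothetical, and there is no eviction or expiry.

```python
class SimpleCache:
    """Keeps loaded values in memory and counts hits and misses."""

    def __init__(self, load_from_db):
        self._load = load_from_db
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            # Hit: the value is served from memory, no disk read.
            self.hits += 1
            return self._store[key]
        # Miss: read from the database and remember the value for next time.
        self.misses += 1
        value = self._load(key)
        self._store[key] = value
        return value

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def slow_lookup(key):
    # Stand-in for a database read.
    return f"row-{key}"


cache = SimpleCache(slow_lookup)
for key in [1, 2, 1, 1, 2, 3, 1, 2, 1, 1]:
    cache.get(key)
print(cache.hits, cache.misses, cache.hit_ratio())
```

In this run only the first request for each of the three keys reaches the database; the other seven of the ten requests are served from memory.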

Keep all of the above servers close to each other on the same high-speed LAN so that the network does not become a bottleneck when they communicate.