Queuing systems

From FBSwiki
Revision as of 23:11, 10 October 2009 by Murray (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

A common process in computing systems (and web servers) is managing a queue of requests. Typical queues can either by first-in, first-out (FIFO) or last-in, first-out (LIFO). The most common for web servers is the FIFO queue.



A schematic diagram of a simple queue is shown to the right. Messages arrive at rate and are stored in a queue. Messages are processed and removed from the queue at rate . The average size of the queue is given by math. There can be large variations in arrival rates and service rates, and the queue length builds up when the arrival rate is larger than the service rate. When the queue becomes too large, service is denied using an admission control policy.

The system can be modeled in many different ways. One way is to model each incoming request, which leads to an event-based model where the state is an integer that represents the queue length. The queue changes when a request arrives or a request is serviced. A discrete-time model that captures these dynamics is given by the difference equation


where and are random variables representing incoming and outgoing requests on the queue. These variables take the values 0 or 1 with some probability at each time instant. In many cases it is possible to determine statistics of quantities like queue length and service time, but the computations can be quite complicated. To capture the statistics of the arrival and servicing of messages, we model each of these as a {\em Poisson process} in which the number of events occurring in a fixed time has a given rate, with the specific timing of events independent of the time since the last event. (The details of random processes are beyond the scope of this text, but can be found in standard texts such as [Pit99].)

A significant simplification can be obtained by using a flow model. Instead of keeping track of each request we instead view service and requests as flows, similar to what is done when replacing molecules by a continuum when analyzing fluids. Assuming that the average queue length is a continuous variable and that arrivals and services are flows with rates and , the system can be modeled by the first-order differential equation


where is the maximum service rate and is a number between 0 and 1 that describes the effective service rate as a function of the queue length.

It is natural to assume that the effective service rate depends on the queue length because larger queues require more resources. In steady state we have , and we assume that the queue length goes to zero when goes to zero and that it goes to infinity when goes to 1. This implies that and that . In addition, if we assume that the effective service rate deteriorates monotonically with queue length, then the function is monotone and concave. A simple function that satisfies the basic requirements is , which gives the model (FAQ)


This model was proposed by Agnew~\cite{Agnew76}. It can be shown that if arrival and service processes are Poisson processes, the average queue length is given by equation~\eqref{eq:webserv:webseverdynamics} and that equation~\eqref{eq:webserv:webseverdynamics} is a good approximation even for short queue lengths; see Tipper~\cite{Tip+Sun90}.

To explore the properties of the model we will first investigate the equilibrium value of the queue length when the arrival rate is constant. Setting the derivative to zero in the dynamics and solving for , we find that the queue length approaches the steady-state value


The figure to the right shows the steady-state queue length as a function of , the effective service rate excess. Notice that the queue length increases rapidly as approaches . To have a queue length less than 20 requires . The average time to service a request is , and it increases dramatically as approaches .


The next figure illustrates the behavior of the server in a typical overload situation. The solid line shows a realization of an event-based simulation, and the dashed line shows the behavior of the flow model. The maximum service rate is , and the arrival rate starts at . The arrival rate is increased to at time 20, and it returns to at time 25. The figure shows that the queue builds up quickly and clears very slowly. Since the response time is proportional to queue length, it means that the quality of service is poor for a long period after an overload. This behavior is called the rush-hour effect and has been observed in web servers and many other queuing systems such as automobile traffic.

The dashed line in th figure shows the behavior of the flow model, which describes the average queue length. The simple model captures behavior qualitatively, but there are variations from sample to sample when the queue length is short. It follows from the model that the rate of change is approximately 3 messages/second when the queue length builds up at time , and approximately 0.5 messages/second when the queue length decreases after the build up. The time to return to normal is thus approximately 6 times the overload time.

Admission control

Delay control