# Queuing systems

A common process in computing systems (and web servers) is managing a queue of requests. Typical queues can either by first-in, first-out (FIFO) or last-in, first-out (LIFO). The most common for web servers is the FIFO queue.

## Modeling

A schematic diagram of a simple queue is shown to the right. Messages arrive at rate and are stored in a queue. Messages are processed and removed from the queue at rate . The average size of the queue is given by . There can be large variations in arrival rates and service rates, and the queue length builds up when the arrival rate is larger than the service rate. When the queue becomes too large, service is denied using an admission control policy.

The system can be modeled in many different ways. One way is to model each incoming request, which leads to an event-based model where the state is an integer that represents the queue length. The queue changes when a request arrives or a request is serviced. A discrete-time model that captures these dynamics is given by the difference equation

where and are *random variables*
representing incoming and outgoing requests on the queue. These
variables take the values 0 or 1 with some probability at each time
instant. In many cases it is possible
to determine statistics
of quantities like queue length and service time, but the computations can be
quite complicated.
To capture the statistics of the arrival and servicing of messages,
we model each of these as a {\em Poisson process} in which the
number of events occurring in a fixed time has a given rate, with
the specific timing of events independent of the time since the last
event. (The details of random processes are beyond the scope of this
text, but can be found in standard texts such as [Pit99].)

A significant simplification can be obtained by using a *flow*
model*. Instead of keeping track of*
each request we instead
view service and requests as flows, similar to what is done when
replacing molecules by a continuum when analyzing fluids. Assuming that
the average queue length is a continuous variable and that
arrivals and
services are
flows with rates and , the
system can be modeled by the first-order differential equation

where is the maximum service rate and is a number between 0 and 1 that describes the effective service rate as a function of the queue length.

It is natural to assume that the effective service rate depends on the queue length because larger queues require more resources. In steady state we have , and we assume that the queue length goes to zero when goes to zero and that it goes to infinity when goes to 1. This implies that and that . In addition, if we assume that the effective service rate deteriorates monotonically with queue length, then the function is monotone and concave. A simple function that satisfies the basic requirements is , which gives the model (FAQ)

This model was proposed by Agnew~\cite{Agnew76}. It can be shown that if arrival and service processes are Poisson processes, the average queue length is given by equation~\eqref{eq:webserv:webseverdynamics} and that equation~\eqref{eq:webserv:webseverdynamics} is a good approximation even for short queue lengths; see Tipper~\cite{Tip+Sun90}.

To explore the properties of the model we will first investigate the equilibrium value of the queue length when the arrival rate is constant. Setting the derivative to zero in the dynamics and solving for , we find that the queue length approaches the steady-state value

The figure to the right shows the steady-state queue length as a function of , the effective service rate excess. Notice that the queue length increases rapidly as approaches . To have a queue length less than 20 requires . The average time to service a request is , and it increases dramatically as approaches .

The next figure illustrates the behavior of the server
in a typical overload situation.
The solid line shows a realization of an event-based
simulation, and the dashed line shows the behavior of the flow
model.
The maximum service rate is
, and the arrival rate starts at . The
arrival rate is increased to at time 20, and it returns to
at time 25. The figure shows that the queue builds up
quickly and clears very slowly. Since the response time is
proportional to queue length, it means that the quality of service is
poor for a long period after an overload. This behavior is called the
*rush-hour effect* and has been observed in web servers and
many other queuing systems such as automobile traffic.

The dashed line in th figure shows the behavior of the flow model, which describes the average queue length. The simple model captures behavior qualitatively, but there are variations from sample to sample when the queue length is short. It follows from the model that the rate of change is approximately 3 messages/second when the queue length builds up at time , and approximately 0.5 messages/second when the queue length decreases after the build up. The time to return to normal is thus approximately 6 times the overload time.