Generally, one can think of the "state" of a system as the minimum set of variables
(or pieces of information) required to predict its future evolution. What counts as
the state is usually dictated by the type of mathematical model (finite state machines,
difference equations, or differential equations) used to describe how the system evolves.
The choice of variables to include in the state depends heavily on the fidelity of
the model and on the type of system, so it is not unique. When we talk about physical
systems modeled by differential equations, such as masses and springs, electric circuits,
or satellites (rigid bodies) rotating in space, we can add some intuition: the variables
in the state should be sufficient to specify the energy of the system.
(This is a rule of thumb, not a strict principle!)
For example, take a ball free-falling to Earth: we can specify the position
of the ball by its height h above the ground, but we also need to include
the velocity of the ball, dh/dt, to specify its total energy
(E = 1/2*m*(dh/dt)^2 + m*g*h, where m is the mass of the ball and g is the
gravitational acceleration). Therefore, the state of the ball is (h, dh/dt).
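As a rough illustration, the sketch below treats the pair (h, dh/dt) as the state of the falling ball and advances it in time. It assumes g = 9.81 m/s^2, ignores air drag and the impact with the ground, and uses hypothetical names (BallState, step, energy, simulate) chosen only for this example. It shows two points from the text: the energy can be computed from the state alone, and knowing h alone is not enough, since two balls at the same height but with different velocities follow different futures.

```python
from dataclasses import dataclass

G = 9.81  # assumed gravitational acceleration, m/s^2


@dataclass(frozen=True)
class BallState:
    """Hypothetical state container for the falling ball."""
    h: float  # height above the ground, m
    v: float  # vertical velocity dh/dt, m/s (positive = upward)


def step(s: BallState, dt: float) -> BallState:
    """Advance the state by dt seconds under constant gravity.

    For constant acceleration this update is exact, so no integration
    error accumulates. Drag and ground contact are ignored (assumption).
    """
    return BallState(h=s.h + s.v * dt - 0.5 * G * dt**2, v=s.v - G * dt)


def energy(s: BallState, m: float) -> float:
    """Total energy E = 1/2*m*(dh/dt)^2 + m*g*h, computed from the state only."""
    return 0.5 * m * s.v**2 + m * G * s.h


def simulate(s: BallState, dt: float, n: int) -> list:
    """Return the trajectory [s0, s1, ..., sn] produced by n steps of size dt."""
    trajectory = [s]
    for _ in range(n):
        s = step(s, dt)
        trajectory.append(s)
    return trajectory


if __name__ == "__main__":
    # Same height, different velocity: the futures differ, so h alone is not a state.
    at_rest = simulate(BallState(h=10.0, v=0.0), dt=0.1, n=10)[-1]
    thrown_up = simulate(BallState(h=10.0, v=5.0), dt=0.1, n=10)[-1]
    print(at_rest.h, thrown_up.h)
    # Energy computed from the state is conserved along the trajectory.
    print(energy(BallState(10.0, 0.0), m=2.0), energy(at_rest, m=2.0))
```

Because the update depends only on (h, v) and dt, the current state really is all that is needed to predict every later state in this idealized model.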
In a second example, consider a cup of hot coffee. If we were interested only
in a low-fidelity model, we could take the average temperature of the liquid
as the state, since the total energy of the system is thermal in nature.
This model would be adequate for answering questions like "Is this coffee
too hot to drink?" However, if we wanted to answer more complicated questions,
such as "What happens to the vorticity in the coffee as the system cools down?",
we would need to include a description of the motion of the fluid in the model,
and the state would need to be expanded to include the position and velocity
of the fluid particles in the coffee.
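The low-fidelity coffee model can be sketched with a single state variable, the average temperature T. The text does not say which cooling law to use; the sketch below assumes Newton's law of cooling, dT/dt = -k*(T - T_env), whose exact solution is T(t) = T_env + (T0 - T_env)*exp(-k*t). The rate constant k, the room temperature T_env, and the "drinkable" threshold are hypothetical parameters, as are the function names temperature and time_until.

```python
import math


def temperature(t: float, T0: float, T_env: float, k: float) -> float:
    """Average coffee temperature t seconds after it was T0.

    Assumes Newton's law of cooling, dT/dt = -k * (T - T_env), with a
    hypothetical rate constant k (1/s) and room temperature T_env.
    """
    return T_env + (T0 - T_env) * math.exp(-k * t)


def time_until(T_target: float, T0: float, T_env: float, k: float) -> float:
    """Seconds until the coffee cools from T0 to T_target under the same model."""
    if T0 <= T_target:
        return 0.0  # already cool enough
    if T_target <= T_env:
        return math.inf  # the coffee never cools below room temperature
    return math.log((T0 - T_env) / (T_target - T_env)) / k


if __name__ == "__main__":
    T_env, k = 20.0, 1.0 / 600.0  # hypothetical room temperature (deg C) and rate (1/s)
    # "Is this coffee too hot to drink?" with an assumed 60 deg C threshold.
    print(time_until(60.0, T0=90.0, T_env=T_env, k=k))
    # The current temperature is a complete state: cooling for 200 s and then
    # another 300 s gives the same result as cooling for 500 s in one go.
    T_mid = temperature(200.0, 90.0, T_env, k)
    print(temperature(300.0, T_mid, T_env, k), temperature(500.0, 90.0, T_env, k))
```

Note that this model has nothing to say about vorticity: the single number T carries no information about how the fluid is moving, which is exactly why the higher-fidelity question needs a larger state.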