
Re: Very high loads



The load average is simply the number of jobs in the run queue,
averaged over time (uptime reports 1, 5 and 15 minute figures).
I've had programmers panic that the LA was 12.  I mentioned it
was a 12 CPU machine so really not a worry.  Then they panicked
because we need more CPUs.  I should just take "uptime" away
from them.
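
A quick way to put the number in context is to divide it by the CPU
count.  The sketch below is only illustrative: per_cpu_load is a made-up
helper name, and the script part assumes a Unix-like system where
Python's os.getloadavg() is available.

```python
import os


def per_cpu_load(load_average, cpu_count):
    """Return the load average divided by the number of CPUs.

    Hypothetical helper: a result near 1.0 means roughly one runnable
    job per CPU, which is busy but not necessarily overloaded.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages.
    one_min, five_min, fifteen_min = os.getloadavg()
    # os.cpu_count() can return None, so fall back to 1 CPU.
    cpus = os.cpu_count() or 1
    print(f"1-minute LA {one_min:.2f} on {cpus} CPUs: "
          f"{per_cpu_load(one_min, cpus):.2f} per CPU")
```

With the numbers from the story, per_cpu_load(12.0, 12) comes out to one
job per CPU.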

NFS is sort of unique in that a down NFS server (at least in the
past) put the wait into the kernel.  The process waits for kernel
processing, the kernel waits for the NFS server, the LA goes up.
I've had responsive machines with an LA of 1500.  Fixed the stupid
EMC and it came down to 0.5.
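
To see why the number climbs while the server is down and falls back
afterwards, here is a rough simulation of an exponentially damped
average of the kind used for load averages.  The 5 second sampling
interval, the 60 second window and the function names are assumptions
for illustration; this is not kernel code, and it assumes the stuck
processes are counted toward the load, as described above.

```python
import math

SAMPLE_INTERVAL = 5.0   # seconds between samples (assumed)
WINDOW = 60.0           # averaging window for a 1-minute figure (assumed)
DECAY = math.exp(-SAMPLE_INTERVAL / WINDOW)


def update_load(load, counted_jobs):
    """Fold one new sample of the job count into the damped average."""
    return load * DECAY + counted_jobs * (1.0 - DECAY)


def simulate(load, counted_jobs, seconds):
    """Hold the job count constant for `seconds` and return the new average."""
    steps = int(seconds / SAMPLE_INTERVAL)
    for _ in range(steps):
        load = update_load(load, counted_jobs)
    return load


if __name__ == "__main__":
    # 1500 processes stuck behind a dead NFS server for ten minutes.
    stuck = simulate(0.0, 1500, 600)
    # Server fixed: the count drops to about 0.5 for the next ten minutes.
    recovered = simulate(stuck, 0.5, 600)
    print(f"while stuck: {stuck:.1f}, after recovery: {recovered:.2f}")
```

The average approaches whatever the job count has been lately, so a
machine can show a huge LA while its CPUs sit mostly idle.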

Typically IO waits (usually disk) are an entirely different state.
Really busy mail machines might run a load of 1.0 or less. But
iostat will show the disks just CRANKING.  I get too many clients
who want 4-way machines but then buy cheap disk.  It's hard to
tell them to spend $30k on disk and $4k on the computer when they
have system admins who mostly do databases and the like.

Read up on The Design and Implementation of the 4.4BSD Operating
System for decent kernel detail (McKusick, Bostic, et al.).

Quoting Berk Demir (bdd_(_at_)_ieee_(_dot_)_org):
> > Not very accurate.
> >
> > A single process that requires CPU cycles all the time will increase
> > the load by 1, regardless of its priority. A high load average means
> > that there are many processes asking for CPU cycles.
> 
> This is obvious but when a high priority process is getting the bigger CPU
> share, many processes will become ready to run. Then this increases the
> load average. The high priority process is not the direct factor for load
> average but it's the cause.
> 
> > A process can't be in the run queue and waiting for I/O at the same
> > time. Processes in the run queue are by definition runnable.
> 
> "Blocked for I/O" was the wrong choice of words. Sorry...
> I mean processes waiting for I/O in a *busy waiting* state.
> This is how "select" works. This is also the case when trying to access
> an NFS share that is not responding.
> As we can see from Ajai Khattri's top output, there are many processes
> waiting on "select".
> 
> Hope I'm correct now.
> 
> Regards,
> 
> Berk Demir