[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

partial fix for kernel/1065



I'd like getrusage() to return nondecreasing values. It doesn't now:
this is one aspect of the bug kernel/1065.

The patch below is a partial fix for the kernel/1065 problem. With the
patch, the "system" and "user" components of the time are still
slightly non-monotonic, but at least their sum is monotonic. This is
useful so that functions like ANSI Common Lisp's get-internal-run-time
  <http://www.lisp.org/HyperSpec/Body/fun_get-internal-run-time.html>
can be implemented without giving pathological results saying "you've
used a negative amount of CPU time since the last time you checked".
The patch also fixes what looked to me like some smaller problems with
calcru().

Here's what I did:
  * I rewrote calcru() in several ways:
    ** computed returned "system" time from the sum of the 
       p_sticks value and the p_iticks value, not just the p_sticks
       value alone, in order to make the total returned time monotonic
    ** wrapped splstatclock()/splx() around more code, including
       the sequence
		microtime(&tv);
		sec += tv.tv_sec - runtime.tv_sec;
		usec += tv.tv_usec - runtime.tv_usec;
       since it looked to me as though otherwise that code was
       asking for trouble (and since I figured it's OK to do this
       since some similar code in mi_switch() wraps splstatclock()
       around a call to microtime() and accesses to the "runtime" var)
    ** reorganized the operations so that control returns only from 
       the end of the function, so that (now that splstatclock() covers
       more code) I wouldn't have to put in a different splx()
       call for each possible way to return from the function
    ** smoothed out the reported values around process startup (when
       p_sticks, p_uticks, and p_iticks are all zero)
    ** used more-descriptive variable names 
    ** deleted the never-used fourth argument (and modified calls to
       calcru() calls so that they no longer pass NULL there)
As far as I know, the only questionable thing here is the change in
handling of p_iticks. To me, including p_iticks in the "system" time
seems sensible. I, at least, was surprised when I found that the old
getrusage() wasn't reporting all the time spent on a process. This
change also has the advantage that it allows the getrusage() result to
be more nearly monotonic. But the p_iticks value is, ahem, rather
sparingly documented in the sources, and I didn't find any useful
information elsewhere. (I tried Google, and McKusick et al.) So it
would probably be good for someone who's in touch with the project's
secret oral traditions to review the way that I've included p_iticks
in the reported "system" time, to make sure that it's OK.

Something that I didn't do, but that I'd be happy to do if it's
considered a good idea, is:
  * Document the remaining non-monotonicity in the BUGS section of
    the getrusage(2) man page. Perhaps
	The individual system-time and user-time values do not
	necessarily increase monotonically, due to round-off error
	in computation of the ratio between user time and system
	time. Thus, successive system-time and user-time values can 
	occasionally decrease by an amount on the order of the 
	system scheduler tick time. This problem washes out in
	the sum of the two values; i.e., the sum never decreases.
	(This problem has the bug database ID kernel/1065.)

If this fix isn't acceptable, and no alternate fix is applied soon,
then
  * Suggestions for an appropriate way to implement Common Lisp's
    get-internal-run-time on OpenBSD would be appreciated..
  * I think the BUGS section of the getrusage(2) man page should
    probably mention kernel/1065, since it's been several releases
    since the problem was reported.

Snip below to get
  * a test case to illustrate the problem fixed by the patch
  * the patch

I'd appreciate it if you'd notify me by hand on changes related to this
problem, since as of June 13, Todd Miller wrote me that
  Hmm, it would appear that GNATS has a bogus ',' in the Resent-Reply-To
  field; it looks like a bug in the code.  For the time being I would suggest
  unsubscribing from the bugs list since you are bouncing all messages
  from the bug tracking system anyway.  I'll look into updating that
  stuff to a (hopefully) less buggy version.
and I don't think the problem has been fixed yet. (But it's not
critical, I'll check the database by hand too.)

-- 
William Harold Newman <william_(_dot_)_newman_(_at_)_airmail_(_dot_)_net>
PGP key fingerprint 85 CE 1C BA 79 8D 51 8C  B9 25 FB EE E0 C3 E5 7C
-----------------------------------------------------------------------
a test case which mocks up the ANSI Common Lisp function
get-internal-run-time to illustrate the problem fixed by this patch
--
/* OpenBSD's getrusage() times are not monotonic (bug kernel/1065).
 * Even their sum isn't monotonic, so Lisp get-internal-run-time isn't
 * monotonic either.
 *
 * (Running this program for 10 seconds or so on my 700 MHz P3 laptop,
 * or sometimes longer if there isn't much activity to generate
 * interrupts and context switches, gives the the "huh?" message.) */

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

#define INTERNAL_TIME_UNITS_PER_SECOND 100
#define USEC_PER_SECOND 1000000
#define USEC_PER_INTERNAL_TIME_UNIT \
  (USEC_PER_SECOND / INTERNAL_TIME_UNITS_PER_SECOND)

int
main(int argc, char *argv[])
{
    int last_internal_run_time = -1;
    int last_utime_sec, last_utime_usec;
    int last_stime_sec, last_stime_usec;
    struct rusage rusage;

    while (1) {

	int status = getrusage(RUSAGE_SELF, &rusage);
	
	int utime_sec  = rusage.ru_utime.tv_sec;
	int utime_usec = rusage.ru_utime.tv_usec;

	int stime_sec  = rusage.ru_stime.tv_sec;
	int stime_usec = rusage.ru_stime.tv_usec;

	/* GET-INTERNAL-RUN-TIME is an ANSI Common Lisp function. This
	 * is essentially how it's implemented in SBCL. */
	int internal_run_time =
	    INTERNAL_TIME_UNITS_PER_SECOND * (utime_sec + stime_sec) +
	    (utime_usec + stime_usec) / USEC_PER_INTERNAL_TIME_UNIT;

	if (internal_run_time < last_internal_run_time) {
	    fprintf(stderr,
		    "huh?\n"
		    "            %6s %6s\n"
		    "utime_sec:  %6d %6d\n"
		    "utime_usec: %6d %6d\n"
		    "stime_sec:  %6d %6d\n"
		    "stime_usec: %6d %6d\n"
		    "run_time:   %6d %6d\n",
		    "this", "last",
		    utime_sec, last_utime_sec,
		    utime_usec, last_utime_usec,
		    stime_sec, last_stime_sec,
		    stime_usec, last_stime_usec,
		    internal_run_time, last_internal_run_time);
	    exit(1);
	} else {
	    last_utime_sec = utime_sec;
	    last_utime_usec = utime_usec;
	    last_stime_sec = stime_sec;
	    last_stime_usec = stime_usec;
	    last_internal_run_time = internal_run_time;
	}
    }
}
-----------------------------------------------------------------------
the patch ("cvs diff -c" against the 2.9 patch branch 23 June 2001)
--
Index: sys/compat/hpux/hpux_compat.c
===================================================================
RCS file: /cvs/src/sys/compat/hpux/hpux_compat.c,v
retrieving revision 1.12
diff -c -r1.12 hpux_compat.c
*** sys/compat/hpux/hpux_compat.c	2000/08/24 10:41:51	1.12
--- sys/compat/hpux/hpux_compat.c	2001/06/23 17:01:26
***************
*** 1351,1357 ****
  	struct tms atms;
  	int error;
  
! 	calcru(p, &ru, &rs, NULL);
  	atms.tms_utime = hpux_scale(&ru);
  	atms.tms_stime = hpux_scale(&rs);
  	atms.tms_cutime = hpux_scale(&p->p_stats->p_cru.ru_utime);
--- 1351,1357 ----
  	struct tms atms;
  	int error;
  
! 	calcru(p, &ru, &rs);
  	atms.tms_utime = hpux_scale(&ru);
  	atms.tms_stime = hpux_scale(&rs);
  	atms.tms_cutime = hpux_scale(&p->p_stats->p_cru.ru_utime);
Index: sys/compat/linux/linux_misc.c
===================================================================
RCS file: /cvs/src/sys/compat/linux/linux_misc.c,v
retrieving revision 1.35
diff -c -r1.35 linux_misc.c
*** sys/compat/linux/linux_misc.c	2001/04/02 21:43:11	1.35
--- sys/compat/linux/linux_misc.c	2001/06/23 17:01:28
***************
*** 722,728 ****
  	struct rusage ru;
  	int error, s;
  
! 	calcru(p, &ru.ru_utime, &ru.ru_stime, NULL);
  	ltms.ltms_utime = CONVTCK(ru.ru_utime);
  	ltms.ltms_stime = CONVTCK(ru.ru_stime);
  
--- 722,728 ----
  	struct rusage ru;
  	int error, s;
  
! 	calcru(p, &ru.ru_utime, &ru.ru_stime);
  	ltms.ltms_utime = CONVTCK(ru.ru_utime);
  	ltms.ltms_stime = CONVTCK(ru.ru_stime);
  
Index: sys/compat/sunos/sunos_misc.c
===================================================================
RCS file: /cvs/src/sys/compat/sunos/sunos_misc.c,v
retrieving revision 1.25
diff -c -r1.25 sunos_misc.c
*** sys/compat/sunos/sunos_misc.c	2001/01/26 05:30:18	1.25
--- sys/compat/sunos/sunos_misc.c	2001/06/23 17:01:31
***************
*** 1214,1220 ****
  	struct rusage ru, *rup;
  
  	/* RUSAGE_SELF */
! 	calcru(p, &ru.ru_utime, &ru.ru_stime, NULL);
  	tms.tms_utime = CONVTCK(ru.ru_utime);
  	tms.tms_stime = CONVTCK(ru.ru_stime);
  
--- 1214,1220 ----
  	struct rusage ru, *rup;
  
  	/* RUSAGE_SELF */
! 	calcru(p, &ru.ru_utime, &ru.ru_stime);
  	tms.tms_utime = CONVTCK(ru.ru_utime);
  	tms.tms_stime = CONVTCK(ru.ru_stime);
  
Index: sys/kern/kern_acct.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_acct.c,v
retrieving revision 1.9
diff -c -r1.9 kern_acct.c
*** sys/kern/kern_acct.c	2000/05/05 08:38:23	1.9
--- sys/kern/kern_acct.c	2001/06/23 17:01:55
***************
*** 194,200 ****
  	bcopy(p->p_comm, acct.ac_comm, sizeof acct.ac_comm);
  
  	/* (2) The amount of user and system time that was used */
! 	calcru(p, &ut, &st, NULL);
  	acct.ac_utime = encode_comp_t(ut.tv_sec, ut.tv_usec);
  	acct.ac_stime = encode_comp_t(st.tv_sec, st.tv_usec);
  
--- 194,200 ----
  	bcopy(p->p_comm, acct.ac_comm, sizeof acct.ac_comm);
  
  	/* (2) The amount of user and system time that was used */
! 	calcru(p, &ut, &st);
  	acct.ac_utime = encode_comp_t(ut.tv_sec, ut.tv_usec);
  	acct.ac_stime = encode_comp_t(st.tv_sec, st.tv_usec);
  
Index: sys/kern/kern_exit.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_exit.c,v
retrieving revision 1.30
diff -c -r1.30 kern_exit.c
*** sys/kern/kern_exit.c	2001/04/02 21:43:11	1.30
--- sys/kern/kern_exit.c	2001/06/23 17:01:55
***************
*** 256,262 ****
  	 */
  	p->p_xstat = rv;
  	*p->p_ru = p->p_stats->p_ru;
! 	calcru(p, &p->p_ru->ru_utime, &p->p_ru->ru_stime, NULL);
  	ruadd(p->p_ru, &p->p_stats->p_cru);
  
  	/*
--- 256,262 ----
  	 */
  	p->p_xstat = rv;
  	*p->p_ru = p->p_stats->p_ru;
! 	calcru(p, &p->p_ru->ru_utime, &p->p_ru->ru_stime);
  	ruadd(p->p_ru, &p->p_stats->p_cru);
  
  	/*
Index: sys/kern/kern_resource.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_resource.c,v
retrieving revision 1.12
diff -c -r1.12 kern_resource.c
*** sys/kern/kern_resource.c	2000/05/05 08:34:18	1.12
--- sys/kern/kern_resource.c	2001/06/23 17:01:55
***************
*** 348,379 ****
   * system, and interrupt time usage.
   */
  void
! calcru(p, up, sp, ip)
  	register struct proc *p;
! 	register struct timeval *up;
! 	register struct timeval *sp;
! 	register struct timeval *ip;
  {
! 	register u_quad_t u, st, ut, it, tot;
  	register long sec, usec;
  	register int s;
  	struct timeval tv;
  
  	s = splstatclock();
- 	st = p->p_sticks;
- 	ut = p->p_uticks;
- 	it = p->p_iticks;
- 	splx(s);
- 
- 	tot = st + ut + it;
- 	if (tot == 0) {
- 		up->tv_sec = up->tv_usec = 0;
- 		sp->tv_sec = sp->tv_usec = 0;
- 		if (ip != NULL)
- 			ip->tv_sec = ip->tv_usec = 0;
- 		return;
- 	}
- 
  	sec = p->p_rtime.tv_sec;
  	usec = p->p_rtime.tv_usec;
  	if (p == curproc) {
--- 348,368 ----
   * system, and interrupt time usage.
   */
  void
! calcru(p, utimeval, stimeval)
  	register struct proc *p;
! 	register struct timeval *utimeval;
! 	register struct timeval *stimeval;
  {
! 	register u_quad_t sticks, uticks, iticks, totalticks;
  	register long sec, usec;
  	register int s;
+ 	register u_quad_t totalusec, uusec, susec;
  	struct timeval tv;
  
+ 	/*
+ 	 * Gather ye raw time data while the SPL is high.
+ 	 */
  	s = splstatclock();
  	sec = p->p_rtime.tv_sec;
  	usec = p->p_rtime.tv_usec;
  	if (p == curproc) {
***************
*** 386,405 ****
  		sec += tv.tv_sec - runtime.tv_sec;
  		usec += tv.tv_usec - runtime.tv_usec;
  	}
! 	u = (u_quad_t) sec * 1000000 + usec;
! 	st = (u * st) / tot;
! 	sp->tv_sec = st / 1000000;
! 	sp->tv_usec = st % 1000000;
! 	ut = (u * ut) / tot;
! 	up->tv_sec = ut / 1000000;
! 	up->tv_usec = ut % 1000000;
! 	if (ip != NULL) {
! 		it = (u * it) / tot;
! 		ip->tv_sec = it / 1000000;
! 		ip->tv_usec = it % 1000000;
  	}
  }
  
  /* ARGSUSED */
  int
  sys_getrusage(p, v, retval)
--- 375,434 ----
  		sec += tv.tv_sec - runtime.tv_sec;
  		usec += tv.tv_usec - runtime.tv_usec;
  	}
! 	sticks = p->p_sticks;
! 	uticks = p->p_uticks;
! 	iticks = p->p_iticks;
! 	splx(s);
! 
! 	/*
! 	 * We need a nonzero value of totalticks in order to avoid
! 	 * division by zero when partitioning p->p_rtime into "system"
! 	 * and "user" time, below. For very very young processes, we
! 	 * can't get this value from p_sticks, p_uticks, and p_iticks,
! 	 * since they're all zero, so instead we fake it.
! 	 */
! 	totalticks = sticks + uticks + iticks;
! 	if (totalticks == 0) {
! 		/*
! 		 * (The rationale for dividing time equally between
! 		 * system and user here is that it minimizes the
! 		 * maximum possible discontinuity in return values
! 		 * when the first genuine tick occurs.)
! 		 */
! 		sticks = uticks = 1;
! 		totalticks = sticks + uticks + iticks;
  	}
+ 
+ 	/*
+ 	 * Partition totalusec in proportion to the number of ticks.
+ 	 * Note that at the level of detail we report from this
+ 	 * routine, both the p_sticks and p_iticks values are
+ 	 * considered "system" time. If we don't lump them together
+ 	 * that way, the p_iticks value becomes invisible to the user,
+ 	 * but still contributes to discretization error, with the
+ 	 * unfortunate and surprising result that the sum of reported
+ 	 * "system" and "user" times can't be monotonic.
+ 	 *
+ 	 * (bug kernel/1065: It's also unfortunate and surprising that
+ 	 * in the current computation the reported "system" and "user"
+ 	 * times are individually non-monotonic. Discretization error
+ 	 * on the ratio of ticks can cause one to increase at the
+ 	 * other's expense by on the order of one tick, i.e. many
+ 	 * microseconds. But we can't solve that problem at this
+ 	 * level: to solve it, we'd need to change the way that time
+ 	 * is recorded, e.g. splitting p_rtime into p_system_rtime and
+ 	 * p_user_rtime. Meanwhile, at least this way the sum is
+ 	 * monotonic.) */
+ 	totalusec = (u_quad_t) sec * 1000000 + usec;
+ 	uusec = (totalusec * uticks) / totalticks;
+ 	susec = totalusec - uusec;
+ 	stimeval->tv_sec = susec / 1000000;
+ 	stimeval->tv_usec = susec % 1000000;
+ 	utimeval->tv_sec = uusec / 1000000;
+ 	utimeval->tv_usec = uusec % 1000000;
  }
  
+ 
  /* ARGSUSED */
  int
  sys_getrusage(p, v, retval)
***************
*** 417,423 ****
  
  	case RUSAGE_SELF:
  		rup = &p->p_stats->p_ru;
! 		calcru(p, &rup->ru_utime, &rup->ru_stime, NULL);
  		break;
  
  	case RUSAGE_CHILDREN:
--- 446,452 ----
  
  	case RUSAGE_SELF:
  		rup = &p->p_stats->p_ru;
! 		calcru(p, &rup->ru_utime, &rup->ru_stime);
  		break;
  
  	case RUSAGE_CHILDREN:
Index: sys/kern/tty.c
===================================================================
RCS file: /cvs/src/sys/kern/tty.c,v
retrieving revision 1.43
diff -c -r1.43 tty.c
*** sys/kern/tty.c	2001/03/02 08:04:04	1.43
--- sys/kern/tty.c	2001/06/23 17:01:56
***************
*** 2098,2104 ****
  		    pick->p_stat == SRUN ? "running" :
  		    pick->p_wmesg ? pick->p_wmesg : "iowait");
  
! 		calcru(pick, &utime, &stime, NULL);
  
  		/* Round up and print user time. */
  		utime.tv_usec += 5000;
--- 2098,2104 ----
  		    pick->p_stat == SRUN ? "running" :
  		    pick->p_wmesg ? pick->p_wmesg : "iowait");
  
! 		calcru(pick, &utime, &stime);
  
  		/* Round up and print user time. */
  		utime.tv_usec += 5000;
Index: sys/miscfs/procfs/procfs_status.c
===================================================================
RCS file: /cvs/src/sys/miscfs/procfs/procfs_status.c,v
retrieving revision 1.4
diff -c -r1.4 procfs_status.c
*** sys/miscfs/procfs/procfs_status.c	2000/12/18 18:44:28	1.4
--- sys/miscfs/procfs/procfs_status.c	2001/06/23 17:02:12
***************
*** 129,135 ****
  		snprintf(ps, sizeof(ps), " -1,-1");
  	COUNTORCAT(s, l, ps, n);
  
! 	calcru(p, &ut, &st, (void *) 0);
  	snprintf(ps, sizeof(ps), " %ld,%ld %ld,%ld",
  	    ut.tv_sec, ut.tv_usec, st.tv_sec, st.tv_usec);
  	COUNTORCAT(s, l, ps, n);
--- 129,135 ----
  		snprintf(ps, sizeof(ps), " -1,-1");
  	COUNTORCAT(s, l, ps, n);
  
! 	calcru(p, &ut, &st);
  	snprintf(ps, sizeof(ps), " %ld,%ld %ld,%ld",
  	    ut.tv_sec, ut.tv_usec, st.tv_sec, st.tv_usec);
  	COUNTORCAT(s, l, ps, n);
Index: sys/sys/resourcevar.h
===================================================================
RCS file: /cvs/src/sys/sys/resourcevar.h,v
retrieving revision 1.4
diff -c -r1.4 resourcevar.h
*** sys/sys/resourcevar.h	2000/05/05 08:34:18	1.4
--- sys/sys/resourcevar.h	2001/06/23 17:02:23
***************
*** 87,94 ****
  #ifdef _KERNEL
  void	 addupc_intr __P((struct proc *p, u_long pc, u_int ticks));
  void	 addupc_task __P((struct proc *p, u_long pc, u_int ticks));
! void	 calcru __P((struct proc *p, struct timeval *up, struct timeval *sp,
! 	    struct timeval *ip));
  struct plimit *limcopy __P((struct plimit *lim));
  void	limfree __P((struct plimit *));
  
--- 87,93 ----
  #ifdef _KERNEL
  void	 addupc_intr __P((struct proc *p, u_long pc, u_int ticks));
  void	 addupc_task __P((struct proc *p, u_long pc, u_int ticks));
! void	 calcru __P((struct proc *p, struct timeval *up, struct timeval *sp));
  struct plimit *limcopy __P((struct plimit *lim));
  void	limfree __P((struct plimit *));
  



Visit your host, monkey.org