
Re: kernel/1671: Not enough random bytes available (GnuPG & OBSD2.8)

The following reply was made to PR kernel/1671; it has been noted by GNATS.

From: Brad Allen <Ulmo_(_at_)_Q_(_dot_)_Net>
To: hugh_(_at_)_openbsd_(_dot_)_org
Cc: Ulmo_(_at_)_Q_(_dot_)_Net
Subject: Re: kernel/1671: Not enough random bytes available (GnuPG &
Date: Tue, 13 Feb 2001 08:14:04 -0400 (AST)

 From: Hugh Graham <hugh_(_at_)_openbsd_(_dot_)_org>
 Subject: Re: kernel/1671: Not enough random bytes available (GnuPG & OBSD2.8)
 Date: Sat, 10 Feb 2001 17:47:32 -0800
 Message-ID: <20010210174732_(_dot_)_A15287_(_at_)_argus_(_dot_)_oxide_(_dot_)_org>
 hugh> On Sat, Feb 10, 2001 at 05:50:02PM -0700, Brad Allen wrote:
 hugh> >  
 hugh> >  BTW, the problem and fix *might* be related to the RTC (Real Time
 hugh> >  Clock).  I am preparing a large bug report, and want more time to
 hugh> >  polish it off.  Just in case it gets lost in time, I will give you
 hugh> >  this patch as a clue as to what I think is a clue, and tell you I
 hugh> >  haven't had this problem *yet*, but the last time I had it it was
 hugh> >  after a few weeks of uptime and I still don't know what the trigger
 hugh> >  is:
 hugh> >  
 hugh> I suspected settimeofday() when I noticed a machine running ntpd
 hugh> lost statclock every six months or so, but unfortunately it's
 hugh> mission critical and I wasn't able to dig further than writing some
 hugh> code to reliably reproduce the bug.
 hugh> This program does reliably recreate the problem on some PC's, while
 hugh> others escape completely unaffected. Also, 65535 iterations is very
 hugh> enthusiastic.. usually it takes a lot less. It was passed around
 hugh> OpenBSD a few months ago, but either no one had a machine with the
 hugh> problem, or no one had enough time to look further.
 hugh> /Hugh
 hugh> ===================================================================
 hugh> #include <sys/types.h>
 hugh> #include <sys/time.h>
 hugh> #include <unistd.h>
 hugh> #include <stdlib.h>
 hugh> #include <stdio.h>
 hugh> int main() {
 hugh> 	int i;
 hugh> 	signed short wiggler;
 hugh> 	struct timeval tvn;
 hugh> 	if (gettimeofday(&tvn, 0) != 0)
 hugh> 		exit(1);
 hugh> 	srandom((tvn.tv_sec * getpid()) ^ tvn.tv_usec);
 hugh> 	for (i = 0; i < 65535; ++i) {
 hugh> 		wiggler = random();
 hugh> 		if (gettimeofday(&tvn, 0) != 0)
 hugh> 			exit(2);
 hugh> 		printf("current time: %ld %9ld wiggle: %d\n",
 hugh> 		    tvn.tv_sec, tvn.tv_usec, wiggler);
 hugh> 		tvn.tv_usec += wiggler;
 hugh> 		if (tvn.tv_usec < 0) {
 hugh> 			--tvn.tv_sec;
 hugh> 			tvn.tv_usec = 1000000 + tvn.tv_usec;
 hugh> 		} else if (tvn.tv_usec > 999999) {
 hugh> 			++tvn.tv_sec;
 hugh> 			tvn.tv_usec -= 1000000;
 hugh> 		}
 hugh> 		if (settimeofday(&tvn, 0) != 0)
 hugh> 			exit(3);
 hugh> 		if (gettimeofday(&tvn, 0) != 0)
 hugh> 			exit(4);
 hugh> 		printf("final time:   %ld %9ld\n\n",
 hugh> 		    tvn.tv_sec, tvn.tv_usec);
 hugh> 	}
 hugh> 	exit(0);
 hugh> }
 Uh-oh.  That accurately reproduces the bug here, even with the patched
 kernel which I thought might fix it.
 I still DEFINITELY have the bug:
      The alternate system clock has died!
           Reverting to ``pigs'' display.
 (which also means top shows all-zero CPU states ...)
 load averages:  1.08,  1.02,  0.78                                     07:55:42
 34 processes:  1 running, 33 idle
 CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
 Memory: Real: 29M/62M act/tot  Free: 61M  Swap: 4K/33M used/tot
 ... and ...
 well, uhm, for now, gpg has enough random bytes ... but I bet it is
 just buffered up in the kernel, but it is not producing any more.  So
 I'll loop it.
 no problems so far ...  dd if=/dev/srandom ... no, still can use gpg!
 but third symptom is definitely here:
 *  total system molasses.
 Let's look at ntpq -pn:
 not bad.
 Well, so, let's see if the system really does stop collecting random
 data.  PERHAPS my patch to the kernel will continue collecting random
 data, but not fix the actual bug.  That would be at least nice for me
 ... able to sign this message, for instance.
 Sorry, it's late -- I've been up a long time.  I went to sleep early
 ntpq -pn still not bad ...
 OK!  Now, I had trouble using my mailer, and still don't know the error,
 but I was in the process of rebooting (kill -USR1 1 while in X) when the
 system froze on me and would not respond even to that little kernel
 fault debugger thing ("boot sync" didn't seem to do anything, nor did
 other variations).
 Then, while coming up, my good old Dell system told me:
 CMOS Time & Date Not Set
 I ignored this, and got an interesting quote from "ntpdate" while the
 bootup scripts were setting up for the NTP dragon to start breathing:
 offset -287998.536487s
 What that means to me is that those 3.3333 days were gained (or lost?)
 during this process.  Your program, run for 65535 iterations, only left
 the clock about +5m off, though, and only about 3-4m of that could be
 attributed to the program itself (it ran for less than 5m).
 Right NOW, after this reboot, everything is working fine (top, systat
 vmstat, and ntp).  Let's see if this gets signed & sent now.
 Content-Type: application/pgp-signature
 Content-Transfer-Encoding: 7bit
 Version: GnuPG v1.0.4 (OpenBSD)
 Comment: For info see http://www.gnupg.org