
Re: Is information on the number of downloads available?



Paul Luo Li <lpl_(_at_)_andrew_(_dot_)_cmu_(_dot_)_edu> and
Siju George <sgeorge_(_dot_)_ml_(_at_)_gmail_(_dot_)_com> write:
...
> > For example, if release 3.5 is downloaded 4000 times and 50 bugs are
> > reported against 3.5, 3.6 is downloaded 2000 times and 25 bugs are reported
> > against 3.6, then maybe we can predict how many bugs for release 3.7 by
> > knowing the number of times it is downloaded.
...
> maybe my doubt is silly because I have not considered some other things :))
...

You haven't considered a lot of things.  The number of downloads is
probably a better measure of promotional activities for openbsd than
anything else.  If you are looking for information that can predict bug
frequency, a better measure would be the # and pattern of commits done
by developers.

You won't be able to easily measure total downloads, because there is a
radial pattern of distribution with poor statistical availability in
most cases.  For instance, there are at least 2 distribution sites in
AFS - there is some access information for those volumes, but I doubt
the sites involved made much effort to collect it, and I don't know how
you'd interpret those counts in any case.  There's no simple way to tell
if an AFS access was "somebody downloading all the files" to install
somewhere, or "somebody running grep for no good reason".  Some of
those downloads may be used for many hundreds or even thousands of
installs somewhere, other downloads may never be used.  It is doubtful
most of the people doing these downloads, even those setting up
internal mirrors at large sites, will be at all cooperative in terms of
telling you how they plan to use their download.  There are also people
who install from CDs -- which themselves could be reused again and
again, and people who buy CDs to support openbsd who then install from
ftp.  Final installations from downloads aren't even directly related to
the # of installed systems -- some people do multiple installs onto the
same machine as part of their regular maintenance, other people build
from source in CVS, yet others upgrade only rarely.

Not all installed systems are used the same.  Some people make relaxed
occasional use of their systems.  We've got an openbsd system here at
work for "comparative" purposes which sits idle 100% of the time.
Other people torture their systems -- this can either be a deliberate
attempt to find problems in the system, or heavy use of some sort that
was not necessarily anticipated properly by the developers.

Another problem in looking at long-term use is correcting for unrelated
historical trends.  For instance, suppose there's a windows virus that
creeps through web sites harvesting email addresses.  Such a beast
could easily distort download statistics on openbsd, if only temporarily.
There is almost certainly a different % of people who download and
install from CDs today than there was, say, 3 years ago.  Cheaper
internet bandwidth, or the current economic slump could make
downloading more attractive than CDs.  There could easily be other
changes in user population demographics that could affect reporting
rates as well - the % of users who are also programmers for instance.
An obvious sort of historical change is the range of supported architectures.
That's changed, and that right there has interesting implications,
because not all architectures have the same support or the same
kind of user population.

The # and pattern of commits is likely going to be the best predictor
of bugs.  The # of developers working on code might also be an
interesting variable to look at.  Much as we all hate to admit it, the
most likely place for a bug is in the last change made to a system.
You'll typically see that changes in any complex system are clumped -
either a change will not be complete itself and will need further
refinement to the same subsystem, or a change will cause problems in
other subsystems, possibly even affecting yet others and so propagating
outwards.  Many bugs are also found by developers in the first place,
so it's not even safe to assume that the # of installed systems is
related in any significant way to bug reporting frequency.
The OpenBSD release process is intended to mask some of these issues.
There is a concerted effort near release time to avoid making changes
that might propagate, and to concentrate instead on "stability"
changes.  Soon after a release, the opposite happens; sometimes changes
are made which are known potentially to have an "interesting" impact
on other systems, with the intention of smoking out those problems.
For instance, elf and propolice are two such changes made in the past.
One of the reasons developers often find bugs first is that OpenBSD has
a proactive policy for finding bugs - developers are constantly looking
for new ways to find bugs in old established code thought to be good.
There is also an approval/testing/change process that is intended to
catch many bugs before they even hit CVS, or, failing that, soon after
they reach the snapshots.  And, of course, these
processes being done by humans, they too are prone to their own sorts
of errors.

					-Marcus Watts


