
find -exec surprisingly slow



At 8:31 AM +0930 8/15/04, Paul A. Hoadley wrote:
>Hello,
>
>I'm in the process of cleaning a Maildir full of spam.  It has
>somewhere in the vicinity of 400K files in it.  I started running
>this yesterday:
>
>find . -atime +1 -exec mv {} /home/paulh/tmp/spam/sne/ \;
>
>It's been running for well over 12 hours.  It certainly is
>working---the spams are slowly moving to their new home---but
>it is taking a long time.  It's a very modest system, running
>4.8-R on a P2-350.  I assume this is all overhead for spawning
>a shell and running mv 400K times.

Some of it is that, and some of it is the performance-penalty of
deleting files from a directory which has 400K filenames in it,
only to add the same files into a directory which will eventually
have 400K filenames in it.  Directory adds/deletes are not fast
when a directory has that many filenames.  It is probably even
worse if there are other processes still working on the same
directory (such as sendmail importing more mail).

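Where is '.' in the above `find .' command?  Is it on the same
partition as /home/paulh/tmp/spam/sne/ ?  If it is not, every `mv'
has to copy the file and then delete the original, instead of just
renaming it.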
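
If you are not sure, here is a rough sketch of one way to check.  The
helper name `same_fs' is made up for illustration, and it assumes
that the last field of `df -P' output is the mount point (so mount
points containing spaces would confuse it):

```shell
# Rough check: do two paths live on the same mounted filesystem?
# same_fs is a made-up helper name, not a standard command.
same_fs() {
    # df -P prints a header line plus one line per path;
    # the last field of that line is the mount point.
    a=$(df -P "$1" | awk 'NR==2 {print $NF}')
    b=$(df -P "$2" | awk 'NR==2 {print $NF}')
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example call, using the paths from the original command:
#   same_fs . /home/paulh/tmp/spam/sne/ && echo "same filesystem"
```

Simply running `df' on both paths and comparing the output by eye
tells you the same thing.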

You may find it much faster to do something like:
     mkdir usermail.new
     chown user:group usermail.new
     mv usermail usermail.bigspam
     mv usermail.new usermail
     cd usermail.bigspam
     find . \! -atime +1 -exec mv {} ../usermail \;

My assumption there is that you have a LOT fewer "good files" than
you have "bad files", so there will be fewer files to move.  But I
am also making the assumption that all your files are in a single
directory (and not a tree of directories), which may be a bad
assumption.
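
One way to test the first assumption before moving anything is to
count the files on each side of the cutoff.  This is only a sketch:
it builds a scratch directory with made-up file names so it can be
run safely, and on the real mail directory only the two `find'
lines would be needed.

```shell
# Scratch directory with made-up files, only to illustrate the counting.
demo=$(mktemp -d)
touch -a -t 200408010000 "$demo/old1" "$demo/old2" "$demo/old3"  # access time far in the past
touch "$demo/new1" "$demo/new2"                                  # access time is now

# Files that match -atime +1, and files that do not.
old=$(find "$demo" -type f -atime +1 | wc -l)
recent=$(find "$demo" -type f \! -atime +1 | wc -l)
echo "matched by -atime +1: $old   not matched: $recent"
```

Whichever count is smaller is the set that is cheaper to move.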

>Is there a better way to move all files based on some characteristic
>of their date stamp?  Maybe separating the find and the move, piping
>it through xargs?

The thing to use is the '-J' option of xargs.  That way you can
have the destination-directory be the last argument in the command
that gets executed, and yet you're still moving as many files in
a single `mv' command as possible.  E.g., change my earlier `find'
command to:
     find . \! -atime +1 -print0 | xargs -0J[] mv [] ../usermail

Check the man page for xargs for a description of -J.
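
If your xargs does not have -J, a similar batching effect can be had
from find alone.  The following is a sketch, not something I have
timed: it assumes a find that supports the `-exec ... {} +' form,
and it uses scratch directories with made-up file names so that it
is safe to try.

```shell
# Scratch source and destination directories, for illustration only.
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/a" "$src/b c" "$src/d"

# find hands many file names to each sh invocation.  The small script
# takes the destination as its first argument, shifts it off, and runs
# a single mv for the whole batch with the destination last.
find "$src" -type f -exec sh -c 'dest=$1; shift; mv "$@" "$dest"' sh "$dst" {} +

ls "$dst"
```

As with the xargs version, `mv' is run once per batch of files
instead of once per file.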

-- 
Garance Alistair Drosehn            =   gad_(_at_)_gilead_(_dot_)_netel_(_dot_)_rpi_(_dot_)_edu
Senior Systems Programmer           or  gad_(_at_)_freebsd_(_dot_)_org
Rensselaer Polytechnic Institute    or  drosih_(_at_)_rpi_(_dot_)_edu
