[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: regex
Interesting problem.
To collapse these strings properly, one must check to see whether the
ENTIRE string consists of N repetitions of a pattern. (You can't
collapse it otherwise.) If the length of the string is prime, it can't be
collapsed unless it consists entirely of the same character.
The easiest way to do this is not with regular expressions, but with
an algorithm that works something like this:
1. Set L equal to the length of the string. If L is zero, exit.
2. Set S equal to 1.
3. Check to see if the string consists entirely of its first S
characters, repeated L/S times. If so, return the first S
characters of the string and exit.
4. Increment S.
5. If S is greater than L/2, return the original string and exit.
6. If S doesn't divide L evenly (that is, if L mod S isn't zero),
go to step 4.
7. Go to step 3.
You can play tricks with factorization to avoid the O(L/2) hunt for
factors of L, but it probably isn't worth it unless the strings
you're reducing are huge.
If you use regular expressions, you'll want to use a "non-greedy"
match on the repeating substring (so that you try the smallest
possible one first) and then make sure that the pattern repeats at
least twice and makes up the entire string. In Perl, you might
use something like
s/^([NWES]+?){2,}$/$1/
(The ^ and $ assertions make sure that the ENTIRE string is matched.)
--Brett Glass
At 01:14 PM 1/14/2000 , Trevor R.H. Clarke wrote:
> > in perl I would probably do something like:
> > $str=~ s/N+/N/g;
> > $str=~ s/S+/S/g;
> > $str=~ s/E+/E/g;
> > $str=~ s/W+/W/g;
> >
> > maybe there's a smarter way to do it though.
> >
>
>That doesn't solve the problem....that just recognizes arbritrary strings of
>the same letter. Here are some examples of strings and their simpified
>versions.
>
>SSS = S
>SSSE = SSSE
>EEESEEES = EEES
>ESESESES = ES
>NEESSEESS = NEESSEESS
>------------------
>Trevor R.H. Clarke Computer Science House
>Rochester Institute of Technology Systems Programmer for ISC
>retrev@csh.rit.edu trcsys@rit.edu
>http://www.csh.rit.edu/~retrev/ finger retrev@csh.rit.edu for PGP key
- References:
- Re: regex
- From: CyberPsychotic <fygrave@epr0.org>
- Re: regex
- From: "Trevor R.H. Clarke" <retrev@csh.rit.edu>