[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: regex



Interesting problem.

To collapse these strings properly, one must check to see whether the 
ENTIRE string consists of N repetitions of a pattern. (You can't
collapse it otherwise.) If the length of the string is prime, it can't be 
collapsed unless it consists entirely of the same character.

The easiest way to do this is not with regular expressions, but with
an algorithm that works something like this:

1. Set L equal to the length of the string. If L is zero, exit.
2. Set S equal to 1.
3. Check to see if the string consists entirely of its first S
    characters, repeated L/S times. If so, return the first S 
    characters of the string and exit.
4. Increment S.
5. If S is greater than L/2, return the original string and exit.
6. If S doesn't divide L evenly (that is, if L mod S isn't zero),
    go to step 4.
7. Go to step 3.

You can play tricks with factorization to avoid the O(L/2) hunt for
factors of L, but it probably isn't worth it unless the strings
you're reducing are huge.

If you use regular expressions, you'll want to use a "non-greedy" 
match on the repeating substring (so that you try the smallest 
possible one first) and then make sure that the pattern repeats at
least twice and makes up the entire string. In Perl, you might 
use something like

s/^([NWES]+?){2,}$/$1/

(The ^ and $ assertions make sure that the ENTIRE string is matched.)

--Brett Glass

At 01:14 PM 1/14/2000 , Trevor R.H. Clarke wrote:
   
> >  in perl I would probably do something like:
> >  $str=~ s/N+/N/g; 
> >  $str=~ s/S+/S/g; 
> >  $str=~ s/E+/E/g; 
> >  $str=~ s/W+/W/g; 
> > 
> > maybe there's a smarter way to do it though.
> > 
>
>That doesn't solve the problem....that just recognizes arbritrary strings of
>the same letter. Here are some examples of strings and their simpified
>versions.
>
>SSS       = S
>SSSE      = SSSE
>EEESEEES  = EEES
>ESESESES  = ES
>NEESSEESS = NEESSEESS
>------------------
>Trevor R.H. Clarke                     Computer Science House
>Rochester Institute of Technology      Systems Programmer for ISC
>retrev@csh.rit.edu                     trcsys@rit.edu
>http://www.csh.rit.edu/~retrev/        finger retrev@csh.rit.edu for PGP key