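Regex: Match an egalitarian series

Introduction

I don't see many regex challenges on here, so I would like to offer up this deceptively simple one, which can be done in a number of ways using a number of regex flavours. I hope it gives regex enthusiasts a bit of fun golfing time.

Challenge

The challenge is to match what I have very loosely dubbed an "egalitarian" series: a series of equal numbers of different characters. This is best described with examples.

Match: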

aaabbbccc
xyz 
iillppddff
ggggggoooooollllllffffff
abc
banana

Don't match:

aabc
xxxyyzzz
iilllpppddff
ggggggoooooollllllfff
aaaaaabbbccc
aaabbbc
abbaa
aabbbc

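To generalize, we want to match a subject of the form (c₁)ⁿ(c₂)ⁿ(c₃)ⁿ...(c_k)ⁿ for any list of characters c₁ to c_k, where c_i != c_i+1 for all i, k > 1, and n > 0.

Clarifications:

Input will not be empty.
A character may repeat itself later in the string (e.g. "banana").
k > 1, so there will always be at least 2 different characters in the string.
You can assume that only ASCII characters will be passed as input and that no character will be a line terminator.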
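
As a non-regex reference for the definition above, here is a minimal Python sketch that can be used to check test cases. The function name is_egalitarian is an illustrative assumption and not part of the challenge; it simply splits the string into maximal runs and compares their lengths.

```python
from itertools import groupby

def is_egalitarian(s: str) -> bool:
    """Reference check: at least two runs, and all runs have the same length."""
    # groupby yields maximal runs of identical characters, so adjacent
    # runs always consist of different characters (c_i != c_i+1).
    run_lengths = [len(list(group)) for _, group in groupby(s)]
    # k > 1 means at least two runs; a single shared length means equal n.
    return len(run_lengths) > 1 and len(set(run_lengths)) == 1

matching = ["aaabbbccc", "xyz", "iillppddff",
            "ggggggoooooollllllffffff", "abc", "banana"]
non_matching = ["aabc", "xxxyyzzz", "iilllpppddff", "ggggggoooooollllllfff",
                "aaaaaabbbccc", "aaabbbc", "abbaa", "aabbbc"]
```

Running is_egalitarian over the two lists reproduces the "Match" and "Don't match" examples, which makes it a convenient oracle when testing a candidate regex.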

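Rules

(Thank you to Martin Ender for this excellently stated block of rules)

Your answer should consist of a single regex, without any additional code (except, optionally, a list of regex modifiers required to make your solution work). You must not use features of your language's regex flavour that allow you to invoke code in the hosting language (e.g. Perl's e modifier).

You can use any regex flavour that existed before this challenge, but please specify the flavour.

Do not assume that the regex is anchored implicitly. For example, if you are using Python, assume that your regex is used with re.search and not with re.match. Your regex must match the entire string for valid egalitarian strings and yield no matches for invalid strings. You may use as many capturing groups as you wish.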
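
To make the anchoring rule concrete, here is a small Python sketch. The helper name full_match_via_search and the two toy patterns are assumptions made for illustration; they are not solutions to the challenge.

```python
import re

def full_match_via_search(pattern: str, s: str) -> bool:
    """True if re.search finds a match and that match spans the whole string."""
    m = re.search(pattern, s)
    return m is not None and m.group(0) == s

unanchored = r"b+"      # toy pattern without anchors
anchored = r"^a*b+$"    # toy pattern with explicit anchors

# re.match only tries position 0, so it behaves as if a ^ were present.
assert re.match(unanchored, "abb") is None
# re.search scans forward and finds "bb", which is not the whole string.
assert re.search(unanchored, "abb").group(0) == "bb"
assert not full_match_via_search(unanchored, "abb")
assert full_match_via_search(anchored, "abb")
```

A submission therefore has to carry its own ^ and $ (or equivalent) rather than rely on the calling function.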

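You may assume that the input will always be a string of two or more ASCII characters not containing any line terminators.

This is regex golf, so the shortest regex in bytes wins. If your language requires delimiters (usually /.../) to denote regular expressions, don't count the delimiters themselves. If your solution requires modifiers, add one byte per modifier.

Criteria

This is good ol' fashioned golf, so forget efficiency and just try to make your regex as small as possible.

Please mention which regex flavour you have used and, if possible, include a link to an online demo of your expression in action.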

code-golf string regular-expression

— jaytea

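Is this specifically regex golf? You should probably clarify that, along with the rules for it. Most challenges on this site are golf across assorted programming languages.

— LyricLy

@LyricLy Thanks for the advice! Yes, I would like it to be purely regex, i.e. a single regular expression in a regex flavour of the submitter's choice. Are there any other rules I should be mindful of including?

— jaytea

I don't understand your definition of "egalitarian" such that banana is egalitarian.

— msh210

@msh210 When I came up with the term "egalitarian" to describe the series, I did not consider that I would be allowing characters to be repeated later in the series (such as in "banana" or "aaabbbcccaaa"). I just wanted a term to express the idea that every chunk of repeated characters is the same size. As "banana" has no repeated characters, this definition holds true for it.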

— jaytea

Answers:

.NET flavour, 48 bytes

^(.)\1*((?<=(\5())*(.))(.)(?<-4>\6)*(?!\4|\6))+$

Try it online! (using Retina)

Well, it turns out that not negating the logic is simpler after all. I'm making this a separate answer because the two approaches are completely different.

Explanation

^            # Anchor the match to the beginning of the string.
(.)\1*       # Match the first run of identical characters. In principle, 
             # it's possible that this matches only half, a quarter, an 
             # eighth etc. of the first run, but that won't affect the 
             # result of the match (in other words, if the match fails with 
             # matching this as the entire first run, then backtracking into
             # only matching half of it won't cause the rest of the regex to
             # match either).
(            # Match this part one or more times. Each instance matches one
             # run of identical letters.
  (?<=       #   We start with a lookbehind to record the length
             #   of the preceding run. Remember that the lookbehind
             #   should be read from the bottom up (and so should
             #   my comments).
    (\5())*  #     And then we match all of its adjacent copies, pushing an
             #     empty capture onto stack 4 each time. That means at the
             #     end of the lookbehind, we will have n-1 captures on stack 4,
             #     where n is the length of the preceding run. Due to the 
             #     atomic nature of lookbehinds, we don't have to worry 
             #     about backtracking matching less than n-1 copies here.
    (.)      #     We capture the character that makes up the preceding
             #     run in group 5.
  )
  (.)        #   Capture the character that makes up the next run in group 6.
  (?<-4>\6)* #   Match copies of that character while depleting stack 4.
             #   If the runs are the same length that means we need to be
             #   able to get to the end of the run at the same time we
             #   empty stack 4 completely.
  (?!\4|\6)  #   This lookahead ensures that. If stack 4 is not empty yet,
             #   \4 will match, because the captures are all empty, so the
             #   backreference can't fail. If the stack is empty though,
             #   then the backreference will always fail. Similarly, if we
             #   are not at the end of the run yet, then \6 will match 
             #   another copy of the run. So we ensure that neither \4 nor
             #   \6 are possible at this position to assert that this run
             #   has the same length as the previous one.
)+
$            # Finally, we make sure that we can cover the entire string
             # by going through runs of identical lengths like this.
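
For readers without a .NET engine at hand, the run-by-run logic described above can be sketched procedurally in Python. This is a hand simulation under my own assumptions (the name forward_check and the integer counter standing in for capture stack 4 are illustrative), not a translation of the regex, and it always takes the first run greedily.

```python
def forward_check(s: str) -> bool:
    """Hand simulation of the forward, run-by-run comparison."""
    i = 0
    # (.)\1* : consume the whole first run.
    while i < len(s) and s[i] == s[0]:
        i += 1
    if i == len(s):
        return False  # the (...)+ group needs at least one further run
    while i < len(s):
        # Lookbehind: walk left over the preceding run and count its
        # characters after the first one, giving n - 1.
        prev = s[i - 1]
        j = i - 1
        while j > 0 and s[j - 1] == prev:
            j -= 1
        stack = i - 1 - j   # plays the role of capture stack 4
        cur = s[i]          # (.) captured as group 6
        i += 1
        # (?<-4>\6)* : match copies of cur while popping the stack.
        while stack > 0 and i < len(s) and s[i] == cur:
            i += 1
            stack -= 1
        # (?!\4|\6) : the stack must be empty and the run must be over.
        if stack > 0 or (i < len(s) and s[i] == cur):
            return False
    return True
```

For example, on "aabc" the counter still holds one entry after the single "b", so the check fails at that point, while on "banana" the counter is always empty and every run ends immediately.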

— Martin Ender

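I love that you went back and forth between the two methods! I also thought the negative approach ought to be shorter until I actually tried it and found it much more awkward, even though it feels like it should be simpler. I have 48b in PCRE and 49b in Perl with a completely different method, and with your third method in .NET being the same size, I'd say this is a pretty cool regex challenge :D

— jaytea

@jaytea I'd love to see those. If no one comes up with anything for a week or so, I hope you'll post them yourself. :) And agreed, it's nice that the approaches are so close in byte count.

— Martin Ender

I just might! Also, the Perl one has been golfed down to 46b ;)

— jaytea

So I figured you'd want to see these now! Here is the 48b in PCRE: ((^.|\2(?=.*\4\3)|\4(?!\3))(?=\2*+((.)\3?)))+\3$ I was experimenting with \3* in place of (?!\3) to get it down to 45b, but that fails on "aabbbc". The Perl version is ^((?=(.)\2*(.))(?=(\2(?4)?\3)(?!\3))\2+)+\3+$ and PCRE thinks (\2(?4)?\3) could recurse indefinitely, whereas Perl seems to be a little smarter and more forgiving.

— jaytea

@jaytea Ah, those are really neat solutions. You should really post them as a separate answer. :)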

— Martin Ender 2016

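.NET flavour, 54 bytes

^(?!.*(?<=(\2)*(.))(?!\2)(?>(.)(?<-1>\3)*)(?(1)|\3)).+

Try it online! (using Retina)

I'm fairly sure this is suboptimal, but it's the best I can come up with for balancing groups right now. I have one alternative at the same byte count, which is mostly the same:

^(?!.*(?<=(\3())*(.))(?!\3)(?>(.)(?<-2>\4)*)(\2|\4)).+

Explanation

The main idea is to invert the problem: match non-egalitarian strings and put the whole thing in a negative lookahead to negate the result. The benefit is that we don't have to keep track of n across the entire string to check that all runs have the same length (due to the nature of balancing groups, you usually consume n when checking it). Instead, we just look for a single pair of adjacent runs that don't have the same length. That way, we only need to use n once.

Here is a breakdown of the regex.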

^(?!.*         # This negative lookahead means that we will match
               # all strings where the pattern inside the lookahead
               # would fail if it were used as a regex on its own.
               # Due to the .* that inner regex can match from any
               # position inside the string. The particular position
               # we're looking for is between two runs (and this
               # will be ensured later).

  (?<=         #   We start with a lookbehind to record the length
               #   of the preceding run. Remember that the lookbehind
               #   should be read from the bottom up (and so should
               #   my comments).
    (\2)*      #     And then we match all of its adjacent copies, capturing
               #     them separately in group 1. That means at the
               #     end of the lookbehind, we will have n-1 captures
               #     on stack 1, where n is the length of the preceding
               #     run. Due to the atomic nature of lookbehinds, we
               #     don't have to worry about backtracking matching
               #     less than n-1 copies here.
    (.)        #     We capture the character that makes up the preceding
               #     run in group 2.
  )
  (?!\2)       #   Make sure the next character isn't the same as the one
               #   we used for the preceding run. This ensures we're at a
               #   boundary between runs.
  (?>          #   Match the next stuff with an atomic group to avoid
               #   backtracking.
    (.)        #     Capture the character that makes up the next run
               #     in group 3.
    (?<-1>\3)* #     Match as many of these characters as possible while
               #     depleting the captures on stack 1.
  )
               #   Due to the atomic group, there are two possible
               #   situations that cause the previous quantifier to stop
               #   matching.
               #   Either the run has ended, or stack 1 has been depleted.
               #   If both of those are true, the runs are the same length,
               #   and we don't actually want a match here. But if the runs
               #   are of different lengths then either the run ended but
               #   the stack isn't empty yet, or the stack was depleted but
               #   the run hasn't ended yet.
  (?(1)|\3)    #   This conditional matches these last two cases. If there's
               #   still a capture on stack 1, we don't match anything,
               #   because we know this run was shorter than the previous
               #   one. But if stack 1 is empty, we want to match another copy of
               #   the character in this run to ensure that this run is 
               #   longer than the previous one.
)
.+             # Finally we just match the entire string to comply with the
               # challenge spec.
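
The inverted idea, looking for one offending pair of adjacent runs and negating the result, can be sketched in Python as follows. The function names are illustrative assumptions, and the sketch computes run lengths directly rather than imitating the balancing groups.

```python
from itertools import groupby

def has_unequal_adjacent_runs(s: str) -> bool:
    """True if some pair of neighbouring runs differs in length."""
    lengths = [len(list(group)) for _, group in groupby(s)]
    # Compare each run only with its immediate successor.
    return any(a != b for a, b in zip(lengths, lengths[1:]))

def negated_check(s: str) -> bool:
    """Accept any non-empty string that contains no offending pair."""
    return len(s) > 0 and not has_unequal_adjacent_runs(s)
```

Because only neighbouring runs are compared, no length has to be remembered across the whole string. Note that this sketch accepts a single-run string such as "aaa"; the challenge guarantees k > 1, so that case never appears in valid input.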

— Martin Ender

I tried to make it fail: banana, aba, bbbaaannnaaannnaaa, bbbaaannnaaannnaaaaaa, The Nineteenth Byte, 11, 110, ^(?!.*(?<=(\2)*(.))(?!\2)(?>(.)(?<-1>\3)*)(?(1)|\3)).+, bababa. I'm the one who failed. :( +1

— Erik the Outgolfer

That moment when you finish the explanation and then figure out you can save 1 byte by using the exact opposite approach... I guess I'll post a different answer in a bit... :|

— Martin Ender

@MartinEnder ... and then you realize you can golf this one by 2 bytes, haha :P

— Mr. Xcoder

@Mr.Xcoder It would have to be 7 bytes now, so I hope I'm safe. ;)

— Martin Ender