Is indexOf case-sensitive?

Question 1

Is the indexOf(String) method case-sensitive? If so, is there a case-insensitive version of it?

Question 2

The indexOf() methods are all case-sensitive. You can get a case-insensitive search by converting both strings to the same case (lower or upper) beforehand:

s1 = s1.toLowerCase(Locale.US);
s2 = s2.toLowerCase(Locale.US);
s1.indexOf(s2);
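The explicit Locale.US argument matters because the no-argument toLowerCase() uses the JVM's default locale. Here is a minimal sketch of the difference. The class and helper names are made up for illustration, and Locale.ROOT is my own choice of a locale-neutral setting (Locale.US behaves the same for these inputs).

```java
import java.util.Locale;

class LowerCaseIndexOfDemo {
    // Hypothetical helper: lower-cases both strings with a fixed locale, then searches.
    static int indexOfIgnoreCase(String haystack, String needle) {
        return haystack.toLowerCase(Locale.ROOT).indexOf(needle.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(indexOfIgnoreCase("Hello World!", "WORLD")); // 6

        // Under Turkish rules, 'I' lower-cases to the dotless i (U+0131),
        // so a conversion that follows a Turkish default locale misses "title".
        Locale turkish = Locale.forLanguageTag("tr-TR");
        System.out.println("TITLE".toLowerCase(turkish).indexOf("title"));     // -1
        System.out.println("TITLE".toLowerCase(Locale.ROOT).indexOf("title")); // 0
    }
}
```

Passing a fixed locale keeps the result the same no matter where the code runs.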

Question 3

Is the indexOf(String) method case-sensitive?

Yes, it is case-sensitive:

@Test
public void indexOfIsCaseSensitive() {
    assertTrue("Hello World!".indexOf("Hello") != -1);
    assertTrue("Hello World!".indexOf("hello") == -1);
}

If so, is there a case-insensitive version of it?

No, there isn't. You can convert both strings to lower case before calling indexOf:

@Test
public void caseInsensitiveIndexOf() {
    assertTrue("Hello World!".toLowerCase().indexOf("Hello".toLowerCase()) != -1);
    assertTrue("Hello World!".toLowerCase().indexOf("hello".toLowerCase()) != -1);
}

Question 4

The StringUtils class of the Apache Commons Lang library has an ignore-case method:

indexOfIgnoreCase(CharSequence str, CharSequence searchStr)

Question 5

Yes, indexOf is case-sensitive.

The best way I have found to do a case-insensitive search is:

String original;
int idx = original.toLowerCase().indexOf(someStr.toLowerCase());

That gives you a case-insensitive indexOf().

Question 6

Here is my solution, which does not allocate any heap memory, so it should be significantly faster than most of the other implementations mentioned here.

public static int indexOfIgnoreCase(final String haystack,
                                    final String needle) {
    if (needle.isEmpty() || haystack.isEmpty()) {
        // Fallback to legacy behavior.
        return haystack.indexOf(needle);
    }

    for (int i = 0; i < haystack.length(); ++i) {
        // Early out, if possible.
        if (i + needle.length() > haystack.length()) {
            return -1;
        }

        // Attempt to match substring starting at position i of haystack.
        int j = 0;
        int ii = i;
        while (ii < haystack.length() && j < needle.length()) {
            char c = Character.toLowerCase(haystack.charAt(ii));
            char c2 = Character.toLowerCase(needle.charAt(j));
            if (c != c2) {
                break;
            }
            j++;
            ii++;
        }
        // Walked all the way to the end of the needle, return the start
        // position that this was found.
        if (j == needle.length()) {
            return i;
        }
    }

    return -1;
}

And here are the unit tests that verify correct behavior.

@Test
public void testIndexOfIgnoreCase() {
    assertThat(StringUtils.indexOfIgnoreCase("A", "A"), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("a", "A"), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("A", "a"), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("a", "a"), is(0));

    assertThat(StringUtils.indexOfIgnoreCase("a", "ba"), is(-1));
    assertThat(StringUtils.indexOfIgnoreCase("ba", "a"), is(1));

    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", " Royal Blue"), is(-1));
    assertThat(StringUtils.indexOfIgnoreCase(" Royal Blue", "Royal Blue"), is(1));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "royal"), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "oyal"), is(1));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "al"), is(3));
    assertThat(StringUtils.indexOfIgnoreCase("", "royal"), is(-1));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", ""), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "BLUE"), is(6));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "BIGLONGSTRING"), is(-1));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "Royal Blue LONGSTRING"), is(-1));  
}

Question 7

Yes, it is case-sensitive. You can do a case-insensitive indexOf by converting both the String and the String parameter to upper case before searching.

String str = "Hello world";
String search = "hello";
str.toUpperCase().indexOf(search.toUpperCase());

Note that toUpperCase may not work in some circumstances. For example:

String str = "Feldbergstraße 23, Mainz";
String find = "mainz";
int idxU = str.toUpperCase().indexOf (find.toUpperCase ());
int idxL = str.toLowerCase().indexOf (find.toLowerCase ());

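idxU will be 20, which is wrong. idxL will be 19, which is correct. The cause of the problem is that toUpperCase() converts the "ß" character into two characters, "SS", and this throws the index off.

Consequently, always stick with toLowerCase().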
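The length change is easy to check directly. A small sketch follows. The class and method names are made up for illustration, and Locale.ROOT is my assumption, used to keep the result independent of the default locale.

```java
import java.util.Locale;

class UpperCaseLengthDemo {
    // Hypothetical helpers that mirror the two approaches compared above.
    static int indexViaUpper(String str, String find) {
        return str.toUpperCase(Locale.ROOT).indexOf(find.toUpperCase(Locale.ROOT));
    }

    static int indexViaLower(String str, String find) {
        return str.toLowerCase(Locale.ROOT).indexOf(find.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String str = "Feldbergstra\u00DFe 23, Mainz"; // \u00DF is the sharp s

        System.out.println(str.length());                          // 24
        System.out.println(str.toUpperCase(Locale.ROOT).length()); // 25, sharp s became "SS"
        System.out.println(indexViaUpper(str, "mainz"));           // 20
        System.out.println(indexViaLower(str, "mainz"));           // 19
    }
}
```

The index from the upper-cased copy refers to a string that is one character longer than the original, so it cannot be used safely with substring() on the original.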

Question 8

What are you doing with the index value once it is returned?

If you are using it to manipulate your string, could you not use a regular expression instead?

import static org.junit.Assert.assertEquals;    
import org.junit.Test;

public class StringIndexOfRegexpTest {

    @Test
    public void testNastyIndexOfBasedReplace() {
        final String source = "Hello World";
        final int index = source.toLowerCase().indexOf("hello".toLowerCase());
        final String target = "Hi".concat(source.substring(index
                + "hello".length(), source.length()));
        assertEquals("Hi World", target);
    }

    @Test
    public void testSimpleRegexpBasedReplace() {
        final String source = "Hello World";
        final String target = source.replaceFirst("(?i)hello", "Hi");
        assertEquals("Hi World", target);
    }
}
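If the index itself is still needed, the same case-insensitive matching can be done with java.util.regex. This is only a sketch: the class and helper names are hypothetical, and adding UNICODE_CASE is my own choice so that non-ASCII letters are folded as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexIndexOfDemo {
    // Hypothetical helper: index of the first case-insensitive match, or -1.
    static int indexOfIgnoreCase(String haystack, String needle) {
        // Pattern.quote makes the needle literal text, so "." or "(" are not regex syntax.
        Pattern p = Pattern.compile(Pattern.quote(needle),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(haystack);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfIgnoreCase("Hello World", "WORLD")); // 6
        System.out.println(indexOfIgnoreCase("a.b", "."));             // 1
        System.out.println(indexOfIgnoreCase("Hello World", "bye"));   // -1
    }
}
```

If the same needle is searched many times, compiling the Pattern once and reusing it avoids repeated compilation.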

Question 9

I've just looked at the source. It compares chars, so it is case-sensitive.
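To illustrate the point, here is a deliberately simplified search loop. It is not the JDK's actual code (which is more optimized). It only shows that a plain == on chars can never treat 'd' and 'D' as equal.

```java
class NaiveIndexOfSketch {
    // Simplified illustration only; not the real String.indexOf implementation.
    static int naiveIndexOf(String haystack, String needle) {
        for (int i = 0; i + needle.length() <= haystack.length(); i++) {
            int j = 0;
            // '==' compares the numeric char values, so 'd' (100) differs from 'D' (68).
            while (j < needle.length() && haystack.charAt(i + j) == needle.charAt(j)) {
                j++;
            }
            if (j == needle.length()) {
                return i; // every char of the needle matched starting at i
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(naiveIndexOf("abcDef", "d")); // -1
        System.out.println(naiveIndexOf("abcDef", "D")); // 3
    }
}
```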

Question 10

@Test
public void testIndexofCaseSensitive() {
    TestCase.assertEquals(-1, "abcDef".indexOf("d") );
}

Question 11

Yes, I am fairly sure it is. One way to work around that using the standard library would be:

int index = str.toUpperCase().indexOf("FOO");

Question 12

I had the same problem. I tried regular expressions and the Apache StringUtils.indexOfIgnoreCase method, but both were pretty slow, so I wrote a short method myself:

public static int indexOfIgnoreCase(final String chkstr, final String searchStr, int i) {
    if (chkstr != null && searchStr != null && i > -1) {
        int searchStrLength = searchStr.length();
        if (searchStrLength == 0) {
            // Mirror String.indexOf("", i); without this guard the array access below throws.
            return Math.min(i, chkstr.length());
        }
        // Keep a lower-case and an upper-case copy of the search string.
        char[] searchCharLc = new char[searchStrLength];
        char[] searchCharUc = new char[searchStrLength];
        searchStr.toUpperCase().getChars(0, searchStrLength, searchCharUc, 0);
        searchStr.toLowerCase().getChars(0, searchStrLength, searchCharLc, 0);
        int j = 0; // number of search characters matched so far
        for (int checkStrLength = chkstr.length(); i < checkStrLength; i++) {
            char charAt = chkstr.charAt(i);
            if (charAt == searchCharLc[j] || charAt == searchCharUc[j]) {
                if (++j == searchStrLength) {
                    return i - j + 1;
                }
            } else { // faster than: else if (j != 0) {
                // Mismatch: go back to the start of this attempt (the loop's i++ moves one past it).
                i = i - j;
                j = 0;
            }
        }
    }
    return -1;
}

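According to my tests it is much faster (at least if your searchString is rather short). If you have any suggestions for improvements or find bugs, it would be nice to let me know (since I use this code in an application ;-)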

Question 13

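The first question has already been answered many times. Yes, the String.indexOf() methods are all case-sensitive.

If you need a locale-sensitive indexOf(), you can use a Collator. Depending on the strength value you set, you can get a case-insensitive comparison and also treat accented letters the same as unaccented ones. Here is an example of how to do this: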

private int indexOf(String original, String search) {
    Collator collator = Collator.getInstance();
    collator.setStrength(Collator.PRIMARY);
    for (int i = 0; i <= original.length() - search.length(); i++) {
        if (collator.equals(search, original.substring(i, i + search.length()))) {
            return i;
        }
    }
    return -1;
}

Question 14

Just to sum it up, 3 solutions:

using toLowerCase() or toUpperCase()
using StringUtils of Apache
using regex

Now, what I was wondering is which one is the fastest? I'm guessing, on average, the first one.

Question 15

But it's not hard to write one:

public class CaseInsensitiveIndexOfTest extends TestCase {
    public void testOne() throws Exception {
        assertEquals(2, caseInsensitiveIndexOf("ABC", "xxabcdef"));
    }

    public static int caseInsensitiveIndexOf(String substring, String string) {
        return string.toLowerCase().indexOf(substring.toLowerCase());
    }
}

Question 16

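Converting both strings to lower case is usually not a big deal, but it is slow if some of the strings are long. And if you do this in a loop, it gets really bad. For this reason, I would recommend indexOfIgnoreCase.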

Question 17

static string Search(string factMessage, string b)
{
    int index = factMessage.IndexOf(b, StringComparison.CurrentCultureIgnoreCase);
    if (index == -1)
    {
        return "not matched";
    }

    // Collect characters from the match up to the next space (or the end of the string).
    string line = null;
    int i = index;
    while (i < factMessage.Length && factMessage[i] != ' ')
    {
        line = line + factMessage[i];
        i++;
    }

    return line;
}

Question 18

Here is a version that closely resembles Apache's StringUtils version:

public int indexOfIgnoreCase(String str, String searchStr) {
    return indexOfIgnoreCase(str, searchStr, 0);
}

public int indexOfIgnoreCase(String str, String searchStr, int fromIndex) {
    // /programming/14018478/string-contains-ignore-case/14018511
    if(str == null || searchStr == null) return -1;
    if (searchStr.length() == 0) return fromIndex;  // empty string found; use same behavior as Apache StringUtils
    final int endLimit = str.length() - searchStr.length() + 1;
    for (int i = fromIndex; i < endLimit; i++) {
        if (str.regionMatches(true, i, searchStr, 0, searchStr.length())) return i;
    }
    return -1;
}

Question 19

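I would like to lay claim to the one and only solution posted so far that actually works. :-)

There are three classes of problems that have to be dealt with.

Non-transitive matching rules for lower and upper case. The Turkish I problem has been mentioned frequently in other answers. According to comments in the Android source for String.regionMatches, the Georgian comparison rules require an additional conversion to lower case when comparing for case-insensitive equality.
Cases where the upper-case and lower-case forms have a different number of characters. Pretty much all of the solutions posted so far fail in these cases. Example: German STRASSE vs. Straße are equal when case is ignored, but have different lengths.
Binding strengths of accented characters. Locale and context both affect whether accents match or not. In French, the upper-case form of 'é' is 'E', although there is a movement toward using upper-case accents. In Canadian French, the upper-case form of 'é' is 'É', without exception. Users in both countries would expect "e" to match "é" when searching. Whether accented and unaccented characters match is locale-specific. Now consider: does "E" equal "É"? Yes, it does. In French locales, anyway.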
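The second class of problem (different lengths) can be reproduced with nothing but the standard library. This is a small sketch with made-up class and helper names. Locale.ROOT is my assumption, used to keep the conversion independent of the default locale.

```java
import java.util.Locale;

class LengthChangingCaseDemo {
    // Full-string conversion: the sharp s expands to "SS".
    static boolean equalAfterUpperCasing(String a, String b) {
        return a.toUpperCase(Locale.ROOT).equals(b.toUpperCase(Locale.ROOT));
    }

    // Char-by-char comparison, as used by regionMatches-based searches.
    static boolean prefixMatchesIgnoreCase(String text, String prefix) {
        return text.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    public static void main(String[] args) {
        String lower = "Stra\u00DFe"; // 6 chars, \u00DF is the sharp s
        String upper = "STRASSE";     // 7 chars

        System.out.println(equalAfterUpperCasing(lower, upper));   // true
        System.out.println(lower.equalsIgnoreCase(upper));         // false, lengths differ
        System.out.println(prefixMatchesIgnoreCase(upper, lower)); // false, 'S' never equals the sharp s
    }
}
```

Any search that compares one char against one char cannot treat these two spellings as equal, which is why a collation-based search is needed for such input.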

I am currently using android.icu.text.StringSearch to correctly replace earlier implementations of case-insensitive indexOf operations.

Non-Android users can access the same functionality through the ICU4J package, using the com.ibm.icu.text.StringSearch class.

Be careful to reference classes in the correct icu package (android.icu.text or com.ibm.icu.text), because Android and the JRE both have classes with the same name in other namespaces (e.g. Collator).

    this.collator = (RuleBasedCollator)Collator.getInstance(locale);
    this.collator.setStrength(Collator.PRIMARY);

    ....

    StringSearch search = new StringSearch(
         pattern,
         new StringCharacterIterator(targetText),
         collator);
    int index = search.first();
    if (index != StringSearch.DONE)
    {
        // remember that the match length may NOT equal the pattern length.
        length = search.getMatchLength();
        .... 
    }

Test cases (locale, pattern, target text, expectedResult):

    testMatch(Locale.US,"AbCde","aBcDe",true);
    testMatch(Locale.US,"éèê","EEE",true);

    testMatch(Locale.GERMAN,"STRASSE","Straße",true);
    testMatch(Locale.FRENCH,"éèê","EEE",true);
    testMatch(Locale.FRENCH,"EEE","éèê",true);
    testMatch(Locale.FRENCH,"éèê","ÉÈÊ",true);

    testMatch(Locale.forLanguageTag("tr-TR"),"TITLE","tıtle",true);  // Turkish dotless I/i
    testMatch(Locale.forLanguageTag("tr-TR"),"TİTLE","title",true);  // Turkish dotted I/i
    testMatch(Locale.forLanguageTag("tr-TR"),"TITLE","title",false);  // Dotless-I != dotted i.

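PS: As best as I can determine, the PRIMARY binding strength should do the right thing when locale-specific rules differentiate between accented and unaccented characters according to dictionary rules, but I don't know which locale to use to test this premise. Donated test cases would be gratefully appreciated.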

Question 20

indexOf is case-sensitive. This is because it uses the equals method to compare the elements in the list. The same goes for contains and remove.
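For the List case, a short sketch of the behavior and one possible workaround. The helper name indexOfIgnoreCase is hypothetical, and it relies on String.equalsIgnoreCase, so it shares that method's char-by-char limitations.

```java
import java.util.Arrays;
import java.util.List;

class ListIndexOfDemo {
    // Hypothetical helper: first index whose element equals target ignoring case, or -1.
    static int indexOfIgnoreCase(List<String> list, String target) {
        for (int i = 0; i < list.size(); i++) {
            String element = list.get(i);
            if (element != null && element.equalsIgnoreCase(target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Alice", "Bob", "Carol");
        System.out.println(names.indexOf("bob"));            // -1, equals is case-sensitive
        System.out.println(names.contains("bob"));           // false
        System.out.println(indexOfIgnoreCase(names, "bob")); // 1
    }
}
```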