Why is processing a sorted array *slower* than processing an unsorted array? (Java's ArrayList.indexOf)

Question 1

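The title refers to "Why is it faster to process a sorted array than an unsorted array?"

Is this a branch prediction effect too? Note: here, processing the sorted array is slower!

Consider the following code: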

private static final int LIST_LENGTH = 1000 * 1000;
private static final long SLOW_ITERATION_MILLIS = 1000L * 10L;

@Test
public void testBinarySearch() {
    Random r = new Random(0);
    List<Double> list = new ArrayList<>(LIST_LENGTH);
    for (int i = 0; i < LIST_LENGTH; i++) {
        list.add(r.nextDouble());
    }
    //Collections.sort(list);
    // remove possible artifacts due to the sorting call
    // and rebuild the list from scratch:
    list = new ArrayList<>(list);

    int nIterations = 0;
    long startTime = System.currentTimeMillis();
    do {
        int index = r.nextInt(LIST_LENGTH);
        assertEquals(index, list.indexOf(list.get(index)));
        nIterations++;
    } while (System.currentTimeMillis() < startTime + SLOW_ITERATION_MILLIS);
    long duration = System.currentTimeMillis() - startTime;
    double slowFindsPerSec = (double) nIterations / duration * 1000;
    System.out.println(slowFindsPerSec);

    ...
}

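This prints a value of around 720 on my machine.

Now if I activate the Collections.sort call, that value drops to 142. Why?!?

The results seem conclusive, and they do not change when I increase the number of iterations or the run time.

The Java version is 1.8.0_71 (Oracle VM, 64-bit), running under Windows 10 as a JUnit test in Eclipse Mars.

Update

It seems to be related to contiguous memory access (Double objects accessed in sequential order versus in random order). For me, the effect starts to vanish at array lengths of around 10k and below.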
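A rough size estimate may help explain why the effect fades at around 10k elements. The sketch below is only an illustration: the class name `FootprintSketch` is made up, and the sizes of 24 bytes per Double object and 4 bytes per reference are assumptions that are typical for a 64-bit HotSpot JVM with compressed references, not measured values.

```java
final class FootprintSketch {
    // Assumed sizes for a 64-bit HotSpot JVM with compressed references;
    // other JVMs or flags may give different numbers.
    static final long BYTES_PER_DOUBLE_OBJECT = 24; // assumed: object header + 8-byte value, aligned
    static final long BYTES_PER_REFERENCE = 4;      // assumed: one compressed reference per list slot

    // Approximate bytes touched when scanning a List<Double> of the given length.
    static long approxBytes(int length) {
        return length * (BYTES_PER_DOUBLE_OBJECT + BYTES_PER_REFERENCE);
    }

    public static void main(String[] args) {
        for (int length : new int[] {10_000, 1_000_000}) {
            System.out.printf("%d elements: about %d KiB%n",
                length, approxBytes(length) / 1024);
        }
    }
}
```

Under these assumptions, 10k elements occupy a few hundred kilobytes, which can plausibly stay in the CPU caches however the objects are ordered, while a million elements occupy tens of megabytes.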

Thanks to assylias for providing the results:

/**
 * Benchmark                     Mode  Cnt  Score   Error  Units
 * SO35018999.shuffled           avgt   10  8.895 ± 1.534  ms/op
 * SO35018999.sorted             avgt   10  8.093 ± 3.093  ms/op
 * SO35018999.sorted_contiguous  avgt   10  1.665 ± 0.397  ms/op
 * SO35018999.unsorted           avgt   10  2.700 ± 0.302  ms/op
 */

Question 2

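It looks like a caching / prefetching effect.

The clue is that you compare Doubles (objects), not doubles (primitives). When you allocate objects in one thread, they are typically allocated sequentially in memory. So when indexOf scans the list, it walks through sequential memory addresses. This suits the CPU's cache prefetching heuristics.

After you sort the list, you still have to do the same number of memory lookups on average, but this time the memory accesses happen in random order.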
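To make the memory access pattern concrete, here is a minimal sketch of what a linear indexOf over a List<Double> has to do. `LinearScanSketch` and `manualIndexOf` are hypothetical names used for illustration; this is not the JDK source, only an equivalent loop.

```java
import java.util.ArrayList;
import java.util.List;

final class LinearScanSketch {
    // Hypothetical helper that mirrors a linear indexOf:
    // follow each reference and compare the value it points to.
    static int manualIndexOf(List<Double> list, Double target) {
        for (int i = 0; i < list.size(); i++) {
            // list.get(i) loads a reference from the backing array;
            // equals() then reads the double stored in that separate heap object.
            if (target.equals(list.get(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Double> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i / 10.0);
        }
        Double target = list.get(7);
        // Both lookups should agree on the position of the element.
        System.out.println(manualIndexOf(list, target) == list.indexOf(target));
    }
}
```

The references themselves are always read sequentially from the backing array. What changes after sorting is where each referenced Double lives on the heap.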

Update

Here is a benchmark showing that the order of the allocated objects matters.

Benchmark            (generator)  (length)  (postprocess)  Mode  Cnt  Score   Error  Units
ListIndexOf.indexOf       random   1000000           none  avgt   10  1,243 ± 0,031  ms/op
ListIndexOf.indexOf       random   1000000           sort  avgt   10  6,496 ± 0,456  ms/op
ListIndexOf.indexOf       random   1000000        shuffle  avgt   10  6,485 ± 0,412  ms/op
ListIndexOf.indexOf   sequential   1000000           none  avgt   10  1,249 ± 0,053  ms/op
ListIndexOf.indexOf   sequential   1000000           sort  avgt   10  1,247 ± 0,037  ms/op
ListIndexOf.indexOf   sequential   1000000        shuffle  avgt   10  6,579 ± 0,448  ms/op

Question 3

I think we are seeing the effect of memory cache misses.

When you create the unsorted list

for (int i = 0; i < LIST_LENGTH; i++) {
    list.add(r.nextDouble());
}

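all the Doubles are most likely allocated in a contiguous memory area. Iterating over them produces few cache misses.

In the sorted list, on the other hand, the references point into memory in a chaotic manner.

Now if you create a sorted list with contiguous memory: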

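Collections.sort(list);
// Re-box every value in sorted order so that the new Double objects
// are allocated one after another.
List<Double> list2 = new ArrayList<>();
for (int i = 0; i < LIST_LENGTH; i++) {
    list2.add(new Double(list.get(i).doubleValue()));
}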

this sorted list performs the same as the original unsorted one (by my timings).
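The re-allocation idea can be tried out with a small self-contained program. This is only a rough sketch, not a proper JMH benchmark: `ReallocateSketch`, `reallocateInOrder` and `timeLookupsNanos` are made-up names, the lookup count of 200 is arbitrary, and it assumes that `Double.valueOf(double)` allocates a fresh object, as it does on current OpenJDK builds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class ReallocateSketch {
    // Hypothetical helper: copies the values into freshly allocated boxes,
    // created in list order.
    static List<Double> reallocateInOrder(List<Double> source) {
        List<Double> copy = new ArrayList<>(source.size());
        for (Double d : source) {
            // Assumption: Double.valueOf(double) creates a new object here
            // (no value cache for Double on current OpenJDK builds).
            copy.add(Double.valueOf(d.doubleValue()));
        }
        return copy;
    }

    // Crude wall-clock timing of repeated indexOf calls.
    static long timeLookupsNanos(List<Double> list, int lookups) {
        Random r = new Random(1);
        long hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < lookups; i++) {
            int index = r.nextInt(list.size());
            if (list.indexOf(list.get(index)) == index) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        if (hits < 0) throw new IllegalStateException(); // keeps `hits` in use
        return elapsed;
    }

    public static void main(String[] args) {
        Random r = new Random(0);
        List<Double> list = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            list.add(r.nextDouble());
        }
        Collections.sort(list);
        List<Double> compact = reallocateInOrder(list);
        System.out.println("sorted, scattered boxes:  "
            + timeLookupsNanos(list, 200) / 1_000_000 + " ms");
        System.out.println("sorted, re-allocated boxes: "
            + timeLookupsNanos(compact, 200) / 1_000_000 + " ms");
    }
}
```

Absolute numbers will vary with the machine, the JVM and garbage collector activity, so treat the output as an indication only.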

Question 4

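As a simple example that confirms the answer by wero and the answer by apangin (+1!), the following does a simple comparison of both options:

Creating random numbers, and optionally sorting them
Creating sequential numbers, and optionally shuffling them

It is not implemented as a JMH benchmark either, but it is similar to the original code, with only slight modifications to observe the effect: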

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SortedListTest
{
    private static final long SLOW_ITERATION_MILLIS = 1000L * 3L;

    public static void main(String[] args)
    {
        int size = 100000;
        testBinarySearchOriginal(size, true);
        testBinarySearchOriginal(size, false);
        testBinarySearchShuffled(size, true);
        testBinarySearchShuffled(size, false);
    }

    public static void testBinarySearchOriginal(int size, boolean sort)
    {
        Random r = new Random(0);
        List<Double> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++)
        {
            list.add(r.nextDouble());
        }
        if (sort)
        {
            Collections.sort(list);
        }
        list = new ArrayList<>(list);

        int count = 0;
        int nIterations = 0;
        long startTime = System.currentTimeMillis();
        do
        {
            int index = r.nextInt(size);
            if (index == list.indexOf(list.get(index)))
            {
                count++;
            }
            nIterations++;
        }
        while (System.currentTimeMillis() < startTime + SLOW_ITERATION_MILLIS);
        long duration = System.currentTimeMillis() - startTime;
        double slowFindsPerSec = (double) nIterations / duration * 1000;

        System.out.printf("Size %8d sort %5s iterations %10.3f count %10d\n",
            size, sort, slowFindsPerSec, count);
    }

    public static void testBinarySearchShuffled(int size, boolean sort)
    {
        Random r = new Random(0);
        List<Double> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++)
        {
            list.add((double) i / size);
        }
        if (!sort)
        {
            Collections.shuffle(list);
        }
        list = new ArrayList<>(list);

        int count = 0;
        int nIterations = 0;
        long startTime = System.currentTimeMillis();
        do
        {
            int index = r.nextInt(size);
            if (index == list.indexOf(list.get(index)))
            {
                count++;
            }
            nIterations++;
        }
        while (System.currentTimeMillis() < startTime + SLOW_ITERATION_MILLIS);
        long duration = System.currentTimeMillis() - startTime;
        double slowFindsPerSec = (double) nIterations / duration * 1000;

        System.out.printf("Size %8d sort %5s iterations %10.3f count %10d\n",
            size, sort, slowFindsPerSec, count);
    }

}

The output on my machine is

Size   100000 sort  true iterations   8560,333 count      25681
Size   100000 sort false iterations  19358,667 count      58076
Size   100000 sort  true iterations  18554,000 count      55662
Size   100000 sort false iterations   8845,333 count      26536

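which nicely shows that the timings of the two tests are exactly opposite to each other: if random numbers are sorted, the sorted version is slower. If sequential numbers are shuffled, the shuffled version is slower.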