String.Join vs. StringBuilder: which is faster?

Question 1

In a previous question about formatting a double[][] as CSV, it was suggested that using StringBuilder would be faster than String.Join. Is this true?

Question 2

Short answer: it depends.

Long answer: if you already have an array of strings to concatenate together (with a delimiter), String.Join is the fastest way of doing it.

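String.Join can look through all of the strings to work out the exact length it needs, then go back and copy all the data. This means no extra copying is involved. The only downside is that it has to go through the strings twice, which potentially means blowing the memory cache more often than necessary.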
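The two passes described above can be sketched as follows. This is only an illustration of the idea, not the framework's actual implementation: `TwoPassJoinSketch` is a hypothetical helper, and it builds the result in a `char[]` first, so unlike the real String.Join it pays for one additional copy when the final string is created.

```csharp
using System;

static class TwoPassJoinSketch
{
    // Hypothetical helper for illustration; not the framework's String.Join.
    public static string Join(string separator, string[] values)
    {
        if (values == null) throw new ArgumentNullException("values");
        if (values.Length == 0) return string.Empty;
        if (separator == null) separator = string.Empty;

        // Pass 1: walk every string to compute the exact result length.
        int total = separator.Length * (values.Length - 1);
        foreach (string s in values)
        {
            if (s != null) total += s.Length;
        }

        // Pass 2: walk the strings again and copy each piece once
        // into a buffer that never has to grow.
        char[] buffer = new char[total];
        int pos = 0;
        for (int i = 0; i < values.Length; i++)
        {
            if (i > 0)
            {
                separator.CopyTo(0, buffer, pos, separator.Length);
                pos += separator.Length;
            }
            string s = values[i];
            if (s != null)
            {
                s.CopyTo(0, buffer, pos, s.Length);
                pos += s.Length;
            }
        }
        return new string(buffer);
    }
}
```

Calling `TwoPassJoinSketch.Join(",", new[] { "a", "b", "c" })` yields "a,b,c". Note that both loops touch every input string, which is the "going through the strings twice" cost mentioned above.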

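If you don't have the strings in an array beforehand, it's probably faster to use StringBuilder, but there will be situations where it isn't. If using a StringBuilder means doing lots and lots of copies, then building an array and calling String.Join may well be faster.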
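To make the two options concrete, here is a minimal sketch of both approaches for values that are not yet available as a string array. The class and method names are made up for this example, and the invariant culture is an assumption chosen so the output does not depend on machine settings. Which variant is faster is exactly what would need measuring.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

static class RowFormattingSketch
{
    // Option 1: append each value to a StringBuilder as it is produced.
    public static string WithBuilder(IEnumerable<double> values)
    {
        StringBuilder sb = new StringBuilder();
        bool first = true;
        foreach (double v in values)
        {
            if (!first) sb.Append(',');
            first = false;
            sb.Append(v.ToString(CultureInfo.InvariantCulture));
        }
        return sb.ToString();
    }

    // Option 2: collect the strings first, then make one String.Join call.
    public static string WithJoin(IEnumerable<double> values)
    {
        List<string> parts = new List<string>();
        foreach (double v in values)
        {
            parts.Add(v.ToString(CultureInfo.InvariantCulture));
        }
        return String.Join(",", parts.ToArray());
    }
}
```

Both methods produce the same text for the same input; they differ only in where the intermediate data lives (the builder's buffer versus a list of small strings).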

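EDIT: This is in terms of a single call to String.Join versus a series of calls to StringBuilder.Append. In the original question there were two levels of String.Join calls, so each nested call would have created an intermediate string. In other words, it's even more complex and harder to guess about. I would be surprised to see either approach "win" significantly (in complexity terms) with typical data.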

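EDIT: When I'm at home, I'll write a benchmark that is as painful as possible for StringBuilder. Basically, if you have an array where each element is about twice the size of the previous one, and you get it just right, you should be able to force a copy for every append (of the elements, not of the delimiter, although that needs to be taken into account too). At that point it's nearly as bad as simple string concatenation, but String.Join will have no problems.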
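The kind of input described above could be generated along these lines. This is a sketch under assumptions, not the benchmark the answer refers to: the names are hypothetical, and whether each append really triggers a copy depends on the growth strategy of the StringBuilder in the runtime being tested, so no timings are claimed here.

```csharp
using System;
using System.Text;

static class DoublingInputSketch
{
    // Builds strings of length 1, 2, 4, 8, ... so each element is
    // twice as long as the previous one.
    public static string[] MakeDoublingStrings(int count)
    {
        string[] result = new string[count];
        int length = 1;
        for (int i = 0; i < count; i++)
        {
            result[i] = new string('x', length);
            length *= 2;
        }
        return result;
    }

    // Appends the parts one by one with a default-capacity builder,
    // which is the pattern the hostile input is meant to stress.
    public static string AppendAll(string separator, string[] parts)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.Length; i++)
        {
            if (i > 0) sb.Append(separator);
            sb.Append(parts[i]);
        }
        return sb.ToString();
    }
}
```

Timing `AppendAll(",", parts)` against `String.Join(",", parts)` with a Stopwatch, for a `parts` array from `MakeDoublingStrings`, would show whether the effect is measurable on a given runtime. Keep `count` modest, since the last element alone has 2^(count-1) characters.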

Question 3

Here's a simple test rig, using int[][] for simplicity. Results first:

Join: 9420ms (chk: 210710000
OneBuilder: 9021ms (chk: 210710000

(update for double results:)

Join: 11635ms (chk: 210710000
OneBuilder: 11385ms (chk: 210710000

(update for 2048 * 64 * 150)

Join: 11620ms (chk: 206409600
OneBuilder: 11132ms (chk: 206409600

With OptimizeForTesting enabled:

Join: 11180ms (chk: 206409600
OneBuilder: 10784ms (chk: 206409600

So StringBuilder is faster, but not massively so. The rig (run at the console, in release mode, etc.):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

namespace ConsoleApplication2
{
    class Program
    {
        static void Collect()
        {
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();
        }
        static void Main(string[] args)
        {
            const int ROWS = 500, COLS = 20, LOOPS = 2000;
            int[][] data = new int[ROWS][];
            Random rand = new Random(123456);
            for (int row = 0; row < ROWS; row++)
            {
                int[] cells = new int[COLS];
                for (int col = 0; col < COLS; col++)
                {
                    cells[col] = rand.Next();
                }
                data[row] = cells;
            }
            Collect();
            int chksum = 0;
            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < LOOPS; i++)
            {
                chksum += Join(data).Length;
            }
            watch.Stop();
            Console.WriteLine("Join: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

            Collect();
            chksum = 0;
            watch = Stopwatch.StartNew();
            for (int i = 0; i < LOOPS; i++)
            {
                chksum += OneBuilder(data).Length;
            }
            watch.Stop();
            Console.WriteLine("OneBuilder: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

            Console.WriteLine("done");
            Console.ReadLine();
        }
        public static string Join(int[][] array)
        {
            return String.Join(Environment.NewLine,
                    Array.ConvertAll(array,
                      row => String.Join(",",
                        Array.ConvertAll(row, x => x.ToString()))));
        }
        public static string OneBuilder(IEnumerable<int[]> source)
        {
            StringBuilder sb = new StringBuilder();
            bool firstRow = true;
            foreach (var row in source)
            {
                if (firstRow)
                {
                    firstRow = false;
                }
                else
                {
                    sb.AppendLine();
                }
                if (row.Length > 0)
                {
                    sb.Append(row[0]);
                    for (int i = 1; i < row.Length; i++)
                    {
                        sb.Append(',').Append(row[i]);
                    }
                }
            }
            return sb.ToString();
        }
    }
}

Question 4

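I don't think so. Looking through Reflector, the implementation of String.Join looks very optimized. It also has the added benefit of knowing the total size of the string to be created in advance, so it doesn't need any reallocation.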

I created two test methods to compare them:

public static string TestStringJoin(double[][] array)
{
    return String.Join(Environment.NewLine,
        Array.ConvertAll(array,
            row => String.Join(",",
                       Array.ConvertAll(row, x => x.ToString()))));
}

public static string TestStringBuilder(double[][] source)
{
    // based on Marc Gravell's code
    // note: unlike TestStringJoin, no line break is appended between rows here

    StringBuilder sb = new StringBuilder();
    foreach (var row in source)
    {
        if (row.Length > 0)
        {
            sb.Append(row[0]);
            for (int i = 1; i < row.Length; i++)
            {
                sb.Append(',').Append(row[i]);
            }
        }
    }
    return sb.ToString();
}

I ran each method 50 times, passing in an array of size [2048][64]. I did this for two arrays: one filled with zeros and another filled with random values. I got the following results on my machine (P4 3.0 GHz, single core, no HT, running in Release mode from CMD):

// with zeros:
TestStringJoin    took 00:00:02.2755280
TestStringBuilder took 00:00:02.3536041

// with random values:
TestStringJoin    took 00:00:05.6412147
TestStringBuilder took 00:00:05.8394650

Increasing the size of the array to [2048][512], while decreasing the number of iterations to 10, gave me the following results:

// with zeros:
TestStringJoin    took 00:00:03.7146628
TestStringBuilder took 00:00:03.8886978

// with random values:
TestStringJoin    took 00:00:09.4991765
TestStringBuilder took 00:00:09.3033365

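The results are repeatable (almost; there are small fluctuations caused by the different random values). Apparently String.Join is a little faster most of the time (although by a very small margin).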

This is the code I used for testing:

// requires: using System; using System.Diagnostics; using System.Threading;
const int Iterations = 50;
const int Rows = 2048;
const int Cols = 64; // 512

static void Main()
{
    OptimizeForTesting(); // set process priority to RealTime

    // test 1: zeros
    double[][] array = new double[Rows][];
    for (int i = 0; i < array.Length; ++i)
        array[i] = new double[Cols];

    CompareMethods(array);

    // test 2: random values
    Random random = new Random();
    double[] template = new double[Cols];
    for (int i = 0; i < template.Length; ++i)
        template[i] = random.NextDouble();

    for (int i = 0; i < array.Length; ++i)
        array[i] = template;

    CompareMethods(array);
}

static void CompareMethods(double[][] array)
{
    Stopwatch stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < Iterations; ++i)
        TestStringJoin(array);
    stopwatch.Stop();
    Console.WriteLine("TestStringJoin    took " + stopwatch.Elapsed);

    stopwatch.Reset(); stopwatch.Start();
    for (int i = 0; i < Iterations; ++i)
        TestStringBuilder(array);
    stopwatch.Stop();
    Console.WriteLine("TestStringBuilder took " + stopwatch.Elapsed);

}

static void OptimizeForTesting()
{
    Thread.CurrentThread.Priority = ThreadPriority.Highest;
    Process currentProcess = Process.GetCurrentProcess();
    currentProcess.PriorityClass = ProcessPriorityClass.RealTime;
    if (Environment.ProcessorCount > 1) {
        // use last core only
        currentProcess.ProcessorAffinity
            = new IntPtr(1 << (Environment.ProcessorCount - 1));
    }
}

Question 5

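Unless the 1% difference turns into something significant in terms of the time the whole program takes to run, this looks like micro-optimization. I'd write the code that is the most readable and understandable and not worry about a 1% performance difference.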

Question 6

Atwood wrote a post related to this about a month ago:

http://www.codinghorror.com/blog/archives/001218.html

Question 7

Yes. If you do more than a couple of joins, it will be a lot faster.

When you do a string.join, the runtime has to:

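Allocate memory for the resulting string
Copy the contents of the first string to the beginning of the output string
Copy the contents of the second string to the end of the output string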

If you do two joins, it has to copy the data twice, and so on.

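StringBuilder allocates one buffer with space to spare, so data can be appended without having to copy the original string. Because there is space left over in the buffer, the appended string can be written into the buffer directly. Then it only has to copy the entire string once, at the end.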
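The repeated copying this answer describes is what happens when strings are combined pairwise, one step at a time. A minimal sketch of that pattern next to the StringBuilder alternative follows; the class and method names are hypothetical, and the comments describe the general behaviour of immutable strings rather than measured results.

```csharp
using System;
using System.Text;

static class RepeatedConcatSketch
{
    // Each += creates a new string and copies everything accumulated
    // so far, so earlier data is copied again on every step.
    public static string ConcatInLoop(string[] parts)
    {
        string result = string.Empty;
        foreach (string p in parts)
        {
            result += p;
        }
        return result;
    }

    // Appends write into the builder's spare buffer space; the full
    // text is copied out once, when ToString() is called.
    public static string AppendInLoop(string[] parts)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string p in parts)
        {
            sb.Append(p);
        }
        return sb.ToString();
    }
}
```

Both methods return the same string. A single String.Join call over an existing array is a different case again, since it sizes its result once up front, as discussed in the earlier answers.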