왜 C ++에서 파이썬보다 문자열을 더 느리게 분할합니까?

93

저는 약간의 속도를 얻고 내 녹슨 C ++ 기술을 연마하기 위해 일부 코드를 Python에서 C ++로 변환하려고합니다. 표준 입력에서 라인을 읽는 순진 구현 (참조 ++ 훨씬 빠르게 C보다는 파이썬 때 어제 나는 충격을 받았다 이 ). 오늘, 마침내 분리 문자 병합 (파이썬의 split ()과 유사한 의미)을 사용하여 C ++에서 문자열을 분할하는 방법을 알아 냈고 이제 deja vu를 경험하고 있습니다! 내 C ++ 코드는 작업을 수행하는 데 훨씬 더 오래 걸립니다 (어제 수업의 경우와 같이 훨씬 더 많은 것은 아니지만).

Python 코드 :

#!/usr/bin/env python
from __future__ import print_function                                            
import time
import sys

count = 0
start_time = time.time()
dummy = None

for line in sys.stdin:
    dummy = line.split()
    count += 1

delta_sec = int(time.time() - start_time)
print("Python: Saw {0} lines in {1} seconds. ".format(count, delta_sec), end='')
if delta_sec > 0:
    lps = int(count/delta_sec)
    print("  Crunch Speed: {0}".format(lps))
else:
    print('')

C ++ 코드 :

#include <iostream>                                                              
#include <string>
#include <sstream>
#include <time.h>
#include <vector>

using namespace std;

void split1(vector<string> &tokens, const string &str,
        const string &delimiters = " ") {
    // Skip delimiters at beginning
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);

    // Find first non-delimiter
    string::size_type pos = str.find_first_of(delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos) {
        // Found a token, add it to the vector
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find next non-delimiter
        pos = str.find_first_of(delimiters, lastPos);
    }
}

void split2(vector<string> &tokens, const string &str, char delim=' ') {
    stringstream ss(str); //convert string to stream
    string item;
    while(getline(ss, item, delim)) {
        tokens.push_back(item); //add token to vector
    }
}

int main() {
    string input_line;
    vector<string> spline;
    long count = 0;
    int sec, lps;
    time_t start = time(NULL);

    cin.sync_with_stdio(false); //disable synchronous IO

    while(cin) {
        getline(cin, input_line);
        spline.clear(); //empty the vector for the next line to parse

        //I'm trying one of the two implementations, per compilation, obviously:
//        split1(spline, input_line);  
        split2(spline, input_line);

        count++;
    };

    count--; //subtract for final over-read
    sec = (int) time(NULL) - start;
    cerr << "C++   : Saw " << count << " lines in " << sec << " seconds." ;
    if (sec > 0) {
        lps = count / sec;
        cerr << "  Crunch speed: " << lps << endl;
    } else
        cerr << endl;
    return 0;

//compiled with: g++ -Wall -O3 -o split1 split_1.cpp

두 가지 분할 구현을 시도했습니다. 한 (split1)는 토큰을 검색 문자열 방법을 사용하고 여러 토큰뿐만 아니라 핸들 많은 토큰 (그것에서 오는 병합 할 수 있습니다 여기 ). 두 번째 (split2)는 getline을 사용하여 문자열을 스트림으로 읽고 구분 기호를 병합하지 않으며 단일 구분 문자 만 지원합니다 (문자열 분할 질문에 대한 답변으로 여러 StackOverflow 사용자가 게시 한 문자).

나는 이것을 다양한 순서로 여러 번 실행했습니다. 내 테스트 머신은 Macbook Pro (2011, 8GB, Quad Core)이지만 그다지 중요하지 않습니다. 나는 각각 다음과 비슷한 3 개의 공백으로 구분 된 열이있는 20M 줄 텍스트 파일로 테스트하고 있습니다. "foo.bar 127.0.0.1 home.foo.bar"

결과 :

$ /usr/bin/time cat test_lines_double | ./split.py
       15.61 real         0.01 user         0.38 sys
Python: Saw 20000000 lines in 15 seconds.   Crunch Speed: 1333333
$ /usr/bin/time cat test_lines_double | ./split1
       23.50 real         0.01 user         0.46 sys
C++   : Saw 20000000 lines in 23 seconds.  Crunch speed: 869565
$ /usr/bin/time cat test_lines_double | ./split2
       44.69 real         0.02 user         0.62 sys
C++   : Saw 20000000 lines in 45 seconds.  Crunch speed: 444444

내가 뭘 잘못하고 있죠? 외부 라이브러리에 의존하지 않고 (즉, 부스트 없음), 구분 기호 시퀀스 병합을 지원하고 (파이썬 분할과 같은) 스레드로부터 안전하며 (따라서 strtok가 없음) 성능이 최소한 인 C ++에서 문자열 분할을 수행하는 더 좋은 방법이 있습니까? 파이썬과 동등합니까?

1 / 부분 솔루션 편집? :

C ++처럼 파이썬이 더미 목록을 재설정하고 매번 추가하도록하여 더 공정한 비교를 시도했습니다. 이것은 여전히 C ++ 코드가 정확히하는 것은 아니지만 조금 더 가깝습니다. 기본적으로 루프는 다음과 같습니다.

for line in sys.stdin:
    dummy = []
    dummy += line.split()
    count += 1

파이썬의 성능은 이제 split1 C ++ 구현과 거의 같습니다.

/usr/bin/time cat test_lines_double | ./split5.py
       22.61 real         0.01 user         0.40 sys
Python: Saw 20000000 lines in 22 seconds.   Crunch Speed: 909090

Python이 문자열 처리에 최적화되어 있어도 (Matt Joiner가 제안한대로) 이러한 C ++ 구현이 더 빠르지 않다는 사실에 여전히 놀랍습니다. C ++를 사용하여보다 최적의 방법으로이 작업을 수행하는 방법에 대한 아이디어가있는 사람이 있으면 코드를 공유하십시오. (내 다음 단계는 C로 전체 프로젝트를 다시 구현하기 위해 프로그래머 생산성을 절충하지 않더라도 순수 C로 구현하려고 시도 할 것이라고 생각합니다. 따라서 이것은 문자열 분할 속도에 대한 실험 일뿐입니다.)

도움을 주신 모든 분들께 감사드립니다.

최종 편집 / 솔루션 :

Alf의 대답을 참조하십시오. 파이썬은 엄격하게 참조로 문자열을 처리하고 STL 문자열은 자주 복사되기 때문에 바닐라 파이썬 구현에서 성능이 더 좋습니다. 비교를 위해 Alf의 코드를 통해 데이터를 컴파일하고 실행했습니다. 여기에는 다른 모든 실행과 동일한 시스템에서의 성능이 있습니다. 기본적으로 순진한 Python 구현과 동일합니다 (목록을 재설정 / 추가하는 Python 구현보다 빠르지 만, 위의 편집에 표시됨) :

$ /usr/bin/time cat test_lines_double | ./split6
       15.09 real         0.01 user         0.45 sys
C++   : Saw 20000000 lines in 15 seconds.  Crunch speed: 1333333

이 경우에 C ++를 수행하는 데 필요한 코드의 양에 관한 유일한 불만 사항이 남아 있습니다.

이 문제와 어제의 stdin 행 읽기 문제 (위에 링크 됨)에서 얻은 교훈 중 하나는 언어의 상대적인 "기본"성능에 대해 순진한 가정을하는 대신 항상 벤치마킹해야한다는 것입니다. 교육에 감사드립니다.

귀하의 제안에 다시 한번 감사드립니다!

— JJC
소스

2

C ++ 프로그램을 어떻게 컴파일 했습니까? 최적화가 켜져 있습니까?

— interjay

2

@interjay : 그것은 자신의 소스의 마지막 코멘트에있어 : g++ -Wall -O3 -o split1 split_1.cpp@JJC : 벤치 마크 요금이 때 실제로 사용 않는 방법 dummy과 spline각각 어쩌면 파이썬에 전화를 제거 line.split()하여 더 부작용이 없기 때문에?

— 에릭

2

분할을 제거하고 stdin에서 읽기 행만 남겨두면 어떤 결과를 얻습니까?

— interjay

2

파이썬은 C로 작성되었습니다. 이것은 C에서 효율적인 방법이 있음을 의미합니다. STL을 사용하는 것보다 문자열을 분할하는 더 좋은 방법이있을 수 있습니까?

— ixe013

3

가능한 중복 std :: string 작업이 제대로 수행되지 않는 이유는 무엇입니까?

— Matt Joiner

57

추측으로, Python 문자열은 참조 카운트 불변 문자열이므로 Python 코드에서 문자열이 복사되지 않는 반면 C ++ std::string는 변경 가능한 값 유형이며 가장 작은 기회에 복사됩니다.

목표가 빠른 분할 인 경우 상수 시간 하위 문자열 작업을 사용합니다. 즉 , Python (및 Java 및 C #…)에서와 같이 원래 문자열의 일부만 참조 한다는 의미 입니다.

그러나 C ++ std::string클래스에는 하나의 사용 기능이 있습니다. 표준 이므로 효율성이 주요 고려 사항이 아닌 곳에서 안전하고 이식 가능하게 문자열을 전달하는 데 사용할 수 있습니다. 하지만 충분한 채팅. 코드-그리고 내 컴퓨터에서는 Python의 문자열 처리가 C ++의 하위 집합 인 C로 구현되기 때문에 이것은 물론 Python보다 빠릅니다.

#include <iostream>                                                              
#include <string>
#include <sstream>
#include <time.h>
#include <vector>

using namespace std;

class StringRef
{
private:
    char const*     begin_;
    int             size_;

public:
    int size() const { return size_; }
    char const* begin() const { return begin_; }
    char const* end() const { return begin_ + size_; }

    StringRef( char const* const begin, int const size )
        : begin_( begin )
        , size_( size )
    {}
};

vector<StringRef> split3( string const& str, char delimiter = ' ' )
{
    vector<StringRef>   result;

    enum State { inSpace, inToken };

    State state = inSpace;
    char const*     pTokenBegin = 0;    // Init to satisfy compiler.
    for( auto it = str.begin(); it != str.end(); ++it )
    {
        State const newState = (*it == delimiter? inSpace : inToken);
        if( newState != state )
        {
            switch( newState )
            {
            case inSpace:
                result.push_back( StringRef( pTokenBegin, &*it - pTokenBegin ) );
                break;
            case inToken:
                pTokenBegin = &*it;
            }
        }
        state = newState;
    }
    if( state == inToken )
    {
        result.push_back( StringRef( pTokenBegin, &*str.end() - pTokenBegin ) );
    }
    return result;
}

int main() {
    string input_line;
    vector<string> spline;
    long count = 0;
    int sec, lps;
    time_t start = time(NULL);

    cin.sync_with_stdio(false); //disable synchronous IO

    while(cin) {
        getline(cin, input_line);
        //spline.clear(); //empty the vector for the next line to parse

        //I'm trying one of the two implementations, per compilation, obviously:
//        split1(spline, input_line);  
        //split2(spline, input_line);

        vector<StringRef> const v = split3( input_line );
        count++;
    };

    count--; //subtract for final over-read
    sec = (int) time(NULL) - start;
    cerr << "C++   : Saw " << count << " lines in " << sec << " seconds." ;
    if (sec > 0) {
        lps = count / sec;
        cerr << "  Crunch speed: " << lps << endl;
    } else
        cerr << endl;
    return 0;
}

//compiled with: g++ -Wall -O3 -o split1 split_1.cpp -std=c++0x

면책 조항 : 버그가 없기를 바랍니다. 기능을 테스트하지는 않았지만 속도 만 확인했습니다. 하지만 버그가 한두 개 있어도 속도에 큰 영향을 미치지 않는 수정이 있다고 생각합니다.

— 건배와 hth. -Alf
소스

2

예, Python 문자열은 참조 횟수 개체이므로 Python은 복사 작업이 훨씬 적습니다. 그러나 코드와 같은 (포인터, 크기) 쌍은 아니지만 여전히 널로 끝나는 C 문자열이 포함되어 있습니다.

— Fred Foo

13

즉, 텍스트 조작과 같은 더 높은 수준의 작업의 경우 효율적으로 수행하려는 노력이 수십 년 동안 수십 명의 개발자에 의해 누적 된 더 높은 수준의 언어를 고수하거나 모든 개발자만큼 작업 할 준비를합니다. 낮은 수준에서 비슷한 것을 가지고 있기 때문입니다.

— jsbueno

2

@JJC :의 StringRef경우 하위 문자열을 std::string매우 쉽게 복사 할 수 있습니다 string( sr.begin(), sr.end() ).

— 건배와 hth. - 알프

3

CPython 문자열이 덜 복사되기를 바랍니다. 예, 참조 횟수가 계산되고 변경할 수 없지만 str.split ()은PyString_FromStringAndSize() 해당 호출을 사용하여 각 항목에 대해 새 문자열을 할당 합니다 PyObject_MALLOC(). 따라서 Python에서 문자열이 변경 불가능하다는 것을 악용하는 공유 표현에 대한 최적화는 없습니다.

— jfs

3

메인테이너 : 감지 된 버그 를 수정하려고 시도하여 버그를 소개하지 마십시오 (특히 cplusplus.com 참조가 아님 ). TIA.

— 건배와 hth. - 알프

9

더 나은 솔루션 (적어도 성능 측면)을 제공하지는 않지만 흥미로운 추가 데이터를 제공합니다.

사용 strtok_r(의 재진입 변형 strtok) :

void splitc1(vector<string> &tokens, const string &str,
        const string &delimiters = " ") {
    char *saveptr;
    char *cpy, *token;

    cpy = (char*)malloc(str.size() + 1);
    strcpy(cpy, str.c_str());

    for(token = strtok_r(cpy, delimiters.c_str(), &saveptr);
        token != NULL;
        token = strtok_r(NULL, delimiters.c_str(), &saveptr)) {
        tokens.push_back(string(token));
    }

    free(cpy);
}

추가로 매개 변수 및 fgets입력에 문자열 사용 :

void splitc2(vector<string> &tokens, const char *str,
        const char *delimiters) {
    char *saveptr;
    char *cpy, *token;

    cpy = (char*)malloc(strlen(str) + 1);
    strcpy(cpy, str);

    for(token = strtok_r(cpy, delimiters, &saveptr);
        token != NULL;
        token = strtok_r(NULL, delimiters, &saveptr)) {
        tokens.push_back(string(token));
    }

    free(cpy);
}

그리고 어떤 경우에는 입력 문자열을 파괴 할 수 있습니다.

void splitc3(vector<string> &tokens, char *str,
        const char *delimiters) {
    char *saveptr;
    char *token;

    for(token = strtok_r(str, delimiters, &saveptr);
        token != NULL;
        token = strtok_r(NULL, delimiters, &saveptr)) {
        tokens.push_back(string(token));
    }
}

이에 대한 타이밍은 다음과 같습니다 (질문의 다른 변형에 대한 결과 및 수락 된 답변 포함).

split1.cpp:  C++   : Saw 20000000 lines in 31 seconds.  Crunch speed: 645161
split2.cpp:  C++   : Saw 20000000 lines in 45 seconds.  Crunch speed: 444444
split.py:    Python: Saw 20000000 lines in 33 seconds.  Crunch Speed: 606060
split5.py:   Python: Saw 20000000 lines in 35 seconds.  Crunch Speed: 571428
split6.cpp:  C++   : Saw 20000000 lines in 18 seconds.  Crunch speed: 1111111

splitc1.cpp: C++   : Saw 20000000 lines in 27 seconds.  Crunch speed: 740740
splitc2.cpp: C++   : Saw 20000000 lines in 22 seconds.  Crunch speed: 909090
splitc3.cpp: C++   : Saw 20000000 lines in 20 seconds.  Crunch speed: 1000000

보시다시피 수용된 답변의 솔루션이 여전히 가장 빠릅니다.

추가 테스트를 원하는 사람을 위해 질문의 모든 프로그램, 허용 된 답변,이 답변, 추가로 테스트 데이터를 생성하는 Makefile 및 스크립트 ( https : // github) 가 포함 된 Github 저장소를 만들었습니다 . com / tobbez / string-splitting .

— 토 베즈
소스

2

데이터를 "사용"(단어 및 문자 수 계산)하여 테스트를 좀 더 현실적으로 만드는 풀 요청 ( github.com/tobbez/string-splitting/pull/2 )을 수행했습니다. 이 변경으로 모든 C / C ++ 버전이 Python 버전 (내가 추가 한 Boost의 토크 나이저에 기반한 버전을 예상 함)을 능가하고 "string view"기반 메서드 (split6과 같은)의 실제 값이 빛납니다.

— Dave Johansen

컴파일러가 최적화를 인식하지 못하는 경우에는이 memcpy아니라를 사용해야 strcpy합니다. strcpy일반적으로 짧은 문자열의 경우 빠른 속도와 긴 문자열의 경우 전체 SIMD의 램프 업 사이의 균형을 유지하는 느린 시작 전략을 사용합니다. memcpy크기를 즉시 알고 있으며 암시 적 길이 문자열의 끝을 확인하기 위해 SIMD 트릭을 사용할 필요가 없습니다. (최신 x86에서는 큰 문제가 아닙니다). 생성자를 std::string사용하여 객체를 만드는 (char*, len)것도 saveptr-token. 분명히 char*토큰을 저장하는 것이 가장 빠를 것입니다 . : P

— Peter Cordes

4

나는 이것이 std::vectorpush_back () 함수 호출 과정에서 크기가 조정되는 방식 때문이라고 생각합니다 . 문장을 사용 std::list하거나 std::vector::reserve()충분한 공간을 확보 하려고 하면 훨씬 더 나은 성능을 얻을 수 있습니다. 또는 split1 ()에 대해 아래와 같이 둘의 조합을 사용할 수 있습니다.

void split1(vector<string> &tokens, const string &str,
        const string &delimiters = " ") {
    // Skip delimiters at beginning
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);

    // Find first non-delimiter
    string::size_type pos = str.find_first_of(delimiters, lastPos);
    list<string> token_list;

    while (string::npos != pos || string::npos != lastPos) {
        // Found a token, add it to the list
        token_list.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find next non-delimiter
        pos = str.find_first_of(delimiters, lastPos);
    }
    tokens.assign(token_list.begin(), token_list.end());
}

편집 : 내가 보는 또 다른 명백한 것은 Python 변수 dummy가 매번 할당 되지만 수정되지는 않는다는 것입니다. 따라서 C ++에 대한 공정한 비교가 아닙니다. Python 코드를 수정하여 dummy = []초기화 한 다음 dummy += line.split(). 이 후 런타임을보고 할 수 있습니까?

EDIT2 : 더 공정하게 만들기 위해 C ++ 코드에서 while 루프를 다음과 같이 수정할 수 있습니다.

    while(cin) {
        getline(cin, input_line);
        std::vector<string> spline; // create a new vector

        //I'm trying one of the two implementations, per compilation, obviously:
//        split1(spline, input_line);  
        split2(spline, input_line);

        count++;
    };

— 비테 팔콘
소스

아이디어 주셔서 감사합니다. 나는 그것을 구현했고이 구현은 실제로 원래 split1보다 느리다. 루프 전에 spline.reserve (16)도 시도했지만 이것은 내 split1의 속도에 영향을 미치지 않았습니다. 한 줄에 3 개의 토큰 만 있고 각 줄마다 벡터가 지워져서 그다지 도움이 될 것이라고 기대하지 않았습니다.

— JJC

나는 당신의 편집도 시도했습니다. 업데이트 된 질문을 참조하십시오. 성능은 이제 split1과 동등합니다.

— JJC

나는 당신의 EDIT2를 시도했습니다. 성능이 조금 더 나빴습니다. $ / usr / bin / time cat test_lines_double | ./split7 33.39 실제 0.01 사용자 0.49 sys C ++ : 33 초 만에 20000000 줄을 보았습니다. 크런치 속도 :

— 606060

3

C ++ 17 및 C ++ 14 기능을 사용하면 다음 코드가 더 낫다고 생각합니다.

// These codes are un-tested when I write this post, but I'll test it
// When I'm free, and I sincerely welcome others to test and modify this
// code.

// C++17
#include <istream>     // For std::istream.
#include <string_view> // new feature in C++17, sizeof(std::string_view) == 16 in libc++ on my x86-64 debian 9.4 computer.
#include <string>
#include <utility>     // C++14 feature std::move.

template <template <class...> class Container, class Allocator>
void split1(Container<std::string_view, Allocator> &tokens, 
            std::string_view str,
            std::string_view delimiter = " ") 
{
    /* 
     * The model of the input string:
     *
     * (optional) delimiter | content | delimiter | content | delimiter| 
     * ... | delimiter | content 
     *
     * Using std::string::find_first_not_of or 
     * std::string_view::find_first_not_of is a bad idea, because it 
     * actually does the following thing:
     * 
     *     Finds the first character not equal to any of the characters 
     *     in the given character sequence.
     * 
     * Which means it does not treeat your delimiters as a whole, but as
     * a group of characters.
     * 
     * This has 2 effects:
     *
     *  1. When your delimiters is not a single character, this function
     *  won't behave as you predicted.
     *
     *  2. When your delimiters is just a single character, the function
     *  may have an additional overhead due to the fact that it has to 
     *  check every character with a range of characters, although 
     * there's only one, but in order to assure the correctness, it still 
     * has an inner loop, which adds to the overhead.
     *
     * So, as a solution, I wrote the following code.
     *
     * The code below will skip the first delimiter prefix.
     * However, if there's nothing between 2 delimiter, this code'll 
     * still treat as if there's sth. there.
     *
     * Note: 
     * Here I use C++ std version of substring search algorithm, but u
     * can change it to Boyer-Moore, KMP(takes additional memory), 
     * Rabin-Karp and other algorithm to speed your code.
     * 
     */

    // Establish the loop invariant 1.
    typename std::string_view::size_type 
        next, 
        delimiter_size = delimiter.size(),  
        pos = str.find(delimiter) ? 0 : delimiter_size;

    // The loop invariant:
    //  1. At pos, it is the content that should be saved.
    //  2. The next pos of delimiter is stored in next, which could be 0
    //  or std::string_view::npos.

    do {
        // Find the next delimiter, maintain loop invariant 2.
        next = str.find(delimiter, pos);

        // Found a token, add it to the vector
        tokens.push_back(str.substr(pos, next));

        // Skip delimiters, maintain the loop invariant 1.
        //
        // @ next is the size of the just pushed token.
        // Because when next == std::string_view::npos, the loop will
        // terminate, so it doesn't matter even if the following 
        // expression have undefined behavior due to the overflow of 
        // argument.
        pos = next + delimiter_size;
    } while(next != std::string_view::npos);
}   

template <template <class...> class Container, class traits, class Allocator2, class Allocator>
void split2(Container<std::basic_string<char, traits, Allocator2>, Allocator> &tokens, 
            std::istream &stream,
            char delimiter = ' ')
{
    std::string<char, traits, Allocator2> item;

    // Unfortunately, std::getline can only accept a single-character 
    // delimiter.
    while(std::getline(stream, item, delimiter))
        // Move item into token. I haven't checked whether item can be 
        // reused after being moved.
        tokens.push_back(std::move(item));
}

컨테이너 선택 :

std::vector.

할당 된 내부 배열의 초기 크기가 1이고 최종 크기가 N이라고 가정하면 log2 (N) 번에 대해 할당 및 할당 해제하고 (2 ^ (log2 (N) + 1)-1) = (2N-1) 번. 에서 지적했듯이 realloc을 로그 횟수로 호출하지 않아 std :: vector의 성능이 좋지 않습니까? , 벡터의 크기를 예측할 수없고 매우 클 경우 성능이 저하 될 수 있습니다. 그러나 크기를 추정 할 수 있다면 문제가되지 않을 것입니다.
std::list.

모든 push_back에 대해 소비 된 시간은 일정하지만 개별 push_back에서 std :: vector보다 더 많은 시간이 걸릴 수 있습니다. 스레드 별 메모리 풀과 사용자 지정 할당자를 사용하면이 문제를 완화 할 수 있습니다.
std::forward_list.

std :: list와 같지만 요소 당 더 적은 메모리를 차지합니다. API push_back의 부족으로 인해 작동하려면 래퍼 클래스가 필요합니다.
std::array.

성장 한계를 알 수 있다면 std :: array를 사용할 수 있습니다. 원인은 API push_back이 없기 때문에 직접 사용할 수 없습니다. 하지만 래퍼를 정의 할 수 있으며 여기에서 가장 빠른 방법이라고 생각하며 추정이 매우 정확하면 메모리를 절약 할 수 있습니다.
std::deque.

이 옵션을 사용하면 성능을 위해 메모리를 교환 할 수 있습니다. 요소의 복사는 (2 ^ (N + 1)-1), 할당은 N 배, 할당 해제는 없습니다. 또한 일정한 임의 액세스 시간과 양쪽 끝에 새 요소를 추가 할 수있는 기능이 있습니다.

std :: deque-cppreference 에 따르면

반면에 데크는 일반적으로 최소한의 메모리 비용이 큽니다. 요소가 하나만있는 데크는 전체 내부 배열을 할당해야합니다 (예 : 64 비트 libstdc ++에서 객체 크기의 8 배, 객체 크기의 16 배 또는 64 비트 libc ++에서 4096 바이트 중 더 큰 쪽).

또는 다음 조합을 사용할 수 있습니다.

std::vector< std::array<T, 2 ^ M> >

이것은 std :: deque와 유사합니다. 차이점은 단지이 컨테이너가 맨 앞에 요소를 추가하는 것을 지원하지 않는다는 것입니다. 그러나 (2 ^ (N + 1)-1) 번에 대한 기본 std :: array를 복사하지 않기 때문에 성능이 더 빠릅니다. (2 ^에 대한 포인터 배열 만 복사합니다. (N-M + 1)-1) 번, 전류가 가득 차서 아무것도 할당 해제 할 필요가 없을 때만 새 배열을 할당합니다. 그건 그렇고, 당신은 일정한 랜덤 액세스 시간을 얻을 수 있습니다.
std::list< std::array<T, ...> >

메모리 프레임 화의 부담을 크게 완화합니다. 전류가 꽉 찼을 때만 새 어레이를 할당하고 아무것도 복사 할 필요가 없습니다. 콤보 1에 비해 추가 포인터에 대한 가격을 지불해야합니다.
std::forward_list< std::array<T, ...> >

2와 동일하지만 콤보 1과 동일한 메모리 비용이 듭니다.

— 쑤 지아 하오
소스

128 또는 256과 같이 합당한 초기 크기 (예 : 총 복사본 2 개로 가정)로 std :: vector를 사용하는 경우 해당 제한까지 크기에 대한 복사를 전혀 피할 수 있습니다. 그런 다음 실제로 사용한 요소 수에 맞게 할당을 축소 할 수 있으므로 작은 입력에 대해 끔찍하지 않습니다. 그러나 이것은 매우 큰 N경우 의 총 사본 수에별로 도움이되지 않습니다 . 너무 나쁜 std :: vector는 realloc잠재적으로 현재 할당이 끝날 때 더 많은 페이지를 매핑 하는 데 사용할 수 없으므로 약 2 배 느립니다.

— Peter Cordes

가 stringview::remove_prefix단지 일반 문자열에서 현재의 위치를 추적하는 등 저가로? 오프셋에서 검색을 시작할 수 std::basic_string::find있는 선택적 두 번째 인수 pos = 0가 있습니다.

— Peter Cordes

@ Peter Cordes 맞습니다. 나는 확인 libcxx IMPL을

— JiaHao 쑤에게

동일한 libstdc ++ impl 도 확인했습니다 .

— JiaHao Xu

벡터 성능 분석이 꺼져 있습니다. 처음 삽입 할 때 초기 용량이 1이고 새 용량이 필요할 때마다 두 배가되는 벡터를 고려하십시오. 17 개 항목을 입력해야하는 경우 첫 번째 할당은 1, 2, 4, 8, 16, 마지막으로 32를위한 공간을 만듭니다. 이는 총 6 개의 할당이 있음을 의미합니다 ( log2(size - 1) + 2, 정수 로그 사용). 첫 번째 할당은 0 개 문자열을 이동하고 두 번째 할당은 1, 2, 4, 8, 마지막으로 16 개를 이동하여 총 31 개 이동 ( 2^(log2(size - 1) + 1) - 1))을 수행했습니다. 이것은 O (2 ^ n)이 아니라 O (n)입니다. 이는 std::list.

— David Stone

2

선택한 C ++ 구현이 반드시 Python보다 빠르다는 잘못된 가정을하고 있습니다. Python의 문자열 처리는 고도로 최적화되어 있습니다. 자세한 내용은이 질문을 참조하십시오. std :: string 작업이 제대로 수행되지 않는 이유는 무엇입니까?

— 매트 조이너
소스

4

전반적인 언어 성능에 대한 주장이 아니라 내 특정 코드에 대해서만 주장합니다. 따라서 여기에는 가정이 없습니다. 다른 질문에 대한 좋은 포인터에 감사드립니다. C ++의이 특정 구현이 차선책 (첫 번째 문장)인지 또는 C ++가 문자열 처리 (두 번째 문장)에서 Python보다 느리다고 말하는지 확실하지 않습니다. 또한 내가 C ++에서하려는 작업을 빠르게 수행하는 방법을 알고 있다면 모두의 이익을 위해 공유해주세요. 감사. 명확히하기 위해 저는 파이썬을 좋아하지만 맹목적인 팬이 아니기 때문에 이것을하는 가장 빠른 방법을 배우려고 노력하고 있습니다.

— JJC

1

@JJC : Python의 구현이 더 빠르다는 점을 감안할 때 귀하의 구현이 차선책이라고 말하고 싶습니다. 언어 구현은 모서리를 줄일 수 있지만 궁극적으로 알고리즘 복잡성과 수작업 최적화가 승리합니다. 이 경우 기본적으로 Python이이 사용 사례에서 우위를 차지합니다.

— Matt Joiner

2

split1 구현을 취하고 다음을 변경하여 split2의 구현과 더 가깝게 일치하도록 서명을 변경하는 경우 :

void split1(vector<string> &tokens, const string &str, const string &delimiters = " ")

이에:

void split1(vector<string> &tokens, const string &str, const char delimiters = ' ')

split1과 split2 사이에 더 극적인 차이와 더 공정한 비교를 얻을 수 있습니다.

split1  C++   : Saw 10000000 lines in 41 seconds.  Crunch speed: 243902
split2  C++   : Saw 10000000 lines in 144 seconds.  Crunch speed: 69444
split1' C++   : Saw 10000000 lines in 33 seconds.  Crunch speed: 303030

— 폴 베 킹엄
소스

1

void split5(vector<string> &tokens, const string &str, char delim=' ') {

    enum { do_token, do_delim } state = do_delim;
    int idx = 0, tok_start = 0;
    for (string::const_iterator it = str.begin() ; ; ++it, ++idx) {
        switch (state) {
            case do_token:
                if (it == str.end()) {
                    tokens.push_back (str.substr(tok_start, idx-tok_start));
                    return;
                }
                else if (*it == delim) {
                    state = do_delim;
                    tokens.push_back (str.substr(tok_start, idx-tok_start));
                }
                break;

            case do_delim:
                if (it == str.end()) {
                    return;
                }
                if (*it != delim) {
                    state = do_token;
                    tok_start = idx;
                }
                break;
        }
    }
}

— 엔. '대명사'm.
소스

고마워요 nm! 안타깝게도 이것은 내 데이터 세트와 컴퓨터에서 원래 (분할 1) 구현과 거의 같은 속도로 실행되는 것 같습니다. $ / usr / bin / time cat test_lines_double | ./split8 21.89 실제 0.01 사용자 0.47 sys C ++ : 22 초 만에 20000000 줄을 보았습니다. 크런치 속도 : 909090

— JJC

내 컴퓨터에서 : split1 — 54s, split.py — 35s, split5 — 16s. 나는 모른다.

— n. '대명사'm.

데이터가 위에서 언급 한 형식과 일치합니까? 초기 디스크 캐시 채우기와 같은 일시적인 영향을 제거하기 위해 각각 여러 번 실행했다고 가정합니다.

— JJC 2012

0

나는 이것이 파이썬에서 sys.stdin의 버퍼링과 관련이 있다고 생각하지만 C ++ 구현에서는 버퍼링이 없습니다.

버퍼 크기를 변경하는 방법에 대한 자세한 내용은이 게시물 을 참조한 다음 다시 비교해보십시오. sys.stdin에 대해 더 작은 버퍼 크기 설정?

— 알렉스 콜린스
소스

1

흠 ... 나는 따라 가지 않는다. C ++에서는 (분할없이) 행을 읽는 것만이 Python보다 빠릅니다 (cin.sync_with_stdio (false); 행을 포함시킨 후). 그것은 위에서 언급 한 어제 내가 가진 문제였습니다.

— JJC