How do I tokenize a string in C++?

414

Java has a convenient split method:

String str = "The quick brown fox";
String[] results = str.split(" ");

Is there an easy way to do this in C++?

— Bill the Lizard

172

I can't believe this routine task is such a headache in C++.

— wfbarksdale

6

It's not a headache in C++; there are various ways to achieve it. Programmers are simply less aware of C++ than of C#, which comes down to marketing and investment. See this for various C++ options: cplusplus.com/faq/sequences/strings/split

— hB0

9

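@hB0 Going through many questions and answers and still not being able to decide on an approach is a headache. One needs that library, another only works for spaces, another doesn't handle spaces at all.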

— Paschalis

1

Related: splitting a string in C++

— KOB

2

Why does everything in C++ have to be a struggle?

— Wael Assaf

145

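The C++ standard library algorithms are based on iterators rather than concrete containers. Unfortunately, this makes it hard to provide a Java-like split function in the C++ standard library, even though nobody argues that it wouldn't be convenient. But what would its return type be? std::vector<std::basic_string<…>>? Maybe, but then we are forced to perform (potentially redundant and costly) allocations.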

Instead, C++ offers a plethora of ways to split strings based on arbitrarily complex delimiters, but none of them is encapsulated as nicely as in other languages. The numerous ways fill whole blog posts.

At its simplest, you could iterate using std::string::find until you hit std::string::npos, and extract the contents using std::string::substr.
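A minimal sketch of the find/substr loop just described. The function name split_on and the decision to keep empty tokens are assumptions made for illustration; nothing like it ships in the standard library.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits `text` on every occurrence of the string `delim`, keeping empty tokens.
// split_on is an illustrative name, not a standard function.
std::vector<std::string> split_on(const std::string& text, const std::string& delim) {
    assert(!delim.empty());  // an empty delimiter would never advance the search
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = text.find(delim, start)) != std::string::npos) {
        tokens.push_back(text.substr(start, pos - start));  // text before the delimiter
        start = pos + delim.size();                         // resume after the delimiter
    }
    tokens.push_back(text.substr(start));  // the tail after the last delimiter
    return tokens;
}
```

Called as split_on("a,,b", ","), this returns three tokens, the middle one empty. Callers that don't want empty tokens can filter them out afterwards.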

A more fluid (and idiomatic, but basic) version for splitting on whitespace would use a std::istringstream:

auto iss = std::istringstream{"The quick brown fox"};
auto str = std::string{};

while (iss >> str) {
    process(str);
}

Using std::istream_iterators, the contents of the string stream could also be copied into a vector using its iterator range constructor.
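As a rough sketch of that iterator-range construction (split_whitespace is a hypothetical helper name; it splits on whitespace only, because that is how operator>> reads a std::string):

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Copies every whitespace-separated word of `text` into a vector by handing
// a pair of stream iterators to the vector's range constructor.
// split_whitespace is an illustrative name.
std::vector<std::string> split_whitespace(const std::string& text) {
    std::istringstream iss(text);
    return std::vector<std::string>(std::istream_iterator<std::string>(iss),
                                    std::istream_iterator<std::string>());
}
```

Runs of spaces, tabs and newlines all count as a single separator here, so this form never produces empty tokens.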

Multiple libraries (such as Boost.Tokenizer) offer dedicated tokenisers.

More advanced splitting requires regular expressions. C++ provides std::regex_token_iterator for this purpose in particular:

auto const str = "The quick brown fox"s;
auto const re = std::regex{R"(\s+)"};
auto const vec = std::vector<std::string>(
    std::sregex_token_iterator{begin(str), end(str), re, -1},
    std::sregex_token_iterator{}
);

— Konrad Rudolph

53

Sadly, Boost isn't always available for every project. I'll have to look for a non-Boost answer.

— FuzzyBunnySlippers

36

Not every project is open to "open source". I work in heavily regulated industries. It's not really a problem; it's just a fact of life. Boost is not available everywhere.

— FuzzyBunnySlippers

5

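@NonlinearIdeas The other question/answer wasn't about open-source projects at all. The same is true for any project. That said, I do of course understand restricted standards such as MISRA C, but then it's understood that you build everything from scratch anyway (unless you happen to find a compliant library). Anyway, the point is hardly that "Boost is not available"; it's that you have special requirements for which almost any general-purpose answer would be unsuitable.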

— Konrad Rudolph

1

@NonlinearIdeas Case in point, the other, non-Boost answers are also not MISRA compliant.

— Konrad Rudolph

3

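@Dmitry What is "STL barf"?! And the whole community is very much in favour of replacing the C preprocessor; in fact, there are proposals to do so. But the suggestion to use PHP or some other language instead would be a step backwards.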

— Konrad Rudolph

188

The Boost tokenizer class can make this sort of thing quite simple:

#include <iostream>
#include <string>
#include <boost/foreach.hpp>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost;

int main(int, char**)
{
    string text = "token, test   string";

    char_separator<char> sep(", ");
    tokenizer< char_separator<char> > tokens(text, sep);
    BOOST_FOREACH (const string& t, tokens) {
        cout << t << "." << endl;
    }
}

Updated for C++11:

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost;

int main(int, char**)
{
    string text = "token, test   string";

    char_separator<char> sep(", ");
    tokenizer<char_separator<char>> tokens(text, sep);
    for (const auto& t : tokens) {
        cout << t << "." << endl;
    }
}

— Ferruccio

1

Good stuff, I've recently used this. My Visual Studio compiler gave an odd complaint until I used whitespace to separate the two ">" characters before the tokens(text, sep) bit.

— AndyUK

@AndyUK Yes, without the space the compiler parses it as an extraction operator rather than two closing template brackets.

— EnabrenTane

In theory, this was fixed in C++0x.

— David Souther

3

Beware of the third parameter of the char_separator constructor (drop_empty_tokens is the default; the alternative is keep_empty_tokens).

— Benoit

5

@puk - It's a commonly used suffix for C++ header files (like .h for C headers).

— Ferruccio

167

Here's a really simple one:

#include <vector>
#include <string>
using namespace std;

vector<string> split(const char *str, char c = ' ')
{
    vector<string> result;

    do
    {
        const char *begin = str;

        while(*str != c && *str)
            str++;

        result.push_back(string(begin, str));
    } while (0 != *str++);

    return result;
}

— Adam Pierce

Do I need to add a prototype for this method to the .h file?

— Suhrob Samiev

5

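This isn't exactly the "best" answer, since it still works on a string literal, which is a plain C constant character array. I believe the questioner was asking whether he could tokenize a C++ string of type "string".

— Vijay Kumar Kanta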

This needs a new answer, because I strongly suspect that the inclusion of regular expressions in C++11 has changed what the best answer is.

— Omnifarious

113

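Use strtok. In my opinion, there is no need to build a class around tokenizing unless strtok doesn't provide what you need. It might not, but in 15+ years of writing various parsing code in C and C++, I've always used strtok. Here is an example: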

char myString[] = "The quick brown fox";
char *p = strtok(myString, " ");
while (p) {
    printf ("Token: %s\n", p);
    p = strtok(NULL, " ");
}

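A few caveats (which might not suit your needs). The string is "destroyed" in the process, meaning that EOS characters are placed inline at the delimiter spots. Correct usage might require you to make a non-const copy of the string. You can also change the list of delimiters mid-parse.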

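In my opinion, the code above is far simpler and easier to use than writing a separate class for it. To me, this is one of those functions the language provides, and it does its job well and cleanly. It's simply a "C-based" solution. It's appropriate, it's easy, and you don't have to write a lot of extra code :-)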

— Mark

42

Not that I dislike C, but strtok is not thread-safe, and you need to be certain that the string you pass it contains a null character to avoid a possible buffer overflow.

— tloach

11

There is strtok_r, but this was a C++ question.

— Prof. Falken

3

@tloach: in the MS C++ compiler, strtok is thread-safe because its internal static variable is created in TLS (thread-local storage). In practice it is compiler dependent.

— Ahmed Said

3

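@ahmed: thread-safe means more than just being able to run the function twice in different threads. In this case, if another thread modifies the string while strtok is running, the string can be valid for strtok's entire run and strtok will still mess up: the string changed, strtok may already be past the null character, and it will keep reading memory until it either hits a security violation or finds a null character. This is a problem with the original C string functions: if you don't specify a length somewhere, you run into trouble.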

— tloach

4

strtok requires a pointer to a non-const, null-terminated char array, which is not a common creature to find in C++ code ... what's your favourite way to convert to this from a std::string?

— fuzzyTew
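One possible answer to the std::string question above, sketched under the assumption that copying the input is acceptable: copy the characters into a writable, null-terminated buffer and let strtok chop up the copy. strtok_copy is an illustrative name, and the function inherits strtok's lack of thread safety.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Tokenizes a std::string with strtok without touching the original.
// strtok overwrites delimiters with '\0', so it is given a private copy.
// strtok_copy is an illustrative name; not thread-safe or re-entrant.
std::vector<std::string> strtok_copy(const std::string& text, const char* delims) {
    assert(delims != nullptr);
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');  // strtok needs a null-terminated, writable array
    std::vector<std::string> tokens;
    for (char* p = std::strtok(buffer.data(), delims); p != nullptr;
         p = std::strtok(nullptr, delims)) {
        tokens.push_back(p);  // each token is copied out of the scratch buffer
    }
    return tokens;
}
```

The input string is left unchanged, at the cost of one extra copy. Like strtok itself, this treats a run of delimiters as a single separator.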

105

Another quick way is to use getline. Something like:

stringstream ss("bla bla");
string s;

while (getline(ss, s, ' ')) {
 cout << s << endl;
}

If you want, you can make a simple split() method that returns a vector<string>.
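Such a split() could look roughly like the following. split_getline is a made-up name. Note that with getline, consecutive delimiters produce empty tokens, while a single trailing delimiter does not produce a trailing empty token.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits `text` on a single delimiter character using std::getline.
// split_getline is an illustrative name.
std::vector<std::string> split_getline(const std::string& text, char delim) {
    std::vector<std::string> tokens;
    std::istringstream stream(text);
    std::string item;
    while (std::getline(stream, item, delim)) {  // stops when nothing is left to read
        tokens.push_back(item);
    }
    return tokens;
}
```

Because the delimiter passed to getline replaces the default '\n', newlines inside the text are treated as ordinary characters and stay inside the tokens.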

— user35978

2

I had trouble using this technique with 0x0A characters in the string, which made the while loop exit prematurely. Otherwise, it's a nice, simple and quick solution.

— Ryan H.

4

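This is good, but keep in mind that doing this means the default delimiter '\n' is no longer considered. This example works, but if you use something like while (getline(inFile, word, ' ')), where inFile is an ifstream object containing multiple lines, you will get funny results.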

— hackrock

It's too bad getline returns the stream rather than the string, which makes it unusable in initialization lists without temporary storage.

— fuzzyTew

1

Cool! No Boost and no C++11, a good solution for legacy projects!

— Deqing

1

THAT is the answer. The name of the function is just a bit awkward.

— Nils

82

You can use streams, iterators, and the copy algorithm to do this fairly directly.

#include <string>
#include <vector>
#include <iostream>
#include <istream>
#include <ostream>
#include <iterator>
#include <sstream>
#include <algorithm>

int main()
{
  std::string str = "The quick brown fox";

  // construct a stream from the string
  std::stringstream strstr(str);

  // use stream iterators to copy the stream to the vector as whitespace separated strings
  std::istream_iterator<std::string> it(strstr);
  std::istream_iterator<std::string> end;
  std::vector<std::string> results(it, end);

  // send the vector to stdout.
  std::ostream_iterator<std::string> oit(std::cout);
  std::copy(results.begin(), results.end(), oit);
}

— KeithB

17

I find those std:: prefixes irritating to read. Why not use "using"?

— user35978

80

@Vadi: because editing someone else's post is quite intrusive. @pheze: I prefer to keep the std:: prefix so that I know where my objects come from; that's merely a matter of style.

— Matthieu M.

7

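I understand your reasons, and I think it's actually a good choice if it works for you, but from a pedagogical standpoint I agree with pheze. It's easier to read and understand a completely foreign example like this one with a "using namespace std" at the top, because it takes less effort to interpret the lines that follow, especially in this case, where everything comes from the standard library. You can keep it easy to read and still make it obvious where the objects come from with a series of "using std::string;" declarations, especially since the function is so short.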

— cheshirekow

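61

Even though the "std::" prefixes may be irritating or ugly, it's best to include them in example code so that it's completely clear where these functions come from. If they bother you, it's trivial to replace them with a "using" after you steal the example and claim it as your own.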

— dlchambers

20

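Yep! What he said! Best practice is to use the std prefix. Any large code base is no doubt going to have its own libraries and namespaces, and "using namespace std" will give you headaches when you start getting namespace conflicts.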

— Miek

48

No offense folks, but for such a simple problem, you are making things way too complicated. There are a lot of reasons to use Boost. But for something this simple, it's like hitting a fly with a 20# sledge.

void
split( vector<string> & theStringVector,  /* Altered/returned value */
       const  string  & theString,
       const  string  & theDelimiter)
{
    UASSERT( theDelimiter.size(), >, 0); // My own ASSERT macro.

    size_t  start = 0, end = 0;

    while ( end != string::npos)
    {
        end = theString.find( theDelimiter, start);

        // If at end, use length=maxLength.  Else use length=end-start.
        theStringVector.push_back( theString.substr( start,
                       (end == string::npos) ? string::npos : end - start));

        // If at end, use start=maxSize.  Else use start=end+delimiter.
        start = (   ( end > (string::npos - theDelimiter.size()) )
                  ?  string::npos  :  end + theDelimiter.size());
    }
}

For example (for Doug's case):

#define SHOW(I,X)   cout << "[" << (I) << "]\t " # X " = \"" << (X) << "\"" << endl

int
main()
{
    vector<string> v;

    split( v, "A:PEP:909:Inventory Item", ":" );

    for (unsigned int i = 0;  i < v.size();   i++)
        SHOW( i, v[i] );
}

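And yes, we could have split() return a new vector rather than passing one in. But depending on what I'm doing, I often find it better to reuse existing objects rather than always creating new ones. (Just as long as I don't forget to empty the vector in between!)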

Reference: http://www.cplusplus.com/reference/string/string/ .

(I was originally writing a response to Doug's question: C++ Strings Modifying and Extracting based on Separators (closed). But since Martin York closed that question with a pointer over here, I'll just generalize my code.)

— Mr.Ree

12

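Why define a macro you only use in one place? And how is your UASSERT any better than the standard assert? Splitting the comparison into 3 tokens like that does nothing other than require more commas than you'd otherwise need.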

— crelbor

1

Maybe the UASSERT macro shows (in the error message) the actual relationship between the two compared values? That's actually a pretty good idea, IMHO.

— GhassanPL

10

Why doesn't the std::string class include a split() function?

— Mr. Shickadance

I think the last line in the while loop should be start = ((end > (theString.size() - theDelimiter.size())) ? string::npos : end + theDelimiter.size()); and the while loop should be while (start != string::npos). Also, I check that the substring is not empty before inserting it into the vector.

— John K

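@JohnK If the input has two consecutive delimiters, then clearly the string between them is empty, and it should be inserted into the vector. If empty values are not acceptable for a particular purpose, that is another matter, but IMHO such constraints should be enforced outside this kind of very general-purpose function.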

— Lauri Nurmi
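To illustrate enforcing that constraint outside the splitter: a general-purpose split can keep empty tokens, and a caller who doesn't want them can strip them afterwards. drop_empty is a hypothetical helper, sketched with the erase-remove idiom (the lambda needs C++11).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Removes empty tokens from an already-split vector, in place.
// drop_empty is an illustrative name.
void drop_empty(std::vector<std::string>& tokens) {
    tokens.erase(std::remove_if(tokens.begin(), tokens.end(),
                                [](const std::string& s) { return s.empty(); }),
                 tokens.end());
}
```

This keeps the split function itself free of policy: code that needs the empty fields (CSV-style records, for example) simply doesn't call the filter.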

46

A solution using regex_token_iterators:

#include <iostream>
#include <regex>
#include <string>
#include <vector>

using namespace std;

int main()
{
    string str("The quick brown fox");

    regex reg("\\s+");

    sregex_token_iterator iter(str.begin(), str.end(), reg, -1);
    sregex_token_iterator end;

    vector<string> vec(iter, end);

    for (auto a : vec)
    {
        cout << a << endl;
    }
}

— wb

5

This should be the top-ranked answer. This is the right way to do this in C++ >= 11.

— Omnifarious

1

I'm glad I scrolled all the way down to this answer (it currently has only 9 upvotes). This is exactly what C++11 code should look like for this task!

— YePhIcK

Excellent answer that doesn't rely on external libraries and uses what is already available.

— Andrew

1

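Great answer, giving the most flexibility in delimiters. A few caveats: using the \s+ regex avoids empty tokens in the middle of the text, but it does give an empty first token if the text starts with whitespace. Also, regex seems slow: on my laptop, for 20 MB of random text, it takes 0.6 sec, compared with 0.014 sec for strtok, strsep, or parham's answer using str.find_first_of, or 0.027 sec for Perl, or 0.00.0 sec for Python. For short text, speed may not be a concern.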

— Mark Gates
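The leading-empty-token caveat is easy to work around. This sketch (split_ws_regex is an assumed name) walks the same sregex_token_iterator and simply skips zero-length tokens. It does nothing about the speed concern.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Splits on runs of whitespace with a regex, dropping the empty token that
// appears when the text starts with a separator.
// split_ws_regex is an illustrative name.
std::vector<std::string> split_ws_regex(const std::string& text) {
    static const std::regex ws("\\s+");
    std::vector<std::string> tokens;
    // Submatch index -1 selects the text between matches, i.e. the tokens.
    for (std::sregex_token_iterator it(text.begin(), text.end(), ws, -1), end;
         it != end; ++it) {
        if (it->length() != 0) {  // skip the zero-length leading token
            tokens.push_back(it->str());
        }
    }
    return tokens;
}
```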

2

OK, maybe it looks cool, but this is overuse of regular expressions. It's reasonable only if you don't care about performance.

— Marek R

35

Boost has a strong split function: boost::algorithm::split.

Sample program:

#include <iostream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

int main() {
    auto s = "a,b, c ,,e,f,";
    std::vector<std::string> fields;
    boost::split(fields, s, boost::is_any_of(","));
    for (const auto& field : fields)
        std::cout << "\"" << field << "\"\n";
    return 0;
}

Output:

"a"
"b"
" c "
""
"e"
"f"
""

— Raz

26

Even though you asked for a C++ solution, you might find this helpful:

Qt

#include <QString>

...

QString str = "The quick brown fox"; 
QStringList results = str.split(" ");

The advantage over Boost in this example is that it's a direct one-to-one mapping to the code in your post.

See more in the Qt documentation.

— sivabudh

22

Here is a sample tokenizer class that might do what you want.

//Header file
class Tokenizer 
{
    public:
        static const std::string DELIMITERS;
        Tokenizer(const std::string& str);
        Tokenizer(const std::string& str, const std::string& delimiters);
        bool NextToken();
        bool NextToken(const std::string& delimiters);
        const std::string GetToken() const;
        void Reset();
    protected:
        size_t m_offset;
        const std::string m_string;
        std::string m_token;
        std::string m_delimiters;
};

//CPP file
const std::string Tokenizer::DELIMITERS(" \t\n\r");

Tokenizer::Tokenizer(const std::string& s) :
    m_string(s), 
    m_offset(0), 
    m_delimiters(DELIMITERS) {}

Tokenizer::Tokenizer(const std::string& s, const std::string& delimiters) :
    m_string(s), 
    m_offset(0), 
    m_delimiters(delimiters) {}

bool Tokenizer::NextToken() 
{
    return NextToken(m_delimiters);
}

bool Tokenizer::NextToken(const std::string& delimiters) 
{
    size_t i = m_string.find_first_not_of(delimiters, m_offset);
    if (std::string::npos == i) 
    {
        m_offset = m_string.length();
        return false;
    }

    size_t j = m_string.find_first_of(delimiters, i);
    if (std::string::npos == j) 
    {
        m_token = m_string.substr(i);
        m_offset = m_string.length();
        return true;
    }

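    m_token = m_string.substr(i, j - i);
    m_offset = j;
    return true;
}

// Declared in the header but not shown in the original post; assumed definitions.
const std::string Tokenizer::GetToken() const
{
    return m_token;
}

void Tokenizer::Reset()
{
    m_offset = 0;
}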

Example:

std::vector <std::string> v;
Tokenizer s("split this string", " ");
while (s.NextToken())
{
    v.push_back(s.GetToken());
}

— vzczc

19

This is a simple STL-only solution (~5 lines!) using std::find and std::find_first_not_of that handles repetitions of the delimiter (such as spaces or periods), as well as leading and trailing delimiters:

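#include <string>
#include <vector>

// Assumed definition: the snippet uses DELIMITER without defining it.
const std::string DELIMITER = " ";

void tokenize(const std::string& str, std::vector<std::string> &token_v){
    size_t start = str.find_first_not_of(DELIMITER), end=start;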

    while (start != std::string::npos){
        // Find next occurence of delimiter
        end = str.find(DELIMITER, start);
        // Push back the token found into vector
        token_v.push_back(str.substr(start, end-start));
        // Skip all occurences of the delimiter to find new start
        start = str.find_first_not_of(DELIMITER, end);
    }
}

Try it out live!

— Parham

3

This is a good one, but I think you need to use find_first_of() instead of find() for this to work properly with multiple delimiters.

— user755921

2

@user755921 Multiple delimiters are skipped when looking for the start position with find_first_not_of.

— Beginner
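To make the distinction concrete: find(DELIMITER, ...) searches for the delimiter string as a whole, whereas find_first_of treats its argument as a set of characters. A sketch of the set-based variant, assuming the delimiter set is passed in as a parameter (tokenize_any is a hypothetical name):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits `str` at any character contained in `delims`, skipping runs of
// delimiters as well as leading and trailing ones.
// tokenize_any is an illustrative name.
std::vector<std::string> tokenize_any(const std::string& str, const std::string& delims) {
    std::vector<std::string> tokens;
    std::string::size_type start = str.find_first_not_of(delims);
    while (start != std::string::npos) {
        // First delimiter character after the token (npos if the token runs to the end).
        const std::string::size_type end = str.find_first_of(delims, start);
        tokens.push_back(str.substr(start, end - start));
        // Skip every delimiter character to find the next token start.
        start = str.find_first_not_of(delims, end);
    }
    return tokens;
}
```

With delims set to ".,; " the input "..a, b;;c." yields exactly "a", "b" and "c".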

16

pystring is a small library which implements a bunch of Python's string functions, including the split method:

#include <string>
#include <vector>
#include "pystring.h"

std::vector<std::string> chunks;
pystring::split("this string", chunks);

// also can specify a separator
pystring::split("this-string", chunks, "-");

— dbr

3

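Wow, you have answered my immediate question and many future questions. I get that C++ is powerful. But when splitting a string results in source code like the answers above, it is plainly disheartening. I would love to know of other libraries like this that offer higher-level language conveniences.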

— Ross

Wow, you seriously just made my day! I did not know about pystring. This is going to save me a lot of time!

— 17:14에

11

I posted this answer for a similar question.
Don't reinvent the wheel. I've used a number of libraries, and the fastest and most flexible I have come across is the C++ String Toolkit Library.

Here is an example of how to use it that I've posted elsewhere on Stack Overflow.

#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp>

const char *whitespace  = " \t\r\n\f";
const char *whitespace_and_punctuation  = " \t\r\n\f;,=";

int main()
{
    {   // normal parsing of a string into a vector of strings
       std::string s("Somewhere down the road");
       std::vector<std::string> result;
       if( strtk::parse( s, whitespace, result ) )
       {
           for(size_t i = 0; i < result.size(); ++i )
            std::cout << result[i] << std::endl;
       }
    }

    {  // parsing a string into a vector of floats with other separators
       // besides spaces

       std::string s("3.0, 3.14; 4.0");
       std::vector<float> values;
       if( strtk::parse( s, whitespace_and_punctuation, values ) )
       {
           for(size_t i = 0; i < values.size(); ++i )
            std::cout << values[i] << std::endl;
       }
    }

    {  // parsing a string into specific variables

       std::string s("angle = 45; radius = 9.9");
       std::string w1, w2;
       float v1, v2;
       if( strtk::parse( s, whitespace_and_punctuation, w1, v1, w2, v2) )
       {
           std::cout << "word " << w1 << ", value " << v1 << std::endl;
           std::cout << "word " << w2 << ", value " << v2 << std::endl;
       }
    }

    return 0;
}

— DannyK

8

Check this example. It might help you.

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main ()
{
    string tmps;
    istringstream is ("the dellimiter is the space");
    while (is.good ()) {
        is >> tmps;
        cout << tmps << "\n";
    }
    return 0;
}

— sohesado

1

I would do: while ( is >> tmps ) { std::cout << tmps << "\n"; }

— jordix
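The difference matters when the input ends in whitespace: a loop that tests good() before extracting can go around one extra time after the last word and process the stale value again. A small sketch of the extraction-as-condition form, with read_words as an illustrative name:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Collects words using the extraction itself as the loop condition, so a
// failed read (for example, only trailing whitespace left) adds nothing.
// read_words is an illustrative name.
std::vector<std::string> read_words(const std::string& text) {
    std::istringstream is(text);
    std::vector<std::string> words;
    std::string word;
    while (is >> word) {  // false as soon as an extraction fails
        words.push_back(word);
    }
    return words;
}
```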

6

MFC/ATL has a very nice tokenizer. From MSDN:

CAtlString str( "%First Second#Third" );
CAtlString resToken;
int curPos= 0;

resToken= str.Tokenize("% #",curPos);
while (resToken != "")
{
   printf("Resulting token: %s\n", resToken);
   resToken= str.Tokenize("% #",curPos);
};

Output

Resulting Token: First
Resulting Token: Second
Resulting Token: Third

— Jim In Texas

1

This Tokenize() function skips empty tokens. For example, if the main string contains the substring "%%", no empty token is returned; it is skipped.

— Sheen

4

If you're willing to use C, you can use the strtok function. You should pay attention to multi-threading issues when using it.

— On Freund

3

Note that strtok modifies the string you're examining, so you can't use it on const char * strings without making a copy.

— Graeme Perrow

9

The multithreading issue is that strtok uses a global variable to keep track of where it is, so if you have two threads that each use strtok, you'll get undefined behavior.

— JohnMcG

@JohnMcG Or just use strtok_s, which is basically strtok with explicit state passing.

— Matthias
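A sketch of explicit state passing using POSIX strtok_r, which is assumed to be available here (it is not part of ISO C++). Microsoft's strtok_s takes the same three arguments, while the C11 Annex K strtok_s has a different, four-argument signature, so the call would need adjusting per platform. split_reentrant is an illustrative name.

```cpp
#include <cassert>
#include <string.h>  // strtok_r: POSIX, assumed available on this platform
#include <string>
#include <vector>

// Tokenizes a copy of `text`, keeping the scan position in a local variable
// instead of strtok's hidden global state.
// split_reentrant is an illustrative name.
std::vector<std::string> split_reentrant(const std::string& text, const char* delims) {
    assert(delims != nullptr);
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');  // writable, null-terminated scratch copy
    std::vector<std::string> tokens;
    char* state = nullptr;   // per-call position; one such variable per tokenization
    for (char* p = strtok_r(buffer.data(), delims, &state); p != nullptr;
         p = strtok_r(nullptr, delims, &state)) {
        tokens.push_back(p);
    }
    return tokens;
}
```

Because the position lives in state, two tokenizations can be interleaved or run on different threads, provided each works on its own buffer.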

4

For simple stuff I just use the following:

unsigned TokenizeString(const std::string& i_source,
                        const std::string& i_seperators,
                        bool i_discard_empty_tokens,
                        std::vector<std::string>& o_tokens)
{
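    std::string::size_type prev_pos = 0;  // size_type, so the comparison with npos also works where size_t is wider than unsigned
    std::string::size_type pos = 0;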
    unsigned number_of_tokens = 0;
    o_tokens.clear();
    pos = i_source.find_first_of(i_seperators, pos);
    while (pos != std::string::npos)
    {
        std::string token = i_source.substr(prev_pos, pos - prev_pos);
        if (!i_discard_empty_tokens || token != "")
        {
            o_tokens.push_back(i_source.substr(prev_pos, pos - prev_pos));
            number_of_tokens++;
        }

        pos++;
        prev_pos = pos;
        pos = i_source.find_first_of(i_seperators, pos);
    }

    if (prev_pos < i_source.length())
    {
        o_tokens.push_back(i_source.substr(prev_pos));
        number_of_tokens++;
    }

    return number_of_tokens;
}

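Cowardly disclaimer: I write real-time data processing software where the data comes in through binary files, sockets, or some API call (I/O cards, cameras). I never use this function for anything more complicated or time-critical than reading external configuration files on startup.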

— jilles de wit

4

You can simply use a regular expression library and solve this with regular expressions.

Use the expression (\w+) and the variable \1 (or $1, depending on the library's implementation of regular expressions).
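In C++11 terms, that idea could be sketched with std::sregex_iterator, which walks every match of \w+ instead of looking for separators. words_of is a hypothetical name, and \w matches only letters, digits and underscore, so punctuation acts as a separator too.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Returns every maximal run of word characters in `text`.
// words_of is an illustrative name.
std::vector<std::string> words_of(const std::string& text) {
    static const std::regex word("\\w+");
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), word), end;
         it != end; ++it) {
        out.push_back(it->str());  // the whole match, i.e. one word
    }
    return out;
}
```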

— Fawix

+1 for suggesting regex. If you don't need warp speed, it is the most flexible solution. It isn't supported everywhere yet, but that will matter less as time goes by.

— odinthenerd

+1 from me, I just tried <regex> in C++11. So simple and elegant.

— StahlRat

4

There are many overly complicated suggestions here. Try this simple std::string solution:

using namespace std;

string someText = ...

string::size_type tokenOff = 0, sepOff = tokenOff;
while (sepOff != string::npos)
{
    sepOff = someText.find(' ', sepOff);
    string::size_type tokenLen = (sepOff == string::npos) ? sepOff : sepOff++ - tokenOff;
    string token = someText.substr(tokenOff, tokenLen);
    if (!token.empty())
        /* do something with token */;
    tokenOff = sepOff;
}

— David919

4

I thought that was what the >> operator on string streams was for:

string word; sin >> word;

— Daren Thomas

1

My fault for giving a bad (too simple) example. As far as I know, that only works when your delimiter is whitespace.

— Bill the Lizard

4

Adam Pierce's answer provides a hand-spun tokenizer taking in a const char*. It's a bit more problematic to do with iterators, because incrementing a string's end iterator is undefined. That said, given string str{ "The quick brown fox" } we can certainly accomplish this:

auto start = find(cbegin(str), cend(str), ' ');
vector<string> tokens{ string(cbegin(str), start) };

while (start != cend(str)) {
    const auto finish = find(++start, cend(str), ' ');

    tokens.push_back(string(start, finish));
    start = finish;
}


If you're looking to abstract complexity by using standard functionality, as On Freund suggests, strtok is a simple option:

vector<string> tokens;

for (auto i = strtok(data(str), " "); i != nullptr; i = strtok(nullptr, " ")) tokens.push_back(i);

If you don't have access to C++17, you'll need to substitute data(str) as in this example: http://ideone.com/8kAGoa

Though not demonstrated in the example, strtok need not use the same delimiter for each token. Along with this advantage, though, there are several drawbacks:

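strtok cannot be used on multiple strings at the same time: either a nullptr must be passed to continue tokenizing the current string, or a new char* to tokenize must be passed (there are, however, some non-standard implementations which do support this, such as strtok_s).
For the same reason, strtok cannot be used on multiple threads simultaneously (this may, however, be implementation defined; for example, Visual Studio's implementation is thread safe).
Calling strtok modifies the string it is operating on, so it cannot be used on const strings, const char*s, or literal strings. To tokenize any of these with strtok, or to operate on a string whose contents need to be preserved, str would have to be copied, and the copy could then be operated on.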

C++20 provides us with split_view to tokenize strings in a non-destructive manner: https://topanswers.xyz/cplusplus?q=749#a874

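The previous methods cannot generate a tokenized vector in place, meaning that without abstracting them into a helper function they cannot initialize const vector<string> tokens. That functionality, and the ability to accept any whitespace delimiter, can be harnessed using an istream_iterator. For example, given const string str{ "The quick \tbrown \nfox" } we can do this: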

istringstream is{ str };
const vector<string> tokens{ istream_iterator<string>(is), istream_iterator<string>() };


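The required construction of an istringstream for this option has a far greater cost than the previous 2 options; however, this cost is typically hidden in the expense of string allocation.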

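If none of the above options are flexible enough for your tokenization needs, the most flexible option is a regex_token_iterator. Of course, with this flexibility comes greater expense, but again this is likely hidden in the string allocation cost. Say, for example, we want to tokenize on non-escaped commas, also eating whitespace, given the following input: const string str{ "The ,qu\\,ick ,\tbrown, fox" }. We can do this: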

const regex re{ "\\s*((?:[^\\\\,]|\\\\.)*?)\\s*(?:,|$)" };
const vector<string> tokens{ sregex_token_iterator(cbegin(str), cend(str), re, 1), sregex_token_iterator() };


— Jonathan Mee

strtok_s is C11 standard, by the way. strtok_r is a POSIX2001 standard. Between the two of them, there is a standard re-entrant version of strtok for most platforms.

— Andon M. Coleman

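@AndonM.Coleman But this is a C++ question, and in C++ #include <cstring> only includes the C99 version of strtok. So my assumption is that you're providing this comment as supporting material, demonstrating the implementation-specific availability of strtok extensions?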

— Jonathan Mee

1

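Merely that it's not as non-standard as people might otherwise believe. strtok_s is provided both by C11 and as a standalone extension in Microsoft's C runtime. There's a curious bit of history here, where Microsoft's _s functions became the C standard.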

— Andon M. Coleman

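@AndonM.Coleman Right, I'm with you. Obviously if it's in the C11 standard, the interface and implementation have constraints placed on them which require identical behavior independent of platform. Now the only problem is ensuring that the C11 function is available to us across platforms. Hopefully the C11 standard will be something that C++17 or C++20 chooses to pick up.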

— Jonathan Mee

3

Although this question has already been answered, I want to contribute. Maybe my solution is a bit simple, but this is what I came up with:

vector<string> get_words(string const& text, string const& separator)
{
    vector<string> result;
    string tmp = text;

    size_t first_pos = 0;
    size_t second_pos = tmp.find(separator);

    while (second_pos != string::npos)
    {
        if (first_pos != second_pos)
        {
            string word = tmp.substr(first_pos, second_pos - first_pos);
            result.push_back(word);
        }
        tmp = tmp.substr(second_pos + separator.length());
        second_pos = tmp.find(separator);
    }

    result.push_back(tmp);

    return result;
}

Please leave a comment if there is a better approach to something in my code or if something is wrong.

UPDATE: added generic separator

— NutCracker

Out of the crowd I used your solution :) Can I modify your code to accept any separator?

— Zac

1

@Zac Glad you liked it, and of course you can modify it ... just add a bold update section to my answer ...

— NutCracker

2

Here's an approach that lets you control whether empty tokens are included (like strsep) or excluded (like strtok).

#include <string.h> // for strchr and strlen
#include <string>
#include <vector>

/*
 * want_empty_tokens==true  : include empty tokens, like strsep()
 * want_empty_tokens==false : exclude empty tokens, like strtok()
 */
std::vector<std::string> tokenize(const char* src,
                                  char delim,
                                  bool want_empty_tokens)
{
  std::vector<std::string> tokens;

  if (src and *src != '\0') // defensive
    while( true )  {
      const char* d = strchr(src, delim);
      size_t len = (d)? d-src : strlen(src);

      if (len or want_empty_tokens)
        tokens.push_back( std::string(src, len) ); // capture token

      if (d) src += len+1; else break;
    }

  return tokens;
}

— Darren Smith

2

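It seems odd to me that, with all us speed-conscious nerds here, no one has presented a version that uses a compile-time-generated lookup table for the delimiters. Using a lookup table and iterators should beat std::regex in efficiency. If you don't need to beat regex, just use it: it's standard as of C++11 and very flexible.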

Some have already suggested regex, but for the noobs here is a packaged example that should do exactly what the OP expects:

std::vector<std::string> split(std::string::const_iterator it, std::string::const_iterator end, std::regex e = std::regex{"\\w+"}){
    std::smatch m{};
    std::vector<std::string> ret{};
    while (std::regex_search (it,end,m,e)) {
        ret.emplace_back(m.str());              
        std::advance(it, m.position() + m.length()); //next start position = match position + match length
    }
    return ret;
}
std::vector<std::string> split(const std::string &s, std::regex e = std::regex{"\\w+"}){  //comfort version calls flexible version
    return split(s.cbegin(), s.cend(), std::move(e));
}
int main ()
{
    std::string str {"Some people, excluding those present, have been compile time constants - since puberty."};
    auto v = split(str);
    for(const auto&s:v){
        std::cout << s << std::endl;
    }
    std::cout << "crazy version:" << std::endl;
    v = split(str, std::regex{"[^e]+"});  //using e as delim shows flexibility
    for(const auto&s:v){
        std::cout << s << std::endl;
    }
    return 0;
}

If we need to be faster and accept the constraint that all chars must be 8 bits, we can build a lookup table at compile time using metaprogramming:

template<bool...> struct BoolSequence{};        //just here to hold bools
template<char...> struct CharSequence{};        //just here to hold chars
template<typename T, char C> struct Contains;   //generic
template<char First, char... Cs, char Match>    //not first specialization
struct Contains<CharSequence<First, Cs...>,Match> :
    Contains<CharSequence<Cs...>, Match>{};     //strip first and increase index
template<char First, char... Cs>                //is first specialization
struct Contains<CharSequence<First, Cs...>,First>: std::true_type {}; 
template<char Match>                            //not found specialization
struct Contains<CharSequence<>,Match>: std::false_type{};

template<int I, typename T, typename U> 
struct MakeSequence;                            //generic
template<int I, bool... Bs, typename U> 
struct MakeSequence<I,BoolSequence<Bs...>, U>:  //not last
    MakeSequence<I-1, BoolSequence<Contains<U,static_cast<char>(I-1)>::value,Bs...>, U>{}; //explicit cast: values above 127 would otherwise narrow where char is signed
template<bool... Bs, typename U> 
struct MakeSequence<0,BoolSequence<Bs...>,U>{   //last  
    using Type = BoolSequence<Bs...>;
};
template<typename T> struct BoolASCIITable;
template<bool... Bs> struct BoolASCIITable<BoolSequence<Bs...>>{
    /* could be made constexpr but not yet supported by MSVC */
    static bool isDelim(const char c){
        static const bool table[256] = {Bs...};
        return table[static_cast<unsigned char>(c)]; //unsigned char keeps the index non-negative
    }   
};
using Delims = CharSequence<'.',',',' ',':','\n'>;  //list your custom delimiters here
using Table = BoolASCIITable<typename MakeSequence<256,BoolSequence<>,Delims>::Type>;

With that in place, making a getNextToken function is easy:

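template<typename T_It>
std::pair<T_It,T_It> getNextToken(T_It begin,T_It end){
    //Table exposes a static isDelim rather than operator(), so it is wrapped in lambdas here
    begin = std::find_if(begin,end,[](char c){ return !Table::isDelim(c); }); //find first non delim or end
    auto second = std::find_if(begin,end,[](char c){ return Table::isDelim(c); }); //find first delim or end
    return std::make_pair(begin,second);
}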

Using it is also easy:

int main() {
    std::string s{"Some people, excluding those present, have been compile time constants - since puberty."};
    auto it = std::begin(s);
    auto end = std::end(s);
    while(it != std::end(s)){
        auto token = getNextToken(it,end);
        std::cout << std::string(token.first,token.second) << std::endl;
        it = token.second;
    }
    return 0;
}

Here is a live example: http://ideone.com/GKtkLQ
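For comparison, the same table idea can be sketched without metaprogramming by filling a 256-entry array at run time. This is a simpler stand-in, not the compile-time version above; split_table is an assumed name, and it shares the single-byte-character restriction.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds a 256-entry delimiter table once per call, then scans the text in a
// single pass with one table lookup per character.
// split_table is an illustrative name.
std::vector<std::string> split_table(const std::string& text, const std::string& delims) {
    bool is_delim[256] = {false};
    for (std::string::size_type k = 0; k < delims.size(); ++k) {
        is_delim[static_cast<unsigned char>(delims[k])] = true;
    }
    std::vector<std::string> tokens;
    const std::string::size_type n = text.size();
    std::string::size_type i = 0;
    while (i < n) {
        while (i < n && is_delim[static_cast<unsigned char>(text[i])]) ++i;   // skip delimiters
        const std::string::size_type start = i;
        while (i < n && !is_delim[static_cast<unsigned char>(text[i])]) ++i;  // scan one token
        if (i > start) tokens.push_back(text.substr(start, i - start));
    }
    return tokens;
}
```

The table costs 256 bytes of stack and a short setup loop per call, which is usually negligible next to the string allocations for the tokens themselves.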

— odinthenerd

1

Is it possible to tokenize with a string delimiter?

— Galigator

This version is only optimized for single-character delimiters. A lookup table is not suited to multi-character (string) delimiters, so it's harder to beat regex in efficiency there.

— odinthenerd

1

You can take advantage of boost::make_find_iterator. Something similar to this:

template<typename CH>
inline vector< basic_string<CH> > tokenize(
    const basic_string<CH> &Input,
    const basic_string<CH> &Delimiter,
    bool remove_empty_token
    ) {

    typedef typename basic_string<CH>::const_iterator string_iterator_t;
    typedef boost::find_iterator< string_iterator_t > string_find_iterator_t;

    vector< basic_string<CH> > Result;
    string_iterator_t it = Input.begin();
    string_iterator_t it_end = Input.end();
    for(string_find_iterator_t i = boost::make_find_iterator(Input, boost::first_finder(Delimiter, boost::is_equal()));
        i != string_find_iterator_t();
        ++i) {
        if(remove_empty_token){
            if(it != i->begin())
                Result.push_back(basic_string<CH>(it,i->begin()));
        }
        else
            Result.push_back(basic_string<CH>(it,i->begin()));
        it = i->end();
    }
    if(it != it_end)
        Result.push_back(basic_string<CH>(it,it_end));

    return Result;
}

— Arash

1

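Here's my Swiss Army knife of string tokenizers for splitting strings on whitespace, accounting for single- and double-quoted strings, and stripping those quote characters from the results. I used RegexBuddy 4.x to generate most of the code snippet, but I added custom handling for stripping quotes and a few other things.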

#include <string>
#include <locale>
#include <regex>
#include <vector>

std::vector<std::wstring> tokenize_string(std::wstring string_to_tokenize) {
    std::vector<std::wstring> tokens;

    std::wregex re(LR"(("[^"]*"|'[^']*'|[^"' ]+))", std::regex_constants::collate);

    std::wsregex_iterator next( string_to_tokenize.begin(),
                                string_to_tokenize.end(),
                                re,
                                std::regex_constants::match_not_null );

    std::wsregex_iterator end;
    const wchar_t single_quote = L'\'';
    const wchar_t double_quote = L'\"';
    while ( next != end ) {
        std::wsmatch match = *next;
        const std::wstring token = match.str( 0 );
        next++;

        if (token.length() > 2 && (token.front() == double_quote || token.front() == single_quote))
            tokens.emplace_back( std::wstring(token.begin()+1, token.begin()+token.length()-1) );
        else
            tokens.emplace_back(token);
    }
    return tokens;
}

— kayleeFrye_onDeck

1

(Down)votes can be just as constructive as upvotes, but not when you don't leave comments as to why...

— kayleeFrye_onDeck

1

I upvoted you.

— mattshu

Thanks @mattshu! Is it the regex segments that make it hard to follow, or something else?

— kayleeFrye_onDeck

0

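If the maximum length of the input string to be tokenized is known, one can exploit this and implement a very fast version. I am sketching the basic idea below, which was inspired by both strtok() and the "suffix array" data structure described in Jon Bentley's "Programming Pearls", 2nd edition, chapter 15. The C++ class in this case only provides some organization and convenience of use. The implementation shown can easily be extended to remove leading and trailing whitespace characters from the tokens.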

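Basically, one can replace the separator characters with string-terminating '\0' characters and set pointers to the tokens within the modified string. In the extreme case where the string consists only of separators, one gets string-length plus 1 empty tokens. It is practical to duplicate the string that is to be modified.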

Header file:

#include <cassert>   // assert
#include <cstddef>   // size_t
#include <cstring>   // memset, strncpy

class TextLineSplitter
{
public:

    TextLineSplitter( const size_t max_line_len );

    ~TextLineSplitter();

    void            SplitLine( const char *line,
                               const char sep_char = ','
                             );

    inline size_t   NumTokens( void ) const
    {
        return mNumTokens;
    }

    const char *    GetToken( const size_t token_idx ) const
    {
        assert( token_idx < mNumTokens );
        return mTokens[ token_idx ];
    }

private:
    const size_t    mStorageSize;

    char           *mBuff;
    char          **mTokens;
    size_t          mNumTokens;

    inline void     ResetContent( void )
    {
        memset( mBuff, 0, mStorageSize );
        // mark all items as empty:
        memset( mTokens, 0, mStorageSize * sizeof( char* ) );
        // reset counter for found items:
        mNumTokens = 0L;
    }
};

Implementation file:

TextLineSplitter::TextLineSplitter( const size_t max_line_len ):
    mStorageSize ( max_line_len + 1L )
{
    // allocate memory
    mBuff   = new char  [ mStorageSize ];
    mTokens = new char* [ mStorageSize ];

    ResetContent();
}

TextLineSplitter::~TextLineSplitter()
{
    delete [] mBuff;
    delete [] mTokens;
}


void TextLineSplitter::SplitLine( const char *line,
                                  const char sep_char   /* = ',' */
                                )
{
    assert( sep_char != '\0' );

    ResetContent();
    strncpy( mBuff, line, mStorageSize - 1 ); // mMaxLineLen was never declared; the buffer holds mStorageSize - 1 chars plus '\0'

    size_t idx       = 0L; // running index for characters

    do
    {
        assert( idx < mStorageSize );

        const char chr = line[ idx ]; // retrieve current character

        if( mTokens[ mNumTokens ] == NULL )
        {
            mTokens[ mNumTokens ] = &mBuff[ idx ];
        } // if

        if( chr == sep_char || chr == '\0' )
        { // item or line finished
            // overwrite separator with a 0-terminating character:
            mBuff[ idx ] = '\0';
            // count-up items:
            mNumTokens ++;
        } // if

    } while( line[ idx++ ] );
}

The usage scenario would be:

// create an instance capable of splitting strings up to 1000 chars long:
TextLineSplitter spl( 1000 );
spl.SplitLine( "Item1,,Item2,Item3" );
for( size_t i = 0; i < spl.NumTokens(); i++ )
{
    printf( "%s\n", spl.GetToken( i ) );
}

Output:

Item1

Item2
Item3
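
A more compact variant of the same idea can be sketched with std::vector doing the memory management, which also removes the need to know the maximum line length up front. InPlaceSplitter is an illustrative name, not the class above, and copying is disabled because the stored pointers refer to the object's own buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Owns a private copy of the line, overwrites each separator with '\0' and
// hands out pointers into that copy. InPlaceSplitter is an illustrative name.
class InPlaceSplitter {
public:
    InPlaceSplitter(const std::string& line, char sep)
        : buffer_(line.begin(), line.end()) {
        assert(sep != '\0');
        buffer_.push_back('\0');            // terminator for the last token
        tokens_.push_back(buffer_.data());  // the first token starts at offset 0
        for (std::size_t i = 0; i + 1 < buffer_.size(); ++i) {
            if (buffer_[i] == sep) {
                buffer_[i] = '\0';                   // end the current token here
                tokens_.push_back(&buffer_[i + 1]);  // the next token starts after it
            }
        }
    }
    // The pointers refer to this object's buffer, so copies are disallowed.
    InPlaceSplitter(const InPlaceSplitter&) = delete;
    InPlaceSplitter& operator=(const InPlaceSplitter&) = delete;

    std::size_t size() const { return tokens_.size(); }
    const char* operator[](std::size_t i) const { return tokens_[i]; }

private:
    std::vector<char> buffer_;
    std::vector<const char*> tokens_;
};
```

For the line "Item1,,Item2,Item3" with ',' as the separator this reports four tokens, the second of them empty, matching the output shown above.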

— Angel Sinigersky

0

boost::tokenizer is your friend, but consider making your code portable with regard to internationalization (i18n) issues by using wstring/wchar_t instead of the legacy string/char types.

#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

using namespace std;
using namespace boost;

typedef tokenizer<char_separator<wchar_t>,
                  wstring::const_iterator, wstring> Tok;

int main()
{
  wstring s;
  while (getline(wcin, s)) {
    char_separator<wchar_t> sep(L" "); // list of separator characters
    Tok tok(s, sep);
    for (Tok::iterator beg = tok.begin(); beg != tok.end(); ++beg) {
      wcout << *beg << L"\t"; // output (or store in vector)
    }
    wcout << L"\n";
  }
  return 0;
}

— jochenleidner

"Legacy" is definitely not correct, and wchar_t is a horrible implementation-dependent type that nobody should use unless absolutely necessary.

— CoffeeandCode

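Using wchar_t doesn't automatically solve any i18n issues. You use encodings to solve that problem. If you're splitting a string by a delimiter, it is implied that the delimiter doesn't collide with the encoded contents of any token inside the string. Escaping may be needed, etc. wchar_t isn't a magical solution to this.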

— yonil

0

Simple C++ code (standard C++98) that accepts multiple delimiters (specified in a std::string) and uses only vectors, strings, and iterators.

#include <iostream>
#include <vector>
#include <string>
#include <stdexcept> 

std::vector<std::string> 
split(const std::string& str, const std::string& delim){
    std::vector<std::string> result;
    if (str.empty())
        throw std::runtime_error("Can not tokenize an empty string!");
    std::string::const_iterator begin, str_it;
    begin = str_it = str.begin(); 
    do {
        while (str_it != str.end() && delim.find(*str_it) == std::string::npos) // check for end before dereferencing
            str_it++; // find the position of the first delimiter in str
        std::string token = std::string(begin, str_it); // grab the token
        if (!token.empty()) // empty token only when str starts with a delimiter
            result.push_back(token); // push the token into a vector<string>
        while (str_it != str.end() && delim.find(*str_it) != std::string::npos) // check for end before dereferencing
            str_it++; // ignore the additional consecutive delimiters
        begin = str_it; // process the remaining tokens
        } while (str_it != str.end());
    return result;
}

int main() {
    std::string test_string = ".this is.a.../.simple;;test;;;END";
    std::string delim = "; ./"; // string containing the delimiters
    std::vector<std::string> tokens = split(test_string, delim);           
    for (std::vector<std::string>::const_iterator it = tokens.begin(); 
        it != tokens.end(); it++)
            std::cout << *it << std::endl;
}

— vsoftco