How do I trim whitespace?

1071

Is there a Python function that will trim whitespace (spaces and tabs) from a string?

Example: \t example string\t → example string

— Chris
Source

1

Thanks. I discovered the strip function earlier, but it doesn't seem to be working for my input.

— Chris

1

Same as: stackoverflow.com/questions/761804/trimming-a-string-in-python (even though this question is slightly clearer, IMHO). This one is also nearly the same: stackoverflow.com/questions/959215/…

— Jonik

6

The characters Python considers whitespace are stored in string.whitespace.

— John Fouhy

2

By "strip function", do you mean the strip method? "It doesn't seem to be working for my input": please provide your code, your input, and the output.

— S.Lott

5

Trimming a string in Python

— Breno Baiardi

1599

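For whitespace on both sides:

s = "  \t a string example\t  "
s = s.strip()

For whitespace on the right side:

s = s.rstrip()

For whitespace on the left side:

s = s.lstrip()

As thedz points out, you can pass an argument to any of these functions to strip arbitrary characters, like this:

s = s.strip(' \t\n\r')

This strips any space, \t, \n, or \r characters from the left-hand side, the right-hand side, or both sides of the string.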
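As a quick illustration of the three methods and of the optional argument, here is a minimal sketch. The sample strings are made up for this example and are not from the answer. Note that the argument is treated as a set of characters to remove, not as a prefix or suffix to match.

```python
# Illustrative values only; these strings are assumptions for the demo.
s = "\t  example string \n"

both = s.strip()     # no argument: strips leading and trailing whitespace
left = s.lstrip()    # leading whitespace only
right = s.rstrip()   # trailing whitespace only

# The argument is a set of characters, stripped in any order from both ends.
custom = "xyxhello worldyx".strip("xy")

print(repr(both))    # 'example string'
print(repr(left))    # 'example string \n'
print(repr(right))   # '\t  example string'
print(repr(custom))  # 'hello world'
```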

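The examples above only remove characters from the left-hand and right-hand sides of a string. To also remove characters from the middle of a string, try re.sub:

import re
print(re.sub(r'[\s+]', '', s))

That should print: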

astringexample
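One detail of that pattern may be worth checking for yourself: inside square brackets, `+` is a literal character rather than a quantifier. The sketch below uses a made-up input containing a plus sign (an assumption for the demo) to compare the bracketed form with the more usual `\s+`.

```python
import re

# Hypothetical input with a literal '+' in it, chosen to expose the difference.
s = "  \t a string+example\t  "

# Inside brackets, '+' is a literal, so this deletes the plus sign as well.
with_class = re.sub(r'[\s+]', '', s)

# Outside brackets, '+' means "one or more whitespace characters".
with_quantifier = re.sub(r'\s+', '', s)

print(with_class)       # astringexample
print(with_quantifier)  # astring+example
```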

— James Thompson
Source

18

strip() takes an argument that tells it which characters to strip. Try: strip(' \t\n\r')

— thedz

3

Showing the results for the examples would be quite helpful :)

— ton

4

There is no need to list the whitespace characters: docs.python.org/2/library/string.html#string.whitespace

— jesuis

3

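The last example is exactly the same as using str.replace(" ", ""). You don't need re unless there is more than one space, in which case your example doesn't work. [] is designed to mark single characters, so it is unnecessary if you are only using \s. Use either \s+ or [\s]+ (unnecessary), but [\s+] doesn't do the job, in particular if you want to replace multiple spaces with a single one, like turning "this  example" into "this example".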

— Jorge E. Cardona

3

@JorgeE.Cardona - One thing you are slightly wrong about: \s includes tabs, while replace(" ", "") does not.

— ArtOfWarfare

72

Python's trim method is called strip:

str.strip() #trim
str.lstrip() #ltrim
str.rstrip() #rtrim

— gcb
Source

5

Easy to remember, since s-tri-p looks almost like tri-m.

— isar

22

For leading and trailing whitespace:

s = '   foo    \t   '
print(s.strip()) # prints "foo"

Otherwise, a regular expression works:

import re
pat = re.compile(r'\s+')
s = '  \t  foo   \t   bar \t  '
print(pat.sub('', s)) # prints "foobar"

— ars
Source

1

You haven't compiled your regex. You need to make it pat = re.compile(r'\s+')

— Evan Fosmark

Generally you want sub(" ", s), not "": the latter will merge the words, and you can no longer use .split(" ") to tokenize.

— user3467349

It would be nice to see the output of the print statements.

— Ron Klein

19

You can also use a very simple and basic function, str.replace(), which works with spaces and tabs:

>>> whitespaces = "   abcd ef gh ijkl       "
>>> tabs = "        abcde       fgh        ijkl"

>>> print(whitespaces.replace(" ", ""))
abcdefghijkl
>>> print(tabs.replace(" ", ""))
abcdefghijkl

Simple and easy.
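One caveat that may be worth verifying: replace(" ", "") only matches the space character itself. The sketch below assumes a variant of the `tabs` string that contains real tab characters (an assumption; the sample as shown here contains only spaces) and compares it with a split/join approach that handles any whitespace.

```python
spaces = "   abcd ef gh ijkl       "
# Assumed variant of the example: real tab characters instead of spaces.
tabs = "\tabcde\tfgh\tijkl"

spaces_removed = spaces.replace(" ", "")
tabs_after_replace = tabs.replace(" ", "")   # tabs are not spaces, so nothing changes
tabs_after_split = "".join(tabs.split())     # split() handles any whitespace

print(repr(spaces_removed))      # 'abcdefghijkl'
print(repr(tabs_after_replace))  # '\tabcde\tfgh\tijkl'
print(repr(tabs_after_split))    # 'abcdefghijkl'
```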

— Lucas
Source

2

But this, alas, also removes interior spaces, while the example in the original question leaves interior spaces untouched.

— Brandon Rhodes

12

#how to trim a multi line string or a file

s=""" line one
\tline two\t
line three """

#line1 starts with a space, #2 starts and ends with a tab, #3 ends with a space.

s1=s.splitlines()
print(s1)
# [' line one', '\tline two\t', 'line three ']

print([i.strip() for i in s1])
# ['line one', 'line two', 'line three']




#more details:

#we could also have used a for loop from the beginning:
for line in s.splitlines():
    line=line.strip()
    process(line)

#we could also be reading a file line by line.. e.g. my_file=open(filename), or with open(filename) as myfile:
for line in my_file:
    line=line.strip()
    process(line)

#moot point: note splitlines() removed the newline characters, we can keep them by passing True:
#although split() will then remove them anyway..
s2=s.splitlines(True)
print(s2)
# [' line one\n', '\tline two\t\n', 'line three ']
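The file-reading loop above can be tried without a real file. In this minimal sketch, `io.StringIO` stands in for the open file (an assumption made so the example is self-contained), and blank lines are dropped after stripping, which is an extra step the answer does not perform.

```python
import io

# io.StringIO stands in for a real file; with a file on disk you would
# write `with open(filename) as my_file:` using your own path.
my_file = io.StringIO(" line one\n\tline two\t\n\n   \nline three ")

# Strip each line and keep only the ones that are not empty afterwards.
cleaned = [line.strip() for line in my_file if line.strip()]

print(cleaned)  # ['line one', 'line two', 'line three']
```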

— Robert King
Source

4

No one has posted these regex solutions yet.

Matching:

>>> import re
>>> p=re.compile('\\s*(.*\\S)?\\s*')

>>> m=p.match('  \t blah ')
>>> m.group(1)
'blah'

>>> m=p.match('  \tbl ah  \t ')
>>> m.group(1)
'bl ah'

>>> m=p.match('  \t  ')
>>> print(m.group(1))
None

Searching (you have to handle the "only spaces" input case differently):

>>> p1=re.compile('\\S.*\\S')

>>> m=p1.search('  \tblah  \t ')
>>> m.group()
'blah'

>>> m=p1.search('  \tbl ah  \t ')
>>> m.group()
'bl ah'

>>> m=p1.search('  \t  ')
>>> m.group()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

If you use re.sub, you may remove inner whitespace, which could be undesirable.
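The matching variant could be wrapped in a small helper so that the all-whitespace case returns an empty string instead of None. This is only a sketch: the name `regex_trim` is hypothetical, and the `re.DOTALL` flag is an addition not present in the answer, included so that multi-line input is handled too.

```python
import re

# Same pattern as the "matching" variant, plus re.DOTALL (an addition here)
# so that '.' also matches newlines inside the text.
_TRIM = re.compile(r'\s*(.*\S)?\s*', re.DOTALL)

def regex_trim(text):
    """Return text without leading/trailing whitespace ('' if all whitespace)."""
    match = _TRIM.match(text)
    # group(1) is None when the input has no non-whitespace character.
    return match.group(1) or ''

print(repr(regex_trim('  \t blah ')))      # 'blah'
print(repr(regex_trim('  \tbl ah  \t ')))  # 'bl ah'
print(repr(regex_trim('  \t  ')))          # ''
```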

— user1149913
Source

3

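Whitespace includes spaces, tabs, and CRLF. So an elegant one-liner string function we can use is translate:

' hello apple'.translate(str.maketrans('', '', ' \n\t\r'))  # Python 3; Python 2 spelled this .translate(None, ' \n\t\r')

Or if you want to be thorough:

import string
' hello  apple'.translate(str.maketrans('', '', string.whitespace))  # Python 3 form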

— MaK
Source

3

(re.sub(' +', ' ', (my_str.replace('\n', ' ')))).strip()

This removes all the unwanted spaces and newline characters. Hope this helps.

import re
my_str = '   a     b \n c   '
formatted_str = (re.sub(' +', ' ',(my_str.replace('\n',' ')))).strip()

This will result in:

'   a     b \n c   ' will be changed to 'a b c'

— Safvan CK
Source

2

    something = "\t  please_     \t remove_  all_    \n\n\n\nwhitespaces\n\t  "

    something = "".join(something.split())

Output:

please_remove_all_whitespaces

Adding Le Droid's comment to the answer. To separate with a space:

    something = "\t  please     \t remove  all   extra \n\n\n\nwhitespaces\n\t  "
    something = " ".join(something.split())

Output:

please remove all extra whitespaces
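Both variants rely on str.split() with no argument, which splits on any run of whitespace and discards empty strings at the ends. A minimal sketch, with hypothetical helper names chosen for this example:

```python
def remove_all_whitespace(text):
    # split() with no argument splits on any run of whitespace
    # (spaces, tabs, newlines) and drops empty strings at the ends.
    return "".join(text.split())

def collapse_whitespace(text):
    # Same split, but the words are re-joined with a single space.
    return " ".join(text.split())

sample = "\t  please     \t remove  all   extra \n\n\n\nwhitespaces\n\t  "
print(remove_all_whitespace(sample))  # pleaseremoveallextrawhitespaces
print(collapse_whitespace(sample))    # please remove all extra whitespaces
```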

— pbn
Source

1

Simple and efficient. You could use " ".join(... to keep the words separated by a space.

— Le Droid

1

If you are using Python 3: in your print call, finish with sep="". That removes the spaces print would otherwise put between the arguments.

Example:

txt="potatoes"
print("I love ",txt,"",sep="")

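This will print: I love potatoes

Instead of: I love  potatoes (with a doubled space, plus a trailing space)

In your case, since you would be trying to get rid of the \t, use sep="\t".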
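To see exactly what sep changes, the sketch below captures the printed text in `io.StringIO` buffers (a device used here only to make the output inspectable). As far as I can tell, sep only controls what print inserts between its arguments; it does not remove characters that are already inside the strings, so trimming a string still calls for strip().

```python
import io

txt = "potatoes"

# Capture what print writes so the exact characters can be inspected.
default_sep = io.StringIO()
print("I love ", txt, "", file=default_sep)        # default sep is a single space

no_sep = io.StringIO()
print("I love ", txt, "", sep="", file=no_sep)     # nothing between arguments

print(repr(default_sep.getvalue()))  # 'I love  potatoes \n'
print(repr(no_sep.getvalue()))       # 'I love potatoes\n'
```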

— Morgansmnm
Source

1

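Having looked at quite a few solutions here with varying degrees of understanding, I wondered what to do if the string was comma separated...

The problem

While trying to process a CSV of contact information, I needed a solution to this problem: trim extraneous whitespace and some junk, but preserve trailing commas and internal whitespace. Working with a field containing notes on the contacts, I wanted to remove the garbage and leave the good stuff. While trimming out all the punctuation and chaff, I didn't want to lose the whitespace between compound tokens, because I didn't want to rebuild it later.

Regex and patterns: `[\s_]+?\W+`

The pattern looks for whitespace characters and the underscore ('_'), one to unlimited times, lazily (as few characters as possible), with [\s_]+?, that come before non-word characters occurring one to unlimited times, with \W+ (equivalent to [^a-zA-Z0-9_]). Specifically, this finds swaths of whitespace: null characters (\0, which are matched by \W rather than \s), tabs (\t), newlines (\n), form feeds (\f), and carriage returns (\r).

I see the advantage of this as two-fold:

It doesn't remove whitespace between complete words/tokens that you might want to keep together.
Python's built-in string method strip() doesn't deal with the inside of the string, only the left and right ends, and with no argument it strips whitespace (see the example below: several newlines are in the text, and text.strip(' \n\t\r') does not remove them all, while the regex pattern does).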
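The behaviour of the pattern is easier to follow on short inputs. The strings below are small fragments picked for illustration (some echo pieces of the answer's sample text), and the `clean` helper is a hypothetical wrapper, not part of the answer.

```python
import re

# The pattern from the answer: lazily match whitespace/underscores, then
# greedily match one or more non-word characters.
trim_ws_punctn = re.compile(r'[\s_]+?\W+')

def clean(text):
    # Each match is replaced by a single space.
    return trim_ws_punctn.sub(' ', text)

print(clean("hello , , world"))      # hello world
print(clean("hello-, -world"))       # hello-, world
print(clean("pitch_at_palace, ba"))  # pitch_at_palace, ba
print(clean("deck\n\n\n\nklkjsdf"))  # deck klkjsdf
```

A comma followed by a single space and a word is left alone, because the space is not followed by a non-word character; underscores inside a token survive for the same reason.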

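This goes beyond the OP's question, but I think there are plenty of cases where we might have odd, pathological instances within the text data, as I did (somehow escape characters ended up in some of the text). Moreover, in list-like strings, we don't want to eliminate the delimiter unless the delimiter separates two whitespace characters or some non-word characters, like '-,' or '-, ,,,'.

NB: I am not talking about the delimiter of the CSV itself, only about instances within the CSV where the data is list-like, i.e. a comma-separated string of substrings.

Full disclosure: I have only been manipulating text for about a month, and using regex only for the last two weeks, so I'm sure there are some nuances I'm missing. That said, for smaller collections of strings (mine are in a dataframe of 12,000 rows and 40-odd columns), as a final step after a pass that removes extraneous characters, this works exceptionally well, especially if you introduce some additional whitespace where you want to separate text joined by a non-word character but don't want to add whitespace where there was none before.

An example: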

import re


text = "\"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, , , , \r, , \0, ff dd \n invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, \n i69rpofhfsp9t7c practice 20ignition - 20june \t\n .2134.pdf 2109                                                 \n\n\n\nklkjsdf\""

print(f"Here is the text as formatted:\n{text}\n")
print()
print("Trimming both the whitespaces and the non-word characters that follow them.")
print()
trim_ws_punctn = re.compile(r'[\s_]+?\W+')
clean_text = trim_ws_punctn.sub(' ', text)
print(clean_text)
print()
print("what about 'strip()'?")
print(f"Here is the text, formatted as is:\n{text}\n")
clean_text = text.strip(' \n\t\r')  # strip out whitespace?
print()
print(f"Here is the text, after stripping with 'strip':\n{clean_text}\n")

print()
print("Are 'text' and 'clean_text' unchanged?")
print(clean_text == text)

This outputs:

Here is the text as formatted:

"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf" 

using regex to trim both the whitespaces and the non-word characters that follow them.

"portfolio, derp, hello-world, hello-, world, founders, mentors, ffib, biff, 1, 12.18.02, 12, 2013, 9874890288, ff, series a, exit, general mailing, fr, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk,  jim.somedude@blahblah.com, dd invites,subscribed,, master, dd invites,subscribed, ff dd invites, subscribed, alumni spring 2012 deck: https: www.dropbox.com s, i69rpofhfsp9t7c practice 20ignition 20june 2134.pdf 2109 klkjsdf"

Very nice.
What about 'strip()'?

Here is the text, formatted as is:

"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf"


Here is the text, after stripping with 'strip':


"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf"
Are 'text' and 'clean_text' unchanged? 'True'

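So strip only removes whitespace from the two ends of the string. In the OP's case, strip() is fine. But if things get any more complex, a regex with a similar pattern may be of some value in more general settings.

See it in action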

— Joshua Fiedler
Source

0

Try translate:

>>> import string
>>> print('\t\r\n  hello \r\n world \t\r\n')

  hello 
 world  
>>> tr = str.maketrans(string.whitespace, ' '*len(string.whitespace))
>>> '\t\r\n  hello \r\n world \t\r\n'.translate(tr)
'     hello    world    '
>>> '\t\r\n  hello \r\n world \t\r\n'.translate(tr).replace(' ', '')
'helloworld'

— 海洋顶端
Source

0

If you want to trim the whitespace off just the beginning and end of the string, you can do something like this:

some_string = "    Hello,    world!\n    "
new_string = some_string.strip()
# new_string is now "Hello,    world!"

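This works a lot like Qt's QString::trimmed() method, in that it removes leading and trailing whitespace while leaving internal whitespace alone.

But if you'd like something like Qt's QString::simplified() method, which not only removes leading and trailing whitespace but also "squishes" all consecutive internal whitespace into one space character, you can use a combination of .split() and " ".join, like this: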

some_string = "\t    Hello,  \n\t  world!\n    "
new_string = " ".join(some_string.split())
# new_string is now "Hello, world!"

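In this last example, each sequence of internal whitespace is replaced with a single space, while the whitespace at the start and end of the string is still trimmed.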
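For comparison, the same "simplify" behaviour can also be written with a regular expression. The sketch below uses hypothetical function names and checks that both spellings agree on the sample string; it is a comparison of two idioms, not a claim about how Qt implements simplified().

```python
import re

def simplified(text):
    # split() discards leading/trailing whitespace and splits on runs of it.
    return " ".join(text.split())

def simplified_re(text):
    # Collapse every run of whitespace to one space, then trim the ends.
    return re.sub(r'\s+', ' ', text).strip()

some_string = "\t    Hello,  \n\t  world!\n    "
print(repr(simplified(some_string)))     # 'Hello, world!'
print(repr(simplified_re(some_string)))  # 'Hello, world!'
```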

— JL
Source

-1

I generally use the following method:

>>> myStr = "Hi\n Stack Over \r flow!"
>>> charList = [u"\u005Cn",u"\u005Cr",u"\u005Ct"]
>>> import re
>>> for i in charList:
        myStr = re.sub(i, r"", myStr)

>>> myStr
'Hi Stack Over  flow!'

Note: This is only for removing "\n", "\r", and "\t". It does not remove extra spaces.
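The loop can also be expressed as a single substitution with a character class, or with str.translate. This is an alternative sketch rather than the answer's own method; the variable names are illustrative.

```python
import re

my_str = "Hi\n Stack Over \r flow!"

# One pass with a character class instead of three separate substitutions.
cleaned_re = re.sub(r'[\n\r\t]', '', my_str)

# The same deletion without regular expressions (Python 3 str.maketrans).
cleaned_tr = my_str.translate(str.maketrans('', '', '\n\r\t'))

print(repr(cleaned_re))  # 'Hi Stack Over  flow!'
print(repr(cleaned_tr))  # 'Hi Stack Over  flow!'
```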

— Mayur Koshti
Source

-2

This is for removing whitespace from the middle of a string:

$p = "ATGCGAC ACGATCGACC";
$p =~ s/\s//g;
print $p;

Output:

ATGCGACACGATCGACC

— Master Roshi
Source

1

This question is about Python, not JavaScript or Perl.

— phuclv

-17

This will remove all whitespace and newlines from both the beginning and end of a string:

>>> import re
>>> s = "  \n\t  \n   some \n text \n     "
>>> re.sub(r"^\s+|\s+$", "", s)
'some \n text'
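Whether this differs from str.strip() is easy to check. The sketch below runs both on a handful of sample strings (chosen for this example, so it is a spot check rather than a proof) and records whether the results agree.

```python
import re

samples = [
    "  \n\t  \n   some \n text \n     ",
    "no-padding",
    "   ",
    "\n\nx\n",
]

# For each sample, compare the regex result with plain str.strip().
agreement = [re.sub(r'^\s+|\s+$', '', s) == s.strip() for s in samples]

print(agreement)  # [True, True, True, True]
```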

— Rafe
Source

8

Why use a regex when s.strip() does exactly this?

— Ned Batchelder

1

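s.strip() only handles the initial whitespace, but not whitespace "discovered" after removing other unwanted characters. Note that this will remove even the whitespace after the final leading \n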

— Rafe

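Someone down-voted this answer but didn't explain why it is flawed. Shame on you (@NedBatchelder, if the down-vote was yours, please reverse it, as I answered your question and you didn't mention anything actually broken in my answer)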

— Rafe

10

Rafe, you might want to double-check: s.strip() produces exactly the same result as your regex.

— Ned Batchelder

3

@Rafe, you are confusing it with trim. Strip does the required operations.

— iMitwe
