Strip spaces/tabs/newlines - Python

Question 1

I am trying to remove all spaces, tabs, and newlines from a string in Python 2.7 on Linux.

I wrote this:

myString="I want to Remove all white \t spaces, new lines \n and tabs \t"
myString = myString.strip(' \n\t')
print myString

Output:

I want to Remove all white   spaces, new lines 
 and tabs

It seems like a simple thing to do, yet I am missing something here. Should I be importing something?

Question 2

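Use str.split([sep[, maxsplit]]) with no sep or with sep=None.

From the docs:

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.

Demo: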

>>> myString.split()
['I', 'want', 'to', 'Remove', 'all', 'white', 'spaces,', 'new', 'lines', 'and', 'tabs']

Use str.join on the returned list to get this output:

>>> ' '.join(myString.split())
'I want to Remove all white spaces, new lines and tabs'
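
For context on why the strip call in the question leaves the inner whitespace untouched: str.strip only trims characters from the two ends of a string. Here is a minimal sketch, written for Python 3 (the question uses Python 2.7, where mainly the print syntax differs); the names trimmed, collapsed, and removed are illustrative only.

```python
my_string = "I want to Remove all white \t spaces, new lines \n and tabs \t"

# strip() only trims leading/trailing characters; inner whitespace stays.
trimmed = my_string.strip(" \n\t")

# split() with no arguments splits on runs of whitespace and drops empty
# strings, so joining with one space collapses every run to a single space.
collapsed = " ".join(my_string.split())

# Joining with an empty string removes the whitespace entirely.
removed = "".join(my_string.split())

print(repr(trimmed))
print(repr(collapsed))
print(repr(removed))
```

If the goal is to drop every whitespace character rather than normalize it, the empty-string join is the variant to use.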

Question 3

If you want to remove multiple whitespace items and replace them with single spaces, the easiest way is with a regular expression like this:

>>> import re
>>> myString="I want to Remove all white \t spaces, new lines \n and tabs \t"
>>> re.sub(r'\s+', ' ', myString)
'I want to Remove all white spaces, new lines and tabs '

You can then remove the trailing space with .strip() if you want to.
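
The two steps can be combined into one small helper. This is only a sketch for Python 3; normalize_whitespace and _WHITESPACE_RUN are hypothetical names chosen for illustration, not part of the re module.

```python
import re

# Compile once so the pattern can be reused; the raw string keeps Python
# from interpreting "\s" as a string escape.
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_whitespace(text):
    """Collapse whitespace runs to single spaces and trim both ends."""
    return _WHITESPACE_RUN.sub(" ", text).strip()


my_string = "I want to Remove all white \t spaces, new lines \n and tabs \t"
print(normalize_whitespace(my_string))
```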

Question 4

Use the re library:

import re
myString = "I want to Remove all white \t spaces, new lines \n and tabs \t"
myString = re.sub(r"[\n\t\s]*", "", myString)
print myString

Output:

IwanttoRemoveallwhitespaces,newlinesandtabs

Question 5

import re

mystr = "I want to Remove all white \t spaces, new lines \n and tabs \t"
print re.sub(r"\W", "", mystr)

Output : IwanttoRemoveallwhitespacesnewlinesandtabs
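
Be aware that \W matches every non-word character, so punctuation such as the comma after "spaces" is removed along with the whitespace. A short Python 3 sketch contrasting it with \s, which matches whitespace only (the variable names are illustrative):

```python
import re

mystr = "I want to Remove all white \t spaces, new lines \n and tabs \t"

# \W matches anything that is not a letter, digit, or underscore,
# so the comma after "spaces" is removed too.
no_non_word = re.sub(r"\W", "", mystr)

# \s matches only whitespace, so punctuation is kept.
no_whitespace = re.sub(r"\s", "", mystr)

print(no_non_word)
print(no_whitespace)
```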

Question 6

This removes only tabs, newlines, and spaces, and nothing else.

import re
myString = "I want to Remove all white \t spaces, new lines \n and tabs \t"
output   = re.sub(r"[\n\t\s]*", "", myString)
print output

Output:

IwanttoRemoveallwhitespaces,newlinesandtabs

Good day!

Question 7

The solutions above that suggest regex aren't ideal: this is a very small task, and regex carries more resource overhead than the simplicity of the task justifies.

Here's what I do:

myString = myString.replace(' ', '').replace('\t', '').replace('\n', '')

Or, if you have so many things to remove that a single-line solution would be gratuitously long:

removal_list = [' ', '\t', '\n']
for s in removal_list:
  myString = myString.replace(s, '')

Question 8

I didn't need anything more intricate than this, and it helped me out, so I wanted to share it.

This is what I originally used:

import requests
import re

url = '/programming/10711116/strip-spaces-tabs-newlines-python' # noqa
headers = {'user-agent': 'my-app/0.0.1'}
r = requests.get(url, headers=headers)
print("{}".format(r.content))

Undesired result:

b'<!DOCTYPE html>\r\n\r\n\r\n    <html itemscope itemtype="http://schema.org/QAPage" class="html__responsive">\r\n\r\n    <head>\r\n\r\n        <title>string - Strip spaces/tabs/newlines - python - Stack Overflow</title>\r\n        <link

This is what I changed it to:

import requests
import re

url = '/programming/10711116/strip-spaces-tabs-newlines-python' # noqa
headers = {'user-agent': 'my-app/0.0.1'}
r = requests.get(url, headers=headers)
regex = r'\s+'
print("CNT: {}".format(re.sub(regex, " ", r.content.decode('utf-8'))))

Desired result:

<!DOCTYPE html> <html itemscope itemtype="http://schema.org/QAPage" class="html__responsive"> <head> <title>string - Strip spaces/tabs/newlines - python - Stack Overflow</title>

The exact regex that @MattH mentioned is what worked for me once I fit it into my code. Thanks!

Note: This is Python 3.
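
The same idea can be tried without a network request. In this sketch the content bytes are a made-up, shortened stand-in for r.content, not the real page.

```python
import re

# Hypothetical stand-in for r.content; no network request is made here.
content = b'<!DOCTYPE html>\r\n\r\n\r\n    <html itemscope>\r\n\r\n    <head>\r\n'

# Bytes must be decoded to str before applying a str pattern.
text = content.decode("utf-8")

# Collapse every run of whitespace (including \r\n) to a single space,
# then trim the ends.
flattened = re.sub(r"\s+", " ", text).strip()
print(flattened)
```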

Question 9

How about a one-liner using a list comprehension within join?

>>> foobar = "aaa bbb\t\t\tccc\nddd"
>>> print(foobar)
aaa bbb                 ccc
ddd

>>> print(''.join([c for c in foobar if c not in [' ', '\t', '\n']]))
aaabbbcccddd
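
A possible variation, sketched here as one option rather than the only way: str.isspace() covers every whitespace character, so the characters do not need to be listed by hand, and a generator expression avoids building an intermediate list.

```python
foobar = "aaa bbb\t\t\tccc\nddd"

# str.isspace() is True for any whitespace character (space, tab,
# newline, carriage return, ...), so no explicit list is needed.
result = "".join(c for c in foobar if not c.isspace())
print(result)
```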