Checking if a string can be converted to float in Python

I've got some Python code that runs through a list of strings and converts them to integers or floating point numbers if possible. Doing this for integers is pretty easy:

if element.isdigit():
  newelement = int(element)

Floating point numbers are more difficult. Right now I'm using partition('.') to split the string and checking to make sure that one or both sides are digits:

partition = element.partition('.')
if ((partition[0].isdigit() and partition[1] == '.' and partition[2].isdigit())
    or (partition[0] == '' and partition[1] == '.' and partition[2].isdigit())
    or (partition[0].isdigit() and partition[1] == '.' and partition[2] == '')):
  newelement = float(element)

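This works, but obviously the if statement for that is a bit of a bear. The other solution I considered is to just wrap the conversion in a try/catch block and see if it succeeds, as described in this question.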
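
A minimal sketch of the try/except route applied to the whole job (int when possible, else float, else leave the string alone). convert_element is a hypothetical name, and trying int() before float() is an assumption about which type should win for strings like "42".

```python
def convert_element(element):
    """Return int(element) or float(element) if possible, else element unchanged.

    Hypothetical helper: tries int() first so "42" stays an int, then falls back to float().
    """
    try:
        return int(element)
    except ValueError:
        pass
    try:
        return float(element)
    except ValueError:
        return element


if __name__ == "__main__":
    print([convert_element(e) for e in ["42", "3.14", ".5", "1e3", "abc"]])
    # [42, 3.14, 0.5, 1000.0, 'abc']
```

Unlike isdigit(), int() also accepts a leading sign and surrounding whitespace, so "-5" and " 42 " convert too.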

Any other ideas? Opinions on the relative merits of the partition and try/catch approaches?

python string type-conversion

— Chris Upchurch

I would just use:

try:
    float(element)
except ValueError:
    print("Not a float")

It's simple and it works.

Another option would be a regular expression:

import re
if re.match(r'^-?\d+(?:\.\d+)?$', element) is None:
    print("Not float")

— dbr

@S.Lott: Most of the strings this is applied to will turn out to be ints or floats.

— Chris Upchurch

Your regex is not optimal. "^\d+\.\d+$" will fail a match at the same speed as above, but will succeed faster. Also, a more correct way would be: "^[+-]?\d(>?\.\d+)?$". However, that still doesn't match numbers like: +1.0e

Except you didn't name the function "will_it_float".

— mount

The second option won't catch nan or exponential notation such as 2e3.

— Patrick B.

The regex doesn't seem to parse negative numbers.

— Carlos
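
A quick way to check the points raised in these comments. As written in the answer, the regex includes -?, so negative numbers do match; exponents such as "2e3", "nan", a bare ".5" and a leading "+" are all rejected even though float() accepts them. The helper names matches_simple_float and float_accepts are hypothetical.

```python
import re

# The regex from the answer above: optional '-', digits, optional '.digits'.
FLOAT_RE = re.compile(r'^-?\d+(?:\.\d+)?$')


def matches_simple_float(s):
    return FLOAT_RE.match(s) is not None


def float_accepts(s):
    try:
        float(s)
        return True
    except ValueError:
        return False


for sample in ["12.5", "-0.25", "2e3", "nan", ".5", "+1"]:
    print(f"{sample!r:8} regex: {matches_simple_float(sample)!s:5}  float(): {float_accepts(sample)}")
```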

Python method to check for a float:

def isfloat(value):
  try:
    float(value)
    return True
  except ValueError:
    return False

Don't get bitten by the goblins hiding in the float boat! DO UNIT TESTING!

What is, and is not, a float may surprise you:

Command to parse                        Is it a float?  Comment
--------------------------------------  --------------- ------------
print(isfloat(""))                      False
print(isfloat("1234567"))               True 
print(isfloat("NaN"))                   True            nan is also float
print(isfloat("NaNananana BATMAN"))     False
print(isfloat("123.456"))               True
print(isfloat("123.E4"))                True
print(isfloat(".1"))                    True
print(isfloat("1,234"))                 False
print(isfloat("NULL"))                  False           case insensitive
print(isfloat(",1"))                    False           
print(isfloat("123.EE4"))               False           
print(isfloat("6.523537535629999e-07")) True
print(isfloat("6e777777"))              True            This is same as Inf
print(isfloat("-iNF"))                  True
print(isfloat("1.797693e+308"))         True
print(isfloat("infinity"))              True
print(isfloat("infinity and BEYOND"))   False
print(isfloat("12.34.56"))              False           Two dots not allowed.
print(isfloat("#56"))                   False
print(isfloat("56%"))                   False
print(isfloat("0E0"))                   True
print(isfloat("x86E0"))                 False
print(isfloat("86-5"))                  False
print(isfloat("True"))                  False           Boolean is not a float.   
print(isfloat(True))                    True            Boolean is a float
print(isfloat("+1e1^5"))                False
print(isfloat("+1e1"))                  True
print(isfloat("+1e1.3"))                False
print(isfloat("+1.3P1"))                False
print(isfloat("-+1"))                   False
print(isfloat("(1)"))                   False           brackets not interpreted

— Eric Leschinski
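
One goblin the table above doesn't show: float(None) or float([1.0]) raises TypeError rather than ValueError, so isfloat as written lets that exception escape. A minimal variant, assuming non-string, non-numeric inputs should simply report False (isfloat_safe is a hypothetical name):

```python
def isfloat_safe(value):
    """Like isfloat above, but also returns False when float() raises TypeError."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


if __name__ == "__main__":
    for v in ["123.E4", "NaN", "1,234", None, [1.0], True]:
        print(repr(v), isfloat_safe(v))
```

True still counts as a float, because bool is a subclass of int and float(True) is 1.0.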

Nice answer. Just add two more where float = True: isfloat(" 1.23 ") and isfloat(" \n \t 1.23 \n\t\n"). Useful in web requests; no need to trim white space first.

— BareNakedCoder
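
The two extra rows do come out True, because float() strips leading and trailing whitespace itself. A quick check, reusing isfloat from the answer above:

```python
def isfloat(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


# float() strips leading/trailing whitespace (spaces, tabs, newlines) on its own,
# so both inputs from the comment pass without calling .strip() first.
print(isfloat(" 1.23 "))               # True
print(isfloat(" \n \t 1.23 \n\t\n"))   # True
print(float(" \n \t 1.23 \n\t\n"))     # 1.23
# Whitespace inside the number is not stripped.
print(isfloat("1 .23"))                # False
```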

'1.43'.replace('.','',1).isdigit()

will return True only if there is one '.' or none in the string of digits.

'1.4.3'.replace('.','',1).isdigit()

will return False

'1.ww'.replace('.','',1).isdigit()

will return False

— Tulasi

Not optimal, but actually pretty clever. Won't handle +/- and exponents.

— Mad Physicist
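
Running the one-liner on a few inputs shows the sign problem. replace_isdigit below is just the answer's expression wrapped in a function; replace_isdigit_signed is a hypothetical extension that strips one leading sign first. Exponents are still rejected by both.

```python
def replace_isdigit(s):
    """The one-liner from the answer: at most one '.', otherwise only digits."""
    return s.replace('.', '', 1).isdigit()


def replace_isdigit_signed(s):
    """Hypothetical extension: also allow one leading '+' or '-'."""
    if s[:1] in ('+', '-'):
        s = s[1:]
    return s.replace('.', '', 1).isdigit()


for sample in ["1.43", "1.4.3", "-1.5", "+2", "1e3"]:
    print(sample, replace_isdigit(sample), replace_isdigit_signed(sample))
```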

Years late, but this is a nice method. It worked for me using the following on a pandas dataframe: [i for i in df[i].apply(lambda x: str(x).replace('.','').isdigit()).any()]

— Mark Moretto

@MarkMoretto You're going to be in for a shock when you learn of the existence of negative numbers

— David Heffernan

Best one-liner for my scenario, where I just need to check for positive floats or numbers. I like it.

— MJohnyJ

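TL;DR:

If your input is mostly strings that can be converted to floats, the try: except: method is the best native Python method.
If your input is mostly strings that cannot be converted to floats, regular expressions or the partition method will be better.
If you are 1) unsure of your input or need more speed and 2) don't mind and can install a third-party C extension, fastnumbers works very well.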

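There is another method available via a third-party module called fastnumbers (disclosure: I am the author); it provides a function called isfloat. I have taken the unittest example outlined by Jacob Gabrielson in this answer, but added the fastnumbers.isfloat method. I should also note that Jacob's example did not do justice to the regex option, because most of the time in that example was spent in global lookups due to the dot operator; I modified that function to give a fairer comparison to try: except:.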

def is_float_try(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

import re
# Bind .match once so each call skips the attribute lookup on the compiled pattern.
_float_regexp = re.compile(r"^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$").match
def is_float_re(s):
    return bool(_float_regexp(s))

def is_float_partition(element):
    partition = element.partition('.')
    return ((partition[0].isdigit() and partition[1] == '.' and partition[2].isdigit())
            or (partition[0] == '' and partition[1] == '.' and partition[2].isdigit())
            or (partition[0].isdigit() and partition[1] == '.' and partition[2] == ''))

from fastnumbers import isfloat


if __name__ == '__main__':
    import unittest
    import timeit

    # The timeit setup strings import this file as a module, so save it as ttest.py.
    class ConvertTests(unittest.TestCase):

        def test_re_perf(self):
            print()
            print('re sad:', timeit.Timer('ttest.is_float_re("12.2x")', "import ttest").timeit())
            print('re happy:', timeit.Timer('ttest.is_float_re("12.2")', "import ttest").timeit())

        def test_try_perf(self):
            print()
            print('try sad:', timeit.Timer('ttest.is_float_try("12.2x")', "import ttest").timeit())
            print('try happy:', timeit.Timer('ttest.is_float_try("12.2")', "import ttest").timeit())

        def test_fn_perf(self):
            print()
            print('fn sad:', timeit.Timer('ttest.isfloat("12.2x")', "import ttest").timeit())
            print('fn happy:', timeit.Timer('ttest.isfloat("12.2")', "import ttest").timeit())

        def test_part_perf(self):
            print()
            print('part sad:', timeit.Timer('ttest.is_float_partition("12.2x")', "import ttest").timeit())
            print('part happy:', timeit.Timer('ttest.is_float_partition("12.2")', "import ttest").timeit())

    unittest.main()

On my machine, the output is:

fn sad: 0.220988988876
fn happy: 0.212214946747
.
part sad: 1.2219619751
part happy: 0.754667043686
.
re sad: 1.50515985489
re happy: 1.01107215881
.
try sad: 2.40243887901
try happy: 0.425730228424
.
----------------------------------------------------------------------
Ran 4 tests in 7.761s

OK

As you can see, regex is actually not as bad as it originally seemed, and if you have a real need for speed, the fastnumbers method is quite good.

— Seth
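
If you want to rerun a comparison like this without saving the script as an importable module, a minimal Python 3 sketch can pass globals=globals() to timeit.timeit (available since Python 3.5). fastnumbers is left out so it runs on the standard library alone, and bench is a hypothetical helper; the timings will of course differ by machine and Python version.

```python
import re
import timeit

_float_match = re.compile(
    r"^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$").match


def is_float_try(s):
    try:
        float(s)
        return True
    except ValueError:
        return False


def is_float_re(s):
    return _float_match(s) is not None


def bench(number=100_000):
    """Return {"<function> <happy|sad>": seconds} for "12.2" and "12.2x" inputs."""
    results = {}
    for name in ("is_float_try", "is_float_re"):
        for label, arg in (("happy", "12.2"), ("sad", "12.2x")):
            stmt = f"{name}({arg!r})"
            # globals= lets the timed statement see this module's functions directly.
            results[f"{name} {label}"] = timeit.timeit(stmt, globals=globals(), number=number)
    return results


if __name__ == "__main__":
    for key, seconds in bench().items():
        print(f"{key}: {seconds:.3f}s")
```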

The fastnumbers check works so well if you have a majority of strings that can't be converted to floats; it really speeds things up. Thank you!

— ragardner

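If you cared about performance (and I'm not suggesting you should), the try-based approach is the clear winner compared with your partition-based approach or the regexp approach, as long as you don't expect a lot of invalid strings, in which case it's potentially slower (presumably due to the cost of exception handling).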

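Again, I'm not suggesting you care about performance, just giving you the data in case you're doing this 10 billion times a second or something. Also, the partition-based code doesn't handle at least one valid string.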

$ ./floatstr.py
F..
partition sad: 3.1102449894
partition happy: 2.09208488464
..
re sad: 7.76906108856
re happy: 7.09421992302
..
try sad: 12.1525540352
try happy: 1.44165301323
.
======================================================================
FAIL: test_partition (__main__.ConvertTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./floatstr.py", line 48, in test_partition
    self.failUnless(is_float_partition("20e2"))
AssertionError

----------------------------------------------------------------------
Ran 8 tests in 33.670s

FAILED (failures=1)

Here's the code (Python 2.6, regexp taken from John Gietzen's answer):

def is_float_try(str):
    try:
        float(str)
        return True
    except ValueError:
        return False

import re
_float_regexp = re.compile(r"^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$")
def is_float_re(str):
    return re.match(_float_regexp, str)


def is_float_partition(element):
    partition=element.partition('.')
    if (partition[0].isdigit() and partition[1]=='.' and partition[2].isdigit()) or (partition[0]=='' and partition[1]=='.' and partition[2].isdigit()) or (partition[0].isdigit() and partition[1]=='.' and partition[2]==''):
        return True

if __name__ == '__main__':
    import unittest
    import timeit

    class ConvertTests(unittest.TestCase):
        def test_re(self):
            self.failUnless(is_float_re("20e2"))

        def test_try(self):
            self.failUnless(is_float_try("20e2"))

        def test_re_perf(self):
            print
            print 're sad:', timeit.Timer('floatstr.is_float_re("12.2x")', "import floatstr").timeit()
            print 're happy:', timeit.Timer('floatstr.is_float_re("12.2")', "import floatstr").timeit()

        def test_try_perf(self):
            print
            print 'try sad:', timeit.Timer('floatstr.is_float_try("12.2x")', "import floatstr").timeit()
            print 'try happy:', timeit.Timer('floatstr.is_float_try("12.2")', "import floatstr").timeit()

        def test_partition_perf(self):
            print
            print 'partition sad:', timeit.Timer('floatstr.is_float_partition("12.2x")', "import floatstr").timeit()
            print 'partition happy:', timeit.Timer('floatstr.is_float_partition("12.2")', "import floatstr").timeit()

        def test_partition(self):
            self.failUnless(is_float_partition("20e2"))

        def test_partition2(self):
            self.failUnless(is_float_partition(".2"))

        def test_partition3(self):
            self.failIf(is_float_partition("1234x.2"))

    unittest.main()

— Jacob Gabrielson
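
To see the gap that the failing test points at, a quick side-by-side in Python 3 (the partition function is the question's check rewritten as one boolean expression). Besides exponents like "20e2", it also rejects negative numbers and plain integers such as "5", all of which float() accepts.

```python
def is_float_partition(element):
    # The partition check from the question, as a single boolean expression.
    partition = element.partition('.')
    return ((partition[0].isdigit() and partition[1] == '.' and partition[2].isdigit())
            or (partition[0] == '' and partition[1] == '.' and partition[2].isdigit())
            or (partition[0].isdigit() and partition[1] == '.' and partition[2] == ''))


def is_float_try(s):
    try:
        float(s)
        return True
    except ValueError:
        return False


for s in ["12.2", ".2", "1.", "20e2", "-1.5", "5", "1234x.2"]:
    print(f"{s:8} partition: {is_float_partition(s)!s:5}  try: {is_float_try(s)}")
```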

Here's another method, for variety:

>>> all([i.isnumeric() for i in '1.2'.split('.',1)])
True
>>> all([i.isnumeric() for i in '2'.split('.',1)])
True
>>> all([i.isnumeric() for i in '2.f'.split('.',1)])
False

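Edit: I'm sure it won't hold up to all cases of floats, though, especially when there's an exponent. To handle that, it looks like this. It returns True only when val is a float and False for an int, but it's probably less performant than a regex.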

>>> def isfloat(val):
...     return all(ch.isnumeric() or ch in '.e' for ch in val) and len(val.split('.')) == 2
...
>>> isfloat('1')
False
>>> isfloat('1.2')
True
>>> isfloat('1.2e3')
True
>>> isfloat('12e3')
False

— Peter Moore

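The isnumeric function looks like a poor choice, since it returns True for various Unicode characters such as fractions. The docs say: "Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH"

— gwideman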
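
A small check of the gap described in this comment (describe is a hypothetical helper): U+2155 passes isnumeric() but float() rejects it, and U+00B2 (superscript two) even passes isdigit() while float() still rejects it. Of the three predicates, isdecimal() is the strictest, and for these single characters it is the one that agrees with float().

```python
def describe(s):
    """Report the str predicates next to whether float() accepts the string."""
    try:
        float(s)
        float_ok = True
    except ValueError:
        float_ok = False
    return {"isnumeric": s.isnumeric(), "isdigit": s.isdigit(),
            "isdecimal": s.isdecimal(), "float_ok": float_ok}


# U+2155 VULGAR FRACTION ONE FIFTH, U+00B2 SUPERSCRIPT TWO, and a plain "12".
for s in ["\u2155", "\u00b2", "12"]:
    print(repr(s), describe(s))
```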

This regex will check for scientific floating point numbers:

^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$

However, your best bet is still to just try the parser (float() inside a try).

— John Gietzen
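
Trying the regex on a few inputs shows where it differs from float(): it covers exponents and forms like "1." and "-.5", but not "nan" or "inf", which float() also accepts. is_sci_float is a hypothetical wrapper name.

```python
import re

# The regex from the answer above, compiled once.
SCI_FLOAT = re.compile(r"^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?$")


def is_sci_float(s):
    return SCI_FLOAT.match(s) is not None


for s in ["20e2", "-.5", "1.", "1.5e-3", "12.2x", "1e", "nan", "inf"]:
    print(s, is_sci_float(s))
```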

If you don't need to worry about scientific or other representations of numbers, and are only working with strings that could be numbers with or without a period:

Function

def is_float(s):
    result = False
    if s.count(".") == 1:
        if s.replace(".", "").isdigit():
            result = True
    return result

Lambda version

is_float = lambda x: x.replace('.','',1).isdigit() and "." in x

Example

if is_float(some_string):
    some_string = float(some_string)
elif some_string.isdigit():
    some_string = int(some_string)
else:
    print("Does not convert to int or float.")

This way you aren't accidentally converting what should be an int into a float.

— Codejoy

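A simplified version of the function, is_digit(), is enough in most cases (it doesn't take exponential notation or the "NaN" value into account):

def is_digit(s):
    # One optional leading '-' and at most one '.'; rejects "--1" and "1.2.3".
    return (s[1:] if s.startswith('-') else s).replace('.', '', 1).isdigit()

— simhumileco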

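I used the function already mentioned, but soon I noticed that strings like "Nan", "Inf" and their variations are considered numbers. So I propose an improved version of the function that returns False on that type of input and doesn't fail on variants like "1e3":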

def is_float(text):
    # check for nan/infinity etc.
    if text.isalpha():
        return False
    try:
        float(text)
        return True
    except ValueError:
        return False

— math

Couldn't you start right away with the if text.isalpha(): check?

— Csaba Toth

BTW I need the same thing: I don't want to accept NaN, Inf and the like.

— Csaba Toth
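
The isalpha() guard misses signed spellings such as "-inf" or "+nan" (the sign is not alphabetic) and overflowing literals like "1e999", which float() turns into inf. A minimal alternative, assuming the goal is to accept only finite values, is to convert first and then test the result with math.isfinite; is_finite_float is a hypothetical name.

```python
import math


def is_finite_float(text):
    """Return True only for strings that parse to a finite float (no nan/inf)."""
    try:
        value = float(text)
    except ValueError:
        return False
    return math.isfinite(value)


for s in ["1e3", "-1.5", "nan", "Inf", "-inf", "+nan", "1e999"]:
    print(s, is_finite_float(s))
```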

Try converting to float. If there is an error, print the ValueError exception.

try:
    x = float('1.23')
    print('val=',x)
    y = float('abc')
    print('val=',y)
except ValueError as err:
    print('floatErr:',err)

Output:

val= 1.23
floatErr: could not convert string to float: 'abc'

— edW

If you pass a dictionary as the argument, it will convert the strings that can be converted to floats and leave the others:

def covertDict_float(data):
    for i in data:
        if data[i].split(".")[0].isdigit():
            try:
                data[i] = float(data[i])
            except ValueError:
                continue
    return data

— Rahul Jain

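I was looking for some similar code, but it looks like using try/excepts is the best way. Here is the code I'm using. It includes a retry function if the input is invalid. I needed to check whether the input was greater than 0 and, if so, convert it to a float.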

def cleanInput(question,retry=False): 
    inputValue = input("\n\nOnly positive numbers can be entered, please re-enter the value.\n\n{}".format(question)) if retry else input(question)
    try:
        if float(inputValue) <= 0 : raise ValueError()
        else : return(float(inputValue))
    except ValueError : return(cleanInput(question,retry=True))


willbefloat = cleanInput("Give me the number: ")

— Rocky
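
Each invalid entry in cleanInput adds another recursive call, so a user who keeps typing bad input could eventually hit Python's recursion limit. A minimal loop-based sketch of the same idea; clean_input is a hypothetical name, and read_line is a hypothetical parameter added only so the function can be exercised without a keyboard.

```python
def clean_input(question, read_line=input):
    """Ask until the answer parses as a float greater than 0, then return it.

    read_line is a hypothetical hook (defaults to input) so the loop can be tested.
    """
    prompt = question
    while True:
        text = read_line(prompt)
        try:
            value = float(text)
        except ValueError:
            value = None
        # nan > 0 is False, so nan is rejected too (a "<= 0" test would let it through).
        if value is not None and value > 0:
            return value
        prompt = ("\n\nOnly positive numbers can be entered, please re-enter the value.\n\n"
                  + question)
```

Usage is the same as before: willbefloat = clean_input("Give me the number: ").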

def try_parse_float(item):
  # Return the parsed float, or None if item can't be converted.
  try:
    return float(item)
  except (TypeError, ValueError):
    return None

— Tawanda Matereke

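While this code may solve the problem, including an explanation of how and why it solves the problem would really help to improve the quality of your post, and would probably result in more upvotes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add an explanation and give an indication of what limitations and assumptions apply.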

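I tried some of the simple options above, using a try test around converting to a float, and found that there is a problem with most of the replies.

Simple test (along the lines of the answers above):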

entry = ttk.Entry(self, validate='key')
entry['validatecommand'] = (entry.register(_test_num), '%P')

def _test_num(P):
    try: 
        float(P)
        return True
    except ValueError:
        return False

The problem comes when:

You enter '-' to start a negative number:

You are then trying float('-'), which fails.

You enter a number, but then try to delete all the digits:

You are then trying float(''), which likewise also fails.

The quick solution I had is:

def _test_num(P):
    if P == '' or P == '-': return True
    try: 
        float(P)
        return True
    except ValueError:
        return False

— Richard

str(strval).isdigit()

Seems simple.

Handles values stored as a string, an int, or a float.

— Muks

In [2]: '123,123'.isdigit()
Out[2]: False

— Daniil Mashkin

Doesn't work for negative numbers; please fix your answer.

— RandomEli

'39.1'.isdigit()

— Ohad the Lad

all([x.isdigit() for x in str(VAR).strip('-').replace(',', '.').split('.')]) for a more complete implementation.

— lotrus28
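
A quick check of the cases raised in these comments, with the answer's one-liner wrapped in a hypothetical isdigit_check function. It works for non-negative integers stored as strings or ints, but str(1.5) is "1.5", which contains a '.', so floats, decimals and negative numbers all come back False.

```python
def isdigit_check(value):
    """The one-liner from the answer above."""
    return str(value).isdigit()


for v in ["123", 123, 1.5, "39.1", "-5", "123,123"]:
    print(repr(v), isdigit_check(v))
```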