NumPy isnan() fails on an array of floats (from a pandas DataFrame apply)

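I have an array of floats (some normal numbers, some NaNs) that comes out of an apply on a pandas DataFrame.

For some reason, numpy.isnan fails on this array. However, as shown below, every element is a float, numpy.isnan runs correctly on each element, and the variable is definitely a NumPy array.

What's going on?!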

set([type(x) for x in tester])
Out[59]: {float}

tester
Out[60]: 
array([-0.7000000000000001, nan, nan, nan, nan, nan, nan, nan, nan, nan,
   nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
   nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
   nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
   nan, nan], dtype=object)

set([type(x) for x in tester])
Out[61]: {float}

np.isnan(tester)
Traceback (most recent call last):

File "<ipython-input-62-e3638605b43c>", line 1, in <module>
np.isnan(tester)

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

set([np.isnan(x) for x in tester])
Out[65]: {False, True}

type(tester)
Out[66]: numpy.ndarray
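
The important detail in the session above is the dtype=object in the array's repr. Each element is a Python float, but the array itself is an object array. Here is a minimal reproduction, assuming an array built the same way as tester (the values are illustrative):

```python
import numpy as np

# Hypothetical stand-in for `tester`: Python floats held in an object-dtype array.
tester = np.array([-0.7, np.nan, np.nan], dtype=object)
print(tester.dtype)  # object

# Element by element, each value is a plain Python float, so np.isnan accepts it.
per_element = [bool(np.isnan(x)) for x in tester]
print(per_element)  # [False, True, True]

# The vectorized call fails because isnan does not support dtype=object input.
try:
    np.isnan(tester)
except TypeError as exc:
    print("TypeError:", exc)
```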

— tim654321

np.isnan can be applied to NumPy arrays of a native dtype (such as np.float64):

In [99]: np.isnan(np.array([np.nan, 0], dtype=np.float64))
Out[99]: array([ True, False], dtype=bool)

but it raises a TypeError when applied to object arrays:

In [96]: np.isnan(np.array([np.nan, 0], dtype=object))
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Since you have Pandas, you could use pd.isnull instead. It accepts NumPy arrays of object or native dtypes:

In [97]: pd.isnull(np.array([np.nan, 0], dtype=float))
Out[97]: array([ True, False], dtype=bool)

In [98]: pd.isnull(np.array([np.nan, 0], dtype=object))
Out[98]: array([ True, False], dtype=bool)

Note that None is also considered a null value in object arrays.
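
The sketch below shows the pd.isnull behaviour described above on an object array, including a None entry. It also shows casting with astype(np.float64) so that np.isnan itself works. That cast is an alternative not mentioned in this answer, and the array values are illustrative:

```python
import numpy as np
import pandas as pd

# Object array similar to the question's `tester` (values chosen for illustration).
obj = np.array([-0.7, np.nan, 0.0], dtype=object)

# pd.isnull accepts the object array directly.
mask = pd.isnull(obj)
print(mask.tolist())  # [False, True, False]

# None counts as null in an object array as well.
with_none = np.array([1.0, None], dtype=object)
print(pd.isnull(with_none).tolist())  # [False, True]

# If you need np.isnan itself, cast to a native float dtype first.
as_float = obj.astype(np.float64)
print(np.isnan(as_float).tolist())  # [False, True, False]
```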

— unutbu

Thanks, I used pd.isnull(). It doesn't seem to have any performance impact either.

— tim654321

A great substitute for np.isnan() and pd.isnull() is:

for i in range(a.shape[0]):
    if a[i] != a[i]:
        # a[i] is nan: do something here
        pass

This works because nan is the only value that is not equal to itself.
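
The same self-comparison idea can be applied without a loop, assuming the data is already in a native float array (the values below are illustrative). In that case `a != a` is evaluated element-wise. One caveat: None is equal to itself, so this trick does not flag None the way pd.isnull does.

```python
import numpy as np

a = np.array([-0.7, np.nan, 1.5, np.nan])  # native float64 array

# On a float array, `a != a` gives an element-wise boolean mask that is True
# exactly where a is NaN, the same result np.isnan(a) gives.
mask = a != a
print(np.flatnonzero(mask).tolist())  # [1, 3]
assert (mask == np.isnan(a)).all()

# Caveat: None is equal to itself, so the self-comparison does not flag it.
print(None != None)  # False
```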

— Statham
This might not work on arrays, because it raises the well-known "ValueError: Truth value of a xxx is ambiguous".

— MSeifert

@MSeifert Are you talking about Python? I use this method to do things in machine learning. Why didn't I get that well-known error?

— Statham

Yes, it seems you haven't used numpy or pandas before. Run your code with import numpy as np; a = np.array([1,2,3, np.nan]).

— MSeifert

@MSeifert Er, I'm new to numpy, but the code ran fine and didn't raise any error.

— Statham

In [1]: import numpy as np
In [2]: a = np.array([1,2,3, np.nan])
In [3]: print a
[1. 2. 3. nan]
In [4]: print a[3] == a[3]
False

— Statham
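
Both sides of this exchange can be reconciled. Comparing a single element with itself, as in the loop and in In [4] above, yields one boolean. Comparing the whole array with itself yields an array of booleans, and using that array as an `if` condition is what raises the ambiguity error. Here is a minimal sketch using the same array as the comments:

```python
import numpy as np

a = np.array([1, 2, 3, np.nan])

# A single element compared with itself gives one boolean, so `if` accepts it.
print(a[3] == a[3])  # False

# The whole array compared with itself gives an element-wise boolean array.
mask = a != a
print(mask.tolist())  # [False, False, False, True]

# Using that array directly as an `if` condition raises the ambiguity error.
try:
    if a != a:
        pass
except ValueError as exc:
    print("ValueError:", exc)

# Reduce the mask explicitly when a single answer is needed.
print(bool(mask.any()))  # True
```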

In addition to @unutbu's answer, you can coerce a pandas NumPy object array to a native (float64) type:

import pandas as pd
pd.to_numeric(df['tester'], errors='coerce')

Specify errors='coerce' so that strings which can't be parsed as numbers become NaN. The column type will then be dtype: float64, and the isnan check should work.
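
Here is a minimal sketch of that coercion. It assumes a hypothetical DataFrame whose 'tester' column holds object-dtype values, including one unparseable string:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with an object-dtype column, similar to the question's `tester`.
df = pd.DataFrame({"tester": pd.Series([-0.7, np.nan, "not a number"], dtype=object)})
print(df["tester"].dtype)  # object

# errors='coerce' turns unparseable entries into NaN instead of raising.
converted = pd.to_numeric(df["tester"], errors="coerce")
print(converted.dtype)  # float64

# np.isnan now works because the result has a native float dtype.
print(np.isnan(converted).tolist())  # [False, True, True]
```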

— Severin Pappadeux
His name is unutbu :)

— Dr_Zaszuś

@Dr_Zaszuś Thanks, fixed.

— Severin Pappadeux

If you import the CSV file with pandas, you can check individual values with pd.isnull:

import pandas as pd

condition = pd.isnull(data[i][j])
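
Here is a self-contained version of this check. It assumes hypothetical CSV content read from a string instead of a real file. Note that data[i][j] selects column label i first, then row label j. pd.isnull on a single value returns a plain boolean, so it can be used directly in an `if`.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file; empty fields become NaN.
csv_text = "a,b\n1.5,\n,2.0\n"
data = pd.read_csv(io.StringIO(csv_text))

# Column "b", row 0 is empty in the CSV, so it was read as NaN.
condition = pd.isnull(data["b"][0])
print(condition)  # True

# For the whole frame at once, DataFrame.isnull returns a boolean DataFrame.
print(data.isnull())
```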

— 다리 스완 얀 웨리 P.