How do I check whether a column / variable is numeric in Pandas / NumPy?


91

Is there a better way to determine whether a variable in Pandas and/or NumPy is numeric or not?

I have a self-defined dictionary with dtypes as keys and numeric / not as values.


16
You can check dtype.kind in 'biufc'.
Jaime

1
The comment above posted by Jaime is simpler than the answers below and seems to work perfectly. Thanks!
hfrog713

Answers:


102

In pandas 0.20.2 you can do:

import pandas as pd
from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [1.0, 2.0, 3.0]})

is_string_dtype(df['A'])
# True

is_numeric_dtype(df['B'])
# True

I would say this is the more elegant solution. Thanks.
as-if
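To apply the is_numeric_dtype check from the answer above to a whole DataFrame, something like the following sketch might help. The helper name numeric_columns and the example frame are made up for illustration; only is_numeric_dtype comes from pandas.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def numeric_columns(frame):
    """Return the names of columns whose dtype pandas considers numeric.

    Hypothetical helper, not part of pandas.
    """
    return [col for col in frame.columns if is_numeric_dtype(frame[col])]


example = pd.DataFrame({
    'A': ['a', 'b', 'c'],                  # strings
    'B': [1.0, 2.0, 3.0],                  # floats
    'C': [1, 2, 3],                        # integers
    'D': pd.Categorical(['x', 'y', 'x']),  # pandas-specific categorical dtype
})

print(numeric_columns(example))  # ['B', 'C']
```

One thing to keep in mind: is_numeric_dtype also returns True for boolean columns, so filter those out separately if you do not want them.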

85

You can use np.issubdtype to check whether the dtype is a sub-dtype of np.number. Examples:

np.issubdtype(arr.dtype, np.number)  # where arr is a numpy array
np.issubdtype(df['X'].dtype, np.number)  # where df['X'] is a pandas Series

This works for numpy's dtypes but, as Thomas pointed out, it fails for pandas-specific types such as pd.Categorical. If you are using categoricals, pandas' is_numeric_dtype function is a better alternative than np.issubdtype.

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1.0, 2.0, 3.0], 
                   'C': [1j, 2j, 3j], 'D': ['a', 'b', 'c']})
df
Out: 
   A    B   C  D
0  1  1.0  1j  a
1  2  2.0  2j  b
2  3  3.0  3j  c

df.dtypes
Out: 
A         int64
B       float64
C    complex128
D        object
dtype: object

np.issubdtype(df['A'].dtype, np.number)
Out: True

np.issubdtype(df['B'].dtype, np.number)
Out: True

np.issubdtype(df['C'].dtype, np.number)
Out: True

np.issubdtype(df['D'].dtype, np.number)
Out: False

For multiple columns you can use np.vectorize:

is_number = np.vectorize(lambda x: np.issubdtype(x, np.number))
is_number(df.dtypes)
Out: array([ True,  True,  True, False], dtype=bool)

And for selection, pandas now has select_dtypes:

df.select_dtypes(include=[np.number])
Out: 
   A    B   C
0  1  1.0  1j
1  2  2.0  2j
2  3  3.0  3j

1
This does not seem to work reliably with pandas DataFrames, since those might return categories unknown to numpy like "category". Numpy then throws "TypeError: data type not understood"
Thomas
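To illustrate the categorical problem described in this answer and in Thomas's comment, here is a small sketch. The wrapper issubdtype_or_false is a made-up name; it simply treats any dtype that NumPy refuses to interpret as non-numeric, which is an assumption about what you want in that case.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def issubdtype_or_false(series):
    """np.issubdtype check that treats dtypes NumPy cannot interpret as non-numeric.

    Hypothetical helper, not part of NumPy or pandas.
    """
    try:
        return bool(np.issubdtype(series.dtype, np.number))
    except TypeError:
        # Raised for dtypes NumPy does not understand, e.g. "category".
        return False


cat = pd.Series(['a', 'b', 'a'], dtype='category')
nums = pd.Series([1.5, 2.5, 3.5])

print(issubdtype_or_false(cat), is_numeric_dtype(cat))
print(issubdtype_or_false(nums), is_numeric_dtype(nums))
```

is_numeric_dtype handles the categorical Series without needing the try/except, which is why it is the more convenient choice when pandas extension dtypes may be present.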

23

Based on @jaime's answer in the comments, you need to check .dtype.kind for the column of interest. For example:

>>> import pandas as pd
>>> df = pd.DataFrame({'numeric': [1, 2, 3], 'not_numeric': ['A', 'B', 'C']})
>>> df['numeric'].dtype.kind in 'biufc'
True
>>> df['not_numeric'].dtype.kind in 'biufc'
False

NB The meaning of biufc: b bool, i int (signed), u unsigned int, f float, c complex. See https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.kind.html#numpy.dtype.kind
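As a rough sketch of the kind codes listed above, the snippet below prints the kind of a few NumPy arrays and wraps the check in a helper. NUMERIC_KINDS and has_numeric_kind are names invented for this example. A set is used instead of the string 'biufc' so that only exact one-character codes match (with a plain string, `'' in 'biufc'` would also be True).

```python
import numpy as np

# The kind codes treated as numeric in the answer above:
# b bool, i signed int, u unsigned int, f float, c complex.
NUMERIC_KINDS = frozenset('biufc')


def has_numeric_kind(dtype):
    """Return True if the dtype's one-character kind code is numeric.

    Hypothetical helper; expects an object with a .kind attribute.
    """
    return dtype.kind in NUMERIC_KINDS


signed = np.array([1, 2, 3])
unsigned = np.array([1, 2, 3], dtype=np.uint8)
flags = np.array([True, False])
text = np.array(['a', 'b'])
dates = np.array(['2013-01-01'], dtype='datetime64[D]')

for name, arr in [('signed', signed), ('unsigned', unsigned),
                  ('flags', flags), ('text', text), ('dates', dates)]:
    print(name, arr.dtype.kind, has_numeric_kind(arr.dtype))
```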


3
Here is the list of all dtype kinds [1]. Lowercase u is for unsigned integer; uppercase U is for unicode. [1]: docs.scipy.org/doc/numpy/reference/generated/…
cbarrick

7

Pandas has a select_dtypes method. You can easily filter your columns on int64 and float64 like this:

df.select_dtypes(include=['int64','float64'])
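Note that listing 'int64' and 'float64' explicitly only matches those exact dtypes. A small sketch of the difference, using a made-up frame with an int32 column, and 'number' as the generic selector:

```python
import numpy as np
import pandas as pd

# Example data invented for illustration.
frame = pd.DataFrame({
    'small_int': np.array([1, 2, 3], dtype='int32'),
    'big_int': np.array([1, 2, 3], dtype='int64'),
    'ratio': [0.1, 0.2, 0.3],
    'label': ['a', 'b', 'c'],
})

# Only the exact dtypes listed are kept, so the int32 column is dropped.
explicit = frame.select_dtypes(include=['int64', 'float64']).columns.tolist()

# 'number' (or np.number) keeps every numeric dtype.
generic = frame.select_dtypes(include='number').columns.tolist()

print(explicit)  # ['big_int', 'ratio']
print(generic)   # ['small_int', 'big_int', 'ratio']
```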

4

This is a pseudo-internal method to return only the numeric type data

In [27]: df = DataFrame(dict(A = np.arange(3), 
                             B = np.random.randn(3), 
                             C = ['foo','bar','bah'], 
                             D = Timestamp('20130101')))

In [28]: df
Out[28]: 
   A         B    C                   D
0  0 -0.667672  foo 2013-01-01 00:00:00
1  1  0.811300  bar 2013-01-01 00:00:00
2  2  2.020402  bah 2013-01-01 00:00:00

In [29]: df.dtypes
Out[29]: 
A             int64
B           float64
C            object
D    datetime64[ns]
dtype: object

In [30]: df._get_numeric_data()
Out[30]: 
   A         B
0  0 -0.667672
1  1  0.811300
2  2  2.020402

Yes, I was trying to figure out how they do that. One would expect an internal IsNumeric function run per column... but I still haven't found it in the code.
user2808117

You can apply this per column, but it is much easier just to check the dtype. In any event, pandas operations exclude non-numeric columns when needed. What are you trying to do?
Jeff

4

How about just checking type for one of the values in the column? We've always had something like this:

isinstance(x, (int, long, float, complex))

When I try to check the datatypes for the columns in below dataframe, I get them as 'object' and not a numerical type I'm expecting:

df = pd.DataFrame(columns=('time', 'test1', 'test2'))
for i in range(20):
    df.loc[i] = [datetime.now() - timedelta(hours=i*1000),i*10,i*100]
df.dtypes

time     datetime64[ns]
test1            object
test2            object
dtype: object

When I do the following, it seems to give me accurate result:

isinstance(df['test1'][len(df['test1'])-1], (int, long, float, complex))

returns

True
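The isinstance call above uses long, which only exists in Python 2. On Python 3, a similar value-based check could be sketched with numbers.Number from the standard library. The helper name values_are_numeric is made up, and checking every non-null value (rather than just the last one) is my own choice here, not something from the answer.

```python
import numbers

import pandas as pd


def values_are_numeric(series):
    """Return True if the Series has at least one non-null value and all are numbers.

    Hypothetical helper for object-dtype columns.
    """
    values = series.dropna()
    return len(values) > 0 and all(isinstance(v, numbers.Number) for v in values)


mixed_numbers = pd.Series([10, 20.5, 3 + 0j], dtype=object)
with_text = pd.Series([10, 'twenty', 30], dtype=object)

print(values_are_numeric(mixed_numbers))  # True
print(values_are_numeric(with_text))      # False
```

This loops over values in Python, so on large columns it will be much slower than a dtype check. Also note that Python's bool is a subclass of int, so True and False count as numbers here.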

1

You can also try:

df_dtypes = np.array(df.dtypes)
df_numericDtypes = [x.kind in 'biufc' for x in df_dtypes]

It returns a list of booleans: True if numeric, False if not.


1

Just to add to all the other answers, one can also use df.info() to see the data type of each column.


1

You can check whether a given column contains numeric values or not using dtypes

numerical_features = [feature for feature in train_df.columns if train_df[feature].dtypes != 'O']

Note: the "O" must be uppercase.
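One caveat with the != 'O' test: it keeps every column that is not object dtype, which can include non-numeric ones such as datetimes. A small sketch with an invented frame, comparing it against is_numeric_dtype:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Example data invented for illustration.
train_df = pd.DataFrame({
    'price': [9.5, 12.0, 7.25],
    'sold_on': pd.to_datetime(['2013-01-01', '2013-01-02', '2013-01-03']),
})

# The datetime column is not object dtype, so it passes this test.
not_object = [c for c in train_df.columns if train_df[c].dtypes != 'O']

# is_numeric_dtype rejects the datetime column.
numeric = [c for c in train_df.columns if is_numeric_dtype(train_df[c])]

print(not_object)  # ['price', 'sold_on']
print(numeric)     # ['price']
```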

Licensed under cc by-sa 3.0 with attribution required.