Programming: pandas

7

I have the following df: frame = pd.DataFrame({'a' : ['a,b,c', 'a,c,f', 'b,d,f','a,z,c']}) and a list of items: letters = ['a','c']. My goal is to get all rows of frame that contain at least two elements of letters. I came up with this solution: for i in letters: subframe = frame[frame['a'].str.contains(i)] This gives me what I want, but it might not be the best in terms of scalability …

20 python pandas
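For the first question above (rows containing at least two of letters), here is a minimal sketch that counts the matches per row in one pass instead of filtering once per letter. It assumes each cell is a comma-separated list of single tokens, so it matches whole tokens, whereas str.contains matches substrings. The names letter_set and hits are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({'a': ['a,b,c', 'a,c,f', 'b,d,f', 'a,z,c']})
letters = ['a', 'c']

# Split each comma-separated cell into tokens and count how many of the
# target letters appear among them (assumes comma-separated single tokens).
letter_set = set(letters)
hits = frame['a'].str.split(',').apply(lambda tokens: len(letter_set.intersection(tokens)))

# Keep rows in which at least two of the letters are present.
subframe = frame[hits >= 2]
print(subframe)
```

Raising or lowering the 2 in hits >= 2 changes the required number of matches without adding more passes over the data.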

5

Stack and return value counts for each variable?

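I have a data frame that records which programming languages 19717 people chose in a multiple-choice question. The first column is, of course, the respondent's gender, and the rest are the choices they selected. So if I chose Python, my response is recorded in the Python column and not in Bash, and vice versa. ID Gender Python Bash R JavaScript C++ 0 Male Python nan nan JavaScript nan …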

19 python pandas dataframe
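For the value-counts question above, one possible approach is to reshape the wide survey table into long format and count per (language, gender). This sketch uses DataFrame.melt rather than stack; the three-respondent sample and the names long and counts are hypothetical stand-ins for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the same shape as the survey: one row per
# respondent, one column per language, NaN when the language was not chosen.
df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male'],
    'Python': ['Python', 'Python', np.nan],
    'Bash': [np.nan, 'Bash', 'Bash'],
    'R': [np.nan, np.nan, 'R'],
})

# Long format: one row per (respondent, chosen language); drop unchosen ones.
long = df.melt(id_vars='Gender', value_name='Language').dropna(subset=['Language'])

# Count the choices for each language, split by gender.
counts = long.groupby(['Language', 'Gender']).size().unstack(fill_value=0)
print(counts)
```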

4

Matching columns using data from pandas data frames

I have two pandas data frames, a and b: a1 a2 a3 a4 a5 a6 a7 1 3 4 5 3 4 5 0 2 0 3 0 2 1 2 5 6 5 2 1 2 and b1 b2 b3 b4 b5 b6 b7 3 5 4 5 1 4 3 0 …

18 python python-3.x pandas

3

How to convert a pandas data frame into a hierarchical dictionary

I have the following pandas data frame: df1 = pd.DataFrame({'date': [200101,200101,200101,200101,200102,200102,200102,200102],'blockcount': [1,1,2,2,1,1,2,2],'reactiontime': [350,400,200,250,100,300,450,400]}) I'm trying to build a hierarchical dictionary whose inner dictionaries hold lists as values: {200101: {1:[350, 400], 2:[200, 250]}, 200102: {1:[100, 300], 2:[450, 400]}} How should I do this? The closest I've come is with this code: df1.set_index('date').groupby(level='date').apply(lambda x: x.set_index('blockcount').squeeze().to_dict()).to_dict() which returns: {200101: {1: 400, 2: 250}, …

16 python pandas
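For the hierarchical-dictionary question above, a minimal sketch: collect the reaction times into a list per (date, blockcount) pair first, then nest by date. The intermediate name grouped is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'date': [200101, 200101, 200101, 200101, 200102, 200102, 200102, 200102],
    'blockcount': [1, 1, 2, 2, 1, 1, 2, 2],
    'reactiontime': [350, 400, 200, 250, 100, 300, 450, 400],
})

# One list of reaction times per (date, blockcount) pair, in row order.
grouped = df1.groupby(['date', 'blockcount'])['reactiontime'].apply(list)

# Nest by date: each inner dict maps blockcount -> list of reaction times.
result = {
    date: sub.droplevel('date').to_dict()
    for date, sub in grouped.groupby(level='date')
}
print(result)
```

The attempt in the question returns single values because squeeze().to_dict() keeps only the last value for each duplicated blockcount key; aggregating into lists before building the dict avoids that collision.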

4

Efficiently comparing the lists in two columns row by row

Given a Pandas DataFrame like the following: import pandas as pd import numpy as np df = pd.DataFrame({'today': [['a', 'b', 'c'], ['a', 'b'], ['b']], 'yesterday': [['a', 'b'], ['a'], ['a']]}) today yesterday 0 ['a', 'b', 'c'] ['a', 'b'] 1 ['a', 'b'] ['a'] 2 ['b'] ['a'] ... etc, but with about 100,000 …

16 python pandas numpy dataframe
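The excerpt of the row-wise list comparison question above is cut off before it says which comparison is wanted. Assuming it means "what was added today" and "what was dropped since yesterday", this sketch iterates over zip() of the two columns, which usually has less overhead than DataFrame.apply(axis=1). The helper difference and the columns added and removed are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({'today': [['a', 'b', 'c'], ['a', 'b'], ['b']],
                   'yesterday': [['a', 'b'], ['a'], ['a']]})


def difference(left, right):
    """Items of `left` that are not in `right`, keeping the order of `left`."""
    right_set = set(right)  # O(1) membership tests
    return [item for item in left if item not in right_set]


# Assumption: "compare" means the set difference in each direction.
df['added'] = [difference(t, y) for t, y in zip(df['today'], df['yesterday'])]
df['removed'] = [difference(y, t) for t, y in zip(df['today'], df['yesterday'])]
print(df)
```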

5

Preventing dtype coercion in pandas data frames while indexing and inserting rows

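I'm working with individual rows of pandas data frames, but I'm struggling with coercion problems while indexing and inserting rows. pandas always seems to want to coerce mixed int/float into all-float types, and I can't see any obvious controls over this behavior. For example, here is a simple data frame with a as int and b as float. import pandas …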

16 python pandas coercion
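To illustrate the indexing side of the coercion question above (this sketch does not cover insertion), here is a small frame where a is int and b is float. A row returned as a Series must have a single dtype, so it is upcast; a one-row DataFrame or itertuples() keeps per-column types. The sample values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [1.5, 2.5]})

# A single row returned as a Series has one dtype for all values,
# so the int64 column 'a' is upcast to float64.
row_series = df.loc[0]
print(row_series.dtype)  # float64

# Selecting with a list returns a one-row DataFrame, which keeps per-column dtypes.
row_frame = df.loc[[0]]
print(row_frame.dtypes)

# itertuples() yields each row as a namedtuple of per-column values,
# so 'a' stays an integer.
first = next(df.itertuples(index=False))
print(first.a, first.b)  # 1 1.5
```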

6

Quickly finding symmetric pairs in Numpy

from itertools import product import pandas as pd df = pd.DataFrame.from_records(product(range(10), range(10))) df = df.sample(90) df.columns = "c1 c2".split() df = df.sort_values(df.columns.tolist()).reset_index(drop=True) # c1 c2 # 0 0 0 # 1 0 1 # 2 0 2 # 3 0 3 # 4 0 4 # .. .. .. # …

15 python pandas numpy
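For the symmetric-pairs question above, one option is a pandas self-merge against a copy with the columns swapped, in place of a pure NumPy approach. It rebuilds the question's data, adding random_state only to make the sample repeatable. The names swapped, sym and sym_unique are illustrative.

```python
from itertools import product

import pandas as pd

# Rebuild the question's data; random_state is added only for repeatability.
df = pd.DataFrame.from_records(list(product(range(10), range(10))))
df = df.sample(90, random_state=0)
df.columns = "c1 c2".split()
df = df.sort_values(df.columns.tolist()).reset_index(drop=True)

# Swap the column names and inner-join on both columns: a row (x, y)
# survives only if its mirror (y, x) also exists in df.
swapped = df.rename(columns={'c1': 'c2', 'c2': 'c1'})
sym = df.merge(swapped, on=['c1', 'c2'])

# Keep one representative per pair; diagonal rows like (3, 3) mirror themselves.
sym_unique = sym[sym['c1'] <= sym['c2']].reset_index(drop=True)
print(sym_unique)
```

Because merge is hash-based, this avoids comparing every row against every other row.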

3

Getting the closest distance between two geo data frames in pandas

Here is my first geodataframe: !pip install geopandas import pandas as pd import geopandas city1 = [{'City':"Buenos Aires","Country":"Argentina","Latitude":-34.58,"Longitude":-58.66}, {'City':"Brasilia","Country":"Brazil","Latitude":-15.78 ,"Longitude":-70.66}, {'City':"Santiago","Country":"Chile ","Latitude":-33.45 ,"Longitude":-70.66 }] city2 = [{'City':"Bogota","Country":"Colombia ","Latitude":4.60 ,"Longitude":-74.08}, {'City':"Caracas","Country":"Venezuela","Latitude":10.48 ,"Longitude":-66.86}] city1df = pd.DataFrame(city1) city2df = pd.DataFrame(city2) gcity1df = geopandas.GeoDataFrame( city1df, geometry=geopandas.points_from_xy(city1df.Longitude, city1df.Latitude)) gcity2df = geopandas.GeoDataFrame( city2df, …

14 python pandas dataframe geolocation geopandas
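For the nearest-city question above, here is a sketch that substitutes a plain NumPy haversine (great-circle) formula for geopandas. It assumes a spherical Earth with radius 6371 km and reuses the coordinates from the question as given. geopandas' own distance works in the units of the coordinates, which are degrees for longitude/latitude points, so a haversine or a projected CRS is needed to get kilometres. The function haversine_km and the output columns are hypothetical names.

```python
import numpy as np
import pandas as pd

# Coordinates copied from the question (plain DataFrames, no geopandas needed).
city1df = pd.DataFrame({
    'City': ['Buenos Aires', 'Brasilia', 'Santiago'],
    'Latitude': [-34.58, -15.78, -33.45],
    'Longitude': [-58.66, -70.66, -70.66],
})
city2df = pd.DataFrame({
    'City': ['Bogota', 'Caracas'],
    'Latitude': [4.60, 10.48],
    'Longitude': [-74.08, -66.86],
})


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance on a sphere of radius `radius_km` (inputs in degrees)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))


# Broadcast to a (len(city1df), len(city2df)) matrix of pairwise distances.
dist = haversine_km(
    city1df['Latitude'].to_numpy()[:, None], city1df['Longitude'].to_numpy()[:, None],
    city2df['Latitude'].to_numpy()[None, :], city2df['Longitude'].to_numpy()[None, :],
)

# For every city in city1df, pick the closest city in city2df.
nearest = dist.argmin(axis=1)
city1df['nearest_city'] = city2df['City'].to_numpy()[nearest]
city1df['distance_km'] = dist.min(axis=1)
print(city1df)
```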

4

Calculating the percentage of similar values in a pandas data frame

I have a data frame df with two columns, Script (containing text) and Speaker: Script Speaker aze Speaker 1 art Speaker 2 ghb Speaker 3 jka Speaker 1 tyc Speaker 1 avv Speaker 2 bhj Speaker 1 And I have the following list: L = ['a','b','c'] With the following code …

14 python python-3.x pandas dataframe

2

Inferring which columns are datetime

I have a huge data frame with many columns, many of which are of type datetime.datetime. The problem is that many of them have mixed types, including for example datetime.datetime values and None values (and potentially other invalid values). 0 2017-07-06 00:00:00 1 2018-02-27 21:30:05 2 2017-04-12 00:00:00 3 2017-05-21 22:05:00 4 2018-01-22 00:00:00 ... 352867 2019-10-04 00:00:00 352868 …

14 python pandas
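For the datetime-inference question above, here is a minimal heuristic sketch. A column counts as datetime if it already has a datetime64 dtype, or if it is an object column whose non-null values are mostly datetime.datetime instances. The function infer_datetime_columns, its threshold parameter and the sample frame are all hypothetical.

```python
import datetime

import pandas as pd


def infer_datetime_columns(df, threshold=0.9):
    """Return the names of columns that look like datetime columns.

    A column qualifies if it already has a datetime64 dtype, or if it is an
    object column in which at least `threshold` of the non-null values are
    datetime.datetime instances (pd.Timestamp is a subclass, so it counts).
    `threshold` is a hypothetical knob, not a pandas setting.
    """
    found = []
    for col in df.columns:
        s = df[col]
        if pd.api.types.is_datetime64_any_dtype(s):
            found.append(col)
            continue
        if s.dtype != object:
            continue
        non_null = s.dropna()  # drops None and NaN
        if non_null.empty:
            continue
        share = non_null.map(lambda v: isinstance(v, datetime.datetime)).mean()
        if share >= threshold:
            found.append(col)
    return found


df = pd.DataFrame({
    # Mixed datetime / None column, kept as object dtype like in the question.
    'when': pd.Series([datetime.datetime(2017, 7, 6), None,
                       datetime.datetime(2018, 2, 27, 21, 30, 5)], dtype=object),
    'name': ['x', 'y', 'z'],
    'n': [1, 2, 3],
    'ts': pd.to_datetime(['2017-04-12', '2017-05-21', '2018-01-22']),
})
print(infer_datetime_columns(df))  # ['when', 'ts']
```

Using a share rather than requiring every value to be a datetime tolerates the occasional invalid entry the question mentions. Columns that pass could then be converted with pd.to_datetime(..., errors='coerce').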

3

Why does assigning with [:] versus iloc[:] give different results in pandas?

I'm confused by the different indexing methods in pandas, such as iloc. Suppose I want to convert a 1-d data frame into a 2-d data frame. First, I have the following 1-d data frame: a_array = [1,2,3,4,5,6,7,8] a_df = pd.DataFrame(a_array).T And I want to convert it into a 2-d data frame of size 2x4. I start by presetting the 2-d data frame as follows: b_df = pd.DataFrame(columns=range(4),index=range(2)) Then …

13 python pandas dataframe
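The excerpt of the [:] versus iloc[:] question above is cut off before the failing assignment, so this sketch does not try to explain the difference. Assuming the real goal is just the 2x4 reshape, one way to avoid label alignment entirely is to work on the raw NumPy values. The name preset is illustrative.

```python
import pandas as pd

a_array = [1, 2, 3, 4, 5, 6, 7, 8]
a_df = pd.DataFrame(a_array).T  # shape (1, 8)

# Option 1: reshape the raw values and wrap them in a new DataFrame.
b_df = pd.DataFrame(a_df.to_numpy().reshape(2, 4))

# Option 2: fill a preset frame positionally with a plain ndarray;
# an ndarray carries no index/column labels, so nothing gets aligned.
preset = pd.DataFrame(columns=range(4), index=range(2))
preset.iloc[:, :] = a_df.to_numpy().reshape(2, 4)
print(b_df)
print(preset)
```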

1

pandas' to_excel function raises an unexpected TypeError

I created a dictionary of pandas data frames: d[k] = pd.DataFrame(data=data[i]) so I assume d[k] is a valid pandas data frame. Then for k in d.keys(): d[k].to_excel (file_name) and then I get an error: TypeError: got invalid input value of type <class 'xml.etree.ElementTree.Element'>, expected string or Element I'm using Python 3.7, pandas 0.25.3. Update: I replaced …

13 python pandas export-to-excel

3

Slow pandas DataFrame MultiIndex reindexing

I have a pandas DataFrame of the form: id start_time sequence_no value 0 71 2018-10-17 20:12:43+00:00 114428 3 1 71 2018-10-17 20:12:43+00:00 114429 3 2 71 2018-10-17 20:12:43+00:00 114431 79 3 71 2019-11-06 00:51:14+00:00 216009 100 4 71 2019-11-06 00:51:14+00:00 216011 150 5 71 2019-11-06 00:51:14+00:00 216013 180 6 92 2019-12-01 00:51:14+00:00 114430 19 …

13 python pandas numpy dataframe

2

Generating a filtered binary Cartesian product

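Problem statement: I'm looking for an efficient way to generate a full binary Cartesian product (a table with every combination of True and False for a given number of columns), filtered by certain exclusive conditions. For example, for three columns/bits, n=3, I get the full table: df_combs = pd.DataFrame(itertools.product(*([[True, False]] * n))) 0 1 2 0 True True True …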

12 python pandas dataframe
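The exact exclusive conditions in the filtered Cartesian product question above are cut off. As an assumption, this sketch treats them as groups of columns in which at most one column may be True. It builds the full table with NumPy bit shifts, in the same row order as itertools.product([True, False], repeat=n), and then filters it with a boolean mask. The functions binary_product and exclusive_filter are hypothetical names. It still materializes all 2**n rows, so it only suits moderate n.

```python
import numpy as np
import pandas as pd


def binary_product(n):
    """All 2**n True/False rows, in itertools.product([True, False], repeat=n) order."""
    # Row i holds the bits of (2**n - 1 - i), most significant bit first,
    # so the first row is all True and the last row is all False.
    codes = np.arange(2 ** n - 1, -1, -1)
    shifts = np.arange(n - 1, -1, -1)
    bits = ((codes[:, None] >> shifts) & 1) == 1
    return pd.DataFrame(bits)


def exclusive_filter(df, groups):
    """Keep rows where at most one column in each group is True (hypothetical rule)."""
    mask = np.ones(len(df), dtype=bool)
    for group in groups:
        mask &= df[list(group)].sum(axis=1).to_numpy() <= 1
    return df[mask].reset_index(drop=True)


# Example: three bits where columns 0 and 1 must not both be True.
filtered = exclusive_filter(binary_product(3), [(0, 1)])
print(filtered)
```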

1

pandasUDF and pyarrow 0.15.0

Recently I started getting a lot of errors in several pyspark jobs running on an EMR cluster. The error is java.lang.IllegalArgumentException at java.nio.ByteBuffer.allocate(ByteBuffer.java:334) at org.apache.arrow.vector.ipc.message.MessageSerializer.readMessage(MessageSerializer.java:543) at org.apache.arrow.vector.ipc.message.MessageChannelReader.readNext(MessageChannelReader.java:58) at org.apache.arrow.vector.ipc.ArrowStreamReader.readSchema(ArrowStreamReader.java:132) at org.apache.arrow.vector.ipc.ArrowReader.initialize(ArrowReader.java:181) at org.apache.arrow.vector.ipc.ArrowReader.ensureInitialized(ArrowReader.java:172) at org.apache.arrow.vector.ipc.ArrowReader.getVectorSchemaRoot(ArrowReader.java:65) at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:162) at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:122) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at org.apache.spark.sql.execution.python.ArrowEvalPythonExec$$anon$2.<init>(ArrowEvalPythonExec.scala:98) at org.apache.spark.sql.execution.python.ArrowEvalPythonExec.evaluate(ArrowEvalPythonExec.scala:96) at org.apache.spark.sql.execution.python.EvalPythonExec$$anonfun$doExecute$1.apply(EvalPythonExec.scala:127)... They all seem to happen in the apply function of a pandas Series …

12 pandas apache-spark pyspark pyarrow

Questions tagged «pandas»