How do I save a new sheet to an existing Excel file using Pandas?

Question 1

I want to use Excel files to store data processed with Python. My problem is that I can't add sheets to an existing Excel file. Here is some sample code that reproduces the issue.

import pandas as pd
import numpy as np

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

x1 = np.random.randn(100, 2)
df1 = pd.DataFrame(x1)

x2 = np.random.randn(100, 2)
df2 = pd.DataFrame(x2)

writer = pd.ExcelWriter(path, engine = 'xlsxwriter')
df1.to_excel(writer, sheet_name = 'x1')
df2.to_excel(writer, sheet_name = 'x2')
writer.save()
writer.close()

This code saves two DataFrames to two sheets, named "x1" and "x2" respectively. If I create two new DataFrames and try to use the same code to add two new sheets, 'x3' and 'x4', the original data is lost.

import pandas as pd
import numpy as np

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

x3 = np.random.randn(100, 2)
df3 = pd.DataFrame(x3)

x4 = np.random.randn(100, 2)
df4 = pd.DataFrame(x4)

writer = pd.ExcelWriter(path, engine = 'xlsxwriter')
df3.to_excel(writer, sheet_name = 'x3')
df4.to_excel(writer, sheet_name = 'x4')
writer.save()
writer.close()

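I want an Excel file with four sheets: 'x1', 'x2', 'x3', 'x4'. I know that 'xlsxwriter' is not the only "engine"; there is also 'openpyxl'. I have also seen that other people have already written about this issue, but I still can't understand how to do it.

Here is code taken from this link: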

import pandas
from openpyxl import load_workbook

book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') 
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

They say that it works, but it is hard to figure out how. I don't understand what "ws.title", "ws", and "dict" are in this context.

What is the best way to save "x1" and "x2", then close the file, open it again, and add "x3" and "x4"?

Question 2

Thank you. I believe a complete example could be useful for anyone else who has the same problem.

import pandas as pd
import numpy as np

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

x1 = np.random.randn(100, 2)
df1 = pd.DataFrame(x1)

x2 = np.random.randn(100, 2)
df2 = pd.DataFrame(x2)

writer = pd.ExcelWriter(path, engine = 'xlsxwriter')
df1.to_excel(writer, sheet_name = 'x1')
df2.to_excel(writer, sheet_name = 'x2')
writer.save()
writer.close()

Here I generate an Excel file. As far as I understand, it does not really matter whether it is generated via the "xlsxwriter" or the "openpyxl" engine.

When I want to write without losing the original data:

import pandas as pd
import numpy as np
from openpyxl import load_workbook

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

book = load_workbook(path)
writer = pd.ExcelWriter(path, engine = 'openpyxl')
writer.book = book

x3 = np.random.randn(100, 2)
df3 = pd.DataFrame(x3)

x4 = np.random.randn(100, 2)
df4 = pd.DataFrame(x4)

df3.to_excel(writer, sheet_name = 'x3')
df4.to_excel(writer, sheet_name = 'x4')
writer.save()
writer.close()

This code does the job!

Question 3

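In the example you shared, you load the existing file into book and set writer.book to book. In the line writer.sheets = dict((ws.title, ws) for ws in book.worksheets), you access each sheet in the workbook as ws. The sheet title is then ws.title, so you are creating a dictionary of {sheet_title: sheet} key-value pairs. This dictionary is then assigned to writer.sheets. Essentially, these steps load the existing data from 'Masterfile.xlsx' and populate your writer with it.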
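To make the dict line concrete, here is a minimal sketch that builds a small workbook in memory with openpyxl and shows what the mapping contains. The sheet names x1 and x2 are only illustrative, and nothing is read from or written to disk.

```python
from openpyxl import Workbook

# Build a throwaway in-memory workbook with two sheets (illustrative names).
book = Workbook()
book.active.title = "x1"
book.create_sheet("x2")

# book.worksheets is a list of worksheet objects; each ws has a .title string.
sheets = dict((ws.title, ws) for ws in book.worksheets)

print(list(sheets))                 # the sheet titles are the keys
print(sheets["x2"] is book["x2"])   # the values are the worksheet objects
```

So ws is just the loop variable for each worksheet, ws.title is its name, and dict(...) turns the (title, worksheet) pairs into a lookup table.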

Now let's assume you already have a file with x1 and x2 as sheets. You can use the example code to load the file and then do something like this to add x3 and x4.

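from openpyxl import load_workbook

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"
# Load the existing workbook first; otherwise the writer starts empty and x1/x2 are lost.
book = load_workbook(path)
writer = pd.ExcelWriter(path, engine='openpyxl')
writer.book = book
df3.to_excel(writer, sheet_name='x3', index=False)
df4.to_excel(writer, sheet_name='x4', index=False)
writer.save()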

That should do what you are looking for.

Question 4

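Here is a simple example of writing several DataFrames to Excel at once. It also covers the case where you want to append data to an Excel file that has already been written (and closed).

When you write to the Excel file for the first time (writing "df1" and "df2" to "1st_sheet" and "2nd_sheet"):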

import pandas as pd 
from openpyxl import load_workbook

df1 = pd.DataFrame([[1],[1]], columns=['a'])
df2 = pd.DataFrame([[2],[2]], columns=['b'])
df3 = pd.DataFrame([[3],[3]], columns=['c'])

excel_dir = "my/excel/dir"

with pd.ExcelWriter(excel_dir, engine='xlsxwriter') as writer:
    df1.to_excel(writer, sheet_name='1st_sheet')
    df2.to_excel(writer, sheet_name='2nd_sheet')
    # No writer.save() needed: the with block saves the file on exit.

After you have closed the Excel file, if you want to "append" data to a different sheet in the same file (here "df3" to a sheet named "3rd_sheet"):

book = load_workbook(excel_dir)
with pd.ExcelWriter(excel_dir, engine='openpyxl') as writer:
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

    # Your dataframe to append.
    df3.to_excel(writer, sheet_name='3rd_sheet')
    # No writer.save() needed: the with block saves the file on exit.

Note that the Excel format must not be xls; you can use xlsx.

Question 5

I would recommend working directly with openpyxl, since it now supports Pandas DataFrames.

This allows you to concentrate on the relevant Excel and Pandas code.

Question 6

For creating a new file:

x1 = np.random.randn(100, 2)
df1 = pd.DataFrame(x1)
with pd.ExcelWriter('sample.xlsx') as writer:  
    df1.to_excel(writer, sheet_name='x1')

To append to the file, use the argument mode='a' in pd.ExcelWriter.

x2 = np.random.randn(100, 2)
df2 = pd.DataFrame(x2)
with pd.ExcelWriter('sample.xlsx', engine='openpyxl', mode='a') as writer:  
    df2.to_excel(writer, sheet_name='x2')

The default is mode='w'. See the documentation.
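One caveat with mode='a': by default the writer raises an error if the target sheet name already exists. The sketch below assumes pandas 1.3 or newer (where the if_sheet_exists argument is available) with openpyxl installed; the temporary file path is only for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative temporary file; substitute your own workbook path.
path = os.path.join(tempfile.mkdtemp(), "sample.xlsx")

# Create the file with one sheet.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame(np.random.randn(5, 2)).to_excel(writer, sheet_name="x1")

# Append a second sheet; 'x1' is left untouched.
with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
    pd.DataFrame(np.random.randn(5, 2)).to_excel(writer, sheet_name="x2")

# Overwrite 'x2' instead of raising on the duplicate sheet name.
with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="replace") as writer:
    pd.DataFrame(np.random.randn(3, 2)).to_excel(writer, sheet_name="x2")

# Read the result back to check which sheets exist and how long 'x2' is.
with pd.ExcelFile(path) as xlsx:
    sheet_names = xlsx.sheet_names
    x2_rows = len(xlsx.parse("x2", index_col=0))
```

After the last step the file still has the sheets x1 and x2, and x2 holds the three-row frame from the final write.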

Question 7

You can do this without ExcelWriter, using the tools in openpyxl. This can also make it much easier to add fonts to the new sheet using openpyxl.styles.

import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

#Location of original excel sheet
fileLocation =r'C:\workspace\data.xlsx'

#Location of new file which can be the same as original file
writeLocation=r'C:\workspace\dataNew.xlsx'

data = {'Name':['Tom','Paul','Jeremy'],'Age':[32,43,34],'Salary':[20000,34000,32000]}

#The dataframe you want to add
df = pd.DataFrame(data)

#Load existing sheet as it is
book = load_workbook(fileLocation)
#create a new sheet
sheet = book.create_sheet("Sheet Name")

#Load dataframe into new sheet
for row in dataframe_to_rows(df, index=False, header=True):
    sheet.append(row)

#Save the modified excel at desired location    
book.save(writeLocation)
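As an illustration of the styling point, the sketch below bolds the header row of a freshly written sheet. It uses an in-memory workbook rather than a file on disk, and the sheet name "Styled" and the sample data are assumptions made for the example.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font
from openpyxl.utils.dataframe import dataframe_to_rows

df = pd.DataFrame({"Name": ["Tom", "Paul"], "Age": [32, 43]})

book = Workbook()                     # stand-in for load_workbook(fileLocation)
sheet = book.create_sheet("Styled")   # hypothetical sheet name

# Write the header row followed by the data rows.
for row in dataframe_to_rows(df, index=False, header=True):
    sheet.append(row)

# Row 1 holds the column headers; make each header cell bold.
for cell in sheet[1]:
    cell.font = Font(bold=True)

header = [cell.value for cell in sheet[1]]
```

Calling book.save(...) afterwards, as in the code above, would persist both the data and the font settings.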

Question 8

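You can read the existing sheets you care about (e.g. 'x1', 'x2') into memory and 'write' them back before adding new sheets. Keep in mind that the sheets in the file and the sheets in memory are two different things; if you don't read them, they will be lost. This approach uses 'xlsxwriter' only; openpyxl is not involved.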

import pandas as pd
import numpy as np

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

# begin <== read selected sheets and write them back
df1 = pd.read_excel(path, sheet_name='x1', index_col=0) # or sheet_name=0
df2 = pd.read_excel(path, sheet_name='x2', index_col=0) # or sheet_name=1
writer = pd.ExcelWriter(path, engine='xlsxwriter')
df1.to_excel(writer, sheet_name='x1')
df2.to_excel(writer, sheet_name='x2')
# end ==>

# now create more new sheets
x3 = np.random.randn(100, 2)
df3 = pd.DataFrame(x3)

x4 = np.random.randn(100, 2)
df4 = pd.DataFrame(x4)

df3.to_excel(writer, sheet_name='x3')
df4.to_excel(writer, sheet_name='x4')
writer.save()
writer.close()

If you want to preserve all existing sheets, you can replace the code above between begin and end with:

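# read all existing sheets into memory, then write them back
# (read before creating the writer, since opening it for writing may truncate the file)
xlsx = pd.ExcelFile(path)
frames = {sheet: xlsx.parse(sheet_name=sheet, index_col=0) for sheet in xlsx.sheet_names}
xlsx.close()
writer = pd.ExcelWriter(path, engine='xlsxwriter')
for sheet, df in frames.items():
    df.to_excel(writer, sheet_name=sheet)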
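A variation on the same read-then-rewrite idea: passing sheet_name=None to pd.read_excel returns every sheet as a dict of DataFrames, which can be written back together with the new sheets. This is a minimal sketch that uses a temporary file in place of the real workbook and leaves the writer engine at the pandas default.

```python
import os
import tempfile

import numpy as np
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "PhD_data.xlsx")  # illustrative path

# Set up a file that already holds 'x1' and 'x2'.
with pd.ExcelWriter(path) as writer:
    pd.DataFrame(np.random.randn(4, 2)).to_excel(writer, sheet_name="x1")
    pd.DataFrame(np.random.randn(4, 2)).to_excel(writer, sheet_name="x2")

# Read every existing sheet into memory before reopening the file for writing.
frames = pd.read_excel(path, sheet_name=None, index_col=0)

# Add the new sheets to the in-memory collection.
frames["x3"] = pd.DataFrame(np.random.randn(4, 2))
frames["x4"] = pd.DataFrame(np.random.randn(4, 2))

# Rewrite the whole workbook: old sheets first, then the new ones.
with pd.ExcelWriter(path) as writer:
    for name, frame in frames.items():
        frame.to_excel(writer, sheet_name=name)

with pd.ExcelFile(path) as xlsx:
    result = xlsx.sheet_names
```

Note that this round trip keeps cell values only; formatting and formulas in the original sheets are not carried over.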

Question 9

#This program is to read from excel workbook to fetch only the URL domain names and write to the existing excel workbook in a different sheet..
#Developer - Nilesh K
import pandas as pd
from openpyxl import load_workbook  # for writing to the existing workbook

df = pd.read_excel("urlsearch_test.xlsx")

#You can use the below for the relative path.
# r"C:\Users\xyz\Desktop\Python\

domains = []  # domain names collected in the loop below

#begin
# Loop over every row and extract the text from 'http' up to the third '/'. You can have your own logic here.
for index, row in df.iterrows():
    try:
        text = row['TEXT']  # string to scan (renamed from str to avoid shadowing the builtin)
        start = text.index('http')  # position where the URL starts
        end = text.index('/', text.index('/') + 2)  # first '/' after the '//' that follows the first '/'
        domains.append(text[start:end])  # keep the substring holding the domain name

    # Error handling: skip rows where no match is found and continue.
    except ValueError:
        print('Error!')
print(domains)
domains = list(dict.fromkeys(domains))  # keep distinct values in order; comment this line out to keep all values
df1 = pd.DataFrame(domains, columns=['URL'])  # create a DataFrame from the list
#end

#Write using openpyxl so it can be written to same workbook
book = load_workbook('urlsearch_test.xlsx')
writer = pd.ExcelWriter('urlsearch_test.xlsx',engine = 'openpyxl')
writer.book = book
df1.to_excel(writer,sheet_name = 'Sheet3')
writer.save()
writer.close()

#The below can be used to write to a different workbook without using openpyxl
#df1.to_excel(r"C:\Users\xyz\Desktop\Python\urlsearch1_test.xlsx",index=False,sheet_name='sheet1')

Question 10

Another fairly simple way to go about this is to write a function like this:

import logging

import pandas as pd
from openpyxl import load_workbook


def _write_frame_to_new_sheet(path_to_file=None, sheet_name='sheet', data_frame=None):
    book = None
    try:
        # Reuse the existing workbook if one can be loaded.
        book = load_workbook(path_to_file)
    except Exception:
        logging.debug('Creating new workbook at %s', path_to_file)
    with pd.ExcelWriter(path_to_file, engine='openpyxl') as writer:
        if book is not None:
            writer.book = book
        data_frame.to_excel(writer, sheet_name=sheet_name, index=False)

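The idea here is to load the workbook at path_to_file if it exists and then append data_frame as a new sheet named sheet_name. If the workbook does not exist, it is created. It seems that neither openpyxl nor xlsxwriter appends on its own, so, as in the example by @Stefano above, you really have to load the workbook and then rewrite it in order to append.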
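For comparison, here is a sketch of a similar helper built on the mode='a' argument of pd.ExcelWriter, which lets the openpyxl engine append without loading the workbook by hand. The function name write_frame_to_new_sheet and the temporary demo path are hypothetical, and this assumes a pandas version whose ExcelWriter accepts mode; it is not what the answer above used.

```python
import os
import tempfile

import pandas as pd


def write_frame_to_new_sheet(path_to_file, sheet_name, data_frame):
    """Append data_frame as a new sheet, creating the workbook if needed."""
    # Append mode needs an existing file; fall back to write mode otherwise.
    mode = "a" if os.path.exists(path_to_file) else "w"
    with pd.ExcelWriter(path_to_file, engine="openpyxl", mode=mode) as writer:
        data_frame.to_excel(writer, sheet_name=sheet_name, index=False)


# Usage: the first call creates the file, the second adds a sheet to it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
write_frame_to_new_sheet(demo_path, "first", pd.DataFrame({"a": [1, 2]}))
write_frame_to_new_sheet(demo_path, "second", pd.DataFrame({"b": [3, 4]}))

with pd.ExcelFile(demo_path) as xlsx:
    demo_sheets = xlsx.sheet_names
```

Checking for the file up front avoids the broad except clause, at the cost of not handling a path that exists but is not a valid workbook.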