Have you ever found it tedious to read CSV or text files when processing data analysis in Excel?
By using a combination of Python’s pandas and openpyxl, you can write data to Excel in an instant!
This article explains (1) how to read multiple CSV and text files using pandas and (2) how to write data to Excel using a combination of pandas and openpyxl.
Please refer to the following article to learn how to obtain multiple file paths in GUI format using the mouse
All codes for CSV files
A file selection screen will appear if you do the following
Multiple selected CSV files can be written to Excel
If name == 'main':
Change the excel_name one line below to change the name of the Excel file to be saved.
import pandas as pd
from tkinter import filedialog
import openpyxl as opx
def read_filepaths() -> tuple:
typ = [('CSV', '*.csv')]
dir = r'C:'
return filedialog.askopenfilenames(filetypes=typ, initialdir=dir)
def read_csv(paths):
dfs = [pd.read_csv(path, header=0, encoding='utf-8') for path in paths]
return dfs
def write_excel(dfs, excel_name, paths):
wb = opx.Workbook()
wb.save('data/'+excel_name+'.xlsx')
with pd.ExcelWriter('data/'+excel_name+'.xlsx', engine='openpyxl', mode='a') as writer:
for df, path in zip(dfs, paths):
df.to_excel(writer, index=False, sheet_name=path.split('/')[-1].replace('.csv', ''))
if __name__ == '__main__':
# File name of Excel
excel_name = '230429_person_info'
paths = read_filepaths()
dfs = read_csv(paths)
write_excel(dfs, excel_name, paths)
Prepare CSV text file to be used
First, prepare a CSV or text file to be read.
CSV files are comma-delimited and text files are tab-delimited.
File 1: East Japan Personnel Information
The data include the names and ages of people whose addresses are in eastern Japan.
Name | Age | Address |
---|---|---|
James Anderson | 25 | New York |
Sarah Davis | 32 | Washington |
Ryan Taylor | 18 | Boston |
Laura Miller | 41 | Philadelphia |
CSV File
Name,Age,Address
James Anderson,25,New York
Sarah Davis,32,Washington
Ryan Taylor,18,Boston
Laura Miller,41,Philadelphia
Text File
Name Age Address
James Anderson 25 New York
Sarah Davis 32 Washington
Ryan Taylor 18 Boston
Laura Miller 41 Philadelphia
File2: West Japan Personnel Information
The data include the names and ages of people whose addresses are in western Japan.
Name | Age | Address |
---|---|---|
Tom Smith | 29 | San Francisco |
Emily Johnson | 36 | San Diego |
Jake Williams | 22 | Seattle |
Michael Brown | 45 | Portland |
CSV File
Name,Age,Address
Tom Smith,29,San Francisco
Emily Johnson,36,San Diego
Jake Williams,22,Seattle
Michael Brown,45,Portland
Text File
Name Age Address
Tom Smith 29 San Francisco
Emily Johnson 36 San Diego
Jake Williams 22 Seattle
Michael Brown 45 Portland
Read multiple CSV and text files
pandas can read multiple CSV and text files at high speed
Below is an example code for reading two CSV files
# Path to CSV or text file
paths = ['data/230430_person_info_east.csv', 'data/230430_person_info_west.csv']
# Replace each file with a pandas DataFrame and list
dfs = [pd.read_csv(path, header=0, encoding='utf-8') for path in paths]
# Result output
print(dfs)
Read a single CSV or text file
When reading files in pandas, use read_csv
or read_table
read_csv
: Read comma-delimited filesread_table
: Read tab-delimited files
comma-delimited file
path = 'data/230430_person_info_east.csv'
df_csv = pd.read_csv(path)
print(df_csv)
tab-delimited file
path = 'data/230430_person_info_east.txt'
df_txt = pd.read_table(path)
print(df_txt)
Read multiple files as a list type
Multiple files are read with a list of file paths and a for statement
paths = ['data/230430_person_info_east.csv', 'data/230430_person_info_west.csv']
dfs = []
for path in paths:
dfs.append(pd.read_csv(path))
print(dfs)
When appending to an empty list, it is faster and cleaner to use comprehensions
paths = ['data/230430_person_info_east.csv', 'data/230430_person_info_west.csv']
dfs = [pd.read_csv(path) for path in paths]
Specify the number of lines in the header
You can specify the number of lines for the header as a number.
If there is no header, enter None.
- no header specified : 1st line is header
- header=1 : 2nd line is header
- header=None : Numerical value is assigned to the header
df_csv = pd.read_csv(path, header=1)
print(df_csv)
Resolve UnicodeDecodeError errors
If you get a character error such as UnicodeDecodeError, change the encoding method
- utf-8 : Support all languages. Default
- shift-jis : Japanese. JIS X 0208 character code
- cp932 : Extended version of shift-jis
df_csv = pd.read_csv(path, encoding='utf-8')
print(df_csv)
Change the data delimiter
Data delimiters other than commas and tabs can be handled by specifying sep as an argument
When reading a text file with read_csv
, sep=’\t
path = 'data/230430_person_info_east.txt'
df_txt = pd.read_csv(path, sep='\t')
print(df_txt)
Write data into Excel
Write the list of data frames (dfs) read in the previous chapter into a new Excel file
Create a new Excel sheet for each data frame in the dfs and save it to data_useful-python.xlsx
excel_name = 'data_useful-python'
# Create an Excel file
wb = opx.Workbook()
# Save an Excel file
wb.save('data/'+excel_name+'.xlsx')
# Write pandas data
with pd.ExcelWriter('data/'+excel_name+'.xlsx', engine='openpyxl', mode='a') as writer:
for df, path in zip(dfs, paths):
df.to_excel(writer, index=False, sheet_name=path.split('/')[-1].replace('.csv', ''))
Create and Save a new Excel file
Create a new Excel file in openpyxl using the following steps
wb = openpyxl.Workbook()
: create a new Excel objectwb.save(path)
: Save the object created in (1) to PC
excel_name = 'data_useful-python'
# Create an Excel file
wb = opx.Workbook()
# Save an Excel file
wb.save('data/'+excel_name+'.xlsx')
Write pandas data frames
With pandas.to_excel
you can write to Excel in one line!
Simply specify the path where you want to save the file
df.to_excel('data/'+excel_name+'.xlsx')
Write multiple data frames
To save multiple data frames in pandas to the same Excel file, use ExcelWriter
The argument mode=a
is used to overwrite the file
.
path
: Path to the destination Excel filemode
: w (new) or a (overwrite)engine
: Path to the destination Excel file if saving to an existing Excel file openpyxl
with pd.ExcelWriter('data/'+excel_name+'.xlsx', engine='openpyxl', mode='a') as writer:
for df in dfs:
df.to_excel(writer)
Specify the name of the sheet to save to
Use sheet_name as the argument for the name of the Excel sheet to save to
In this article, the CSV file name is used as the sheet name
path = 'data/230430_person_info_east.csv'
df.to_excel('data/'+excel_name+'.xlsx', sheet_name=path.split('/')[-1].replace('.csv', ''))
Do not write the index of the data frame
You can specify whether to write the index (leftmost column of the table) of the pandas data frame
Enter True or False for the index argument
df.to_excel('data/'+excel_name+'.xlsx', index=False)
References
pandas.read_csv
pandas.read_csv
pandas.DataFrame.to_excel
pandas.ExcelWriter
Comments