Pandas | Write multiple CSV and text files to Excel

Have you ever found it tedious to read CSV or text files when processing data analysis in Excel?

By using a combination of Python’s pandas and openpyxl, you can write data to Excel in an instant!

This article explains (1) how to read multiple CSV and text files using pandas and (2) how to write data to Excel using a combination of pandas and openpyxl.

Please refer to the following article to learn how to obtain multiple file paths in GUI format using the mouse

Table of Contents

All codes for CSV files

A file selection screen will appear if you do the following

Multiple selected CSV files can be written to Excel

If name == 'main': Change the excel_name one line below to change the name of the Excel file to be saved.

import pandas as pd
from tkinter import filedialog
import openpyxl as opx


def read_filepaths() -> tuple:
    typ = [('CSV', '*.csv')]
    dir = r'C:'
    return filedialog.askopenfilenames(filetypes=typ, initialdir=dir)

def read_csv(paths):
    dfs = [pd.read_csv(path, header=0, encoding='utf-8') for path in paths]
    return dfs

def write_excel(dfs, excel_name, paths):
    wb = opx.Workbook()
    wb.save('data/'+excel_name+'.xlsx')
    with pd.ExcelWriter('data/'+excel_name+'.xlsx', engine='openpyxl', mode='a') as writer:
        for df, path in zip(dfs, paths):
            df.to_excel(writer, index=False, sheet_name=path.split('/')[-1].replace('.csv', ''))

if __name__ == '__main__':
    # File name of Excel
    excel_name = '230429_person_info'
    paths = read_filepaths()
    dfs = read_csv(paths)
    write_excel(dfs, excel_name, paths)

Prepare CSV text file to be used

First, prepare a CSV or text file to be read.

CSV files are comma-delimited and text files are tab-delimited.

File 1: East Japan Personnel Information

The data include the names and ages of people whose addresses are in eastern Japan.

NameAgeAddress
James Anderson25New York
Sarah Davis32Washington
Ryan Taylor18Boston
Laura Miller41Philadelphia

CSV File

Name,Age,Address
James Anderson,25,New York
Sarah Davis,32,Washington
Ryan Taylor,18,Boston
Laura Miller,41,Philadelphia

Text File

Name    Age    Address
James Anderson    25    New York
Sarah Davis    32    Washington
Ryan Taylor    18    Boston
Laura Miller    41    Philadelphia

File2: West Japan Personnel Information

The data include the names and ages of people whose addresses are in western Japan.

NameAgeAddress
Tom Smith29San Francisco
Emily Johnson36San Diego
Jake Williams22Seattle
Michael Brown45Portland

CSV File

Name,Age,Address
Tom Smith,29,San Francisco
Emily Johnson,36,San Diego
Jake Williams,22,Seattle
Michael Brown,45,Portland

Text File

Name    Age    Address
Tom Smith    29    San Francisco
Emily Johnson    36    San Diego
Jake Williams    22    Seattle
Michael Brown    45    Portland

Read multiple CSV and text files

pandas can read multiple CSV and text files at high speed

Below is an example code for reading two CSV files

# Path to CSV or text file
paths = ['data/230430_person_info_east.csv', 'data/230430_person_info_west.csv']
# Replace each file with a pandas DataFrame and list
dfs = [pd.read_csv(path, header=0, encoding='utf-8') for path in paths]
# Result output
print(dfs)

Read a single CSV or text file

When reading files in pandas, use read_csv or read_table

Read files in pandas
  • read_csv : Read comma-delimited files
  • read_table : Read tab-delimited files

comma-delimited file

path = 'data/230430_person_info_east.csv'
df_csv = pd.read_csv(path)
print(df_csv)

tab-delimited file

path = 'data/230430_person_info_east.txt'
df_txt = pd.read_table(path)
print(df_txt)

Read multiple files as a list type

Multiple files are read with a list of file paths and a for statement

paths = ['data/230430_person_info_east.csv', 'data/230430_person_info_west.csv']
dfs = []
for path in paths:
    dfs.append(pd.read_csv(path))
print(dfs)

When appending to an empty list, it is faster and cleaner to use comprehensions

paths = ['data/230430_person_info_east.csv', 'data/230430_person_info_west.csv']
dfs = [pd.read_csv(path) for path in paths]

Specify the number of lines in the header

You can specify the number of lines for the header as a number.

If there is no header, enter None.

Header Designation
  • no header specified : 1st line is header
  • header=1 : 2nd line is header
  • header=None : Numerical value is assigned to the header
df_csv = pd.read_csv(path, header=1)
print(df_csv)

Resolve UnicodeDecodeError errors

If you get a character error such as UnicodeDecodeError, change the encoding method

Typical encoding methods
  • utf-8 : Support all languages. Default
  • shift-jis : Japanese. JIS X 0208 character code
  • cp932 : Extended version of shift-jis
df_csv = pd.read_csv(path, encoding='utf-8')
print(df_csv)

If you still get errors, you can use encoding_errors=’ignore’.

Change the data delimiter

Data delimiters other than commas and tabs can be handled by specifying sep as an argument

When reading a text file with read_csv, sep=’\t

path = 'data/230430_person_info_east.txt'
df_txt = pd.read_csv(path, sep='\t')
print(df_txt)

Write data into Excel

Write the list of data frames (dfs) read in the previous chapter into a new Excel file

Create a new Excel sheet for each data frame in the dfs and save it to data_useful-python.xlsx

excel_name = 'data_useful-python'
# Create an Excel file
wb = opx.Workbook()
# Save an Excel file
wb.save('data/'+excel_name+'.xlsx')
# Write pandas data
with pd.ExcelWriter('data/'+excel_name+'.xlsx', engine='openpyxl', mode='a') as writer:
    for df, path in zip(dfs, paths):
        df.to_excel(writer, index=False, sheet_name=path.split('/')[-1].replace('.csv', ''))

Create and Save a new Excel file

Create a new Excel file in openpyxl using the following steps

Steps to create a new Excel file
  • wb = openpyxl.Workbook() : create a new Excel object
  • wb.save(path) : Save the object created in (1) to PC
excel_name = 'data_useful-python'
# Create an Excel file
wb = opx.Workbook()
# Save an Excel file
wb.save('data/'+excel_name+'.xlsx')

Note that without wb.save(), the file will not be saved to the PC all the time!

Write pandas data frames

With pandas.to_excel you can write to Excel in one line!

Simply specify the path where you want to save the file

df.to_excel('data/'+excel_name+'.xlsx')

Write multiple data frames

To save multiple data frames in pandas to the same Excel file, use ExcelWriter

The argument mode=a is used to overwrite the file.

How to use ExcelWriter
  • path : Path to the destination Excel file
  • mode : w (new) or a (overwrite)
  • engine: Path to the destination Excel file if saving to an existing Excel file openpyxl
with pd.ExcelWriter('data/'+excel_name+'.xlsx', engine='openpyxl', mode='a') as writer:
    for df in dfs:
        df.to_excel(writer)

Specify the name of the sheet to save to

Use sheet_name as the argument for the name of the Excel sheet to save to

In this article, the CSV file name is used as the sheet name

path = 'data/230430_person_info_east.csv'
df.to_excel('data/'+excel_name+'.xlsx', sheet_name=path.split('/')[-1].replace('.csv', ''))

Do not write the index of the data frame

You can specify whether to write the index (leftmost column of the table) of the pandas data frame

Enter True or False for the index argument

df.to_excel('data/'+excel_name+'.xlsx', index=False)

References

pandas.read_csv

pandas.read_csv

pandas.DataFrame.to_excel

pandas.ExcelWriter

I hope you will share it with me!
  • URLをコピーしました!

Comments

コメントする

Table of Contents