Files
Pandas can import data from local computer or from an URL into a DataFrame object. It can read various formats like CSV, HTML, JSON, etc. It can also export a DataFrame object to CSV or other formats.
File formats that can be read in/write to
# File formats that can be read in/write to
# Format Type Data Description Reader Writer
# text CSV read_csv to_csv
# text JSON read_json to_json
# text HTML read_html to_html
# text Local clipboard read_clipboard to_clipboard
# binary MS Excel read_excel to_excel
# binary HDF5 Format read_hdf to_hdf
# binary Feather Format read_feather to_feather
# binary Msgpack read_msgpack to_msgpack
# binary Stata read_stata to_stata
# binary SAS read_sas
# binary Python Pickle Format read_pickle to_pickle
# binary Parquet Form read_parquet to_parquet
# SQL SQL read_sql to_sql
# SQL Google Big Query read_gbq to_gbq
Read csv file
df = pd.read_csv('local_path/file.csv')
Read csv file with | as fields separator
df = pd.read_csv('local_path/file.csv', sep='|')
Read only 10 rows of csv file
pd.read_csv('local_path/file.csv', nrows=10)
Read csv file and skip first 3 rows
df = pd.read_csv('local_path/file.csv', skiprows=3)
Read csv file and parse/treat columns as dates
df = pd.read_csv('local_path/file.csv',
parse_dates=['column_x_date', 'column_y_date'])
Read csv file while specifying column names
df = pd.read_csv('local_path/file.csv', names=['column_1', 'column_2'])
Read csv file and set the index as column_id
df = pd.read_csv('local_path/file.csv', index_col='column_id')
Read csv file and set the index as multi columns
df = pd.read_csv('local_path/file.csv', index_col=['column_id1', 'column_id2'])
Read csv file and set “.” as missing values
df = pd.read_csv('local_path/file.csv', na_values=['.'])
Read csv file and set different missing values on columns
# `column_x` missing values to be set as `nope`
# `column_y` missing values to be set as `.`
missings = {'column_x': ['nope'], 'column_y': ['.']}
df = pd.read_csv('local_path/file.csv', na_values=missings)
Read csv file and prevent pandas from interpreting “NA” as NaN in a string
# The default NaN recognized values are ['-1.#IND', '1.#QNAN', '1.#IND',
# '-1.#QNAN', '#N/A N/A', '#N/A', 'N/A', 'n/a', 'NA', '#NA', 'NULL',
# 'null', 'NaN', '-NaN', 'nan', '-nan', ''].
df = pd.read_csv('local_path/file.csv', keep_default_na=False)
Read csv file and ignore missing value markers
# improve performance of reading a large file in data without any NAs
df = pd.read_csv('local_path/file.csv', header=0, na_filter=False)
Read csv file and change the data type of a column
pd.read_csv('local_path/file.csv', dtype={'column_x': float})
Write csv file
df.to_csv('local_path/file.csv')
Write csv file with | as fields separator
df = df.to_csv('local_path/file.csv', sep='|')
Write csv file but not save the DataFrame index
df.to_csv('local_path/file.csv', index=False)
Write csv file but not save the DataFrame header
df.to_csv('local_path/file.csv', header=False)
Write csv file but only specific columns
df.to_csv('local_path/file.csv', columns=['column_x', 'column_y'])
Write csv file; don’t use quotes and use backslash as escape character
df.to_csv('local_path/file.csv', quoting=csv.QUOTE_NONE, escapechar='\\')
Append DataFrame to csv file
df.to_csv('local_path/file.csv', mode='a')