Examine Data

Common operations for exploring a pandas DataFrame include finding the number of rows and columns, displaying the first and last rows, showing column types, and sorting values.

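Show the DataFrame (frames longer than 60 rows are truncated to the first and last rows; recent pandas versions show 5 of each by default)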

df

Show info on index, data types, memory usage

df.info()

Show type of df object

type(df)
# >>> <class 'pandas.core.frame.DataFrame'>

Show the first 5 rows

df.head()

Show the first 10 rows

df.head(10)

Show the last 5 rows

df.tail()
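As a small illustration of head and tail, the sketch below builds a hypothetical 100-row frame (the column name n is made up for this example) and slices it from both ends. It also shows a negative argument, which keeps everything except the last rows.

```python
import pandas as pd

# Hypothetical 100-row frame, used only for illustration
df = pd.DataFrame({'n': range(100)})

first = df.head()             # first 5 rows (default n=5)
first_ten = df.head(10)       # first 10 rows
last = df.tail()              # last 5 rows
all_but_last_3 = df.head(-3)  # negative n: every row except the last 3

print(first['n'].tolist())    # [0, 1, 2, 3, 4]
print(last['n'].tolist())     # [95, 96, 97, 98, 99]
print(len(all_but_last_3))    # 97
```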

Show “the index” (aka “the labels”)

df.index
type(df.index)
# >>> <class 'pandas.core.indexes.base.Index'> (RangeIndex for a default integer index)

Show the column names

df.columns
type(df.columns)
# >>> <class 'pandas.core.indexes.base.Index'>
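The class reported for df.index depends on how the frame was built. The sketch below uses two hypothetical frames to compare a default integer index with an explicitly labeled one; the column names are always held in an Index.

```python
import pandas as pd

# Two hypothetical frames: one with the default index, one with string labels
default_df = pd.DataFrame({'Age': [10, 12, 13]})
labeled_df = pd.DataFrame({'Age': [10, 12, 13]}, index=['a', 'b', 'c'])

print(type(default_df.index).__name__)  # RangeIndex
print(type(labeled_df.index).__name__)  # Index
print(list(labeled_df.index))           # ['a', 'b', 'c']
print(list(labeled_df.columns))         # ['Age']
```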

Show data types of each column

df.dtypes

Show number of rows and columns

df.shape

Show number of rows only

df.shape[0]

Show number of columns only

df.shape[1]
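A short sketch of the shape lookups on a hypothetical three-row frame. It also shows len(df) and df.size, two related attributes that give the row count and the total number of cells.

```python
import pandas as pd

# Hypothetical frame, used only for illustration
df = pd.DataFrame({'Name': ['Alex', 'Bob', 'Clarke'], 'Age': [10, 12, 13]})

n_rows, n_cols = df.shape  # shape is a (rows, columns) tuple
print(df.shape)            # (3, 2)
print(len(df))             # 3, same as df.shape[0]
print(df.size)             # 6, rows * columns
```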

Get DataFrame values as a NumPy array (df.to_numpy() is the recommended equivalent)

df.values
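The dtype of the resulting array depends on the columns involved. The sketch below, on a hypothetical frame, uses df.to_numpy() (which the pandas documentation recommends over df.values) to show that mixed columns fall back to a generic object array while a purely numeric selection keeps its integer dtype.

```python
import pandas as pd

# Hypothetical frame mixing a text column and an integer column
df = pd.DataFrame({'Name': ['Alex', 'Bob', 'Clarke'], 'Age': [10, 12, 13]})

mixed = df.to_numpy()          # text + integers: falls back to object dtype
ages = df[['Age']].to_numpy()  # integers only: keeps an integer dtype

print(mixed.shape, mixed.dtype)  # (3, 2) object
print(ages.shape)                # (3, 1)
```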

Show the row and column labels (a list holding both axes)

df.axes

Get a concise summary of a DataFrame

import pandas as pd
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'], index=['a', 'b', 'c'])

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, a to c
Data columns (total 2 columns):
Name    3 non-null object
Age     3 non-null int64
dtypes: int64(1), object(1)
memory usage: 72.0+ bytes

Get memory usage by column

df.memory_usage()
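By default memory_usage reports only a shallow size for object columns. The sketch below, on a hypothetical frame, compares the default call with deep=True and index=False; exact byte counts for text columns vary by pandas version and platform, so only the integer column is annotated.

```python
import pandas as pd

# Hypothetical frame, used only for illustration
df = pd.DataFrame({'Name': ['Alex', 'Bob', 'Clarke'], 'Age': [10, 12, 13]})

shallow = df.memory_usage()            # bytes per column, plus an 'Index' entry
deep = df.memory_usage(deep=True)      # also inspects the contents of object columns
no_index = df.memory_usage(index=False)
total_bytes = deep.sum()               # total footprint in bytes

print(shallow['Age'])  # 24: three int64 values at 8 bytes each
print(deep)
```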

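Set the maximum number of rows and columns printed to unlimited

# use the full option names; the short forms 'max_rows' and 'max_columns'
# can match more than one option in recent pandas versions and raise an error
# default is 60 rows
pd.set_option('display.max_rows', None)
# default is 20 columns (0, meaning auto-detect, when running in a terminal)
pd.set_option('display.max_columns', None)

Reset the maximum number of rows and columns printed to the defaults

pd.reset_option('display.max_rows')
pd.reset_option('display.max_columns')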

Suppress scientific notation

# display all floats with commas and two decimal places
pd.set_option('display.float_format', lambda x: '{:,.2f}'.format(x))
# or
pd.options.display.float_format = "{:,.2f}".format

Reset floats display

pd.reset_option('display.float_format')
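To see the effect of the float format option end to end, the sketch below applies it to a hypothetical Series containing one large and one very small value, captures the printed form, and then restores the default.

```python
import pandas as pd

# Hypothetical values: one large, one very small
s = pd.Series([1234567.891, 0.000012345])

pd.set_option('display.float_format', '{:,.2f}'.format)
formatted = repr(s)   # values render as 1,234,567.89 and 0.00
pd.reset_option('display.float_format')
default = repr(s)     # back to the default float rendering

print(formatted)
print(default)
```

Only the display changes; the values stored in the Series are untouched.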

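Round or format floats to two decimal places

# using rounding: returns a new float Series with rounded values
df['col_x'].round(2)
# using apply: returns a Series of strings, suitable for display only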
df['col_x'].apply(lambda x: '{:.2f}'.format(x))
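The two approaches differ in what they return. A minimal sketch on a hypothetical col_x column: round keeps numeric values, the format-based apply produces strings, and neither changes df unless the result is assigned back.

```python
import pandas as pd

# Hypothetical column, used only for illustration
df = pd.DataFrame({'col_x': [1.23456, 2.5, 3.0]})

rounded = df['col_x'].round(2)                             # still floats
as_text = df['col_x'].apply(lambda x: '{:.2f}'.format(x))  # strings

print(rounded.dtype)            # float64
print(as_text.tolist())         # ['1.23', '2.50', '3.00']
print(df['col_x'].tolist())     # [1.23456, 2.5, 3.0], original is unchanged

# Assign the result back to keep the rounded values
df['col_x'] = df['col_x'].round(2)
```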

Set the maximum number of rows and columns printed to unlimited temporarily

# settings are restored when you exit the 'with' block
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(df)
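The following sketch checks that the option really is restored after the block. It assumes a hypothetical 100-row frame and reads the setting with pd.get_option before, inside, and after the context manager.

```python
import pandas as pd

# Hypothetical 100-row frame, long enough to be truncated by default
df = pd.DataFrame({'n': range(100)})

before = pd.get_option('display.max_rows')
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    inside = pd.get_option('display.max_rows')  # None means unlimited
    full = repr(df)                             # every row is rendered
after = pd.get_option('display.max_rows')

print(inside)                   # None
print(before == after)          # True
print(len(full.splitlines()))   # header line plus one line per row
```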

Sort the DataFrame by one column, ascending

df = df.sort_values('col_x')

Sort the DataFrame by one column, descending

df = df.sort_values('col_x', ascending=False)

Sort the DataFrame by one column, ascending, in place

df.sort_values('col_x', inplace=True)
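One pitfall with the in-place form is that it returns None, so the result should not be assigned back to df. A minimal sketch with a hypothetical col_x column:

```python
import pandas as pd

# Hypothetical frame, used only for illustration
df = pd.DataFrame({'col_x': [3, 1, 2]})

result = df.sort_values('col_x', inplace=True)

print(result)                # None: nothing is returned
print(df['col_x'].tolist())  # [1, 2, 3]: df itself was reordered
```

Writing df = df.sort_values('col_x', inplace=True) would therefore replace df with None.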

Sort the DataFrame by multiple columns, with a sort direction per column

df = df.sort_values(['col_x', 'col_y', 'col_z'], ascending=[True, True, False])
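A worked sketch of a multi-column sort on hypothetical data: rows are ordered by col_x ascending, and ties are broken by col_y descending. The original index labels travel with the rows unless ignore_index=True is passed.

```python
import pandas as pd

# Hypothetical data with ties in col_x
df = pd.DataFrame({
    'col_x': [2, 1, 2, 1],
    'col_y': ['b', 'a', 'a', 'b'],
    'col_z': [10, 20, 30, 40],
})

# col_x ascending first; within equal col_x, col_y descending
sorted_df = df.sort_values(['col_x', 'col_y'], ascending=[True, False])
print(sorted_df.index.tolist())     # [3, 1, 0, 2]
print(sorted_df['col_z'].tolist())  # [40, 20, 10, 30]

# Same sort, but with a fresh 0..n-1 index
renumbered = df.sort_values(['col_x', 'col_y'], ascending=[True, False],
                            ignore_index=True)
print(renumbered.index.tolist())    # [0, 1, 2, 3]
```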

Sort the DataFrame by index labels, ascending

df = df.sort_index()
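For instance, with a hypothetical frame whose string labels are out of order, sort_index reorders the rows by label rather than by any column value:

```python
import pandas as pd

# Hypothetical frame with unordered index labels
df = pd.DataFrame({'Age': [13, 10, 12]}, index=['c', 'a', 'b'])

by_label = df.sort_index()                      # labels a, b, c
by_label_desc = df.sort_index(ascending=False)  # labels c, b, a

print(by_label.index.tolist())       # ['a', 'b', 'c']
print(by_label['Age'].tolist())      # [10, 12, 13]
print(by_label_desc.index.tolist())  # ['c', 'b', 'a']
```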

Sort the DataFrame by column labels, ascending

df = df.sort_index(axis=1)

Example: sort the DataFrame by column labels

import pandas as pd
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data)

>>> df
    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42

df = df.sort_index(axis=1)

# note how `Age` column moved first
>>> df
   Age   Name
0   28    Tom
1   34   Jack
2   29  Steve
3   42  Ricky