Pandas

I/O

pd.read_csv()
pd.read_json()
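
A minimal sketch of both readers. It reads from in-memory text through io.StringIO so it runs without files on disk; the column names and values are made up for illustration, and in practice you would pass a file path or URL instead.

```python
import io

import pandas as pd

# Hypothetical CSV content; normally this would be a path such as "data.csv".
csv_text = "name,age\nalice,30\nbob,25\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Hypothetical JSON content: a list of records, one object per row.
json_text = '[{"name": "alice", "age": 30}, {"name": "bob", "age": 25}]'
df_json = pd.read_json(io.StringIO(json_text), orient="records")

print(df_csv)
print(df_json)
```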

dtype

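| Pandas dtype | Python type | NumPy type | Description |
| --- | --- | --- | --- |
| object | str | str_, string_, unicode_ | Text or mixed values |
| int64 | int | int_, int8, int16, int32, int64, uint8, uint16, uint32, uint64 | Integer numbers |
| float64 | float | float_, float16, float32, float64 | Floating point numbers |
| bool | bool | bool_ | True/False values |
| datetime64 | -- | datetime64[ns] | Date and time values |
| timedelta64[ns] | -- | -- | Difference between two datetimes |
| category | -- | -- | Finite list of text values |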
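
To see how these dtypes show up in practice, here is a small sketch that inspects and converts column types. The DataFrame contents are invented for the example; the exact dtype reported for text columns and the datetime resolution can differ between pandas versions.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "name": ["alice", "bob"],
    "age": [30, 25],
    "score": [1.5, 2.0],
    "joined": ["2020-01-01", "2021-06-15"],
})

# One dtype per column, as inferred by pandas.
print(df.dtypes)

# Explicit conversions.
df["joined"] = pd.to_datetime(df["joined"])   # text -> datetime64
df["name"] = df["name"].astype("category")    # text -> category
df["age"] = df["age"].astype("int32")         # int64 -> int32

print(df.dtypes)
```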

Schema

Columns

df.columns
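
A short sketch of reading the column labels; the DataFrame here is a made-up example.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"name": ["alice", "bob"], "age": [30, 25]})

# df.columns is an Index object; tolist() turns it into a plain list.
column_names = df.columns.tolist()
print(column_names)  # ['name', 'age']
```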

SQL

select

loc selects rows by index label and columns by column label

iloc selects rows and columns by integer position

at selects one element by row label and column label

iat selects one element by row position and column position
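
The four accessors side by side, as a sketch on a made-up DataFrame with string index labels so that labels and positions are easy to tell apart.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {"name": ["alice", "bob", "carol"], "age": [30, 25, 41]},
    index=["a", "b", "c"],
)

by_label = df.loc[["a", "c"], ["age"]]   # rows 'a' and 'c', column 'age'
by_position = df.iloc[0:2, 0]            # first two rows, first column
one_by_label = df.at["b", "age"]         # 25
one_by_position = df.iat[2, 0]           # 'carol'
```

Note that label slices with loc include the end label, while position slices with iloc exclude the end position.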

Select by rows or columns only
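
A sketch of selecting along one axis only, on a made-up DataFrame; a bare colon keeps everything on the other axis.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {"name": ["alice", "bob", "carol"], "age": [30, 25, 41]},
    index=["a", "b", "c"],
)

rows_by_label = df.loc[["a", "b"]]       # rows only, all columns
rows_by_position = df.iloc[:2]           # first two rows, all columns
cols_by_label = df.loc[:, ["name"]]      # all rows, one column (DataFrame)
col_by_position = df.iloc[:, 1]          # all rows, second column (Series)
```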

Select notnull
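
One possible way to keep only rows where a column is not null, sketched on made-up data. dropna() is shown as an alternative that drops rows with a missing value in any column.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only; bob has no age.
df = pd.DataFrame({
    "name": ["alice", "bob", "carol"],
    "age": [30.0, np.nan, 41.0],
})

has_age = df[df["age"].notnull()]   # rows where 'age' is present
complete = df.dropna()              # rows with no missing value at all
```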

Boolean index to position index
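
A sketch of converting a boolean mask into integer positions with NumPy's flatnonzero, then using those positions with iloc. The data is invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"age": [30, 25, 41]}, index=["a", "b", "c"])

mask = df["age"] > 28                         # boolean Series: True, False, True
positions = np.flatnonzero(mask.to_numpy())   # integer positions of the True entries
selected = df.iloc[positions]                 # same rows as df[mask]

print(positions)  # [0 2]
```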

where

Use a boolean mask: df[<boolean expression on df[column]>], for example df[df['age'] > 30]

Use query

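Note that query can be faster on large DataFrames: when numexpr is installed, the whole expression is evaluated at once without materializing intermediate boolean arrays. On small DataFrames the parsing overhead usually makes plain boolean indexing the faster choice.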
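
Both filtering styles on the same made-up DataFrame, as a sketch. The variable min_age is a hypothetical local name, referenced inside the query string with the @ prefix.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "name": ["alice", "bob", "carol"],
    "age": [30, 25, 41],
    "city": ["Paris", "Paris", "Rome"],
})

# Boolean mask: combine conditions with & and |, each wrapped in parentheses.
masked = df[(df["age"] > 28) & (df["city"] == "Paris")]

# query: the same filter written as a string expression.
min_age = 28
queried = df.query("age > @min_age and city == 'Paris'")

print(masked)
```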

distinct

df.drop_duplicates()
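
A sketch of the common variants on made-up data: distinct full rows, distinct values of a subset of columns, and distinct values of a single column.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"key": ["x", "x", "y", "y"], "value": [1, 1, 2, 3]})

distinct_rows = df.drop_duplicates()               # keeps rows 0, 2, 3
distinct_keys = df.drop_duplicates(subset="key")   # first row per key: rows 0, 2
unique_values = df["key"].unique()                 # distinct values of one column
```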

count (length)

df.shape[0] or len(df)

group by

df.groupby('key').size() or df['key'].value_counts()

Multiple aggregation functions
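
A sketch of group counts and of applying several aggregation functions at once. The column names key and value and the output names total and largest are made up for the example.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"key": ["x", "x", "y"], "value": [1, 3, 10]})

sizes = df.groupby("key").size()       # rows per group: x -> 2, y -> 1
counts = df["key"].value_counts()      # same counts, sorted by frequency

# Several functions on one column: one output column per function.
summary = df.groupby("key")["value"].agg(["min", "max", "mean"])

# Named aggregation: output_name=(input_column, function).
named = df.groupby("key").agg(
    total=("value", "sum"),
    largest=("value", "max"),
)

print(summary)
print(named)
```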

order

df.sort_values
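
A sketch of sorting by one column and by several columns with mixed directions, on made-up data.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "name": ["alice", "bob", "carol", "dave"],
    "age": [30, 25, 30, 41],
})

# Single key, descending (like ORDER BY name DESC).
by_name_desc = df.sort_values("name", ascending=False)

# Several keys: age descending, then name ascending to break ties.
by_age_then_name = df.sort_values(["age", "name"], ascending=[False, True])

print(by_age_then_name)
```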

drop

df.drop
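
A sketch of dropping columns and rows by label on a made-up DataFrame. drop returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {"name": ["alice", "bob", "carol"], "age": [30, 25, 41], "city": ["Paris", "Paris", "Rome"]},
    index=["a", "b", "c"],
)

without_city = df.drop(columns=["city"])   # remove a column by label
without_row_b = df.drop(index=["b"])       # remove a row by index label
```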

Functional programming

df.apply

Applies a function to each column (axis=0, the default) or to each row (axis=1).

df.applymap

Applies a function to a DataFrame elementwise. Deprecated since pandas 2.1 in favor of df.map.

series.apply

Applies a function to a Series elementwise.
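
The three calls compared on a small made-up DataFrame. Because DataFrame.applymap was renamed to DataFrame.map in pandas 2.1, this sketch picks whichever one the installed version provides.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# df.apply, column-wise (default axis=0): the function receives each column as a Series.
col_ranges = df.apply(lambda col: col.max() - col.min())         # a -> 2, b -> 20

# df.apply, row-wise (axis=1): the function receives each row as a Series.
row_totals = df.apply(lambda row: row["a"] + row["b"], axis=1)   # 11, 22, 33

# Elementwise over the whole DataFrame: map on pandas >= 2.1, applymap before that.
elementwise = df.map if hasattr(df, "map") else df.applymap
doubled = elementwise(lambda x: x * 2)

# series.apply: elementwise over one column.
squared = df["a"].apply(lambda x: x ** 2)                        # 1, 4, 9
```

For simple arithmetic like this, vectorized expressions such as df * 2 are usually faster than any of the apply variants.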
