Pandas
I/O
pd.read_csv()
pd.read_json()
dtype
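| Pandas dtype | Python type | NumPy type | Description |
| --- | --- | --- | --- |
| object | str | str_, string_, unicode_ | Text or mixed values |
| int64 | int | int_, int8, int16, int32, int64, uint8, uint16, uint32, uint64 | Integer numbers |
| float64 | float | float_, float16, float32, float64 | Floating-point numbers |
| bool | bool | bool_ | True/False values |
| datetime64 | -- | datetime64[ns] | Date and time values |
| timedelta64[ns] | -- | -- | Difference between two datetimes |
| category | -- | -- | Finite list of text values |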
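A minimal sketch of how these dtypes show up after loading data with pd.read_csv() and pd.read_json(). The CSV and JSON payloads, column names, and variable names are made up for illustration; the exact datetime resolution reported may vary across pandas versions.

```python
import io

import pandas as pd

# Hypothetical in-memory payloads, standing in for real files.
csv_text = (
    "id,price,active,created,grade\n"
    "1,9.5,True,2024-01-01,a\n"
    "2,3.0,False,2024-01-03,b\n"
)
json_text = '[{"id": 1, "name": "x"}, {"id": 2, "name": "y"}]'

# Parse the "created" column as datetimes; other dtypes are inferred.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["created"])
df_json = pd.read_json(io.StringIO(json_text))

# Convert a text column to the category dtype explicitly.
df["grade"] = df["grade"].astype("category")

# Subtracting datetimes yields a timedelta column.
df["age"] = df["created"] - df["created"].min()

print(df.dtypes)
```

Passing a dtype mapping to read_csv (for example dtype={"id": "int32"}) is another way to control the result when inference picks a wider type than needed.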
Schema
Columns
df.columns
SQL
select
loc selects rows by index label and columns by column label
iloc selects rows and columns by integer position
at selects a single element by row label and column label
iat selects a single element by row position and column position
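The four accessors above can be sketched side by side. The DataFrame, its index labels, and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical frame with string index labels so labels and positions differ.
df = pd.DataFrame(
    {"name": ["ann", "bob", "cy"], "age": [31, 25, 47]},
    index=["r1", "r2", "r3"],
)

by_label = df.loc[["r1", "r3"], ["name"]]  # rows r1 and r3, column "name"
by_position = df.iloc[0:2, 1]              # first two rows, second column
one_by_label = df.at["r2", "age"]          # single value by labels
one_by_position = df.iat[2, 0]             # single value by positions
```

at and iat only ever return a scalar, which makes them the cheaper choice when a single cell is all that is needed.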
Select by rows or columns only
Select notnull
Boolean index to position index
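A minimal sketch of the three selections listed above: rows or columns only, non-null rows, and converting a boolean mask to integer positions. The data and names are hypothetical, and np.flatnonzero is one of several ways to do the mask-to-position conversion.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value per column.
df = pd.DataFrame({"name": ["ann", None, "cy"], "age": [31.0, 25.0, np.nan]})

# Rows only or columns only.
first_two_rows = df.iloc[:2]        # rows by position, all columns
age_column = df["age"]              # one column as a Series
some_columns = df[["name", "age"]]  # several columns as a DataFrame

# Keep rows where "age" is not null.
has_age = df[df["age"].notnull()]

# Turn a boolean mask into integer positions, then select with iloc.
mask = df["name"].notnull()
positions = np.flatnonzero(mask.to_numpy())
selected = df.iloc[positions]
```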
where
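Use the pattern df[df[column] <op> value], where the inner expression evaluates to a boolean Series.
Use df.query() with the condition written as an expression string.
Note that query can be faster on large DataFrames when numexpr is installed, because the whole expression is evaluated in one pass instead of building an intermediate boolean array for each sub-expression. On small DataFrames, plain boolean indexing is usually at least as fast.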
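Both filtering styles, sketched on a hypothetical frame (column names and values are made up):

```python
import pandas as pd

df = pd.DataFrame({"city": ["Oslo", "Rome", "Lima"], "temp": [4, 18, 22]})

# Boolean indexing: the inner comparison builds a boolean Series.
warm = df[df["temp"] > 10]

# Combine conditions with & and |, wrapping each one in parentheses.
mild_mask = df[(df["temp"] > 10) & (df["temp"] < 20)]

# The same filter expressed as a query string.
mild_query = df.query("temp > 10 and temp < 20")
```

Both forms return the same rows; which one reads better mostly depends on how many conditions are involved.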
distinct
df.drop_duplicates()
count (length)
df.shape[0] or len(df)
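A short sketch of the distinct and count equivalents above, using a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"user": ["a", "b", "a", "a"], "item": ["x", "y", "x", "z"]})

distinct_rows = df.drop_duplicates()                   # like SELECT DISTINCT *
distinct_users = df.drop_duplicates(subset=["user"])   # keeps the first row per user
unique_values = df["user"].unique()                    # distinct values of one column

row_count = len(df)                  # same value as df.shape[0]
distinct_count = len(distinct_rows)  # like COUNT over the distinct rows
```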
group by
df.groupby('key').size() or df['key'].value_counts()
Multiple aggregation functions
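A minimal sketch of group counts and of several aggregations in one call. The frame and the output column names (total, mean, largest) are hypothetical; named aggregation is one of several ways to pass multiple functions to agg.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "b", "a"], "value": [1, 2, 3, 4, 5]})

sizes = df.groupby("key").size()    # row count per key, ordered by key
counts = df["key"].value_counts()   # row count per key, ordered by count

# Several aggregation functions at once, each with its own output column.
summary = df.groupby("key").agg(
    total=("value", "sum"),
    mean=("value", "mean"),
    largest=("value", "max"),
)
```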
order
df.sort_values
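A sketch of the ORDER BY equivalents, on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"team": ["x", "y", "x", "y"], "score": [3, 9, 7, 1]})

# Like ORDER BY score DESC.
by_score_desc = df.sort_values("score", ascending=False)

# Like ORDER BY team ASC, score DESC.
by_team_then_score = df.sort_values(["team", "score"], ascending=[True, False])
```

sort_values returns a new DataFrame and keeps the original index labels; chain reset_index(drop=True) if a fresh 0..n-1 index is wanted.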
drop
df.drop
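A sketch of dropping columns and rows by label, on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

without_column = df.drop(columns=["b"])  # drop a column by label
without_rows = df.drop(index=[0, 2])     # drop rows by index label
```

Both calls return a new DataFrame and leave df unchanged.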
Functional programming
df.apply
Applies a function to each row or each column.
df.applymap
Applies a function to each element of a DataFrame. Deprecated since pandas 2.1 in favor of df.map.
series.apply
Applies a function to each element of a Series.
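The three styles above can be sketched as follows. The frame and lambdas are hypothetical; the elementwise call uses DataFrame.map where it exists and falls back to applymap on older pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# df.apply: the function receives a whole column (default) or a whole row (axis=1).
col_ranges = df.apply(lambda col: col.max() - col.min())
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Elementwise over the whole frame; pick whichever method this pandas provides.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
doubled = elementwise(lambda x: x * 2)

# series.apply: the function receives one element at a time.
squared = df["a"].apply(lambda x: x ** 2)
```

For simple arithmetic like this, vectorized expressions such as df * 2 are normally preferable; apply is mainly useful when the logic cannot be expressed with built-in operations.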