Learning Journal

A journal of habit-forming shenanigans


Understanding Datasets

Function Description
dataframe.head(n) to show the first n rows of dataframe || else top 5
dataframe.tail(n) to show the bottom n rows of dataframe || else bottom 5
dataframe.shape size of dataframe (rows, columns)
dataframe.dtypes() to check data types
dataframe.describe() returns a statistical summary
dataframe.describe(include="all") returns a statistical summary of all including objects
dataframe.info() provides a concise summary of all your dataframe
dataframe.astype() to convert data types
dataframe.columns.values to get a array of column headers from
dataframe.index.values to get a array of index


Dataframe Manipulation

Command Description
dataframe.set_index("key", inplace=True) changes the index of the dataframe to given column "key"
dataframe.reset_index() resets the index of the dataframe
dataframe.dropna(axis=0) drops the entire row if found empty
dataframe.dropna(axis=1) drops the entier column if found empty
dataframe.dropna( [keys] ,inplace=True) applies changes to the actual dataframe itself
dataframe.replace( missing_value, new_value ) replace value of one to another in dataframe
dataframe.rename( columns={prevkey:newkey} ) rename columns of the dataframe
dataframe.sum( axis=1 ) returnsm sum of the row
dataframe["key"].std() returns the standard deviation
dataframe["key"].mean() returns the mean of the column
pd.get_dummies(df[column]) Convert categorical variables to dummy variables (0 or 1) [One - hot encoding]
dataframe.groupby([column, column]) Group dataframe based on column
dataframe[column].agg('function') Perform an operation in a particular coulmn


Viewing Datasets

Command Description
dataframe.loc[label] filters by the labels of the index/column

# For val of particular col and row use dataframe.loc['label','col']
dataframe.iloc[index] filters by the position of the index/coumn


Miscellaneous

Data Format Read Save
csv pd.read_csv() df.to_csv()
json pd.read_json() df.to_json()
Excel pd.read_excel() df.to_excel()
sql pd.read_sql() df.to_sql()