Week 6 glossary#
Online resources#
The official pandas documentation page is at https://pandas.pydata.org/docs/reference/index.html.
The documentation of the matplotlib.dates submodule is at https://matplotlib.org/stable/api/dates_api.html.
Internet download functions#
urlretrieve() (from urllib.request): retrieve a file from the internet to the local file system
gdown.download(): download a public Google Drive file to the local file system
zipfile.ZipFile(), <ZipFile>.extractall(): open a zip file and extract its contents
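As a minimal sketch of how urlretrieve() and zipfile.ZipFile() fit together, the example below builds a small zip archive locally and retrieves it through a file:// URL, so it runs without network access. The file names and the temporary directory are assumptions made for illustration; in practice the URL would be an http(s):// address. gdown.download() is left out because it needs the third-party gdown package and a real Google Drive file.

```python
import tempfile
import zipfile
from pathlib import Path
from urllib.request import urlretrieve

# Build a small zip archive locally so the example needs no network access.
# (All file names here are made up for illustration.)
workdir = Path(tempfile.mkdtemp())
source_zip = workdir / "source.zip"
with zipfile.ZipFile(source_zip, "w") as zf:
    zf.writestr("data.csv", "x,y\n1,2\n3,4\n")

# urlretrieve(url, filename) copies the resource at `url` to a local file.
# A file:// URL stands in here for an http(s):// address.
local_zip = workdir / "downloaded.zip"
urlretrieve(source_zip.as_uri(), local_zip)

# Open the retrieved archive and extract everything into a target directory.
extract_dir = workdir / "extracted"
with zipfile.ZipFile(local_zip) as zf:
    zf.extractall(extract_dir)

extracted_names = sorted(p.name for p in extract_dir.iterdir())
print(extracted_names)  # ['data.csv']
```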
Pandas functions#
pd.DataFrame(): create a new DataFrame manually
pd.read_csv(): read a csv file into a DataFrame
pd.read_excel(): read an Excel file into a DataFrame
pd.isna(): vectorized check of which values are missing
pd.to_datetime(): convert strings to datetime objects
pd.Series(): create a pandas Series from arrays or pandas indices
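The following sketch exercises most of the functions above on made-up data. An in-memory io.StringIO object stands in for a csv file on disk (an assumption made so the example is self-contained), and pd.read_excel() is omitted because it needs an actual Excel file and an optional reader package.

```python
import io

import pandas as pd

# Create a DataFrame manually from a dict of columns (illustrative values).
df = pd.DataFrame({"city": ["Oslo", "Lima"], "temp": [3.5, None]})

# read_csv accepts a path, a URL, or a file-like object.
# StringIO stands in for a csv file here.
csv_text = "date,value\n2024-01-01,10\n2024-01-02,\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# isna returns a boolean mask marking the missing values.
missing = pd.isna(from_csv["value"])
print(missing.tolist())  # [False, True]

# to_datetime converts the date strings to datetime values.
from_csv["date"] = pd.to_datetime(from_csv["date"])
print(from_csv["date"].dtype)

# Series builds a one-dimensional labeled array.
s = pd.Series([1, 2, 3], index=["a", "b", "c"])
print(s["b"])  # 2
```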
Pandas DataFrame attributes#
<DataFrame>.size: the size (total number of elements, rows times columns) of the DataFrame
<DataFrame>.shape: the shape (number of rows, number of columns) of the DataFrame
<DataFrame>.ndim: the number of dimensions of the DataFrame
<DataFrame>.columns: the column names of the DataFrame
<DataFrame>.index: the index (row labels) of the DataFrame
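A short illustration of these attributes on a small, made-up DataFrame with three rows and two columns:

```python
import pandas as pd

# Illustrative DataFrame: 3 rows, 2 columns, custom row labels.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]},
    index=["x", "y", "z"],
)

print(df.size)           # 6
print(df.shape)          # (3, 2)
print(df.ndim)           # 2
print(list(df.columns))  # ['a', 'b']
print(list(df.index))    # ['x', 'y', 'z']
```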
Pandas DataFrame methods#
DataFrame manipulation#
<DataFrame>.info(): print out summary information about the DataFrame
<DataFrame>.iloc[]: subset the DataFrame by integer row and column positions
<DataFrame>.loc[]: subset the DataFrame by row and column labels
<DataFrame>.dropna(): drop rows with missing values from the DataFrame
<DataFrame>.sort_index(): sort the rows of a DataFrame by the values of its index
<DataFrame>.sort_values(): sort the rows of a DataFrame by the values in certain columns
<DataFrame>.reset_index(): turn the index of a DataFrame back into a regular column and restore the default integer index
<DataFrame>.set_index(): set a column of the DataFrame as the DataFrame’s row index
<DataFrame>.to_csv(): export a DataFrame as an external csv file
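The sketch below chains several of these operations on made-up data. The column names and index labels are assumptions for illustration, and to_csv() writes to an in-memory buffer rather than to a file on disk.

```python
import io

import pandas as pd

# Illustrative DataFrame with one missing score and integer row labels.
df = pd.DataFrame(
    {"name": ["c", "a", "b"], "score": [2.0, None, 1.0]},
    index=[10, 20, 30],
)

by_position = df.iloc[0, 1]     # first row, second column: 2.0
by_label = df.loc[30, "score"]  # row labeled 30, column "score": 1.0

cleaned = df.dropna()                    # drops the row labeled 20
by_score = cleaned.sort_values("score")  # row order becomes 30, 10
by_index = by_score.sort_index()         # row order becomes 10, 30

named = df.set_index("name")    # "name" becomes the row index
restored = named.reset_index()  # "name" is a column again, index is 0..2

# to_csv accepts a path or a file-like object; a buffer is used here.
buffer = io.StringIO()
restored.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```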
Descriptive statistics#
<DataFrame>.describe(): compute several descriptive statistics at once (by default for the numeric columns of a DataFrame)
<DataFrame>.mean(): compute the mean of each column of a DataFrame
<DataFrame>.median(): compute the median of each column of a DataFrame
<DataFrame>.quantile(): compute a given quantile of each column of a DataFrame
<DataFrame>.var(): compute the variance of each column of a DataFrame
<DataFrame>.std(): compute the standard deviation of each column of a DataFrame
<DataFrame>.sem(): compute the standard error of the mean of each column of a DataFrame
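A small worked example on made-up numeric data. It relies on the pandas defaults: var(), std() and sem() use the sample formulas (ddof=1), and quantile() interpolates linearly between data points.

```python
import pandas as pd

# Illustrative all-numeric DataFrame.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]})

summary = df.describe()   # count, mean, std, min, quartiles, max per column
means = df.mean()         # a: 2.5, b: 25.0
medians = df.median()     # a: 2.5, b: 25.0
q25 = df.quantile(0.25)   # a: 1.75, b: 17.5
variances = df.var()      # sample variance; a: 5/3
stds = df.std()           # square root of the sample variance
sems = df.sem()           # std divided by sqrt(number of rows)

print(summary)
print(means["a"], medians["a"], q25["a"])  # 2.5 2.5 1.75
```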
Pandas datetime Series attributes and methods#
<Series>.dt.strftime(): produce formatted datetime strings from a datetime Series
<Series>.dt.year: extract the years of the datetime Series
<Series>.dt.month: extract the months of the datetime Series
<Series>.dt.day: extract the days of the datetime Series
<Series>.dt.hour: extract the hours of the datetime Series
<Series>.dt.minute: extract the minutes of the datetime Series
<Series>.dt.dayofyear: extract the days of year of the datetime Series
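For illustration, the .dt accessor applied to a made-up Series of two timestamps (the format string passed to strftime() is just one possible choice):

```python
import pandas as pd

# Illustrative datetime Series; 2024 is a leap year.
stamps = pd.Series(pd.to_datetime(["2024-03-05 14:30", "2023-12-31 08:15"]))

print(stamps.dt.year.tolist())       # [2024, 2023]
print(stamps.dt.month.tolist())      # [3, 12]
print(stamps.dt.day.tolist())        # [5, 31]
print(stamps.dt.hour.tolist())       # [14, 8]
print(stamps.dt.minute.tolist())     # [30, 15]
print(stamps.dt.dayofyear.tolist())  # [65, 365]

# strftime turns each datetime into a string using a format code.
labels = stamps.dt.strftime("%Y/%m/%d")
print(labels.tolist())  # ['2024/03/05', '2023/12/31']
```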
Datetime plotting utilities from matplotlib.dates#
mdates.YearLocator(): place ticks on year-level intervals
mdates.MonthLocator(): place ticks on month-level intervals
mdates.DayLocator(): place ticks on day-level intervals
mdates.HourLocator(): place ticks on hour-level intervals
mdates.MinuteLocator(): place ticks on minute-level intervals
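A minimal sketch of how a locator is attached to an axis, assuming the conventional alias import matplotlib.dates as mdates. The data are made up, the non-interactive "Agg" backend is selected so the example runs without a display, and mdates.DateFormatter (not listed above) is used only to label the major ticks.

```python
from datetime import datetime, timedelta

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no display needed

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Illustrative daily data covering about three months.
dates = [datetime(2024, 1, 1) + timedelta(days=i) for i in range(90)]
values = list(range(90))

fig, ax = plt.subplots()
ax.plot(dates, values)

# Major ticks at the start of each month, minor ticks every 7th day.
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator(interval=7))

# Label the major ticks as abbreviated month plus year.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

YearLocator(), HourLocator() and MinuteLocator() are attached the same way; which one fits depends on the time span of the data.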