bolster.stats
Basic statistics and data frame helpers.
Simple functions for common data manipulation tasks:
- add_totals/drop_totals: manage row/column totals in DataFrames
- top_n: truncate DataFrames to the top N rows with 'others' aggregation
- fix_datetime_tz_columns: strip timezone info from datetime columns
The distributions submodule provides distribution fitting.
Submodules
- bolster.stats.distributions
Functions
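| add_totals(df[, column_total, row_total, inplace]) | Add row and column totals to a DataFrame (in place by default). |
| drop_totals(df[, column_total, row_total, inplace]) | Remove row and column totals from a DataFrame (in place by default). |
| fix_datetime_tz_columns(df[, inplace]) | Strip timezone information from relevant datetime columns in a DataFrame. |
| top_n(df, n[, others]) | Truncate the DataFrame to the top 'n' rows, summing all subsequent rows into an 'others' row. |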
Package Contents
- bolster.stats.add_totals(df, column_total='total', row_total='total', inplace=True)
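Add row and column totals to a DataFrame (in place by default). column_total labels the new row that holds the column sums; row_total labels the new column that holds the row sums.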
>>> import pandas as pd
>>> add_totals(pd.DataFrame([[0,1,2],[3,4,5]]))
       0  1  2  total
0      0  1  2      3
1      3  4  5     12
total  3  5  7     15
>>> add_totals(pd.DataFrame([[0,1,2],[3,4,5]]),'ctot', 'rtot')
      0  1  2  rtot
0     0  1  2     3
1     3  4  5    12
ctot  3  5  7    15
>>> df = pd.DataFrame([[0,1,2],[3,4,5]])
>>> add_totals(df, inplace=False)
       0  1  2  total
0      0  1  2      3
1      3  4  5     12
total  3  5  7     15
>>> df
   0  1  2
0  0  1  2
1  3  4  5
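To make the labelling rule concrete, here is a minimal sketch that reproduces the behaviour shown in the examples. It is not the bolster source: add_totals_sketch is a hypothetical name, and the sketch assumes every column is numeric. It adds the row-sum column first, so the new totals row also picks up the grand total in its corner cell.

```python
import pandas as pd


def add_totals_sketch(
    df: pd.DataFrame,
    column_total: str = "total",
    row_total: str = "total",
    inplace: bool = True,
) -> pd.DataFrame:
    """Hypothetical sketch of add_totals; assumes numeric columns only."""
    if not inplace:
        df = df.copy()  # leave the caller's frame untouched
    # New column labelled `row_total`: the sum across each row.
    df[row_total] = df.sum(axis=1)
    # New row labelled `column_total`: the sum down each column. Because the
    # row-total column already exists, its cell in this row is the grand total.
    df.loc[column_total] = df.sum(axis=0)
    return df


frame = pd.DataFrame([[0, 1, 2], [3, 4, 5]])
print(add_totals_sketch(frame, inplace=False))
print(frame.shape)  # still (2, 3): the copy was modified, not `frame`
```

Returning the frame even when modifying in place mirrors the doctests above, where the in-place call still displays its result.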
- bolster.stats.drop_totals(df, column_total='total', row_total='total', inplace=True)
Remove row and column totals from a DataFrame (in place by default).
- Parameters:
df (pd.DataFrame) – The DataFrame from which to remove totals.
column_total (AnyStr, optional) – The label of the totals row (the row holding the column totals), by default 'total'.
row_total (AnyStr, optional) – The label of the totals column (the column holding the row totals), by default 'total'.
inplace (bool, optional) – Whether to modify the DataFrame in place, by default True.
Returns:
pd.DataFrame – The DataFrame with totals removed.
Examples:
>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'total': [5, 7, 9]})
>>> df.loc['total'] = [6, 15, 21]
>>> drop_totals(df)
   A  B
0  1  4
1  2  5
2  3  6
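A minimal sketch of the inverse operation, assuming the totals are stored exactly as add_totals creates them: a row labelled column_total and a column labelled row_total. drop_totals_sketch is a hypothetical name, not the bolster implementation, and passing errors="ignore" is a design choice of this sketch so that a frame without totals passes through unchanged.

```python
import pandas as pd


def drop_totals_sketch(
    df: pd.DataFrame,
    column_total: str = "total",
    row_total: str = "total",
    inplace: bool = True,
) -> pd.DataFrame:
    """Hypothetical sketch of drop_totals, the inverse of add_totals."""
    if not inplace:
        df = df.copy()  # leave the caller's frame untouched
    # Remove the totals row (index label `column_total`) and the totals
    # column (column label `row_total`); missing labels are silently skipped.
    df.drop(index=column_total, columns=row_total, errors="ignore", inplace=True)
    return df


df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "total": [5, 7, 9]})
df.loc["total"] = [6, 15, 21]
print(drop_totals_sketch(df, inplace=False))
```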
- bolster.stats.fix_datetime_tz_columns(df, inplace=True)
Strip timezone information from relevant datetime columns in a DataFrame.
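- Parameters:
df (pandas.DataFrame) – The DataFrame to process.
inplace (bool, optional) – Whether to modify the DataFrame in place, by default True.
Returns:
df (pandas.DataFrame) – The DataFrame with timezone information stripped from its datetime columns.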
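The docs do not say which columns count as relevant. One plausible reading, sketched below, is every timezone-aware column, i.e. every column whose dtype is pandas.DatetimeTZDtype. fix_datetime_tz_columns_sketch is a hypothetical name, and using Series.dt.tz_localize(None), which keeps the local wall-clock time, rather than Series.dt.tz_convert(None), which converts to UTC first, is an assumption of this sketch.

```python
import pandas as pd


def fix_datetime_tz_columns_sketch(
    df: pd.DataFrame, inplace: bool = True
) -> pd.DataFrame:
    """Hypothetical sketch: make every timezone-aware column timezone-naive."""
    if not inplace:
        df = df.copy()  # leave the caller's frame untouched
    for col in df.columns:
        # Only timezone-aware datetime columns carry a DatetimeTZDtype.
        if isinstance(df[col].dtype, pd.DatetimeTZDtype):
            # Drop the zone but keep the local wall-clock time.
            df[col] = df[col].dt.tz_localize(None)
    return df


events = pd.DataFrame(
    {"when": pd.to_datetime(["2024-01-01 12:00"], utc=True), "n": [1]}
)
print(fix_datetime_tz_columns_sketch(events, inplace=False).dtypes)
```

A common reason for this kind of helper is exporting to formats such as Excel, which cannot store timezone-aware datetimes.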
- bolster.stats.top_n(df, n, others='others')
Truncate the DataFrame to the top ‘n’ rows, summing all subsequent rows into an ‘others’ row.
- Parameters:
df (pd.DataFrame) – The DataFrame to truncate.
n (int) – The number of top rows to keep.
others (str, optional) – The label of the aggregated row, by default 'others'.
Returns:
pd.DataFrame – The truncated DataFrame with an 'others' row.
Examples:
>>> import pandas as pd
>>> df = pd.DataFrame({'A': ...})
>>> top_n(df, ...)
        A  B
0       1  5
1       2  4
others  9  3
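A minimal sketch of the truncation described above, assuming the frame is already sorted so that its first n rows are the "top" ones, and that all columns are numeric. top_n_sketch is a hypothetical name, not the bolster implementation, and the sample data is illustrative only.

```python
import pandas as pd


def top_n_sketch(df: pd.DataFrame, n: int, others: str = "others") -> pd.DataFrame:
    """Hypothetical sketch of top_n; assumes df is pre-sorted and numeric."""
    head = df.iloc[:n]  # the rows to keep unchanged
    tail = df.iloc[n:]  # everything after them
    # Collapse the remaining rows into one row labelled `others`.
    # (If n >= len(df), the tail is empty and this row is all zeros.)
    others_row = tail.sum().to_frame(name=others).T
    return pd.concat([head, others_row])


sales = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 4, 2, 1]})
print(top_n_sketch(sales, 2))
```

Building the extra row with to_frame().T keeps the column labels aligned, so pd.concat appends it beneath the kept rows without reindexing.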