bolster.stats

Basic statistics and data frame helpers.

Simple functions for common data manipulation tasks:

  • add_totals/drop_totals: manage row/column totals in DataFrames

  • top_n: truncate DataFrames to top N rows with ‘others’ aggregation

  • fix_datetime_tz_columns: strip timezone info from datetime columns

Plus distribution fitting in the distributions submodule.

Submodules

Functions

add_totals(df[, column_total, row_total, inplace])

Add row and column totals to a DataFrame (in place by default).

drop_totals(df[, column_total, row_total, inplace])

Remove row and column totals from a DataFrame (in place by default).

fix_datetime_tz_columns(df[, inplace])

Strip timezone information from relevant datetime columns in a DataFrame.

top_n(df, n[, others])

Truncate the DataFrame to the top 'n' rows, summing all subsequent rows into an 'others' row.

Package Contents

bolster.stats.add_totals(df, column_total='total', row_total='total', inplace=True)[source]

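Add row and column totals to a DataFrame (in place by default). With inplace=False the input is left untouched and the totals appear only on the returned DataFrame, as the last example here shows.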

>>> add_totals(pd.DataFrame([[0,1,2],[3,4,5]]))
       0  1  2  total
0      0  1  2      3
1      3  4  5     12
total  3  5  7     15
>>> add_totals(pd.DataFrame([[0,1,2],[3,4,5]]),'ctot', 'rtot')
      0  1  2  rtot
0     0  1  2     3
1     3  4  5    12
ctot  3  5  7    15
>>> df = pd.DataFrame([[0,1,2],[3,4,5]])
>>> add_totals(df, inplace=False)
       0  1  2  total
0      0  1  2      3
1      3  4  5     12
total  3  5  7     15
>>> df
   0  1  2
0  0  1  2
1  3  4  5
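
The behaviour in the examples above can be approximated with plain pandas. The following is a minimal sketch, not the library's implementation: `add_totals_sketch` is a hypothetical name, and it always works on a copy rather than honouring `inplace`. It follows the 'ctot'/'rtot' example, where `column_total` labels the row of column sums and `row_total` labels the column of row sums.

```python
import pandas as pd


def add_totals_sketch(df, column_total="total", row_total="total"):
    """Return a copy of df with a totals row and a totals column appended."""
    out = df.copy()
    # Row of column sums, labelled with column_total (as in the 'ctot' example).
    out.loc[column_total] = out.sum(axis=0)
    # Column of row sums, labelled with row_total. The new totals row is
    # included in the sum, so the bottom-right cell holds the grand total.
    out[row_total] = out.sum(axis=1)
    return out


df = pd.DataFrame([[0, 1, 2], [3, 4, 5]])
result = add_totals_sketch(df, "ctot", "rtot")
print(result)
```

Adding the totals row before the totals column is what puts the grand total in the corner cell without a separate computation.
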
bolster.stats.drop_totals(df, column_total='total', row_total='total', inplace=True)[source]

Remove row and column totals from a DataFrame (in place by default).

Parameters:
  • df (pd.DataFrame) – The DataFrame from which to remove totals.

  • column_total (AnyStr, optional) – The name of the column total, by default “total”.

  • row_total (AnyStr, optional) – The name of the row total, by default “total”.

  • inplace (bool, optional) – Whether to modify the DataFrame in place, by default True.

Returns:

pd.DataFrame – The DataFrame with totals removed.

Examples

>>> df = pd.DataFrame({'A': …})
>>> df.loc['total'] = [6, 15, 21]
>>> drop_totals(df)
   A  B
0  1  4
1  2  5
2  3  6
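
As a rough illustration of the same operation in plain pandas, the sketch below drops a totals row and a totals column in one call. It is an assumption-laden stand-in rather than the library's code: `drop_totals_sketch` is a hypothetical name, it returns a new frame instead of honouring `inplace`, and the input frame is invented to be consistent with the output shown in the example above. Whether the real function tolerates missing totals is not stated in the source; `errors="ignore"` is a choice made here.

```python
import pandas as pd


def drop_totals_sketch(df, column_total="total", row_total="total"):
    """Return a new frame without the totals row and the totals column."""
    # errors="ignore" (an assumption) turns the call into a no-op for
    # labels that are not present.
    return df.drop(index=column_total, columns=row_total, errors="ignore")


# Hypothetical input: a frame that already carries both totals.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "total": [5, 7, 9]})
df.loc["total"] = [6, 15, 21]

clean = drop_totals_sketch(df)
print(clean)
```
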

bolster.stats.fix_datetime_tz_columns(df, inplace=True)[source]

Strip timezone information from relevant datetime columns in a DataFrame.

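Parameters:
  • df (pd.DataFrame) – The DataFrame whose datetime columns should have their timezone information stripped.

  • inplace (bool, optional) – Whether to modify the DataFrame in place, by default True.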

Returns:

df
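
One way to strip timezone information with plain pandas is sketched below. This is an illustration under stated assumptions, not the library's implementation: `strip_tz_sketch` is a hypothetical name, it treats every timezone-aware datetime column as "relevant", and it keeps the wall-clock time as-is rather than converting to another zone first. The source does not say which of those choices the real function makes.

```python
import pandas as pd


def strip_tz_sketch(df):
    """Return a copy of df whose timezone-aware datetime columns are naive."""
    out = df.copy()
    # "datetimetz" selects only the timezone-aware datetime columns.
    for col in out.select_dtypes(include=["datetimetz"]).columns:
        # tz_localize(None) drops the timezone and keeps the wall-clock time.
        out[col] = out[col].dt.tz_localize(None)
    return out


df = pd.DataFrame(
    {
        "when": pd.to_datetime(["2024-01-01 09:00", "2024-06-01 09:00"]).tz_localize("UTC"),
        "value": [1, 2],
    }
)
naive = strip_tz_sketch(df)
print(naive.dtypes)
```

Timezone-naive columns are often needed before exporting to formats that reject timezone-aware values, which is a common reason to reach for a helper like this.
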

bolster.stats.top_n(df, n, others='others')[source]

Truncate the DataFrame to the top ‘n’ rows, summing all subsequent rows into an ‘others’ row.

Parameters:
  • df (pd.DataFrame) – The DataFrame to truncate.

  • n (int) – The number of top rows to keep.

  • others (str, optional) – The label given to the row that holds the summed remainder, by default ‘others’.

Returns:

pd.DataFrame – The truncated DataFrame with an ‘others’ row.

Examples

>>> df = pd.DataFrame({'A': …})
>>> top_n(df, 3)  # doctest: …
        A  B
0       1  5
1       2  4
2       3  3
others  9  3
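
The truncate-and-aggregate step can be sketched in plain pandas as follows. This is a hedged illustration rather than the library's code: `top_n_sketch` is a hypothetical name, the input frame is invented, "top" is taken to mean the first n rows in the frame's existing order (so sort beforehand if needed), and returning the frame unchanged when it has n rows or fewer is an assumption.

```python
import pandas as pd


def top_n_sketch(df, n, others="others"):
    """Keep the first n rows and sum the remainder into one row labelled `others`."""
    if len(df) <= n:
        # Nothing to aggregate; assumption: hand back an unchanged copy.
        return df.copy()
    head = df.iloc[:n]
    # Sum the remaining rows column-wise, then reshape the resulting Series
    # into a one-row frame whose index label is `others`.
    tail = df.iloc[n:].sum().rename(others).to_frame().T
    return pd.concat([head, tail])


# Hypothetical input frame.
df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [5, 4, 3, 2, 1]})
result = top_n_sketch(df, 3)
print(result)
```
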