
statsmodels.stats.weightstats.DescrStatsW

class statsmodels.stats.weightstats.DescrStatsW(data, weights=None, ddof=0)

Descriptive statistics and tests for data with case weights.

Assumes that the data is 1d or 2d with (nobs, nvars) observations in rows, variables in columns, and that the same weight applies to each column.

If degrees of freedom correction is used, then weights should add up to the number of observations. ttest also assumes that the sum of weights corresponds to the sample size.

If the weights are integers, often called case or frequency weights, this is essentially the same as replicating each observation by its weight.
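The equivalence with replicated observations can be checked by hand. The following is a minimal NumPy-only sketch, not the class's implementation: the toy data and the names x, w, w_mean, w_var and x_rep are made up for illustration, and the weighted moments are written out with the usual frequency-weight formulas for ddof=0.

```python
import numpy as np

# Hypothetical toy data: 5 observations (rows) of 2 variables (columns),
# with one integer case weight per observation.
x = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [0.0, 1.5],
              [3.0, 2.5],
              [1.5, 1.0]])
w = np.array([1, 3, 2, 1, 2])

# Weighted moments; the same weight applies to every column.
sum_weights = w.sum()
w_mean = (w[:, None] * x).sum(axis=0) / sum_weights
w_var = (w[:, None] * (x - w_mean) ** 2).sum(axis=0) / sum_weights  # ddof=0

# Replicate each row by its weight and use the unweighted formulas.
x_rep = np.repeat(x, w, axis=0)

print(np.allclose(w_mean, x_rep.mean(axis=0)))
print(np.allclose(w_var, x_rep.var(axis=0)))
```

Both comparisons agree because each row of x appears w[i] times in x_rep, so the sum of weights plays the role of the number of observations.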

Parameters:

data : array_like, 1-D or 2-D

dataset

weights : None or 1-D ndarray

weights for each observation, with same length as zero axis of data

ddof : int

Degrees of freedom correction used for the second moments var, std, cov and corrcoef; the default is ddof=0. The statistical tests do not depend on ddof; they are based on the standard formulas.
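To illustrate how ddof enters the variance but not the test, here is a hedged NumPy-only sketch. It assumes the standard one-sample t-test formula with the sum of weights taken as the sample size; the helper weighted_var and the toy data are hypothetical and are not part of the statsmodels API.

```python
import numpy as np

# Hypothetical 1-D toy data with integer case weights.
x = np.array([1.0, 2.0, 0.0, 3.0, 1.5])
w = np.array([1, 3, 2, 1, 2])

n = w.sum()                           # sum of weights, treated as sample size
mean = (w * x).sum() / n
sumsq = (w * (x - mean) ** 2).sum()   # weighted sum of squares of demeaned data


def weighted_var(ddof):
    """Weighted variance with a degrees of freedom correction (illustrative helper)."""
    return sumsq / (n - ddof)


var0 = weighted_var(0)   # no correction
var1 = weighted_var(1)   # corrected, matches the sample variance of repeated data

# Standard t statistic for H0: mean == 0. It always uses n - 1 internally,
# so it does not change with the ddof chosen for var and std.
std_mean = np.sqrt(sumsq / (n - 1)) / np.sqrt(n)
tstat = (mean - 0.0) / std_mean
df = n - 1

# Cross-check against the unweighted formulas on the repeated data.
x_rep = np.repeat(x, w)
t_rep = x_rep.mean() / (x_rep.std(ddof=1) / np.sqrt(len(x_rep)))

print(np.isclose(var1, x_rep.var(ddof=1)))
print(np.isclose(tstat, t_rep))
```

This is also why the weights should add up to the number of observations when a correction is used: n - ddof is only meaningful if n is a sample size.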

Examples

Note: the random seed for the following example was not recorded, so the numbers will differ.

>>> import numpy as np
>>> from statsmodels.stats.weightstats import DescrStatsW
>>> x1_2d = 1.0 + np.random.randn(20, 3)
>>> w1 = np.random.randint(1, 4, 20)
>>> d1 = DescrStatsW(x1_2d, weights=w1)
>>> d1.mean
array([ 1.42739844,  1.23174284,  1.083753  ])
>>> d1.var
array([ 0.94855633,  0.52074626,  1.12309325])
>>> d1.std_mean
array([ 0.14682676,  0.10878944,  0.15976497])
>>> tstat, pval, df = d1.ttest_mean(0)
>>> tstat; pval; df
array([  9.72165021,  11.32226471,   6.78342055])
array([  1.58414212e-12,   1.26536887e-14,   2.37623126e-08])
44.0
>>> tstat, pval, df = d1.ttest_mean([0, 1, 1])
>>> tstat; pval; df
array([ 9.72165021,  2.13019609,  0.52422632])
array([  1.58414212e-12,   3.87842808e-02,   6.02752170e-01])
44.0

If the weights are integers, then asrepeats can be used:

>>> from scipy import stats
>>> x1r = d1.asrepeats()
>>> x1r.shape
...
>>> stats.ttest_1samp(x1r, [0, 1, 1])
...

Methods

asrepeats() get array that has repeats given by floor(weights)
corrcoef() weighted correlation with default ddof
cov() weighted covariance of data if data is 2 dimensional
demeaned() data with weighted mean subtracted
get_compare(other[, weights]) return an instance of CompareMeans with self and other
mean() weighted mean of data
nobs() alias for number of observations/cases, equal to sum of weights
std() standard deviation with default degrees of freedom correction
std_ddof([ddof]) standard deviation of data with given ddof
std_mean() standard deviation of weighted mean
sum() weighted sum of data
sum_weights()
sumsquares() weighted sum of squares of demeaned data
tconfint_mean([alpha, alternative]) two-sided confidence interval for weighted mean of data
ttest_mean([value, alternative]) ttest of Null hypothesis that mean is equal to value.
ttost_mean(low, upp) test of (non-)equivalence of one sample
var() variance with default degrees of freedom correction
var_ddof([ddof]) variance of data given ddof
zconfint_mean([alpha, alternative]) two-sided confidence interval for weighted mean of data
ztest_mean([value, alternative]) z-test of Null hypothesis that mean is equal to value.
ztost_mean(low, upp) test of (non-)equivalence of one sample, based on z-test
