pyblock.blocking

Tools for reblocking data in order to remove serial correlation from data sets.

pyblock.blocking.reblock(data, rowvar=1, ddof=None, weights=None)

Blocking analysis of correlated data.

Repeatedly average neighbouring data points in order to remove the effect of serial correlation on the estimate of the standard error of a data set, as described by Flyvbjerg and Petersen [Flyvbjerg]. The standard error is constant (within error bars) once the correlation has been removed.
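For orientation, a minimal numpy sketch of a single blocking step (reblock_step is an illustrative helper, not part of the pyblock API; the real implementation also propagates means, covariances and error estimates):

    import numpy

    def reblock_step(x):
        # One blocking iteration: average neighbouring pairs of points,
        # discarding the final point if the number of points is odd.
        n = len(x) - (len(x) % 2)
        return 0.5*(x[:n:2] + x[1:n:2])

Applying reblock_step repeatedly halves the data set each time; the naive standard error estimate, numpy.std(x, ddof=1)/numpy.sqrt(len(x)), grows with each iteration until it plateaus once the serial correlation has been removed.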

If a weighting is provided then the weighted variance and standard error of each variable are calculated, as described in [Pozzi]. The Bessel correction is obtained using the “effective sample size” from [Madansky].
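A sketch of the weighted statistics for a 1D data set, assuming the effective-sample-size form of the Bessel correction (weighted_stats is an illustrative helper, not part of the pyblock API):

    import numpy

    def weighted_stats(x, w):
        # Weighted mean of the data.
        mean = numpy.average(x, weights=w)
        # 'Effective sample size' used for the Bessel correction.
        n_eff = w.sum()**2 / (w**2).sum()
        # Biased weighted variance, then the Bessel correction.
        var = numpy.average((x - mean)**2, weights=w)
        var *= n_eff / (n_eff - 1.0)
        # Return the mean and its standard error.
        return mean, numpy.sqrt(var/n_eff)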

Parameters:

data : numpy.ndarray
1D or 2D array containing multiple variables and data points. See rowvar.
rowvar : int
If rowvar is non-zero (default) then each row represents a variable and each column a data point per variable. Otherwise the relationship is swapped. Only used if data is a 2D array.
ddof : int
If not None, then the standard error and covariance are normalised by \((N - \text{ddof})\), where \(N\) is the number of data points per variable. Otherwise, the numpy default is used (i.e. \((N - 1)\)).
weights : numpy.array
A 1D weighting of the data to be reblocked. For multidimensional data an identical weighting is applied to the data for each variable.
Returns:

block_info : list of collections.namedtuple

Statistics from each reblocking iteration. Each tuple contains:

block : int
blocking iteration. Each iteration successively averages neighbouring pairs of data points. The final data point is discarded if the number of data points is odd.
ndata : int
number of data points in the blocking iteration.
mean : numpy.ndarray
mean of each variable in the data set.
cov : numpy.ndarray
covariance matrix.
std_err : numpy.ndarray
standard error of each variable.
std_err_err : numpy.ndarray
an estimate of the error in the standard error, assuming a Gaussian distribution.
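A short usage sketch (the AR(1) test data are purely illustrative):

    import numpy
    import pyblock

    # Serially correlated test data: an AR(1) process.
    rng = numpy.random.RandomState(42)
    data = numpy.zeros(2**14)
    for i in range(1, data.size):
        data[i] = 0.9*data[i-1] + rng.randn()

    stats = pyblock.blocking.reblock(data)
    for stat in stats:
        print(stat.block, stat.ndata, stat.mean, stat.std_err)

The printed standard error should grow with the blocking iteration and then plateau once the serial correlation has been removed.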
[Flyvbjerg] “Error estimates on averages of correlated data”, H. Flyvbjerg, H.G. Petersen, J. Chem. Phys. 91, 461 (1989).
[Pozzi] “Exponential smoothing weighted correlations”, F. Pozzi, T. Di Matteo, T. Aste, Eur. Phys. J. B 85, 175 (2012).
[Madansky] “An Analysis of WinCross, SPSS, and Mentor Procedures for Estimating the Variance of a Weighted Mean”, A. Madansky, H. G. B. Alexander, www.analyticalgroup.com/download/weighted_variance.pdf
pyblock.blocking.find_optimal_block(ndata, stats)

Find the optimal block length from a reblocking calculation.

Inspect a reblocking calculation and find the block length which minimises the stochastic error and removes the effect of correlation from the data set. This follows the procedure detailed by Wolff [Wolff] and Lee et al. [Lee].

Parameters:

ndata : int
number of data points (‘observations’) in the data set.
stats : list of tuples
statistics in the format returned by pyblock.blocking.reblock().
Returns:

list of int
the optimal block index for each variable (i.e. the first block index in which the correlation has been removed). If NaN, then the statistics provided were not sufficient to estimate the correlation length and more data should be collected.
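A usage sketch, continuing the AR(1) example given for pyblock.blocking.reblock() above:

    import numpy
    import pyblock

    rng = numpy.random.RandomState(42)
    data = numpy.zeros(2**14)
    for i in range(1, data.size):
        data[i] = 0.9*data[i-1] + rng.randn()

    stats = pyblock.blocking.reblock(data)
    (opt,) = pyblock.blocking.find_optimal_block(data.size, stats)
    if not numpy.isnan(opt):
        # Statistics at the optimal block length.
        print(stats[opt])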

Wolff [Wolff] (Eq. 47) and Lee et al. [Lee] (Eq. 14) give the optimal block size to be

\[B^3 = 2 n n_{\text{corr}}^2\]

where \(n\) is the number of data points in the data set, \(B\) is the number of data points in each ‘block’ (i.e. the data set has been divided into \(n/B\) contiguous blocks) and \(n_{\text{corr}}\) is the correlation length of the data, i.e. the number of consecutive data points over which the data are correlated. Following the scheme proposed by Lee et al. [Lee], we hence look for the largest block size which satisfies

\[B^3 \geq 2 n n_{\text{corr}}^2.\]

From Eq. 13 in Lee et al. [Lee] (which they cast in terms of the variance):

\[n_{\text{err}} SE = SE_{\text{true}}\]

where the ‘error factor’, \(n_{\text{err}}\), is the square root of the estimated correlation length, \(SE\) is the standard error of the data set and \(SE_{\text{true}}\) is the true standard error once the correlation length has been taken into account. Hence the condition becomes:

\[B^3 \geq 2 n (SE(B) / SE(0))^4\]

where \(SE(B)\) is the estimate of the standard error of the data divided in blocks of size \(B\).
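A condensed sketch of this criterion, assuming blocking iteration \(i\) corresponds to block size \(B = 2^i\) and returning the first block index whose size satisfies it, per the return value described above (optimal_block_index is illustrative, not pyblock's exact implementation):

    def optimal_block_index(ndata, std_errs):
        # std_errs[i] is the standard error estimate after i blocking
        # iterations, i.e. for block size B = 2**i; std_errs[0] is SE(0).
        se0 = std_errs[0]
        for (i, se) in enumerate(std_errs):
            B = 2**i
            if B**3 >= 2*ndata*(se/se0)**4:
                return i
        # Insufficient data to estimate the correlation length.
        return float('nan')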

I am grateful to Will Vigor for discussions and the initial implementation.

[Wolff] “Monte Carlo errors with less errors”, U. Wolff, Comput. Phys. Commun. 156, 143 (2004) and arXiv:hep-lat/0306017.
[Lee] “Strategies for improving the efficiency of quantum Monte Carlo calculations”, R.M. Lee, G.J. Conduit, N. Nemec, P. Lopez Rios, N.D. Drummond, Phys. Rev. E 83, 066706 (2011).