amep.statistics.distribution

amep.statistics.distribution(data: ndarray, weights: ndarray | None = None, xmin: float | None = None, xmax: float | None = None, nbins: int | None = 10, density: bool = True, logbins: bool = False) → tuple[ndarray, ndarray]

Calculates the distribution function of the given data from a histogram.

Notes

An optimal number of bins can be estimated using the Freedman–Diaconis rule (see https://en.wikipedia.org/wiki/Freedman–Diaconis_rule and Ref. [1] for further information). This rule is applied if nbins is set to None. Because an error can occur for very large data arrays, it is recommended to set the number of bins manually for such arrays.
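The rule itself can be sketched with plain NumPy. This is only an illustration of the Freedman–Diaconis estimate (bin width h = 2·IQR/n^(1/3)), not AMEP's internal implementation; the helper name `freedman_diaconis_nbins` is hypothetical.

```python
import numpy as np


def freedman_diaconis_nbins(data: np.ndarray) -> int:
    """Estimate a bin count with the Freedman-Diaconis rule.

    Hypothetical helper for illustration; not part of AMEP.
    """
    data = np.asarray(data, dtype=float)
    # Interquartile range of the sample
    q75, q25 = np.percentile(data, [75, 25])
    iqr = q75 - q25
    # Bin width h = 2 * IQR / n^(1/3)
    width = 2.0 * iqr / data.size ** (1.0 / 3.0)
    if width == 0.0:
        # Degenerate case: the central half of the data has no spread
        return 1
    # Number of bins needed to cover the full data range
    return int(np.ceil((data.max() - data.min()) / width))


rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=0.0, scale=1.0, size=1000)
nbins = freedman_diaconis_nbins(sample)
print(nbins)
```

NumPy offers the same estimator through `np.histogram_bin_edges(sample, bins='fd')`, which can serve as a cross-check before passing a fixed `nbins` to the function.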

References

Parameters:

data (np.ndarray of shape (M,)) – Data for which the distribution is calculated.
weights (np.ndarray or None, optional) – Weights applied to the data points. Must have the same shape as data. If density is True, the weights are normalized. The default is None.
xmin (float or None, optional) – Minimum value of the bins. The default is None.
xmax (float or None, optional) – Maximum value of the bins. The default is None.
nbins (int or None, optional) – Number of bins. If None, the Freedman–Diaconis rule [1] is used to estimate an optimal number of bins. Using this rule is only recommended for small data arrays. The default is 10.
density (bool, optional) – If True, the distribution is normalized. If False, a simple histogram is returned. The default is True.
logbins (bool, optional) – If True, the bins are logarithmically spaced. Only possible when nbins is given. The default is False.
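As a rough illustration of what logarithmically spaced bins and a normalized, weighted histogram mean, the following sketch uses only NumPy. It does not reproduce AMEP's implementation; the variable names and the choice of geometric bin centers are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
# Strictly positive sample, as required for logarithmic bins
data = rng.lognormal(mean=0.0, sigma=1.0, size=5000)
# Uniform weights; any array with the same shape as data would work
weights = np.ones_like(data)

# Logarithmically spaced bin edges from the smallest to the largest value
nbins = 20
edges = np.geomspace(data.min(), data.max(), nbins + 1)

# density=True normalizes by the total weight and by each bin width
hist, edges = np.histogram(data, bins=edges, weights=weights, density=True)

# Geometric bin centers (an illustrative choice for log-spaced bins)
centers = np.sqrt(edges[:-1] * edges[1:])

# A normalized distribution integrates to one over the binned range
area = np.sum(hist * np.diff(edges))
print(area)
```

Because the bin widths differ, the normalization has to divide each bin by its own width; summing `hist` alone would not give one.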

Returns:

np.ndarray – Histogram/Distribution function.
np.ndarray – Bins.

Examples

>>> import amep
>>> import numpy as np
>>> ndata = np.random.normal(
...     loc=0.0, scale=1.0, size=100000
... )
>>> a, abins = amep.statistics.distribution(
...     ndata, nbins=None
... )
>>> gfit = amep.functions.NormalizedGaussian()
>>> gfit.fit(abins, a)
>>> print(gfit.results)
{'mu': (0.003195771437101405, 0.002420397928883222),
 'sig': (0.9982728437059646, 0.001951630585125867)}
>>> fig, axs = amep.plot.new()
>>> axs.plot(abins, a, label='histogram', ls='')
>>> axs.plot(
...     abins, gfit.generate(abins),
...     label='Gaussian fit', marker=''
... )
>>> axs.legend()
>>> axs.set_xlabel(r'$x$')
>>> axs.set_ylabel(r'$p(x)$')
>>> fig.savefig('./figures/statistics/statistics-distribution.png')