Mixture

Mixture(name, *args, **kwargs)

Mixture log-likelihood

NormalMixture(name, *args, **kwargs)

Normal mixture log-likelihood

MixtureSameFamily(name, *args, **kwargs)

Mixture Same Family log-likelihood. This distribution handles mixtures of multivariate distributions in a vectorized manner.

class pymc3.distributions.mixture.Mixture(name, *args, **kwargs)

Mixture log-likelihood

Often used to model subpopulation heterogeneity.

\[f(x \mid w, \theta) = \sum_{i = 1}^n w_i f_i(x \mid \theta_i)\]

Support

\(\cup_{i = 1}^n \textrm{support}(f_i)\)

Mean

\(\sum_{i = 1}^n w_i \mu_i\)
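As a rough illustration of the density and mean above, the following sketch evaluates a two-component Poisson mixture in plain Python, without PyMC3. The helper names (`poisson_logpmf`, `mixture_logp`, `mixture_mean`) and the example weights and rates are assumptions made for this example only; they are not part of the library. The log-density is accumulated with the log-sum-exp trick so that small component probabilities do not underflow.

```python
import math


def poisson_logpmf(x, lam):
    """Log-pmf of a Poisson(lam) distribution at a non-negative integer x."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def mixture_logp(x, weights, lams):
    """log f(x) = log sum_i w_i f_i(x), computed with the log-sum-exp trick.

    Assumes every weight is strictly positive (log(0) is not handled here).
    """
    terms = [math.log(w) + poisson_logpmf(x, lam) for w, lam in zip(weights, lams)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def mixture_mean(weights, means):
    """Mixture mean: sum_i w_i * mu_i."""
    return sum(w * m for w, m in zip(weights, means))


weights = [0.3, 0.7]   # illustrative mixture weights, summing to 1
lams = [2.0, 10.0]     # a Poisson component's mean equals its rate

print(mixture_logp(4, weights, lams))
print(mixture_mean(weights, lams))
```

Working in log space mirrors how a log-likelihood is typically evaluated: each component contributes `log w_i + log f_i(x)`, and the sum over components is taken only after subtracting the largest term.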

Parameters
w: array of floats

The mixture weights: 0 <= w <= 1, summing to 1 over the components.

comp_dists: multidimensional PyMC3 distribution (e.g. `pm.Poisson.dist(…)`) or iterable of PyMC3 distributions

The component distributions \(f_1, \ldots, f_n\).

Examples

import numpy as np
import pymc3 as pm

# 2-Mixture Poisson distribution (`data` is assumed to hold the observed values)
with pm.Model() as model:
    lam = pm.Exponential('lam', lam=1, shape=(2,))  # `shape=(2,)` indicates two mixture components.

    # Only the logp is needed here, so call .dist() instead of adding an RV to the model.
    components = pm.Poisson.dist(mu=lam, shape=(2,))

    w = pm.Dirichlet('w', a=np.array([1, 1]))  # two mixture component weights.

    like = pm.Mixture('like', w=w, comp_dists=components, observed=data)

# 2-Mixture Poisson using iterable of distributions.
with pm.Model() as model:
    lam1 = pm.Exponential('lam1', lam=1)
    lam2 = pm.Exponential('lam2', lam=1)

    pois1 = pm.Poisson.dist(mu=lam1)
    pois2 = pm.Poisson.dist(mu=lam2)

    w = pm.Dirichlet('w', a=np.array([1, 1]))

    like = pm.Mixture('like', w=w, comp_dists = [pois1, pois2], observed=data)

# npop-Mixture of multidimensional Gaussian
npop = 5
nd = (3, 4)
with pm.Model() as model:
    mu = pm.Normal('mu', mu=np.arange(npop), sigma=1, shape=npop) # Each component has an independent mean

    w = pm.Dirichlet('w', a=np.ones(npop))

    components = pm.Normal.dist(mu=mu, sigma=1, shape=nd + (npop,))  # nd + (npop,) shaped multidimensional normal

    like = pm.Mixture('like', w=w, comp_dists = components, observed=data, shape=nd)  # The resulting mixture is nd-shaped

# Multidimensional Mixture as stacked independent mixtures
with pm.Model() as model:
    mu = pm.Normal('mu', mu=np.arange(5), sigma=1, shape=5) # Each component has an independent mean

    w = pm.Dirichlet('w', a=np.ones((3, 5)))  # w is a stack of 3 independent 5 component weight arrays

    components = pm.Normal.dist(mu=mu, sigma=1, shape=(3, 5))

    # The mixture is an array of 3 elements.
    # Each can be thought of as an independent scalar mixture of 5 components
    like = pm.Mixture('like', w=w, comp_dists = components, observed=data, shape=3)

infer_comp_dist_shapes(point=None)

Try to infer the shapes of the component distributions, comp_dists, and how they should broadcast together. The behavior differs slightly depending on whether comp_dists is a single Distribution or a list of Distributions. When it is a list, the following procedure is repeated for each element:

1. Look up the element's shape.
2. If it is not empty, use it as comp_dist_shape.
3. If it is an empty tuple, draw a single random sample by calling random(point=point, size=None) on the element, and use the shape of the returned test_sample as the inferred shape.

Parameters
point: None or dict (optional)

Dictionary that maps rv names to values, to supply to self.comp_dists.random

Returns
comp_dist_shapes: shape tuple or list of shape tuples.

If comp_dists is a Distribution, this is the shape tuple of the inferred distribution shape. If comp_dists is a list of Distributions, this is a list of the shape tuples inferred for each element of comp_dists.

broadcast_shape: shape tuple

The shape that results from broadcasting all of the components' shapes together.
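The broadcast_shape return value can be pictured with a small stand-alone sketch. The function below is a hypothetical helper written for this example, not the method's actual implementation; it assumes NumPy-style broadcasting rules (shapes are aligned on their trailing axes and an axis of length 1 stretches to match).

```python
from itertools import zip_longest


def broadcast_shape(*shapes):
    """Broadcast shape tuples together using NumPy-style rules."""
    result = []
    # Align shapes on their trailing axes; missing leading axes count as 1.
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError(f"shapes {shapes} cannot be broadcast together")
        result.append(sizes.pop() if sizes else 1)
    return tuple(reversed(result))


# Illustrative inferred shapes for three components: a scalar, a vector, a column.
comp_dist_shapes = [(), (4,), (3, 1)]
print(broadcast_shape(*comp_dist_shapes))  # (3, 4)
```

A scalar component (empty shape tuple) never constrains the result, which is why an empty shape needs a test sample before it can be treated as known.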

logp(value)

Calculate log-probability of defined Mixture distribution at specified value.

Parameters
value: numeric

Value(s) for which the log-probability is calculated. If log-probabilities for multiple values are desired, the values must be provided in a NumPy array or Theano tensor.

Returns
TensorVariable

random(point=None, size=None)

Draw random values from defined Mixture distribution.

Parameters
point: dict, optional

Dict of variable values on which random values are to be conditioned (uses default point if not specified).

size: int, optional

Desired size of random sample (returns one sample if not specified).

Returns
array
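Conceptually, drawing from a mixture is a two-step process: pick a component according to the weights, then sample from that component. The sketch below shows this with the standard library only. `sample_mixture` and the two Gaussian samplers are hypothetical names chosen for the example, and the code is not the library's random method; it only mirrors the convention that a single sample is returned when size is not given.

```python
import random


def sample_mixture(weights, samplers, size=None, rng=random):
    """Draw from a mixture: pick a component by weight, then sample from it."""
    n = 1 if size is None else size
    # choices() picks component indices with probability proportional to weights.
    picks = rng.choices(range(len(samplers)), weights=weights, k=n)
    draws = [samplers[i](rng) for i in picks]
    return draws[0] if size is None else draws


rng = random.Random(0)  # seeded generator for reproducibility
samplers = [
    lambda r: r.gauss(-5.0, 1.0),  # component 1: Normal(-5, 1)
    lambda r: r.gauss(5.0, 1.0),   # component 2: Normal(5, 1)
]

draws = sample_mixture([0.2, 0.8], samplers, size=1000, rng=rng)
share_upper = sum(d > 0 for d in draws) / len(draws)
print(share_upper)  # should be close to the second weight
```

Because the two components are well separated, the fraction of positive draws gives a quick sanity check on the weights.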

class pymc3.distributions.mixture.MixtureSameFamily(name, *args, **kwargs)

Mixture Same Family log-likelihood. This distribution handles mixtures of multivariate distributions in a vectorized manner. It is preferred over the Mixture distribution when the mixture components do not lie on the last axis of the component distribution.

Support

\(\textrm{support}(f)\)

Mean

\(w\mu\)

Parameters
w: array of floats

The mixture weights: 0 <= w <= 1, summing to 1 over the components.

comp_dists: PyMC3 distribution (e.g. `pm.Multinomial.dist(…)`)

The component distribution, which can be scalar or multidimensional. If its shape is (i_0, …, i_n, mixture_axis, i_n+1, …, i_N), the mixture_axis is consumed and the resulting mixture has shape (i_0, …, i_n, i_n+1, …, i_N).

mixture_axis: int, default = -1

Axis representing the mixture components to be reduced in the mixture.

Notes

The default behaviour resembles that of the Mixture distribution, in which the last axis of the component distribution is reduced.
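The effect of mixture_axis on shapes can be sketched with a few lines of plain Python. `mixture_shape` is a hypothetical helper introduced for this illustration; it only does the shape bookkeeping described above and does not evaluate any distribution.

```python
def mixture_shape(comp_shape, mixture_axis=-1):
    """Shape of the mixture after the mixture axis of comp_shape is consumed."""
    ndim = len(comp_shape)
    if not -ndim <= mixture_axis < ndim:
        raise ValueError(f"mixture_axis {mixture_axis} is out of range for {ndim} axes")
    axis = mixture_axis % ndim  # normalize negative axes
    return comp_shape[:axis] + comp_shape[axis + 1:]


print(mixture_shape((5, 3, 4)))                  # (5, 3): default, last axis is reduced
print(mixture_shape((5, 3, 4), mixture_axis=1))  # (5, 4): middle axis holds the components
```

With the default mixture_axis=-1 the result matches what the last-axis convention of Mixture would give; any other value lets the components sit on an interior axis.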

logp(value)

Calculate log-probability of defined MixtureSameFamily distribution at specified value.

Parameters
value: numeric

Value(s) for which the log-probability is calculated. If log-probabilities for multiple values are desired, the values must be provided in a NumPy array or Theano tensor.

Returns
TensorVariable

random(point=None, size=None)

Draw random values from defined MixtureSameFamily distribution.

Parameters
point: dict, optional

Dict of variable values on which random values are to be conditioned (uses default point if not specified).

size: int, optional

Desired size of random sample (returns one sample if not specified).

Returns
array

class pymc3.distributions.mixture.NormalMixture(name, *args, **kwargs)

Normal mixture log-likelihood

\[f(x \mid w, \mu, \sigma^2) = \sum_{i = 1}^n w_i N(x \mid \mu_i, \sigma^2_i)\]

Support

\(x \in \mathbb{R}\)

Mean

\(\sum_{i = 1}^n w_i \mu_i\)

Variance

\(\sum_{i = 1}^n w_i (\sigma^2_i + \mu_i^2) - \left(\sum_{i = 1}^n w_i \mu_i\right)^2\)
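The moments of a normal mixture can be checked numerically with a short sketch. The variance here is computed from the law of total variance, Var(x) = sum_i w_i (sigma_i^2 + mu_i^2) - (sum_i w_i mu_i)^2, which accounts for the spread between component means as well as the spread within each component. `normal_mixture_moments` is a hypothetical helper for this example, not a PyMC3 function.

```python
def normal_mixture_moments(w, mu, sigma):
    """Return (mean, variance) of a mixture of normals.

    mean     = sum_i w_i * mu_i
    variance = sum_i w_i * (sigma_i**2 + mu_i**2) - mean**2
    """
    mean = sum(wi * mi for wi, mi in zip(w, mu))
    second_moment = sum(wi * (si**2 + mi**2) for wi, mi, si in zip(w, mu, sigma))
    return mean, second_moment - mean**2


# Two equally weighted unit-variance components centred at -1 and +1.
mean, var = normal_mixture_moments([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
print(mean, var)  # 0.0 2.0
```

In this example the variance (2.0) exceeds either component's variance (1.0) because the component means differ; with a single component the formula reduces to sigma squared.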

Parameters
w: array of floats

The mixture weights: 0 <= w <= 1, summing to 1 over the components.

mu: array of floats

the component means

sigma: array of floats

the component standard deviations

tau: array of floats

the component precisions

comp_shape: shape tuple

The shape of the Normal components. Note that it differs from the shape of the mixture distribution: one of its axes is the number of components.

Notes

Pass either sigma or tau, but not both.
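Sigma and tau are two parameterizations of the same quantity, related by tau = 1 / sigma^2, which is why only one of them is needed. The sketch below makes that relationship concrete. `resolve_sigma_tau` is a hypothetical helper written for this example and is not PyMC3's own code; rejecting the case where neither argument is given is a design choice of this sketch, not documented library behavior.

```python
def resolve_sigma_tau(sigma=None, tau=None):
    """Return (sigma, tau) lists given exactly one of them, using tau = 1 / sigma**2."""
    if sigma is not None and tau is not None:
        raise ValueError("pass either sigma or tau, not both")
    if sigma is None and tau is None:
        raise ValueError("one of sigma or tau is required in this sketch")
    if sigma is not None:
        return list(sigma), [1.0 / s**2 for s in sigma]
    return [t ** -0.5 for t in tau], list(tau)


print(resolve_sigma_tau(sigma=[2.0, 1.0]))  # ([2.0, 1.0], [0.25, 1.0])
print(resolve_sigma_tau(tau=[4.0]))         # ([0.5], [4.0])
```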

Examples

import numpy as np
import pymc3 as pm

# `data` is assumed to hold the observed values.
n_components = 3

with pm.Model() as gauss_mix:
    μ = pm.Normal(
        "μ",
        data.mean(),
        10,
        shape=n_components,
        transform=pm.transforms.ordered,
        testval=[1, 2, 3],
    )
    σ = pm.HalfNormal("σ", 10, shape=n_components)
    weights = pm.Dirichlet("w", np.ones(n_components))

    pm.NormalMixture("y", w=weights, mu=μ, sigma=σ, observed=data)