Mixture

`Mixture`(name, *args, **kwargs)	Mixture log-likelihood
`NormalMixture`(name, *args, **kwargs)	Normal mixture log-likelihood
`MixtureSameFamily`(name, *args, **kwargs)	Mixture Same Family log-likelihood. This distribution handles mixtures of multivariate distributions in a vectorized manner.

class pymc3.distributions.mixture.Mixture(name, *args, **kwargs)

Mixture log-likelihood

Often used to model subpopulation heterogeneity.

\[f(x \mid w, \theta) = \sum_{i = 1}^n w_i f_i(x \mid \theta_i)\]

Support	\(\cup_{i = 1}^n \textrm{support}(f_i)\)
Mean	\(\sum_{i = 1}^n w_i \mu_i\)
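
To make the density and mean formulas concrete, the following is a minimal sketch in plain Python (no PyMC3) that evaluates a two-component Poisson mixture by hand. The helper names `poisson_pmf`, `mixture_pmf`, and `mixture_mean`, as well as the weights and rates, are illustrative assumptions and not part of the library.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability mass at the non-negative integer k with rate mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def mixture_pmf(k, weights, rates):
    """Weighted sum of component masses: sum_i w_i * f_i(k | mu_i)."""
    return sum(w * poisson_pmf(k, mu) for w, mu in zip(weights, rates))


def mixture_mean(weights, rates):
    """Mixture mean: sum_i w_i * mu_i (the mean of a Poisson is its rate)."""
    return sum(w * mu for w, mu in zip(weights, rates))


# Illustrative values only: weights must be non-negative and sum to 1.
weights = [0.3, 0.7]
rates = [2.0, 10.0]

print(mixture_pmf(3, weights, rates))   # mass that the mixture assigns to x = 3
print(mixture_mean(weights, rates))     # 0.3 * 2 + 0.7 * 10
```

Because the weights sum to 1, the mixture masses also sum to 1 over the support, which is a quick sanity check for any hand-written mixture density.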

Parameters

w: array of floats: the mixture weights; each weight satisfies 0 <= w <= 1 and the weights sum to 1 across components
comp_dists: multidimensional PyMC3 distribution (e.g. `pm.Poisson.dist(…)`) or iterable of PyMC3 distributions: the component distributions \(f_1, \ldots, f_n\)

Examples

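import numpy as np
import pymc3 as pm

# In every example below, `data` stands for the observed values and must be defined beforehand.

# 2-Mixture Poisson distribution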
with pm.Model() as model:
    lam = pm.Exponential('lam', lam=1, shape=(2,))  # `shape=(2,)` indicates two mixture components.

    # We only need the components' logp, not new RVs in the model, so we call .dist()
    components = pm.Poisson.dist(mu=lam, shape=(2,))

    w = pm.Dirichlet('w', a=np.array([1, 1]))  # two mixture component weights.

    like = pm.Mixture('like', w=w, comp_dists=components, observed=data)

# 2-Mixture Poisson using iterable of distributions.
with pm.Model() as model:
    lam1 = pm.Exponential('lam1', lam=1)
    lam2 = pm.Exponential('lam2', lam=1)

    pois1 = pm.Poisson.dist(mu=lam1)
    pois2 = pm.Poisson.dist(mu=lam2)

    w = pm.Dirichlet('w', a=np.array([1, 1]))

    like = pm.Mixture('like', w=w, comp_dists=[pois1, pois2], observed=data)

# npop-Mixture of multidimensional Gaussian
npop = 5
nd = (3, 4)
with pm.Model() as model:
    mu = pm.Normal('mu', mu=np.arange(npop), sigma=1, shape=npop) # Each component has an independent mean

    w = pm.Dirichlet('w', a=np.ones(npop))

    components = pm.Normal.dist(mu=mu, sigma=1, shape=nd + (npop,))  # nd + (npop,) shaped array of Normal components

    like = pm.Mixture('like', w=w, comp_dists=components, observed=data, shape=nd)  # The resulting mixture is nd-shaped

# Multidimensional Mixture as stacked independent mixtures
with pm.Model() as model:
    mu = pm.Normal('mu', mu=np.arange(5), sigma=1, shape=5) # Each component has an independent mean

    w = pm.Dirichlet('w', a=np.ones((3, 5)))  # w is a stack of 3 independent 5 component weight arrays

    components = pm.Normal.dist(mu=mu, sigma=1, shape=(3, 5))

    # The mixture is an array of 3 elements.
    # Each can be thought of as an independent scalar mixture of 5 components
    like = pm.Mixture('like', w=w, comp_dists=components, observed=data, shape=3)

infer_comp_dist_shapes(point=None)

Try to infer the shapes of the component distributions, comp_dists, and how they should broadcast together. The behavior is slightly different if comp_dists is a `Distribution` as compared to when it is a list of `Distribution`s. When it is a list, the following procedure is repeated for each element in the list:

1. Look up the `comp_dists.shape`.
2. If it is not empty, use it as `comp_dist_shape`.
3. If it is an empty tuple, a single random sample is drawn by calling `comp_dists.random(point=point, size=None)`, and the returned test_sample’s shape is used as the inferred `comp_dists.shape`.

Parameters

point: None or dict (optional): Dictionary that maps rv names to values, to supply to self.comp_dists.random

Returns

comp_dist_shapes: shape tuple or list of shape tuples: If comp_dists is a `Distribution`, it is a shape tuple of the inferred distribution shape. If comp_dists is a list of `Distribution`s, it is a list of shape tuples inferred for each element in `comp_dists`.
broadcast_shape: shape tuple: The shape that results from broadcasting all components’ shapes together.
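
The relationship between the two return values can be illustrated with a small sketch. It assumes NumPy-style broadcasting rules (shapes are right-aligned and a dimension of size 1 stretches); the helper `broadcast_shapes` below is a hypothetical stand-in written for this illustration and is not the routine PyMC3 uses internally.

```python
from itertools import zip_longest


def broadcast_shapes(*shapes):
    """Combine shape tuples with NumPy-style rules: align on the right, 1 stretches."""
    result = []
    # Walk the dimensions from the last axis to the first, padding short shapes with 1.
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError(f"shapes {shapes} cannot be broadcast together")
        result.append(sizes.pop() if sizes else 1)
    return tuple(reversed(result))


# Hypothetical inferred shapes for a list of three component distributions:
# a scalar component, a length-3 component, and a (4, 1) component.
comp_dist_shapes = [(), (3,), (4, 1)]
broadcast_shape = broadcast_shapes(*comp_dist_shapes)
print(broadcast_shape)  # (4, 3)
```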

logp(value)

Calculate log-probability of defined Mixture distribution at specified value.

Parameters

value: numeric: Value(s) for which log-probability is calculated. If the log probabilities for multiple values are desired, the values must be provided in a numpy array or theano tensor.

Returns

TensorVariable
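
A mixture log-probability is commonly evaluated as a log-sum-exp over \(\log w_i + \log f_i(x \mid \theta_i)\), which avoids underflow when the component densities are tiny. The sketch below shows that computation in plain Python for Normal components. It is an illustration of the technique under that assumption, not the library's Theano implementation, and the function names are hypothetical.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log-density of a Normal(mu, sigma) at x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)


def logsumexp(values):
    """Stable log(sum(exp(v))): subtract the maximum before exponentiating."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_logp(x, weights, mus, sigmas):
    """log sum_i w_i * N(x | mu_i, sigma_i); assumes strictly positive weights."""
    terms = [
        math.log(w) + normal_logpdf(x, mu, sigma)
        for w, mu, sigma in zip(weights, mus, sigmas)
    ]
    return logsumexp(terms)


# Illustrative parameters only.
weights, mus, sigmas = [0.4, 0.6], [-1.0, 2.0], [1.0, 0.5]
print(mixture_logp(0.5, weights, mus, sigmas))
# Far in the tail the naive log(sum(w * pdf)) would hit log(0); this stays finite.
print(mixture_logp(200.0, weights, mus, sigmas))
```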

random(point=None, size=None)

Draw random values from defined Mixture distribution.

Parameters

point: dict, optional: Dict of variable values on which random values are to be conditioned (uses default point if not specified).
size: int, optional: Desired size of random sample (returns one sample if not specified).

Returns

array
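
Conceptually, a draw from a mixture can be produced in two stages: first pick a component index with probabilities given by the weights, then draw from that component. The following plain-Python sketch shows this two-stage scheme. It is an assumption-labelled illustration rather than a description of how `random` is implemented in the library, and `sample_mixture` and the component samplers are hypothetical names.

```python
import random


def sample_mixture(weights, samplers, size, seed=0):
    """Two-stage sampling: choose a component by weight, then sample from it."""
    rng = random.Random(seed)
    indices = range(len(weights))
    draws = []
    for _ in range(size):
        idx = rng.choices(indices, weights=weights, k=1)[0]
        draws.append(samplers[idx](rng))
    return draws


# Two well-separated Normal components (illustrative values).
samplers = [
    lambda rng: rng.gauss(-5.0, 1.0),
    lambda rng: rng.gauss(5.0, 1.0),
]
draws = sample_mixture([0.3, 0.7], samplers, size=2000)

# With components this far apart, the share of positive draws approximates w_2.
print(sum(d > 0 for d in draws) / len(draws))
```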

class pymc3.distributions.mixture.MixtureSameFamily(name, *args, **kwargs)

Mixture Same Family log-likelihood. This distribution handles mixtures of multivariate distributions in a vectorized manner. It is used instead of the Mixture distribution when the mixture components are not on the last axis of the component distribution.

Support	\(\textrm{support}(f)\)
Mean	\(w\mu\)

Parameters

w: array of floats: the mixture weights; each weight satisfies 0 <= w <= 1 and the weights sum to 1 across components
comp_dists: PyMC3 distribution (e.g. `pm.Multinomial.dist(…)`): The component distribution, which can be a scalar or multidimensional distribution. If its shape is (i_0, …, i_n, mixture_axis, i_n+1, …, i_N), the mixture_axis is consumed and the resulting mixture has shape (i_0, …, i_n, i_n+1, …, i_N).
mixture_axis: int, default = -1: Axis representing the mixture components to be reduced in the mixture.

Notes

With the default mixture_axis of -1, the behaviour resembles that of the Mixture distribution, in which the last axis of the component distribution is reduced.
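
The effect of mixture_axis on shapes can be sketched with plain tuples. The helper `mixture_shape` below is hypothetical and only mirrors the shape rule stated in the parameter description: the axis that indexes the components is removed, and all other axes are kept in order.

```python
def mixture_shape(comp_shape, mixture_axis=-1):
    """Shape of the mixture after the component axis is consumed."""
    ndim = len(comp_shape)
    if not -ndim <= mixture_axis < ndim:
        raise ValueError(f"mixture_axis {mixture_axis} is out of range for shape {comp_shape}")
    axis = mixture_axis % ndim  # turn a negative axis into its positive position
    return comp_shape[:axis] + comp_shape[axis + 1:]


# 5 components stored on the middle axis of a (3, 5, 4) component distribution.
print(mixture_shape((3, 5, 4), mixture_axis=1))  # (3, 4)

# Default: components on the last axis, as with Mixture.
print(mixture_shape((3, 4, 5)))  # (3, 4)
```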

logp(value)

Calculate log-probability of defined MixtureSameFamily distribution at specified value.

Parameters

value: numeric: Value(s) for which log-probability is calculated. If the log probabilities for multiple values are desired, the values must be provided in a numpy array or theano tensor.

Returns

TensorVariable

random(point=None, size=None)

Draw random values from defined MixtureSameFamily distribution.

Parameters

point: dict, optional: Dict of variable values on which random values are to be conditioned (uses default point if not specified).
size: int, optional: Desired size of random sample (returns one sample if not specified).

Returns

array

class pymc3.distributions.mixture.NormalMixture(name, *args, **kwargs)

Normal mixture log-likelihood

\[f(x \mid w, \mu, \sigma^2) = \sum_{i = 1}^n w_i N(x \mid \mu_i, \sigma^2_i)\]

Support	\(x \in \mathbb{R}\)
Mean	\(\sum_{i = 1}^n w_i \mu_i\)
Variance	\(\sum_{i = 1}^n w_i (\sigma^2_i + \mu_i^2) - \left(\sum_{i = 1}^n w_i \mu_i\right)^2\)
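
The moments of a normal mixture can be checked numerically. The sketch below computes the mean as \(\sum_i w_i \mu_i\) and the variance from the law of total variance, \(\sum_i w_i (\sigma^2_i + \mu_i^2) - (\sum_i w_i \mu_i)^2\), then compares the variance with a Monte Carlo estimate. It uses only the Python standard library; the helper name and parameter values are illustrative assumptions.

```python
import random
import statistics


def normal_mixture_moments(weights, mus, sigmas):
    """Mean and variance of sum_i w_i * N(mu_i, sigma_i^2)."""
    mean = sum(w * m for w, m in zip(weights, mus))
    second_moment = sum(w * (s**2 + m**2) for w, m, s in zip(weights, mus, sigmas))
    return mean, second_moment - mean**2


weights, mus, sigmas = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]
mean, var = normal_mixture_moments(weights, mus, sigmas)
print(mean, var)  # 0.0 5.0

# Monte Carlo check: pick a component per draw, then sample from it.
rng = random.Random(1)
components = rng.choices([0, 1], weights=weights, k=20000)
draws = [rng.gauss(mus[i], sigmas[i]) for i in components]
print(statistics.pvariance(draws))  # close to 5, up to sampling noise
```

Note that the spread between the component means contributes to the variance: here each component has variance 1, yet the mixture variance is 5.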

Parameters

w: array of floats: the mixture weights; each weight satisfies 0 <= w <= 1 and the weights sum to 1 across components
mu: array of floats: the component means
sigma: array of floats: the component standard deviations
tau: array of floats: the component precisions
comp_shape: shape of the Normal component: note that it differs from the shape of the mixture distribution, because it includes one axis whose length is the number of components.

Notes

Pass either sigma or tau, but not both.
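
The two parameterizations are related by \(\tau = 1 / \sigma^2\), so either one determines the other. The following sketch shows the conversion in plain Python. The helper `resolve_sigma_tau` is hypothetical, and as a simplification it requires exactly one of the two arguments, which may differ from how the library treats the case where neither is given.

```python
import math


def resolve_sigma_tau(sigma=None, tau=None):
    """Return (sigma, tau) given exactly one of them, using tau = 1 / sigma**2."""
    if (sigma is None) == (tau is None):
        raise ValueError("pass exactly one of sigma or tau")
    if sigma is not None:
        return sigma, 1.0 / sigma**2
    return 1.0 / math.sqrt(tau), tau


print(resolve_sigma_tau(sigma=2.0))  # (2.0, 0.25)
print(resolve_sigma_tau(tau=4.0))    # (0.5, 4.0)
```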

Examples

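import numpy as np
import pymc3 as pm

# `data` is assumed to be a 1-D array of observed values defined beforehand.
n_components = 3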

with pm.Model() as gauss_mix:
    μ = pm.Normal(
        "μ",
        data.mean(),
        10,
        shape=n_components,
        transform=pm.transforms.ordered,
        testval=[1, 2, 3],
    )
    σ = pm.HalfNormal("σ", 10, shape=n_components)
    weights = pm.Dirichlet("w", np.ones(n_components))

    pm.NormalMixture("y", w=weights, mu=μ, sigma=σ, observed=data)