""" Linear mixed effects models are regression models for dependent data. They can be used to estimate regression relationships involving both means and variances. These models are also known as multilevel linear models, and hierarchical linear models. The MixedLM class fits linear mixed effects models to data, and provides support for some common post-estimation tasks. This is a group-based implementation that is most efficient for models in which the data can be partitioned into independent groups. Some models with crossed effects can be handled by specifying a model with a single group. The data are partitioned into disjoint groups. The probability model for group i is: Y = X*beta + Z*gamma + epsilon where * n_i is the number of observations in group i * Y is a n_i dimensional response vector (called endog in MixedLM) * X is a n_i x k_fe dimensional design matrix for the fixed effects (called exog in MixedLM) * beta is a k_fe-dimensional vector of fixed effects parameters (called fe_params in MixedLM) * Z is a design matrix for the random effects with n_i rows (called exog_re in MixedLM). The number of columns in Z can vary by group as discussed below. * gamma is a random vector with mean 0. The covariance matrix for the first `k_re` elements of `gamma` (called cov_re in MixedLM) is common to all groups. The remaining elements of `gamma` are variance components as discussed in more detail below. Each group receives its own independent realization of gamma. * epsilon is a n_i dimensional vector of iid normal errors with mean 0 and variance sigma^2; the epsilon values are independent both within and between groups Y, X and Z must be entirely observed. beta, Psi, and sigma^2 are estimated using ML or REML estimation, and gamma and epsilon are random so define the probability model. The marginal mean structure is E[Y | X, Z] = X*beta. If only the mean structure is of interest, GEE is an alternative to using linear mixed models. 
Two types of random effects are supported.  Standard random effects
are correlated with each other in arbitrary ways.  Every group has the
same number (`k_re`) of standard random effects, with the same joint
distribution (but with independent realizations across the groups).

Variance components are uncorrelated with each other, and with the
standard random effects.  Each variance component has mean zero, and
all realizations of a given variance component have the same variance
parameter.  The number of realized variance components per variance
parameter can differ across the groups.

The primary reference for the implementation details is:

MJ Lindstrom, DM Bates (1988).  "Newton Raphson and EM algorithms for
linear mixed effects models for repeated measures data".  Journal of
the American Statistical Association. Volume 83, Issue 404, pages
1014-1022.

See also this more recent document:

http://econ.ucsb.edu/~doug/245a/Papers/Mixed%20Effects%20Implement.pdf

All the likelihood, gradient, and Hessian calculations closely follow
Lindstrom and Bates 1988, adapted to support variance components.

The following two documents are written more from the perspective of
users:

http://lme4.r-forge.r-project.org/lMMwR/lrgprt.pdf

http://lme4.r-forge.r-project.org/slides/2009-07-07-Rennes/3Longitudinal-4.pdf

Notation:

* `cov_re` is the random effects covariance matrix (referred to above
  as Psi) and `scale` is the (scalar) error variance.  For a single
  group, the marginal covariance matrix of endog given exog is scale*I
  + Z * cov_re * Z', where Z is the design matrix for the random
  effects in one group.

* `vcomp` is a vector of variance parameters.  The length of `vcomp`
  is determined by the number of keys in either the `exog_vc` argument
  to ``MixedLM``, or the `vc_formula` argument when using formulas to
  fit a model.

Notes:

1. Three different parameterizations are used in different places.
The regression slopes (usually called `fe_params`) are identical in
all three parameterizations, but the variance parameters differ.  The
parameterizations are:

* The "user parameterization" in which cov(endog) = scale*I + Z *
  cov_re * Z', as described above.  This is the main parameterization
  visible to the user.

* The "profile parameterization" in which cov(endog) = I +
  Z * cov_re1 * Z'.  This is the parameterization of the profile
  likelihood that is maximized to produce parameter estimates
  (see Lindstrom and Bates for details).  The "user" cov_re is
  equal to the "profile" cov_re1 times the scale.

* The "square root parameterization" in which we work with the Cholesky
  factor of cov_re1 instead of cov_re directly.  This is hidden from the
  user.

All three parameterizations can be packed into a vector by
(optionally) concatenating `fe_params` together with the lower
triangle or Cholesky square root of the dependence structure, followed
by the variance parameters for the variance components.  These
variance parameters are stored as square roots if (and only if) the
random effects covariance matrix is stored as its Cholesky factor.
Note that when unpacking, it is important to either square or reflect
the dependence structure depending on which parameterization is being
used.

Two score methods are implemented.  One takes the score with respect
to the elements of the random effects covariance matrix (used for
inference once the MLE is reached), and the other takes the score with
respect to the parameters of the Cholesky square root of the random
effects covariance matrix (used for optimization).

The numerical optimization uses GLS to avoid explicitly optimizing
over the fixed effects parameters.  The likelihood that is optimized
is profiled over both the scale parameter (a scalar) and the fixed
effects parameters (if any).  As a result of this profiling, it is
difficult and unnecessary to calculate the Hessian of the profiled log
likelihood function, so that calculation is not implemented here.
Therefore, optimization methods requiring the Hessian matrix such as
the Newton-Raphson algorithm cannot be used for model fitting.
"""
import warnings

import numpy as np
import pandas as pd
import patsy
from scipy import sparse
from scipy.stats.distributions import norm

from statsmodels.base._penalties import Penalty
import statsmodels.base.model as base
from statsmodels.tools import data as data_tools
from statsmodels.tools.decorators import cache_readonly
from statsmodels.tools.sm_exceptions import ConvergenceWarning

_warn_cov_sing = "The random effects covariance matrix is singular."


def _dot(x, y):
    """
    Returns the dot product of the arrays, works for sparse and dense.
    """

    if isinstance(x, np.ndarray) and isinstance(y, np.ndarray):
        return np.dot(x, y)
    elif sparse.issparse(x):
        return x.dot(y)
    elif sparse.issparse(y):
        return y.T.dot(x.T).T


# From numpy, adapted to work with sparse and dense arrays.
def _multi_dot_three(A, B, C):
    """
    Find best ordering for three arrays and do the multiplication.

    Doing it manually instead of using dynamic programming is
    approximately 15 times faster.
    """
    # cost1 = cost((AB)C)
    cost1 = (A.shape[0] * A.shape[1] * B.shape[1] +  # (AB)
             A.shape[0] * B.shape[1] * C.shape[1])   # (--)C
    # cost2 = cost(A(BC))
    cost2 = (B.shape[0] * B.shape[1] * C.shape[1] +  # (BC)
             A.shape[0] * A.shape[1] * C.shape[1])   # A(--)

    if cost1 < cost2:
        return _dot(_dot(A, B), C)
    else:
        return _dot(A, _dot(B, C))


def _dotsum(x, y):
    """
    Returns sum(x * y), where '*' is the pointwise product, computed
    efficiently for dense and sparse matrices.
    """

    if sparse.issparse(x):
        return x.multiply(y).sum()
    else:
        # This way usually avoids allocating a temporary.
        return np.dot(x.ravel(), y.ravel())


class VCSpec:
    """
    Define the variance component structure of a multilevel model.

    An instance of the class contains three attributes:

    - names : names[k] is the name of variance component k.

    - mats : mats[k][i] is the design matrix for group index
      i in variance component k.

    - colnames : colnames[k][i] is the list of column names for
      mats[k][i].

    The groups in colnames and mats must be in sorted order.
    """

    def __init__(self, names, colnames, mats):
        self.names = names
        self.colnames = colnames
        self.mats = mats


def _get_exog_re_names(self, exog_re):
    """
    Passes through if given a list of names. Otherwise, gets pandas names
    or creates some generic variable names as needed.
    """
    if self.k_re == 0:
        return []
    if isinstance(exog_re, pd.DataFrame):
        return exog_re.columns.tolist()
    elif isinstance(exog_re, pd.Series) and exog_re.name is not None:
        return [exog_re.name]
    elif isinstance(exog_re, list):
        return exog_re

    # Default names
    defnames = [f"x_re{k + 1:1d}" for k in range(exog_re.shape[1])]
    return defnames


class MixedLMParams:
    """
    This class represents a parameter state for a mixed linear model.

    Parameters
    ----------
    k_fe : int
        The number of covariates with fixed effects.
    k_re : int
        The number of covariates with random coefficients (excluding
        variance components).
    k_vc : int
        The number of variance component parameters.

    Notes
    -----
    This object represents the parameter state for the model in which
    the scale parameter has been profiled out.
    """

    def __init__(self, k_fe, k_re, k_vc):

        self.k_fe = k_fe
        self.k_re = k_re
        self.k_re2 = k_re * (k_re + 1) // 2
        self.k_vc = k_vc
        self.k_tot = self.k_fe + self.k_re2 + self.k_vc
        self._ix = np.tril_indices(self.k_re)

    def from_packed(params, k_fe, k_re, use_sqrt, has_fe):
        """
        Create a MixedLMParams object from a packed parameter vector.

        Parameters
        ----------
        params : array_like
            The model parameters packed into a single vector.
        k_fe : int
            The number of covariates with fixed effects.
        k_re : int
            The number of covariates with random effects (excluding
            variance components).
        use_sqrt : bool
            If True, the random effects covariance matrix is provided
            as its Cholesky factor, otherwise the lower triangle of
            the covariance matrix is stored.
        has_fe : bool
            If True, `params` contains fixed effects parameters.
            Otherwise, the fixed effects parameters are set to zero.

        Returns
        -------
        A MixedLMParams object.
        """
        k_re2 = int(k_re * (k_re + 1) / 2)

        # The number of covariance parameters.
        if has_fe:
            k_vc = len(params) - k_fe - k_re2
        else:
            k_vc = len(params) - k_re2

        pa = MixedLMParams(k_fe, k_re, k_vc)

        cov_re = np.zeros((k_re, k_re))
        ix = pa._ix
        if has_fe:
            pa.fe_params = params[0:k_fe]
            cov_re[ix] = params[k_fe:k_fe+k_re2]
        else:
            pa.fe_params = np.zeros(k_fe)
            cov_re[ix] = params[0:k_re2]

        if use_sqrt:
            cov_re = np.dot(cov_re, cov_re.T)
        else:
            cov_re = (cov_re + cov_re.T) - np.diag(np.diag(cov_re))

        pa.cov_re = cov_re
        if k_vc > 0:
            if use_sqrt:
                pa.vcomp = params[-k_vc:]**2
            else:
                pa.vcomp = params[-k_vc:]
        else:
            pa.vcomp = np.array([])

        return pa

    from_packed = staticmethod(from_packed)

    def from_components(fe_params=None, cov_re=None, cov_re_sqrt=None,
                        vcomp=None):
        """
        Create a MixedLMParams object from each parameter component.

        Parameters
        ----------
        fe_params : array_like
            The fixed effects parameter (a 1-dimensional array).  If
            None, there are no fixed effects.
        cov_re : array_like
            The random effects covariance matrix (a square, symmetric
            2-dimensional array).
        cov_re_sqrt : array_like
            The Cholesky (lower triangular) square root of the random
            effects covariance matrix.
        vcomp : array_like
            The variance component parameters.  If None, there are no
            variance components.

        Returns
        -------
        A MixedLMParams object.
        """

        if vcomp is None:
            vcomp = np.empty(0)
        if fe_params is None:
            fe_params = np.empty(0)
        if cov_re is None and cov_re_sqrt is None:
            cov_re = np.empty((0, 0))

        k_fe = len(fe_params)
        k_vc = len(vcomp)
        k_re = cov_re.shape[0] if cov_re is not None else cov_re_sqrt.shape[0]

        pa = MixedLMParams(k_fe, k_re, k_vc)
        pa.fe_params = fe_params
        if cov_re_sqrt is not None:
            pa.cov_re = np.dot(cov_re_sqrt, cov_re_sqrt.T)
        elif cov_re is not None:
            pa.cov_re = cov_re

        pa.vcomp = vcomp

        return pa

    from_components = staticmethod(from_components)

    def copy(self):
        """
        Returns a copy of the object.
""" obj = MixedLMParams(self.k_fe, self.k_re, self.k_vc) obj.fe_params = self.fe_params.copy() obj.cov_re = self.cov_re.copy() obj.vcomp = self.vcomp.copy() return obj def get_packed(self, use_sqrt, has_fe=False): """ Return the model parameters packed into a single vector. Parameters ---------- use_sqrt : bool If True, the Cholesky square root of `cov_re` is included in the packed result. Otherwise the lower triangle of `cov_re` is included. has_fe : bool If True, the fixed effects parameters are included in the packed result, otherwise they are omitted. """ if self.k_re > 0: if use_sqrt: try: L = np.linalg.cholesky(self.cov_re) except np.linalg.LinAlgError: L = np.diag(np.sqrt(np.diag(self.cov_re))) cpa = L[self._ix] else: cpa = self.cov_re[self._ix] else: cpa = np.zeros(0) if use_sqrt: vcomp = np.sqrt(self.vcomp) else: vcomp = self.vcomp if has_fe: pa = np.concatenate((self.fe_params, cpa, vcomp)) else: pa = np.concatenate((cpa, vcomp)) return pa def _smw_solver(s, A, AtA, Qi, di): r""" Returns a solver for the linear system: .. math:: (sI + ABA^\prime) y = x The returned function f satisfies f(x) = y as defined above. B and its inverse matrix are block diagonal. The upper left block of :math:`B^{-1}` is Qi and its lower right block is diag(di). Parameters ---------- s : scalar See above for usage A : ndarray p x q matrix, in general q << p, may be sparse. AtA : square ndarray :math:`A^\prime A`, a q x q matrix. Qi : square symmetric ndarray The matrix `B` is q x q, where q = r + d. `B` consists of a r x r diagonal block whose inverse is `Qi`, and a d x d diagonal block, whose inverse is diag(di). di : 1d array_like See documentation for Qi. Returns ------- A function for solving a linear system, as documented above. 
    Notes
    -----
    Uses Sherman-Morrison-Woodbury identity:
        https://en.wikipedia.org/wiki/Woodbury_matrix_identity
    """

    # Use SMW identity
    qmat = AtA / s
    m = Qi.shape[0]
    qmat[0:m, 0:m] += Qi

    if sparse.issparse(A):
        qmat[m:, m:] += sparse.diags(di)

        def solver(rhs):
            ql = A.T.dot(rhs)
            # Based on profiling, the next line can be the
            # majority of the entire run time of fitting the model.
            ql = sparse.linalg.spsolve(qmat, ql)
            if ql.ndim < rhs.ndim:
                # spsolve squeezes nx1 rhs
                ql = ql[:, None]
            ql = A.dot(ql)
            return rhs / s - ql / s**2

    else:
        d = qmat.shape[0]
        qmat.flat[m*(d+1)::d+1] += di
        qmati = np.linalg.solve(qmat, A.T)

        def solver(rhs):
            # A is tall and qmati is wide, so we want
            # A * (qmati * rhs) not (A * qmati) * rhs
            ql = np.dot(qmati, rhs)
            ql = np.dot(A, ql)
            return rhs / s - ql / s**2

    return solver


def _smw_logdet(s, A, AtA, Qi, di, B_logdet):
    r"""
    Returns the log determinant of

    .. math::

        sI + ABA^\prime

    Uses the matrix determinant lemma to accelerate the calculation.
    B is assumed to be positive definite, and s > 0, therefore the
    determinant is positive.

    Parameters
    ----------
    s : positive scalar
        See above for usage
    A : ndarray
        p x q matrix, in general q << p.
    AtA : square ndarray
        :math:`A^\prime A`, a q x q matrix.
    Qi : square symmetric ndarray
        The matrix `B` is q x q, where q = r + d.  `B` consists of a r
        x r diagonal block whose inverse is `Qi`, and a d x d diagonal
        block, whose inverse is diag(di).
    di : 1d array_like
        See documentation for Qi.
    B_logdet : real
        The log determinant of B

    Returns
    -------
    The log determinant of s*I + A*B*A'.

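The determinant lemma used here can be checked on a small dense
problem.  This is only a sketch with made-up values for s, A and B; it
does not call `_smw_logdet` itself.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, s = 6, 2, 1.5
A = rng.standard_normal((p, q))
# Illustrative symmetric positive definite q x q matrix
B = np.array([[2.0, 0.3], [0.3, 1.0]])

# Direct log determinant of the p x p matrix s*I + A B A'
_, ld_direct = np.linalg.slogdet(s * np.eye(p) + A @ B @ A.T)

# Lemma: logdet(B) + p*log(s) + logdet(B^{-1} + A'A / s)
_, B_logdet = np.linalg.slogdet(B)
_, ld1 = np.linalg.slogdet(np.linalg.inv(B) + A.T @ A / s)
ld_lemma = B_logdet + p * np.log(s) + ld1
```

The three terms of `ld_lemma` correspond to the three terms returned
by the function, and only q x q determinants are needed.
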
    Notes
    -----
    Uses the matrix determinant lemma:
        https://en.wikipedia.org/wiki/Matrix_determinant_lemma
    """

    p = A.shape[0]
    ld = p * np.log(s)
    qmat = AtA / s
    m = Qi.shape[0]
    qmat[0:m, 0:m] += Qi

    if sparse.issparse(qmat):
        qmat[m:, m:] += sparse.diags(di)

        # There are faster but much more difficult ways to do this
        # https://stackoverflow.com/questions/19107617
        lu = sparse.linalg.splu(qmat)
        dl = lu.L.diagonal().astype(np.complex128)
        du = lu.U.diagonal().astype(np.complex128)
        ld1 = np.log(dl).sum() + np.log(du).sum()
        ld1 = ld1.real
    else:
        d = qmat.shape[0]
        qmat.flat[m*(d+1)::d+1] += di
        _, ld1 = np.linalg.slogdet(qmat)

    return B_logdet + ld + ld1


def _convert_vc(exog_vc):

    vc_names = []
    vc_colnames = []
    vc_mats = []

    # Get the groups in sorted order
    groups = set()
    for k, v in exog_vc.items():
        groups |= set(v.keys())
    groups = list(groups)
    groups.sort()

    for k, v in exog_vc.items():
        vc_names.append(k)
        colnames, mats = [], []
        for g in groups:
            try:
                colnames.append(v[g].columns)
            except AttributeError:
                colnames.append([str(j) for j in range(v[g].shape[1])])
            mats.append(v[g])
        vc_colnames.append(colnames)
        vc_mats.append(mats)

    ii = np.argsort(vc_names)
    vc_names = [vc_names[i] for i in ii]
    vc_colnames = [vc_colnames[i] for i in ii]
    vc_mats = [vc_mats[i] for i in ii]

    return VCSpec(vc_names, vc_colnames, vc_mats)


class MixedLM(base.LikelihoodModel):
    """
    Linear Mixed Effects Model

    Parameters
    ----------
    endog : 1d array_like
        The dependent variable
    exog : 2d array_like
        A matrix of covariates used to determine the
        mean structure (the "fixed effects" covariates).
    groups : 1d array_like
        A vector of labels determining the groups -- data from
        different groups are independent
    exog_re : 2d array_like
        A matrix of covariates used to determine the variance and
        covariance structure (the "random effects" covariates).  If
        None, defaults to a random intercept for each group.
    exog_vc : VCSpec instance or dict-like (deprecated)
        A VCSpec instance defines the structure of the variance
        components in the model.
        Alternatively, see notes below for a dictionary-based format.
        The dictionary format is deprecated and may be removed at some
        point in the future.
    use_sqrt : bool
        If True, optimization is carried out using the lower
        triangle of the square root of the random effects
        covariance matrix, otherwise it is carried out using the
        lower triangle of the random effects covariance matrix.
    missing : str
        The approach to missing data handling

    Notes
    -----
    If `exog_vc` is not a `VCSpec` instance, then it must be a
    dictionary of dictionaries.  Specifically, `exog_vc[a][g]` is a
    matrix whose columns are linearly combined using independent
    random coefficients.  This random term then contributes to the
    variance structure of the data for group `g`.  The random
    coefficients all have mean zero, and have the same variance.  The
    matrix must be `m x k`, where `m` is the number of observations in
    group `g`.  The number of columns may differ among the top-level
    groups.

    The covariates in `exog`, `exog_re` and `exog_vc` may (but need
    not) partially or wholly overlap.

    `use_sqrt` should almost always be set to True.  The main use case
    for use_sqrt=False is when complicated patterns of fixed values in
    the covariance structure are set (using the `free` argument to
    `fit`) that cannot be expressed in terms of the Cholesky factor L.

    Examples
    --------
    A basic mixed model with fixed effects for the columns of
    ``exog`` and a random intercept for each distinct value of
    ``groups``:

    >>> model = sm.MixedLM(endog, exog, groups)
    >>> result = model.fit()

    A mixed model with fixed effects for the columns of ``exog`` and
    correlated random coefficients for the columns of ``exog_re``:

    >>> model = sm.MixedLM(endog, exog, groups, exog_re=exog_re)
    >>> result = model.fit()

    A mixed model with fixed effects for the columns of ``exog`` and
    independent random coefficients for the columns of ``exog_re``:

    >>> free = MixedLMParams.from_components(
    ...     fe_params=np.ones(exog.shape[1]),
    ...     cov_re=np.eye(exog_re.shape[1]))
    >>> model = sm.MixedLM(endog, exog, groups, exog_re=exog_re)
    >>> result = model.fit(free=free)

    A different way to specify independent random coefficients for the
    columns of ``exog_re``.  In this example ``groups`` must be a
    Pandas Series with compatible indexing with ``exog_re``, and
    ``exog_re`` is a DataFrame with two columns labeled 0 and 1.

    >>> g = groups.groupby(groups).groups
    >>> vc = {}
    >>> vc['1'] = {k : exog_re.loc[g[k], [0]] for k in g}
    >>> vc['2'] = {k : exog_re.loc[g[k], [1]] for k in g}
    >>> model = sm.MixedLM(endog, exog, groups, exog_vc=vc)
    >>> result = model.fit()
    """

    def __init__(self, endog, exog, groups, exog_re=None,
                 exog_vc=None, use_sqrt=True, missing='none',
                 **kwargs):

        _allowed_kwargs = ["missing_idx", "design_info", "formula"]
        for x in kwargs.keys():
            if x not in _allowed_kwargs:
                raise ValueError(
                    "argument %s not permitted for MixedLM initialization" % x)

        self.use_sqrt = use_sqrt

        # Some defaults
        self.reml = True
        self.fe_pen = None
        self.re_pen = None

        if isinstance(exog_vc, dict):
            warnings.warn("Using deprecated variance components format")
            # Convert from old to new representation
            exog_vc = _convert_vc(exog_vc)

        if exog_vc is not None:
            self.k_vc = len(exog_vc.names)
            self.exog_vc = exog_vc
        else:
            self.k_vc = 0
            self.exog_vc = VCSpec([], [], [])

        # If there is one covariate, it may be passed in as a column
        # vector, convert
        # these to 2d arrays.
        # TODO: Can this be moved up in the class hierarchy?
        #       yes, it should be done up the hierarchy
        if (exog is not None and
                data_tools._is_using_ndarray_type(exog, None) and
                exog.ndim == 1):
            exog = exog[:, None]
        if (exog_re is not None and
                data_tools._is_using_ndarray_type(exog_re, None) and
                exog_re.ndim == 1):
            exog_re = exog_re[:, None]

        # Calling super creates self.endog, etc. as ndarrays and the
        # original exog, endog, etc. are self.data.endog, etc.
        super().__init__(endog, exog, groups=groups,
                         exog_re=exog_re, missing=missing,
                         **kwargs)

        self._init_keys.extend(["use_sqrt", "exog_vc"])

        # Number of fixed effects parameters
        self.k_fe = exog.shape[1]

        if exog_re is None and len(self.exog_vc.names) == 0:
            # Default random effects structure (random intercepts).
            self.k_re = 1
            self.k_re2 = 1
            self.exog_re = np.ones((len(endog), 1), dtype=np.float64)
            self.data.exog_re = self.exog_re
            names = ['Group Var']
            self.data.param_names = self.exog_names + names
            self.data.exog_re_names = names
            self.data.exog_re_names_full = names

        elif exog_re is not None:
            # Process exog_re the same way that exog is handled
            # upstream
            # TODO: this is wrong and should be handled upstream wholly
            self.data.exog_re = exog_re
            self.exog_re = np.asarray(exog_re)
            if self.exog_re.ndim == 1:
                self.exog_re = self.exog_re[:, None]
            # Model dimensions
            # Number of random effect covariates
            self.k_re = self.exog_re.shape[1]
            # Number of covariance parameters
            self.k_re2 = self.k_re * (self.k_re + 1) // 2

        else:
            # All random effects are variance components
            self.k_re = 0
            self.k_re2 = 0

        if not self.data._param_names:
            # HACK: could have been set in from_formula already
            # needs refactor
            (param_names, exog_re_names,
             exog_re_names_full) = self._make_param_names(exog_re)
            self.data.param_names = param_names
            self.data.exog_re_names = exog_re_names
            self.data.exog_re_names_full = exog_re_names_full

        self.k_params = self.k_fe + self.k_re2

        # Convert the data to the internal representation, which is a
        # list of arrays,
        # corresponding to the groups.
        group_labels = list(set(groups))
        group_labels.sort()
        row_indices = {s: [] for s in group_labels}
        for i, g in enumerate(groups):
            row_indices[g].append(i)
        self.row_indices = row_indices
        self.group_labels = group_labels
        self.n_groups = len(self.group_labels)

        # Split the data by groups
        self.endog_li = self.group_list(self.endog)
        self.exog_li = self.group_list(self.exog)
        self.exog_re_li = self.group_list(self.exog_re)

        # Precompute this.
        if self.exog_re is None:
            self.exog_re2_li = None
        else:
            self.exog_re2_li = [np.dot(x.T, x) for x in self.exog_re_li]

        # The total number of observations, summed over all groups
        self.nobs = len(self.endog)
        self.n_totobs = self.nobs

        # Set the fixed effects parameter names
        if self.exog_names is None:
            self.exog_names = ["FE%d" % (k + 1) for k in
                               range(self.exog.shape[1])]

        # Precompute this
        self._aex_r = []
        self._aex_r2 = []
        for i in range(self.n_groups):
            a = self._augment_exog(i)
            self._aex_r.append(a)

            ma = _dot(a.T, a)
            self._aex_r2.append(ma)

        # Precompute this
        self._lin, self._quad = self._reparam()

    def _make_param_names(self, exog_re):
        """
        Returns the full parameter names list, just the exogenous random
        effects variables, and the exogenous random effects variables with
        the interaction terms.
        """
        exog_names = list(self.exog_names)
        exog_re_names = _get_exog_re_names(self, exog_re)
        param_names = []

        jj = self.k_fe
        for i in range(len(exog_re_names)):
            for j in range(i + 1):
                if i == j:
                    param_names.append(exog_re_names[i] + " Var")
                else:
                    param_names.append(exog_re_names[j] + " x " +
                                       exog_re_names[i] + " Cov")
                jj += 1

        vc_names = [x + " Var" for x in self.exog_vc.names]

        return exog_names + param_names + vc_names, exog_re_names, param_names

    @classmethod
    def from_formula(cls, formula, data, re_formula=None, vc_formula=None,
                     subset=None, use_sparse=False, missing='none', *args,
                     **kwargs):
        """
        Create a Model from a formula and dataframe.

        Parameters
        ----------
        formula : str or generic Formula object
            The formula specifying the model
        data : array_like
            The data for the model. See Notes.
        re_formula : str
            A one-sided formula defining the variance structure of the
            model.  The default gives a random intercept for each
            group.
        vc_formula : dict-like
            Formulas describing variance components.  `vc_formula[vc]` is
            the formula for the component with variance parameter named
            `vc`.  The formula is processed into a matrix, and the columns
            of this matrix are linearly combined with independent random
            coefficients having mean zero and a common variance.
        subset : array_like
            An array-like object of booleans, integers, or index
            values that indicate the subset of `data` to use in the
            model. Assumes `data` is a `pandas.DataFrame`.
        use_sparse : bool
            If True, the variance component design matrices are stored
            as sparse (CSR) matrices, otherwise as dense arrays.
        missing : str
            Either 'none' or 'drop'
        args : extra arguments
            These are passed to the model
        kwargs : extra keyword arguments
            These are passed to the model with one exception. The
            ``eval_env`` keyword is passed to patsy. It can be either a
            :class:`patsy:patsy.EvalEnvironment` object or an integer
            indicating the depth of the namespace to use. For example, the
            default ``eval_env=0`` uses the calling namespace. If you wish
            to use a "clean" environment set ``eval_env=-1``.

        Returns
        -------
        model : Model instance

        Notes
        -----
        `data` must define __getitem__ with the keys in the formula
        terms, e.g., a numpy structured or rec array, a dictionary, or
        a pandas DataFrame.  `args` and `kwargs` are passed on to the
        model instantiation.

        If the variance component is intended to produce random
        intercepts for disjoint subsets of a group, specified by
        string labels or a categorical data value, always use '0 +' in
        the formula so that no overall intercept is included.

        If the variance components specify random slopes and you do
        not also want a random group-level intercept in the model,
        then use '0 +' in the formula to exclude the intercept.

        The variance components formulas are processed separately for
        each group.
        If a variable is categorical the results will not be
        affected by whether the group labels are distinct or re-used
        over the top-level groups.

        Examples
        --------
        Suppose we have data from an educational study with students
        nested in classrooms nested in schools.  The students take a
        test, and we want to relate the test scores to the students'
        ages, while accounting for the effects of classrooms and
        schools.  The school will be the top-level group, and the
        classroom is a nested group that is specified as a variance
        component.  Note that the schools may have different numbers of
        classrooms, and the classroom labels may (but need not)
        differ across the schools.

        >>> vc = {'classroom': '0 + C(classroom)'}
        >>> MixedLM.from_formula('test_score ~ age', vc_formula=vc, \
                                  re_formula='1', groups='school', data=data)

        Now suppose we also have a previous test score called
        'pretest'.  If we want the relationship between pretest
        scores and the current test to vary by classroom, we can
        specify a random slope for the pretest score

        >>> vc = {'classroom': '0 + C(classroom)', 'pretest': '0 + pretest'}
        >>> MixedLM.from_formula('test_score ~ age + pretest', vc_formula=vc, \
                                  re_formula='1', groups='school', data=data)

        The following model is almost equivalent to the previous one,
        but here the classroom random intercept and pretest slope may
        be correlated.

        >>> vc = {'classroom': '0 + C(classroom)'}
        >>> MixedLM.from_formula('test_score ~ age + pretest', vc_formula=vc, \
                                  re_formula='1 + pretest', groups='school', \
                                  data=data)
        """

        if "groups" not in kwargs.keys():
            raise AttributeError("'groups' is a required keyword argument " +
                                 "in MixedLM.from_formula")
        groups = kwargs["groups"]

        # If `groups` is a variable name, retrieve the data for the
        # groups variable.
        group_name = "Group"
        if isinstance(groups, str):
            group_name = groups
            groups = np.asarray(data[groups])
        else:
            groups = np.asarray(groups)
        del kwargs["groups"]

        # Bypass all upstream missing data handling to properly handle
        # variance components
        if missing == 'drop':
            data, groups = _handle_missing(data, groups, formula, re_formula,
                                           vc_formula)
            missing = 'none'

        if re_formula is not None:
            if re_formula.strip() == "1":
                # Work around Patsy bug, fixed by 0.3.
                exog_re = np.ones((data.shape[0], 1))
                exog_re_names = [group_name]
            else:
                eval_env = kwargs.get('eval_env', None)
                if eval_env is None:
                    eval_env = 1
                elif eval_env == -1:
                    from patsy import EvalEnvironment
                    eval_env = EvalEnvironment({})
                exog_re = patsy.dmatrix(re_formula, data, eval_env=eval_env)
                exog_re_names = exog_re.design_info.column_names
                exog_re_names = [x.replace("Intercept", group_name)
                                 for x in exog_re_names]
                exog_re = np.asarray(exog_re)
            if exog_re.ndim == 1:
                exog_re = exog_re[:, None]
        else:
            exog_re = None
            if vc_formula is None:
                exog_re_names = [group_name]
            else:
                exog_re_names = []

        if vc_formula is not None:
            eval_env = kwargs.get('eval_env', None)
            if eval_env is None:
                eval_env = 1
            elif eval_env == -1:
                from patsy import EvalEnvironment
                eval_env = EvalEnvironment({})

            vc_mats = []
            vc_colnames = []
            vc_names = []
            gb = data.groupby(groups)
            kylist = sorted(gb.groups.keys())
            vcf = sorted(vc_formula.keys())
            for vc_name in vcf:
                md = patsy.ModelDesc.from_formula(vc_formula[vc_name])
                vc_names.append(vc_name)
                evc_mats, evc_colnames = [], []
                for group_ix, group in enumerate(kylist):
                    ii = gb.groups[group]
                    mat = patsy.dmatrix(
                        md,
                        data.loc[ii, :],
                        eval_env=eval_env,
                        return_type='dataframe')
                    evc_colnames.append(mat.columns.tolist())
                    if use_sparse:
                        evc_mats.append(sparse.csr_matrix(mat))
                    else:
                        evc_mats.append(np.asarray(mat))
                vc_mats.append(evc_mats)
                vc_colnames.append(evc_colnames)
            exog_vc = VCSpec(vc_names, vc_colnames, vc_mats)
        else:
            exog_vc = VCSpec([], [], [])

        kwargs["subset"] = None
        kwargs["exog_re"] = exog_re
        kwargs["exog_vc"] = exog_vc
        kwargs["groups"] = groups
        mod = super().from_formula(
            formula, data, *args, **kwargs)

        # expand re names to account for pairs of RE
        (param_names,
         exog_re_names,
         exog_re_names_full) = mod._make_param_names(exog_re_names)

        mod.data.param_names = param_names
        mod.data.exog_re_names = exog_re_names
        mod.data.exog_re_names_full = exog_re_names_full

        if vc_formula is not None:
            mod.data.vcomp_names = mod.exog_vc.names

        return mod

    def predict(self, params, exog=None):
        """
        Return predicted values from a design matrix.

        Parameters
        ----------
        params : array_like
            Parameters of a mixed linear model.  Can be either a
            MixedLMParams instance, or a vector containing the packed
            model parameters in which the fixed effects parameters are
            at the beginning of the vector, or a vector containing
            only the fixed effects parameters.
        exog : array_like, optional
            Design / exogenous data for the fixed effects. Model exog
            is used if None.

        Returns
        -------
        An array of fitted values.  Note that these predicted values
        only reflect the fixed effects mean structure of the model.
        """
        if exog is None:
            exog = self.exog

        if isinstance(params, MixedLMParams):
            params = params.fe_params
        else:
            params = params[0:self.k_fe]

        return np.dot(exog, params)

    def group_list(self, array):
        """
        Returns `array` split into subarrays corresponding to the
        grouping structure.
        """

        if array is None:
            return None

        if array.ndim == 1:
            return [np.array(array[self.row_indices[k]])
                    for k in self.group_labels]
        else:
            return [np.array(array[self.row_indices[k], :])
                    for k in self.group_labels]

    def fit_regularized(self, start_params=None, method='l1', alpha=0,
                        ceps=1e-4, ptol=1e-6, maxit=200, **fit_kwargs):
        """
        Fit a model in which the fixed effects parameters are
        penalized.  The dependence parameters are held fixed at their
        estimated values in the unpenalized model.

        Parameters
        ----------
        method : str or Penalty object
            Method for regularization.  If a string, must be 'l1'.
        alpha : array_like
            Scalar or vector of penalty weights.  If a scalar, the
            same weight is applied to all coefficients; if a vector,
            it contains a weight for each coefficient.  If method is a
            Penalty object, the weights are scaled by alpha.  For L1
            regularization, the weights are used directly.
        ceps : positive real scalar
            Fixed effects parameters smaller than this value
            in magnitude are treated as being zero.
        ptol : positive real scalar
            Convergence occurs when the sup norm difference
            between successive values of `fe_params` is less than
            `ptol`.
        maxit : int
            The maximum number of iterations.
        **fit_kwargs
            Additional keyword arguments passed to fit.

        Returns
        -------
        A MixedLMResults instance containing the results.

        Notes
        -----
        The covariance structure is not updated as the fixed effects
        parameters are varied.

        The algorithm used here for L1 regularization is a "shooting"
        or cyclic coordinate descent algorithm.

        If method is 'l1', then `fe_pen` and `cov_pen` are used to
        obtain the covariance structure, but are ignored during the
        L1-penalized fitting.

        References
        ----------
        Friedman, J. H., Hastie, T. and Tibshirani, R. Regularization
        Paths for Generalized Linear Models via Coordinate
        Descent. Journal of Statistical Software, 33(1) (2010)
        http://www.jstatsoft.org/v33/i01/paper

        http://statweb.stanford.edu/~tibs/stat315a/Supplements/fuse.pdf
        """

        if isinstance(method, str) and (method.lower() != 'l1'):
            raise ValueError("Invalid regularization method")

        # If method is a smooth penalty just optimize directly.
        if isinstance(method, Penalty):
            # Scale the penalty weights by alpha
            method.alpha = alpha
            fit_kwargs.update({"fe_pen": method})
            return self.fit(**fit_kwargs)

        if np.isscalar(alpha):
            alpha = alpha * np.ones(self.k_fe, dtype=np.float64)

        # Fit the unpenalized model to get the dependence structure.
        mdf = self.fit(**fit_kwargs)
        fe_params = mdf.fe_params
        cov_re = mdf.cov_re
        vcomp = mdf.vcomp
        scale = mdf.scale
        try:
            cov_re_inv = np.linalg.inv(cov_re)
        except np.linalg.LinAlgError:
            cov_re_inv = None

        for itr in range(maxit):

            fe_params_s = fe_params.copy()
            for j in range(self.k_fe):

                if abs(fe_params[j]) < ceps:
                    continue

                # The residuals
                fe_params[j] = 0.
                expval = np.dot(self.exog, fe_params)
                resid_all = self.endog - expval

                # The loss function has the form
                # a*x^2 + b*x + pwt*|x|
                a, b = 0., 0.
                for group_ix, group in enumerate(self.group_labels):

                    vc_var = self._expand_vcomp(vcomp, group_ix)

                    exog = self.exog_li[group_ix]
                    ex_r = self._aex_r[group_ix]
                    ex2_r = self._aex_r2[group_ix]

                    resid = resid_all[self.row_indices[group]]
                    solver = _smw_solver(scale, ex_r, ex2_r, cov_re_inv,
                                         1 / vc_var)

                    x = exog[:, j]
                    u = solver(x)
                    a += np.dot(u, x)
                    b -= 2 * np.dot(u, resid)

                pwt1 = alpha[j]
                if b > pwt1:
                    fe_params[j] = -(b - pwt1) / (2 * a)
                elif b < -pwt1:
                    fe_params[j] = -(b + pwt1) / (2 * a)

            if np.abs(fe_params_s - fe_params).max() < ptol:
                break

        # Replace the fixed effects estimates with their penalized
        # values, leave the dependence parameters in their unpenalized
        # state.
        params_prof = mdf.params.copy()
        params_prof[0:self.k_fe] = fe_params

        scale = self.get_scale(fe_params, mdf.cov_re_unscaled, mdf.vcomp)

        # Get the Hessian including only the nonzero fixed effects,
        # then blow back up to the full size after inverting.
        hess, sing = self.hessian(params_prof)
        if sing:
            warnings.warn(_warn_cov_sing)

        pcov = np.nan * np.ones_like(hess)
        ii = np.abs(params_prof) > ceps
        ii[self.k_fe:] = True
        ii = np.flatnonzero(ii)
        hess1 = hess[ii, :][:, ii]
        pcov[np.ix_(ii, ii)] = np.linalg.inv(-hess1)

        params_object = MixedLMParams.from_components(fe_params, cov_re=cov_re)

        results = MixedLMResults(self, params_prof, pcov / scale)
        results.params_object = params_object
        results.fe_params = fe_params
        results.cov_re = cov_re
        results.vcomp = vcomp
        results.scale = scale
        results.cov_re_unscaled = mdf.cov_re_unscaled
        results.method = mdf.method
        results.converged = True
        results.cov_pen = self.cov_pen
        results.k_fe = self.k_fe
        results.k_re = self.k_re
        results.k_re2 = self.k_re2
        results.k_vc = self.k_vc

        return MixedLMResultsWrapper(results)

    def get_fe_params(self, cov_re, vcomp, tol=1e-10):
        """
        Use GLS to update the fixed effects parameter estimates.

        Parameters
        ----------
        cov_re : array_like (2d)
            The covariance matrix of the random effects.
        vcomp : array_like (1d)
            The variance components.
        tol : float
            A tolerance parameter to determine when covariances
            are singular.

        Returns
        -------
        params : ndarray
            The GLS estimates of the fixed effects parameters.
        singular : bool
            True if the covariance is singular
        """

        if self.k_fe == 0:
            return np.array([]), False

        sing = False

        if self.k_re == 0:
            cov_re_inv = np.empty((0, 0))
        else:
            w, v = np.linalg.eigh(cov_re)
            if w.min() < tol:
                # Singular, use pseudo-inverse
                sing = True
                ii = np.flatnonzero(w >= tol)
                if len(ii) == 0:
                    cov_re_inv = np.zeros_like(cov_re)
                else:
                    vi = v[:, ii]
                    wi = w[ii]
                    cov_re_inv = np.dot(vi / wi, vi.T)
            else:
                cov_re_inv = np.linalg.inv(cov_re)

        # Cache these quantities that do not change.
        if not hasattr(self, "_endex_li"):
            self._endex_li = []
            for group_ix, _ in enumerate(self.group_labels):
                mat = np.concatenate(
                    (self.exog_li[group_ix],
                     self.endog_li[group_ix][:, None]), axis=1)
                self._endex_li.append(mat)

        xtxy = 0.
        for group_ix, group in enumerate(self.group_labels):
            vc_var = self._expand_vcomp(vcomp, group_ix)
            if vc_var.size > 0:
                if vc_var.min() < tol:
                    # Pseudo-inverse
                    sing = True
                    ii = np.flatnonzero(vc_var >= tol)
                    vc_vari = np.zeros_like(vc_var)
                    vc_vari[ii] = 1 / vc_var[ii]
                else:
                    vc_vari = 1 / vc_var
            else:
                vc_vari = np.empty(0)
            exog = self.exog_li[group_ix]
            ex_r, ex2_r = self._aex_r[group_ix], self._aex_r2[group_ix]
            solver = _smw_solver(1., ex_r, ex2_r, cov_re_inv, vc_vari)
            u = solver(self._endex_li[group_ix])
            xtxy += np.dot(exog.T, u)

        if sing:
            fe_params = np.dot(np.linalg.pinv(xtxy[:, 0:-1]), xtxy[:, -1])
        else:
            fe_params = np.linalg.solve(xtxy[:, 0:-1], xtxy[:, -1])

        return fe_params, sing

    def _reparam(self):
        """
        Returns parameters of the map converting parameters from the
        form used in optimization to the form returned to the user.

        Returns
        -------
        lin : list-like
            Linear terms of the map
        quad : list-like
            Quadratic terms of the map

        Notes
        -----
        If P are the standard form parameters and R are the
        transformed parameters (i.e. with the Cholesky square root
        covariance and square root transformed variance components),
        then P[i] = lin[i] * R + R' * quad[i] * R
        """

        k_fe, k_re, k_re2, k_vc = self.k_fe, self.k_re, self.k_re2, self.k_vc
        k_tot = k_fe + k_re2 + k_vc
        ix = np.tril_indices(self.k_re)

        lin = []
        for k in range(k_fe):
            e = np.zeros(k_tot)
            e[k] = 1
            lin.append(e)
        for k in range(k_re2):
            lin.append(np.zeros(k_tot))
        for k in range(k_vc):
            lin.append(np.zeros(k_tot))

        quad = []
        # Quadratic terms for fixed effects.
        for k in range(k_tot):
            quad.append(np.zeros((k_tot, k_tot)))

        # Quadratic terms for random effects covariance.
        ii = np.tril_indices(k_re)
        ix = [(a, b) for a, b in zip(ii[0], ii[1])]
        for i1 in range(k_re2):
            for i2 in range(k_re2):
                ix1 = ix[i1]
                ix2 = ix[i2]
                if (ix1[1] == ix2[1]) and (ix1[0] <= ix2[0]):
                    ii = (ix2[0], ix1[0])
                    k = ix.index(ii)
                    quad[k_fe+k][k_fe+i2, k_fe+i1] += 1
        for k in range(k_tot):
            quad[k] = 0.5*(quad[k] + quad[k].T)

        # Quadratic terms for variance components.
        km = k_fe + k_re2
        for k in range(km, km+k_vc):
            quad[k][k, k] = 1

        return lin, quad

    def _expand_vcomp(self, vcomp, group_ix):
        """
        Replicate variance parameters to match a group's design.

        Parameters
        ----------
        vcomp : array_like
            The variance parameters for the variance components.
        group_ix : int
            The group index

        Returns
        -------
        An expanded version of vcomp, in which each variance
        parameter is copied as many times as there are independent
        realizations of the variance component in the given group.
        """
        if len(vcomp) == 0:
            return np.empty(0)
        vc_var = []
        for j in range(len(self.exog_vc.names)):
            d = self.exog_vc.mats[j][group_ix].shape[1]
            vc_var.append(vcomp[j] * np.ones(d))
        if len(vc_var) > 0:
            return np.concatenate(vc_var)
        else:
            # Cannot reach here?
            return np.empty(0)

    def _augment_exog(self, group_ix):
        """
        Concatenate the columns for variance components to the columns
        for other random effects to obtain a single random effects
        exog matrix for a given group.
        """
        ex_r = self.exog_re_li[group_ix] if self.k_re > 0 else None
        if self.k_vc == 0:
            return ex_r

        ex = [ex_r] if self.k_re > 0 else []
        any_sparse = False
        for j, _ in enumerate(self.exog_vc.names):
            ex.append(self.exog_vc.mats[j][group_ix])
            any_sparse |= sparse.issparse(ex[-1])
        if any_sparse:
            for j, x in enumerate(ex):
                if not sparse.issparse(x):
                    ex[j] = sparse.csr_matrix(x)
            ex = sparse.hstack(ex)
            ex = sparse.csr_matrix(ex)
        else:
            ex = np.concatenate(ex, axis=1)

        return ex

    def loglike(self, params, profile_fe=True):
        """
        Evaluate the (profile) log-likelihood of the linear mixed
        effects model.

        Parameters
        ----------
        params : MixedLMParams, or array_like.
            The parameter value.  If array-like, must be a packed
            parameter vector containing only the covariance
            parameters.
        profile_fe : bool
            If True, replace the provided value of `fe_params` with
            the GLS estimates.

        Returns
        -------
        The log-likelihood value at `params`.

        Notes
        -----
        The scale parameter `scale` is always profiled out of the
        log-likelihood.
        In addition, if `profile_fe` is true the fixed effects
        parameters are also profiled out.
        """

        if type(params) is not MixedLMParams:
            params = MixedLMParams.from_packed(params, self.k_fe,
                                               self.k_re, self.use_sqrt,
                                               has_fe=False)

        cov_re = params.cov_re
        vcomp = params.vcomp

        # Move to the profile set
        if profile_fe:
            fe_params, sing = self.get_fe_params(cov_re, vcomp)
            if sing:
                self._cov_sing += 1
        else:
            fe_params = params.fe_params

        if self.k_re > 0:
            try:
                cov_re_inv = np.linalg.inv(cov_re)
            except np.linalg.LinAlgError:
                cov_re_inv = np.linalg.pinv(cov_re)
                self._cov_sing += 1
            _, cov_re_logdet = np.linalg.slogdet(cov_re)
        else:
            cov_re_inv = np.zeros((0, 0))
            cov_re_logdet = 0

        # The residuals
        expval = np.dot(self.exog, fe_params)
        resid_all = self.endog - expval

        likeval = 0.

        # Handle the covariance penalty
        if (self.cov_pen is not None) and (self.k_re > 0):
            likeval -= self.cov_pen.func(cov_re, cov_re_inv)

        # Handle the fixed effects penalty
        if (self.fe_pen is not None):
            likeval -= self.fe_pen.func(fe_params)

        xvx, qf = 0., 0.
        for group_ix, group in enumerate(self.group_labels):

            vc_var = self._expand_vcomp(vcomp, group_ix)
            cov_aug_logdet = cov_re_logdet + np.sum(np.log(vc_var))

            exog = self.exog_li[group_ix]
            ex_r, ex2_r = self._aex_r[group_ix], self._aex_r2[group_ix]
            solver = _smw_solver(1., ex_r, ex2_r, cov_re_inv, 1 / vc_var)

            resid = resid_all[self.row_indices[group]]

            # Part 1 of the log likelihood (for both ML and REML)
            ld = _smw_logdet(1., ex_r, ex2_r, cov_re_inv, 1 / vc_var,
                             cov_aug_logdet)
            likeval -= ld / 2.

            # Part 2 of the log likelihood (for both ML and REML)
            u = solver(resid)
            qf += np.dot(resid, u)

            # Adjustment for REML
            if self.reml:
                mat = solver(exog)
                xvx += np.dot(exog.T, mat)

        if self.reml:
            likeval -= (self.n_totobs - self.k_fe) * np.log(qf) / 2.
            _, ld = np.linalg.slogdet(xvx)
            likeval -= ld / 2.
            likeval -= (self.n_totobs - self.k_fe) * np.log(2 * np.pi) / 2.
            likeval += ((self.n_totobs - self.k_fe) *
                        np.log(self.n_totobs - self.k_fe) / 2.)
            likeval -= (self.n_totobs - self.k_fe) / 2.
        else:
            likeval -= self.n_totobs * np.log(qf) / 2.
            likeval -= self.n_totobs * np.log(2 * np.pi) / 2.
            likeval += self.n_totobs * np.log(self.n_totobs) / 2.
            likeval -= self.n_totobs / 2.

        return likeval

    def _gen_dV_dPar(self, ex_r, solver, group_ix, max_ix=None):
        """
        A generator that yields the element-wise derivative of the
        marginal covariance matrix with respect to the random effects
        variance and covariance parameters.

        Parameters
        ----------
        ex_r : array_like
            The random effects design matrix
        solver : function
            A function that given x returns V^{-1}x, where V
            is the group's marginal covariance matrix.
        group_ix : int
            The group index
        max_ix : {int, None}
            If not None, the generator ends when this index
            is reached.
        """

        axr = solver(ex_r)

        # Regular random effects
        jj = 0
        for j1 in range(self.k_re):
            for j2 in range(j1 + 1):
                if max_ix is not None and jj > max_ix:
                    return
                # Need 2d
                mat_l, mat_r = ex_r[:, j1:j1+1], ex_r[:, j2:j2+1]
                vsl, vsr = axr[:, j1:j1+1], axr[:, j2:j2+1]
                yield jj, mat_l, mat_r, vsl, vsr, j1 == j2
                jj += 1

        # Variance components
        for j, _ in enumerate(self.exog_vc.names):
            if max_ix is not None and jj > max_ix:
                return
            mat = self.exog_vc.mats[j][group_ix]
            axmat = solver(mat)
            yield jj, mat, mat, axmat, axmat, True
            jj += 1

    def score(self, params, profile_fe=True):
        """
        Returns the score vector of the profile log-likelihood.

        Notes
        -----
        The score vector that is returned is computed with respect to
        the parameterization defined by this model instance's
        `use_sqrt` attribute.
        """

        if type(params) is not MixedLMParams:
            params = MixedLMParams.from_packed(
                params, self.k_fe, self.k_re, self.use_sqrt,
                has_fe=False)

        if profile_fe:
            params.fe_params, sing = \
                self.get_fe_params(params.cov_re, params.vcomp)

            if sing:
                msg = "Random effects covariance is singular"
                warnings.warn(msg)

        if self.use_sqrt:
            score_fe, score_re, score_vc = self.score_sqrt(
                params, calc_fe=not profile_fe)
        else:
            score_fe, score_re, score_vc = self.score_full(
                params, calc_fe=not profile_fe)

        if self._freepat is not None:
            score_fe *= self._freepat.fe_params
            score_re *= self._freepat.cov_re[self._freepat._ix]
            score_vc *= self._freepat.vcomp

        if profile_fe:
            return np.concatenate((score_re, score_vc))
        else:
            return np.concatenate((score_fe, score_re, score_vc))

    def score_full(self, params, calc_fe):
        """
        Returns the score with respect to untransformed parameters.

        Calculates the score vector for the profiled log-likelihood of
        the mixed effects model with respect to the parameterization
        in which the random effects covariance matrix is represented
        in its full form (not using the Cholesky factor).

        Parameters
        ----------
        params : MixedLMParams or array_like
            The parameter at which the score function is evaluated.
            If array-like, must contain the packed random effects
            parameters (cov_re and vcomp) without fe_params.
        calc_fe : bool
            If True, calculate the score vector for the fixed effects
            parameters.  If False, this vector is not calculated, and
            a vector of zeros is returned in its place.

        Returns
        -------
        score_fe : array_like
            The score vector with respect to the fixed effects
            parameters.
        score_re : array_like
            The score vector with respect to the random effects
            parameters (excluding variance components parameters).
        score_vc : array_like
            The score vector with respect to variance components
            parameters.

        Notes
        -----
        `score_re` is taken with respect to the parameterization in
        which `cov_re` is represented through its lower triangle
        (without taking the Cholesky square root).
        """

        fe_params = params.fe_params
        cov_re = params.cov_re
        vcomp = params.vcomp

        try:
            cov_re_inv = np.linalg.inv(cov_re)
        except np.linalg.LinAlgError:
            cov_re_inv = np.linalg.pinv(cov_re)
            self._cov_sing += 1

        score_fe = np.zeros(self.k_fe)
        score_re = np.zeros(self.k_re2)
        score_vc = np.zeros(self.k_vc)

        # Handle the covariance penalty.
        if self.cov_pen is not None:
            score_re -= self.cov_pen.deriv(cov_re, cov_re_inv)

        # Handle the fixed effects penalty.
        if calc_fe and (self.fe_pen is not None):
            score_fe -= self.fe_pen.deriv(fe_params)

        # resid' V^{-1} resid, summed over the groups (a scalar)
        rvir = 0.

        # exog' V^{-1} resid, summed over the groups (a k_fe
        # dimensional vector)
        xtvir = 0.

        # exog' V^{-1} exog, summed over the groups (a k_fe x k_fe
        # matrix)
        xtvix = 0.

        # exog' V^{-1} dV/dQ_jj V^{-1} exog, where Q_jj is the jj^th
        # covariance parameter.
        xtax = [0., ] * (self.k_re2 + self.k_vc)

        # Temporary related to the gradient of log |V|
        dlv = np.zeros(self.k_re2 + self.k_vc)

        # resid' V^{-1} dV/dQ_jj V^{-1} resid (a scalar)
        rvavr = np.zeros(self.k_re2 + self.k_vc)

        for group_ix, group in enumerate(self.group_labels):

            vc_var = self._expand_vcomp(vcomp, group_ix)

            exog = self.exog_li[group_ix]
            ex_r, ex2_r = self._aex_r[group_ix], self._aex_r2[group_ix]
            solver = _smw_solver(1., ex_r, ex2_r, cov_re_inv, 1 / vc_var)

            # The residuals
            resid = self.endog_li[group_ix]
            if self.k_fe > 0:
                expval = np.dot(exog, fe_params)
                resid = resid - expval

            if self.reml:
                viexog = solver(exog)
                xtvix += np.dot(exog.T, viexog)

            # Contributions to the covariance parameter gradient
            vir = solver(resid)
            for (jj, matl, matr, vsl, vsr, sym) in\
                    self._gen_dV_dPar(ex_r, solver, group_ix):
                dlv[jj] = _dotsum(matr, vsl)
                if not sym:
                    dlv[jj] += _dotsum(matl, vsr)

                ul = _dot(vir, matl)
                ur = ul.T if sym else _dot(matr.T, vir)
                ulr = np.dot(ul, ur)
                rvavr[jj] += ulr
                if not sym:
                    rvavr[jj] += ulr.T

                if self.reml:
                    ul = _dot(viexog.T, matl)
                    ur = ul.T if sym else _dot(matr.T, viexog)
                    ulr = np.dot(ul, ur)
                    xtax[jj] += ulr
                    if not sym:
                        xtax[jj] += ulr.T

            # Contribution of log|V| to the covariance parameter
            # gradient.
            if self.k_re > 0:
                score_re -= 0.5 * dlv[0:self.k_re2]
            if self.k_vc > 0:
                score_vc -= 0.5 * dlv[self.k_re2:]

            rvir += np.dot(resid, vir)

            if calc_fe:
                xtvir += np.dot(exog.T, vir)

        fac = self.n_totobs
        if self.reml:
            fac -= self.k_fe

        if calc_fe and self.k_fe > 0:
            score_fe += fac * xtvir / rvir

        if self.k_re > 0:
            score_re += 0.5 * fac * rvavr[0:self.k_re2] / rvir
        if self.k_vc > 0:
            score_vc += 0.5 * fac * rvavr[self.k_re2:] / rvir

        if self.reml:
            xtvixi = np.linalg.inv(xtvix)
            for j in range(self.k_re2):
                score_re[j] += 0.5 * _dotsum(xtvixi.T, xtax[j])
            for j in range(self.k_vc):
                score_vc[j] += 0.5 * _dotsum(xtvixi.T, xtax[self.k_re2 + j])

        return score_fe, score_re, score_vc

    def score_sqrt(self, params, calc_fe=True):
        """
        Returns the score with respect to transformed parameters.

        Calculates the score vector with respect to the
        parameterization in which the random effects covariance matrix
        is represented through its Cholesky square root.

        Parameters
        ----------
        params : MixedLMParams or array_like
            The model parameters.  If array-like must contain packed
            parameters that are compatible with this model instance.
        calc_fe : bool
            If True, calculate the score vector for the fixed effects
            parameters.  If False, this vector is not calculated, and
            a vector of zeros is returned in its place.

        Returns
        -------
        score_fe : array_like
            The score vector with respect to the fixed effects
            parameters.
        score_re : array_like
            The score vector with respect to the random effects
            parameters (excluding variance components parameters).
        score_vc : array_like
            The score vector with respect to variance components
            parameters.
        """

        score_fe, score_re, score_vc = self.score_full(params, calc_fe=calc_fe)
        params_vec = params.get_packed(use_sqrt=True, has_fe=True)

        score_full = np.concatenate((score_fe, score_re, score_vc))
        scr = 0.
        for i in range(len(params_vec)):
            v = self._lin[i] + 2 * np.dot(self._quad[i], params_vec)
            scr += score_full[i] * v
        score_fe = scr[0:self.k_fe]
        score_re = scr[self.k_fe:self.k_fe + self.k_re2]
        score_vc = scr[self.k_fe + self.k_re2:]

        return score_fe, score_re, score_vc

    def hessian(self, params):
        """
        Returns the model's Hessian matrix.

        Calculates the Hessian matrix for the linear mixed effects
        model with respect to the parameterization in which the
        covariance matrix is represented directly (without
        square-root transformation).

        Parameters
        ----------
        params : MixedLMParams or array_like
            The model parameters at which the Hessian is calculated.
            If array-like, must contain the packed parameters in a
            form that is compatible with this model instance.

        Returns
        -------
        hess : 2d ndarray
            The Hessian matrix, evaluated at `params`.
        sing : boolean
            If True, the covariance matrix is singular and a
            pseudo-inverse is returned.
        """

        if type(params) is not MixedLMParams:
            params = MixedLMParams.from_packed(params, self.k_fe, self.k_re,
                                               use_sqrt=self.use_sqrt,
                                               has_fe=True)

        fe_params = params.fe_params
        vcomp = params.vcomp
        cov_re = params.cov_re
        sing = False

        if self.k_re > 0:
            try:
                cov_re_inv = np.linalg.inv(cov_re)
            except np.linalg.LinAlgError:
                cov_re_inv = np.linalg.pinv(cov_re)
                sing = True
        else:
            cov_re_inv = np.empty((0, 0))

        # Blocks for the fixed and random effects parameters.
        hess_fe = 0.
        hess_re = np.zeros((self.k_re2 + self.k_vc, self.k_re2 + self.k_vc))
        hess_fere = np.zeros((self.k_re2 + self.k_vc, self.k_fe))

        fac = self.n_totobs
        if self.reml:
            fac -= self.exog.shape[1]

        rvir = 0.
        xtvix = 0.
        xtax = [0., ] * (self.k_re2 + self.k_vc)
        m = self.k_re2 + self.k_vc
        B = np.zeros(m)
        D = np.zeros((m, m))
        F = [[0.] * m for k in range(m)]
        for group_ix, group in enumerate(self.group_labels):

            vc_var = self._expand_vcomp(vcomp, group_ix)
            vc_vari = np.zeros_like(vc_var)
            ii = np.flatnonzero(vc_var >= 1e-10)
            if len(ii) > 0:
                vc_vari[ii] = 1 / vc_var[ii]
            if len(ii) < len(vc_var):
                sing = True

            exog = self.exog_li[group_ix]
            ex_r, ex2_r = self._aex_r[group_ix], self._aex_r2[group_ix]
            solver = _smw_solver(1., ex_r, ex2_r, cov_re_inv, vc_vari)

            # The residuals
            resid = self.endog_li[group_ix]
            if self.k_fe > 0:
                expval = np.dot(exog, fe_params)
                resid = resid - expval

            viexog = solver(exog)
            xtvix += np.dot(exog.T, viexog)
            vir = solver(resid)
            rvir += np.dot(resid, vir)

            for (jj1, matl1, matr1, vsl1, vsr1, sym1) in\
                    self._gen_dV_dPar(ex_r, solver, group_ix):

                ul = _dot(viexog.T, matl1)
                ur = _dot(matr1.T, vir)
                hess_fere[jj1, :] += np.dot(ul, ur)
                if not sym1:
                    ul = _dot(viexog.T, matr1)
                    ur = _dot(matl1.T, vir)
                    hess_fere[jj1, :] += np.dot(ul, ur)

                if self.reml:
                    ul = _dot(viexog.T, matl1)
                    ur = ul if sym1 else np.dot(viexog.T, matr1)
                    ulr = _dot(ul, ur.T)
                    xtax[jj1] += ulr
                    if not sym1:
                        xtax[jj1] += ulr.T

                ul = _dot(vir, matl1)
                ur = ul if sym1 else _dot(vir, matr1)
                B[jj1] += np.dot(ul, ur) * (1 if sym1 else 2)

                # V^{-1} * dV/d_theta
                E = [(vsl1, matr1)]
                if not sym1:
                    E.append((vsr1, matl1))

                for (jj2, matl2, matr2, vsl2, vsr2, sym2) in\
                        self._gen_dV_dPar(ex_r, solver, group_ix, jj1):

                    re = sum([_multi_dot_three(matr2.T, x[0], x[1].T)
                              for x in E])
                    vt = 2 * _dot(_multi_dot_three(vir[None, :], matl2, re),
                                  vir[:, None])

                    if not sym2:
                        le = sum([_multi_dot_three(matl2.T, x[0], x[1].T)
                                  for x in E])
                        vt += 2 * _dot(_multi_dot_three(
                            vir[None, :], matr2, le), vir[:, None])

                    D[jj1, jj2] += np.squeeze(vt)
                    if jj1 != jj2:
                        D[jj2, jj1] += np.squeeze(vt)

                    rt = _dotsum(vsl2, re.T) / 2
                    if not sym2:
                        rt += _dotsum(vsr2, le.T) / 2

                    hess_re[jj1, jj2] += rt
                    if jj1 != jj2:
                        hess_re[jj2, jj1] += rt

                    if self.reml:
                        ev = sum([_dot(x[0], _dot(x[1].T, viexog)) for x in E])
                        u1 = _dot(viexog.T, matl2)
                        u2 = _dot(matr2.T, ev)
                        um = np.dot(u1, u2)
                        F[jj1][jj2] += um + um.T
                        if not sym2:
                            u1 = np.dot(viexog.T, matr2)
                            u2 = np.dot(matl2.T, ev)
                            um = np.dot(u1, u2)
                            F[jj1][jj2] += um + um.T

        hess_fe -= fac * xtvix / rvir
        hess_re = hess_re - 0.5 * fac * (D/rvir - np.outer(B, B) / rvir**2)
        hess_fere = -fac * hess_fere / rvir

        if self.reml:
            QL = [np.linalg.solve(xtvix, x) for x in xtax]
            for j1 in range(self.k_re2 + self.k_vc):
                for j2 in range(j1 + 1):
                    a = _dotsum(QL[j1].T, QL[j2])
                    a -= np.trace(np.linalg.solve(xtvix, F[j1][j2]))
                    a *= 0.5
                    hess_re[j1, j2] += a
                    if j1 > j2:
                        hess_re[j2, j1] += a

        # Put the blocks together to get the Hessian.
        m = self.k_fe + self.k_re2 + self.k_vc
        hess = np.zeros((m, m))
        hess[0:self.k_fe, 0:self.k_fe] = hess_fe
        hess[0:self.k_fe, self.k_fe:] = hess_fere.T
        hess[self.k_fe:, 0:self.k_fe] = hess_fere
        hess[self.k_fe:, self.k_fe:] = hess_re

        return hess, sing

    def get_scale(self, fe_params, cov_re, vcomp):
        """
        Returns the estimated error variance based on given estimates
        of the slopes and random effects covariance matrix.

        Parameters
        ----------
        fe_params : array_like
            The regression slope estimates
        cov_re : 2d array_like
            Estimate of the random effects covariance matrix
        vcomp : array_like
            Estimate of the variance components

        Returns
        -------
        scale : float
            The estimated error variance.
        """

        try:
            cov_re_inv = np.linalg.inv(cov_re)
        except np.linalg.LinAlgError:
            cov_re_inv = np.linalg.pinv(cov_re)
            warnings.warn(_warn_cov_sing)

        qf = 0.
        for group_ix, group in enumerate(self.group_labels):

            vc_var = self._expand_vcomp(vcomp, group_ix)

            exog = self.exog_li[group_ix]
            ex_r, ex2_r = self._aex_r[group_ix], self._aex_r2[group_ix]

            solver = _smw_solver(1., ex_r, ex2_r, cov_re_inv, 1 / vc_var)

            # The residuals
            resid = self.endog_li[group_ix]
            if self.k_fe > 0:
                expval = np.dot(exog, fe_params)
                resid = resid - expval

            mat = solver(resid)
            qf += np.dot(resid, mat)

        if self.reml:
            qf /= (self.n_totobs - self.k_fe)
        else:
            qf /= self.n_totobs

        return qf

    def fit(self, start_params=None, reml=True, niter_sa=0,
            do_cg=True, fe_pen=None, cov_pen=None, free=None,
            full_output=False, method=None, **fit_kwargs):
        """
        Fit a linear mixed model to the data.

        Parameters
        ----------
        start_params : array_like or MixedLMParams
            Starting values for the profile log-likelihood.  If not a
            `MixedLMParams` instance, this should be an array
            containing the packed parameters for the profile
            log-likelihood, including the fixed effects
            parameters.
        reml : bool
            If true, fit according to the REML likelihood, else
            fit the standard likelihood using ML.
        niter_sa : int
            Currently this argument is ignored and has no effect
            on the results.
        cov_pen : CovariancePenalty object
            A penalty for the random effects covariance matrix
        do_cg : bool, defaults to True
            If False, the optimization is skipped and a results
            object at the given (or default) starting values is
            returned.
        fe_pen : Penalty object
            A penalty on the fixed effects
        free : MixedLMParams object
            If not `None`, this is a mask that allows parameters to be
            held fixed at specified values.  A 1 indicates that the
            corresponding parameter is estimated, a 0 indicates that
            it is fixed at its starting value.  Setting the `cov_re`
            component to the identity matrix fits a model with
            independent random effects.  Note that some optimization
            methods do not respect this constraint (bfgs and lbfgs both
            work).
        full_output : bool
            If true, attach iteration history to results
        method : str or list of str
            Optimization method.
            Can be a scipy.optimize method name,
            or a list of such names to be tried in sequence.
        **fit_kwargs
            Additional keyword arguments passed to fit.

        Returns
        -------
        A MixedLMResults instance.
        """

        _allowed_kwargs = ['gtol', 'maxiter', 'eps', 'maxcor', 'ftol',
                           'tol', 'disp', 'maxls']
        for x in fit_kwargs.keys():
            if x not in _allowed_kwargs:
                warnings.warn("Argument %s not used by MixedLM.fit" % x)

        if method is None:
            method = ['bfgs', 'lbfgs', 'cg']
        elif isinstance(method, str):
            method = [method]

        for meth in method:
            if meth.lower() in ["newton", "ncg"]:
                raise ValueError(
                    "method %s not available for MixedLM" % meth)

        self.reml = reml
        self.cov_pen = cov_pen
        self.fe_pen = fe_pen
        self._cov_sing = 0
        self._freepat = free

        if full_output:
            hist = []
        else:
            hist = None

        if start_params is None:
            params = MixedLMParams(self.k_fe, self.k_re, self.k_vc)
            params.fe_params = np.zeros(self.k_fe)
            params.cov_re = np.eye(self.k_re)
            params.vcomp = np.ones(self.k_vc)
        else:
            if isinstance(start_params, MixedLMParams):
                params = start_params
            else:
                # It's a packed array
                if len(start_params) == self.k_fe + self.k_re2 + self.k_vc:
                    params = MixedLMParams.from_packed(
                        start_params, self.k_fe, self.k_re, self.use_sqrt,
                        has_fe=True)
                elif len(start_params) == self.k_re2 + self.k_vc:
                    params = MixedLMParams.from_packed(
                        start_params, self.k_fe, self.k_re, self.use_sqrt,
                        has_fe=False)
                else:
                    raise ValueError("invalid start_params")

        if do_cg:
            fit_kwargs["retall"] = hist is not None
            if "disp" not in fit_kwargs:
                fit_kwargs["disp"] = False
            packed = params.get_packed(use_sqrt=self.use_sqrt, has_fe=False)

            if niter_sa > 0:
                warnings.warn("niter_sa is currently ignored")

            # Try optimizing one or more times
            for j in range(len(method)):
                rslt = super().fit(start_params=packed,
                                   skip_hessian=True,
                                   method=method[j],
                                   **fit_kwargs)
                if rslt.mle_retvals['converged']:
                    break
                packed = rslt.params
                if j + 1 < len(method):
                    next_method = method[j + 1]
                    warnings.warn(
                        "Retrying MixedLM optimization with %s" % next_method,
                        ConvergenceWarning)
                else:
                    msg = ("MixedLM optimization failed, " +
                           "trying a different optimizer may help.")
                    warnings.warn(msg, ConvergenceWarning)

            # The optimization succeeded
            params = np.atleast_1d(rslt.params)
            if hist is not None:
                hist.append(rslt.mle_retvals)

        converged = rslt.mle_retvals['converged']
        if not converged:
            gn = self.score(rslt.params)
            gn = np.sqrt(np.sum(gn**2))
            msg = "Gradient optimization failed, |grad| = %f" % gn
            warnings.warn(msg, ConvergenceWarning)

        # Convert to the final parameterization (i.e. undo the square
        # root transform of the covariance matrix, and the profiling
        # over the error variance).
        params = MixedLMParams.from_packed(
            params, self.k_fe, self.k_re, use_sqrt=self.use_sqrt, has_fe=False)
        cov_re_unscaled = params.cov_re
        vcomp_unscaled = params.vcomp
        fe_params, sing = self.get_fe_params(cov_re_unscaled, vcomp_unscaled)
        params.fe_params = fe_params
        scale = self.get_scale(fe_params, cov_re_unscaled, vcomp_unscaled)
        cov_re = scale * cov_re_unscaled
        vcomp = scale * vcomp_unscaled

        f1 = (self.k_re > 0) and (np.min(np.abs(np.diag(cov_re))) < 0.01)
        f2 = (self.k_vc > 0) and (np.min(np.abs(vcomp)) < 0.01)
        if f1 or f2:
            msg = "The MLE may be on the boundary of the parameter space."
            warnings.warn(msg, ConvergenceWarning)

        # Compute the Hessian at the MLE.  Note that this is the
        # Hessian with respect to the random effects covariance matrix
        # (not its square root).  It is used for obtaining standard
        # errors, not for optimization.
        hess, sing = self.hessian(params)
        if sing:
            warnings.warn(_warn_cov_sing)

        hess_diag = np.diag(hess)
        if free is not None:
            pcov = np.zeros_like(hess)
            pat = self._freepat.get_packed(use_sqrt=False, has_fe=True)
            ii = np.flatnonzero(pat)
            hess_diag = hess_diag[ii]
            if len(ii) > 0:
                hess1 = hess[np.ix_(ii, ii)]
                pcov[np.ix_(ii, ii)] = np.linalg.inv(-hess1)
        else:
            pcov = np.linalg.inv(-hess)
        if np.any(hess_diag >= 0):
            msg = ("The Hessian matrix at the estimated parameter values " +
                   "is not positive definite.")
            warnings.warn(msg, ConvergenceWarning)

        # Prepare a results class instance
        params_packed = params.get_packed(use_sqrt=False, has_fe=True)
        results = MixedLMResults(self, params_packed, pcov / scale)
        results.params_object = params
        results.fe_params = fe_params
        results.cov_re = cov_re
        results.vcomp = vcomp
        results.scale = scale
        results.cov_re_unscaled = cov_re_unscaled
        results.method = "REML" if self.reml else "ML"
        results.converged = converged
        results.hist = hist
        results.reml = self.reml
        results.cov_pen = self.cov_pen
        results.k_fe = self.k_fe
        results.k_re = self.k_re
        results.k_re2 = self.k_re2
        results.k_vc = self.k_vc
        results.use_sqrt = self.use_sqrt
        results.freepat = self._freepat

        return MixedLMResultsWrapper(results)

    def get_distribution(self, params, scale, exog):
        return _mixedlm_distribution(self, params, scale, exog)


class _mixedlm_distribution:
    """
    A private class for simulating data from a given mixed linear model.

    Parameters
    ----------
    model : MixedLM instance
        A mixed linear model
    params : array_like
        A parameter vector defining a mixed linear model.  See
        notes for more information.
    scale : scalar
        The unexplained variance
    exog : array_like
        An array of fixed effect covariates.  If None, model.exog
        is used.

    Notes
    -----
    The params array is a vector containing fixed effects parameters,
    random effects parameters, and variance component parameters, in
    that order.  The lower triangle of the random effects covariance
    matrix is stored.
    The random effects and variance components parameters are
    divided by the scale parameter.

    This class is used in Mediation, and possibly elsewhere.
    """

    def __init__(self, model, params, scale, exog):

        self.model = model
        self.exog = exog if exog is not None else model.exog

        po = MixedLMParams.from_packed(
            params, model.k_fe, model.k_re, False, True)

        self.fe_params = po.fe_params
        self.cov_re = scale * po.cov_re
        self.vcomp = scale * po.vcomp
        self.scale = scale

        group_idx = np.zeros(model.nobs, dtype=int)
        for k, g in enumerate(model.group_labels):
            group_idx[model.row_indices[g]] = k
        self.group_idx = group_idx

    def rvs(self, n):
        """
        Return a vector of simulated values from a mixed linear
        model.

        The parameter n is ignored, but required by the interface
        """

        model = self.model

        # Fixed effects
        y = np.dot(self.exog, self.fe_params)

        # Random effects
        u = np.random.normal(size=(model.n_groups, model.k_re))
        u = np.dot(u, np.linalg.cholesky(self.cov_re).T)
        y += (u[self.group_idx, :] * model.exog_re).sum(1)

        # Variance components
        for j, _ in enumerate(model.exog_vc.names):
            ex = model.exog_vc.mats[j]
            v = self.vcomp[j]
            for i, g in enumerate(model.group_labels):
                exg = ex[i]
                ii = model.row_indices[g]
                u = np.random.normal(size=exg.shape[1])
                y[ii] += np.sqrt(v) * np.dot(exg, u)

        # Residual variance
        y += np.sqrt(self.scale) * np.random.normal(size=len(y))

        return y


class MixedLMResults(base.LikelihoodModelResults, base.ResultMixin):
    '''
    Class to contain results of fitting a linear mixed effects model.

    MixedLMResults inherits from statsmodels.LikelihoodModelResults

    Parameters
    ----------
    See statsmodels.LikelihoodModelResults

    Attributes
    ----------
    model : class instance
        Pointer to MixedLM model instance that called fit.
    normalized_cov_params : ndarray
        The sampling covariance matrix of the estimates
    params : ndarray
        A packed parameter vector for the profile parameterization.
        The first `k_fe` elements are the estimated fixed effects
        coefficients.
        The remaining elements are the estimated variance parameters.
        The variance parameters are all divided by `scale` and are
        not the variance parameters shown in the summary.
    fe_params : ndarray
        The fitted fixed-effects coefficients
    cov_re : ndarray
        The fitted random-effects covariance matrix
    bse_fe : ndarray
        The standard errors of the fitted fixed effects coefficients
    bse_re : ndarray
        The standard errors of the fitted random effects covariance
        matrix and variance components.  The first
        `k_re * (k_re + 1) / 2` parameters are the standard errors
        for the lower triangle of `cov_re`, the remaining elements
        are the standard errors for the variance components.

    See Also
    --------
    statsmodels.LikelihoodModelResults
    '''

    def __init__(self, model, params, cov_params):

        super().__init__(model, params, normalized_cov_params=cov_params)
        self.nobs = self.model.nobs
        self.df_resid = self.nobs - np.linalg.matrix_rank(self.model.exog)

    @cache_readonly
    def fittedvalues(self):
        """
        Returns the fitted values for the model.

        The fitted values reflect the mean structure specified by the
        fixed effects and the predicted random effects.
        """
        fit = np.dot(self.model.exog, self.fe_params)
        re = self.random_effects
        for group_ix, group in enumerate(self.model.group_labels):
            ix = self.model.row_indices[group]

            mat = []
            if self.model.exog_re_li is not None:
                mat.append(self.model.exog_re_li[group_ix])
            for j in range(self.k_vc):
                mat.append(self.model.exog_vc.mats[j][group_ix])
            mat = np.concatenate(mat, axis=1)

            fit[ix] += np.dot(mat, re[group])

        return fit

    @cache_readonly
    def resid(self):
        """
        Returns the residuals for the model.

        The residuals reflect the mean structure specified by the
        fixed effects and the predicted random effects.
        """
        return self.model.endog - self.fittedvalues

    @cache_readonly
    def bse_fe(self):
        """
        Returns the standard errors of the fixed effect regression
        coefficients.
        """
        p = self.model.exog.shape[1]
        return np.sqrt(np.diag(self.cov_params())[0:p])

    @cache_readonly
    def bse_re(self):
        """
        Returns the standard errors of the variance parameters.

        The first `k_re * (k_re + 1) / 2` elements of the returned
        array are the standard errors of the lower triangle of
        `cov_re`.  The remaining elements are the standard errors of
        the variance components.

        Note that the sampling distribution of variance parameters is
        strongly skewed unless the sample size is large, so these
        standard errors may not give meaningful confidence intervals
        or p-values if used in the usual way.
        """
        p = self.model.exog.shape[1]
        return np.sqrt(self.scale * np.diag(self.cov_params())[p:])

    def _expand_re_names(self, group_ix):
        names = list(self.model.data.exog_re_names)

        for j, v in enumerate(self.model.exog_vc.names):
            vg = self.model.exog_vc.colnames[j][group_ix]
            na = [f"{v}[{s}]" for s in vg]
            names.extend(na)

        return names

    @cache_readonly
    def random_effects(self):
        """
        The conditional means of random effects given the data.

        Returns
        -------
        random_effects : dict
            A dictionary mapping the distinct `group` values to the
            conditional means of the random effects for the group
            given the data.
        """
        try:
            cov_re_inv = np.linalg.inv(self.cov_re)
        except np.linalg.LinAlgError:
            raise ValueError("Cannot predict random effects from " +
                             "singular covariance structure.")

        vcomp = self.vcomp
        k_re = self.k_re

        ranef_dict = {}
        for group_ix, group in enumerate(self.model.group_labels):

            endog = self.model.endog_li[group_ix]
            exog = self.model.exog_li[group_ix]
            ex_r = self.model._aex_r[group_ix]
            ex2_r = self.model._aex_r2[group_ix]
            vc_var = self.model._expand_vcomp(vcomp, group_ix)

            # Get the residuals relative to fixed effects
            resid = endog
            if self.k_fe > 0:
                expval = np.dot(exog, self.fe_params)
                resid = resid - expval

            solver = _smw_solver(self.scale, ex_r, ex2_r, cov_re_inv,
                                 1 / vc_var)
            vir = solver(resid)

            xtvir = _dot(ex_r.T, vir)

            xtvir[0:k_re] = np.dot(self.cov_re, xtvir[0:k_re])
            xtvir[k_re:] *= vc_var
            ranef_dict[group] = pd.Series(
                xtvir, index=self._expand_re_names(group_ix))

        return ranef_dict

    @cache_readonly
    def random_effects_cov(self):
        """
        Returns the conditional covariance matrix of the random
        effects for each group given the data.

        Returns
        -------
        random_effects_cov : dict
            A dictionary mapping the distinct values of the `group`
            variable to the conditional covariance matrix of the
            random effects given the data.
        """
        try:
            cov_re_inv = np.linalg.inv(self.cov_re)
        except np.linalg.LinAlgError:
            cov_re_inv = None

        vcomp = self.vcomp

        ranef_dict = {}
        for group_ix in range(self.model.n_groups):

            ex_r = self.model._aex_r[group_ix]
            ex2_r = self.model._aex_r2[group_ix]
            label = self.model.group_labels[group_ix]
            vc_var = self.model._expand_vcomp(vcomp, group_ix)

            solver = _smw_solver(self.scale, ex_r, ex2_r, cov_re_inv,
                                 1 / vc_var)

            n = ex_r.shape[0]
            m = self.cov_re.shape[0]
            mat1 = np.empty((n, m + len(vc_var)))
            mat1[:, 0:m] = np.dot(ex_r[:, 0:m], self.cov_re)
            mat1[:, m:] = np.dot(ex_r[:, m:], np.diag(vc_var))
            mat2 = solver(mat1)
            mat2 = np.dot(mat1.T, mat2)

            v = -mat2
            v[0:m, 0:m] += self.cov_re
            ix = np.arange(m, v.shape[0])
            v[ix, ix] += vc_var
            na = self._expand_re_names(group_ix)
            v = pd.DataFrame(v, index=na, columns=na)
            ranef_dict[label] = v

        return ranef_dict

    # Need to override since t-tests are only used for fixed effects
    # parameters.
    def t_test(self, r_matrix, use_t=None):
        """
        Compute a t-test for each linear hypothesis of the form Rb = 0

        Parameters
        ----------
        r_matrix : array_like
            If an array is given, a p x k 2d array or length k 1d
            array specifying the linear restrictions. It is assumed
            that the linear combination is equal to zero.
        use_t : bool, optional
            If use_t is None, then the default of the model is used.
            If use_t is True, then the p-values are based on the t
            distribution.
            If use_t is False, then the p-values are based on the
            normal distribution.

        Returns
        -------
        res : ContrastResults instance
            The results for the test are attributes of this results
            instance.  The available results have the same elements
            as the parameter table in `summary()`.
        """
        if r_matrix.shape[1] != self.k_fe:
            raise ValueError("r_matrix for t-test should have %d columns"
                             % self.k_fe)

        d = self.k_re2 + self.k_vc
        z0 = np.zeros((r_matrix.shape[0], d))
        r_matrix = np.concatenate((r_matrix, z0), axis=1)
        tst_rslt = super().t_test(r_matrix, use_t=use_t)
        return tst_rslt

    def summary(self, yname=None, xname_fe=None, xname_re=None,
                title=None, alpha=.05):
        """
        Summarize the mixed model regression results.

        Parameters
        ----------
        yname : str, optional
            Name of the dependent variable.  Default is the model's
            `endog_names`.
        xname_fe : list[str], optional
            Fixed effects covariate names
        xname_re : list[str], optional
            Random effects covariate names
        title : str, optional
            Title for the top table. If not None, then this replaces
            the default title
        alpha : float, optional
            Significance level for the confidence intervals

        Returns
        -------
        smry : Summary instance
            this holds the summary tables and text, which can be
            printed or converted to various output formats.

        See Also
        --------
        statsmodels.iolib.summary2.Summary : class to hold summary results
        """

        from statsmodels.iolib import summary2

        smry = summary2.Summary()

        info = {}
        info["Model:"] = "MixedLM"
        if yname is None:
            yname = self.model.endog_names

        param_names = self.model.data.param_names[:]
        k_fe_params = len(self.fe_params)
        k_re_params = len(param_names) - len(self.fe_params)

        if xname_fe is not None:
            if len(xname_fe) != k_fe_params:
                msg = "xname_fe should be a list of length %d" % k_fe_params
                raise ValueError(msg)
            param_names[:k_fe_params] = xname_fe

        if xname_re is not None:
            if len(xname_re) != k_re_params:
                msg = "xname_re should be a list of length %d" % k_re_params
                raise ValueError(msg)
            param_names[k_fe_params:] = xname_re

        info["No. Observations:"] = str(self.model.n_totobs)
        info["No. Groups:"] = str(self.model.n_groups)

        gs = np.array([len(x) for x in self.model.endog_li])
        info["Min. group size:"] = "%.0f" % min(gs)
        info["Max. group size:"] = "%.0f" % max(gs)
        info["Mean group size:"] = "%.1f" % np.mean(gs)

        info["Dependent Variable:"] = yname
        info["Method:"] = self.method
        info["Scale:"] = self.scale
        info["Log-Likelihood:"] = self.llf
        info["Converged:"] = "Yes" if self.converged else "No"
        smry.add_dict(info)

        # Honor a user-supplied title, as documented above.
        if title is None:
            title = "Mixed Linear Model Regression Results"
        smry.add_title(title)

        float_fmt = "%.3f"

        sdf = np.nan * np.ones((self.k_fe + self.k_re2 + self.k_vc, 6))

        # Coefficient estimates
        sdf[0:self.k_fe, 0] = self.fe_params

        # Standard errors
        sdf[0:self.k_fe, 1] = np.sqrt(np.diag(self.cov_params()[0:self.k_fe]))

        # Z-scores
        sdf[0:self.k_fe, 2] = sdf[0:self.k_fe, 0] / sdf[0:self.k_fe, 1]

        # p-values
        sdf[0:self.k_fe, 3] = 2 * norm.cdf(-np.abs(sdf[0:self.k_fe, 2]))

        # Confidence intervals
        qm = -norm.ppf(alpha / 2)
        sdf[0:self.k_fe, 4] = sdf[0:self.k_fe, 0] - qm * sdf[0:self.k_fe, 1]
        sdf[0:self.k_fe, 5] = sdf[0:self.k_fe, 0] + qm * sdf[0:self.k_fe, 1]

        # All random effects variances and covariances
        jj = self.k_fe
        for i in range(self.k_re):
            for j in range(i + 1):
                sdf[jj, 0] = self.cov_re[i, j]
                sdf[jj, 1] = np.sqrt(self.scale) * self.bse[jj]
                jj += 1

        # Variance components
        for i in range(self.k_vc):
            sdf[jj, 0] = self.vcomp[i]
            sdf[jj, 1] = np.sqrt(self.scale) * self.bse[jj]
            jj += 1

        sdf = pd.DataFrame(index=param_names, data=sdf)
        sdf.columns = ['Coef.', 'Std.Err.', 'z', 'P>|z|',
                       '[' + str(alpha/2), str(1-alpha/2) + ']']
        for col in sdf.columns:
            sdf[col] = [float_fmt % x if np.isfinite(x) else ""
                        for x in sdf[col]]

        smry.add_df(sdf, align='r')

        return smry

    @cache_readonly
    def llf(self):
        return self.model.loglike(self.params_object, profile_fe=False)

    @cache_readonly
    def aic(self):
        """Akaike information criterion"""
        if self.reml:
            return np.nan
        if self.freepat is not None:
            df = self.freepat.get_packed(use_sqrt=False, has_fe=True).sum() + 1
        else:
            df = self.params.size + 1
        return -2 * (self.llf - df)

    @cache_readonly
    def bic(self):
        """Bayesian information criterion"""
        if self.reml:
            return np.nan
        if self.freepat is not None:
            df = self.freepat.get_packed(use_sqrt=False, has_fe=True).sum() + 1
        else:
            df = self.params.size + 1
        return -2 * self.llf + np.log(self.nobs) * df

    def profile_re(self, re_ix, vtype, num_low=5, dist_low=1., num_high=5,
                   dist_high=1., **fit_kwargs):
        """
        Profile-likelihood inference for variance parameters.

        Parameters
        ----------
        re_ix : int or str
            If `vtype` is 're', this value is the index of the
            variance parameter for which to construct a profile
            likelihood.  If `vtype` is 'vc' then `re_ix` is the name
            of the variance parameter to be profiled.
        vtype : str
            Either 're' or 'vc', depending on whether the profile
            analysis is for a random effect or a variance component.
        num_low : int
            The number of points at which to calculate the likelihood
            below the MLE of the parameter of interest.
        dist_low : float
            The distance below the MLE of the parameter of interest to
            begin calculating points on the profile likelihood.
        num_high : int
            The number of points at which to calculate the likelihood
            above the MLE of the parameter of interest.
        dist_high : float
            The distance above the MLE of the parameter of interest to
            begin calculating points on the profile likelihood.
        **fit_kwargs
            Additional keyword arguments passed to fit.

        Returns
        -------
        An array with two columns.  The first column contains the
        values to which the parameter of interest is constrained.  The
        second column contains the corresponding likelihood values.

        Notes
        -----
        Only variance parameters can be profiled.
        """

        pmodel = self.model
        k_fe = pmodel.k_fe
        k_re = pmodel.k_re
        k_vc = pmodel.k_vc
        endog, exog = pmodel.endog, pmodel.exog

        # Need to permute the columns of the random effects design
        # matrix so that the profiled variable is in the first column.
        if vtype == 're':
            ix = np.arange(k_re)
            ix[0] = re_ix
            ix[re_ix] = 0
            exog_re = pmodel.exog_re.copy()[:, ix]

            # Permute the covariance structure to match the permuted
            # design matrix.
            params = self.params_object.copy()
            cov_re_unscaled = params.cov_re
            cov_re_unscaled = cov_re_unscaled[np.ix_(ix, ix)]
            params.cov_re = cov_re_unscaled
            ru0 = cov_re_unscaled[0, 0]

            # Convert dist_low and dist_high to the profile
            # parameterization
            cov_re = self.scale * cov_re_unscaled
            low = (cov_re[0, 0] - dist_low) / self.scale
            high = (cov_re[0, 0] + dist_high) / self.scale

        elif vtype == 'vc':
            re_ix = self.model.exog_vc.names.index(re_ix)
            params = self.params_object.copy()
            vcomp = self.vcomp
            low = (vcomp[re_ix] - dist_low) / self.scale
            high = (vcomp[re_ix] + dist_high) / self.scale
            ru0 = vcomp[re_ix] / self.scale

        # Define the sequence of values to which the parameter of
        # interest will be constrained.
        if low <= 0:
            raise ValueError("dist_low is too large and would result in a "
                             "negative variance. Try a smaller value.")
        left = np.linspace(low, ru0, num_low + 1)
        right = np.linspace(ru0, high, num_high+1)[1:]
        rvalues = np.concatenate((left, right))

        # Indicators of which parameters are free and fixed.
        free = MixedLMParams(k_fe, k_re, k_vc)
        if self.freepat is None:
            free.fe_params = np.ones(k_fe)
            vcomp = np.ones(k_vc)
            mat = np.ones((k_re, k_re))
        else:
            # If a freepat already has been specified, we add the
            # constraint to it.
            free.fe_params = self.freepat.fe_params
            vcomp = self.freepat.vcomp
            mat = self.freepat.cov_re
            if vtype == 're':
                mat = mat[np.ix_(ix, ix)]
        if vtype == 're':
            mat[0, 0] = 0
        else:
            vcomp[re_ix] = 0
        free.cov_re = mat
        free.vcomp = vcomp

        klass = self.model.__class__
        init_kwargs = pmodel._get_init_kwds()
        if vtype == 're':
            init_kwargs['exog_re'] = exog_re

        likev = []
        for x in rvalues:

            model = klass(endog, exog, **init_kwargs)

            if vtype == 're':
                cov_re = params.cov_re.copy()
                cov_re[0, 0] = x
                params.cov_re = cov_re
            else:
                params.vcomp[re_ix] = x

            rslt = model.fit(start_params=params, free=free,
                             reml=self.reml, cov_pen=self.cov_pen,
                             **fit_kwargs)._results
            likev.append([x * rslt.scale, rslt.llf])

        likev = np.asarray(likev)

        return likev


class MixedLMResultsWrapper(base.LikelihoodResultsWrapper):
    _attrs = {'bse_re': ('generic_columns', 'exog_re_names_full'),
              'fe_params': ('generic_columns', 'xnames'),
              'bse_fe': ('generic_columns', 'xnames'),
              'cov_re': ('generic_columns_2d', 'exog_re_names'),
              'cov_re_unscaled': ('generic_columns_2d', 'exog_re_names'),
              }
    _upstream_attrs = base.LikelihoodResultsWrapper._wrap_attrs
    _wrap_attrs = base.wrap.union_dicts(_attrs, _upstream_attrs)

    _methods = {}
    _upstream_methods = base.LikelihoodResultsWrapper._wrap_methods
    _wrap_methods = base.wrap.union_dicts(_methods, _upstream_methods)


def _handle_missing(data, groups, formula, re_formula, vc_formula):

    tokens = set()

    forms = [formula]
    if re_formula is not None:
        forms.append(re_formula)
    if vc_formula is not None:
        forms.extend(vc_formula.values())

    from statsmodels.compat.python import asunicode

    from io import StringIO
    import tokenize
    skiptoks = {"(", ")", "*", ":", "+", "-", "**", "/"}

    for fml in forms:
        # Unicode conversion is for Py2 compatibility
        rl = StringIO(fml)

        def rlu():
            line = rl.readline()
            return asunicode(line, 'ascii')
        g = tokenize.generate_tokens(rlu)
        for tok in g:
            # Compare the token text, not the TokenInfo tuple, against
            # the operator tokens to be skipped.
            if tok.string not in skiptoks:
                tokens.add(tok.string)
    tokens = sorted(tokens & set(data.columns))
    data = data[tokens]
    ii = pd.notnull(data).all(1)
    if type(groups) is not str:
        ii &= pd.notnull(groups)

    return data.loc[ii, :], groups[np.asarray(ii)]
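

# ---------------------------------------------------------------------
# Illustrative sketch, not part of the MixedLM API.  The helper name
# `_sketch_conditional_re_mean` and its arguments are hypothetical and
# are introduced here only for illustration.  Assuming a single group
# with standard random effects only (no variance components), it spells
# out with dense linear algebra the conditional mean that
# `MixedLMResults.random_effects` describes:
#
#     E[gamma | Y] = cov_re * Z' * V^{-1} * (Y - X * beta)
#     V = scale * I + Z * cov_re * Z'
#
# The results class appears to obtain the same quantity through
# `_smw_solver` instead of forming V explicitly; this sketch trades
# that efficiency for readability.  For a random intercept (Z a column
# of ones, cov_re = [[tau2]]) the result reduces to
# tau2 * sum(resid) / (scale + n * tau2).
# ---------------------------------------------------------------------
import numpy as np  # repeated so that the sketch is self-contained


def _sketch_conditional_re_mean(resid, ex_r, cov_re, scale):
    # resid : residuals of one group relative to the fixed effects
    # ex_r  : random effects design matrix Z for that group
    # cov_re: random effects covariance matrix
    # scale : error variance
    resid = np.asarray(resid, dtype=float)
    ex_r = np.asarray(ex_r, dtype=float)
    cov_re = np.atleast_2d(np.asarray(cov_re, dtype=float))

    # Marginal covariance matrix of the group's response.
    v = scale * np.eye(len(resid)) + ex_r @ cov_re @ ex_r.T

    # Conditional mean of the random effects given the data.
    return cov_re @ ex_r.T @ np.linalg.solve(v, resid)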