Zero-inflated negative binomial regression in Python

class statsmodels.discrete.count_model.ZeroInflatedNegativeBinomialP(endog, exog, exog_infl=None, offset=None, exposure=None, inflation='logit', p=2, missing='none', **kwargs)

Zero Inflated Generalized Negative Binomial Model

Parameters:

endog : array_like

A 1-d endogenous response variable. The dependent variable.

exog : array_like

A nobs x k array where nobs is the number of observations and k is the number of regressors. An intercept is not included by default and should be added by the user. See statsmodels.tools.add_constant.

exog_infl : array_like or None

Explanatory variables for the binary inflation model, i.e. for the mixing probability model. If None, a constant is used.

offset : array_like

Offset is added to the linear prediction with coefficient equal to 1.

exposure : array_like

Log(exposure) is added to the linear prediction with coefficient equal to 1.

inflation : {‘logit’, ‘probit’}

The model for the zero inflation, either Logit (default) or Probit.

p : float

Dispersion power parameter for the NegativeBinomialP model: p=1 for ZINB-1 and p=2 for ZINB-2. Default is p=2.

missing : str

Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no nan checking is done. If ‘drop’, any observations with nans are dropped. If ‘raise’, an error is raised. Default is ‘none’.

Attributes:

endog : ndarray

A reference to the endogenous response variable.

exog : ndarray

A reference to the exogenous design.

exog_infl : ndarray

A reference to the zero-inflated exogenous design.

p : scalar

p denotes the parametrization for ZINB regression: p=1 for ZINB-1 and p=2 for ZINB-2. Default is p=2.
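The p parameter controls how the variance of the count (non-inflated) component grows with its mean. Under the usual NB-P convention, Var(y) = mu + alpha * mu**p. So p=1 gives variance proportional to the mean (ZINB-1), and p=2 gives quadratic variance (ZINB-2). The sketch below checks that formula numerically with scipy.stats.nbinom. The helper names nbp_variance and nbp_scipy are hypothetical. The size mapping mu**(2 - p) / alpha is an assumption about how the count part is parametrized, chosen because it reproduces that variance function.

```python
import numpy as np
from scipy.stats import nbinom


def nbp_variance(mu, alpha, p):
    """Closed-form NB-P variance of the count component: mu + alpha * mu**p."""
    return mu + alpha * mu**p


def nbp_scipy(mu, alpha, p):
    """Hypothetical helper: frozen scipy NB with size mu**(2 - p) / alpha.

    Assumed mapping: with this size, the distribution has mean mu and
    variance mu + mu**2 / size = mu + alpha * mu**p.
    """
    size = mu ** (2 - p) / alpha
    return nbinom(size, size / (size + mu))


mu = np.array([0.5, 2.0, 10.0])
alpha = 0.8
for p in (1, 2):
    dist = nbp_scipy(mu, alpha, p)
    # The mean stays mu; only the mean-variance relationship changes with p.
    print(f"p={p}: mean={dist.mean()}, var={dist.var()}, formula={nbp_variance(mu, alpha, p)}")
```

For p=1 the variance is mu * (1 + alpha), a constant multiple of the mean. For p=2 the extra variance alpha * mu**2 dominates at large means.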

Methods

cdf(X)

The cumulative distribution function of the model.

cov_params_func_l1(likelihood_model, xopt, ...)

Computes cov_params on a reduced parameter space corresponding to the nonzero parameters resulting from the l1 regularized fit.

fit([start_params, method, maxiter, ...])

Fit the model using maximum likelihood.

fit_regularized([start_params, method, ...])

Fit the model using a regularized maximum likelihood.

from_formula(formula, data[, subset, drop_cols])

Create a Model from a formula and dataframe.

get_distribution(params[, exog, exog_infl, ...])

Get frozen instance of distribution based on predicted parameters.

hessian(params)

Generic Zero Inflated model Hessian matrix of the loglikelihood

information(params)

Fisher information matrix of model.

initialize()

Initialize is called by statsmodels.model.LikelihoodModel.__init__ and should contain any preprocessing that needs to be done for a model.

loglike(params)

Loglikelihood of Generic Zero Inflated model.

loglikeobs(params)

Loglikelihood for observations of Generic Zero Inflated model.

pdf(X)

The probability density (mass) function of the model.

predict(params[, exog, exog_infl, exposure, ...])

Predict response variable or other statistic given exogenous variables.

score(params)

Score vector of model.

score_obs(params)

Generic Zero Inflated model score (gradient) vector of the log-likelihood


What is a zero-inflated negative binomial regression?

Zero-Inflated Negative Binomial Regression | R Data Analysis Examples. Zero-inflated negative binomial regression is for modeling count variables with excess zeros, and it is usually used for overdispersed count outcome variables.
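"Excess zeros" and "overdispersed" can be made concrete by writing the distribution down. A zero is produced either by the inflation process (probability pi) or by the negative binomial count process. Positive counts come only from the count process. The sketch below assumes an NB-2 count part and uses scipy.stats.nbinom. The function name zinb_pmf and the example values mu=3, alpha=0.5 and pi=0.3 are hypothetical.

```python
import numpy as np
from scipy.stats import nbinom


def zinb_pmf(y, mu, alpha, pi):
    """P(Y = y) for a zero-inflated NB-2 distribution (hypothetical helper).

    pi    : probability of a structural (excess) zero
    mu    : mean of the negative binomial count component
    alpha : NB-2 dispersion, so the count part has variance mu + alpha * mu**2
    """
    size = 1.0 / alpha
    nb = nbinom(size, size / (size + mu))
    y = np.asarray(y)
    # Zeros come from either process; positive counts only from the NB part.
    return np.where(y == 0, pi + (1 - pi) * nb.pmf(0), (1 - pi) * nb.pmf(y))


ys = np.arange(0, 400)  # the tail beyond 399 is negligible for these values
probs = zinb_pmf(ys, mu=3.0, alpha=0.5, pi=0.3)
mean = np.sum(ys * probs)
var = np.sum((ys - mean) ** 2 * probs)
print(f"P(0)={probs[0]:.3f}, mean={mean:.3f}, variance={var:.3f}")
```

Here the overall mean is (1 - pi) * mu = 2.1, while the variance is several times larger. The zero probability is also well above what the negative binomial part alone would give.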

How do you run a negative binomial regression in Python?

The process of running a negative binomial regression in Python starts by importing the required packages:
import statsmodels.api as sm
import matplotlib.pyplot as plt
import numpy as np
from patsy import dmatrices
import pandas as pd

What type of model is used for zero-inflated count data?

Zero-inflated Poisson regression is used to model count data that has an excess of zero counts. Further, theory suggests that the excess zeros are generated by a separate process from the count values and that the excess zeros can be modeled independently.

How do you know if data is zero-inflated?

If the number of observed zeros is larger than the number of zeros predicted by the model, the model is underfitting zeros, which indicates zero-inflation in the data. In such cases, a negative binomial or zero-inflated model is recommended.