Multiple regression analysis is a forecasting method that examines the association between two or more variables.

Multivariate Analysis of Forensic Fraud, 2000–2010

Brent E. Turvey, in Forensic Fraud, 2013

Hierarchical multiple regression analysis of examiner approaches to fraud

Hierarchical multiple regression analysis demonstrates that, in the present sample, sets of employer characteristics, examiner characteristics, and situational factors explained a statistically significant portion of the variance in examiner approach to fraud (see Table 9-4). Each statistically significant result is presented and discussed next.4
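The blockwise mechanics behind these ΔR² figures can be made concrete with a short sketch: predictors are entered in sets, and each step's gain in R² is that set's ΔR². The following minimal illustration uses statsmodels on synthetic data; the column names are hypothetical stand-ins, not the study's actual variables or coding.

```python
# Sketch of hierarchical (blockwise) multiple regression with synthetic data.
# Column names are hypothetical stand-ins for the study's variable sets.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 89  # 89 cases and 12 predictors give residual df = 76, as in F(12,76)
cols = ["lab_accredited", "internal_audits", "separate_lab",        # Employer
        "JLAB", "JTEC", "JLEX", "JMED",                             # Job Description
        "isolated_incident", "science_education", "addiction_history",
        "criminal_history", "fraud_history"]                        # Examiner
df = pd.DataFrame(rng.normal(size=(n, len(cols))), columns=cols)
df["dissembler"] = 0.5 * df["lab_accredited"] + rng.normal(size=n)  # toy outcome

blocks = [cols[:3], cols[3:7], cols[7:]]  # Steps 1-3: enter predictors in sets
predictors, prev_r2 = [], 0.0
for step, block in enumerate(blocks, start=1):
    predictors += block
    fit = sm.OLS(df["dissembler"], sm.add_constant(df[predictors])).fit()
    print(f"Step {step}: R2 = {fit.rsquared:.2f}, "
          f"delta R2 = {fit.rsquared - prev_r2:.2f}")
    prev_r2 = fit.rsquared
```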

Dissemblers

Employer characteristics (R2 = .15∗∗), Job Description characteristics (ΔR2 = .17∗∗), and Examiner characteristics (ΔR2 = .02∗∗) all accounted for a statistically significant portion of the variance found in Dissemblers, F(12,76) = 3.24, p < .01. Put another way, the Employer, Job Description, and Examiner characteristics in the present study resulted in a total R2 of .34∗∗, accounting for 34% of all variance found among Dissemblers.

In Step 1 (Employer characteristics), statistically significant results are observed for Dissemblers based on Laboratory Accreditation (sr = .38∗∗); significantly more Dissemblers were correlated with accredited forensic laboratories. This finding suggests that if a forensic laboratory is accredited, fraudulent examiners are significantly more likely to exaggerate, embellish, lie about, or otherwise misrepresent findings than to falsify credentials or dry-lab their results.

One way to interpret this finding is to infer that the structure and accountability necessary to achieve laboratory accreditation may be effective at preventing the unqualified (Pseudoexperts) from gaining employment at accredited labs, and at ensuring that physical evidence gets examined without direct tampering from Simulators—in other words, that there are some positive and constructive results of laboratory accreditation. This is a reasonable interpretation.

However, this finding also demonstrates that laboratory accreditation does not eliminate forensic fraud. In fact, it may be used to infer that laboratory accreditation actually encourages forensic examiners to commit a particular kind of fraud—to lie about the results of their examinations for fear of committing documented error or failing their proficiencies. As will be discussed later in this chapter, this interpretation is supported by findings demonstrating that laboratory accreditation is significantly related to increased falsification of only one kind of evidence: DNA.

In Step 2 (Job Description characteristics), statistically significant results are observed for Dissemblers based on a combined ΔR2 of .17∗∗, accounting for 17% of all variance found among Dissemblers. While the combined ΔR2 is significant, examination of individual variables (JLAB, JTEC, JLEX, and JMED) is not revealing. Consequently, Step 2 variables represent what may be referred to as a pool of variance. That is to say, these variables are important as a group in relation to Dissemblers, but it is likely that there is insufficient sample size to magnify and reveal which are significant factors on their own. Further study is needed to determine which Job Description characteristics are significantly correlated with Dissemblers.

In Step 3 (Examiner characteristics), statistically significant results are observed for Dissemblers based on a combined ΔR2 of .02∗∗, accounting for 2% of all variance found among Dissemblers. While the combined ΔR2 is significant, examination of individual variables (Isolated Incident, Science Education, History of Addiction, Criminal History, and History of Fraud) is not revealing. Consequently, Step 3 variables represent a pool of variance. That is to say, these variables are important as a group in relation to Dissemblers, but it is likely that there is insufficient sample size to magnify and reveal which are significant factors on their own. Further study is needed to determine which Examiner characteristics are significantly correlated with Dissemblers.

Pseudoexperts

Employer characteristics (R2 = .10∗∗), Job Description characteristics (ΔR2 = .10∗∗), and Examiner characteristics (ΔR2 = .16∗∗) all accounted for a statistically significant portion of the variance found in Pseudoexperts, F(12,76) = 4.15, p < .01. Put another way, the Employer, Job Description, and Examiner characteristics in the present study resulted in a total R2 of .36∗∗, accounting for 36% of all variance found among Pseudoexperts.

In Step 1 (Employer characteristics), statistically significant results are observed for Pseudoexperts based on Internal Audits (sr = –.28∗∗); in the present sample, significantly fewer Pseudoexperts are revealed in association with Internal Audits. This finding may suggest that internal audits are not effective at revealing those who falsify their credentials—that audits are more often focused on reviewing cases and protocols, and not hiring practices or examiner resumes. Alternatively, it may suggest that the kinds of forensic laboratories imposing internal audits are less likely to hire examiners with phony qualifications in the first place.

In Step 2 (Job Description characteristics), statistically significant results are observed for Pseudoexperts employed as laboratory criminalists (JLAB; sr = –.20∗) and technicians (JTEC; sr = –.21∗); in the present sample, those examiners employed as laboratory criminalists or technicians are significantly less likely to falsify their credentials. With respect to JLAB examiners, this may reflect the reality that laboratory positions require a demonstration of knowledge, skill, and ability that is not easily faked; these examiners are hired primarily based on the presentation and verification of scientific credentials in biology or chemistry. With respect to JTEC examiners, this may reflect the reality that police technicians are generally not expected to be educated in the sciences, and subsequently rely primarily on training and experience acquired on the job when writing reports and giving testimony.

In Step 3 (Examiner characteristics), statistically significant results are observed for Pseudoexperts with respect to Isolated Incidents (sr = –.22∗), Scientific Education (sr = –.19∗), and a History of Addiction (sr = –.25∗∗). In the present sample, Pseudoexperts are significantly more likely to engage in prolonged fraud involving multiple instances of falsification; they are significantly less likely to have a scientific education; and they are significantly less likely to have a history of addiction. Pseudoexperts are significantly more likely to have engaged in prolonged fraud involving multiple instances of credential falsification owing to the time it generally takes to uncover their activity;5 also, they tend to commit fraud more frequently: each time they apply for a job or a promotion, or testify under oath (by either commission or omission). Pseudoexperts are significantly less likely to have a scientific education, as this tends to be the very type of credential that they are falsifying. Though it is not immediately apparent why Pseudoexperts are significantly less likely to have a history of addiction, it may have something to do with avoiding additional deviant behavior that draws unwanted attention or scrutiny (this is offered as one possibility only; further study is necessary to develop a more complete set of possible explanations).6

Simulators

Employer characteristics (R2 = .39∗∗), Job Description characteristics (ΔR2 = .06∗∗), and Examiner characteristics (ΔR2 = .08∗∗) all accounted for a statistically significant portion of the variance found in Simulators, F(12,76) = 7.36, p < .01. Put another way, the Employer, Job Description, and Examiner characteristics in the present study resulted in a total R2 of .53∗∗, accounting for 53% of all variance found among Simulators.

In Step 1 (Employer characteristics), statistically significant results are observed for Simulators in relation to Employer Independence from law enforcement (sr = –.21∗) and Internal Audits (sr = .44∗∗). In the present sample, Simulators represent the most frequent approach to committing forensic fraud (90%).7 Simulators are also significantly less likely to be found in association with non-law enforcement employers, and significantly more likely to be discovered in association with Internal Audits.

Simulators are significantly less likely to be found in association with non-law enforcement employers. Given that law enforcement employers comprise 78% of our sample, this might cause some to argue that law enforcement is overrepresented, which would in turn result in a higher correlation with forensic fraud. However, no other approach to fraud is significantly correlated with this variable. This suggests that the finding is not necessarily a function of employer overrepresentation in the sample. It also supports the argument that increased employer independence from law enforcement significantly reduces the frequency of Simulators. Conversely, it supports the argument that law enforcement affiliation is significantly associated with an increased frequency of Simulators.

Simulators are also significantly more likely to be discovered in association with internal audits. It could be argued that this may be a feature of overrepresentation in the sample (SIM = 90%). However, given that internal audits are not significantly associated with Dissemblers either way (DIS = 57%), this interpretation seems less likely. Rather, it appears consistent with the argument that internal audits are most effective at revealing when physical evidence has been fabricated, tampered with, or destroyed by forensic examiners—as discussed previously.

In Step 2 (Job Description characteristics), statistically significant results are observed for Simulators based on a combined ΔR2 of .06∗∗, accounting for 6% of all variance found among Simulators. While the combined ΔR2 is significant, examination of individual variables (JLAB, JTEC, JLEX, and JMED) is not revealing. Consequently, Step 2 variables represent what may be referred to as a pool of variance. That is to say, these variables are important as a group in relation to Simulators, but it is likely that there is insufficient sample size to magnify and reveal which are significant on their own. This is consistent with a similar observation made with regard to Dissemblers at Step 2. Further study is needed to determine which Job Description characteristics are significantly correlated with Simulators.

In Step 3 (Examiner characteristics), significant results are observed for Simulators based on a combined ΔR2 of .08∗∗, accounting for 8% of all variance found among Simulators. While the combined ΔR2 is significant, examination of individual variables (Isolated Incident, Science Education, History of Addiction, Criminal History, and History of Fraud) is not revealing. Consequently, Step 3 variables represent a pool of variance. That is to say, these variables are important as a group in relation to Simulators, but it is likely that there is insufficient sample size to magnify and reveal which are significant on their own. Further study is needed to determine which Examiner characteristics are significantly correlated with Simulators.


URL: https://www.sciencedirect.com/science/article/pii/B9780124080737000094

GIS Applications for Environment and Resources

Federica Lucà, ... Oreste Terranova, in Comprehensive Geographic Information Systems, 2018

2.03.4.1 Statistical Methods

Among statistical methods, multiple regression analysis has been the most commonly applied for assessing the relationship between a soil property (dependent variable) and several morphometric attributes as independent predictors (Moore et al., 1993; Gessler et al., 2000). The approach assumes a linear relationship between soil and topography, but the simplicity of its data processing, model structure, and interpretation explains its wide application for predicting quantitative soil properties. Regression has been used, for example, to assess soil horizon thickness (Moore et al., 1993; Odeh et al., 1994; Gessler et al., 2000; Florinsky et al., 2002).

The relationships between soil properties and other topographic or biophysical variables are, however, rarely linear in nature. This consideration has led to the application of more flexible methods such as generalized linear models (GLM) and generalized additive models (GAM). GLM are used for both regression and classification purposes. The assumption is that the dependent variable follows a distribution from the exponential family (the normal distribution being one case) and that the predictors combine additively on the response through a link function. Aside from being able to handle multiple distributions, GLM have additional benefits, such as accepting both categorical and continuous variables as predictors. Thanks to their ability to model complex data structures, GLM have been widely applied. In GAM, the linear function between soil properties and topographic covariates is replaced by an unspecified nonparametric function (e.g., a spline).

The artificial neural network (ANN) is a nonparametric modeling technique used to overcome the nonlinearity of the relationships characterizing soils. The ANN is a form of artificial intelligence that can use both qualitative and quantitative data. It automatically analyzes the relationships between multisource inputs using self-learning methods and works without any hypothesis on the statistical distribution of the variables. Zhao et al. (2009) developed an ANN model to predict soil properties based on hydrological attributes (soil–terrain factor, sediment delivery ratio, and vertical slope position) derived from a high-resolution DEM.

Fuzzy logic (Zadeh, 1965; McBratney and Odeh, 1997) is a method for grouping multivariate data into clusters by defining the degree of membership of an element in a set of classes. Unlike hard logic, which places an individual within a single mutually exclusive class, fuzzy logic (often implemented as fuzzy k-means clustering) allows an individual to belong partially to several classes, bridging between them. Because soil landscapes are continuous in nature, fuzzy logic is useful in predictive soil mapping. It has been used, for example, to cluster topographic attributes (elevation, slope, plan curvature, TWI, SPI, catchment area) derived from a 5 m DEM in order to predict topsoil clay at the field scale (de Bruin and Stein, 1998). The method has proved useful for predicting chemical properties such as soil mineral nitrogen, organic matter, available phosphorus, and soil pH at the field scale (Lark, 1999), and soil taxonomic classes in large-scale soil mapping (Odeh et al., 1992; Lark, 1999; Barringer et al., 2008). The combination of fuzzy logic with discriminant analysis is also reported in the literature (Sorokina and Kozlov, 2009).
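By way of illustration, the baseline approach (a soil property regressed on several terrain attributes) can be sketched as follows; the data, variable names, and generating relationship are synthetic and purely illustrative.

```python
# Sketch: multiple linear regression of a soil property on terrain attributes.
# All values are synthetic; column meanings are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 200
slope = rng.uniform(0, 30, n)         # slope gradient (degrees)
twi = rng.uniform(2, 12, n)           # topographic wetness index
curvature = rng.normal(0, 0.5, n)     # plan curvature
clay = 10 + 0.3 * twi - 0.2 * slope + rng.normal(0, 1, n)  # toy topsoil clay (%)

X = np.column_stack([slope, twi, curvature])
model = LinearRegression().fit(X, clay)
print(model.coef_, model.intercept_)  # fitted linear relationship
print(model.score(X, clay))           # R^2 of the fit
```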

Decision trees work by splitting data into homogeneous subsets. Two main types of decision tree analysis are used in DSM: classification tree analysis (the dependent variable is categorical) and regression tree analysis (the dependent property is a numeric variable). Classification trees have been applied for predicting soil drainage class from digital elevation and remotely sensed data (Cialella et al., 1997) and for predicting soil taxonomic classes (Lagacherie and Holmes, 1997; McBratney et al., 2000; Moran and Bui, 2002; Zhou et al., 2004; Scull et al., 2005; Mendonça-Santos et al., 2008). Regression trees have instead been used for predicting soil cation exchange capacity (Bishop and McBratney, 2001), and soil profile thickness and total phosphorus (McKenzie and Ryan, 1999).
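A minimal sketch of a classification tree for a categorical soil attribute (e.g., a drainage class), assuming synthetic covariates and an invented class rule:

```python
# Sketch: classification tree predicting a toy two-class soil attribute
# from terrain covariates. Data and class rule are synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))              # e.g., elevation, slope, TWI
y = (X[:, 2] > 0).astype(int)              # toy drainage class (0/1)

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(tree.predict(X[:5]))                 # predicted classes for first cases
```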

Discriminant analysis is used to assess the group membership of an individual based on the attributes of the individual itself. The method makes it possible to determine which attributes in a multivariate dataset are adequate to discriminate between classes. The approach has been used to map soil texture classes (Hengl et al., 2007), soil drainage classes (Kravchenko et al., 2002), and taxonomic classes (Thomas et al., 1999; Hengl et al., 2007).

Logistic regression is used to predict a categorical variable from a set of continuous and/or categorical predictors (Kleinbaum et al., 2008). Logistic regression can be binary or multinomial, depending on the number of soil categories to be predicted. For example, multinomial logistic regression has been used to predict soil taxonomic classes or soil texture (Hengl et al., 2007; Giasson et al., 2008). Binary logistic regression has instead been used to assess the presence or absence of a specific horizon (Gessler et al., 1995), soil salinity risk (Taylor and Odeh, 2007), and gully erosion (Lucà et al., 2011; Conoscenti et al., 2014).
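A corresponding sketch of binary logistic regression for a presence/absence outcome, again on synthetic data with hypothetical covariates:

```python
# Sketch: binary logistic regression for a presence/absence outcome
# (e.g., gully erosion). All data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4))                       # continuous covariates
p = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
y = rng.binomial(1, p)                              # presence (1) / absence (0)

clf = LogisticRegression().fit(X, y)
print(clf.coef_)                                    # fitted log-odds weights
print(clf.predict_proba(X[:3]))                     # predicted probabilities
```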


URL: https://www.sciencedirect.com/science/article/pii/B9780124095489096342

Statistical Methods for Transport Demand Modeling

V.A. Profillidis, G.N. Botzoris, in Modeling of Transport Demand, 2019

Abstract

This chapter deals with statistical methods, and more particularly with simple and multiple regression analysis, which are the basic tools when correlating transport demand with the factors (such as time, cost, etc.) affecting it. After an overview of fundamentals of statistics such as terms, measures, hypothesis testing, probability distributions, and stationarity, the mathematical expression of simple and multiple linear regression and the estimation of the various regression coefficients and the error term with the use of the ordinary least squares method are presented. The Pearson correlation coefficient, the coefficient of determination, and the adjusted coefficient of determination as measures of the degree of correlation between one dependent and one or more independent variable(s) are analyzed. Tests of the significance of the coefficients of a regression analysis (Student's t-test and F-test) are presented afterward. Multicollinearity (correlation between independent variables), its detection, and techniques for its removal are identified. The various characteristics and properties of the residuals of a linear regression are surveyed with the help of the appropriate tests: probability distribution (skewness and kurtosis, Jarque–Bera test), influence of residuals and determination of outliers (Cook's distance), and existence or not of serial correlation in the residuals (Durbin–Watson test, Durbin's h-test, Breusch–Godfrey Lagrange Multiplier test, Ljung–Box test). The various tests for the detection of heteroscedasticity in a regression analysis are analyzed: the Breusch–Pagan test, Glejser test, Harvey–Godfrey test, White test, and autoregressive conditional heteroscedasticity test. Next, the various criteria for the evaluation of the forecasting accuracy of calibrated models are categorized, among them Theil's inequality coefficient. All of the above methods, tests, and criteria are extensively put into practice in a specific example of multiple linear regression analysis for the construction of an econometric model for transport demand.
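The core of this workflow (fit an OLS model, then test the residuals) can be sketched with statsmodels; the data and model below are synthetic, not the chapter's worked example.

```python
# Sketch: OLS fit followed by a few of the residual diagnostics named above.
# Synthetic data; the two predictors are hypothetical demand factors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson, jarque_bera
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(4)
n = 120
X = sm.add_constant(rng.normal(size=(n, 2)))              # e.g., cost, income
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)   # toy transport demand

res = sm.OLS(y, X).fit()
print(res.summary())                                 # t-tests, F-test, R^2
print("Durbin-Watson:", durbin_watson(res.resid))    # serial correlation
print("Jarque-Bera:", jarque_bera(res.resid)[:2])    # residual normality
print("Breusch-Pagan:", het_breuschpagan(res.resid, X)[:2])  # heteroscedasticity
```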


URL: https://www.sciencedirect.com/science/article/pii/B9780128115138000054

Challenges for Manufacturing in China

Tachia Chin, Chris Rowley, in The Future of Chinese Manufacturing, 2018

2.3.5 Discussion

This study investigated whether policy support facilitates own-brand innovation in China's auto industry. Drawing on HMR analysis, our findings show that policy support promotes the establishment of Sino-foreign JVs as well as the propensity of such JVs to produce foreign-brand cars. Moreover, we discovered a full mediating effect of Sino-foreign JVs on the relationship between policy support and foreign-brand preference. As a result, all four of our hypotheses were examined in full, which answers our research question: policy support does not appear to facilitate the conduct of branding strategy, but rather hinders own-brand development through the establishment of more Sino-foreign JVs in China's auto industry.

In accordance with prior research (Du & Luo, 2016; Jia, 2016), our findings also indicate that industrial policy, as a typical means of governmental intervention, plays a crucial role in attracting and leveraging FDI to accelerate the internationalization and development of domestic firms in China's auto industry, especially for SOEs and enterprises with state-owned shares, which make up a large portion of the organizations in this industry. On the one hand, policy support is particularly conducive to the competitiveness of Chinese enterprises because it can bring about unique, inimitable resources that compensate for the imperfections and immaturity of China's market structure and legal frameworks (Augier et al., 2016; Chin, 2015). On the other hand, excessive governmental intervention into the market mechanism may also elicit side effects such as low efficiency, institutional inertia, rent-seeking, and corruption, all of which may impede innovation (Chin et al., 2015; Ngo, 2008).

Inconsistent with previous studies addressing the positive technology spillover arising from international strategic alliances (Eapen, 2012; Opper & Nee, 2015), our results implicitly highlight the negative association between Sino-foreign JVs and own-brand innovation. Owing to the potent policy support, the Sino-foreign JVs might have bypassed all institutional constraints and dominated the Chinese auto market. Interestingly, however, such Sino-foreign JVs, as the biggest beneficiaries of government support on subsidies and access to loans and tax breaks, performed relatively poorly in building and promoting domestic brands compared to the privately owned carmakers, such as GWN, Geely and BYD, that have been endeavouring to widen their brand visibility for many years (Drauz, 2013; Wang & Kimble, 2013). As noted earlier, the well-known case of Chinese Geely's acquisition of Volvo from Ford in 2010 elucidates how a Chinese private carmaker gained core technologies, proprietary intellectual property, international marketing channels, and global-brand recognition almost ‘overnight’. The Sino-foreign JV model, in this sense, seems to have become an impediment rather than a catalyst for promoting own-brand innovation because the lack of an entrepreneurial culture and a related fear of failure may discourage top leaders of such ventures from making risky decisions. As a result, forming international strategic alliances may lead to excessive reliance on political resources and thus increase the risk-averse behaviour of firms. In contrast, private firms, with a more entrepreneurial spirit and higher risk tolerance, are gradually entering China's economy, fighting for more market share vis-a-vis giant national corporations (Opper & Nee, 2015).

Considering the foregoing arguments, the main theoretical contribution of this research is to bring deeper and greater insights into the interplay among policy support, own-brand innovation, and international JVs in China's auto industry. While Japanese and Korean auto industries demonstrate successful examples of industrial upgrading and brand establishment (e.g. Toyota, Honda, Nissan, and Mazda in Japan; and Daewoo, Hyundai, and Kia in Korea) (Kim, 1999; Yang & Tang, 2014), our research illustrates an unsuccessful Chinese story regarding own-brand development. This is not surprising considering that the ratio of expenditure of Chinese companies on technology imports to their technology assimilation is relatively low compared to that in Japan and South Korea. The novel context-specific approach we proposed to measure the degree of policy support is also considered as a valuable contribution to the literature because it embodies the unique institutional complexity and intricate social network embedded in China's SOEs. In addition, implicit in our findings is the possibility that the Sino-foreign JV model as an FDI-led growth strategy may create too heavy a dependence on borrowed technology and marketing skills from the developed-country partners and thus become unable to fulfil the strategic goal of indigenous innovation. Viewed from this angle, we also enrich existing knowledge concerning the impact of FDI on innovation in the Chinese context.

As far as the practical implications are concerned, this research provides valuable information for policymakers to further rectify auto-related policies. As mentioned above, the failure of developing own-brand products by Sino-foreign JVs in China has explicitly drawn our attention to the fact that, from a market perspective, corporate-political ties can not only be an advantage, but can also be problematic. In addition, it seems especially vital to emphasize the strategic importance of top managers in SOEs—firm-level value-creation and innovation rely on the contributions of all employees and this is motivated largely by effective leaders (Foss & Lindenberg, 2013). Whereas the Sino-foreign JV as a symbol of state monopoly still dominates the Chinese car market, it is imperative for the government to formulate more specific policies and strategies for raising the brand awareness of top management so as to motivate these organizations to engage more in own-brand innovation.


URL: https://www.sciencedirect.com/science/article/pii/B9780081011089000021

Counterurbanization

Clare J.A. Mitchell, Christopher R. Bryant, in International Encyclopedia of Human Geography (Second Edition), 2020

Local Assets

Counterurbanization is sometimes explained by identifying local assets that may be responsible for high rates of in-migration and/or population growth. Multiple regression analysis is typically undertaken to identify the independent variables that explain these levels within national or subnational systems. Evidence suggests that statistically significant factors vary spatially. In Turkey, for example, downward migration is found in areas with relatively high employment levels. In Australia, it occurs in places with high relative accessibility, a tourism presence, or agricultural activity. In Sweden, urban accessibility is associated with high counterurbanization levels, and in the United States, high migration rates are found in remote areas adjacent to natural amenities (e.g., water bodies) and with potential for land development. In the United Kingdom, the presence of universities is an important factor, and in South Africa, counterurbanization is associated with mining and sunshine destinations.


URL: https://www.sciencedirect.com/science/article/pii/B9780081022955103336

Time Series Analysis

Guy M. Robinson, in International Encyclopedia of Human Geography (Second Edition), 2020

Spatial Forecasting

One of the principal aims of time series analysis when performed by geographers is to make forecasts of future spatial patterns, ranging from just a few days ahead in the case of meteorological phenomena to perhaps 5 or 10 years in the case of economic geography. This requires a clear understanding of the processes that produce the spatial patterns and then identifying patterns based on past trends to use this information predictively. In predicting values of a variable for a particular location, the past record of that variable at other locations can be taken into account in various ways. In human geography, the initial applications were for regional forecasts for unemployment and the incidence of disease. Subsequently, time series modeling techniques have gained popularity in the last 10 years as useful tools for studying the temporal scaling structures and dynamics present in ecological and other complex systems.

Pioneering work on regional forecasting of unemployment was performed by Peter Haggett and associates at the University of Bristol in the 1970s. This work involved developing procedures for estimating how trends in time series data would progress through time and space. The focus of this approach is x_i(t), a time series for variable x at the ith location, considered alongside the series for that variable at other locations. A set of time series for different locations can be compared using cross-spectral analysis, for example by examining whether the series for two locations are in phase or whether one series leads or lags another. This can establish the intensity of the relationship between them for various lags and leads. Behavior at a location that consistently leads others can be used as a predictor of behavior in other locations.

This largely descriptive and exploratory approach can be extended into the area of modeling by using the equation

y_t = Σ_k B_k x_{t−k} + e

where y_t = the time series in a region (t = 1, …, T), x_t = the national series, e = a random disturbance term, and the vector of B_k coefficients records the regional responses at each lead or lag (k). Autocorrelation can affect the B_k estimates and the test statistics, but it can be calculated and incorporated into the model.
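Assuming a synthetic national series and an invented lag structure, this lead–lag model can be sketched as an OLS regression on lagged copies of x_t:

```python
# Sketch: y_t = sum_k B_k x_{t-k} + e, fitted by OLS on lagged copies of a
# synthetic national series. The lag-2 relationship is invented.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
T, max_lag = 200, 3
x = rng.normal(size=T).cumsum()                  # toy national series
y = 0.6 * np.roll(x, 2) + rng.normal(size=T)     # region lags the nation by 2
y[:2] = 0                                        # discard wrap-around values

lags = np.column_stack([np.roll(x, k) for k in range(max_lag + 1)])[max_lag:]
res = sm.OLS(y[max_lag:], sm.add_constant(lags)).fit()
print(res.params)                                # largest weight at lag 2
```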

The pattern of the variable being analyzed can be disaggregated into three components, with the corresponding equations applied at each location and coefficients derived using multiple regression analysis. For research on regional unemployment patterns, the three components are as follows:

1. Aggregate cyclical component: that part of a region's unemployment rate resulting from cyclical variations at the national level. Sensitivity (a_j) = 1 if the aggregate regional component behaves in the same way as the national pattern of variation. More prosperous areas with low unemployment rates tend to be cyclically insensitive, and vice versa.

A_jt = a_j U_{t−l_j}

where A_jt = aggregate cyclical component in region j at time t, U = national unemployment rate, l_j = length of lead or lag between the national and regional series, and a_j = sensitivity of region j to national cyclical variations.

2. Structural component: the component peculiar to each region, which reflects long-term disturbances in the labor market. It is measured in terms of a quadratic time trend:

S_jt = C_j + b_j t + d_j t^2

where S_jt = structural component of regional unemployment in region j at time t, C_j = structural component in the initial time period, and b_j and d_j = coefficients of the quadratic time trend.

3. Regional cyclical component: the residual component, given by

U_jt = A_jt + S_jt + R_jt

where U_jt = level of unemployment in region j at time t and R_jt = regional cyclical component. If a_j = 1 and C_j = b_j = d_j = R_jt = 0, then no regional unemployment problems exist.

In theory, these models enable areas to be classified on the basis of their performance with respect to the three components, and there are applications in regional planning and forecasting. However, the models assume that macroeconomic conditions are stable over a period of time, which does not hold in any period of economic instability. To allow for this, more complex rules can be created and added to the basic equations, for example using techniques that model directly the changing statistical relationships in a time series. Nevertheless, the lack of stability in interregional lead–lag relations from one economic cycle to another can make forecasting very difficult. Hence, geographers have tended to use predictive models in other applications, notably in forecasting epidemic waves for diseases (see below).
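Under the same caveats, a minimal sketch of the three-component fit for one region is possible: regress the regional rate on the national rate and a quadratic time trend, and treat the residual as the regional cyclical component R_jt. All series below are synthetic.

```python
# Sketch: fitting a_j (cyclical sensitivity) and C_j, b_j, d_j (quadratic
# structural trend) for one region; the residual is R_jt. Synthetic data only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
T = 120
t = np.arange(T)
U = 5 + np.sin(t / 6) + rng.normal(0, 0.1, T)             # toy national rate
U_j = 1.2 * U + 0.5 + 0.01 * t + rng.normal(0, 0.2, T)    # toy regional rate

X = sm.add_constant(np.column_stack([U, t, t**2]))        # [1, U_t, t, t^2]
res = sm.OLS(U_j, X).fit()
C_j, a_j, b_j, d_j = res.params
R_jt = res.resid                        # regional cyclical component
print(a_j, C_j, b_j, d_j)               # sensitivity and trend coefficients
```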


URL: https://www.sciencedirect.com/science/article/pii/B9780081022955106146

Combining Multiple Signals for Biosurveillance

Andrew W. Moore, ... Weng-Keen Wong, in Handbook of Biosurveillance, 2006

3 COMBINING MULTIPLE TIME SERIES USING REGRESSION ANALYSIS

Multiple regression is a statistical technique that can be used to analyze the relationship between a single dependent variable and several independent variables. The objective of multiple regression analysis is to use the independent variables, whose values are known, to predict the value of the single dependent variable. Each predictor is weighted, the weights denoting its relative contribution to the overall prediction.

(1) Y = a + b_1 X_1 + b_2 X_2 + … + b_n X_n

Here Y is the dependent variable, and X_1, …, X_n are the n independent variables. In calculating the weights a, b_1, …, b_n, regression analysis ensures maximal prediction of the dependent variable from the set of independent variables. This is usually done by least squares estimation.

This approach can be applied to analyze multivariate time series data when one of the variables is dependent on a set of other variables. We can model the dependent variable Y on the set of independent variables. At any time instant when we are given the values of the independent variables, we can predict the value of Y from Eq. 1.
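A minimal sketch of this use of Eq. 1, assuming synthetic surveillance-like series and solving the least squares problem directly with NumPy:

```python
# Sketch: predict one series from several others via Eq. 1, solved by
# ordinary least squares. All series are synthetic.
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(365, 3))       # e.g., three independent daily signals
y = X @ np.array([0.4, 0.3, 0.2]) + 1.0 + rng.normal(0, 0.1, 365)

A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # weights a, b1, b2, b3
print(coef)                                   # recovers intercept and weights
y_hat = A[-1] @ coef                          # in-sample prediction for day T
```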

In time series analysis, it is possible to do regression analysis against a set of past values of the variables. This is known as autoregression (AR). Let us consider n variables. We have a time series corresponding to each variable. At time t, the vector Zt represents the values of the n variables. The general autoregressive model assumes that Zt can be represented as:

(2) Z_t = A_1 Z_{t−1} + A_2 Z_{t−2} + … + A_p Z_{t−p} + E_t

where each A_i (an n × n matrix) is an autoregression coefficient, Z_t is the column vector of length n denoting the values of the time series variables at time t, and p is the order of the filter, which is generally much less than the length of the series. The noise term, or residual, E_t, is almost always assumed to be Gaussian white noise.

In a more general case, we can consider the values Z_{t−1}, Z_{t−2}, …, Z_{t−p} to be themselves noisy. Adding the noise terms to Eq. 2, we get the ARMA (autoregressive moving average) equation:

(3) Z_t = A_1 Z_{t−1} + A_2 Z_{t−2} + … + A_p Z_{t−p} + E_t − B_1 E_{t−1} − B_2 E_{t−2} − … − B_p E_{t−p}

Here B_1, …, B_p (each an n × n matrix) are the MA coefficients. These coefficients can be determined using the standard Box–Jenkins methodology (Box et al., 1994).

AR and ARMA provide a convenient way to predict what should be happening to all of the time series simultaneously, and they allow one time series to draw on the others to increase its prediction accuracy.
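Because Eq. 2 is a vector autoregression (VAR), the model can be sketched with the statsmodels VAR class; the two series below are synthetic stand-ins for surveillance signals.

```python
# Sketch: the multivariate AR model of Eq. 2 fitted as a VAR.
# The A1 matrix and both series are synthetic.
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(8)
n = 300
e = rng.normal(size=(n, 2))
Z = np.zeros((n, 2))
A1 = np.array([[0.5, 0.2], [0.1, 0.4]])
for t in range(1, n):                    # Z_t = A1 Z_{t-1} + E_t
    Z[t] = A1 @ Z[t - 1] + e[t]

res = VAR(Z).fit(maxlags=2)
print(res.coefs[0])                      # estimated A1 (an n x n matrix)
print(res.forecast(Z[-res.k_ar:], steps=3))  # joint forecast of both series
```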


URL: https://www.sciencedirect.com/science/article/pii/B978012369378550017X

Probability Models

R. Flowerdew, in International Encyclopedia of Human Geography, 2009

Normal Probability Models

Most of the body of statistical theory adopted by geographers in the 1960s was based on the normal distribution, which was the basis of the standard forms of correlation and regression analysis. A standard (Ordinary Least Squares or OLS) multiple regression analysis (perhaps the most widely used statistical technique) can be regarded as fitting a series of probability models. The response (or dependent) variable Y is regarded as having a normal distribution whose mean is a linear combination of certain explanatory (or independent) variables X1, X2, etc. (and whose variance is the same for each case in the data set). A series of models is fitted, usually starting with one explanatory variable and step by step incorporating other explanatory variables. A model can be stated:

Y_i = β_0 + β_1 X_{1i} + ε_i

where Yi is the value of the response variable for case i, X1i is the value of explanatory variable X1 for case i, β0 and β1 are parameters calculated in the model to optimize goodness of fit, and εi is an error term whose value is regarded as being taken from a normal distribution with mean zero and standard deviation constant for all cases. For example, Yi might be life expectancy in country i and X1i might be country i’s gross national product (GNP). If the model is fitted to appropriate data, it is possible to assess whether GNP could have an effect on life expectancy, to estimate the direction and approximate size of the effect, and the overall success of the model in estimating life expectancy across the data set. Note that GNP does not tell you exactly what life expectancy is in any country; it just gives an indication of the probable range of values, subject to influence from other factors not in the model, including random variation.
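A sketch of fitting this single-predictor model, with invented numbers standing in for life expectancy and GNP (illustrative only; no real country data):

```python
# Sketch: the probability model Y_i = b0 + b1*X_1i + e_i fitted by OLS.
# "GNP" and "life expectancy" values are synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
gnp = rng.uniform(1, 50, 80)                    # toy GNP per capita
life = 55 + 0.4 * gnp + rng.normal(0, 3, 80)    # toy life expectancy

res = sm.OLS(life, sm.add_constant(gnp)).fit()
print(res.params)      # b0, b1: direction and approximate size of the effect
print(res.rsquared)    # goodness of fit (R^2)
```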

Having fitted a probability model which states that life expectancy follows a normal distribution around a mean equal to a linear function of GNP, further models can be fitted using other variables which may be related to life expectancy. If X2i is the literacy rate of country i, X3i is the level of urbanization in country i, and so on, it is possible to fit any of a set of models, differing in which of the available X variables are used, for calculating the mean of the normal distribution. Models can be assessed using a goodness of fit statistic (R2, the coefficient of determination, is commonly used in OLS regression). This gives some insight into whether the model could reasonably be supposed to have generated the data and which of the X variables have the greatest influence on Y (controlling for the other X variables in the model).

Statistical significance tests can be used to evaluate both the model as a whole and each of the X variables. It is customary to use type 1 error (the probability of rejecting a null hypothesis that is true) as a criterion for deciding whether a particular X variable is significant or not in modeling the distribution of the Y variable. By convention, this value is set at 0.05 (sometimes 0.01) and there has been controversy over the implications of this rather arbitrary choice.

Interpretation of goodness of fit is not always very straightforward. Even a very high R2 value does not mean that the model reveals the actual process generating the data for Y; the X variables and Y may both be influenced by some other factor. Nor does a very poor R2 mean that the X variables have no effect on Y – a real effect may be masked by failure to include a very important X variable, or by a very large error term. Or, of course, there may be a strong but nonlinear relationship.


URL: https://www.sciencedirect.com/science/article/pii/B9780080449104011159

Path Analysis

Christy Lleras, in Encyclopedia of Social Measurement, 2005

Path Analytic Methods

One of the primary goals of social scientific research is to understand social systems through the explication of causal relationships. However, given the complexity of social life, disentangling the interrelationships among variables is often a difficult task. Path analysis is a methodological tool that helps researchers using quantitative (correlational) data to disentangle the various (causal) processes underlying a particular outcome. The path analytic method is an extension of multiple regression analysis and estimates the magnitude and strength of effects within a hypothesized causal system. In addition, path analysis can be used to test the fit between two or more causal models, which are hypothesized by the researcher to fit the data.

Since path analysis assesses the comparative strength of different effects on an outcome, the relationships between variables in the path model are expressed in terms of correlations and represent hypotheses proposed by the researcher. Therefore, the relationships or pathways cannot be statistically tested for directionality and the models themselves cannot prove causation. However, path models do reflect theories about causation and can inform the researcher as to which hypothesized causal model best fits the pattern of correlations found within the data set. One of the advantages of using path analysis is that it forces researchers to explicitly specify how the variables relate to one another and thus encourages the development of clear and logical theories about the processes influencing a particular outcome. Path analysis is also advantageous in that it allows researchers to break apart or decompose the various factors affecting an outcome into direct effects and indirect components.
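As a minimal sketch of such a decomposition, assume a hypothetical path model X → M → Y: two OLS regressions on standardized variables yield the path coefficients, and the indirect effect of X on Y is the product of the coefficients along the path.

```python
# Sketch: decomposing a total effect into direct and indirect components
# in a toy path model X -> M -> Y. All data are synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 500
x = rng.normal(size=n)
m = 0.6 * x + rng.normal(size=n)             # mediator depends on x
y = 0.4 * m + 0.2 * x + rng.normal(size=n)   # outcome depends on m and x

z = lambda v: (v - v.mean()) / v.std()       # standardize: path coefficients
p_xm = sm.OLS(z(m), sm.add_constant(z(x))).fit().params[1]
res = sm.OLS(z(y), sm.add_constant(np.column_stack([z(x), z(m)]))).fit()
p_xy, p_my = res.params[1], res.params[2]
print("direct:", p_xy)                       # x -> y holding m constant
print("indirect:", p_xm * p_my)              # x -> m -> y
print("total:", p_xy + p_xm * p_my)
```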


URL: https://www.sciencedirect.com/science/article/pii/B0123693985004837

Racial Bias—McCleskey v. Kemp (1987)

Barry Latzer, David McCord, in Death Penalty Cases (Third Edition), 2011

II

McCleskey's first claim is that the Georgia capital punishment statute violates the Equal Protection Clause of the Fourteenth Amendment.7 He argues that race has infected the administration of Georgia's statute in two ways: persons who murder whites are more likely to be sentenced to death than persons who murder blacks, and black murderers are more likely to be sentenced to death than white murderers. As a black defendant who killed a white victim, McCleskey claims that the Baldus study demonstrates that he was discriminated against because of his race and because of the race of his victim. In its broadest form, McCleskey's claim of discrimination extends to every actor in the Georgia capital sentencing process, from the prosecutor who sought the death penalty and the jury that imposed the sentence, to the State itself that enacted the capital punishment statute and allows it to remain in effect despite its allegedly discriminatory application. We agree with the Court of Appeals, and every other court that has considered such a challenge, that this claim must fail.


URL: https://www.sciencedirect.com/science/article/pii/B9780123820242000067

Which forecasting method examines the association between two or more variables?

The regression method of forecasting involves examining the relationship between two different variables, known as the dependent and independent variables.

Which model is based on correlation between two or more variables?

The multiple regression model is based on the following assumptions: there is a linear relationship between the dependent variable and the independent variables, and the independent variables are not too highly correlated with each other.

What is the regression forecasting method?

Linear regression is a statistical tool used to help predict future values from past values. It is commonly used as a quantitative way to determine the underlying trend and when prices are overextended.

What is the variable used to predict another variable called?

❖ The variable that researchers are trying to explain or predict is called the response variable. It is also sometimes called the dependent variable because it depends on another variable. ❖ The variable that is used to explain or predict the response variable is called the explanatory variable.