In a multiple regression model, what are the characteristics of a good predictor variable?

Decision strategies and bad group decision-making: a group meeting experiment

Kazuhisa Takemura, in Escaping from Bad Decisions, 2021

11.3.5.3 Multiple regression analysis of discussion evaluation

Multiple regression analysis was conducted to examine the effects of three factors (decision-making strategy, the group to which participants belonged, and type of agenda) on individuals’ evaluation of the discussion process, evaluation of the discussion results, and overall satisfaction with the discussion. Specifically, the evaluation of the discussion process, the evaluation of the discussion results, and the overall satisfaction with the discussion served as the dependent variables, and the decision-making strategy, the group to which the participants belonged, and the type of agenda served as the independent variables. The models were evaluated by the Akaike information criterion (AIC). The following 11 models were examined:

Model 1: intercept only.
Model 2: intercept and decision strategy.
Model 3: intercept and participant's group.
Model 4: intercept and agenda type.
Model 5: intercept, decision strategy, and participant's group.
Model 6: intercept, decision strategy, and agenda type.
Model 7: intercept, participant's group, and agenda type.
Model 8: intercept, decision strategy, participant's group, and agenda type.
Model 9: intercept, decision strategy, participant's group, and the interaction between decision strategy and participant's group.
Model 10: intercept, decision strategy, agenda type, and the interaction between decision strategy and agenda type.
Model 11: intercept, participant's group, agenda type, and the interaction between participant's group and agenda type.

We compared the partial regression coefficients of the decision strategy in the model with the lowest AIC among the six models that included the decision strategy.
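The AIC-based comparison of candidate models described above can be sketched in Python. This is a minimal illustration on synthetic data, not the author's analysis: the variable names (`strategy`, `group`), the helper `ols_aic`, and the Gaussian form of AIC, n·ln(RSS/n) + 2k with additive constants dropped, are assumptions made for the example.

```python
import numpy as np

def ols_aic(X, y):
    """Fit OLS by least squares and return AIC = n*ln(RSS/n) + 2k (constants dropped)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k

# Synthetic data (hypothetical): two dummy-coded factors, outcome depends on group only.
rng = np.random.default_rng(0)
n = 300
strategy = rng.integers(0, 2, n).astype(float)
group = rng.integers(0, 2, n).astype(float)
y = 1.0 + 0.8 * group + rng.normal(0, 1, n)

ones = np.ones(n)
models = {
    "intercept only": np.column_stack([ones]),
    "intercept + strategy": np.column_stack([ones, strategy]),
    "intercept + group": np.column_stack([ones, group]),
    "intercept + strategy + group": np.column_stack([ones, strategy, group]),
    "intercept + strategy + group + strategy x group": np.column_stack(
        [ones, strategy, group, strategy * group]
    ),
}

# Lower AIC indicates a better trade-off between fit and number of parameters.
aics = {name: ols_aic(X, y) for name, X in models.items()}
ranking = sorted(aics, key=aics.get)
for name in ranking:
    print(f"{aics[name]:10.2f}  {name}")
```

Because AIC adds a penalty of 2 per estimated parameter, a more complex model ranks higher only if it reduces the residual sum of squares enough to offset that penalty.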

Multiple regression analysis was conducted to examine the influence of the three factors (decision-making strategy, the group to which the participants belonged, and the type of agenda) on the evaluation of the discussion process. When the models were ranked by AIC, the model with the lowest AIC was the one that predicted the evaluation of the discussion process from the decision strategy, the group to which the participants belonged, and the interaction between the decision strategy and the group (model number 9), with an AIC of 2767.89. In other words, among the 11 models examined, this model can be judged to have the highest predictive ability.

The partial regression coefficients for the decision strategies in this model are shown in Table 11.4. With DIS as the reference, the partial regression coefficients were −0.07 for WAD and −0.02 for LEX. Neither WAD nor LEX differed significantly from DIS [WAD: t(1140)=−0.63, n.s.; LEX: t(1140)=−0.18, n.s.]. This suggests that ratings of the discussion process did not differ between WAD and DIS or between LEX and DIS.

Multiple regression analysis was conducted to examine the influence of the three factors (decision-making strategy, the group to which the participants belonged, and the type of agenda) on the evaluation of the outcome of the discussion. The model with the lowest AIC predicted the evaluation of the discussion outcome from the decision-making strategy, the group to which the participants belonged, and the interaction between the decision strategy and the group (model number 9), with an AIC of 3439.48. In other words, among the 11 models examined, this model can be judged to have the highest predictive ability.

The partial regression coefficients for the decision strategies in this model are shown in Table 11.4. With DIS as the reference, the partial regression coefficients were −0.19 for WAD and −0.06 for LEX. Neither WAD nor LEX differed significantly from DIS [WAD: t(1140)=−1.21, n.s.; LEX: t(1140)=−0.41, n.s.]. This suggests that the evaluation of the results of the discussion did not differ between WAD and DIS or between LEX and DIS.

Analysis of overall satisfaction for discussion

Multiple regression analysis was conducted to examine the impact of the three factors (decision-making strategy, the group to which the participants belonged, and the type of agenda) on overall satisfaction with the discussion. When the models were ranked by AIC, the model with the lowest AIC predicted overall satisfaction from the decision strategy, the group to which the participants belonged, and the interaction between the decision strategy and the group (model number 9), with an AIC of 3096.21. In other words, among the 11 models examined, this model can be judged to have the highest predictive ability.

The partial regression coefficients of the decision strategies in this model are shown in Table 11.4. With DIS as the reference, the partial regression coefficients were −0.11 for WAD and −0.04 for LEX [LEX: t(1140)=−0.31, n.s.]. This suggests that overall satisfaction with the discussion did not differ between WAD and DIS or between LEX and DIS.

Table 11.4. Akaike information criterion rank and partial regression coefficients for decision strategy, group, and agenda.

Item | Model number | Model | DIS | WAD | LEX
Process evaluation | 9 | Intercept + decision strategy + group + decision strategy × group | 1.00 | 0.07 | 0.02
Outcome evaluation | 9 | Intercept + decision strategy + group + decision strategy × group | 1.00 | 0.19 | 0.06
Satisfaction | 9 | Intercept + decision strategy + group + decision strategy × group | 1.00 | 0.11 | 0.04

Notes: Group: group to which participants belonged; decision strategy × group: interaction between decision strategy and group; process evaluation: evaluation of the discussion process; outcome evaluation: evaluation of the outcome; satisfaction: satisfaction level with the discussion. DIS, disjunctive; LEX, lexicographic; WAD, weighted-additive decision.

URL: https://www.sciencedirect.com/science/article/pii/B9780128160329000090

Results, Discussion, and Conclusion

Katerina Petchko, in How to Write About Economics and Public Policy, 2018

Results of Multiple Regression Analysis (MRA)

Recall that MRA is a statistical procedure that assesses the relationship between a dependent variable and several predictor variables. The estimates generated by MRA are called coefficients. Using MRA, we can calculate the amount of variance in the dependent variable that is accounted for (= explained) by the variation in each of the independent variables. This calculation shows the relative importance of each independent variable to the relationship.

It is beyond the scope of this book to provide a detailed treatment of MRA as a statistical technique. For a basic interpretation of MRA results in economics, consult Greenlaw (2009). For advanced information on MRA and other statistical techniques, you may wish to consult Tabachnick and Fidell’s Using Multivariate Statistics.

In an MRA study, the following information generated by regression software is usually reported.

The size and sign of regression coefficients. The size of regression coefficients shows how much each predictor variable contributes on its own to the variance in the dependent variable after the effects of all the other predictor variables in the model have been statistically removed. In their standardized form (as β), regression coefficients are a measure of the importance of each variable, allowing researchers to compare the relative importance of the predictors. In economics and public policy, the sign of regression coefficients is also important and it is discussed in comparison with the expected (or hypothesized) sign predicted from theory: Do the explanatory variables have the expected sign?
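The difference between raw and standardized (β) coefficients can be illustrated with a short Python sketch. The data, the variable names (`income`, `schooling`), and the helper functions are hypothetical; the sketch assumes the usual definition in which a standardized coefficient equals the raw slope multiplied by the ratio of the predictor's standard deviation to the outcome's standard deviation.

```python
import numpy as np

def ols_coefficients(X, y):
    """Return the OLS slopes (intercept excluded) of y regressed on the columns of X."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

def standardized_coefficients(X, y):
    """Rescale raw slopes to standard-deviation units: beta_j = b_j * sd(x_j) / sd(y)."""
    b = ols_coefficients(X, y)
    return b * X.std(axis=0, ddof=1) / y.std(ddof=1)

# Synthetic predictors measured on very different scales (hypothetical example).
rng = np.random.default_rng(1)
income = rng.normal(50_000, 10_000, 500)
schooling = rng.normal(12, 3, 500)
X = np.column_stack([income, schooling])
y = 0.0001 * income + 0.5 * schooling + rng.normal(0, 1, 500)

raw = ols_coefficients(X, y)            # depends on measurement units
beta = standardized_coefficients(X, y)  # comparable across predictors
print("raw slopes:         ", raw)
print("standardized slopes:", beta)
```

The raw slope for `income` is tiny only because income is measured in large units; the standardized slopes put both predictors on a common scale, which is what makes comparing their relative importance meaningful.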

Statistical significance for each estimated coefficient, which is determined by comparing the p-value (or significance probability) associated with a coefficient with the chosen level of significance. If the p-value is smaller, the coefficient is interpreted as being statistically significant; if it is greater, the coefficient is interpreted as being nonsignificant, or as not being significant. There are many variations in the reporting and interpretation of null hypothesis significance testing in public policy and economics. For example, in economics, three significance levels are commonly used: 1%, 5%, and 10% and results are often described as being “statistically significant at the 1% (or 5%, or 10%) significance level.” The 10% significance level is uncommon in other disciplines, for example, in sociology or education, where results with p-values that are greater than .05 (5%) are interpreted as being nonsignificant.

Alternatively, when reporting statistical significance, researchers may simply indicate whether the generated p-values are smaller than the level of significance. In this case, authors mark statistically significant values with asterisks, typically a single asterisk (⁎) for p < .1, a double asterisk (⁎⁎) for p < .05, and a triple asterisk (⁎⁎⁎) for p < .01, and use a note under the table to show what the asterisks refer to.

In some research areas, authors may provide the exact p-values (e.g., p = .58). Providing the exact p-values is especially common in psychological and educational research, but it is fairly uncommon in economics. In some areas, confidence intervals are commonly used to indicate significance levels.

Because of the great variability among disciplines in reporting statistical significance, it is important to find out what is common in the particular area you are working in and report statistical results using the conventions of your field.

“Goodness-of-fit” statistics. These statistics show how well the model you are testing explains the data: How much variance in the dependent variable is explained by the combination of the predictors? The F-statistic is used to determine whether the coefficients in the model are jointly statistically significant, whereas R² (or adjusted R²) is used to determine the overall amount of variance in the dependent variable that is explained by all the predictor variables in combination.
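For reference, the standard textbook definitions of these goodness-of-fit statistics can be written as follows. The notation is an assumption introduced here, not taken from the chapter: n is the number of observations, k the number of predictors (excluding the intercept), y_i the observed values, ŷ_i the fitted values, and ȳ the sample mean of the dependent variable.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1},
\qquad
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
```

Under these definitions, the F-statistic tests the joint null hypothesis that all k slope coefficients are zero, and adjusted R² penalizes R² for each additional predictor.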

Greenlaw (2009, p. 217) gives good advice for interpreting R²: “R² for cross-section data is generally less than R² for time-series data. Econometricians typically consider a time-series regression to be ‘good’ if it results in an R² of 0.8 or higher. By contrast, a cross-section regression is considered ‘good’ if it has an R² of only half that: 0.4 or above.”

Regression results are always presented in table form. A typical regression table includes the following information: regression coefficients, standard errors (in parentheses), statistics indicating significance, and goodness-of-fit statistics. It is important to stress here that regression tables that are included in a paper are always constructed and never copied directly from regression output provided by the regression software. Later in this section, I give suggestions for formatting tables in a quantitative study.

URL: https://www.sciencedirect.com/science/article/pii/B9780128130100000144

Projections and Risk Assessment

Morton Glantz, Johnathan Mun, in Credit Engineering for Bankers (Second Edition), 2011

11 Multiple Regression

To run the multiple regression analysis, follow these steps:

1. Start Excel and open the example model Risk Simulator | Example Models | 01 Advanced Forecast Models.

2. Go to the Regression worksheet.

3. Select the data area, including the headers (cells B5:G55), and click on Risk Simulator | Forecasting | Regression Analysis. Select the Dependent Variable as the variable Y, leave everything else alone, and click OK. Review the generated report.

Exercise Question: Which of the independent variables are statistically insignificant, and how can you tell? That is, which statistic did you use?

Exercise Question: How good is the initial model’s fit?

Exercise Question: Delete the entire variable columns of data that are insignificant and rerun the regression (i.e., select the column headers in Excel’s grid, right-click, and delete). Compare the R-Square and Adjusted R-Square values for both regressions. What can you determine?

Exercise Question: Will R-square always increase when you have more independent variables, regardless of their being statistically significant? How about adjusted R-square? Which is a more conservative and appropriate goodness-of-fit measure?

Exercise Question: What can you do to increase the adjusted R-Square of this model? Hint: Consider nonlinearity and some other econometric modeling techniques.

Exercise Question: Run an Auto-Econometric model on this dataset and select the nonlinear and interacting option and see what happens. Does the generated model better fit the data?
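The exercise on whether R-square always increases with more independent variables can also be explored outside Excel. The following Python sketch is one possible way to do so on synthetic data; it does not use Risk Simulator or the example workbook, and the helper names and data are assumptions for illustration.

```python
import numpy as np

def r_squared(X, y):
    """R-square of an OLS fit of y on the columns of X (intercept added here)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    ss_res = np.sum((y - design @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Adjusted R-square for n observations and k predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Synthetic data: y depends on x1 only; noise_var is unrelated to y by construction.
rng = np.random.default_rng(7)
n = 50
x1 = rng.normal(size=n)
noise_var = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

r2_base = r_squared(x1.reshape(-1, 1), y)
r2_more = r_squared(np.column_stack([x1, noise_var]), y)
adj_base = adjusted_r_squared(r2_base, n, 1)
adj_more = adjusted_r_squared(r2_more, n, 2)

print(f"R-square:          {r2_base:.4f} -> {r2_more:.4f}")
print(f"Adjusted R-square: {adj_base:.4f} -> {adj_more:.4f}")
```

With least squares, adding a predictor can never lower R-square, whereas adjusted R-square rises only if the added variable improves the fit by more than the degrees-of-freedom penalty.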

URL: https://www.sciencedirect.com/science/article/pii/B9780123785855100089

Research Proposals

Katerina Petchko, in How to Write About Economics and Public Policy, 2018

Methodology

A quantitative paper states a hypothesis and tests it using statistical tools (e.g., multiple regression analysis) to produce generalizable results. If you are writing a quantitative proposal, explain what kind of data you will use and where the data will come from. Explain your empirical methodology: what model you will use, what variables you will include, and how you will measure them.

A qualitative empirical paper explores a phenomenon or a process using multiple sources of information including in-depth interviews, documents, and observation. If you are writing a proposal for a qualitative study, explain why and/or how you selected your case(s), what data you plan to use, and how you will collect the data.

URL: https://www.sciencedirect.com/science/article/pii/B9780128130100000077

Research in Public Policy and Economics

Katerina Petchko, in How to Write About Economics and Public Policy, 2018

Which Approach Is Prevalent in Public Policy Programs?

Since the 1970s, public policy literature has been characterized by the pervasive use of quantitative methods of data collection and analysis including survey research, quasiexperimental research, multiple regression analysis, cost-benefit analysis, and economic modeling. Although public policy research became more diversified in the 1990s (Radin, 2000) and began to include qualitative studies, quantitative research remains prevalent in public policy and, especially, policy analysis, both in journal publications and in educational curricula. For example, in a review of educational curricula of 44 programs in public policy and policy analysis taught at leading public policy schools in the United States, Morçöl and Ivanova (2010) found that quantitative courses constituted an overwhelming majority of courses taught at both the master's (88%) and doctoral (79%) levels. They also found that the most frequently taught method of data collection was survey and the most frequently taught method of data analysis was multiple regression analysis. This emphasis on quantitative methods is also reflected in the predominantly quantitative types of papers that students in public policy programs are often required to write.

URL: https://www.sciencedirect.com/science/article/pii/B9780128130100000028

Data and Methodology

Katerina Petchko, in How to Write About Economics and Public Policy, 2018

Quantitative vs. Qualitative Data Analysis

There are many different strategies and techniques for data analysis in public policy and economics; some are more common in a particular research area than others. The choice of the particular strategy for data analysis is dictated largely by the purpose of your study—by its research question—and by the type of data that are used.

Quantitative research questions usually ask about relationships among multiple variables, and data are usually observational rather than experimental. By far, the most common tool used to analyze such data is multiple regression analysis. Multiple regression analysis allows researchers to assess the strength of the relationship between an outcome (the dependent variable) and several predictor variables as well as the importance of each of the predictors to the relationship, often with the effect of other predictors statistically eliminated.

It is important to point out, however, that multiple regression analysis is a statistical technique, not a research design, and as such, it does not establish causation. This is because multiple regression builds on correlation, which shows mere associations between variables. To infer a causal relationship, researchers need to eliminate bias resulting, for example, from variables that cannot be observed. This can be done by design, through experimental manipulation of variables, or by using statistical controls. The second option is much more common in studies of public policy and economics. Various approaches can be used to minimize bias due to reverse causality and omitted variables. Panel regression with fixed effects is one example of a commonly used approach in economics research. However, panel regression requires the use of panel data, which may not always be available, and panel methods, too, have limitations. It is, therefore, wise to keep in mind when interpreting results that even under the best of circumstances, statistical controls are never foolproof.
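The bias from an omitted variable, and the way a statistical control can remove it when the variable is observed, can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are synthetic, the true coefficients (2.0 for `x`, 3.0 for the confounder `z`) are chosen arbitrarily, and the helper `ols` is hypothetical.

```python
import numpy as np

def ols(columns, y):
    """Return OLS coefficients [intercept, slopes...] for the given predictor columns."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(42)
n = 2000
z = rng.normal(size=n)              # confounder that drives both x and y
x = z + rng.normal(size=n)          # predictor of interest, correlated with z
y = 2.0 * x + 3.0 * z + rng.normal(size=n)

short = ols([x], y)      # z omitted: the slope on x absorbs part of z's effect
full = ols([x, z], y)    # z included as a statistical control

print(f"slope on x, z omitted:    {short[1]:.2f}")
print(f"slope on x, z controlled: {full[1]:.2f}")
```

In this setup the slope on `x` is overstated when `z` is left out and moves close to its true value once `z` is included. The control works here only because the confounder is observed, which is exactly what cannot be assumed with unobservable variables.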

Qualitative data analysis in public policy depends on whether the study is data-based or literature-based. In data-based studies (e.g., studies based on data collection through interviews, focus group discussions, or participant observation), data analysis involves transcribing and coding participants’ responses and/or the researcher’s notes by identifying certain themes or patterns in the data that help answer the research question(s). In many ways, qualitative data analysis is an attempt to reduce a very large amount of qualitative data—participants’ responses and comments—to a few themes. For example, if your study has looked at how poor women in rural areas cope with violence, you may want to analyze the women’s responses to identify the strategies that they have used. You would have to make many subjective decisions about what the women’s responses really mean and you would need to be very clear about how you made those decisions. Using multiple sources of data (e.g., interviews + documents + observation) in a qualitative study is one strategy to reduce subjectivity.

In literature-based studies, there are usually no data and a paper may be based solely on summarizing an often arbitrary selection of studies or other documents. In such studies, it is common for authors to explain how the literature was located, how the specific studies and documents were selected, and how they help answer the research question(s).

URL: https://www.sciencedirect.com/science/article/pii/B9780128130100000132

Hypothesis Testing

Laura Lee Johnson, ... Pamela A. Shaw, in Principles and Practice of Clinical Research (Fourth Edition), 2018

Multiple Comparisons

When making many statistical comparisons, i.e., performing multiple hypothesis tests, a certain fraction of the test statistics will be statistically significant even when the null hypothesis is true. In general, when a series of tests is performed at the α significance level, approximately α × 100% of tests will be significant at the α level even when the null hypothesis for each test is true. For example, even if the null hypotheses are true for all tests, when conducting many independent hypothesis tests at the 0.05 significance level, on average (in the long term) 5 of 100 tests will be significant by chance alone. Issues of multiple comparisons arise in various situations, such as in clinical trials with multiple end points and multiple looks at the data. By doing multiple tests, you naturally increase your chances of making a type I error if no adjustment is made to the usual testing framework for a single test statistic. Pairwise comparison among the sample means of several groups is also an area in which issues of multiple comparisons may be of concern. For k groups, there are k(k – 1)/2 pairwise comparisons, and just by chance some may reach significance. Our last example is with multiple regression analysis in which many candidate predictor variables are tested and entered into the model. Some of these variables may result in a significant result just by chance. With an ongoing study and many interim analyses or inspections of the data, with no adjustment for performing multiple comparisons, we have a high chance of rejecting the null hypothesis at some time point even when the null hypothesis is true.

There are various approaches to the multiple comparisons problem. First, consider whether multiple comparisons are actually a problem. If we ask multiple questions we expect multiple answers. If we ask related questions we expect related answers. Looking at the totality of the evidence when interpreting results is far more useful than overzealous correction for multiple comparisons (or ignoring all but the single significant p-value out of 50). One rather informal approach to multiple comparisons is to choose a significance level α lower than the traditional 0.05 level (e.g., 0.01) to prevent many false-positive conclusions or to “control the false discovery rate.” The number of comparisons should be made explicit in the article. More formal approaches to control the “experiment-wise” type I error using corrections for multiple comparisons have been proposed. An example is the Bonferroni correction, in which each individual test is performed at the α/n significance level, where n is the number of comparisons made. Another class of methods has been developed to correct for multiple comparisons that result from monitoring trial results during the trial. Interim monitoring methods that control the type I error rate are available for various study designs and are discussed further in Chapter 27 [14]. The classic reference by Hochberg and Tamhane provides a broader discussion of methodology to adjust for multiple comparisons [15].
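The arithmetic behind these points can be sketched in a few lines of Python. The function names and the example p-values are hypothetical, and the family-wise error formula 1 − (1 − α)^m assumes independent tests with every null hypothesis true.

```python
def pairwise_comparisons(k):
    """Number of pairwise comparisons among k groups: k(k - 1)/2."""
    return k * (k - 1) // 2

def familywise_error_rate(alpha, m):
    """Chance of at least one false positive in m independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** m

def bonferroni_reject(p_values, alpha=0.05):
    """Reject each null hypothesis only if its p-value is at most alpha / n."""
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

# Hypothetical p-values from five comparisons.
p_values = [0.001, 0.012, 0.030, 0.049, 0.200]
decisions = bonferroni_reject(p_values)

print(pairwise_comparisons(5))                    # comparisons among 5 groups
print(round(familywise_error_rate(0.05, 10), 4))  # ten unadjusted tests at 0.05
print(decisions)
```

Four of the five example p-values fall below 0.05, yet only the first survives the Bonferroni threshold of 0.05/5 = 0.01, which illustrates how conservative the correction can be.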

It is best to address the issue of multiple comparisons during the design stage of a study. One should determine how many comparisons will be made and then explicitly state these comparisons. Studies should generally be designed to minimize the number of statistical tests at the end of the study. Ad hoc solutions to the multiple comparisons problem may be done for exploratory or epidemiologic studies. Multiple comparison adjustments should be made for the primary analyses of definitive studies (such as phase III confirmatory studies) to rigorously maintain the type I error rate, i.e., the probability of falsely rejecting any null among those tested, at the chosen α level. Studies that focus on a single primary outcome and data analyzed at the end of study avoid the issue of multiple comparisons. The topic of multiple comparisons is expanded in Chapter 27.

URL: https://www.sciencedirect.com/science/article/pii/B9780128499054000241

Revisiting the group decision-making experiment

Kazuhisa Takemura, in Escaping from Bad Decisions, 2021

13.4 Results

To examine the factors that cause irrational meetings, we created eight videos of meeting scenes and conducted an experiment. Then, analyses were conducted to examine the effects of the outcome of the meeting decision, the emphasis of the risk, and the compliance with the rules on the desirability of the meeting decision and process. Specifically, comparison of means, analysis of variance, analysis of covariance, and analysis of correlation were conducted.

13.4.1 Analysis of the desirability of a meeting decision

First, we calculated the means of the desirability of the decision in each video. In Fig. 13.6, Item 1 (leftmost) indicates the low risk and risky outcome (Matsuzaka beef) condition. The other items are as follows: Item 2 for the high risk and risky outcome condition, Item 3 for the low risk and riskless outcome (imported beef) condition, Item 4 for the high risk and riskless outcome condition, Item 5 for the rule noncompliance and risky outcome condition, Item 6 for the rule compliance and risky outcome condition, Item 7 for the rule noncompliance and riskless outcome condition, and Item 8 for the rule compliance and riskless outcome condition. For clarity, when the final decision result was Matsuzaka beef, the bar was labeled “Matsuzaka” and colored black, and when the final decision result was imported beef, the bar was labeled “Import” and colored white.

Figure 13.6. Mean and standard deviation (SD) for the desirability of decision in each condition.

Fig. 13.6 shows the mean and the SD for the desirability of the decision. Regardless of the emphasis of the risk or compliance with the rule, the desirability of the meeting's decision tended to be higher when the outcome of the decision was imported beef. Next, we conducted two analyses of variance, each with two between-subjects factors at two levels: one for risk intensity and decision outcome, and one for rule compliance and decision outcome.

To examine the influence of the degree of risk and the decision of the meeting on the desirability of the decision, we conducted an analysis of variance with two between-subjects factors, each with two levels: risk (high risk vs low risk) and decision outcome (Matsuzaka beef (risky outcome) vs imported beef (riskless outcome)). As a result, a significant main effect was found for the decision outcome in the meeting (F(1,59)=21.25, P<.001). Regardless of the emphasis of the risk, the final decision for imported beef was evaluated as more desirable than that for Matsuzaka beef.

Next, to examine the effects of rule compliance and decision outcome on the desirability of the decision, we conducted an analysis of variance with two between-subjects factors, each with two levels: rule compliance (rule compliance vs rule noncompliance) and decision outcome (Matsuzaka beef vs imported beef). As a result, a significant main effect was found for the decision outcome in the meeting (F(1,57)=4.98, P<.05). Regardless of whether the rule was adhered to or not, the final decision for imported beef was evaluated as more desirable than that for Matsuzaka beef.

Because the experimental design did not allow the three factors to be analyzed together, the multiple regression analysis of the desirability of the meeting's decision used the decision outcome (Matsuzaka beef vs imported beef), risk (high vs low), and compliance with the rule (control vs rule compliance vs rule noncompliance) as independent variables to predict the desirability of the decision. The control condition for the presence or absence of rule compliance was the low risk condition, in which there was no conflict over the rule in the majority vote. The results showed a positive partial regression coefficient for the decision outcome that was significant at the 0.1% level (t=4.81, P<.001). In other words, the outcome of the meeting was evaluated as more desirable when the final decision was made for imported beef. There was a significant negative partial regression coefficient at the 5% level in the noncompliance condition (t=−2.07, P<.05). In other words, the condition in which the decision was overturned due to noncompliance with the rules was rated as significantly less desirable than the control condition of rule compliance.

We conducted an analysis of covariance with the decision outcome as the predictor variable, the desirability of the process in the meeting as the covariate, and the desirability of the decision in the meeting as the dependent variable. The results showed that the desirability of the meeting decision was significantly higher in the condition in which the decision outcome was imported beef than in the condition in which the decision outcome was Matsuzaka beef (t=5.18, P<.001).

Next, we conducted an analysis of covariance with risk intensity as the predictor variable, desirability of the meeting process as the covariate, and desirability of the meeting decision as the dependent variable. The results showed that there was no effect of risk intensity on the desirability of meeting decisions (t=1.30, n.s.).

13.4.2 Analysis of the desirability of the meeting process

Fig. 13.7 shows the mean and SD of the desirability of the meeting process for each video as a bar graph. The numbers of items indicate the same conditions as mentioned in the previous section.

Figure 13.7. Mean and SD for the desirability of meeting process in each condition.

Unlike the desirability of the decision, no difference was found between the Matsuzaka beef and imported beef decision outcomes. In addition, regardless of the final decision result, the evaluation tended to be higher in the rule compliance condition than in the rule noncompliance condition. Next, we conducted two analyses of variance, each with two between-subjects factors at two levels: one for the emphasis of risk and decision outcome, and one for the presence or absence of rule compliance and decision outcome.

To examine the influence of risk emphasis and decision outcome on the desirability of the process, we conducted an analysis of variance with two between-subjects factors, each with two levels: risk (high vs low) and decision outcome (Matsuzaka vs imported beef). The main effects of risk (high vs low) and decision outcome (Matsuzaka vs imported beef) on the desirability of the process in the meeting and their interaction were not significant (F(1,59)=0.02, n.s., F(1,59)=0.40, n.s.).

Next, to examine the influence of rule compliance and decision outcome on the desirability of the process, we conducted an analysis of variance with two between-subjects factors, each with two levels: rule compliance (adherence vs nonadherence) and decision outcome (Matsuzaka beef vs imported beef). As a result, no factor was found to affect the desirability of the process in the meeting.

As in the multiple regression analysis of the desirability of the decision, we conducted a multiple regression analysis to predict the desirability of the process in the meeting from the three factors of decision outcome, risk intensity, and rule compliance. As a result, no independent variable was found to influence the dependent variable (all t < 2.00, n.s.).

An analysis of covariance was conducted with the rule compliance factor as the predictor variable, desirability of meeting decisions as the covariate, and desirability of meeting process as the dependent variable. The results of the analysis of covariance showed that there was no effect of the factors of noncompliance (t=−1.39, n.s.) and compliance (t=0.69, n.s.) on the desirability of the meeting process.

13.4.3 Correlation analysis

Next, for each of the eight conditions, we calculated the correlation coefficient between the desirability of the decision and the desirability of the process in the meeting. As a result, a significant, moderate, positive correlation was found when the decision outcome was Matsuzaka beef in the high risk (risk emphasized) condition (high risk→Matsuzaka; r=0.57, P<.05). A significant, moderate, positive correlation (r=0.66, P<.01) was also found when the decision outcome was imported beef in the low risk condition (low risk→imported). In the rule compliance condition, a significant, moderate, positive correlation was found regardless of the decision outcome (Matsuzaka beef: r=0.53, P<.05; imported beef: r=0.52, P<.05).

URL: https://www.sciencedirect.com/science/article/pii/B9780128160329000041

Types of Variables and Measurement and Accuracy Scales

Luiz Paulo Fávero, Patrícia Belfiore, in Data Science for Business and Decision Making, 2019

2.4 Types of Variables × Number of Categories and Scales of Accuracy

Qualitative or categorical variables can also be classified based on the number of categories: (a) dichotomous or binary (dummies), when they only take on two categories; (b) polychotomous, when they take on more than two categories.

On the other hand, metric or quantitative variables can also be classified based on the scale of accuracy: discrete or continuous.

This classification can be seen in Fig. 2.11.

Fig. 2.11. Qualitative variables × Number of categories and Quantitative variables × Scales of accuracy.

2.4.1 Dichotomous or Binary Variable (Dummy)

A dichotomous or binary variable (dummy) can only take on two categories, and the values 0 or 1 are assigned to these categories. Value 1 is assigned when the characteristic of interest is present in the variable and value 0 if otherwise. As examples, we have: smokers (1) and nonsmokers (0), a developed country (1) and an underdeveloped country (0), vaccinated patients (1) and nonvaccinated patients (0).

Multivariate dependence techniques have as their main objective to specify a model that can explain and predict the behavior of one or more dependent variables through one or more explanatory variables. Many of these techniques, including the simple and multiple regression analysis, binary and multinomial logistic regression, regression for count data, and multilevel modeling, among others, can easily and coherently be applied with the use of nonmetric explanatory variables, as long as they are transformed into binary variables that represent the categories of the original qualitative variable. In this regard, a qualitative variable with n categories, for example, can be represented by (n − 1) binary variables.

For instance, imagine a variable called Evaluation, expressed by the categories good, average, or bad. Thus, two binary variables may be necessary to represent the original variable, depending on the researcher’s objectives, as shown in Table 2.7.

Table 2.7. Defining Binary Variables (Dummies) for the Variable Evaluation

| Evaluation | Binary variable D1 | Binary variable D2 |
|---|---|---|
| Good | 0 | 0 |
| Average | 1 | 0 |
| Bad | 0 | 1 |
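
The coding in Table 2.7 can be sketched in a few lines of Python. This is only an illustration: the helper name `make_dummies` and the sample data are hypothetical, and the first category listed ("good") is assumed to be the reference category, as in the table.

```python
# Hypothetical helper: represent a qualitative variable with n categories
# by (n - 1) binary variables, using the first category as the reference.
def make_dummies(values, categories):
    """Return one row of (n - 1) 0/1 dummies per value; categories[0] is the reference."""
    non_reference = categories[1:]
    return [[1 if value == category else 0 for category in non_reference]
            for value in values]


categories = ["good", "average", "bad"]              # "good" is the reference category
evaluation = ["good", "average", "bad", "average"]   # made-up observations
dummies = make_dummies(evaluation, categories)

for label, (d1, d2) in zip(evaluation, dummies):
    print(f"{label:8s} D1={d1} D2={d2}")
```

The reference category is the one coded 0 on every dummy, so each dummy's regression coefficient is read as a difference from that category.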

Further details about the definition of dummy variables in confirmatory models will be discussed in Chapter 13, including the presentation of the operations necessary to generate them on software such as Stata.

2.4.2 Polychotomous Variable

A qualitative variable can take on more than two categories and, in this case, it is called polychotomous. As examples, we can mention social classes (lower, middle, and upper) and educational levels (elementary school, high school, college, and graduate school).

2.4.3 Discrete Quantitative Variable

As described in Section 2.2.2, discrete quantitative variables can take on a finite set of values that frequently come from a count, such as, for example, the number of children in a family (0, 1, 2…), the number of senators elected, or the number of cars manufactured in a certain factory.

2.4.4 Continuous Quantitative Variable

Continuous quantitative variables, on the other hand, are those whose possible values are in an interval with real numbers and result from a metric measurement, as, for example, weight, height, or an individual’s salary (Bussab and Morettin, 2011).

URL: https://www.sciencedirect.com/science/article/pii/B9780128112168000021

Demographic Transition and Savings Behavior in Mauritius

Rafael Munozmoreno, ... Raja Vinesh Sannassee, in Emerging Markets and the Global Economy, 2014

4.2 Microeconomic Modeling

4.2.1 Survey Data

We use data from the Household Budget Survey (HBS) conducted in two years, 2001/02 and 2006/07. The HBS 2001/02 covers a sample of 6720 households, out of an estimated 300,000 private households in the country. Similarly, the HBS 2006/07 surveys a sample of 6720 households, out of an estimated total of 335,000 households. Each sample was selected to be representative of all households in the country through a stratified two-stage design with probability proportional to size. The survey questionnaire covers household and household member characteristics such as demographics, education, family size, occupation, expenditures, assets, and housing conditions, among others. We use the Ordinary Least Squares estimation technique for our empirical analysis.

4.2.2 Methodology

We use a measure of household saving built on the information on income and expenditure flows provided by the HBS database. We compare the average monthly income of households and their consumption expenditures, and evaluate the part of their income that households can save. In order to identify which factors explain household saving, we estimate different models. A reduced-form approach is adopted, taking into account a variety of saving determinants identified in the literature (Edwards, 1996; Loayza et al., 2000; Schmidt-Hebbel and Servén, 2000). The estimations are undertaken using Ordinary Least Squares after some robustness checks.

Model Specification

Our specification includes as dependent variable savings as a share of income. The econometric equation is as follows:

$$\mathit{lnsavincome}_i = \alpha + \beta_1 HH_i' + \beta_2 X_i' + \varepsilon_i, \tag{2}$$

where the dependent variable $\mathit{lnsavincome}_i$ is the savings behavior of the head of household (that is, the ratio of savings to income), $HH_i'$ denotes a vector of dummies for different types of households, and $X_i'$ is a vector including the characteristics of the household and the profile of the head of household. $\varepsilon_i$ is a random error assumed to be independent and identically distributed. Multiple regression analysis is carried out to find determinants of household savings.

The determinants are: the monthly household income of the household head; the gender of the household head; the age and age-squared of the household head; the household size; the activity status of the household, that is, whether the household is employed, unemployed, self-employed, or retired; and the location of the household, that is, district dummies.
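
A specification of this form can be estimated by Ordinary Least Squares. The following is a minimal sketch, not the authors' estimation code: the function `ols`, the two regressors (a household-type dummy and household size), the "true" coefficients, and the data are all made up for illustration, and the dependent variable is generated without noise so that the estimates recover the chosen coefficients.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up households: (household-type dummy, household size).
households = [(0, 2), (0, 4), (1, 1), (1, 3), (0, 5), (1, 2)]

# Hypothetical coefficients used only to generate the illustrative data.
alpha, beta_hh, beta_size = 0.20, -0.05, -0.02
y = [alpha + beta_hh * hh + beta_size * size for hh, size in households]

# Design matrix: a column of ones for the intercept, then the regressors.
X = [[1.0, float(hh), float(size)] for hh, size in households]
estimates = ols(X, y)
print([round(b, 4) for b in estimates])
```

With real survey data the error term is not zero, so the estimates would differ from any underlying coefficients and would be reported together with their standard errors.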

4.2.3 Data Analysis

In this section, we analyze the income distribution and consumption pattern of all households.

Income Pattern

A comparison with poor households is also given (as per the definition of Statistics Mauritius). The income used in our analysis refers to total household resources, which comprise mainly income from employment, transfers, property, and imputed rent, that is, an equivalent rental value for non-renting households. It should also be pointed out that income is measured at current prices in both the 2001/02 and 2006/07 HBS. To allow comparison over time, we have adjusted for the price increase from 2001/02 to 2006/07.

In 2006/07, the majority (around 87%) of poor households derived a monthly income less than Rs 10,000 compared with 17% for all households. Comparison over time shows that the percentage of poor households deriving an income higher than Rs 7,500 increased from 11% in 2001/02 to 45% in 2006/07 (see Table 2).

Table 2. Distribution (%) of all households by income class, HBS 2001/02 and 2006/07

| Monthly household disposable income (Rs) | 2001/02 Households (%) | 2001/02 Total income (%) | 2006/07 Households (%) | 2006/07 Total income (%) |
|---|---|---|---|---|
| Under 3,000 | 3.5 | 0.5 | 2.1 | 0.2 |
| 3,000 to <4,000 | 3.2 | 0.8 | 1.7 | 0.3 |
| 4,000 to <5,000 | 3.5 | 1.1 | 2.7 | 0.6 |
| 5,000 to <6,000 | 5.0 | 1.9 | 2.8 | 0.8 |
| 6,000 to <7,000 | 6.6 | 3.0 | 3.9 | 1.3 |
| 7,000 to <8,000 | 6.8 | 3.5 | 3.9 | 1.5 |
| 8,000 to <9,000 | 7.3 | 4.4 | 4.7 | 2.1 |
| 9,000 to <10,000 | 6.7 | 4.5 | 5.1 | 2.5 |
| 10,000 to <12,000 | 11.8 | 9.0 | 10.7 | 6.1 |
| 12,000 to <14,000 | 9.2 | 8.4 | 9.7 | 6.6 |
| 14,000 to <16,000 | 7.3 | 7.6 | 9.1 | 7.2 |
| 16,000 to <20,000 | 9.8 | 12.2 | 12.1 | 11.3 |
| 20,000 to <25,000 | 7.6 | 11.8 | 10.5 | 12.3 |
| 25,000 to <30,000 | 4.4 | 8.5 | 6.5 | 9.4 |
| 30,000 to <35,000 | 2.5 | 5.7 | 3.8 | 6.4 |
| 35,000 to <40,000 | 1.5 | 3.9 | 3.0 | 5.8 |
| 40,000 & over | 3.3 | 13.2 | 7.7 | 25.5 |
| Total | 100.0 | 100.0 | 100.0 | 100.0 |

Statistics Mauritius, 2007

In 2006/07, the average monthly household income of poor households stood at Rs 7,055, compared with Rs 22,242 for all households, thus showing that the income for all households was more than three times that of poor households. A similar situation is observed in 2001/02. However, comparison of data from 2001/02 to 2006/07 shows that the average monthly household income of poor households grew by 38.9% against 33.6% for all households. Removing the effect of change in prices over the five-year period, the income of poor households grew by 3.5% while that of all households dropped by 0.5% (see Table 3).

Table 3. Average monthly household income (Rs) of poor households and all households, HBS 2001/02 and 2006/07

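| | Average monthly household income (Rs), 2001/02 | Average monthly household income (Rs), 2006/07 | Percentage increase 2001/02 to 2006/07 (%) | Percentage increase 2001/02 to 2006/07 after removing price changes (%) |
|---|---|---|---|---|
| Poor households | 5,078 | 7,055 | 38.9 | 3.5 |
| All households | 16,642 | 22,242 | 33.6 | −0.5 |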

Statistics Mauritius, 2007
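
The growth figures in Table 3 can be checked with simple arithmetic. The sketch below is an illustration only: the function names are hypothetical, and the formula used to back out the price change, real growth = (1 + nominal growth) / (1 + price change) − 1, is an assumption about how the price adjustment was made, not something stated in the source.

```python
def pct_growth(start, end):
    """Percentage change from start to end."""
    return (end / start - 1.0) * 100.0


def implied_price_change(nominal_pct, real_pct):
    """Price change (%) implied by real = (1 + nominal) / (1 + price) - 1 (assumed formula)."""
    return ((1.0 + nominal_pct / 100.0) / (1.0 + real_pct / 100.0) - 1.0) * 100.0


# Average monthly household income (Rs) from Table 3.
poor_2001, poor_2006 = 5078, 7055
all_2001, all_2006 = 16642, 22242

poor_nominal = pct_growth(poor_2001, poor_2006)
all_nominal = pct_growth(all_2001, all_2006)
print(f"nominal growth, poor households: {poor_nominal:.1f}%")
print(f"nominal growth, all households:  {all_nominal:.1f}%")

# Ratio of all-household income to poor-household income in 2006/07.
print(f"income ratio 2006/07: {all_2006 / poor_2006:.2f}")

# Price change implied by the reported real growth rates (3.5% and -0.5%).
print(f"implied price change (poor): {implied_price_change(poor_nominal, 3.5):.1f}%")
print(f"implied price change (all):  {implied_price_change(all_nominal, -0.5):.1f}%")
```

Under this assumed formula, both rows imply a similar price increase over the five years, which is consistent with one common price index having been applied to both groups.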

Income from paid employment represented the main source of income for both poor and all households (see Table 4). The share of income from paid employment over total gross income stood at 41.0% for poor households and 59.5% for all households. After removing the effect of price changes during the five-year period, income from paid employment grew by 0.6% for poor households but dropped by 2.2% for all households.

Table 4. Average monthly household income (Rs) of poor households and all households by source of income, HBS 2001/02 and 2006/07.

| Source of income | 2001/02 Poor households | 2001/02 All households | 2006/07 Poor households | 2006/07 All households |
|---|---|---|---|---|
| Paid employment | 2,152 | 10,258 | 2,906 | 13,463 |
| Self-employment | 886 | 2,592 | 1,140 | 2,928 |
| Transfers | 1,100 | 1,562 | 1,698 | 2,630 |
| Other income* | 977 | 2,693 | 1,342 | 3,603 |
| Average monthly gross household income | 5,115 | 17,105 | 7,086 | 22,624 |
| Deductions | 37 | 463 | 31 | 382 |
| Average monthly household income | 5,078 | 16,642 | 7,055 | 22,242 |

*Other income includes property income, imputed rent for non-renting households, and income from own produced goods and services.

Transfers (income from social security benefits, pension from employer, alimony, allowances from parents and relatives, etc.) constituted the second main source of income for the poor. The share of transfer income over total income represented 24.0% for poor households against 11.6% for all households. Removing the effect of price changes over the five-year period, transfer income grew by 15% for poor households against 25% for all households. On average, female-headed poor households earned less income than male-headed households in both 2001/02 and 2006/07. Between 2001/02 and 2006/07, the income of male-headed and female-headed households increased by around 37% and 40%, respectively.

Expenditure Pattern

In 2006/07, 41.7% of the poor households spent less than Rs 5,000 per month compared with 9.8% for all households. On the other hand, only 12.0% of the poor households spent Rs 10,000 or more per month compared with 56.5% for all households. Comparison over time shows that the percentage of poor households spending at least Rs 5,000 increased from 31.8% in 2001/02 to 58.3% in 2006/07. The corresponding percentage for all households increased from 79.3% to 90.2%. It should also be pointed out that the proportion of poor households spending Rs 10,000 or more increased from 3.8% to 12.0% while for all households, the corresponding percentage increased from 36.4% to 56.5% (see Table 5).

Table 5. Distribution (%) of poor households and all households by consumption expenditure class, HBS 2001/02 and 2006/07

| Consumption expenditure class (Rs) | 2001/02 Poor households | 2001/02 All households | 2006/07 Poor households | 2006/07 All households |
|---|---|---|---|---|
| Below 2,500 | 22.8 | 3.8 | 7.3 | 1.3 |
| 2,500 to <5,000 | 45.4 | 16.9 | 34.4 | 8.5 |
| 5,000 to <7,500 | 20.5 | 23.7 | 31.1 | 15.6 |
| 7,500 to <10,000 | 7.5 | 19.2 | 15.2 | 18.1 |
| 10,000 or more | 3.8 | 36.4 | 12.0 | 56.5 |
| Total | 100.0 | 100.0 | 100.0 | 100.0 |
| Average monthly household consumption expenditure* | 4,384 | 10,220 | 6,500 | 14,300 |

*The expenditure figures for 2001/02 have not been adjusted for infrequently purchased items such as air-tickets, household appliances, etc., while for 2006/07 an adjustment has been made.

Statistics Mauritius, 2007

As Table 6 shows, household expenditure is mainly concentrated in food and non-alcoholic beverages. Transport is the next-largest expenditure item for Mauritian households (15.2% in 2006/07 compared with 13.9% in 2001/02). This may have declined over the years with free transport facilities provided to students and the elderly. Housing, water, electricity, and gas also account for a high share of expenditure.

Table 6. Adjusted average monthly household consumption expenditure by COICOP division—2001/02 and 2006/07 HBS

| Division | 2001/02 (Rs) | 2001/02 (%) | 2006/07 (Rs) | 2006/07 (%) |
|---|---|---|---|---|
| 1. Food and non-alcoholic beverages | 3,401 | 29.9 | 4,504 | 29.7 |
| 2. Alcoholic beverages and tobacco | 979 | 8.6 | 1,448 | 9.5 |
| 3. Clothing and footwear | 686 | 6.0 | 803 | 5.3 |
| 4. Housing, water, electricity, gas, and other fuels | 1,094 | 9.6 | 1,492 | 9.8 |
| 5. Furnishing, household equipment, and routine household maintenance | 909 | 8.0 | 1,015 | 6.7 |
| 6. Health | 321 | 2.8 | 466 | 3.1 |
| 7. Transport | 1,583 | 13.9 | 2,312 | 15.2 |
| 8. Communication | 359 | 3.1 | 568 | 3.7 |
| 9. Recreation and culture | 607 | 5.3 | 759 | 5.0 |
| 10. Education | 273 | 2.4 | 510 | 3.4 |
| 11. Restaurants and hotels | 567 | 5.0 | 680 | 4.5 |
| 12. Miscellaneous goods and services | 610 | 5.4 | 631 | 4.2 |
| Total | 11,390 | 100.0 | 15,188 | 100.0 |

Statistics Mauritius, 2007

On average, all households spent 50% more on food than poor households (Rs 4,500 against Rs 3,000). Also, the expenditure of all households on clothing and footwear, health, education, and transport was around 3–5 times that of poor households. Compared with all households, poor households had larger shares of their expenditure on “food and non-alcoholic beverages” (46% against 32%) and “housing, water, electricity, gas, and other fuels” (15% against 11%) in 2006/07.

Household Debt

In 2006/07, the percentage of indebted households, that is, households having made at least one loan repayment, is estimated at 46% for all households against 20% for poor households. On average, poor indebted households disbursed Rs 1,401 per month on loan repayment against Rs 4,353 for all households. The highest loan repayment for poor households was on housing (Rs 2,491), whereas for all households the highest loan repayment was on motor vehicles (Rs 4,036) (see Table 7).

Table 7. Average monthly loan repayment for poor indebted households and all indebted households by selected item of debt, HBS 2006/07

| Item of debt | Poor households: percentage of indebted poor households | Poor households: average household debt (Rs) | All households: percentage of indebted households | All households: average household debt (Rs) |
|---|---|---|---|---|
| Housing | 26.1 | 2,491 | 54.7 | 3,891 |
| Furniture | 25.9 | 670 | 14.8 | 1,214 |
| Audio and household appliances | 40.9 | 633 | 27.9 | 1,133 |
| Motor vehicles | 0.0 | 0 | 11.6 | 4,036 |
| Other loan | 29.8 | 923 | 40.0 | 2,757 |

Statistics Mauritius, 2007

URL: https://www.sciencedirect.com/science/article/pii/B9780124115491000065

What makes a good predictor variable?

Generally, the variable with the highest correlation with the dependent variable is a good predictor. You can also compare regression coefficients to select the best predictor (make sure you have normalized the data before performing the regression, and compare the absolute values of the coefficients). You can also look at the change in the R-squared value when a predictor is added to or removed from the model.
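
As a rough illustration of the correlation criterion, the sketch below ranks two candidate predictors by the absolute value of their Pearson correlation with the dependent variable. The data and the names `x1`, `x2`, and `pearson_r` are made up for this example; a high correlation alone does not guarantee a useful predictor once other variables are in the model.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up data, for illustration only.
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
predictors = {
    "x1": [1, 2, 3, 4, 5, 6],   # closely tracks y
    "x2": [3, 1, 4, 1, 5, 2],   # only loosely related to y
}

correlations = {name: pearson_r(x, y) for name, x in predictors.items()}
best = max(correlations, key=lambda name: abs(correlations[name]))

for name, r in correlations.items():
    # With a single predictor, R-squared of the simple regression equals r squared.
    print(f"{name}: r = {r:.3f}, single-predictor R^2 = {r * r:.3f}")
print("strongest single predictor:", best)
```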

How do you know if a regression line is a good predictor?

A good way to examine a regression is to plot the predicted values against the actual values in the holdout set. Ideally, the points lie on the 45-degree line passing through the origin (the line y = x). The nearer the points are to this line, the better the regression.
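
The same idea can be expressed numerically rather than graphically. The sketch below, using made-up holdout values and hypothetical function names, measures how far the predictions fall from the line y = x with the root mean squared error and with an R-squared computed against that line.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared difference between actual and predicted values."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r2_against_identity(actual, predicted):
    """1 - SS_res / SS_tot, with residuals measured from the line y = x."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up holdout values, for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.8, 5.3, 6.9, 9.4]

print(f"RMSE: {rmse(actual, predicted):.3f}")
print(f"R^2 against y = x: {r2_against_identity(actual, predicted):.3f}")
```

Predictions lying exactly on the 45-degree line give an RMSE of 0 and an R-squared of 1; the further the points scatter from the line, the larger the RMSE and the smaller the R-squared.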

What are predictor variables in a multiple regression?

The variables you base your prediction on are called the predictor variables (or independent variables, IVs). While simple linear regression only lets you predict the value of one variable from a single predictor variable, multiple regression allows you to use multiple predictors.

What is the characteristic of a multiple regression model?

The main characteristic of the multiple regression model is that it is linear in the parameters; as in the simple regression model, both the dependent variable and the explanatory variables can be nonlinear transformations of other variables.
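
To illustrate what "linear in the parameters" means, the sketch below fits y = a + b·ln(x): the relationship between y and x is nonlinear, but after the transformation z = ln(x) the model is an ordinary linear regression of y on z. The data are generated from assumed values a = 1 and b = 2 purely for illustration, and `simple_ols` is a hypothetical helper.

```python
from math import log


def simple_ols(z, y):
    """Least-squares intercept and slope for y = a + b * z."""
    mean_z, mean_y = sum(z) / len(z), sum(y) / len(y)
    slope = (sum((a - mean_z) * (b - mean_y) for a, b in zip(z, y))
             / sum((a - mean_z) ** 2 for a in z))
    return mean_y - slope * mean_z, slope


# Made-up data generated from y = 1 + 2 * ln(x), without noise.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [1.0 + 2.0 * log(v) for v in x]

# Transform the explanatory variable, then fit a model that is linear in a and b.
z = [log(v) for v in x]
intercept, slope = simple_ols(z, y)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

By contrast, a model such as y = a + x^b is not linear in the parameter b, so it cannot be estimated this way without a further transformation or a nonlinear method.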