Which type of data, cross-sectional vs time series, is more important to research?

Cohort studies come into the category of what are called “quasi-experimental designs.” One cannot allocate individuals randomly to different conditions in the way that the “true experiment” demands. Hence, one can never be certain that a variable identified with a hypothesized cause is not confounded with another variable that has not been measured. Is it aspiration or social class that one is really measuring? On the other hand, one can unravel how particular life histories develop, and draw strong quantitative or qualitative inferences, from the experiences that some individuals rather than others have had, about what has shaped them. Who moves up through the social structure, who moves downward, and who stays immobile?


URL: https://www.sciencedirect.com/science/article/pii/B0123693985000086

Time Series

Andrew F. Siegel, in Practical Business Statistics (Sixth Edition), 2012

A time series is different from cross-sectional data because ordering of the observations conveys important information. In particular, you are interested in more than just a typical value to summarize the entire series (the average, for example) or even the variability of the series (as described by, say, the standard deviation). You would like to know what is likely to happen next. Such a forecast must carefully extend the most recent behavior with respect to the patterns over time, which are evident in past behavior. Here are some examples of time-series situations:

One: In order to prepare a budget for next quarter, you need a good estimate of the expected sales. This forecast will be the basis for predicting the other numbers in the budget, perhaps using regression analysis. By looking at a time series of actual quarterly sales for the past few years, you should be able to come up with a forecast that represents your best guess based on the overall trend in sales (up, you hope) and taking into account any seasonal variation. For example, if there has always been a downturn from fourth quarter (which includes the holiday shopping season) to first quarter, you will want your forecast to reflect the usual seasonal pattern.

Two: In order to decide whether or not to build that new factory, you need to know how quickly your market will grow. Analyzing the available time-series data on industry sales and prices will help you evaluate your chances for success. But don't expect to get exact answers. Predicting the future is a tricky and uncertain business, even with all of the computerized help you can get. Although time-series analysis will help you by providing a “reality check” to your decision making, substantial risk may still remain.

Three: By constantly monitoring time-series data related to your firm, both internal (sales, cost, etc.) and external (industrywide sales, imports, etc.), you will be in the best position to manage effectively. By anticipating future trends corresponding to those you spotted in the early stages, you will be ready to participate in growth areas or to move away from dead-end markets. By anticipating seasonal needs for cash, you can avoid the panic of having too little and the costs of having too much. By anticipating the need for inventory, you can minimize the losses due to unfilled orders (which help your competition) and the costs (interest and storage) of carrying too much. There is a tremendous amount of valuable information contained in these time-series data sets.


URL: https://www.sciencedirect.com/science/article/pii/B9780123852083000146

Time Series Forecasting

Vijay Kotu, Bala Deshpande, in Data Science (Second Edition), 2019

12.4.1 Windowing

The purpose of windowing is to transform the time series data into a generic machine learning input dataset. Fig. 12.28 shows a sample windowing and cross-sectional data extraction from the time series dataset.


Figure 12.28. Windowing process. (A) original time series and (B) cross-sectional data set with consecutive windows.

The characteristics of the windows and the cross-sectional data extractions are specified by the parameters of the windowing process. The following windowing parameters control the size of the windows, the overlap between consecutive windows, and the prediction horizon used for forecasting (a short code sketch follows the list):

1. Window size: Number of lag points in one window, excluding the target data point.

2. Step: Number of data points between the first values of two consecutive windows. If the step is 1, the maximum number of windows can be extracted from the time series dataset.

3. Horizon width: The prediction horizon controls how many records in the time series end up as the target variable. The common value for the horizon width is 1.

4. Skip: Offset between the window and the horizon. If the skip is zero, the data point(s) immediately following the window are used for the horizon.

In Fig. 12.28, the window size is 6, the step is 1, the horizon width is 1, and the skip is 0.
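As a rough illustration of the windowing idea, here is a minimal Python/pandas sketch; the function name, column names, and the small profit series are illustrative assumptions rather than the book's code or data.

```python
import pandas as pd

def window_series(values, window_size=6, step=1, horizon_width=1, skip=0):
    """Turn a univariate time series into a cross-sectional (windowed) dataset.

    Each output row holds `window_size` lagged values plus `horizon_width`
    target value(s) taken `skip` points after the window ends.
    """
    values = list(values)
    rows = []
    last_start = len(values) - window_size - skip - horizon_width
    for start in range(0, last_start + 1, step):
        window = values[start:start + window_size]                  # lag points
        target_start = start + window_size + skip
        target = values[target_start:target_start + horizon_width]  # horizon
        rows.append(window + target)

    lag_cols = [f"input_Yt-{window_size - 1 - k}" for k in range(window_size)]
    target_cols = [f"label+{h}" for h in range(horizon_width)]
    return pd.DataFrame(rows, columns=lag_cols + target_cols)

# Example: window size 6, step 1, horizon width 1, skip 0, as in Fig. 12.28
profits = [1.1, 1.3, 1.2, 1.5, 1.6, 1.4, 1.8, 1.9, 1.7, 2.0, 2.1, 1.9]
print(window_series(profits, window_size=6, step=1, horizon_width=1, skip=0))
```

Each row of the resulting DataFrame is one cross-sectional record: six lag values plus the target.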

Thus, the time series data are now converted into a generic cross-sectional dataset that can be modeled with learning algorithms like regression, neural networks, or support vector machines. Once the windowing process is done, the real power of machine learning algorithms can be brought to bear on a time series dataset.

Model Training

Consider the time series dataset shown in Fig. 12.28A. The dataset refers to historical monthly profits from a product, from January 2009 to June 2012. Suppose the objective in this exercise is to develop profitability forecasts for the next 12 months. A linear regression model can be used to fit the cross-sectional dataset shown in Fig. 12.28B using the technique described in Chapter 5, Regression Methods. The model will be:

input Yt+1 (label) = 0.493 × input Yt−5 + 0.258 × input Yt−4 + 0.107 × input Yt−3 − 0.098 × input Yt−2 − 0.073 × input Yt−1 + 0.329 × input Yt−0 + 0.135

Training the model is quite straightforward: the relationship between a data point in the time series and the previous six data points is inferred and established. In other words, if one knows six consecutive data points in a time series, one can use the model to predict the seventh, unseen data point. Since a new data point has been forecasted, it can be used along with the five preceding data points to predict one more data point, and so on. That's time series forecasting, one data point at a time!
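As a rough Python sketch of this fit-and-roll-forward idea (scikit-learn's ordinary linear regression stands in for the Vector Linear Regression operator, and the array handling is an illustrative assumption rather than the book's process):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_and_forecast(series, window_size=6, n_ahead=12):
    """Fit a linear model on windowed data, then forecast one point at a time."""
    y = np.asarray(series, dtype=float)

    # Build the cross-sectional dataset: each row is `window_size` lags,
    # and the target is the next point in the series.
    X = np.array([y[i:i + window_size] for i in range(len(y) - window_size)])
    labels = y[window_size:]

    model = LinearRegression().fit(X, labels)

    # Roll the window forward, feeding each forecast back in as the newest lag.
    history = list(y)
    forecasts = []
    for _ in range(n_ahead):
        next_value = model.predict([history[-window_size:]])[0]
        forecasts.append(next_value)
        history.append(next_value)
    return forecasts

# e.g. forecast the next 12 months from an observed profit series
# print(fit_and_forecast(monthly_profits, window_size=6, n_ahead=12))
```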

How to Implement

Implementing a time series forecasting process using supervised learning is similar to a classification or regression process. The distinguishing steps in time series forecasting are the conversion of a time series dataset to a cross-sectional dataset and the stacking of the forecast one data point at a time. The RapidMiner process is shown in Fig. 12.29. It uses operators from the Time Series extension. Although the process looks complicated, it consists of three functional blocks: (1) conversion to cross-sectional data, (2) training a machine learning model, and (3) forecasting one data point at a time in a loop. The dataset used in the process is the Product profit dataset (the dataset can be downloaded from www.IntroDataScience.com) shown in Fig. 12.28. The time series has two attributes: Date and Input Yt.


Figure 12.29. Process for time series forecasting using machine learning.

Step 1: Set Up Windowing

The process window in Fig. 12.29 shows the necessary operators for windowing. The time series dataset has a date column, and this must be treated with special care. The operator must be informed that one of the columns in the dataset is a date and should be considered as an “id.” This is accomplished with the Set Role operator. If the input data has multiple time series, the Select Attributes operator can be used to select the one to be forecasted. In this case, only a single value series is used, so strictly speaking this operator is not needed. However, to make the process generic it has been included, and the column labeled “inputYt” has been selected. Optionally, one may want to use the Filter Examples operator to remove any data points that have missing values. The central operator for this step is the Windowing operator in the Time Series extension. The main parameters of the Windowing operator are:

1. Window size: Determines how many “attributes” are created for the cross-sectional data. Each row of the original time series within the window size will become a new attribute. In this example, w = 6 was chosen.

2. Step size: Determines how to advance the window. s = 1 was used.

3. Horizon width: Determines how far out to make the forecast. If the window size is 6 and the horizon is 1, then the seventh row of the original time series becomes the first sample for the “label” variable. h = 1 was used.

Fig. 12.28 shows the original data and the transformed output from the windowing process. The window operator adds six new attributes named input Yt−5 through input Yt−0.

Step 2: Train the Model

When training any supervised model using this data, the attributes labeled input Yt−5 through input Yt−0 form the independent variables. In this case, linear regression is used to fit the dependent variable, called label, using the independent variables input Yt−5 through input Yt−0. The Vector Linear Regression operator is used to infer the relationship between the six independent variables and the dependent variable. The model output for the dataset is:

label = 0.493 × input Yt−5 + 0.258 × input Yt−4 + 0.107 × input Yt−3 − 0.098 × input Yt−2 − 0.073 × input Yt−1 + 0.329 × input Yt−0 + 0.135

Step 3: Generate the Forecast in a Loop

Once the model fitting is done, the next step is to start the forecasting process. Note that, given this configuration of the window size and horizon, one can only make the forecast for the next step. In the example, the last row of the transformed dataset corresponds to December 2011. The independent variables are the values from June–November 2011, and the target or label variable is December 2011. The regression equation is used to predict the December 2011 value. The same regression equation is also used to predict the January 2012 value: all one needs to do is insert the values from July–December 2011 into the regression equation to generate the January 2012 forecast. Next, a new row of data needs to be generated that runs from August 2011 to January 2012 in order to predict February 2012 using the regression equation. All the (actual) data from August–December are available, as well as the predicted value for January. Once the predicted February value is obtained, nothing prevents the actual data from September–December, plus the predicted January and February values, from being used to forecast March 2012.

To implement this in a RapidMiner process, one needs to break it into two separate parts. First, take the last forecasted row (in this case, December 2011), drop the current value of input Yt−5 (current value is 1.201), rename input Yt−4 to input Yt−5, rename input Yt−3 to input Yt−4, rename input Yt−2 to input Yt−3, rename input Yt−1 to input Yt−2, rename input Yt−0 to input Yt−1, and finally rename predicted label (current value is 1.934) to input Yt−0. With this new row of data, the regression model can be applied to predict the next date in the series: January 2012. Fig. 12.30 shows the sample steps. Next, this entire process needs to be put inside a Loop operator that will allow these steps to be run repeatedly for as many future periods as needed.
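Outside RapidMiner, the same drop/shift/rename bookkeeping can be written in a few lines; the following pandas sketch mirrors the windowed attribute names used above (the function and column names are illustrative assumptions, not part of the book's process):

```python
import pandas as pd

def shift_window_row(last_row: pd.Series, predicted_label: float) -> pd.Series:
    """Drop the oldest lag, shift the remaining lags by one, and append the new forecast."""
    new_row = last_row.copy()
    for lag in range(5, 0, -1):
        # input_Yt-5 takes the old value of input_Yt-4, and so on down the line
        new_row[f"input_Yt-{lag}"] = last_row[f"input_Yt-{lag - 1}"]
    new_row["input_Yt-0"] = predicted_label  # the latest prediction becomes the newest lag
    return new_row
```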


Figure 12.30. Forecasting one step ahead.

The Loop operator will contain all the mechanisms for accomplishing the renaming and, of course, for performing the forecasting (Fig. 12.31). Set the iterations in the Loop operator to the number of future months to forecast (horizon). In this case, this is defined by a process variable called futureMonths, whose value can be changed by the user before process execution. It is also possible to capture the loop counts in a macro if the set iteration macro box is checked. A macro in RapidMiner is nothing but a process variable that can be called by other operators in the process. When set iteration macro is checked and a name is provided in the macro name box, a variable will be created with that name whose value will be updated each time one loop is completed. An initial value for this macro is set by the macro start value option. Loops may be terminated by specifying a timeout, which is enabled by checking the limit time box. A macro variable can be used by any other operator by using the format %{macro name} in place of a numeric value.


Figure 12.31. Looping subroutine to forecast one data point at a time.

Before the looping is started, the last forecasted row needs to be stored in a separate data structure. This is accomplished by a new Windowing operator and the macro titled Extract Example Set. The Filter Example operator simply deletes all rows of the transformed dataset except the last forecasted row. Finally the Remember operator stores this in memory and allows one to “recall” the stored value once inside the loop.

The loop parameter iterations will determine the number of times the inner process is repeated. Fig. 12.31 shows that during each iteration, the model is applied on the last forecasted row, and bookkeeping operations are performed to prepare application of the model to forecast the next month. This includes incrementing the month (date) by one, changing the role of the predicted label to that of a regular attribute, and finally renaming all the attributes. The newly renamed data are stored and then recalled before the next iteration begins.

The output of this process is shown in Fig. 12.32 as an overlay on top of the actual data. As seen, the simple linear regression model seems to adequately capture both the trend and the seasonality of the underlying data. The Linear Regression operator of Step 2: Train the Model can be quickly swapped for a Support Vector Machine operator and its performance tested without any other programming or process modifications.


Figure 12.32. Forecasted time series.


URL: https://www.sciencedirect.com/science/article/pii/B9780128147610000125

Dynamic Migration Modeling

K. Bruce Newbold, in Encyclopedia of Social Measurement, 2005

Longitudinal (Panel) Analysis

It is arguable whether cross-sectional data, which are typically used in cohort projection and other migration models, adequately capture the dynamic properties of migration. Instead, what is needed is a sequence of cross sections or panel data, involving successive waves of individual or household interviews that give histories of the sequence of residences over time so that the duration of residence before moving can be derived. Longitudinal analyses allow the interrelationships between specific life-course events, representing some identifiable and quantitative change, such as leaving school, marriage, or divorce, to be statistically related to migration. The event occurs at a discernible point in time, with the event history represented by a longitudinal record of when events occur to a sample of individuals or households. Longitudinal data files, such as the PSID, represent a particularly rich source of migration information. The added advantage of longitudinal files is the enriched ability to examine the dynamic properties of migration, with the files providing histories of migration and linkages to other events correlated with migration. By using the sequence of residences and the duration of residence prior to the move, the determinants of migration can be empirically derived and tested. Ideally, the times associated with migration are accurate, so that a discrete time approximation to the continuous process is not needed.

Migration is therefore viewed as a transition from one region (residence in i) to region j, which terminates the episode. Key to the analysis is the length of the episode, with the resulting methodology reflecting either “continuous” or “discrete” events. Although both may be based on longitudinal data, differences in model definition arise over whether the timing of an event (migration) is measured exactly (continuous) or whether it occurred over some interval t (discrete). In both cases, an important concept is the hazard function. In the case of discrete data, the hazard rate is defined as the probability of migrating from region i at time t, such that

(13) hit = Pr[Ti = t | Ti ≥ t, Xit].

In this formulation, the hazard rate is the probability that individual i will migrate at some moment in time given that they have not migrated before that time, t is the time of migration, and Xit is a set of exogenous variables that vary over time, such as wage or employment rate differentials. It should be noted that the hazard rate is similar to the survival rate, which would represent the probability of a person with characteristics Xit not migrating by time t. Although the hazard is unobserved, it controls both the occurrence and timing of migration and is therefore the dependent variable within any model.
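As a hedged illustration, a discrete-time hazard of this form is often estimated by fitting a logistic regression to a person-period dataset, with one row per individual per interval at risk. The sketch below assumes a hypothetical file and column names (person_periods.csv, wage_gap, unemp_gap); it is not code from the encyclopedia entry.

```python
import pandas as pd
import statsmodels.api as sm

# Person-period data: one row per individual per time interval at risk,
# with migrated = 1 in the interval in which the move occurs (hypothetical columns).
pp = pd.read_csv("person_periods.csv")  # columns: id, period, wage_gap, unemp_gap, migrated

# Discrete-time hazard: Pr(migrate in period t | still resident at t, X_it)
X = sm.add_constant(pp[["period", "wage_gap", "unemp_gap"]])
hazard_model = sm.Logit(pp["migrated"], X).fit()
print(hazard_model.summary())

# Predicted hazard for each person-period
pp["hazard"] = hazard_model.predict(X)
```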

Ideally, the timing of migration is accurate, so that a discrete time approximation to a continuous process is not needed. In this case, the modeler is interested in the instantaneous probability that migration occurs in the interval from t to t + s, where s is infinitesimally small, such that the hazard rate is defined as

(14) h(t) = lim s→0 P(t, t + s)/s.

Although potentially providing important insights into the dynamic nature of migration, the construction of continuous history data is time-consuming and costly, with such data sets typically “one-of-a-kind” and created for a particular need. Indeed, dynamic data on migration are typically more available for discrete time intervals than for continuous time histories as employed in event history analyses.

Although there is no single methodology for the analysis of either discrete or continuous events, most approaches have focused on regression (or similar) models, allowing the event of interest (the dependent variable) to be expressed as a function of independent explanatory variables. Methods typically represent straightforward generalizations of static models applied to cross-sectional data, such as regression techniques or the random utility model discussed earlier.

Regardless of whether discrete-time or continuous models are used, the principal problem of all longitudinal studies is the need to separate “state dependence” from “unobserved heterogeneity.” The former includes how migration behavior is influenced by previous migration experience (i.e., return migration and the importance of a home region), whereas the latter reflects the fact that some people migrate due to unobserved characteristics and will therefore move more than others (i.e., “chronic” migrants). Both effects produce similar behavioral patterns but have different implications with respect to methodology and the derived conclusions. Care should also be taken in the analysis of panel data, since different survey methods and designs, such as the timing and duration of waves, will alter derived conclusions.

In addition to suffering from the problems of small sample size and spatial representation, as already discussed, longitudinal surveys suffer from two additional problems. First, attrition of sample members through death or for other reasons will reduce sample size over time as well as reduce model power. Second, longitudinal data suffer from problems associated with the “initial condition.” That is, panel data must start and stop at some point in time, therefore interrupting the process of interest (migration) at some intermediate point rather than at its true start or end. It is therefore reasonable to assume that panel data may miss important information pertaining to migration or residential history that occurred before the start of the sampling frame. Statistically, this results in “censored” information, with the application of ordinary least-squares techniques to censored data leading to potentially biased estimates. Various econometric techniques, including maximum-likelihood estimation and techniques such as Tobit models, have been derived in order to overcome problems associated with censoring.


URL: https://www.sciencedirect.com/science/article/pii/B0123693985003637

Reliability

Duane F. Alwin, in Encyclopedia of Social Measurement, 2005

Reliability Models for Composite Scores

The most common approach to assessing reliability in cross-sectional data is through the use of multiple indicators of a given concept and the estimation of the reliability of a linear composite score made up of those measures. Let Y symbolize such a linear composite defined as the sum Y1 + Y2 + … + Yg + … + YG; that is, ∑g Yg, where g is an index that runs from 1 to G. Such estimates of reliability are referred to as internal consistency estimates of reliability (ICR). In this case, we can formulate a reliability model for the composite as Y = T + E, where T is a composite of true scores for the G measures, and E is a composite of error scores. This assumes that for each measure the random error model holds, that is, Yg = Tg + Eg, and thus T = ∑g Tg and E = ∑g Eg. The goal of the internal consistency approach is to obtain an estimate of VAR(T)/VAR(Y) = [VAR(Y) − VAR(E)]/VAR(Y). This can be defined as a straightforward extension of the common factor model of CTST given previously. The following identities result from the previous development:

VAR(Y) = ∑j∑i ΣYY
VAR(T) = ∑j∑i [ΛΦΛ′]
VAR(E) = ∑j∑i Θ2.

(Note that i and j represent indexes that run over the rows and columns of these matrices, where i = 1 to G and j = 1 to G.) In other words, the common factor representation of the CTST model given previously for the population basically partitions the composite observed score variance into true score and error variance. These quantities can be manipulated to form an internal consistency measure of composite reliability as follows:

ICR = [∑j∑i ΣYY − ∑j∑i Θ2] / ∑j∑i ΣYY.

The most common estimate of internal consistency reliability is Cronbach's α, computed as follows:

α = [G/(G − 1)] [1 − ∑g VAR(Yg)/VAR(Y)].

This formula is derived from the assumption of G unit-weighted (or equally weighted) tau-equivalent measures. The logic of the formula can be seen as follows. First, rewrite ∑j∑i Θ2 in the previous expression for ICR as equal to ∑j∑i ΣY − ∑j∑i ΣT, where ΣY is a diagonal matrix formed from the diagonal elements of ΣYY, and ΣT is a diagonal matrix formed from the diagonal of ΛΦΛ′. Note further that under tau-equivalence Λ = 1 (a vector of 1s), so this reduces to φI, where φ is the variance of Tg, and I is a (G × G) identity matrix. Note that in the population model for tau-equivalent measures, all the elements in ΣYY are identical and equal to φ, the variance of the true score Tg. From these definitions, we can rewrite ICR as follows:

ICR = [∑j∑i ΣYY − ∑j∑i ΣY + ∑j∑i φI] / ∑j∑i ΣYY.

Note further that ∑j∑i ΣYY − ∑j∑i ΣY = G(G − 1)φ and ∑j∑i φI = Gφ, and thus Cronbach's α can be derived from the following identities:

ICR = [G(G − 1)φ + Gφ]/∑j∑i ΣYY
= [G/(G − 1)] G(G − 1)φ/∑j∑i ΣYY
= [G/(G − 1)] [∑j∑i ΣYY − ∑j∑i ΣY]/∑j∑i ΣYY
= [G/(G − 1)] [1 − ∑j∑i ΣY/∑j∑i ΣYY].

The final identity is equivalent to the formula for Cronbach's α given previously. The point of this derivation is that the ICR approach actually has a more general formulation (the congeneric measures model) for which Cronbach's α is but a special case (i.e., ICR = α when the G measures are tau equivalent).
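For concreteness, here is a minimal NumPy sketch of the Cronbach's α computation just given; the small score matrix is invented purely for illustration.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x G items) matrix of scores."""
    G = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # VAR(Y_g) for each item
    total_variance = items.sum(axis=1).var(ddof=1)   # VAR(Y) of the composite
    return (G / (G - 1)) * (1 - item_variances.sum() / total_variance)

# Illustrative matrix: 5 respondents x 3 items
scores = np.array([[4, 5, 4],
                   [2, 3, 2],
                   [5, 5, 4],
                   [3, 3, 3],
                   [1, 2, 2]])
print(round(cronbach_alpha(scores), 3))
```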

These methods can be generalized to the case of weighted composites, where Yw is the composite formed from the application of a set of weights to the G variables in Y. However, we will not consider this case here, except to note that when the vector of weights, w, is chosen to be proportional to Θ−2Λ, such a set of weights will be optimal for maximizing ICR.

There have been other variations to formulating ICR. Heise and Bohrnstedt, for example, defined an ICR coefficient, named Ω, based on the use of U2 in place of Θ2 in the previous formulation for ICR, where U2 is a diagonal matrix of unique variances from an orthogonal common factor analysis of a set of G variables without the CTST assumptions of univocity (e.g., K > 1). They proposed partitioning Ω into its contributions from the common factors of the model, arbitrarily labeling the first factor common variance as “valid” variance and successive factor common variance as “invalid” variance.

Although it is a very popular approach, ICR coefficients have several major shortcomings. First, ICR is an unbiased estimate of composite reliability only when the true score model assumptions hold. To the extent the model assumptions are violated, it is generally believed that ICR approaches provide a lower bound estimate of reliability. However, at the same time, there is every possibility that ICR is inflated due to correlated errors (e.g., common method variance among the items), and that some reliable variance is really invalid in the sense that it represents something about responses other than true score variation, such as nonrandom sources of measurement error. ICR therefore captures systematic sources of measurement error in addition to true score variation and in this sense cannot be unambiguously interpreted as a measure of data quality.


URL: https://www.sciencedirect.com/science/article/pii/B0123693985003832

Data Structures

Andrew F. Siegel, Michael R. Wagner, in Practical Business Statistics (Eighth Edition), 2022

2.4 Time-Series and Cross-Sectional Data

If the data values are recorded in a meaningful sequence, such as daily stock market prices, then you have time-series data. If the sequence in which the data are recorded is irrelevant, such as the first-quarter 2020 earnings of eight aerospace firms, you have cross-sectional data. Cross-sectional is just a fancy way of saying that no time sequence is involved; you simply have a cross-section, or snapshot, of how things are at one particular time.

Analysis of time-series data is generally more complex than cross-sectional data analysis because the ordering of the observations must be carefully taken into account. For this reason, in coming chapters we will initially concentrate on cross-sectional data. Time-series analysis will be covered in Chapter 14.
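In code, the distinction is mostly about whether the ordering carries meaning. A small pandas sketch (the numbers and labels are made up for illustration):

```python
import pandas as pd

# Time series: the date order matters, so it becomes the index.
daily_close = pd.Series(
    [338.1, 341.5, 339.8, 344.2],
    index=pd.date_range("2020-03-02", periods=4, freq="B"),
    name="djia_close",
)

# Cross-sectional: one snapshot, and the row order carries no information.
q1_earnings = pd.DataFrame({
    "firm": ["A", "B", "C"],
    "q1_2020_earnings_musd": [512.0, -48.3, 120.7],
})

print(daily_close.pct_change())  # meaningful only because of the time ordering
print(q1_earnings.describe())    # summary of the snapshot
```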

Example

The Stock Market

Fig. 2.4.1 shows a chart of the Dow Jones Industrial Average (DJIA) stock market index monthly closing value starting in October 1928. This time-series data set indicates how the value of a portfolio of stocks has changed through time. Note how the stock market value has risen impressively through much of its history, although not entirely smoothly. Note the occasional downward bumps (such as the crash of October 1987, the “dot-com bust” of 2000, and the financial difficulties during the recession of 2007–2009) that represent the risk that you take by holding a portfolio of stocks that often (but not always) increases in value.


Fig. 2.4.1. The Dow Jones Industrial Average stock market index, monthly since 1928, is a time-series data set that provides an overview of the history of the stock market.

Here are some additional examples of time-series data:

1. The price of wheat each year for the past 50 years, adjusted for inflation. These time trends might be useful for long-range planning to the extent that the variation in future events follows the patterns of the past.

2. Retail sales recorded monthly for the past 20 years. This data set has a structure showing generally increasing activity over time, as well as a distinct seasonal pattern, with peaks around the December holiday season.

3. The thickness of paper as it emerges from a rolling machine, measured once each minute. This kind of data might be important to quality control. The time sequence is important because small variations in thickness may either “drift” steadily toward an unacceptable level or “oscillate,” becoming wider and narrower within fairly stable limits.

The following are some examples of cross-sectional data:

1. The number of hours of sleep last night measured for 30 people being examined to test the effectiveness of a new over-the-counter medication.

2. Today’s book values of a random sample of a bank’s savings certificates.

3. The number of orders placed online today as referred by each of your associated marketing channels, as part of a study of the costs and effectiveness of these channels.


URL: https://www.sciencedirect.com/science/article/pii/B9780128200254000026

Cross Section Data

Jean W. Gallagher, in Advances In Atomic, Molecular, and Optical Physics, 1994

V Journals and Periodical Publications

Various journals and periodic publications concentrate on compilations and review articles and frequently contain useful collections of cross-section data in a traditional format. The scientist with a continuing interest in these data may want to scan these on a regular basis. A few of those that focus heavily on atomic and molecular cross sections are the following:

Advances in Atomic and Molecular Physics. This annual series provides authoritative reviews on all aspects of atomic and molecular physics. These do not necessarily incorporate comprehensive data compilations. Each volume prints the contents of all previous volumes.

Atomic Data and Nuclear Data Tables. This bimonthly journal contains extensive data compilations including cross sections, rates, etc., with a cumulated subject index published annually. Examples of recent articles on the subject of electron collision cross sections are Itikawa et al., 1984; Itikawa et al., 1991; Pradhan and Gallagher, 1992.

Journal of Physical and Chemical Reference Data. This bimonthly journal publishes, among a wide range of other subjects, evaluated data sets for collision cross sections and electron swarm parameters. Examples are Itikawa et al., 1989; Tawara et al., 1990; Phelps, 1991; 1992. Subject and author indices are provided annually.

Reviews of Modern Physics (RMP). This U.S. physics journal occasionally includes articles on collision cross sections. Examples are Rudd et al., 1985, 1992; Heddle and Gallagher, 1989. In addition, every issue of RMP provides a valuable list entitled “Some Review Articles Appearing in Other Journals and Serial Publications.”

The list of review journals given here is not comprehensive. Many other journals are dedicated to review of the scientific literature, although the concentration of cross-section information is not as high as in those mentioned. With few exceptions, other review journals may be found in the RMP listings.

Last, but not least, a continuing valuable resource has been the listings of data collections, bibliographies, review articles, and books compiled by the Atlanta atomic physics group under the leadership of Earl McDaniel of the Georgia Institute of Technology. The first extensive bibliography was McDaniel et al., 1985, and it is updated in this volume (McDaniel and Mansky chapter). These listings will contain all articles falling within the definition of the title.


URL: https://www.sciencedirect.com/science/article/pii/S1049250X08600415

Longitudinal transition models for categorical response data

Xian Liu, in Methods and Applications of Longitudinal Data Analysis, 2016

12.2 Longitudinal transition models with only fixed effects

In the two-time health transition model, statistical inference and the resulting estimating procedures essentially rely on a cross-sectional data structure with the baseline health status used either as a covariate or as defining a subsample of the analysis. Given only one data point for each individual, strictly speaking the two-time multinomial logit transition model is not in the domain of longitudinal data analysis. When more than two time points are specified, the data structure becomes more complex because a subject has at least two data points (in the data matrix, each subject has more than one row). The resulting dependence in this data structure thereby calls for the development of more advanced techniques to account for intraindividual correlation.

With a sequence of observed time points for subject i, the conditional distribution of the multinomial response at the jth time point, denoted by Yij where j = 1, …, ni, can be viewed as a function of the prior response or responses and covariates Xij. The simplest longitudinal transition model for data with more than two time points follows the basic Markov chain hypothesis that longitudinal transitions between different values in the state space depend only on the value of the previous state. Correspondingly, the transition probability from the state at time point j − 1 to the state at time point j can be written as a Markov process, given by

(12.6) π̃ijk = Pr(Yij = k | Yi(j−1) = ĩ), for ĩ = 0, 1; k = 1, ..., K + 1,

where prior state Yi(j−1) and current state Yij are subject to different state spaces, because the prior state space does not include an absorbing state but that of Yij does. With the specification of the Markov random variable, the only information about the past used for predicting the present is the previous state. This basic Markov hypothesis implies that knowledge of the state values at times earlier than j − 1 does not change the transition probability between j − 1 and j, and those earlier states can therefore be ignored. If such a Markov process is correctly assumed, it is reasonable to specify a separate multinomial logit model for each prior state value on the K outcome values.

Let Yi(j−1) = 0, 1 and Yij = 1, ..., K + 1. Two separate multinomial logit models, with covariate vector Xij, can then be specified for Yi(j−1) = 0 and Yi(j−1) = 1, respectively, written as

(12.7a) logit Pr(Yij = k | Yi(j−1) = 0, Xij) = Xij′β0k,

(12.7b) logit Pr(Yij = k | Yi(j−1) = 1, Xij) = Xij′β1k,

where, given subscript 0 or 1, β0k and β1k may differ to allow for variations in the effects of Xij between the two prior states.

As indicated in the description of the two-time transition model, the application of separate transition models can yield statistically inefficient results on parameter estimates and the corresponding standard errors. If serious problems arise, an integrated multinomial logit transition model can be specified by using the prior state as a covariate, given by

(12.8) logit Pr(Yij = k | Yi(j−1) = ĩ, Xij) = Yi(j−1)β1k + Xij′β0k,

where β1k, in the context of a longitudinal transition model, is the regression coefficient of the prior state at time j − 1. With Yi(j−1) taking the value 0 or 1, β1k = β0k + β1k. Some interaction terms may be specified in β0k to account for differences in the effects of certain covariates between the two prior status groups. As only the immediately previous state is considered in predicting the logit on the current state, the above two types of transition models, separate or unified, are referred to as first-order Markov chain models (Diggle et al., 2002). This first-order Markov chain approach is somewhat popular in the analysis of health transitions and life expectancies (e.g., Lièvre et al., 2003). Ignoring intraindividual correlation in analyzing health transitions, conditionally on the prior state, can result in substantial bias in nonlinear predictions of the transition probabilities.
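A hedged sketch of fitting an integrated transition model of this kind, with the prior state entered as a covariate in a multinomial logit (statsmodels' MNLogit; the file name and covariates are hypothetical placeholders, not from the chapter):

```python
import pandas as pd
import statsmodels.api as sm

# One row per subject per follow-up occasion (hypothetical file and columns):
# current_state in {1, ..., K+1}, prior_state in {0, 1}, plus covariates.
panel = pd.read_csv("health_transitions.csv")

X = sm.add_constant(panel[["prior_state", "age", "education"]])
transition_model = sm.MNLogit(panel["current_state"], X).fit()
print(transition_model.summary())

# Predicted transition probabilities, one column per destination state.
# Note: like Eq. (12.8), this fixed-effects-only fit ignores intraindividual correlation.
probs = transition_model.predict(X)
```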

Some researchers extend the above-mentioned first-order Markov approach by specifying a full set of the past responses, denoted by Hij, to create a transition model (Diggle et al., 2002). Mathematically, Hij can be defined as the σ-algebra of the prior history of transitions, given by Hij = σ(Yi(j−1), ..., Yi(j−q̃)), where q̃ is the number of prior observations. With the specification of Hij, the multinomial logit of the Markov chain (Yij | Hij, Xij) = k (k = 1, ..., K) for subject i at time point j can be written as

(12.9) logit Pr(Yij = k | Hij, Xij) = ∑r̃=1,...,q̃ Yi(j−r̃)βr̃k + Xij′βq̃k,

where βr̃k is the regression coefficient of the state value at time point (j − r̃). The vector of regression coefficients for Xij given Hij, denoted by βq̃k, indicates that the value and the interpretation of the regression coefficients change with the Markov order q̃. Theoretically, when the above Markov model is correctly assumed, the transition events are conditionally uncorrelated, and consequently the classical multinomial logit model with only fixed effects can be applied to estimate the regression coefficients and the corresponding standard errors (Diggle et al., 2002). When too many time points are considered, the value of q̃ is high, thereby making the estimating process tedious and cumbersome. The regression becomes even denser when the order of prior states impacts the effects of the covariates on the response at the current time point. Specification of a large number of interaction terms will further complicate estimation of the parameters, thereby affecting the precision of the estimates. Furthermore, the precision of the parameter estimates depends on the Markov order in Hij; that is, for earlier responses, the information on past responses is limited to fewer previous time occasions, and only for the last observed time point is the specified set of past responses complete. As a result, data at early times tend to be more correlated than the measurements of the response at later points.

As previously indicated, with dependence among repeated measurements of the response for the same subject, the specification of between-subjects random effects is a statistically efficient and effective way to account for the intraindividual correlation inherent in longitudinal data. Correspondingly, a heterogeneous transition pattern can be assumed to address the association between the history of transition events and the current state, conditionally on the specified fixed and random parameters. The mixed-effects multinomial logit model described in Chapter 11 can be extended to the perspective of multidimensional transitions from a prior state to a set of competing destination states. With the inclusion of the prior state as a covariate and the specification of the subject-specific random effects, the transition probabilities can be adequately predicted.


URL: https://www.sciencedirect.com/science/article/pii/B9780128013427000125

Time Series Analysis in Political Science

Harold D. Clarke, Jim Granato, in Encyclopedia of Social Measurement, 2005

Traditional Time Series Analysis

Research using time series data in political science typically has utilized many of the same regression techniques as are employed to analyze cross-sectional data. The vast majority of these traditional time series analyses have considered single-equation models such as the following:

(1) Yt = β0 + Σ β1−k X1−k, t−i + εt,

where Yt is the dependent variable at time t, X1−k, t−i are the 1 to k independent variables at time t − i, β0 is a constant, β1−k are the parameters associated with the variables X1−k, and εt is the stochastic error term ∼ N(0, σ2).

For a model such as Eq. (1), the possible (non)stationarity of the variables is ignored, and ordinary least squares (OLS) is employed to estimate the values of the parameters β0, β1 − k. The effects of the X's may be specified to occur simultaneously (i.e., at time t or with a lag i). Also, as in analyses of cross-sectional data, inferences regarding the statistical significance of the β's are made by calculating t ratios (i.e., β/s.e.). When doing diagnostic tests on such regression models, particular attention is given to the possibility that the stochastic errors (ε's) are correlated [i.e., cov(εt, ε t− i) ≠ 0]. Correlated errors do not bias parameter estimates but affect standard errors and, therefore, pose a threat to inference by affecting the size of the t ratios. The standard test for correlated errors has been the Durbin–Watson test, which tests only for first-order autocorrelation in the residuals of the estimated regression Eq. (1). If the null hypothesis that the residuals do not suffer from first-order autocorrelation is rejected by this test, the conventional approach is to conclude that the errors are generated by the following process:

(2) εt = ρεt−1 + υt,

where ρ captures the relationship between temporally adjacent errors, and υt is a “well-behaved” (uncorrelated) error process ∼ N(0, σ2). This (assumed) relationship between the errors is treated as a “nuisance” to be “corrected.” The alternative possibility, that the correlation among the residuals represents the result of model misspecification, is not considered.

The correction employed is a form of generalized least squares (GLS) that involves multiplying both sides of the model of interest by the “quasi-differencing” operator (1−ρL), where L is a backshift operator such that Lkyt = yt − k. This model is then subtracted from the original one. For example, for a model with a single right-hand-side variable, the result is

(3) Yt − ρYt−1 = β0 − ρβ0 + β1Xt − ρβ1Xt−1 + εt − ρεt−1.

The error process for the transformed model is εt − ρεt−1 = υt, which, by assumption, is uncorrelated. Since ρ is unknown, it must be estimated from the data. Various techniques may be used for this purpose, and the resulting procedures are known as feasible GLS.
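One common feasible GLS routine of this kind, an iterated Cochrane–Orcutt-style estimator for AR(1) errors, is available in statsmodels as GLSAR. A minimal sketch on simulated stand-in data (the data-generating numbers are arbitrary):

```python
import numpy as np
import statsmodels.api as sm

# Simulated stand-in data: x drives y, and the errors follow an AR(1) process.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.6 * e[t - 1] + rng.normal()
y = 2.0 + 1.5 * x + e

X = sm.add_constant(x)

# Feasible GLS with AR(1) errors: estimate rho from the residuals,
# quasi-difference, and re-fit, iterating until rho settles down.
fgls = sm.GLSAR(y, X, rho=1).iterative_fit(maxiter=10)
print(fgls.params)     # estimates of beta0, beta1 after the correction
print(fgls.model.rho)  # estimated AR(1) coefficient of the errors
```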

Political scientists adopting this approach to addressing the threat to inference have often failed to recognize that they have, in effect, respecified their original model in autoregressive distributed lag form and imposed a common-factor restriction (1 − ρL). This may be seen by rewriting Eq. (3) as

(4) (1 − ρL)Yt = (1 − ρL)β0 + (1 − ρL)β1Xt + (1 − ρL)εt.

As Hendry emphasizes, the warrant for this restriction should be determined empirically, rather than simply assumed. The vast majority of time series analyses in political science have not done so. By failing to recognize that autocorrelated residuals do not necessarily imply autocorrelated errors, such analyses risk model misspecification.

Although many political scientists continue to use GLS procedures, it is increasingly common to attempt to capture the dynamics in a time series by specifying an autoregressive, distributed lag model that includes a lagged endogenous variable Yt − 1:

(5) Yt = β0 + γYt−1 + Σ β1−k X1−k, t−i + εt.

A model such as Eq. (5) may be specified initially on theoretical grounds, or it may be adopted after the analyst finds evidence of first-order autocorrelation in Eq. (1), which is a common practice. In any event, the presence of the lagged endogenous variable Yt−1 means that the analyst is hypothesizing, either explicitly or implicitly, that the effects of all of the X variables are distributed through time and that all of these effects decline at exactly the same rate. That rate is γ, the coefficient on Yt−1. For example, the impact of β1X1t in Eq. (5) is β1 at time t, β1γ at time t + 1, β1γ2 at time t + 2, etc., and the long-term (asymptotic) impact of X1 is β1/(1 − γ). Clearly, the assumption that the effects of all X's evolve in exactly the same way is very strong. The ARIMA intervention and transfer function models considered later relax this assumption.
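A sketch of estimating a one-regressor version of Eq. (5) by OLS and recovering the implied long-run effect β1/(1 − γ), using statsmodels on simulated placeholder data:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated stand-in series for illustration only.
rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 1.0 + 0.7 * y[t - 1] + 0.5 * x[t] + rng.normal(scale=0.5)

df = pd.DataFrame({"y": y, "x": x})
df["y_lag"] = df["y"].shift(1)
df = df.dropna()

# Eq. (5) with one X: Y_t = beta0 + gamma * Y_{t-1} + beta1 * X_t + error
model = sm.OLS(df["y"], sm.add_constant(df[["y_lag", "x"]])).fit()
gamma = model.params["y_lag"]
beta1 = model.params["x"]

print(model.params)
print("long-run effect of x:", beta1 / (1 - gamma))  # beta1 / (1 - gamma)
```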


URL: https://www.sciencedirect.com/science/article/pii/B0123693985003212

Geographic Information Systems

Michael F. Goodchild, in Encyclopedia of Social Measurement, 2005

Issues

As will be obvious from the previous section, the use of GIS raises numerous issues concerning the nature of geographic information and inference from cross-sectional data. It is generally accepted that cross-sectional data cannot be used to confirm hypotheses about process, but they can certainly be used to reject certain false hypotheses and to explore data in the interests of hypothesis generation. Although GIS has evolved from the static view inherent in paper maps, there is much interest in adding dynamics and in developing methods of spatiotemporal analysis.

Uncertainty is a pervasive issue in GIS. It is impossible to measure location on the Earth's surface exactly and other forms of uncertainty are common also. For example, summary statistics for reporting zones are means or totals and clearly cannot be assumed to apply uniformly within zones, despite efforts to ensure that census tracts are approximately homogenous in socioeconomic characteristics. Results of analysis of aggregated data are dependent on the boundaries used to aggregate (the modifiable areal unit problem) and inferences from aggregated data regarding individuals are subject to the ecological fallacy.

Nevertheless, the outcomes of the widespread adoption of GIS in the social sciences since the 1980s are impressive. It is clear that GIS has brought new power to the analysis of cross-sectional data and the integration of diverse data sets. It has also shifted the ground of social science to some degree, by increasing the emphasis on local data, geographic variation, and highly disaggregated analysis, in contrast to the pervasive nomothetic approach of earlier decades.

Which type of data, cross-sectional vs time series, is more important to research?

Cross-sectional data means that we have data from many units, at one point in time. Time-series data means that we have data from one unit, over many points in time. Panel data (or time-series cross-section data) means that we have data from many units, over many points in time.

What is cross-sectional data used for?

Cross-sectional datasets are used extensively in economics and other social sciences. Applied microeconomics uses cross-sectional datasets to analyze labor markets, public finance, industrial organization theory, and health economics.

Can data be both time series and cross-sectional?

Some types of data can have both time-series and cross-sectional aspects; panel data and longitudinal data are examples. Panel data consist of observations through time on a single characteristic of multiple observational units.

Which of the following scales represents the strongest level of measurement?

The four major forms of measurement have the following hierarchy, with the ratio scale being the highest or strongest level of measurement and nominal the lowest or weakest type of measurement.