When creating a test, one generally uses a subset of items to represent a larger construct.

Chapter 5 covered topics that rely on statistical analyses of data from educational and psychological measurements. These analyses are used to examine relationships among scores on one or more test forms, in the case of reliability, and among scores based on ratings from two or more judges, in the case of interrater reliability. Aside from coefficient alpha, all of the statistical analyses introduced so far focus on composite scores. Item analysis focuses instead on statistical analysis of the individual items that make up these composites.

As discussed in Chapter 4, test items are the most basic building blocks of an assessment instrument. Item analysis lets us investigate the quality of these individual building blocks, including how well each one contributes to the whole and supports the validity of our measurement.

This chapter extends concepts from Chapters 2 and 5 to analysis of item performance within a CTT framework. The chapter begins with an overview of item analysis, including some general guidelines for preparing for an item analysis, entering data, and assigning score values to individual items. Some commonly used item statistics are then introduced and demonstrated. Finally, two additional item-level analyses are discussed, differential item functioning analysis and option analysis.

Learning objectives

  1. Explain how item bias and measurement error negatively impact the quality of an item, and how item analysis, in general, can be used to address these issues.
  2. Describe general guidelines for collecting pilot data for item analysis, including how following these guidelines can improve item analysis results.
  3. Identify items that may have been keyed or scored incorrectly.
  4. Recode variables to reverse their scoring or keyed direction.
  5. Use the appropriate terms to describe the process of item analysis with cognitive vs noncognitive constructs.
  6. Calculate and interpret item difficulties and compare items in terms of difficulty.
  7. Calculate and interpret item discrimination indices, and describe what they represent and how they are used in item analysis.
  8. Describe the relationship between item difficulty and item discrimination and identify the practical implications of this relationship.
  9. Calculate and interpret alpha-if-item-deleted.
  10. Utilize item analysis to distinguish between items that function well in a set and items that do not.
  11. Remove items from an item set to achieve a target level of reliability.
  12. Evaluate selected-response options using option analysis.

In this chapter, we’ll run item and option analyses on PISA09 data using epmr, with results plotted, as usual, using ggplot2.

# R setup for this chapter
# Required packages are assumed to be installed,
# see chapter 1
library("epmr")
library("ggplot2")
# Functions we'll use in this chapter
# str() for checking the structure of an object
# recode() for recoding variables
# colMeans() for getting means by column
# istudy() from epmr for running an item analysis
# ostudy() from epmr for running an option analysis
# subset() for subsetting data
# na.omit() for removing cases with missing data

Preparing for item analysis

Item quality

As noted above, item analysis lets us examine the quality of individual test items. Information about individual item quality can help us determine whether or not an item is measuring the content and construct that it was written to measure, and whether or not it is doing so at the appropriate ability level. Because we are discussing item analysis here in the context of CTT, we’ll assume that there is a single construct of interest, perhaps assessed across multiple related content areas, and that individual items can contribute to or detract from our measurement of that construct by limiting or introducing construct-irrelevant variance in the form of bias and random measurement error.

Bias represents a systematic error with an influence on item performance that can be attributed to an interaction between examinees and some feature of the test. Bias in a test item leads examinees with a given background characteristic, aside from their ability, to perform better or worse on the item simply because of that background characteristic. For example, bias sometimes results from the use of scenarios or examples in an item that are more familiar to certain gender or ethnic groups. Differential familiarity with item content can make an item more relevant, engaging, and easily understood, and can then lead to differential performance, even for examinees of the same ability level. We identify such item bias primarily by using measures of item difficulty and differential item functioning (DIF), discussed below and again in Chapter 7.

Bias in a test item indicates that the item is measuring some other construct besides the construct of interest, where systematic differences on the other construct are interpreted as meaningful differences on the construct of interest. The result is a negative impact on the validity of test scores and the corresponding inferences and interpretations. Random measurement error, on the other hand, is not attributed to a specific identifiable source, such as a second construct. Instead, measurement error is inconsistency of measurement at the item level. An item that introduces measurement error detracts from the overall internal consistency of the measure, and this is detected in CTT, in part, using item analysis statistics.

Piloting

The goal in developing an instrument or scale is to identify bias and inconsistent measurement at the item level prior to administering a final version of our instrument. As we talk about item analysis, remember that the analysis itself is typically carried out in practice using pilot data. Pilot data are gathered prior to or while developing an instrument or scale. These data require at least a preliminary version of the educational or psychological measure. We’ve written some items for our measure, and we want to see how well they work.

Nunnally and Bernstein (1994) and others recommend that the initial pilot “pool” of candidate test items be 1.5 to 2 times as large as the final number of items needed. So, if you’re envisioning a test with 100 items on it, you should aim to pilot 150 to 200 items. This may not be feasible, but it is a best-case scenario, and should at least be followed in large-scale testing. By collecting data on up to twice as many items as we intend to actually use, we’re acknowledging that, despite our best efforts, many of our preliminary test items may be low quality, for example, biased or internally inconsistent, or may address different ability levels or content than intended.

An adequate sample size of test takers is essential if we hope to obtain item analysis results that generalize to the population of test takers. Nunnally and Bernstein (1994) recommend that data be collected on at least 300 individuals from the population of interest, or 5 times as many individuals as test items, whichever is larger. A more practical goal for smaller scale testing applications, such as classroom assessments, is 100 to 200 test takers. With smaller or non-representative samples, our item analysis results must be interpreted with caution. As with inferences made from other types of statistics, small samples more often lead to erroneous results. Keep in mind that every statistic discussed here has a standard error and confidence interval associated with it, whether it is directly examined or not. Note also that bias and measurement error arise in addition to this standard error, or sampling error, and we cannot identify bias in our test questions without representative data from our intended population. Thus, adequate sampling in the pilot study phase is critical.
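
As a quick illustration of these two rules of thumb, the short sketch below computes a pilot item pool and a minimum sample size for a hypothetical final test length. The numbers and object names here are ours, chosen only for demonstration; they are not part of epmr or PISA09.

# A minimal sketch of the piloting rules of thumb described above
# Hypothetical final test length
n_final <- 100
# Pilot pool of 1.5 to 2 times the final number of items
pool_range <- c(1.5, 2) * n_final
pool_range
# Sample size: at least 300 test takers, or 5 times the number
# of piloted items, whichever is larger
n_piloted <- max(pool_range)
max(300, 5 * n_piloted)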

The item analysis statistics discussed here are based on the CTT model of test performance. In Chapter 7 we’ll discuss the more complex item response theory (IRT) and its applications in item analysis.

Data entry

After piloting a set of items, raw item responses are organized into a data frame with test takers in rows and items in columns. The str() function is used here to summarize the structure of the unscored items on the PISA09 reading test. Each unscored item is coded in R as a factor with four to eight factor levels. Each factor level represents different information about a student’s response.

# Recreate the item name index and use it to check the
# structure of the unscored reading items
# The strict.width argument is optional, making sure the
# results fit in the console window
# Item names below are assumed; confirm against names(PISA09)
ritems <- c("r414q02", "r414q11", "r414q06", "r414q09",
  "r452q03", "r452q04", "r452q06", "r452q07", "r458q01",
  "r458q07", "r458q04")
str(PISA09[, ritems], strict.width = "cut")
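
Before item statistics can be computed, unscored factor responses like these are typically converted to numeric item scores. The sketch below shows one way dichotomous scoring could be done for a single hypothetical selected-response item; the response vector and key are made up for illustration and are not taken from PISA09.

# A minimal scoring sketch for a hypothetical item: responses
# are stored as a factor and scored 1 if they match the key
resp <- factor(c("a", "c", "b", "c", "d", NA),
  levels = c("a", "b", "c", "d"))
key <- "c"
scored <- ifelse(resp == key, 1, 0)
scored
# Missing responses remain NA here; depending on the scoring
# design, they could instead be set to 0 with
# scored[is.na(scored)] <- 0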
