Chapter 12: Assessing Learners
Chapter Overview
Some of your strongest childhood and adolescent memories probably include taking tests in school. For that matter, test taking is probably among the most vivid memories of your college experience. If you are like most people who have spent many years in school, you have strong or mixed feelings about tests. In this chapter, we try to dispel some of the discomfort you might feel about tests and show how they can be effective tools in your classroom.
This chapter introduces you to techniques for assessing student learning. Its key terms and main points are summarized below:
Norm-Referenced and Criterion-Referenced Tests
A test that determines a student's place or rank among other students is called a norm-referenced test (NRT). This type of test conveys information about how a student performed compared to a large sample of pupils at the same age or grade.
A test that compares a student's performance to a standard of mastery is called a criterion-referenced test (CRT). This type of test conveys information about whether a student needs additional instruction on some skill or set of skills.
The major advantage of an NRT is that it covers many different content areas in a single test; its major disadvantage is that it is too general to be useful in identifying specific strengths and weaknesses tied to individual texts or workbooks.
The major advantage of a CRT is that it can yield highly specific information about individual skills or behaviours. Its major disadvantage is that many such tests would be needed to make decisions about the many skills or behaviours typically taught in school.
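The difference is easiest to see in how a single score is interpreted. The following is a minimal Python sketch, with invented scores and an invented 80% mastery cutoff, contrasting a norm-referenced interpretation (rank within a group) with a criterion-referenced one (mastery of a fixed standard):
```python
# Hypothetical class scores; both functions are illustrative, not from the text.
scores = [62, 71, 75, 78, 82, 85, 88, 91, 94]

def percentile_rank(score, group):
    """Norm-referenced: what percent of the group scored below this score?"""
    below = sum(1 for s in group if s < score)
    return 100 * below / len(group)

def has_mastered(score, cutoff=80):
    """Criterion-referenced: does the score meet a fixed mastery standard?"""
    return score >= cutoff

print(percentile_rank(85, scores))  # about 55.6 -- rank relative to peers
print(has_mastered(85))             # True -- mastery of the skill itself
```
The same raw score of 85 supports two different decisions: the NRT interpretation ranks the student among peers, while the CRT interpretation says whether further instruction on the skill is needed.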
The Test Blueprint
A test blueprint is a table that matches the test items to be written with the content areas and levels of behavioural complexity taught. The test blueprint helps ensure that a test samples learning across (1) the range of content areas covered and (2) the cognitive and/or affective processes considered important.
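For example, a hypothetical blueprint for a 15-item unit test on fractions (the content areas, levels, and item counts below are invented for illustration) might look like this:
```
Content area             Knowledge   Application   Analysis   Total
Adding fractions             2            2            1         5
Subtracting fractions        2            2            1         5
Fraction word problems       1            2            2         5
Total                        5            6            4        15
```
Reading across a row shows how thoroughly each content area is sampled; reading down a column shows how much of the test demands each level of behavioural complexity.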
Objective Test Items
Objective test item formats include the following:
- True-false
- Matching
- Multiple choice
- Completion (short answer)
True-false
Two methods for reducing the effects of guessing in true-false items are:
- to encourage all students to guess when they do not know the answer, and
- to require students to revise statements that are false to make them true.
Matching
To construct good matching items:
- Make lists homogeneous, representing the same kind of events, people, or circumstances.
- Place the shorter list first, and list options in chronological, numbered, or alphabetical order.
- Provide approximately three more options than descriptions to reduce the chance of guessing correctly (see the worked example after this list).
- Write directions that identify what the lists contain and specify the basis for matching.
- Check the options closely for multiple correct answers.
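To see why the extra options help, consider a hypothetical item with five descriptions and eight options: a student who knows four of the five matches still faces a one-in-four guess on the last description, whereas with equal-length lists the final match would be given away by simple elimination.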
Multiple choice
Avoid the following flaws when writing multiple-choice items:
- Stem clues, in which the same word or a close derivative appears in both the stem and an option
- Grammatical clues, in which an article, verb, or pronoun eliminates one or more options from being grammatically correct
- Repeating the same words across options that could have been provided only once in the stem
- Making response options of unequal length, indicating that the longest option may be correct
- The use of "all of the above," which discourages response discrimination, and "none of the above," which encourages guessing
Suggestions for writing higher-level multiple-choice items include use of the following:
- Pictorial, graphical, or tabular stimuli
- Analogies that demonstrate relationships among items
- Application of previously learned principles or procedures
Completion or short answer
The following are suggestions for writing completion items:
- Require single-word answers.
- Pose each question or problem in a brief, definite statement.
- Check to be sure an accurate response can be found in the text, workbook, or class notes.
- Omit only one or two key words.
- Word the statement so the blank is near the end.
- If the question requires a numerical answer, indicate the units in which it is to be expressed.
Essay Test Items
An extended-response essay item allows the student to determine the length and complexity of a response.
A restricted-response essay item poses a specific problem for which the student must recall and organize the proper information, derive a defensible conclusion, and express it within a stated time or length.
Essay items are most appropriate when:
- the instructional objectives specify high-level cognitive processes,
- relatively few tests (students) need to be graded, and
- test security is a consideration.
Suggestions for writing essay items include the following:
- Identify beforehand the mental processes you want to measure (e.g., application, analysis, decision making).
- Identify clearly and unambiguously the task to be accomplished by the student.
- Begin the essay question with key words such as compare, give reasons for, or predict.
- Require presentation of evidence for controversial questions.
- Avoid optional items.
- Establish reasonable time and/or page limits.
- Restrict the use of essay items to objectives that cannot easily be measured by multiple-choice items.
- Relate each essay question to an objective on the test blueprint.
Use of the following criteria will help increase consistency and accuracy when scoring essay items:
- content,
- organization,
- process,
- completeness/internal consistency, and
- originality/creativity.
The following are suggestions for increasing consistency and accuracy when scoring essay items:
- Specify the response length.
- Use several restricted-response essay items instead of one extended-response item.
- Prepare a scoring scheme in which you specify beforehand all ingredients necessary to achieve each of the grades that could be assigned.
Packaging the Test
Some suggestions for packaging the test are as follows:
- Group together all items of similar format.
- Arrange test items from easy to hard.
- Space items for easy reading.
- Keep items and options on the same page.
- Position illustrations near descriptions.
- Check the answer key.
- Determine beforehand how students are to record the answers.
- Provide space for name and date.
- Check test directions for clarity.
- Proofread the test.
Validity
Validity refers to whether a test measures what it says it measures. Three types of validity are content, concurrent, and predictive.
- Content validity is established by examining a test's contents against the instructional objectives it is meant to cover.
- Concurrent validity is established by correlating the scores on a new test with the scores on an established test given to the same set of individuals.
- Predictive validity is established by correlating the scores on a new test with some future behaviour of the examinee that is representative of the test's content.
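Because concurrent and predictive validity are both expressed as a correlation coefficient, the computation is the same; only the second measure differs. Here is a minimal Python sketch with invented scores (statistics.correlation requires Python 3.10+):
```python
from statistics import correlation  # available in Python 3.10+

# Hypothetical scores for the same ten students on a new test and an
# established test of the same content.
new_test    = [55, 60, 64, 70, 72, 75, 80, 84, 88, 93]
established = [52, 63, 61, 68, 75, 74, 79, 86, 85, 95]

# A coefficient near +1.0 supports concurrent validity. Correlating the new
# test with a later criterion measure (e.g., end-of-year grades) instead
# would estimate predictive validity.
print(round(correlation(new_test, established), 2))
```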
Reliability
Reliability refers to whether a test yields the same or similar scores consistently. Three types of reliability are test-retest, alternative form, and internal consistency.
- Test-retest reliability is established by giving the test twice to the same individuals and correlating the first set of scores with the second.
- Alternative form reliability is established by giving two parallel but different forms of the test to the same individuals and correlating the two sets of scores.
- Internal consistency reliability is established by determining the extent to which the test measures a single basic concept.
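Internal consistency is commonly estimated with a statistic such as Cronbach's alpha, which compares the summed variance of individual items to the variance of total scores; when students who get one item right tend to get similar items right, alpha approaches 1. A minimal Python sketch, with invented right (1) / wrong (0) item scores:
```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Estimate internal consistency.

    item_scores: one list per test item, aligned by student, so that
    item_scores[i][j] is student j's score on item i.
    """
    k = len(item_scores)
    sum_item_var = sum(pvariance(item) for item in item_scores)
    totals = [sum(per_student) for per_student in zip(*item_scores)]
    return (k / (k - 1)) * (1 - sum_item_var / pvariance(totals))

# Hypothetical scores on three items for five students.
items = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
]
print(round(cronbach_alpha(items), 2))  # about 0.79 for these invented data
```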
Accuracy
Accuracy refers to whether a test approximates an individual's true level of knowledge, skill, or ability.
Marks and Marking Systems
Marks are based on comparisons, usually comparisons of students with one or more of the following:
- Other students
- Established standards
- Aptitude
- Effort (actual versus potential achievement)
- Improvement (actual versus potential achievement)
Standardized Tests
Standardized tests are developed by test construction specialists to determine a student's performance level relative to others of similar age and grade. They are standardized because they are administered and scored according to specific and uniform procedures.
Four ways in which a standardized test can be biased are:
- Certain cultural groups on average consistently score higher than other groups (bias in group differences).
- Certain cultural groups are not included in the norming sample (sample bias).
- The culture or dialect of the examiner does not match that of the examinee (examiner and language bias).
- Test results are used in ways that deprive some groups of educational opportunities (bias in test use).
The following assumptions are likely to guide the development of standardized tests in the future:
- Learning is a process.
- Learning ability can be improved.
- Learning occurs in a social context.
- Learning assessment should have instructional validity.
Performance Tests
Performance tests use direct measures of learning that require learners to analyze, problem-solve, experiment, make decisions, measure, cooperate with others, present orally, or produce a product.
Performance tests not only can assess higher-level cognitive skills but also non-cognitive outcomes, such as self-direction, ability to work with others, and social awareness.
Paper and pencil tests are most efficient, reliable, and valid for assessing knowledge, comprehension, and some types of application. When properly constructed, performance tests are most efficient, reliable, and valid for assessing complex thinking, attitudes, and social skills.
Three questions to ask when deciding what to test with a performance assessment are the following:
- What knowledge or content is essential for learner understanding?
- What intellectual skills are used?
- What habits of mind or attitudes are important?
Two categories of performance skills in the cognitive domain are:
- skills related to acquiring information and
- skills related to organizing and using information.
The four steps to constructing a performance assessment are:
- deciding what to test,
- designing the assessment context,
- specifying the scoring rubrics, and
- specifying the testing constraints.
Some questions to ask in designing the performance assessment context are:
- What does the doing of math, history, and so on look and feel like to professionals?
- What projects and tasks are performed by these professionals?
- What roles (or habits of mind) do professionals assume?
A good performance assessment includes a hands-on exercise or problem, an observable outcome, and a process that can be observed.
Rubrics and Primary Trait Scoring
Rubrics are scoring standards composed of model answers, which are used to score performance tests. They are samples of acceptable responses against which the rater compares a student's performance.
Primary trait scoring is a type of rating that requires the test developer to first identify the most relevant characteristics or primary traits of importance.
Performance Test Accomplishments
A performance test can require four types of accomplishments from learners:
- products,
- complex cognitive processes,
- observable performance, and
- attitudes and social skills.
Methods of Scoring
These performances can be scored with checklists, rating scales, or holistic scales.
- Checklists contain lists of behaviours, traits, or characteristics that can be scored as either present or absent. They are best suited for complex behaviours or performances that are divisible into a series of clearly defined, specific actions.
- Rating scales assign numbers to categories representing different degrees of performance. They are typically used for those aspects of a complex performance, such as attitudes, products, and social skills, that do not lend themselves to yes/no or present/absent judgments.
- Holistic scoring estimates the overall quality of a performance by assigning a single numerical value to represent a specific category of accomplishment. It is used for measuring both products and processes.
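The three methods differ mainly in the kind of judgment the scorer records, as this minimal Python sketch illustrates (the behaviours, traits, and point values are invented for the example):
```python
# Checklist: each behaviour is simply present (True) or absent (False).
checklist = {"states a hypothesis": True, "labels axes": True, "cites data": False}
checklist_score = sum(checklist.values())  # 2 of 3 behaviours present

# Rating scale: each trait is rated by degree, here on a 1-5 scale.
ratings = {"organization": 4, "use of evidence": 3, "cooperation": 5}
rating_score = sum(ratings.values())  # 12 of a possible 15

# Holistic scoring: one overall judgment, here a single 1-4 quality category.
holistic_score = 3

print(checklist_score, rating_score, holistic_score)
```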
Constraints on Constructing and Administering a Performance Test
Constraints to decide on when constructing and administering a performance test are:
- amount of time allowed,
- use of reference material,
- help from others,
- use of specialized equipment,
- prior knowledge of the task, and
- scoring criteria.
Performance and Portfolio Assessment
A performance assessment asks learners to show what they know by measuring complex cognitive skills with authentic, real-world tasks.
A portfolio is a planned collection of learner achievement that documents what a student has accomplished and the steps taken to get there.
Approaches to Combining Performance Grades
Two approaches to combining performance grades with other grades are to:
- assign 100 total points to each assignment that is graded and average the results, or
- begin with an arbitrary total and then determine the percentage of points each component is worth.
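As a worked example of the second approach (all components and weights below are invented): suppose a teacher fixes an arbitrary total of 200 points and decides a performance assessment is worth 40%, a unit test 35%, and homework 25%. This sketch converts those percentages to points and computes one student's final percentage:
```python
total_points = 200  # arbitrary total chosen in advance
weights = {"performance assessment": 0.40, "unit test": 0.35, "homework": 0.25}

# Points each component is worth under the chosen total:
# performance assessment 80, unit test 70, homework 50.
points_possible = {name: w * total_points for name, w in weights.items()}

# Hypothetical points one student earned on each component.
earned = {"performance assessment": 68, "unit test": 59, "homework": 47}

final_percentage = 100 * sum(earned.values()) / total_points
print(round(final_percentage, 1))  # 87.0
```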
No Child Left Behind (NCLB)
The intent of No Child Left Behind (NCLB) is to improve the educational opportunities of every learner—regardless of ethnicity, income, or background.
Individuals with Disabilities Education Improvement Act (IDEIA)
The intent of the Individuals with Disabilities Education Improvement Act (IDEIA) is to provide children with disabilities a free and appropriate public education (FAPE) and to ensure that special education students have access to all the potential benefits that regular education students have from the general curriculum.
Response to Intervention (RTI)
The intent of Response to Intervention (RTI) is to promote early identification of students who may be at risk for learning difficulties through a three-tiered intervention model.
Multiple Choice Questions
1. A test blueprint ensures that a test will sample learning across a range of.....
2. To require students to think at least at the application level on multiple-choice tests, teachers should write questions using......
3. Well-constructed essay questions require students to.....
4. Included in the instructions for essay tests should be.....
5. To aid in scoring consistency, teachers should......
6. Good essay tests.....
7. One method in constructing tests that will ease test anxiety is to.....
8. If a test measures what it says it is supposed to measure, then the test is considered to be…..
9. Test-retest, alternative form, and internal consistency are three methods used to determine.....
10. Performance tests.....
11. Performance tests, if properly constructed, are the best to use when assessing.....
12. Behaviours such as constructive criticism, tolerance of ambiguity, respect for reason, and appreciation are examples of.....
13. Checklists, rating scales, and holistic scoring are examples of.....
14. When planning scoring systems for rating scales using primary trait scoring, which question(s) should be asked?
15. Ms. Kelley is grading extended essays received from her students. When grading these essays, she is most interested in the overall quality of the paper rather than specific aspects of what is included in the paper. She should probably use......
16. Mr. Garcia is working on a performance test for his English II students. He is not sure how much time to give them or whether he should allow them to use references and other people for help. They will be allowed to use word processors, and he is confident the students have enough prior knowledge to be successful. He is still concerned about what scoring criteria to use. Mr. Garcia is developing.....
17. If the teacher is interested in the learner's growth in proficiency, long-term achievement, and significant accomplishments in a given area, he/she would use a.....
18. A good performance assessment includes a hands-on exercise or problem, an observable outcome, and....
19. Which of the following is not a common student test constraint to be considered in creating performance assessments?
True/False
1. On a true-false test, a student has a 50% chance of selecting the correct answer whether he/she reads the item or not.
2. When designing matching test questions, there should be the same number of options as descriptions.
3. Using options such as "answer two of the following four questions" helps form a basis for comparison among students.
4. Using a predetermined scoring scheme helps when scoring extended-response essay questions.
5. In content validity, the instructional objectives provide the point of reference.
6. Internal consistency assumes that people who get one item right will more likely get other, similar items right.
7. All other factors being equal, the fewer items included in a test, the higher the test's reliability.
8. "Grading on the curve" means adding points to everyone's test grade.
9. When certain ethnic groups usually score higher than other ethnic groups, it is called test bias.
10. A well-constructed performance test can serve as a student learning experience.
11. One of the main limitations of performance tests is the time required to score them reliably.
12. Portfolios are an alternative to paper-and-pencil tests, essay tests, or performance tests.
13. Parents need not be involved with decisions concerning the development of portfolios.
14. Since portfolios may cover an extended time period, timelines are not important.
15. Performance tests are the fastest and easiest way for a teacher to assess students' progress.
16. Portfolios show off the learner's best work plus the steps it took him/her to get there.
17. One purpose for a portfolio is to provide information about a learner that no other measurement tool can provide.
18. True-false and multiple-choice questions require greater use of judgment than performance assessments.
19. Performance assessments are meant to serve and enhance instruction rather than being just a test given to assign a grade.
20. Generally, the more items included in a test, the higher the test's reliability.
21. An advantage to "grading on the curve" is that it simplifies marking decisions.
22. The purpose of a test blueprint is to create a format for grading future tests, saving a teacher time and effort in writing tests.
23. Equal differences between percentile ranks indicate equal differences in achievement.
24. One way to reduce the effects of guessing on true-false tests is to require students to correct false items to make them true.
25. When standardized tests originated, it was widely believed that learning ability was inherited, fixed, and largely unchangeable.
26. A good practice for giving essay tests is to write many essay questions and allow students to choose the one they want to answer, enhancing a sense of choice and self-expression.
27. Teachers should avoid using controversial items on essay tests because there is no single right answer.
28. When creating a scoring system for a performance assessment, the number of points or categories should be limited for ease of use and scoring.
29. It is important to remember that performance assessments are tests and that no learning should occur during the assessment.
30. An area conventional tests have assessed very well over the years is that of student affect and attitude.
31. A well-planned performance assessment presents the learner with an authentic, real-world problem or challenge.
32. Conventional paper-and-pencil tests are popular because they measure learning directly.
33. One advantage of performance assessment is that it can be used at any point in the instruction process without losing its usefulness.