Chapter 12: Assessing Learners

Chapter Overview

Some of your strongest memories of childhood and adolescence probably involve taking tests in school; test taking is likely among the most vivid memories of your college years as well. If you are like most people who have spent many years in school, you have strong or mixed feelings about tests. In this chapter, we try to dispel some of the discomfort you may feel about tests and show how they can be effective tools in your classroom.

This chapter introduces techniques for assessing student learning. Its key terms and main points are:

Norm-Referenced and Criterion-Referenced Tests

A test that determines a student's place or rank among other students is called a norm-referenced test (NRT). This type of test conveys information about how a student performed compared with a large sample of pupils of the same age or grade.


A test that compares a student's performance to a standard of mastery is called a criterion-referenced test (CRT). This type of test conveys information about whether a student needs additional instruction on some skill or set of skills.


The major advantage of an NRT is that it covers many different content areas in a single test; its major disadvantage is that it is too general to be useful in identifying specific strengths and weaknesses tied to individual texts or workbooks.


The major advantage of a CRT is that it can yield highly specific information about individual skills or behaviours. Its major disadvantage is that many such tests would be needed to make decisions about the many skills or behaviours typically taught in school.
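The distinction can be made concrete with a small scoring sketch in Python. Everything in it (the norm group, the raw score, and the cut-off) is hypothetical: the same raw score yields a percentile rank when referenced to a norm group and a mastery decision when referenced to an absolute standard.

```python
# A minimal sketch, assuming hypothetical data; no real test or norming
# sample is represented here.

def percentile_rank(score, norm_group):
    """Norm-referenced view: percent of the norm group scoring below."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)

def mastery(score, cut_score):
    """Criterion-referenced view: pass/fail against an absolute standard."""
    return "mastered" if score >= cut_score else "needs more instruction"

norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]  # comparison sample
raw_score = 27

print(percentile_rank(raw_score, norm_group))  # NRT: rank among peers -> 60.0
print(mastery(raw_score, cut_score=24))        # CRT: mastery decision -> mastered
```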

The Test Blueprint

A test blueprint is a table that matches the test items to be written with the content areas and levels of behavioural complexity taught. The test blueprint helps ensure that a test samples learning across (1) the range of content areas covered and (2) the cognitive and/or affective processes considered important.
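One way to see why the blueprint helps is to treat it as a small two-way table. In the sketch below, the content areas, levels, and item counts are hypothetical; tallying rows and columns shows at a glance whether any content area or cognitive level is under-sampled.

```python
# Hypothetical test blueprint: rows are content areas, columns are levels
# of behavioural complexity, and each cell is the number of items to write.
blueprint = {
    "Fractions": {"Knowledge": 4, "Application": 3, "Analysis": 1},
    "Decimals":  {"Knowledge": 3, "Application": 3, "Analysis": 2},
    "Percents":  {"Knowledge": 2, "Application": 4, "Analysis": 2},
}

levels = ["Knowledge", "Application", "Analysis"]
for area, cells in blueprint.items():
    print(f"{area}: {sum(cells.values())} items")      # coverage per content area
for level in levels:
    total = sum(cells[level] for cells in blueprint.values())
    print(f"{level}: {total} items")                   # coverage per level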

Objective Test Items

Objective test item formats include the following:

  • True-false
  • Matching
  • Multiple choice

True-false

Two methods for reducing the effects of guessing in true-false items are:

  • to encourage all students to guess when they do not know the answer, and
  • to require the revision of statements that are false.

Matching

To construct good matching items:

  • Make lists homogeneous, representing the same kind of events, people, or circumstances.
  • Place the shorter list first, and list options in chronological, numbered, or alphabetical order.
  • Provide approximately three more options than descriptions to reduce the chance of guessing correctly.
  • Write directions that identify what the lists contain and specify the basis for matching.
  • Closely check the options for multiple correct answers.

Multiple choice

Avoid the following flaws when writing multiple-choice items:

  • Stem clues, in which the same word or a close derivative appears in both the stem and an option
  • Grammatical clues, in which an article, verb, or pronoun eliminates one or more options from being grammatically correct
  • Repeating the same words across options when they could have been provided only once in the stem
  • Making response options of unequal length, suggesting that the longest option may be correct
  • The use of "all of the above," which discourages response discrimination, and "none of the above," which encourages guessing

Suggestions for writing higher-level multiple-choice items include use of the following:

  • Pictorial, graphical, or tabular stimuli
  • Analogies that demonstrate relationships among items
  • Previously learned principles or procedures

Completion or short answer

The following are suggestions for writing completion items:

  • Require single-word answers.
  • Pose each question or problem in a brief, definite statement.
  • Check to be sure the response is factually correct and can be found in the text, workbook, or class notes.
  • Omit only one or two key words in the item.
  • Word the statement so the blank is near the end.
  • If the question requires a numerical answer, indicate the units in which the answer is to be expressed.

Essay Test Items

An extended-response essay item allows the student to determine the length and complexity of a response.

A restricted-response essay item poses a specific problem for which the student must recall and organize the proper information, derive a defensible conclusion, and express it within a stated time or length.

Essay items are most appropriate when:

  • the instructional objectives specify high-level cognitive processes,
  • relatively few students' tests need to be graded, and
  • test security is a consideration.

Suggestions for writing essay items

1. Suggestions for writing essay items include the following:

  • Identify beforehand the mental processes you want to measure (e.g., application, analysis, decision making).
  • Identify clearly and unambiguously the task to be accomplished by the student.
  • Begin the essay question with key words, such as compare, give reasons for, predict.
  • Require presentation of evidence for controversial questions.
  • Avoid optional items.
  • Establish reasonable time and/or page limits.

2. Restrict the use of essay items to objectives that cannot easily be measured by multiple-choice items.

3. Relate each essay question to an objective on the test blueprint.

4. Use of the following criteria will help increase consistency and accuracy when scoring essay items:

  • content,
  • organization,
  • process,
  • completeness/internal consistency, and
  • originality/creativity.

The following are suggestions for increasing consistency and accuracy when scoring essay items:

  • Specify the response length.
  • Use several restricted-response essay items instead of one extended-response item.
  • Prepare a scoring scheme in which you specify beforehand all ingredients necessary to achieve each of the grades that could be assigned (see the sketch below).
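One way to make the last suggestion concrete is to write the scoring scheme down before grading begins. The sketch below is hypothetical; the grades, the ingredients, and the grade_essay helper are invented for illustration.

```python
# A minimal sketch, assuming a hypothetical scoring scheme prepared before
# grading: each grade lists the ingredients an answer must contain.
scoring_scheme = {
    "A": ["thesis stated", "three supporting arguments",
          "counterargument addressed", "conclusion"],
    "B": ["thesis stated", "two supporting arguments", "conclusion"],
    "C": ["thesis stated", "one supporting argument"],
}

def grade_essay(ingredients_present):
    # Dicts preserve insertion order, so grades are checked from A downward.
    for grade, required in scoring_scheme.items():
        if all(item in ingredients_present for item in required):
            return grade
    return "D"

print(grade_essay(["thesis stated", "two supporting arguments", "conclusion"]))  # -> B
```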

Packaging the Test

Some suggestions for packaging the test are as follows:

  • Group together all items of similar format.
  • Arrange test items from easy to hard.
  • Space items for easy reading.
  • Keep items and options on the same page.
  • Position illustrations near descriptions.
  • Check the answer key.
  • Determine beforehand how students are to record the answers.
  • Provide space for name and date.
  • Check test directions for clarity.
  • Proofread the test.

Validity

Validity refers to whether a test measures what it says it measures. Three types of validity are content, concurrent, and predictive.

  • Content validity is established by examining a test's contents.
  • Concurrent validity is established by correlating the scores on a new test with the scores on an established test given to the same set of individuals.
  • Predictive validity is established by correlating the scores on a new test with some future behaviour of the examinee that is representative of the test's content.
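In practice, concurrent (and predictive) validity coefficients are correlations. Here is a minimal sketch with hypothetical scores, using statistics.correlation from the standard library (Python 3.10 or later):

```python
# A minimal sketch: concurrent validity as the Pearson correlation between
# a new test and an established test taken by the same (hypothetical) students.
from statistics import correlation  # requires Python 3.10+

new_test         = [55, 62, 70, 74, 80, 85, 91]
established_test = [50, 60, 68, 75, 78, 88, 90]

r = correlation(new_test, established_test)
print(f"concurrent validity coefficient: r = {r:.2f}")  # near 1.0 for these data
```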

Reliability

Reliability refers to whether a test yields the same or similar scores consistently. Three types of reliability are test-retest, alternative form, and internal consistency.

  • Test-retest reliability is established by giving the test twice to the same individuals and correlating the first set of scores with the second.
  • Alternative form reliability is established by giving two parallel but different forms of the test to the same individuals and correlating the two sets of scores.
  • Internal consistency reliability is established by determining the extent to which the test measures a single basic concept.
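For right/wrong items, internal consistency is often estimated with the Kuder-Richardson formula 20 (KR-20). The chapter does not name a specific formula, so treat this as one standard option; the item data below are hypothetical.

```python
# A minimal KR-20 sketch. Rows are (hypothetical) students, columns are
# items scored 1 = correct, 0 = incorrect.
from statistics import pvariance

scores = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1],
]
k = len(scores[0])                                    # number of items
totals = [sum(row) for row in scores]                 # each student's total
p = [sum(col) / len(scores) for col in zip(*scores)]  # proportion correct per item
sum_pq = sum(pi * (1 - pi) for pi in p)               # summed item variances

kr20 = (k / (k - 1)) * (1 - sum_pq / pvariance(totals))
print(f"KR-20 = {kr20:.2f}")  # values near 1 indicate high internal consistency
```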

Accuracy

Accuracy refers to whether a test approximates an individual's true level of knowledge, skill, or ability.

Marks and Marking Systems

Marks are based on comparisons, usually comparisons of students with one or more of the following:

  • Other students
  • Established standards
  • Aptitude
  • Achievement relative to effort
  • Achievement relative to improvement

Standardized Tests

Standardized tests are developed by test construction specialists to determine a student's performance level relative to others of similar age and grade. They are standardized because they are administered and scored according to specific and uniform procedures.

Four ways in which a standardized test can be biased are:

  • Certain cultural groups on the average consistently score higher than other groups (Bias in group differences)
  • Certain cultural groups are not included in the norming sample (Sample bias)
  • The culture or dialect of the examiner does not match that of the examinee (Examiner and language bias)
  • Test results are used in ways that deprive some groups of educational opportunities (Bias in test use)

The following assumptions are likely to guide the development of standardized tests in the future:

  • Learning is a process.
  • Learning ability can be improved.
  • Learning occurs in a social context.
  • Learning assessment should have instructional validity.

Performance tests

Performance tests use direct measures of learning that require learners to analyze, problem-solve, experiment, make decisions, measure, cooperate with others, present orally, or produce a product.

Performance tests can assess not only higher-level cognitive skills but also non-cognitive outcomes, such as self-direction, the ability to work with others, and social awareness.

Paper and pencil tests are most efficient, reliable, and valid for assessing knowledge, comprehension, and some types of application. When properly constructed, performance tests are most efficient, reliable, and valid for assessing complex thinking, attitudes, and social skills.

Three questions to ask when deciding what to test with a performance assessment are the following:

  • What knowledge or content is essential for learner understanding?
  • What intellectual skills are used?
  • What habits of mind or attitudes are important?

Two categories of performance skills in the cognitive domain are:

  • skills related to acquiring information and
  • skills related to organizing and using information.

The four steps to constructing a performance assessment are:

  • deciding what to test,
  • designing the assessment context,
  • specifying the scoring rubrics, and
  • specifying the testing constraints.

Some questions to ask in designing the performance assessment context are:

  • What does the doing of math, history, and so on look and feel like to professionals?
  • What projects and tasks are performed by these professionals?
  • What roles—or habits of mind—do professionals assume?

A good performance assessment includes a hands-on exercise or problem, an observable outcome, and a process that can be observed.

Rubrics and Primary trait scoring

Rubrics are scoring standards composed of model answers, which are used to score performance tests. They are samples of acceptable responses against which the rater compares a student's performance.

Primary trait scoring is a type of rating that requires the test developer to first identify the most relevant characteristics or primary traits of importance.
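A small sketch of how primary trait scoring might be recorded follows; the traits, point values, and ratings are all hypothetical.

```python
# A minimal sketch: primary traits and their maximum points are identified
# beforehand, then each trait is rated separately. All values hypothetical.
max_points = {
    "states a clear position": 2,
    "supports position with evidence": 4,
    "addresses counterarguments": 2,
}
ratings = {
    "states a clear position": 2,
    "supports position with evidence": 3,
    "addresses counterarguments": 1,
}

total, maximum = sum(ratings.values()), sum(max_points.values())
print(f"primary trait score: {total}/{maximum}")  # -> 6/8
```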

Performance Test Accomplishments

A performance test can require four types of accomplishments from learners:

  • products,
  • complex cognitive processes,
  • observable performance, and
  • attitudes and social skills.

Methods of Scoring

These performances can be scored with checklists, rating scales, or holistic scales.

  • Checklists contain lists of behaviours, traits, or characteristics that can be scored as either present or absent. They are best suited for complex behaviours or performances that are divisible into a series of clearly defined, specific actions.
  • Rating scales assign numbers to categories representing different degrees of performance. They are typically used for those aspects of a complex performance, such as attitudes, products, and social skills, that do not lend themselves to yes/no or present/absent type judgments.
  • Holistic scoring estimates the overall quality of a performance by assigning a single numerical value to represent a specific category of accomplishment. It is used for measuring both products and processes.
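The differences among the three methods come down to the kind of data each records. A minimal sketch, with all behaviours, criteria, and scales invented for illustration:

```python
# Hypothetical sketch of the three scoring methods for one performance.

# Checklist: each behaviour is simply present (True) or absent (False).
checklist = {"states hypothesis": True, "labels axes": True, "cites evidence": False}
checklist_score = sum(checklist.values())   # 2 of 3 behaviours present

# Rating scale: degrees of performance on a numeric scale (here 1 to 5).
ratings = {"clarity of argument": 4, "use of sources": 3, "delivery": 5}
rating_score = sum(ratings.values())        # 12 of a possible 15

# Holistic: a single overall judgment against category descriptions,
# e.g., 4 = excellent, 3 = good, 2 = fair, 1 = poor.
holistic_score = 3

print(checklist_score, rating_score, holistic_score)
```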

Constraints on constructing and administering a performance test

Constraints to decide on when constructing and administering a performance test are:

  • amount of time allowed,
  • use of reference material,
  • help from others,
  • use of specialized equipment,
  • prior knowledge of the task, and
  • scoring criteria.
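Taken together, these constraints amount to a small configuration that can be decided, and shared with students, before the test. A hypothetical sketch:

```python
# A minimal sketch: testing constraints written out as an explicit,
# hypothetical configuration decided before the assessment.
performance_test_constraints = {
    "time_allowed_minutes": 90,
    "reference_material": ["textbook", "class notes"],  # [] would mean closed book
    "help_from_others": False,
    "specialized_equipment": ["calculator"],
    "task_known_in_advance": True,
    "scoring_criteria_shared_with_students": True,
}
for constraint, decision in performance_test_constraints.items():
    print(f"{constraint}: {decision}")
```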

Performance and Portfolio Assessment

A performance assessment asks learners to show what they know by measuring complex cognitive skills with authentic, real-world tasks.

A portfolio is a planned collection of learner achievement that documents what a student has accomplished and the steps taken to get there.

Approaches to Combining Performance Grades

Two approaches to combining performance grades with other grades are to:

  • assign 100 total points to each assignment that is graded and average the results, and
  • begin with an arbitrary total and then determine the percentage of points each component is worth.
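A short sketch of both approaches; the component names, marks, and weights are hypothetical.

```python
# Hypothetical sketch of the two approaches to combining grades.
marks = {"unit test": 82, "essay": 90, "performance task": 75}

# Approach 1: every assignment is graded out of 100 points; average them.
average = sum(marks.values()) / len(marks)
print(f"approach 1 (simple average): {average:.1f}")   # -> 82.3

# Approach 2: start from an arbitrary total and weight each component.
weights = {"unit test": 0.50, "essay": 0.30, "performance task": 0.20}
weighted = sum(marks[c] * weights[c] for c in marks)
print(f"approach 2 (weighted total): {weighted:.1f}")  # -> 83.0
```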


No Child Left Behind (NCLB)

The intent of No Child Left Behind (NCLB) is to improve the educational opportunities of every learner—regardless of ethnicity, income, or background.

Individuals with Disabilities Education Improvement Act (IDEIA)

The intent of the Individuals with Disabilities Education Improvement Act (IDEIA) is to provide children with disabilities a free and appropriate public education (FAPE) and to ensure that special education students have access to all the potential benefits that regular education students have from the general curriculum.

Response To Intervention (RTI)

The intent of Response To Intervention (RTI) is to promote early identification of students who may be at risk for learning difficulties through a three-tiered intervention model.



Multiple Choice Questions

Instructions

Answer all the questions in this section by choosing the letter corresponding to the correct or best answer.




1. Grade-equivalent scores and other scores from norm-referenced tests allow.....
A. general, comparative decisions.
B. information on skills mastered.
C. information on processes used.
D. student's level of proficiency.

2. A test blueprint ensures that a test will sample learning across a range of.....
A. multiple content areas.
B. cognitive and/or affective processes considered important.
C. a variety of items that tap different levels of cognitive complexity.
D. both B and C.

3. To require students to think at least at the application level on multiple-choice tests, teachers should write questions using......
A. complex words and sentences.
B. pictures and tables.
C. more response options per item.
D. direct quotations from the text.

4. Well-constructed essay questions require students to.....
A. recall information.
B. respond to a question that has only one correct answer.
C. use higher-order thinking skills.
D. write lengthy answers to questions.

5. Included in the instructions for essay tests should be.....
A. the type of writing style desired.
B. whether the use of pen or pencil is acceptable.
C. whether spelling and grammar will be counted.
D. both A and B.

6. To aid in scoring consistency, teachers should......
A. grade all the questions without taking long breaks.
B. specify the criteria beforehand.
C. require that all the answers be the same length.
D. have at least two people grade each question.

7. Good essay tests.....
A. take more time to construct than other types of tests.
B. take less time to construct than other types of tests.
C. are helpful for developing lower-level thinking skills.
D. will have one correct answer for each question.

8. One method of constructing tests that will ease test anxiety is to.....
A. mix item formats throughout the test.
B. have a pattern for true-false and multiple-choice answers.
C. arrange test items from easy to hard.
D. put essay questions first.

9. If a test measures what it says it is supposed to measure, then the test is considered to be…..
A. valid.
B. checking for understanding.
C. accurate.
D. consistent.

10. Test-retest, alternative form, and internal consistency are three methods used to determine.....
A. validity.
B. reliability.
C. accuracy.
D. concurrent validity.

11. Performance tests.....
A. use indicators showing that processes have occurred.
B. simulate real-world activities.
C. are mainly paper and pencil tests.
D. both A and C.

12. Performance tests, if properly constructed, are the best to use when assessing.....
A. attitudes and social skills.
B. knowledge.
C. comprehension.
D. application.

13. Behaviours such as constructive criticism, tolerance of ambiguity, respect for reason, and appreciation are examples of.....
A. higher order thinking skills.
B. concepts.
C. essential tasks.
D. habits of mind.

14. Checklists, rating scales, and holistic scoring are examples of.....
A. Tests.
B. performance assessments.
C. rubrics.
D. multimodal assessments.

15. When planning scoring systems for rating scales using primary trait scoring, which question(s) should be asked?
A. What characteristics most justify receiving a higher score?
B. What are the most important characteristics that show a high degree of the trait?
C. What errors most justify achieving a lower score?
D. Both B and C.

16. Ms. Kelley is grading extended essays received from her students. When grading these essays, she is most interested in the overall quality of the paper rather than specific aspects of what is included in the paper. She should probably use......
A. primary trait scoring.
B. holistic scoring.
C. checklist.
D. rating scale.

17. Mr. Garcia is working on a performance test for his English II students. He is not sure how much time to give them or whether he should allow them to use references and other people for help. They will be allowed to use word processors and he is confident the students have enough prior knowledge to be successful. He is still concerned about what scoring criteria to use. Mr. Garcia is developing.....
A. test constraints.
B. test reliability.
C. scoring efficiency.
D. test validity.

18. If the teacher is interested in the learner's growth in proficiency, long term achievement, and significant accomplishments in a given area he/she would use a.....
A. performance test.
B. combined scoring system.
C. holistic scoring system.
D. portfolio assessment.

19. Assigning 100 total points to each graded assignment and then averaging the results is an example of a.....
A. holistic scoring.
B. weighting.
C. scoring efficiency.
D. combined scoring system.

20. A good performance assessment includes a hands-on exercise or problem, an observable outcome, and....
A. a paper and pencil test.
B. a process that can be observed.
C. a rating scale with which to score it.
D. a portfolio.

21. Which of the following is not a common student test constraint to be considered in creating performance assessments?
A. Time to prepare, revise, finish.
B. Equipment such as calculators, computers, etc.
C. Getting help from others.
D. Cost of reference materials.


True/False

Instructions

Answer all the questions in this section by choosing the letter corresponding to the correct or best answer.




1. A criterion-referenced test compares student performance with an absolute standard.
A. True
B. False

2. On a true-false test, a student has a 50% chance of selecting the correct answer whether he/she reads the item or not.
A. True
B. False

3. When designing matching test questions, there should be the same number of options as descriptions.
A. True
B. False

4. Using options such as "answer two of the following four questions" helps form a basis for comparison among students.
A. True
B. False

5. Using a predetermined scoring scheme helps when scoring extended response essay questions.
A. True
B. False

6. In content validity, the instructional objectives provide the point of reference.
A. True
B. False

7. Internal consistency assumes that people who get one item right will more likely get other, similar items right.
A. True
B. False

8. All other factors being equal, the fewer items included in a test, the higher the test's reliability.
A. True
B. False

9. "Grading on the curve" means adding points to everyone's test grade.
A. True
B. False

10. When certain ethnic groups usually score higher than other ethnic groups, it is called test bias.
A. True
B. False

11. A well-constructed performance test can serve as a student learning experience.
A. True
B. False

12. One of the main limitations of performance tests is the time required to reliably score them.
A. True
B. False

13. Portfolios are an alternative to paper-and-pencil tests, essay tests, or performance tests.
A. True
B. False

14. Parents need not be involved with decisions concerning the development of portfolios.
A. True
B. False

15. Since portfolios may cover an extended time period, time lines are not important.
A. True
B. False

16. Performance tests are the fastest and easiest way for a teacher to assess students' progress.
A. True
B. False

17. Portfolios show off the learner's best work plus the steps it took him/her to get there.
A. True
B. False

18. One purpose for a portfolio is to provide information about a learner that no other measurement tool can provide.
A. True
B. False

19. True-false and multiple-choice questions require greater use of judgment than performance assessments.
A. True
B. False

20. Performance assessments are meant to serve and enhance instruction rather than being just a test given to assign a grade.
A. True
B. False

21. Generally, the more items included in a test, the higher the test's reliability.
A. True
B. False

22. An advantage to "grading on the curve" is that it simplifies marking decisions.
A. True
B. False

23. The purpose of a test blueprint is to create a format for grading future tests, saving a teacher time and effort in writing tests.
A. True
B. False

24. Equal differences between percentile ranks indicate equal differences in achievement.
A. True
B. False

25. One way to reduce the effects of guessing on True/False tests is to require students to correct false items to make them true.
A. True
B. False

26. When standardized tests originated, it was widely believed that learning ability was inherited, fixed, and largely unchangeable.
A. True
B. False

27. A good practice for giving essay tests is to write many essay questions and allow students to choose the one they want to answer, enhancing a sense of choice and self-expression.
A. True
B. False

28. Teachers should avoid using controversial items on essay tests because there is no single right answer.
A. True
B. False

29. When creating a scoring system for a performance assessment, the number of points or categories should be limited for ease of use and scoring.
A. True
B. False

30. It is important to remember that performance assessments are tests and that no learning should occur during the assessment.
A. True
B. False

31. An area conventional tests have assessed very well over the years is that of student affect and attitude.
A. True
B. False

32. A well-planned performance assessment presents the learner with an authentic, real-world problem or challenge.
A. True
B. False

33. Conventional paper-pencil tests are popular because they measure learning directly.
A. True
B. False

34. One advantage of performance assessment is that it can be used at any point in the instruction process without losing its usefulness.
A. True
B. False
