A Note on the Format of Ennis' Multiple-Choice Tests of Deductive Reasoning Competence

E.P. Brandon, 13th October 1992

Archived in ERIC Documentation Service, TM 019 276 


The central notion of deductive logic is that of a valid argument, an argument in which the conclusion really does follow from the premisses.  The standard informal explanation of validity is that it should be impossible for the premisses of the argument to be true and the conclusion false.

Investigation of people's actual competence in matters of deductive logic could use questions of the form "Does r follow from p and q?"  But given the standard informal explanation, one could avoid any uncertainties people might feel about what it is for one statement to follow from another, by framing the question in terms of the notions of truth and falsehood.  So, for instance, in his pioneer investigations of deductive logical reasoning competence, Robert Ennis (Ennis and Paulus, 1965) employed the following question structure:
        Suppose you know that p, q, ....
        Then would it be true that r?
In an adaptation of Ennis' work for use in Jamaica, the three possible answers offered (Yes; No; Maybe) are glossed as follows: "Yes" means "It must be true, given what you are told"; "No" means "It can't be true, given what you are told"; and "Maybe" means "It may be true or it may be false.  You haven't been told enough to be certain whether it is Yes or No."

Given Ennis' question format, and the standard construal of validity, when the sentences constitute a deductively valid argument the correct answer is either "Yes" (the premisses guarantee that r is true) or "No" (they guarantee that r is false); when they do not make up a valid argument the correct answer is "Maybe."
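
For purely propositional items, the scoring rule just stated can be checked mechanically by enumerating truth assignments.  What follows is only a minimal sketch and is not part of Ennis' materials or of the Jamaican adaptation: the helper names `implies` and `answer_key` and the three argument forms used (modus ponens, modus tollens, affirming the consequent) are assumptions chosen for illustration, and quantified items about "some" or "most" fall outside its scope.

```python
from itertools import product


def implies(a, b):
    """Material conditional: 'if a then b'."""
    return (not a) or b


def answer_key(premisses, conclusion, n_vars):
    """Return the correct answer ('Yes', 'No' or 'Maybe') for a propositional item.

    premisses and conclusion are functions of n_vars boolean arguments.
    (Hypothetical helper for illustration; not from the original test.)
    """
    # Keep only the truth assignments under which every premiss is true.
    models = [v for v in product([True, False], repeat=n_vars)
              if all(p(*v) for p in premisses)]
    if not models:
        raise ValueError("the premisses cannot all be true together")
    outcomes = {conclusion(*v) for v in models}
    if outcomes == {True}:
        return "Yes"    # r must be true, given what you are told
    if outcomes == {False}:
        return "No"     # r can't be true, given what you are told
    return "Maybe"      # not told enough to be certain


# Suppose you know that if p then q, and p.  Would it be true that q?
modus_ponens = answer_key(
    [lambda p, q: implies(p, q), lambda p, q: p], lambda p, q: q, 2)
# Suppose you know that if p then q, and not q.  Would it be true that p?
modus_tollens = answer_key(
    [lambda p, q: implies(p, q), lambda p, q: not q], lambda p, q: p, 2)
# Suppose you know that if p then q, and q.  Would it be true that p?
affirming_consequent = answer_key(
    [lambda p, q: implies(p, q), lambda p, q: q], lambda p, q: p, 2)

print(modus_ponens, modus_tollens, affirming_consequent)
```

On this sketch the two valid forms receive "Yes" and "No" respectively, while the invalid form receives "Maybe", which is the pattern the question format presupposes.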

In the course of the Jamaican investigations (reported in Nolan and Brandon, 1986, and Brandon, 1990) a doubt arose concerning the question format.  While the correct answers in the case of valid arguments seem conversationally appropriate, this does not seem so obviously the case for the invalid ones.  Contrast these two dialogues:
    (i) Suppose some vegetarians drink milk;
        would some people who drink milk be vegetarians?
        Yes.
   (ii) Suppose most teachers are women;
        would it be true that most women are teachers?
        Maybe.
Speaking personally, my usual response in cases such as (ii) would be to say "No," or more fully "No, not necessarily."

In Ennis' original investigation, the results were of no consequence for those tested; subjects had no time limit to complete the questions; and they were reminded of the meaning of the answers on every page of the question booklet.  The collection of most of the data in Jamaica has involved three serious departures from this set-up: the questions have formed part of an entrance examination for the Faculty of Education; there has been a time limit for the examination; and instructions were given only on the first page of the appropriate section of the booklet.  With such an increase in pressure, it is likely that interference from linguistically odd constructions would be more serious than in Ennis' original investigations.

It was decided to investigate the possible effect of the question format by using the same items in two successive years with one small change in question format: to replace "Maybe" with "Not necessarily."

The item analyses in Tables 1 and 2 report the main findings: in general the change of format makes no difference to performance on the valid items (Table 1) but a considerable difference to performance on the invalid items (Table 2).  For each item the tables give the number of candidates offering each of the three possible answers or omitting the question, the correct answer, the difficulty index (really a facility index: the proportion answering correctly), and finally the chi-square value and its probability for the comparison of the two years' distributions of answers.

Table 1: Item analysis 1990/1 - Valid items

           1990 (N=537)                          1991 (N=474)
Item        Y    N    M  Omit  Right  Diff.       Y    N   NN  Omit  Diff.     Xsq  p
modpon                                 0.83                           0.84
q2        426    5  103     3      Y   0.79     381    5   83     5   0.80    1.24  0.74
q4        421   63   44     9      Y   0.78     390   42   37     5   0.82    3.22  0.36
q14        32  447   55     3      N   0.83      43  399   29     3   0.84    8.49  0.04*
q16       510    3   21     3      Y   0.95     454    3   17     0   0.96    2.76  0.43
q19       452   12   67     6      Y   0.84     403    7   61     3   0.85    1.49  0.69
q32       407   24   76    30      Y   0.76     357   20   70    27   0.75    0.12  0.99
modtol                                 0.69                           0.66
q5        267  140  108    22      Y   0.50     221  110  125    18   0.47    5.67  0.13
q10        64  389   80     4      N   0.72      51  335   87     1   0.71    3.68  0.30
q15        30  385  113     9      N   0.72      32  329  108     5   0.69    1.79  0.62
q29       419   24   75    19      Y   0.78     363   24   69    18   0.77    0.36  0.95
q33       381   55   67    34      Y   0.71     332   33   72    37   0.70    5.27  0.15
q36        60  363   61    53      N   0.68      55  301   77    41   0.64    5.49  0.14
dissyl                                 0.84                           0.86
q1        394   34  102     7      Y   0.73     359   23   87     5   0.76    1.35  0.72
q8         32  463   40     2      N   0.86      33  406   35     0   0.86    2.17  0.54
q13        51  439   41     6      N   0.82      46  397   29     2   0.84    2.51  0.47
q21       496    9   26     6      Y   0.92     442   11   16     5   0.93    1.86  0.60
q25       475    8   42    12      Y   0.88     422   15   27    10   0.89    4.80  0.19
q34       458    7   38    34      Y   0.85     401   11   24    38   0.85    4.15  0.25

Key: Y = Yes; N = No; M = Maybe; NN = Not necessarily; Right = correct answer; Diff. = difficulty (facility) index; Xsq = chi-square.

Table 1 shows that in only one of the 18 valid items (q14) was there a significantly different distribution of answers between the two years, and even there the difficulty of the item was virtually unchanged (0.83 against 0.84).  In general each of the three forms of reasoning was of similar difficulty for the two groups.
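
The facility index and the chi-square comparison can be recomputed from the raw counts.  The sketch below does this for item q2 of Table 1.  It rests on assumptions the note does not state explicitly: that the index is the number of correct answers divided by all candidates (omissions included), and that the statistic is an uncorrected Pearson chi-square on the 2 x 4 table of answers by year, with 3 degrees of freedom.  The function names are hypothetical.

```python
import math


def facility(n_correct, n_total):
    """Proportion of all candidates (omissions included) answering correctly."""
    return n_correct / n_total


def chi_square_2xk(row_a, row_b):
    """Pearson chi-square statistic for two rows of answer counts."""
    total_a, total_b = sum(row_a), sum(row_b)
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(row_a, row_b):
        col = a + b
        if col == 0:
            continue  # an answer nobody gave in either year contributes nothing
        exp_a = col * total_a / grand  # expected count in the first year
        exp_b = col * total_b / grand  # expected count in the second year
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat


def p_value_df3(stat):
    """Upper-tail probability of a chi-square variate with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(stat / 2))
            + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2))


# q2 counts from Table 1, in the order Yes, No, Maybe / Not necessarily, Omit.
q2_1990 = [426, 5, 103, 3]
q2_1991 = [381, 5, 83, 5]

stat = chi_square_2xk(q2_1990, q2_1991)
print(round(facility(426, 537), 2), round(facility(381, 474), 2))
print(round(stat, 2), round(p_value_df3(stat), 2))
```

Under these assumptions the q2 row is reproduced: facilities of 0.79 and 0.80, a chi-square of about 1.24 and a probability of about 0.74.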

Comparison with Table 2 shows that the overall facility of two of the three invalid types of item increased markedly (minval from 0.40 to 0.50, nonono from 0.25 to 0.34) and that in 13 of the 18 invalid items the distribution of answers was significantly different.

Table 2: Item analysis 1990/1 - Invalid items

           1990 (N=537)                          1991 (N=474)
Item        Y    N    M  Omit  Right  Diff.       Y    N   NN  Omit  Diff.     Xsq  p
unmost                                 0.35                           0.32
q3        319   40  172     6      M   0.32     309   23  138     4   0.29    4.97  0.17
q7        234   91  208     4      M   0.39     246   72  149     7   0.31    9.19  0.03*
q12       264   27  242     4      M   0.45     262   22  187     3   0.39    3.80  0.28
q17       248   29  254     6      M   0.47     230   36  204     4   0.43    3.38  0.34
q20       321   29  166    21      M   0.31     230   35  195    14   0.41   15.46  0.00**
q28       393   44   71    29      M   0.13     379   30   39    26   0.08    8.48  0.04*
minval                                 0.40                           0.50
q6        203  150  181     3      M   0.34     167   78  224     5   0.47   27.49  0.00***
q11       166  180  183     8      M   0.34     139  111  220     4   0.46   19.01  0.00***
q18       172   87  272     6      M   0.51     140   46  284     4   0.60   12.70  0.01**
q23       178  105  245     9      M   0.46     138   71  259     6   0.55    8.73  0.03*
q26       256   50  214    17      M   0.40     190   18  254    12   0.54   25.28  0.00***
q31       263   52  192    30      M   0.36     238   40  169    27   0.36    0.51  0.92
nonono                                 0.25                           0.34
q9        115  271  142     9      M   0.26      91  204  168    11   0.35   10.74  0.01*
q22       180  180  144    33      M   0.27     149  129  170    26   0.36   10.44  0.02*
q24       136  263  110    28      M   0.20      93  213  143    25   0.30   13.93  0.00**
q27       123  280  113    21      M   0.21     112  215  126    21   0.27    5.85  0.12
q30       108  262  137    30      M   0.26      93  176  182    23   0.38   21.44  0.00***
q35       144  180  159    54      M   0.30     114  130  171    59   0.36    8.32  0.04*

Key: Y = Yes; N = No; M = Maybe; NN = Not necessarily; Right = correct answer; Diff. = difficulty (facility) index; Xsq = chi-square.

These results seem to show that the change from "Maybe" to "Not necessarily" has a marked effect in many of those cases where it is the salient and correct answer.  In most cases, more respondents give the correct answer with "Not necessarily" as the prompt.  The results also suggest that the low scores found by Ennis on invalid items might to some small extent be a product of the test format.

While performance seems to improve on the invalid items taken individually, subjects do not seem to improve quite so much when one looks at the consistency of their performance on groups of items.  Ennis characterises mastery of a principle as the correct answering of at least 5 out of the 6 items for that principle; "borderline" performance involves getting 4 of the 6 right.  Table 3 shows percentage mastery and borderline performance on the six principles tested for the two groups.  As can be seen, there is some improvement in mastery and borderline rates for two of the three invalid principles (nonono and minval), but there have been smaller fluctuations between years, and there is a trend for performance on minval to improve over the years.  To some extent this failure of better performance on individual items to translate into mastery is due to the time constraints: Tables 1 and 2 reveal increasing omissions towards the end of the test.  But it is also a matter of what is still a hazy grasp of the logic: fewer than half the subjects get right answers on most of the invalid items.

Table 3: Percentage mastery/borderline performance

                 MODPON  MODTOL  DISSYL  UNMOST  NONONO  MINVAL
1990 Mastery         72      47      77       8       7      18
1990 Borderline      15      21       9      15       6      12
1991 Mastery         74      43      78       6       9      24
1991 Borderline      13      20      10      12      10      17
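
The mastery criterion attributed to Ennis above (at least 5 of the 6 items for a principle right for mastery, exactly 4 right for borderline) can be written down directly.  The sketch below is illustrative only: the function names, the label "neither" for scores below 4, and the ten example scores are all assumptions, not data from the study.

```python
def classify(n_correct):
    """Classify performance on the six items testing one principle."""
    if not 0 <= n_correct <= 6:
        raise ValueError("n_correct must be between 0 and 6")
    if n_correct >= 5:
        return "mastery"
    if n_correct == 4:
        return "borderline"
    return "neither"  # label assumed here; the note gives no name for this band


def percentages(scores):
    """Percentage of subjects at mastery and at borderline level."""
    labels = [classify(s) for s in scores]
    n = len(labels)
    return (100 * labels.count("mastery") / n,
            100 * labels.count("borderline") / n)


# Hypothetical scores (items right out of 6) for ten subjects on one principle.
example_scores = [6, 5, 4, 4, 3, 2, 6, 1, 0, 5]
print(percentages(example_scores))
```

Because the criterion looks at a subject's whole set of six answers, a rise in the proportion answering each single item correctly need not produce a comparable rise in the mastery rate.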

REFERENCES

Brandon, E. P. (1990).  The Deductive Logical Competence of Non-graduate Caribbean Teachers (ERIC Document Reproduction Service No. ED 315 330).

Ennis, R. H. and Paulus, D. H. (1965).  Critical Thinking Readiness in Grades 1-12 (Phase I, Deductive Reasoning in Adolescence).  Cornell Critical Thinking Project (ERIC Document Reproduction Service No. ED 003 818).

Nolan, C. A. and Brandon, E. P. (1986).  Conditional reasoning in Jamaica.  Paper given to the Conference on Thinking, Harvard, 1984 (ERIC Document Reproduction Service No. SO 016 755).  


HTML prepared 23rd January, 2000.

URL http://www.uwichill.edu.bb/bnccde/epb/ERIC.htm