- Item Index of Difficulty. This was computed by dividing the number of examinees who got the correct answer by the total number of examinees (i.e., 100 students).
- Subtest Index of Difficulty. This was obtained by getting the arithmetic mean of the indices of difficulty of all the items in the subtest.
- Whole-Test Index of Difficulty. This was obtained by getting the arithmetic mean of the indices of difficulty of the five (5) subtests.
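In code, the three difficulty indices above might be computed as follows (a minimal Python sketch; the item counts and subtest grouping are hypothetical, not the study's data):

```python
def item_difficulty(num_correct, num_examinees=100):
    """Item Index of Difficulty: proportion of examinees answering correctly."""
    return num_correct / num_examinees

def mean(values):
    return sum(values) / len(values)

# Hypothetical counts of correct answers (out of 100 examinees) for the
# items of two small illustrative subtests.
subtest_a_correct = [80, 55, 62]
subtest_b_correct = [40, 70]

# Subtest Index of Difficulty: mean of its items' difficulty indices.
subtest_a = mean([item_difficulty(c) for c in subtest_a_correct])
subtest_b = mean([item_difficulty(c) for c in subtest_b_correct])

# Whole-Test Index of Difficulty: mean of the subtest indices.
whole_test = mean([subtest_a, subtest_b])
```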
- Item Index of Discrimination. This was obtained by subtracting the number of students in the lower 30 percent (L) of the group who got the correct answer from the number of students in the upper 30 percent (U) of the group who got the correct answer, and dividing the difference by 30 (i.e., the number of observations in each group). In symbols, the Index of Discrimination (ID) is computed as follows:

ID = (U - L) / n

where:
U = number of students in the upper 30% of the group who got the correct answer;
L = number of students in the lower 30% of the group who got the correct answer;
n = number of students in the upper 30% or in the lower 30% of the group.
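A minimal sketch of this computation (the counts are hypothetical, not the study's data):

```python
def index_of_discrimination(upper_correct, lower_correct, group_size=30):
    """ID = (U - L) / n: difference between the numbers of correct answers
    in the upper and lower 30% groups, divided by the size of each group."""
    return (upper_correct - lower_correct) / group_size

# e.g., 25 students in the upper group and 10 in the lower group
# answered the item correctly:
id_value = index_of_discrimination(25, 10)  # (25 - 10) / 30 = 0.5
```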
- Subtest Index of Discrimination. This was obtained by getting the arithmetic mean of the item indices of discrimination of all the items in the subtest.
- Whole-Test Index of Discrimination. This was obtained by getting the arithmetic mean of the indices of discrimination of the five (5) subtests.
- Plausibility of the Test Options. This was computed by getting the proportion of students who mistakenly picked each distracter (wrong option). This statistic was computed only for the multiple-choice items. Since each multiple-choice item had only four (4) options and only one correct answer, there were always three distracters. Thus, for every four-option multiple-choice item, three proportions were obtained, each describing the plausibility of one distracter.
- To decide whether the three distracters of each item were plausible, the proportions of the three distracters were compared using the z-test for the significance of the difference between proportions, at the 0.05 level of significance.
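The z-test for the difference between two proportions can be sketched as follows (a standard pooled-proportion formulation; the counts are hypothetical, not the study's data):

```python
from math import sqrt, erf

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-tailed z-test for the difference between two proportions.
    x1, x2: numbers of students choosing each of two distracters;
    n1, n2: total numbers of examinees (here both 100)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # pooled proportion under H0: p1 == p2
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-tailed p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# e.g., 20 vs. 28 of 100 examinees picking two different distracters;
# the difference is significant at the 0.05 level only if p < 0.05
z, p = two_proportion_z_test(20, 100, 28, 100)
```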
- To determine the reliability of each of the five subtests, the coefficient of correlation between the test and retest scores was computed. The Pearson Product-Moment Correlation was used as the indicator of the reliability of each subtest.
- To determine the reliability of the whole test, the arithmetic mean of the reliability coefficients of the five subtests was computed.
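Test-retest reliability and the whole-test average might be computed as follows (the paired scores are hypothetical; the five subtest coefficients are those reported under Findings):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# hypothetical test and retest scores for one subtest
test_scores   = [12, 9, 14, 7, 11, 13, 8, 10]
retest_scores = [13, 8, 14, 6, 12, 12, 9, 11]
subtest_reliability = pearson_r(test_scores, retest_scores)

# whole-test reliability: arithmetic mean of the five subtest coefficients
subtest_rs = [0.90, 0.74, 0.84, 0.78, 0.83]  # coefficients reported in the Findings
whole_test_reliability = sum(subtest_rs) / len(subtest_rs)
```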
- To determine a quantitative measure of the validity of each item (Vi), the validators were asked to rate the validity of the item using a five-point Likert scale. The average rating given by the five (5) validators was then computed and used as the measure of the item's validity.
- To determine a quantitative measure of the validity of each subtest (Vs), the arithmetic mean of the item validity ratings (Vi) was computed.
- To obtain a quantitative value describing the usability of the test, instructor-respondents were asked to respond to five questions using a seven-point Likert scale. The arithmetic mean of the ratings given by the evaluators was used as the indicator of the test's usability.
To provide bases for interpretation and to guide the researcher in deciding whether to reject or retain an item in the proposed diagnostic test, the following guides were adapted:
For Interpreting the Difficulty Index of an Item (suggested by Oriondo and Dalo-Antonio, cited in Bermiso, 2003)

Rating        Interpretation                    Suggestion
0.80 – 1.00   The item is Very Easy             Revise or Discard
0.21 – 0.79   The item is Moderately Difficult  Retain
0.00 – 0.20   The item is Difficult             Revise or Discard
For Interpreting the Index of Discrimination (suggested by Ochave, cited in Bermiso, 2003)

Rating         Interpretation  Suggestion
0.41 – higher  Very Good Item  Retain
0.31 – 0.39    Good Item       Retain
0.20 – 0.29    Marginal Item   Usually needs revision
0.19 – lower   Poor Item       Revise or Discard
For Interpreting the Validity Ratings. The researcher devised a rating scale with the following interpretations:

Rating      Interpretation
4.5 – 5.0   Outstandingly Valid
3.5 – 4.4   Very Satisfactorily Valid
2.5 – 3.4   Satisfactorily Valid
1.5 – 2.4   Less Valid
1.0 – 1.4   Convincingly Invalid
For Interpreting Reliability

Rating         Interpretation
0.7 – 1.0      Highly Reliable (the test is good)
0.4 – 0.6      Moderately Reliable (the test needs refinement)
0.1 – 0.3      Less Reliable (not a good test)
Negative – 0   Not Reliable (totally useless)
For Interpreting Usability

Rating         Interpretation
2.5 – 3.0      Highly Usable
1.5 – 2.4      Moderately Usable
0.5 – 1.4      Usable
-0.5 – 0.4     No Decision
-1.5 – -0.6    Not Usable
-2.5 – -1.6    Moderately Not Usable
-3.0 – -2.6    Convincingly Not Usable
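Taken together, the first two guides amount to a simple retain/revise decision rule for an item, which can be sketched as follows (a hypothetical illustration of the tables above, not code from the study):

```python
def difficulty_decision(p):
    """Decision suggested by the difficulty-index guide above."""
    if p >= 0.80:
        return "Revise or Discard"  # Very Easy (0.80 - 1.00)
    if p >= 0.21:
        return "Retain"             # Moderately Difficult (0.21 - 0.79)
    return "Revise or Discard"      # Difficult (0.00 - 0.20)

def discrimination_decision(d):
    """Decision suggested by the discrimination-index guide above."""
    if d >= 0.31:
        return "Retain"                  # Good to Very Good item
    if d >= 0.20:
        return "Usually needs revision"  # Marginal item
    return "Revise or Discard"           # Poor item

def retain_item(p, d):
    """An item is kept only if both guides suggest retaining it."""
    return difficulty_decision(p) == "Retain" and discrimination_decision(d) == "Retain"
```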
FINDINGS
- Of the original 260 items prepared for the first draft, 65 items were finally selected to compose the Diagnostic Test in Basic Mathematics.
- These 65 items passed the good-test-item criteria of difficulty, discrimination, and plausibility.
- Some of the items were of the open-response type because of the validity issues that might arise if the multiple-choice format were used.
- The test was divided into five subtests: (1) Decimals, Ratio and Percent Test, (2) Fractions and Its Operations Test, (3) Operations Involving Integers Test, (4) Exponents and Radicals Test, and (5) Problem Solving Test. The 65 items were distributed as follows: 15 items in the Decimals, Ratio and Percent Test; 10 items in the Fractions and Its Operations Test; 10 items in the Operations Involving Integers Test; 15 items in the Exponents and Radicals Test; and 15 items in the Problem Solving Test.
- The average index of difficulty of each of the five subtests was as follows: 0.641 for the Decimals, Ratio and Percent Test; 0.532 for the Fractions and Its Operations Test; 0.740 for the Operations Involving Integers Test; 0.502 for the Exponents and Radicals Test; and 0.435 for the Problem Solving Test.
- The average index of discrimination of each of the five subtests was as follows: 0.589 for the Decimals, Ratio and Percent Test; 0.507 for the Fractions and Its Operations Test; 0.490 for the Operations Involving Integers Test; 0.498 for the Exponents and Radicals Test; and 0.436 for the Problem Solving Test.
- The proportions of students choosing the distracters of the multiple-choice items did not differ significantly.
- The average validity rating of each of the five subtests was as follows: 4.52 for the Decimals, Ratio and Percent Test; 4.48 for the Fractions and Its Operations Test; 4.32 for the Operations Involving Integers Test; 3.83 for the Exponents and Radicals Test; and 4.27 for the Problem Solving Test.
- The reliability coefficients of the five subtests were as follows: 0.90 for the Decimals, Ratio and Percent Test; 0.74 for the Fractions and Its Operations Test; 0.84 for the Operations Involving Integers Test; 0.78 for the Exponents and Radicals Test; and 0.83 for the Problem Solving Test. The reliability coefficient of the whole test was 0.82.
- The mean usability rating of the CD-ROM version of the test was 2.13, which means that the computer-based version of the test has a high degree of usability.
CONCLUSIONS
From the findings of the study, the following conclusions were reached:
- The 65 test items have difficulty indices within the range of a “moderately difficult” item, an indication of a good test item.
- The 65 test items have indices of discrimination within the range of a “good” to “very good” item.
- The distracters of the multiple-choice items were equally plausible.
- The final draft of the 65-item test has a high degree of validity.
- The individual subtests and the whole test have high reliability coefficients.
- The computer-based version of the diagnostic test is highly usable.
RECOMMENDATIONS
In the light of the findings and conclusions reached, the following recommendations were offered:
- Students who are planning to take the Licensure Examination for Teachers (LET) may use this automated test to self-diagnose their strengths and weaknesses in Basic Mathematics.
- Students who are currently taking the Basic Mathematics course (Math 1) may use this automated test to check their progress or readiness for midterm or periodical examinations.
- Tutors may advise their tutees to take the automated diagnostic test and be informed of the results, so as to determine the tutorial lessons to be given to the tutees.
- Interested researchers and test developers are encouraged to construct automated self-diagnostic or self-evaluating software programs that facilitate students’ self-initiative and acknowledge individual differences in learning.
BIBLIOGRAPHY
Bermiso, Fe S. 2003. Paghahanda ng Isang Prototype na Lagumang Pagsusulit sa Filipino I para sa mga Mag-aaral sa Unang Taon sa Tersyarya. Unpublished Master’s Thesis. Philippine Normal University, Manila.
Brigance Diagnostic Assessment of Basic Skills, http://ericae.net/eac/eac0056.htm.
Buco, Gerard U. 1997. Development and Validation of an Achievement Test in a Flexibly-Paced Program in Mathematics I. Unpublished Master’s Thesis. Philippine
Normal University, Manila.
California Diagnostic Reading Test, http://ericae.net/eac/eac0059.htm.
Cronbach, L.J., Gleser, G.C., Nanda, H., & Rajaratnam, N. 1972. The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: John Wiley.
Cronbach, L.J., Rajaratnam, N., & Gleser, G.C. 1963. Theory of Generalizability: A Liberation of Reliability Theory. The British Journal of Statistical Psychology, 16, 137-163.
Dacanay, Antonio G. 1988. The Development and Validation of a Creative Thinking Test for Fourth Year High School Students in Metro Manila. Unpublished Master’s
Thesis. Philippine Normal University, Manila.
Devine, Marjorie and N. Yaghlian. Construction
of Objective Tests. http://ericae.net/eac/eac0079.htm.
Downie, N. M. and R. W. Heath. 1983. Basic
Statistical Methods. New York: Harper & Row Publishers.
Ferguson, G. and Y. Takane. 1989. Statistical
Analysis in Psychology and Education. New York: McGraw-Hill Book Company.
Guiterrez, Danilo S. 1984. The Development
and Validation of a Test on Critical Thinking for Grade Six Pupils. Unpublished Master’s Thesis. Philippine Normal University, Manila.
Key-Math Revised-A Diagnostic Inventory of Essential Mathematics,
Form A & B. http://ericae.net/eac/eac0125.htm
Kerlinger, Fred N. 1973. Foundations of Behavioral Research. New York: Holt, Rinehart and Winston, Inc.
Lazarsfeld, P. F. and N. W. Henry. 1968. Latent Structure Analysis. Boston: Houghton Mifflin.
Lebrudo, Ma. Luz R. 1993. The Development and Validation of an Achievement Test in Home Economics and Livelihood Education for Grade Six. Unpublished Master’s
Thesis. Philippine Normal University, Manila.
Lord, F. M. & Novick, M. R. 1968. Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Losbanes, Helen S. 1995. The Development and Validation of a Test on Scientific Literacy in Chemistry of Fourth Year High School Students in Private and Public Schools of Taguig, Metro Manila. Unpublished Master’s Thesis. Philippine Normal University,
Manila.
Mississippi State University. Test
Construction. http://msu.org/edu/test.htm.
Novick, M. R. 1966. The axioms and principal results of classical
test theory. Journal of Mathematical Psychology, 3, 1-18.
Nunnally, J.C., & Bernstein, I.H. 1994. Psychometric theory. 3rd ed. New York: McGraw Hill.
Preparation Drills for the Missouri Mathematics Placement Test
(MMPT) at MU, http://www.saab.org/mathdrills/mmpt.cgi.
Schreyer Institute for Teaching Excellence. 26 June 2000.
Academic Testing Classical Test Theory Approach, University Testing Services, http://www.uts.psu.edu/Classical_theory_frame.htm.
Spearman, C. 1904. The proof and measurement of association between
two things. American Journal of Psychology, 15, 72-101.
The Maculaitis Assessment Program. http://ericae.net/eac/eac0182.htm.