
DEVELOPMENT AND VALIDATION OF AUTOMATED DIAGNOSTIC TEST FOR BASIC MATHEMATICS

Prof. Fabian C. Pontiveros, Jr.

THE PROBLEM

          The main problem of the study was to determine the items for a diagnostic test in Basic Mathematics and to validate the proposed automated test.

          Specifically, it sought to answer the following questions:

  1. What desired learning competencies of students must be included in the proposed diagnostic test in Basic Mathematics?
  2. What should be the appropriate format of a test item that ensures validity as perceived by experienced mathematics professors?
  3. How many test items for each learning objective must be included in the first draft as perceived by experienced mathematics professors?
  4. What are the characteristics of the diagnostic test in terms of:

    4.1 Index of Difficulty,

    4.2 Index of Discrimination, and

    4.3 Plausibility of the Distracters?

  5. How valid is each of the items in the proposed automated test in diagnosing learning difficulties of students in Basic Mathematics as perceived by expert mathematics professors?
  6. Does the proposed diagnostic test have a satisfactory degree of reliability?
  7. How usable is the proposed automated test as perceived by selected mathematics professors?

PROCEDURE

         The study was divided into two overlapping phases: (1) the Development Phase and (2) the Validation Phase. The development phase covered four stages. The first three stages were the same as those outlined by Professors Marjorie Devine and Nevart Yaghlian of the Center for Testing Excellence: (1) planning the test, (2) preparing the test, and (3) analyzing and revising the test. The fourth stage of the development phase was the writing of the automated version of the diagnostic test using test construction software.

          The following were the specific activities or tasks involved in each stage of the development phase.

Stage 1: Planning the Test. This included:

  1. Outlining of subject-matter content to be considered as the basis for the test.
  2. Identifying learning objectives to be measured by the test.
  3. Preparing Table of Specifications.
  4. Choosing appropriate types of test items.

Stage 2: Preparing the Test. This included:

  1. Writing test items according to the rules of construction for the type chosen,
  2. Selecting items to be included in the test according to the table of specifications,
  3. Reviewing and editing items according to the suggestions of colleagues and friends,
  4. Arranging and grouping items according to topic and perceived difficulty,
  5. Assigning a scoring guide or answer key,
  6. Writing the first draft of the whole test, and
  7. Submitting a sample copy to expert test constructors and mathematics professors for content validation.

Stage 3: Analyzing and Revising the Test. This stage included:

  1. Trying out the test on students,
  2. Performing test analysis to determine difficulty index, discrimination index, plausibility of the distracters, and reliability of the test, and
  3. Revising the test (Preparing a Second Draft of the test)

Stage 4: Writing the Automated Version. This included:

  1. Selecting appropriate test construction software,
  2. Preparing a blueprint of flowcharts or “storyboard” (i.e., the architectural design),
  3. Writing the soft copy using the selected software,
  4. Test-running the completed software and checking for errors in the program, and
  5. Revising and writing the final soft-copy version.

The Validation Phase. The validation phase was characterized by the following events/activities.

  1. Informal Content Validation by the Researcher’s Colleagues. This was done after the first draft was written – before the try out (i.e. before Stage 3.1).
  2. Formal Content Validation by Professors in Mathematics. This was done after the second draft was written.
  3. Test (First Testing). This was the tryout stage (Stage 3.1).
  4. Retest. This was done after the formal content validation stage. A group of thirty (30) students, taken from the 100 students who tried out the first draft, was given the retest. These students were made to believe that their scores during the first testing were missing and that they needed to retake the test.

  5. Usability Survey. This was done after a soft copy of the automated version of the test was constructed. A group of three mathematics professors who were also teaching Information Technology were asked to try out the CD-ROM version of the diagnostic test. They were asked to respond to a questionnaire designed to solicit opinions on the usability of the computer-based DTBM.

TREATMENT OF DATA

         The responses or opinions of the expert test constructors were summarized by computing measures of central tendency. Qualitative suggestions, such as improving the stem of the test items, improving the plausibility of the distracters, rearranging the order of the items, and rewording, were followed.

         The responses of the second group of respondents (the students’ group) were categorized and tabulated. As an aid to interpretation, the following parameters were computed (illustrative computational sketches follow the list):

  • Item Index of Difficulty. This was computed by dividing the number of examinees who got the correct answer by the total number of examinees (i.e., 100 students).
  • Subtest Index of Difficulty. This was obtained by getting the arithmetic mean of the indices of difficulty of all the items in the subtest.
  • Whole Test Index of Difficulty. This was obtained by getting the arithmetic mean of the indices of difficulty of the five (5) subtests.
  • Item Index of Discrimination. This was obtained by subtracting the number of students in the lower 30 percent (L) of the group who got the correct answer from the number of students in the upper 30 percent (U) of the group who got the correct answer, and dividing the difference by n, the number of observations in each group (i.e., 30). In symbols, the Index of Discrimination (ID) is computed as follows:

ID = (U - L) / n     where:

U = the number of students in the upper 30% of the group who got the correct answer;

L = the number of students in the lower 30% of the group who got the correct answer;

n = the number of students in the upper 30% (or in the lower 30%) of the group.

  • Subtest Index of Discrimination. This was obtained by getting the arithmetic mean of the item indices of discrimination of all the items in the subtest.
  • Whole Test Index of Discrimination. This was obtained by getting the arithmetic mean of the indices of discrimination of the five (5) subtests.
  • Plausibility of the test options. This was computed by getting the proportion of students who mistakenly picked each distracter (wrong option). This statistic was computed only for the multiple-choice items. Since each multiple-choice test item had only four (4) options and only one correct answer, there were always three distracters. Thus, for every four-option multiple-choice item, three proportions were obtained that described the plausibility of each distracter.
  • To decide whether the three distracters for each item were plausible or not, comparisons of the proportions of the three distracters were made using the z-test for the significance of the difference between proportions. In this test, a 0.05 level of significance was used.
  • To determine the reliability of each of the five subtests, coefficients of correlation between the test and retest scores were computed. The Pearson product-moment coefficient of correlation was used as the indicator of the reliability of each subtest.
  • To determine the reliability of the whole test, the arithmetic mean of the reliability coefficients of the five subtests was computed.
  • To determine a quantitative measure of the validity of each item (Vi), the validators were asked to rate the validity of an item using a five-point Likert scale. The average rating given by the five (5) validators was then computed and used as a measure of the validity of an item. 
  • To determine a quantitative measure of the validity of the subtest (Vs), the arithmetic mean of validity ratings (Vi) was computed.
  • To obtain a quantitative value that described the usability of the test, instructor-respondents were asked to respond to five questions using a seven-point Likert scale. The arithmetic mean of the ratings given by the evaluators was used as the indicator of the usability of the test.
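
          To illustrate the item-analysis indices described above, the following sketch (hypothetical Python code, not part of the original study; the variable names and sample data are illustrative) computes the index of difficulty and the index of discrimination from responses scored 1 for correct and 0 for wrong:

    # A minimal sketch (not the study's actual code) of the item-analysis
    # indices, assuming each response is scored 1 (correct) or 0 (wrong).

    def difficulty_index(item_scores):
        """Index of Difficulty: proportion of examinees who answered correctly."""
        return sum(item_scores) / len(item_scores)

    def discrimination_index(item_scores, total_scores, fraction=0.30):
        """Index of Discrimination: ID = (U - L) / n, where U and L are the
        numbers of correct answers in the upper and lower 30% of examinees
        ranked by total test score, and n is the size of each group."""
        n = round(len(item_scores) * fraction)
        ranked = sorted(range(len(total_scores)),
                        key=lambda i: total_scores[i], reverse=True)
        upper, lower = ranked[:n], ranked[-n:]
        u = sum(item_scores[i] for i in upper)   # correct answers, upper group
        l = sum(item_scores[i] for i in lower)   # correct answers, lower group
        return (u - l) / n

    # Hypothetical usage with 10 examinees (for brevity; the study used 100):
    item_scores  = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
    total_scores = [55, 20, 48, 60, 25, 52, 18, 44, 58, 30]
    print(difficulty_index(item_scores))                    # 0.6
    print(discrimination_index(item_scores, total_scores))  # top 3 vs bottom 3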
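
          The plausibility comparison can likewise be sketched as a two-proportion z-test at the 0.05 level. The code below is an illustrative assumption of how such pairwise comparisons may be carried out, not the study's actual procedure:

    import math

    def two_proportion_z(x1, n1, x2, n2):
        """z statistic for the difference between two proportions (pooled SE)."""
        p1, p2 = x1 / n1, x2 / n2
        p_pool = (x1 + x2) / (n1 + n2)
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
        return (p1 - p2) / se

    def distracters_equally_plausible(counts, n_examinees, z_crit=1.96):
        """True when no pairwise |z| exceeds the 0.05 two-tailed critical value."""
        pairs = [(counts[i], counts[j])
                 for i in range(len(counts)) for j in range(i + 1, len(counts))]
        return all(abs(two_proportion_z(a, n_examinees, b, n_examinees)) < z_crit
                   for a, b in pairs)

    # Hypothetical example: 18, 12 and 10 of 100 examinees chose the three
    # distracters of one item.
    print(distracters_equally_plausible([18, 12, 10], 100))   # True for these counts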
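
          The test-retest reliability computation may be sketched as follows; the function and sample arrays are illustrative, while the five subtest coefficients are those reported in the Findings:

    import math
    import statistics

    def pearson_r(test, retest):
        """Pearson product-moment correlation between test and retest scores."""
        mx, my = statistics.mean(test), statistics.mean(retest)
        cov = sum((a - mx) * (b - my) for a, b in zip(test, retest))
        sx = math.sqrt(sum((a - mx) ** 2 for a in test))
        sy = math.sqrt(sum((b - my) ** 2 for b in retest))
        return cov / (sx * sy)

    # Whole-test reliability taken as the mean of the five subtest coefficients
    # (the values below are those reported in the Findings section).
    subtest_r = [0.90, 0.74, 0.84, 0.78, 0.83]
    print(round(statistics.mean(subtest_r), 2))    # 0.82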

         To provide bases for interpretations and to guide the researcher in deciding whether to reject or retain an item in the proposed diagnostic test, the following guides were adapted:

        For Interpreting the Difficulty Index of an Item (Suggested by Oriondo and Dalo-Antonio, cited in Bermiso, 2003)

        Rating          Interpretation                        Suggestion
        0.80 – 1.00     The item is Very Easy                 Revise or Discard
        0.21 – 0.79     The item is Moderately Difficult      Retain
        0.00 – 0.20     The item is Difficult                 Revise or Discard

         For Interpreting the Index of Discrimination (Suggested by Ochave, cited in Bermiso, 2003)

        Rating          Interpretation     Suggestion
        0.41 – higher   Very Good          Retain
        0.31 – 0.39     Good Item          Retain
        0.20 – 0.29     Marginal Item      Usually needs revision
        0.19 – lower    Poor Item          Revise or Discard

           For Interpreting the Validity Ratings. The researcher devised a rating scale with the following interpretations:

        Rating        Interpretation
        4.5 – 5.0     Outstandingly Valid
        3.5 – 4.4     Very Satisfactorily Valid
        2.5 – 3.4     Satisfactorily Valid
        1.5 – 2.4     Less Valid
        1.0 – 1.4     Convincingly Invalid

        For Interpreting Reliability

        Rating          Interpretation
        0.7 – 1.0       Highly Reliable (the test is good)
        0.4 – 0.6       Moderately Reliable (the test needs refinement)
        0.1 – 0.3       Less Reliable (not a good test)
        Negative – 0    Not Reliable (totally useless)

         For Interpreting Usability

        Rating          Interpretation
        2.5 – 3.0       Highly Usable
        1.5 – 2.4       Moderately Usable
        0.5 – 1.4       Usable
        -0.5 – 0.4      No Decision
        -1.5 – -0.6     Not Usable
        -2.5 – -1.6     Moderately Not Usable
        -3.0 – -2.6     Convincingly Not Usable
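
         Read together, the difficulty and discrimination guides amount to a simple retain/revise rule for each item. The sketch below is a hypothetical encoding of those cut-offs, not part of the original study (values falling in the narrow gaps of the published ranges, such as 0.30 and 0.40, are assigned to the lower category here):

    # Hypothetical encoding of the difficulty and discrimination guides above.

    def difficulty_verdict(p):
        """Interpretation of an item's difficulty index per the guide above."""
        if p >= 0.80:
            return "Very Easy: revise or discard"
        if p >= 0.21:
            return "Moderately Difficult: retain"
        return "Difficult: revise or discard"

    def discrimination_verdict(d):
        """Interpretation of an item's discrimination index per the guide above."""
        if d >= 0.41:
            return "Very Good: retain"
        if d >= 0.31:
            return "Good: retain"
        if d >= 0.20:
            return "Marginal: usually needs revision"
        return "Poor: revise or discard"

    # Example: the Decimals, Ratio and Percent subtest averages from the Findings.
    print(difficulty_verdict(0.641), "|", discrimination_verdict(0.589))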

FINDINGS

    1. Of the original 260 items prepared for the first draft, 65 items were finally selected to compose the Diagnostic Test in Basic Mathematics.
    2. These 65 items passed the criteria for good test items in terms of Difficulty, Discrimination, and Plausibility.
    3. Some of the items were of the Open Response type because of validity issues that might arise if the multiple-choice format were used.
    4. The test was divided into five subtests: (1) Decimals, Ratio and Percent Test, (2) Fraction and Its Operation Test, (3) Operations Involving Integers Test, (4) Exponents and Radicals Test, and (5) Problem Solving Test. The distribution of the 65 items was as follows: 15 items in Decimals, Ratio and Percent Test, 10 items in Fraction and Its Operation Test, 10 items in Operations Involving Integers Test, 15 items in Exponents and Radicals Test, and 15 items in Problem Solving Test.
    5. The average indices of difficulty of the five subtests were as follows: 0.641 for Decimals, Ratio and Percent Test, 0.532 for Fraction and Its Operation Test, 0.740 for Operations Involving Integers Test, 0.502 for Exponents and Radicals Test, and 0.435 for Problem Solving Test.
    6. The average indices of discrimination of the five subtests were as follows: 0.589 for Decimals, Ratio and Percent Test, 0.507 for Fraction and Its Operation Test, 0.490 for Operations Involving Integers Test, 0.498 for Exponents and Radicals Test, and 0.436 for Problem Solving Test.
    7. The proportions of students choosing the distracters of the multiple-choice items did not significantly differ.
    8. The average validity ratings of the five subtests were as follows: 4.52 for Decimals, Ratio and Percent Test, 4.48 for Fraction and Its Operation Test, 4.32 for Operations Involving Integers Test, 3.83 for Exponents and Radicals Test, and 4.27 for Problem Solving Test.
    9. The reliability coefficients of the five subtests were as follows: 0.90 for Decimals, Ratio and Percent Test, 0.74 for Fraction and Its Operation Test, 0.84 for Operations Involving Integers Test, 0.78 for Exponents and Radicals Test, and 0.83 for Problem Solving Test. The reliability coefficient of the whole test was 0.82.
    10. The mean usability rating of the CD-ROM version of the test was 2.13, which means that the computer-based version of the test has a high degree of usability.

CONCLUSIONS

          From the findings of the study, the following conclusions were reached:

    1. The 65 test items have difficulty indices within the range of a “moderately difficult” item, an indication of a good test item.
    2. The 65 test items have indices of discrimination within the range of “good” to “very good” items.
    3. The distracters of the multiple-choice items were equally plausible.
    4. The final draft of 65-item test has a high degree of validity.
    5. The individual subtests and the whole test have high reliability coefficients.
    6. The computer-based version of the diagnostic test is highly usable.

RECOMMENDATIONS

          In the light of the findings and conclusions reached, the following recommendations were offered:

    1. Students who are planning to take the Licensure Examination for Teachers (LET) may use this automated test to self-diagnose their strengths and weaknesses in Basic Mathematics.
    2. Students who are currently taking Basic Mathematics course (Math 1) may use this automated test to check their progress or readiness in taking midterm or periodical examinations.
    3. Tutors may advise their tutees to take the automated diagnostic test and be informed of the result so as to determine the tutorial lessons to be given to the tutees. 
    4. Interested researchers and test developers are encouraged to go into the construction of automated self-diagnostic or self-evaluating software programs that facilitate students’ self-initiative and acknowledge individual differences in learning.

 

BIBLIOGRAPHY

Bermiso, Fe S. 2003. Paghahanda ng Isang Prototype na Lagumang Pagsusulit sa Filipino I para sa mga Magaaral sa Unang Taon sa Tersyarya, Unpublished Master’s Thesis. Philippine Normal University, Manila.

Brigance Diagnostic Assessment of Basic Skills, http://ericae.net/eac/eac0056.htm.

Buco, Gerard U. 1997. Development and Validation of an Achievement Test in a Flexibly-Paced Program in Mathematics I. Unpublished Master’s Thesis. Philippine Normal University, Manila.

California Diagnostic Reading Test, http://ericae.net/eac/eac0059.htm.

Cronbach, L.J., Gleser, G.C., Nanda, H., & Rajaratnam, N. 1972. The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: John Wiley.

Cronbach, L.J., Rajaratnam, N., & Gleser, G.C. 1963. Theory of Generalizability: A Liberation of Reliability Theory. The British Journal of Statistical Psychology, 16, 137-163.

Dacanay, Antonio G. 1988. The Development and Validation of a Creative Thinking Test for Fourth Year High School Students in Metro Manila. Unpublished Master’s Thesis. Philippine Normal University, Manila.

Devine, Marjorie and N. Yaghlian. Construction of Objective Tests. http://ericae.net/eac/eac0079.htm.

Downie, N. M. and R. W. Heath. 1983. Basic Statistical Methods. New York: Harper & Row Publishers.

Ferguson, G. and Y. Takane. 1989. Statistical Analysis in Psychology and Education. New York: McGraw-Hill Book Company.

Guiterrez, Danilo S. 1984. The Development and Validation of a Test on Critical Thinking for Grade Six Pupils. Unpublished Master’s Thesis. Philippine Normal University, Manila.

Key-Math Revised-A Diagnostic Inventory of Essential Mathematics, Form A & B. http://ericae.net/eac/eac0125.htm

Kerlinger, Fred N. 1973. Foundations of Behavioral Research. New York: Holt, Rinehart and Winston, Inc.

Lazarsfeld, P. F. and N. W. Henry. 1968. Latent Structure Analysis. Boston: Houghton Mifflin.

Lebrudo, Ma. Luz R. 1993. The Development and Validation of an Achievement Test in Home Economics and Livelihood Education for Grade Six. Unpublished Master’s Thesis. Philippine Normal University, Manila.

Lord, F. M. & Novick, M. R. 1968. Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Losbanes, Helen S. 1995. The Development and Validation of a Test on Scientific Literacy in Chemistry of Fourth Year High School Students in Private and Public Schools of Taguig, Metro Manila. Unpublished Master’s Thesis. Philippine Normal University, Manila.

Mississippi State University. Test Construction. http://msu.org/edu/test.htm.

Novick, M. R. 1966. The axioms and principal results of classical test theory. Journal of Mathematical Psychology, 3, 1-18.

Nunnally, J.C., & Bernstein, I.H. 1994. Psychometric theory. 3rd ed. New York: McGraw Hill.

Preparation Drills for the Missouri Mathematics Placement Test (MMPT) at MU, http://www.saab.org/mathdrills/mmpt.cgi.

Schreyer Institute for Teaching Excellence. 26 June 2000. Academic Testing Classical Test Theory Approach, University Testing Services, http://www.uts.psu.edu/Classical_theory_frame.htm.

Spearman, C. 1904. The proof and measurement of association between two things. American Journal of Psychology, 15, 72-101.

The Maculaitis Assessment Program. http://ericae.net/eac/eac0182.htm.


 