Statistical analysis of the test, or "item" analysis, is conducted after the exam forms are processed or the answers are gathered online. It has a double role:
- correction of the final test results, by excluding questions that have to be excluded or adjusting answers that need to be adjusted;
- correction of the questions in the question database, and guidelines for writing new questions.
Parts of the statistical analysis of tests
Basic descriptive statistics
The following calculations are done: the number of students who took the test, the mean, the median, the standard deviation, and the highest and lowest score.
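As a minimal sketch, these statistics can be computed in Python from a plain list of total scores; the function and variable names below are illustrative and not taken from any particular exam-processing package.

```python
import statistics

def describe_scores(scores):
    """Basic descriptive statistics for a list of total test scores."""
    return {
        "n": len(scores),                     # number of students who took the test
        "mean": statistics.mean(scores),      # mean score
        "median": statistics.median(scores),  # median score
        "std_dev": statistics.stdev(scores),  # sample standard deviation
        "max": max(scores),                   # highest score
        "min": min(scores),                   # lowest score
    }

# Example with made-up scores:
print(describe_scores([12, 15, 18, 9, 14, 17, 11, 16]))
```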
Item difficulty
Question difficulty (or easiness) (P) is the share of correct answers for each question. It is determined after the test by calculating the percentage of correct answers.
Easiness = total number of correct answers / number of students who took the test
Difficulty = 1 - easiness
Questions that almost everybody or nobody answers correctly are "bad" questions, i.e. they are either too difficult or too easy and should be excluded from the test. The test should then be re-evaluated and the questions corrected in the database for future use. For psychological reasons it is recommended that the first couple of questions be easy (approx. 0.9). Very easy questions can be acceptable in a diagnostic or preliminary test, which assesses the knowledge of the whole group rather than of an individual.
The optimum difficulty level of an MCQ with n possible answers can be calculated from the formula P = 0.5 + 0.5 × (1/n). For true/false questions the optimum difficulty level is 0.75, but a range from 0.65 to 0.85 is also acceptable; questions below or above the recommended range should be corrected. The best MC questions with 3 distracters have a difficulty level of 0.67, those with 4 distracters 0.63, and those with 5 distracters 0.60. The generally acceptable difficulty level ranges from 0.3 to 0.7: questions below 0.3 are too difficult, while questions above 0.7 are too easy.
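A short sketch of these formulas, assuming only the count of correct answers, the number of students, and the number of answer options are known; the function names and example numbers are hypothetical.

```python
def easiness(correct_answers, num_students):
    """Easiness = total number of correct answers / number of students who took the test."""
    return correct_answers / num_students

def difficulty(correct_answers, num_students):
    """Difficulty = 1 - easiness."""
    return 1 - easiness(correct_answers, num_students)

def optimal_difficulty(n_options):
    """Optimum difficulty level for an MCQ with n possible answers: P = 0.5 + 0.5 * (1/n)."""
    return 0.5 + 0.5 * (1 / n_options)

# Example: 48 of 60 students answered a question correctly.
print(easiness(48, 60))        # 0.8
print(difficulty(48, 60))      # 0.2
print(optimal_difficulty(2))   # 0.75 for true/false questions
```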
Analysis after the Test
After the test, the professor should conduct a statistical analysis with the help of IT. It enables correction of the test as well as a check of question validity, and it affects the final results. It also provides information for correcting the questions in the database (for reusability) and guidelines for new questions.
Item discrimination
After the test, the discrimination index of each question is calculated by dividing all students (x) into the "worse" or bottom third (27%) and the "better" or top third (27%). The number of correct answers to each question is counted in the worse group (L) and in the better group (B), after which the index is calculated with the formula:
Discrimination = 2(B-L)/x
The bigger the number, the "better" the question. Questions over 0.35 are excellent, questions between 0.25 and 0.35 are good, and those between 0.15 and 0.25 are still acceptable but should be corrected for the next test, while questions below 0.1 should be excluded from the test and the test re-evaluated without them.
Another way of measuring discrimination is to calculate the point-biserial correlation coefficient, i.e. the correlation between the frequency of the right answer to a certain question and the students' total scores. This index is better than the simple discrimination ratio since it takes the overall results of all students into consideration.
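A sketch of both measures, assuming the data is available as 0/1 correctness flags and total scores per student; the names discrimination_index and point_biserial are illustrative, and the point-biserial value is obtained here as an ordinary Pearson correlation between the 0/1 item score and the total score.

```python
from math import sqrt

def discrimination_index(better_correct, worse_correct, num_students):
    """Discrimination = 2 * (B - L) / x, with B and L the number of correct
    answers in the better and worse groups and x the number of students."""
    return 2 * (better_correct - worse_correct) / num_students

def point_biserial(item_correct, total_scores):
    """Correlation between answering this item correctly (0/1) and the total score."""
    n = len(item_correct)
    mean_x = sum(item_correct) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(item_correct, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_correct)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    return cov / sqrt(var_x * var_y)

# Example: 60 students, 25 correct answers in the top group, 10 in the bottom group.
print(discrimination_index(25, 10, 60))   # 0.5

# Example point-biserial with made-up data (1 = correct on this question):
item = [1, 1, 0, 1, 0]
totals = [18, 15, 9, 16, 11]
print(point_biserial(item, totals))
```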
Response frequencies
The frequency with which each offered answer is chosen shows how well the distracters work. Distracters that do not mislead anybody at all should be changed before the next exam. On the other hand, "too strong" distracters should be checked to see whether they might in fact be correct.
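A minimal sketch of a response-frequency count for a single question, assuming the chosen options are recorded as letters; the data here is made up.

```python
from collections import Counter

# Hypothetical responses of ten students to one question:
answers = ["A", "C", "A", "A", "B", "A", "D", "A", "A", "C"]
frequencies = Counter(answers)

for option in sorted(frequencies):
    print(option, frequencies[option])
# A distracter chosen by nobody will be missing from (or zero in) these counts
# and is a candidate for rewriting before the next exam.
```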
Test Reliability
The Kuder-Richardson 20 (KR-20) coefficient should be calculated; a value of 0.70 or higher indicates a good test. KR-20 examines the internal consistency (reliability) of tests that are used repeatedly: if the same student retakes a test with a high KR-20, they should achieve essentially the same result.
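A sketch of the KR-20 calculation from its standard formula, KR-20 = k/(k-1) × (1 - Σ p_i·q_i / σ²), where k is the number of questions, p_i the share of correct answers on question i, q_i = 1 - p_i, and σ² the variance of the total scores; the 0/1 answer matrix in the example is made up.

```python
import statistics

def kr20(item_matrix):
    """item_matrix[s][i] is 1 if student s answered item i correctly, else 0."""
    k = len(item_matrix[0])                        # number of questions
    n = len(item_matrix)                           # number of students
    totals = [sum(row) for row in item_matrix]     # total score per student
    var_total = statistics.pvariance(totals)       # variance of the total scores
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in item_matrix) / n  # share of correct answers on item i
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)

# Example with 4 students and 3 questions (made-up data):
print(kr20([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]))   # 0.75
```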
And finally, good news and a valid reason to use IT in written test analysis: good software packages for processing written exams conduct the test analysis automatically, or with one click of the mouse, and automatically enter the results for each question into the question database.