Using the reliability reported in the ELAN manual (Bockmann and Kiese-Himmel, 2006), and therefore a more conservative estimate of reliability, the RCI was considerably lower: $\mathrm{Diff}_{T1-T2} = 1.96 \sqrt{2 \cdot 10^{2} (1 - 0.99)} = 2.772$, resulting in a critical difference of three T-points.
The analytical approach applied and discussed here serves as an example for the evaluation of ratings and rating instruments that cover a wide range of developmental and behavioral characteristics. It makes it possible to assess and document differences and similarities between subgroups of raters and subgroups of rated individuals, using a combination of statistical analyses. If future reports succeed in keeping the concepts of agreement, reliability and linear correlation apart, and if the statistical approaches needed to address each aspect are used appropriately, research results will be easier to compare and, as a result, more transparent.
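The critical difference quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the function name and parameters are hypothetical, and it assumes that the value 10 in the formula is the standard deviation of the T-score scale and that 1.96 is the two-sided 95% z-value.

```python
import math


def critical_difference(sd, reliability, z=1.96):
    """Smallest score difference treated as reliable change.

    Assumed form: z * sqrt(2 * SEM^2), with SEM^2 = sd^2 * (1 - reliability).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    sem_squared = sd ** 2 * (1.0 - reliability)  # squared standard error of measurement
    return z * math.sqrt(2.0 * sem_squared)


# Values taken from the text: T-scores (SD assumed to be 10), reliability 0.99.
crit = critical_difference(sd=10.0, reliability=0.99)
print(round(crit, 3))   # 2.772
print(math.ceil(crit))  # 3, i.e. three whole T-points
```

Because the critical difference scales with the square root of (1 - reliability), a lower reliability estimate yields a larger critical difference for the same scale.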
Reproducibility refers to the degree to which repeated measurements yield similar results. Agreement parameters assess how close the results of repeated measurements are by estimating the measurement error in those measurements. Reliability parameters assess whether study subjects, often persons, can be distinguished from one another despite measurement error. In this case, the measurement error is set in relation to the variability between persons. Reliability parameters therefore depend heavily on the heterogeneity of the sample, whereas agreement parameters, being based on measurement error alone, are closer to a pure characteristic of the measurement instrument. If the research question concerns distinguishing between persons, reliability parameters are the most appropriate. If the aim is to measure change in health status, which is often the case in clinical practice, agreement parameters are preferred.
Pearson's r, Kendall's τ, or Spearman's ρ can be used to measure pairwise correlation between raters who use an ordered scale. Pearson's r assumes that the rating scale is continuous; Kendall's and Spearman's statistics assume only that it is ordinal. If more than two raters are observed, an average level of agreement for the group can be calculated as the mean of the r, τ, or ρ values across all possible pairs of raters.
Variation between raters in measurement procedures and variability in the interpretation of measurement results are two examples of sources of error variance in rating measurements. Clearly stated guidelines for making ratings are required for reliability in ambiguous or demanding measurement scenarios.
This analysis provides information on the inter-rater reliability of the ELAN and also serves as the basis for one of the two calculations of the Reliable Change Index (RCI), taking into account the characteristics of the specific study sample.
There are several operational definitions of "inter-rater reliability" that reflect different views on what counts as reliable agreement between raters. There are three operational definitions of agreement:
Bland, M.