This study examines the reliability and consistency of raters of Korean speaking tests. Such basic research can identify the kinds of rater training that enhance rating reliability in speaking tests, which are assessed mainly against subjective criteria. We investigated raters’ rating tendencies, reliability, and consistency by conducting two experiments under the same conditions, separated by a set interval. Analysis with the FACETS program, based on the Many-Facets Rasch Measurement model, showed that inter-rater reliability improved overall in the second rating; however, raters’ consistency varied regardless of their experience in the field of Korean language education. Rater training is necessary to improve assessment reliability, and these results confirm the need for individualized training tailored to each rater’s characteristics. In addition, using scientific tools that measure a rater’s reliability and consistency could serve as an alternative to conventional rater training, allowing raters to improve their self-consistency through self-observation.