The College English Test (CET) is an English language test designed for educational purposes, administered on a very large scale, and used for making high-stakes decisions. This paper discusses the key issues facing the CET during the course of its development over the past two decades. It argues that the most fundamental and critical concerns of large-scale high-stakes testing are test validity and fairness as defined in the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999). The CET has a current annual test population of over 18 million, and the results of the test, intentionally or unintentionally, may affect university graduates' employment opportunities, the conferment of a bachelor's degree, and the granting of a residence permit in big cities. The CET test developer, therefore, has been taking measures to ensure that no test taker is potentially disadvantaged by such factors as test content, test conditions, response mode and format, scoring of constructed-response items, and use of test results. Considerable care has been given to the test's validity as well as its operational standardization, which is critical to fairness in high-stakes testing. The paper begins with an overview of the major developmental stages of the CET since its inception in 1987 and the standardized procedures involved in CET design, item construction, test administration, test form equating, scoring, and score reporting. Following this introductory part, the paper discusses in turn the CET validation efforts of the late 1990s, major revisions of the test aimed at aligning its content and task format with the College English curriculum requirements, and recent research on the validity of the newly developed internet-based CET, a central focus of which has been on possible biases against test takers who are less proficient in computer operation.
Validity and fairness, however, cannot be addressed exclusively in psychometric and technical terms. The use of the test in a particular social context or with particular groups of test takers may be valid and fair, or invalid and unfair. The paper concludes with a brief discussion of the political dimension of high-stakes testing, with a special focus on Messick's (1992) unified construct validity argument, which views validity not as a feature or possession of a test, but as a process of validating, through a multifaceted approach, the uses and interpretations of tests and their scores (Davies, 2003).