This study explores the effects of caption type and speech rate on learners’ cognitive load, learning flow, and academic achievement when learning from English instructional videos. A total of 231 students from A University in China participated in the experiment. Six videos were developed by crossing three caption types (no captions, L2 captions, L1+L2 captions) with two speech rates (slow, fast). Students’ prior knowledge of the teaching content was assessed before the experiment, and their cognitive load, learning flow, and academic achievement were measured after they watched the videos. The collected data were analyzed using multivariate analysis of variance. The results showed, first, that students who studied the videos with L1+L2 captions had the lowest cognitive load and the highest learning flow and academic achievement. Second, students who studied the videos at the slow speech rate had lower cognitive load and higher learning flow and academic achievement than those who studied at the fast speech rate. Third, students who studied videos with L1+L2 captions at the slow speech rate had the lowest cognitive load and the highest learning flow and academic achievement. These findings suggest that when the main teaching objective is comprehension of the video content, bilingual captions are preferable. In addition, a slow speech rate can lead to better learning outcomes for EFL learners.