Hyung Won Chung, Le Hou, S. Longpre, Barret Zoph, Yi Tay, W. Fedus, Eric Li, Xuezhi Wang, M. Dehghani, Siddhartha Brahma, Albert Webson, S. Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Dasha Valter, Sharan Narang, Gaurav Mishra, A. Yu, Vincent Zhao, Yanping Huang, Andrew M. Dai, Hongkun Yu, Slav Petrov, E. Chi, J. Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc Le, Jason Wei
This result shows that instruction finetuning and UL2 continued pre-training are complementary, compute-efficient methods for improving the performance of language models without increasing model scale.