"despite being trained on 72b tokens of text"? bro wtf? FineWeb alone is 15T tokens of data