Apr 13, 2020 XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization