May 11, 2021 ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision