Aug 28, 2023 RO-ViT: Region-aware pre-training for open-vocabulary object detection with vision transformers