October 3, 2023 - ICCV, Paris
Our workshop Towards the Next Generation of Computer Vision Datasets is taking place at ICCV 2023 in Paris.
The workshop will showcase a series of DataComp submissions, along with other data-centric papers and multiple invited talks by experts in the field.
The workshop will take place on October 3, 2023. The schedule is as follows:
- 9:00 AM - Opening Remarks
- 9:25 AM - Invited Talk - Olga Russakovsky
- 10:00 AM - Coffee Break
- 10:30 AM - Contributed Oral Presentations/Poster Session
- 12:00 PM - Lunch Break
- 1:30 PM - Invited Talk - Georgia Gkioxari
- 2:00 PM - Invited Talk - Dragomir Anguelov
- 2:30 PM - Contributed Oral Presentations/Poster Session
- 3:30 PM - Invited Talk - Joao Carreira
- 4:00 PM - Invited Talk - Tom Duerig
- 4:30 PM - Invited Talk - Swabha Swayamdipta
- 5:00 PM - Panel Discussion and Closing Remarks
Meta AI Research
Washington & AI2 & LAION
Why work on Datasets?
Despite the central role large image-text datasets play in multimodal learning, little is known about them. Many state-of-the-art datasets are proprietary and only available in corporate research labs, as in the case of CLIP, DALL-E, Flamingo, and GPT-4. But even for public datasets such as LAION-2B. it is unclear how design choices during dataset construction, such as the data source or filtering techniques, affect the resulting models. While there are thousands of ablation studies for algorithmic design choices (loss function, optimizer, model architecture, etc.), datasets are usually treated as monolithic artifacts without detailed investigation or further improvements. Moreover, datasets currently lack the benchmark-driven development process that has enabled the community to produce a steady stream of advances on the model side. The hope is that we can use this workshop and the DataComp benchmark as a way to drive community involvement in this space.
Call for DataComp Submission Papers
We invite researchers and practitioners to submit a short paper detailing their DataComp Submissions (for any track!) to the workshop. The top performing submissions will be invited to give a talk during the workshop. All accepted submissions will be given a chance to present a poster. Submitted papers should not exceed 4 pages (excluding references) and should follow the ICCV 2023 formatting guidelines. The submission deadline will be September 8th, 2023 AOE. Accepted papers will be presented at the workshop. The submission portal for workshop papers is here
Call for Other Data-Centric Papers
We also invite researchers and practitioners to submit papers related to the next generation of computer vision datasets. Topics of interest include but are not limited to:
- Novel data collection, curation, and annotation methodologies
- Dataset bias, fairness, and ethical considerations
- Dataset quality assessment and improvement
- Active learning and dataset curation
- Domain adaptation and transfer learning
Submitted papers should not exceed 4 pages (excluding references) and should follow the ICCV 2023 formatting guidelines. The submission deadline will be September 8th, 2023 AOE. Accepted papers will be presented at the workshop. The submission portal for workshop papers is here
Getting started with DataComp
You can find instructions to get started with DataComp at our github repository
You can get started by downloading our smallest (12.8M) data pool to your current directory with the following commands (note this pool is 450GB so the download can take a few hours)
git clone https://github.com/mlfoundations/datacomp.git
conda activate datacomp
python download_upstream.py --scale small --data_dir .