.. _datasets: Datasets ========= Here we provide a comprehensive guide on how to prepare the datasets. Suppose you want to store the reid data in a directory called "path/to/reid-data/", you need to specify the ``root`` as *root='path/to/reid-data/'* when initializing ``DataManager``. Below we use ``$REID`` to denote "path/to/reid-data". Please refer to :ref:`torchreid_data` for details regarding the arguments. .. note:: Dataset with a :math:`\dagger` symbol means that the process is automated, so you can directly call the dataset in ``DataManager`` (which automatically downloads the dataset and organizes the data structure). However, we also provide a way below to help the manual setup in case the automation fails. .. note:: The keys to use specific datasets are enclosed in the parantheses beside the datasets' names. .. note:: You are suggested to use the provided names for dataset folders such as "market1501" for Market1501 and "dukemtmcreid" for DukeMTMC-reID when doing the manual setup, otherwise you need to modify the source code accordingly (i.e. the ``dataset_dir`` attribute). .. note:: Some download links provided by the original authors might not work. You can email `Kaiyang Zhou `_ to reqeust new links. Please do provide your full name, institution, and purpose of using the data in the email (best use your work email address). .. contents:: :local: Image Datasets -------------- Market1501 :math:`^\dagger` (``market1501``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Create a directory named "market1501" under ``$REID``. - Download the dataset to "market1501" from http://www.liangzheng.org/Project/project_reid.html and extract the files. - The data structure should look like .. code-block:: none market1501/ Market-1501-v15.09.15/ query/ bounding_box_train/ bounding_box_test/ - To use the extra 500K distractors (i.e. Market1501 + 500K), go to the **Market-1501+500k Dataset** section at http://www.liangzheng.org/Project/project_reid.html, download the zip file "distractors_500k.zip" and extract it under "market1501/Market-1501-v15.09.15". The argument to use these 500K distrctors is ``market1501_500k`` in ``ImageDataManager``. CUHK03 (``cuhk03``) ^^^^^^^^^^^^^^^^^^^^^ - Create a folder named "cuhk03" under ``$REID``. - Download the dataset to "cuhk03/" from http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html and extract "cuhk03_release.zip", resulting in "cuhk03/cuhk03_release/". - Download the new split (767/700) from `person-re-ranking `_. What you need are "cuhk03_new_protocol_config_detected.mat" and "cuhk03_new_protocol_config_labeled.mat". Put these two mat files under "cuhk03/". - The data structure should look like .. code-block:: none cuhk03/ cuhk03_release/ cuhk03_new_protocol_config_detected.mat cuhk03_new_protocol_config_labeled.mat - In the default mode, we load data using the new split (767/700). If you wanna use the original (20) splits (1367/100), please set ``cuhk03_classic_split`` to True in ``ImageDataManager``. As the CMC is computed differently from Market1501 for the 1367/100 split (see `here `_), you need to enable ``use_metric_cuhk03`` in ``ImageDataManager`` to activate the *single-gallery-shot* metric for fair comparison with some methods that adopt the old splits (*do not need to report mAP*). In addition, we support both *labeled* and *detected* modes. The default mode loads *detected* images. Enable ``cuhk03_labeled`` in ``ImageDataManager`` if you wanna train and test on *labeled* images. .. note:: The code will extract images in "cuhk-03.mat" and save them under "cuhk03/images_detected" and "cuhk03/images_labeled". Also, four json files will be automatically generated, i.e. "splits_classic_detected.json", "splits_classic_labeled.json", "splits_new_detected.json" and "splits_new_labeled.json". If the parent path of ``$REID`` is changed, these json files should be manually deleted. The code can automatically generate new json files to match the new path. DukeMTMC-reID :math:`^\dagger` (``dukemtmcreid``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Create a directory called "dukemtmc-reid" under ``$REID``. - Download "DukeMTMC-reID" from http://vision.cs.duke.edu/DukeMTMC/ and extract it under "dukemtmc-reid". - The data structure should look like .. code-block:: none dukemtmc-reid/ DukeMTMC-reID/ query/ bounding_box_train/ bounding_box_test/ ... MSMT17 (``msmt17``) ^^^^^^^^^^^^^^^^^^^^^ - Create a directory called "msmt17" under ``$REID``. - Download the dataset from http://www.pkuvmc.com/publications/msmt17.html to "msmt17" and extract the files. - The data structure should look like .. code-block:: none msmt17/ MSMT17_V1/ # or MSMT17_V2 train/ test/ list_train.txt list_query.txt list_gallery.txt list_val.txt VIPeR :math:`^\dagger` (``viper``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - The download link is http://users.soe.ucsc.edu/~manduchi/VIPeR.v1.0.zip. - Organize the dataset in a folder named "viper" as follows .. code-block:: none viper/ VIPeR/ cam_a/ cam_b/ GRID :math:`^\dagger` (``grid``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - The download link is http://personal.ie.cuhk.edu.hk/~ccloy/files/datasets/underground_reid.zip. - Organize the dataset in a folder named "grid" as follows .. code-block:: none grid/ underground_reid/ probe/ gallery/ ... CUHK01 (``cuhk01``) ^^^^^^^^^^^^^^^^^^^^^^^^ - Create a folder named "cuhk01" under ``$REID``. - Download "CUHK01.zip" from http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html and place it under "cuhk01/". - The code can automatically extract the files, or you can do it yourself. - The data structure should look like .. code-block:: none cuhk01/ campus/ SenseReID (``sensereid``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Create "sensereid" under ``$REID``. - Download the dataset from this `link `_ and extract it to "sensereid". - Organize the data to be like .. code-block:: none sensereid/ SenseReID/ test_probe/ test_gallery/ QMUL-iLIDS :math:`^\dagger` (``ilids``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Create a folder named "ilids" under ``$REID``. - Download the dataset from http://www.eecs.qmul.ac.uk/~jason/data/i-LIDS_Pedestrian.tgz and organize it to look like .. code-block:: none ilids/ i-LIDS_Pedestrian/ Persons/ PRID (``prid``) ^^^^^^^^^^^^^^^^^^^ - Create a directory named "prid2011" under ``$REID``. - Download the dataset from https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/ and extract it under "prid2011". - The data structure should end up with .. code-block:: none prid2011/ prid_2011/ single_shot/ multi_shot/ CUHK02 (``cuhk02``) ^^^^^^^^^^^^^^^^^^^^^ - Create a folder named "cuhk02" under ``$REID``. - Download the data from http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html and put it under "cuhk02/". - Extract the file so the data structure looks like .. code-block:: none cuhk02/ Dataset/ P1/ P2/ P3/ P4/ P5/ CUHKSYSU (``cuhksysu``) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Create a folder named "cuhksysu" under ``$REID``. - Download the data to "cuhksysu/" from this `google drive link `_. - Extract the zip file under "cuhksysu/". - The data structure should look like .. code-block:: none cuhksysu/ cropped_images Video Datasets -------------- MARS (``mars``) ^^^^^^^^^^^^^^^^^ - Create "mars/" under ``$REID``. - Download the dataset from http://www.liangzheng.com.cn/Project/project_mars.html and place it in "mars/". - Extract "bbox_train.zip" and "bbox_test.zip". - Download the split metadata from https://github.com/liangzheng06/MARS-evaluation/tree/master/info and put "info/" in "mars/". - The data structure should end up with .. code-block:: none mars/ bbox_test/ bbox_train/ info/ iLIDS-VID :math:`^\dagger` (``ilidsvid``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Create "ilids-vid" under ``$REID``. - Download the dataset from https://xiatian-zhu.github.io/downloads_qmul_iLIDS-VID_ReID_dataset.html to "ilids-vid". - Organize the data structure to match .. code-block:: none ilids-vid/ i-LIDS-VID/ train-test people splits/ PRID2011 (``prid2011``) ^^^^^^^^^^^^^^^^^^^^^^^^^ - Create a directory named "prid2011" under ``$REID``. - Download the dataset from https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/ and extract it under "prid2011". - Download the split created by *iLIDS-VID* from `this google drive `_ and put it under "prid2011/". Following the standard protocol, only 178 persons whose sequences are more than a threshold are used. - The data structure should end up with .. code-block:: none prid2011/ splits_prid2011.json prid_2011/ single_shot/ multi_shot/ DukeMTMC-VideoReID :math:`^\dagger` (``dukemtmcvidreid``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Create "dukemtmc-vidreid" under ``$REID``. - Download "DukeMTMC-VideoReID" from http://vision.cs.duke.edu/DukeMTMC/ and unzip the file to "dukemtmc-vidreid/". - The data structure should look like .. code-block:: none dukemtmc-vidreid/ DukeMTMC-VideoReID/ train/ query/ gallery/