Datasets

This guide explains how to prepare each supported dataset.

To store the reid data in a directory such as “path/to/reid-data/”, specify root='path/to/reid-data/' when initializing DataManager. Below we use $REID to denote “path/to/reid-data”.

Please refer to torchreid.data for details regarding the arguments.
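
As a minimal sketch of the root convention above, the snippet below assembles the keyword arguments for a data manager. The helper name datamanager_kwargs is hypothetical and not part of torchreid. Only root and the dataset keys come from this guide; sources is assumed to be the argument that takes a dataset key, so check torchreid.data for the authoritative argument list.

```python
import os


def datamanager_kwargs(reid_root, dataset_key):
    """Build keyword arguments pointing a data manager at the dataset root.

    Hypothetical helper for illustration; not part of torchreid.
    """
    # Expand "~" and make the path absolute so the same root is used everywhere.
    root = os.path.abspath(os.path.expanduser(reid_root))
    # "sources" is assumed to be the argument that takes dataset keys.
    return {"root": root, "sources": dataset_key}


kwargs = datamanager_kwargs("path/to/reid-data", "market1501")
# With torchreid installed, the dict could then be passed on, e.g.
# torchreid.data.ImageDataManager(**kwargs)
```

Building the dict separately keeps the root in one place when several datasets share $REID.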

Note

A dataset marked with a \(\dagger\) symbol is set up automatically: you can call it directly in DataManager, which downloads the dataset and organizes the data structure. The instructions below also cover manual setup in case the automation fails.

Note

The keys used to select specific datasets are enclosed in parentheses beside the dataset names.

Note

For manual setup, we recommend using the provided dataset folder names, such as “market1501” for Market1501 and “dukemtmc-reid” for DukeMTMC-reID; otherwise you need to modify the source code accordingly (i.e. the dataset_dir attribute).

Note

Some download links provided by the original authors might not work. You can email Kaiyang Zhou to request new links. Please include your full name, institution, and purpose of using the data in the email (preferably sent from your work email address).

Image Datasets

Market1501 \(^\dagger\) (market1501)

market1501/
    Market-1501-v15.09.15/
        query/
        bounding_box_train/
        bounding_box_test/
  • To use the extra 500K distractors (i.e. Market1501 + 500K), go to the Market-1501+500k Dataset section at http://www.liangzheng.org/Project/project_reid.html, download the zip file “distractors_500k.zip” and extract it under “market1501/Market-1501-v15.09.15”. The argument to use these 500K distractors is market1501_500k in ImageDataManager.

CUHK03 (cuhk03)

  • Create a folder named “cuhk03” under $REID.

  • Download the dataset to “cuhk03/” from http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html and extract “cuhk03_release.zip”, resulting in “cuhk03/cuhk03_release/”.

  • Download the new split (767/700) from person-re-ranking. What you need are “cuhk03_new_protocol_config_detected.mat” and “cuhk03_new_protocol_config_labeled.mat”. Put these two mat files under “cuhk03/”.

  • The data structure should look like

cuhk03/
    cuhk03_release/
    cuhk03_new_protocol_config_detected.mat
    cuhk03_new_protocol_config_labeled.mat
  • By default, data is loaded using the new split (767/700). To use the original 20 splits (1367/100), set cuhk03_classic_split to True in ImageDataManager. Because the CMC for the 1367/100 split is computed differently from Market1501 (see here), you also need to enable use_metric_cuhk03 in ImageDataManager to activate the single-gallery-shot metric for a fair comparison with methods that adopt the old splits (reporting mAP is not needed). Both labeled and detected modes are supported: the default loads detected images, and enabling cuhk03_labeled in ImageDataManager trains and tests on labeled images.
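
The three CUHK03 flags described above can be collected in one place, as in the sketch below. The helper cuhk03_options is hypothetical and not part of torchreid; only the flag names cuhk03_classic_split, use_metric_cuhk03 and cuhk03_labeled come from this guide. It ties use_metric_cuhk03 to the classic split, which is the pairing recommended above for comparison with methods that use the old splits.

```python
def cuhk03_options(classic_split=False, labeled=False):
    """Return the CUHK03-related flags as a dict.

    Hypothetical helper for illustration; not part of torchreid.
    """
    return {
        # False -> new 767/700 split (default); True -> classic 1367/100 splits.
        "cuhk03_classic_split": classic_split,
        # The classic split is evaluated with the single-gallery-shot CMC.
        "use_metric_cuhk03": classic_split,
        # False -> detected images (default); True -> labeled images.
        "cuhk03_labeled": labeled,
    }


default_opts = cuhk03_options()
classic_labeled_opts = cuhk03_options(classic_split=True, labeled=True)
```

The resulting dict is meant to be merged into the other keyword arguments passed to ImageDataManager.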

Note

The code will extract images in “cuhk-03.mat” and save them under “cuhk03/images_detected” and “cuhk03/images_labeled”. Also, four json files will be automatically generated, i.e. “splits_classic_detected.json”, “splits_classic_labeled.json”, “splits_new_detected.json” and “splits_new_labeled.json”. If the parent path of $REID is changed, these json files should be manually deleted. The code can automatically generate new json files to match the new path.
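
If $REID has been moved, the stale split files can be cleared with a small script such as the one below. This is a sketch under one assumption not stated in this guide: that the four json files sit directly under “$REID/cuhk03/”. The function name remove_stale_cuhk03_splits is hypothetical.

```python
from pathlib import Path

# File names as listed in the note above.
SPLIT_FILES = (
    "splits_classic_detected.json",
    "splits_classic_labeled.json",
    "splits_new_detected.json",
    "splits_new_labeled.json",
)


def remove_stale_cuhk03_splits(reid_root):
    """Delete the generated CUHK03 split files and return the removed paths.

    Assumes the json files sit directly under <reid_root>/cuhk03.
    """
    removed = []
    for name in SPLIT_FILES:
        path = Path(reid_root) / "cuhk03" / name
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed
```

Files that are absent are skipped, so the function is safe to call before the first run as well.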

DukeMTMC-reID \(^\dagger\) (dukemtmcreid)

  • Create a directory called “dukemtmc-reid” under $REID.

  • Download “DukeMTMC-reID” from http://vision.cs.duke.edu/DukeMTMC/ and extract it under “dukemtmc-reid”.

  • The data structure should look like

dukemtmc-reid/
    DukeMTMC-reID/
        query/
        bounding_box_train/
        bounding_box_test/
        ...
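
The manual steps above (create the folder, then extract the download into it) can be scripted with the standard library, as sketched below. The function name extract_dataset is hypothetical, and the archive file name in the example call is an assumption; use whatever name the downloaded file actually has.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive_path, reid_root, folder_name):
    """Extract a dataset zip into <reid_root>/<folder_name> and return that folder."""
    target = Path(reid_root) / folder_name
    # Create the dataset folder (and $REID itself) if missing.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return target


# Example call; "DukeMTMC-reID.zip" is an assumed file name for the download.
# extract_dataset("DukeMTMC-reID.zip", "path/to/reid-data", "dukemtmc-reid")
```

The same helper applies to the other zip-based datasets in this guide by changing the folder name.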

MSMT17 (msmt17)

msmt17/
    MSMT17_V1/ # or MSMT17_V2
        train/
        test/
        list_train.txt
        list_query.txt
        list_gallery.txt
        list_val.txt

VIPeR \(^\dagger\) (viper)

viper/
    VIPeR/
        cam_a/
        cam_b/

GRID \(^\dagger\) (grid)

grid/
    underground_reid/
        probe/
        gallery/
        ...

CUHK01 (cuhk01)

cuhk01/
    campus/

SenseReID (sensereid)

  • Create “sensereid” under $REID.

  • Download the dataset from this link and extract it to “sensereid”.

  • Organize the data as follows

sensereid/
    SenseReID/
        test_probe/
        test_gallery/

QMUL-iLIDS \(^\dagger\) (ilids)

ilids/
    i-LIDS_Pedestrian/
        Persons/

PRID (prid)

prid2011/
    prid_2011/
        single_shot/
        multi_shot/

CUHK02 (cuhk02)

cuhk02/
    Dataset/
        P1/
        P2/
        P3/
        P4/
        P5/

CUHKSYSU (cuhksysu)

  • Create a folder named “cuhksysu” under $REID.

  • Download the data to “cuhksysu/” from this Google Drive link.

  • Extract the zip file under “cuhksysu/”.

  • The data structure should look like

cuhksysu/
    cropped_images/

Video Datasets

MARS (mars)

mars/
    bbox_test/
    bbox_train/
    info/

iLIDS-VID \(^\dagger\) (ilidsvid)

ilids-vid/
    i-LIDS-VID/
    train-test people splits/

PRID2011 (prid2011)

prid2011/
    splits_prid2011.json
    prid_2011/
        single_shot/
        multi_shot/

DukeMTMC-VideoReID \(^\dagger\) (dukemtmcvidreid)

  • Create “dukemtmc-vidreid” under $REID.

  • Download “DukeMTMC-VideoReID” from http://vision.cs.duke.edu/DukeMTMC/ and unzip the file to “dukemtmc-vidreid/”.

  • The data structure should look like

dukemtmc-vidreid/
    DukeMTMC-VideoReID/
        train/
        query/
        gallery/
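
Before training, it can help to confirm that a manually prepared dataset matches the trees shown in this guide. The sketch below checks a few of them. The names EXPECTED_LAYOUT and missing_paths are hypothetical; the paths are copied from the trees above and cover only three datasets as examples.

```python
from pathlib import Path

# Expected sub-paths relative to $REID, copied from the trees in this guide.
EXPECTED_LAYOUT = {
    "market1501": [
        "market1501/Market-1501-v15.09.15/query",
        "market1501/Market-1501-v15.09.15/bounding_box_train",
        "market1501/Market-1501-v15.09.15/bounding_box_test",
    ],
    "mars": ["mars/bbox_test", "mars/bbox_train", "mars/info"],
    "dukemtmcvidreid": [
        "dukemtmc-vidreid/DukeMTMC-VideoReID/train",
        "dukemtmc-vidreid/DukeMTMC-VideoReID/query",
        "dukemtmc-vidreid/DukeMTMC-VideoReID/gallery",
    ],
}


def missing_paths(reid_root, dataset_key):
    """Return the expected paths that do not exist under reid_root."""
    root = Path(reid_root)
    return [p for p in EXPECTED_LAYOUT[dataset_key] if not (root / p).exists()]
```

An empty list means every expected folder was found; any returned entry names a folder that still needs to be created or extracted.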