torchreid.data

Data Manager

class torchreid.data.datamanager.DataManager(sources=None, targets=None, height=256, width=128, transforms='random_flip', norm_mean=None, norm_std=None, use_gpu=False)[source]

Base data manager.

Parameters
  • sources (str or list) – source dataset(s).

  • targets (str or list, optional) – target dataset(s). If not given, defaults to sources.

  • height (int, optional) – target image height. Default is 256.

  • width (int, optional) – target image width. Default is 128.

  • transforms (str or list of str, optional) – transformations applied to model training. Default is ‘random_flip’.

  • norm_mean (list or None, optional) – data mean. Default is None (use imagenet mean).

  • norm_std (list or None, optional) – data std. Default is None (use imagenet std).

  • use_gpu (bool, optional) – use gpu. Default is False.

fetch_test_loaders(name)[source]

Returns query and gallery of a test dataset, each containing tuples of (img_path(s), pid, camid).

Parameters

name (str) – dataset name.

property num_train_cams

Returns the number of training cameras.

property num_train_pids

Returns the number of training person identities.

preprocess_pil_img(img)[source]

Transforms a PIL image to torch tensor for testing.

class torchreid.data.datamanager.ImageDataManager(root='', sources=None, targets=None, height=256, width=128, transforms='random_flip', k_tfm=1, norm_mean=None, norm_std=None, use_gpu=True, split_id=0, combineall=False, load_train_targets=False, batch_size_train=32, batch_size_test=32, workers=4, num_instances=4, num_cams=1, num_datasets=1, train_sampler='RandomSampler', train_sampler_t='RandomSampler', cuhk03_labeled=False, cuhk03_classic_split=False, market1501_500k=False)[source]

Image data manager.

Parameters
  • root (str) – root path to datasets.

  • sources (str or list) – source dataset(s).

  • targets (str or list, optional) – target dataset(s). If not given, defaults to sources.

  • height (int, optional) – target image height. Default is 256.

  • width (int, optional) – target image width. Default is 128.

  • transforms (str or list of str, optional) – transformations applied to model training. Default is ‘random_flip’.

  • k_tfm (int) – number of times to apply augmentation to an image independently. If k_tfm > 1, the transform function will be applied k_tfm times to an image. This variable will only be useful for training and is currently valid for image datasets only.

  • norm_mean (list or None, optional) – data mean. Default is None (use imagenet mean).

  • norm_std (list or None, optional) – data std. Default is None (use imagenet std).

  • use_gpu (bool, optional) – use gpu. Default is True.

  • split_id (int, optional) – split id (0-based). Default is 0.

  • combineall (bool, optional) – combine train, query and gallery in a dataset for training. Default is False.

  • load_train_targets (bool, optional) – construct train-loader for target datasets. Default is False. This is useful for domain adaptation research.

  • batch_size_train (int, optional) – number of images in a training batch. Default is 32.

  • batch_size_test (int, optional) – number of images in a test batch. Default is 32.

  • workers (int, optional) – number of workers. Default is 4.

  • num_instances (int, optional) – number of instances per identity in a batch. Default is 4.

  • num_cams (int, optional) – number of cameras to sample in a batch (when using RandomDomainSampler). Default is 1.

  • num_datasets (int, optional) – number of datasets to sample in a batch (when using RandomDatasetSampler). Default is 1.

  • train_sampler (str, optional) – sampler. Default is RandomSampler.

  • train_sampler_t (str, optional) – sampler for target train loader. Default is RandomSampler.

  • cuhk03_labeled (bool, optional) – use cuhk03 labeled images. Default is False (the default is to use detected images).

  • cuhk03_classic_split (bool, optional) – use the classic split in cuhk03. Default is False.

  • market1501_500k (bool, optional) – add 500K distractors to the gallery set in market1501. Default is False.

Examples:

datamanager = torchreid.data.ImageDataManager(
    root='path/to/reid-data',
    sources='market1501',
    height=256,
    width=128,
    batch_size_train=32,
    batch_size_test=100
)

# return train loader of source data
train_loader = datamanager.train_loader

# return test loader of target data
test_loader = datamanager.test_loader

# return train loader of target data (None unless load_train_targets=True)
train_loader_t = datamanager.train_loader_t

class torchreid.data.datamanager.VideoDataManager(root='', sources=None, targets=None, height=256, width=128, transforms='random_flip', norm_mean=None, norm_std=None, use_gpu=True, split_id=0, combineall=False, batch_size_train=3, batch_size_test=3, workers=4, num_instances=4, num_cams=1, num_datasets=1, train_sampler='RandomSampler', seq_len=15, sample_method='evenly')[source]

Video data manager.

Parameters
  • root (str) – root path to datasets.

  • sources (str or list) – source dataset(s).

  • targets (str or list, optional) – target dataset(s). If not given, defaults to sources.

  • height (int, optional) – target image height. Default is 256.

  • width (int, optional) – target image width. Default is 128.

  • transforms (str or list of str, optional) – transformations applied to model training. Default is ‘random_flip’.

  • norm_mean (list or None, optional) – data mean. Default is None (use imagenet mean).

  • norm_std (list or None, optional) – data std. Default is None (use imagenet std).

  • use_gpu (bool, optional) – use gpu. Default is True.

  • split_id (int, optional) – split id (0-based). Default is 0.

  • combineall (bool, optional) – combine train, query and gallery in a dataset for training. Default is False.

  • batch_size_train (int, optional) – number of tracklets in a training batch. Default is 3.

  • batch_size_test (int, optional) – number of tracklets in a test batch. Default is 3.

  • workers (int, optional) – number of workers. Default is 4.

  • num_instances (int, optional) – number of instances per identity in a batch. Default is 4.

  • num_cams (int, optional) – number of cameras to sample in a batch (when using RandomDomainSampler). Default is 1.

  • num_datasets (int, optional) – number of datasets to sample in a batch (when using RandomDatasetSampler). Default is 1.

  • train_sampler (str, optional) – sampler. Default is RandomSampler.

  • seq_len (int, optional) – how many images to sample in a tracklet. Default is 15.

  • sample_method (str, optional) – how to sample images in a tracklet. Default is “evenly”. Choices are [“evenly”, “random”, “all”]. “evenly” and “random” will sample seq_len images in a tracklet while “all” samples all images in a tracklet, where the batch size needs to be set to 1.
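
The three sample_method choices can be pictured with a small index-selection sketch. It is illustrative only and is not torchreid's implementation: the function name sample_tracklet_indices, the even-spacing formula, and the decision to repeat frames when a tracklet is shorter than seq_len are all assumptions.

```python
import random


def sample_tracklet_indices(num_frames, seq_len=15, sample_method="evenly", rng=None):
    """Return frame indices for one tracklet (hypothetical helper)."""
    if num_frames <= 0:
        raise ValueError("a tracklet needs at least one frame")
    rng = rng or random.Random()

    if sample_method == "all":
        # Every frame is kept, so tracklet lengths vary; this is why the
        # documentation asks for a batch size of 1 in this mode.
        return list(range(num_frames))

    if sample_method == "evenly":
        # Spread seq_len positions uniformly over [0, num_frames).
        # Short tracklets end up repeating frames (an assumption).
        return [(i * num_frames) // seq_len for i in range(seq_len)]

    if sample_method == "random":
        if num_frames >= seq_len:
            # Distinct frames, kept in temporal order.
            return sorted(rng.sample(range(num_frames), seq_len))
        # Not enough frames: sample with replacement (an assumption).
        return sorted(rng.choices(range(num_frames), k=seq_len))

    raise ValueError("unknown sample_method: %r" % sample_method)
```

With "evenly" and "random" every tracklet yields exactly seq_len indices, so tracklets can be stacked into a batch of shape (batch_size, seq_len, channel, height, width).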

Examples:

datamanager = torchreid.data.VideoDataManager(
    root='path/to/reid-data',
    sources='mars',
    height=256,
    width=128,
    batch_size_train=3,
    batch_size_test=3,
    seq_len=15,
    sample_method='evenly'
)

# return train loader of source data
train_loader = datamanager.train_loader

# return test loader of target data
test_loader = datamanager.test_loader

Note

The current implementation only supports image-like training. Therefore, each image in a sampled tracklet will undergo independent transformation functions. To achieve tracklet-aware training, you need to modify the transformation functions for video reid such that each function applies the same operation to all images in a tracklet to keep consistency.
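
One way to read the note above is that the random decision should be drawn once per tracklet rather than once per image. The sketch below shows the idea for a horizontal flip. It is an assumption-laden illustration, not part of torchreid: frames are stood in for by nested lists of pixel values, and hflip and flip_tracklet are hypothetical names.

```python
import random


def hflip(frame):
    """Horizontally flip a frame stored as a list of pixel rows."""
    return [row[::-1] for row in frame]


def flip_tracklet(frames, p=0.5, rng=None):
    """Flip either every frame of a tracklet or none of them."""
    rng = rng or random.Random()
    # A single draw is shared by all frames, which keeps the tracklet consistent.
    do_flip = rng.random() < p
    return [hflip(frame) for frame in frames] if do_flip else list(frames)


tracklet = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
flipped = flip_tracklet(tracklet, p=1.0)  # p=1.0 forces the flip
```

The same pattern (sample the parameters once, then apply them to each frame) would carry over to crops or erasing regions.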

Sampler

class torchreid.data.sampler.RandomDatasetSampler(data_source, batch_size, n_dataset)[source]

Random dataset sampler.

How the sampling works:
  1. Randomly sample N datasets (based on the “dsetid” label).

  2. From each dataset, randomly sample K images.

Parameters
  • data_source (list) – contains tuples of (img_path(s), pid, camid, dsetid).

  • batch_size (int) – batch size.

  • n_dataset (int) – number of datasets to sample in a batch.
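
The two-step procedure can be sketched in plain Python for a single batch. This is a simplified illustration and not the library's sampler: it assumes K = batch_size // n_dataset, samples with replacement when a dataset holds fewer than K images, and the function name sample_batch_by_dataset is hypothetical.

```python
import random
from collections import defaultdict


def sample_batch_by_dataset(data_source, batch_size, n_dataset, rng=None):
    """Pick indices for one batch: N datasets, K images from each."""
    rng = rng or random.Random()
    if batch_size % n_dataset != 0:
        raise ValueError("batch_size must be divisible by n_dataset")

    # Group item indices by the dsetid label (fourth element of each tuple).
    index_by_dset = defaultdict(list)
    for idx, item in enumerate(data_source):
        index_by_dset[item[3]].append(idx)
    if n_dataset > len(index_by_dset):
        raise ValueError("not enough datasets to sample from")

    k = batch_size // n_dataset  # assumed meaning of K
    batch = []
    for dsetid in rng.sample(sorted(index_by_dset), n_dataset):
        pool = index_by_dset[dsetid]
        if len(pool) >= k:
            batch.extend(rng.sample(pool, k))
        else:
            batch.extend(rng.choices(pool, k=k))  # small dataset: allow repeats
    return batch
```

Grouping on camid (index 2 of each tuple) instead of dsetid would give the per-camera analogue of the same idea.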

class torchreid.data.sampler.RandomDomainSampler(data_source, batch_size, n_domain)[source]

Random domain sampler.

We consider each camera as a visual domain.

How the sampling works:
  1. Randomly sample N cameras (based on the “camid” label).

  2. From each camera, randomly sample K images.

Parameters
  • data_source (list) – contains tuples of (img_path(s), pid, camid, dsetid).

  • batch_size (int) – batch size.

  • n_domain (int) – number of cameras to sample in a batch.

class torchreid.data.sampler.RandomIdentitySampler(data_source, batch_size, num_instances)[source]

Randomly samples N identities each with K instances.

Parameters
  • data_source (list) – contains tuples of (img_path(s), pid, camid, dsetid).

  • batch_size (int) – batch size.

  • num_instances (int) – number of instances per identity in a batch.
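
For one batch, the relationship between the parameters is N = batch_size // num_instances identities with K = num_instances images each. The sketch below illustrates that arithmetic under stated assumptions; it is not torchreid's sampler, the name sample_identity_batch is hypothetical, and sampling with replacement for identities that have fewer than K images is an assumption.

```python
import random
from collections import defaultdict


def sample_identity_batch(data_source, batch_size, num_instances, rng=None):
    """Pick indices for one batch of N identities with K instances each."""
    rng = rng or random.Random()
    if batch_size % num_instances != 0:
        raise ValueError("batch_size must be divisible by num_instances")

    # Group item indices by pid (second element of each tuple).
    index_by_pid = defaultdict(list)
    for idx, item in enumerate(data_source):
        index_by_pid[item[1]].append(idx)

    num_pids = batch_size // num_instances  # N identities per batch
    if num_pids > len(index_by_pid):
        raise ValueError("not enough identities for one batch")

    batch = []
    for pid in rng.sample(sorted(index_by_pid), num_pids):
        pool = index_by_pid[pid]
        if len(pool) >= num_instances:
            batch.extend(rng.sample(pool, num_instances))
        else:
            # Too few images for this identity: repeat some of them.
            batch.extend(rng.choices(pool, k=num_instances))
    return batch
```

With batch_size=32 and num_instances=4 (the defaults listed for the data managers), each batch would hold 8 identities.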

torchreid.data.sampler.build_train_sampler(data_source, train_sampler, batch_size=32, num_instances=4, num_cams=1, num_datasets=1, **kwargs)[source]

Builds a training sampler.

Parameters
  • data_source (list) – contains tuples of (img_path(s), pid, camid).

  • train_sampler (str) – sampler name (default: RandomSampler).

  • batch_size (int, optional) – batch size. Default is 32.

  • num_instances (int, optional) – number of instances per identity in a batch (when using RandomIdentitySampler). Default is 4.

  • num_cams (int, optional) – number of cameras to sample in a batch (when using RandomDomainSampler). Default is 1.

  • num_datasets (int, optional) – number of datasets to sample in a batch (when using RandomDatasetSampler). Default is 1.

Transforms

class torchreid.data.transforms.ColorAugmentation(p=0.5)[source]

Randomly alters the intensities of RGB channels.

Reference:

Krizhevsky et al. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.

Parameters

p (float, optional) – probability that this operation takes place. Default is 0.5.

class torchreid.data.transforms.Random2DTranslation(height, width, p=0.5, interpolation=2)[source]

Randomly translates the input image with a probability.

Specifically, given a predefined shape (height, width), the input is first resized with a factor of 1.125, leading to (height*1.125, width*1.125), then a random crop is performed. Such operation is done with a probability.

Parameters
  • height (int) – target image height.

  • width (int) – target image width.

  • p (float, optional) – probability that this operation takes place. Default is 0.5.

  • interpolation (int, optional) – desired interpolation. Default is 2 (PIL.Image.BILINEAR).
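
The geometry of the resize-then-crop step can be worked through without any image library. The sketch below only computes sizes and a crop box; it is an illustration under assumptions, not torchreid's code. The name translation_crop_box is hypothetical, rounding to the nearest integer is assumed, and the box uses the (left, upper, right, lower) convention.

```python
import random


def translation_crop_box(height, width, scale=1.125, rng=None):
    """Return the enlarged size and a random crop box of the target size."""
    rng = rng or random.Random()
    # Step 1: enlarge the target shape by the factor 1.125.
    new_height = int(round(height * scale))
    new_width = int(round(width * scale))
    # Step 2: choose the top-left corner so the crop stays inside the image.
    x1 = rng.randint(0, new_width - width)
    y1 = rng.randint(0, new_height - height)
    return (new_width, new_height), (x1, y1, x1 + width, y1 + height)


size, box = translation_crop_box(256, 128, rng=random.Random(0))
```

For the default 256 x 128 target, the enlarged image is 288 x 144, so the crop can shift by up to 32 pixels vertically and 16 pixels horizontally.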

class torchreid.data.transforms.RandomErasing(probability=0.5, sl=0.02, sh=0.4, r1=0.3, mean=[0.4914, 0.4822, 0.4465])[source]

Randomly erases an image patch.

Origin: https://github.com/zhunzhong07/Random-Erasing

Reference:

Zhong et al. Random Erasing Data Augmentation.

Parameters
  • probability (float, optional) – probability that this operation takes place. Default is 0.5.

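  • sl (float, optional) – min erasing area, as a fraction of the image area. Default is 0.02.

  • sh (float, optional) – max erasing area, as a fraction of the image area. Default is 0.4.

  • r1 (float, optional) – min aspect ratio of the erased patch. Default is 0.3.

  • mean (list, optional) – erasing value. Default is [0.4914, 0.4822, 0.4465].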
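
How sl, sh and r1 interact is easiest to see in the rectangle-selection step described by Zhong et al. The sketch below follows that description as I understand it and is not copied from torchreid: the name sample_erase_box is hypothetical, and both the retry limit (max_attempts) and the upper aspect-ratio bound of 1 / r1 are assumptions.

```python
import math
import random


def sample_erase_box(img_h, img_w, sl=0.02, sh=0.4, r1=0.3, max_attempts=100, rng=None):
    """Return (x1, y1, w, h) of a patch to erase, or None if no patch fits."""
    rng = rng or random.Random()
    area = img_h * img_w
    for _ in range(max_attempts):
        # Erased area is a random fraction of the image area in [sl, sh].
        target_area = rng.uniform(sl, sh) * area
        # Aspect ratio (height / width) is drawn from [r1, 1 / r1].
        aspect_ratio = rng.uniform(r1, 1.0 / r1)
        h = int(round(math.sqrt(target_area * aspect_ratio)))
        w = int(round(math.sqrt(target_area / aspect_ratio)))
        if 0 < h < img_h and 0 < w < img_w:
            y1 = rng.randint(0, img_h - h)
            x1 = rng.randint(0, img_w - w)
            return x1, y1, w, h
    return None  # give up and leave the image unchanged
```

The pixels inside the returned box would then be overwritten with the per-channel values given by mean.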

class torchreid.data.transforms.RandomPatch(prob_happen=0.5, pool_capacity=50000, min_sample_size=100, patch_min_area=0.01, patch_max_area=0.5, patch_min_ratio=0.1, prob_rotate=0.5, prob_flip_leftright=0.5)[source]

Random patch data augmentation.

There is a patch pool that stores randomly extracted patches from person images.

For each input image, RandomPatch
  1. extracts a random patch and stores the patch in the patch pool;

  2. randomly selects a patch from the patch pool and pastes it on the input (at random position) to simulate occlusion.
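
The bookkeeping behind the two steps can be sketched as a bounded pool. This is an illustration rather than torchreid's implementation: the class name PatchPool is hypothetical, and it assumes that pool_capacity evicts the oldest patches first and that no patch is handed out until the pool holds at least min_sample_size patches.

```python
import random
from collections import deque


class PatchPool:
    """Bounded store of patches (hypothetical helper)."""

    def __init__(self, pool_capacity=50000, min_sample_size=100, rng=None):
        # A deque with maxlen drops the oldest patch once capacity is reached.
        self.pool = deque(maxlen=pool_capacity)
        self.min_sample_size = min_sample_size
        self.rng = rng or random.Random()

    def add(self, patch):
        """Step 1: store a patch extracted from the current image."""
        self.pool.append(patch)

    def sample(self):
        """Step 2: pick a stored patch, or None while the pool is too small."""
        if len(self.pool) < self.min_sample_size:
            return None
        return self.rng.choice(self.pool)


pool = PatchPool(pool_capacity=3, min_sample_size=2, rng=random.Random(0))
pool.add("patch_a")
first = pool.sample()  # pool still below min_sample_size
```

Because patches come from earlier images, the pasted occluder usually shows part of a different person, which is what makes the occlusion realistic.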

Reference:
  • Zhou et al. Omni-Scale Feature Learning for Person Re-Identification. ICCV, 2019.

  • Zhou et al. Learning Generalisable Omni-Scale Representations for Person Re-Identification. TPAMI, 2021.

torchreid.data.transforms.build_transforms(height, width, transforms='random_flip', norm_mean=[0.485, 0.456, 0.406], norm_std=[0.229, 0.224, 0.225], **kwargs)[source]

Builds train and test transform functions.

Parameters
  • height (int) – target image height.

  • width (int) – target image width.

  • transforms (str or list of str, optional) – transformations applied to model training. Default is ‘random_flip’.

  • norm_mean (list or None, optional) – normalization mean values. Default is ImageNet means.

  • norm_std (list or None, optional) – normalization standard deviation values. Default is ImageNet standard deviation values.
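
As a worked example of what norm_mean and norm_std do, the sketch below standardizes a single RGB pixel with the default values from the signature. It assumes channel values already scaled to [0, 1] and per-channel (x - mean) / std normalization, as is conventional for ImageNet statistics; normalize_pixel is a hypothetical helper, not a torchreid function.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Standardize one RGB pixel channel by channel."""
    return tuple((c - m) / s for c, m, s in zip(rgb, mean, std))


# A pixel equal to the mean maps to zero in every channel.
centered = normalize_pixel(IMAGENET_MEAN)
# A pure white pixel maps to roughly 2.2 to 2.6 standard deviations above the mean.
white = normalize_pixel((1.0, 1.0, 1.0))
```

Passing dataset-specific statistics through norm_mean and norm_std changes only these two tuples.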

Dataset

class torchreid.data.datasets.dataset.Dataset(train, query, gallery, transform=None, k_tfm=1, mode='train', combineall=False, verbose=True, **kwargs)[source]

An abstract class representing a Dataset.

This is the base class for ImageDataset and VideoDataset.

Parameters
  • train (list) – contains tuples of (img_path(s), pid, camid).

  • query (list) – contains tuples of (img_path(s), pid, camid).

  • gallery (list) – contains tuples of (img_path(s), pid, camid).

  • transform – transform function.

  • k_tfm (int) – number of times to apply augmentation to an image independently. If k_tfm > 1, the transform function will be applied k_tfm times to an image. This variable will only be useful for training and is currently valid for image datasets only.

  • mode (str) – ‘train’, ‘query’ or ‘gallery’.

  • combineall (bool) – combines train, query and gallery in a dataset for training.

  • verbose (bool) – show information.

check_before_run(required_files)[source]

Checks if required files exist before going deeper.

Parameters

required_files (str or list) – string file name(s).

combine_all()[source]

Combines train, query and gallery in a dataset for training.

download_dataset(dataset_dir, dataset_url)[source]

Downloads and extracts dataset.

Parameters
  • dataset_dir (str) – dataset directory.

  • dataset_url (str) – url to download dataset.

get_num_cams(data)[source]

Returns the number of training cameras.

Each tuple in data contains (img_path(s), pid, camid, dsetid).

get_num_datasets(data)[source]

Returns the number of datasets included.

Each tuple in data contains (img_path(s), pid, camid, dsetid).

get_num_pids(data)[source]

Returns the number of training person identities.

Each tuple in data contains (img_path(s), pid, camid, dsetid).
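
Given the tuple layout above, the three counting helpers amount to counting distinct values in one field. The following sketch shows that idea; count_unique and the field constants are hypothetical names, and the library's own methods may be written differently.

```python
PID, CAMID, DSETID = 1, 2, 3  # positions inside (img_path(s), pid, camid, dsetid)


def count_unique(data, field):
    """Count distinct values of one field across all tuples."""
    return len({item[field] for item in data})


data = [
    ("a.jpg", 0, 0, 0),
    ("b.jpg", 0, 1, 0),
    ("c.jpg", 1, 1, 0),
    ("d.jpg", 2, 0, 1),
]
num_pids = count_unique(data, PID)
num_cams = count_unique(data, CAMID)
num_datasets = count_unique(data, DSETID)
```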

show_summary()[source]

Shows dataset statistics.

class torchreid.data.datasets.dataset.ImageDataset(train, query, gallery, **kwargs)[source]

A base class representing ImageDataset.

All other image datasets should subclass it.

__getitem__ returns an image given an index. It returns img, pid, camid and img_path, where img has shape (channel, height, width). As a result, the data in each batch has shape (batch_size, channel, height, width).

show_summary()[source]

Shows dataset statistics.

class torchreid.data.datasets.dataset.VideoDataset(train, query, gallery, seq_len=15, sample_method='evenly', **kwargs)[source]

A base class representing VideoDataset.

All other video datasets should subclass it.

__getitem__ returns a tracklet given an index. It returns imgs, pid and camid, where imgs has shape (seq_len, channel, height, width). As a result, the data in each batch has shape (batch_size, seq_len, channel, height, width).

show_summary()[source]

Shows dataset statistics.

torchreid.data.datasets.__init__.init_image_dataset(name, **kwargs)[source]

Initializes an image dataset.

torchreid.data.datasets.__init__.init_video_dataset(name, **kwargs)[source]

Initializes a video dataset.

torchreid.data.datasets.__init__.register_image_dataset(name, dataset)[source]

Registers a new image dataset.

Parameters
  • name (str) – key corresponding to the new dataset.

  • dataset (Dataset) – the new dataset class.

Examples:

import torchreid
import NewDataset
torchreid.data.register_image_dataset('new_dataset', NewDataset)
# single dataset case
datamanager = torchreid.data.ImageDataManager(
    root='reid-data',
    sources='new_dataset'
)
# multiple dataset case
datamanager = torchreid.data.ImageDataManager(
    root='reid-data',
    sources=['new_dataset', 'dukemtmcreid']
)

torchreid.data.datasets.__init__.register_video_dataset(name, dataset)[source]

Registers a new video dataset.

Parameters
  • name (str) – key corresponding to the new dataset.

  • dataset (Dataset) – the new dataset class.

Examples:

import torchreid
import NewDataset
torchreid.data.register_video_dataset('new_dataset', NewDataset)
# single dataset case
datamanager = torchreid.data.VideoDataManager(
    root='reid-data',
    sources='new_dataset'
)
# multiple dataset case
datamanager = torchreid.data.VideoDataManager(
    root='reid-data',
    sources=['new_dataset', 'ilidsvid']
)

Image Datasets

class torchreid.data.datasets.image.market1501.Market1501(root='', market1501_500k=False, **kwargs)[source]

Market1501.

Reference:

Zheng et al. Scalable Person Re-identification: A Benchmark. ICCV 2015.

URL: http://www.liangzheng.org/Project/project_reid.html

Dataset statistics:
  • identities: 1501 (+1 for background).

  • images: 12936 (train) + 3368 (query) + 15913 (gallery).

class torchreid.data.datasets.image.cuhk03.CUHK03(root='', split_id=0, cuhk03_labeled=False, cuhk03_classic_split=False, **kwargs)[source]

CUHK03.

Reference:

Li et al. DeepReID: Deep Filter Pairing Neural Network for Person Re-identification. CVPR 2014.

URL: http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html#!

Dataset statistics:
  • identities: 1360.

  • images: 13164.

  • cameras: 6.

  • splits: 20 (classic).

class torchreid.data.datasets.image.dukemtmcreid.DukeMTMCreID(root='', **kwargs)[source]

DukeMTMC-reID.

Reference:
  • Ristani et al. Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. ECCVW 2016.

  • Zheng et al. Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in vitro. ICCV 2017.

URL: https://github.com/layumi/DukeMTMC-reID_evaluation

Dataset statistics:
  • identities: 1404 (train + query).

  • images: 16522 (train) + 2228 (query) + 17661 (gallery).

  • cameras: 8.

class torchreid.data.datasets.image.msmt17.MSMT17(root='', **kwargs)[source]

MSMT17.

Reference:

Wei et al. Person Transfer GAN to Bridge Domain Gap for Person Re-Identification. CVPR 2018.

URL: http://www.pkuvmc.com/publications/msmt17.html

Dataset statistics:
  • identities: 4101.

  • images: 32621 (train) + 11659 (query) + 82161 (gallery).

  • cameras: 15.

class torchreid.data.datasets.image.viper.VIPeR(root='', split_id=0, **kwargs)[source]

VIPeR.

Reference:

Gray et al. Evaluating appearance models for recognition, reacquisition, and tracking. PETS 2007.

URL: https://vision.soe.ucsc.edu/node/178

Dataset statistics:
  • identities: 632.

  • images: 632 x 2 = 1264.

  • cameras: 2.

class torchreid.data.datasets.image.grid.GRID(root='', split_id=0, **kwargs)[source]

GRID.

Reference:

Loy et al. Multi-camera activity correlation analysis. CVPR 2009.

URL: http://personal.ie.cuhk.edu.hk/~ccloy/downloads_qmul_underground_reid.html

Dataset statistics:
  • identities: 250.

  • images: 1275.

  • cameras: 8.

class torchreid.data.datasets.image.cuhk01.CUHK01(root='', split_id=0, **kwargs)[source]

CUHK01.

Reference:

Li et al. Human Reidentification with Transferred Metric Learning. ACCV 2012.

URL: http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html

Dataset statistics:
  • identities: 971.

  • images: 3884.

  • cameras: 4.

Note: CUHK01 and CUHK02 overlap.

prepare_split()[source]

Image name format: 0001001.png, where the first four digits represent the identity and the last three digits represent the camera. Cameras 1 and 2 are considered the same view, and cameras 3 and 4 are considered the same view.
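
A small parser makes the naming scheme concrete. It is a sketch under assumptions, not the body of prepare_split: it treats the seven-digit stem of the example name 0001001.png as four identity digits followed by three camera digits, and the function name parse_cuhk01_name and the 0-based view index are hypothetical.

```python
import os


def parse_cuhk01_name(img_name):
    """Split a CUHK01 file name into (identity, camera, view)."""
    stem = os.path.splitext(os.path.basename(img_name))[0]
    if len(stem) != 7 or not stem.isdigit():
        raise ValueError("unexpected CUHK01 file name: %r" % img_name)
    pid = int(stem[:4])      # identity, e.g. 1
    cam = int(stem[4:])      # camera, 1 to 4
    view = (cam - 1) // 2    # cameras 1 and 2 -> view 0, cameras 3 and 4 -> view 1
    return pid, cam, view


example = parse_cuhk01_name("0001001.png")
```

Merging cameras into two views is what allows the dataset to be treated as a two-view benchmark.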

class torchreid.data.datasets.image.ilids.iLIDS(root='', split_id=0, **kwargs)[source]

QMUL-iLIDS.

Reference:

Zheng et al. Associating Groups of People. BMVC 2009.

Dataset statistics:
  • identities: 119.

  • images: 476.

  • cameras: 8 (not explicitly provided).

class torchreid.data.datasets.image.sensereid.SenseReID(root='', **kwargs)[source]

SenseReID.

This dataset is used for testing purposes only.

Reference:

Zhao et al. Spindle Net: Person Re-identification with Human Body Region Guided Feature Decomposition and Fusion. CVPR 2017.

URL: https://drive.google.com/file/d/0B56OfSrVI8hubVJLTzkwV2VaOWM/view

Dataset statistics:
  • query: 522 ids, 1040 images.

  • gallery: 1717 ids, 3388 images.

class torchreid.data.datasets.image.prid.PRID[source]

PRID (single-shot version of prid-2011).

Reference:

Hirzer et al. Person Re-Identification by Descriptive and Discriminative Classification. SCIA 2011.

URL: https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/

Dataset statistics:
  • Two views.

  • View A captures 385 identities.

  • View B captures 749 identities.

  • 200 identities appear in both views (indexed from 1 to 200).

Video Datasets

class torchreid.data.datasets.video.mars.Mars(root='', **kwargs)[source]

MARS.

Reference:

Zheng et al. MARS: A Video Benchmark for Large-Scale Person Re-identification. ECCV 2016.

URL: http://www.liangzheng.com.cn/Project/project_mars.html

Dataset statistics:
  • identities: 1261.

  • tracklets: 8298 (train) + 1980 (query) + 9330 (gallery).

  • cameras: 6.

combine_all()[source]

Combines train, query and gallery in a dataset for training.

class torchreid.data.datasets.video.ilidsvid.iLIDSVID(root='', split_id=0, **kwargs)[source]

iLIDS-VID.

Reference:

Wang et al. Person Re-Identification by Video Ranking. ECCV 2014.

URL: http://www.eecs.qmul.ac.uk/~xiatian/downloads_qmul_iLIDS-VID_ReID_dataset.html

Dataset statistics:
  • identities: 300.

  • tracklets: 600.

  • cameras: 2.

class torchreid.data.datasets.video.prid2011.PRID2011(root='', split_id=0, **kwargs)[source]

PRID2011.

Reference:

Hirzer et al. Person Re-Identification by Descriptive and Discriminative Classification. SCIA 2011.

URL: https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/

Dataset statistics:
  • identities: 200.

  • tracklets: 400.

  • cameras: 2.

class torchreid.data.datasets.video.dukemtmcvidreid.DukeMTMCVidReID(root='', min_seq_len=0, **kwargs)[source]

DukeMTMCVidReID.

Reference:
  • Ristani et al. Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. ECCVW 2016.

  • Wu et al. Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning. CVPR 2018.

URL: https://github.com/Yu-Wu/DukeMTMC-VideoReID

Dataset statistics:
  • identities: 702 (train) + 702 (test).

  • tracklets: 2196 (train) + 2636 (test).