torchreid.data¶
Data Manager¶
- class torchreid.data.datamanager.DataManager(sources=None, targets=None, height=256, width=128, transforms='random_flip', norm_mean=None, norm_std=None, use_gpu=False)[source]¶
Base data manager.
- Parameters
sources (str or list) – source dataset(s).
targets (str or list, optional) – target dataset(s). If not given, it equals to sources.
height (int, optional) – target image height. Default is 256.
width (int, optional) – target image width. Default is 128.
transforms (str or list of str, optional) – transformations applied to model training. Default is ‘random_flip’.
norm_mean (list or None, optional) – data mean. Default is None (use imagenet mean).
norm_std (list or None, optional) – data std. Default is None (use imagenet std).
use_gpu (bool, optional) – use gpu. Default is False.
- fetch_test_loaders(name)[source]¶
Returns query and gallery of a test dataset, each containing tuples of (img_path(s), pid, camid).
- Parameters
name (str) – dataset name.
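For example, a minimal sketch of fetching the test splits of one target dataset (assuming a data manager built with 'market1501' among its targets; the variable name is illustrative):
# hedged sketch: fetch the query and gallery splits of a target test set
query, gallery = datamanager.fetch_test_loaders('market1501')
print(len(query), len(gallery))  # number of query / gallery entries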
- property num_train_cams¶
Returns the number of training cameras.
- property num_train_pids¶
Returns the number of training person identities.
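These counts are typically used to size the identity classifier of a model; a hedged sketch (the model name is only an example):
datamanager = torchreid.data.ImageDataManager(
    root='reid-data',
    sources='market1501'
)

# size the classifier by the number of training identities
model = torchreid.models.build_model(
    name='osnet_x1_0',
    num_classes=datamanager.num_train_pids,
    loss='softmax'
)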
- class torchreid.data.datamanager.ImageDataManager(root='', sources=None, targets=None, height=256, width=128, transforms='random_flip', k_tfm=1, norm_mean=None, norm_std=None, use_gpu=True, split_id=0, combineall=False, load_train_targets=False, batch_size_train=32, batch_size_test=32, workers=4, num_instances=4, num_cams=1, num_datasets=1, train_sampler='RandomSampler', train_sampler_t='RandomSampler', cuhk03_labeled=False, cuhk03_classic_split=False, market1501_500k=False)[source]¶
Image data manager.
- Parameters
root (str) – root path to datasets.
sources (str or list) – source dataset(s).
targets (str or list, optional) – target dataset(s). If not given, it equals to sources.
height (int, optional) – target image height. Default is 256.
width (int, optional) – target image width. Default is 128.
transforms (str or list of str, optional) – transformations applied to model training. Default is ‘random_flip’.
k_tfm (int) – number of times to apply augmentation to an image independently. If k_tfm > 1, the transform function will be applied k_tfm times to an image. This variable will only be useful for training and is currently valid for image datasets only.
norm_mean (list or None, optional) – data mean. Default is None (use imagenet mean).
norm_std (list or None, optional) – data std. Default is None (use imagenet std).
use_gpu (bool, optional) – use gpu. Default is True.
split_id (int, optional) – split id (0-based). Default is 0.
combineall (bool, optional) – combine train, query and gallery in a dataset for training. Default is False.
load_train_targets (bool, optional) – construct train-loader for target datasets. Default is False. This is useful for domain adaptation research.
batch_size_train (int, optional) – number of images in a training batch. Default is 32.
batch_size_test (int, optional) – number of images in a test batch. Default is 32.
workers (int, optional) – number of workers. Default is 4.
num_instances (int, optional) – number of instances per identity in a batch. Default is 4.
num_cams (int, optional) – number of cameras to sample in a batch (when using RandomDomainSampler). Default is 1.
num_datasets (int, optional) – number of datasets to sample in a batch (when using RandomDatasetSampler). Default is 1.
train_sampler (str, optional) – sampler. Default is RandomSampler.
train_sampler_t (str, optional) – sampler for target train loader. Default is RandomSampler.
cuhk03_labeled (bool, optional) – use cuhk03 labeled images. Default is False (the default is to use detected images).
cuhk03_classic_split (bool, optional) – use the classic split in cuhk03. Default is False.
market1501_500k (bool, optional) – add 500K distractors to the gallery set in market1501. Default is False.
Examples:
datamanager = torchreid.data.ImageDataManager(
    root='path/to/reid-data',
    sources='market1501',
    height=256,
    width=128,
    batch_size_train=32,
    batch_size_test=100
)

# return train loader of source data
train_loader = datamanager.train_loader

# return test loader of target data
test_loader = datamanager.test_loader

# return train loader of target data
train_loader_t = datamanager.train_loader_t
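When training with a triplet (or other identity-balanced) loss, the identity sampler is usually selected here; a sketch, assuming batch_size_train is divisible by num_instances:
# 8 identities x 4 instances per training batch
datamanager = torchreid.data.ImageDataManager(
    root='path/to/reid-data',
    sources='market1501',
    batch_size_train=32,
    num_instances=4,
    train_sampler='RandomIdentitySampler'
)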
- class torchreid.data.datamanager.VideoDataManager(root='', sources=None, targets=None, height=256, width=128, transforms='random_flip', norm_mean=None, norm_std=None, use_gpu=True, split_id=0, combineall=False, batch_size_train=3, batch_size_test=3, workers=4, num_instances=4, num_cams=1, num_datasets=1, train_sampler='RandomSampler', seq_len=15, sample_method='evenly')[source]¶
Video data manager.
- Parameters
root (str) – root path to datasets.
sources (str or list) – source dataset(s).
targets (str or list, optional) – target dataset(s). If not given, it equals to sources.
height (int, optional) – target image height. Default is 256.
width (int, optional) – target image width. Default is 128.
transforms (str or list of str, optional) – transformations applied to model training. Default is ‘random_flip’.
norm_mean (list or None, optional) – data mean. Default is None (use imagenet mean).
norm_std (list or None, optional) – data std. Default is None (use imagenet std).
use_gpu (bool, optional) – use gpu. Default is True.
split_id (int, optional) – split id (0-based). Default is 0.
combineall (bool, optional) – combine train, query and gallery in a dataset for training. Default is False.
batch_size_train (int, optional) – number of tracklets in a training batch. Default is 3.
batch_size_test (int, optional) – number of tracklets in a test batch. Default is 3.
workers (int, optional) – number of workers. Default is 4.
num_instances (int, optional) – number of instances per identity in a batch. Default is 4.
num_cams (int, optional) – number of cameras to sample in a batch (when using RandomDomainSampler). Default is 1.
num_datasets (int, optional) – number of datasets to sample in a batch (when using RandomDatasetSampler). Default is 1.
train_sampler (str, optional) – sampler. Default is RandomSampler.
seq_len (int, optional) – how many images to sample in a tracklet. Default is 15.
sample_method (str, optional) – how to sample images in a tracklet. Default is “evenly”. Choices are [“evenly”, “random”, “all”]. “evenly” and “random” sample seq_len images from a tracklet, while “all” uses all images in a tracklet, in which case the batch size needs to be set to 1.
Examples:
datamanager = torchreid.data.VideoDataManager(
    root='path/to/reid-data',
    sources='mars',
    height=256,
    width=128,
    batch_size_train=3,
    batch_size_test=3,
    seq_len=15,
    sample_method='evenly'
)

# return train loader of source data
train_loader = datamanager.train_loader

# return test loader of target data
test_loader = datamanager.test_loader
Note
The current implementation only supports image-like training. Therefore, each image in a sampled tracklet will undergo independent transformation functions. To achieve tracklet-aware training, you need to modify the transformation functions for video reid such that each function applies the same operation to all images in a tracklet to keep consistency.
Sampler¶
- class torchreid.data.sampler.RandomDatasetSampler(data_source, batch_size, n_dataset)[source]¶
Random dataset sampler.
How the sampling works:
1. Randomly sample N datasets (based on the “dsetid” label).
2. From each dataset, randomly sample K images.
- Parameters
data_source (list) – contains tuples of (img_path(s), pid, camid, dsetid).
batch_size (int) – batch size.
n_dataset (int) – number of datasets to sample in a batch.
- class torchreid.data.sampler.RandomDomainSampler(data_source, batch_size, n_domain)[source]¶
Random domain sampler.
We consider each camera as a visual domain.
How the sampling works:
1. Randomly sample N cameras (based on the “camid” label).
2. From each camera, randomly sample K images.
- Parameters
data_source (list) – contains tuples of (img_path(s), pid, camid, dsetid).
batch_size (int) – batch size.
n_domain (int) – number of cameras to sample in a batch.
- class torchreid.data.sampler.RandomIdentitySampler(data_source, batch_size, num_instances)[source]¶
Randomly samples N identities each with K instances.
- Parameters
data_source (list) – contains tuples of (img_path(s), pid, camid, dsetid).
batch_size (int) – batch size.
num_instances (int) – number of instances per identity in a batch.
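The sampler plugs into a standard PyTorch DataLoader; a minimal sketch, assuming dataset is an ImageDataset whose train list follows the tuple layout above:
from torch.utils.data import DataLoader
from torchreid.data.sampler import RandomIdentitySampler

# dataset.train: list of (img_path, pid, camid, dsetid) tuples
sampler = RandomIdentitySampler(dataset.train, batch_size=32, num_instances=4)
train_loader = DataLoader(dataset, batch_size=32, sampler=sampler, drop_last=True)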
- torchreid.data.sampler.build_train_sampler(data_source, train_sampler, batch_size=32, num_instances=4, num_cams=1, num_datasets=1, **kwargs)[source]¶
Builds a training sampler.
- Parameters
data_source (list) – contains tuples of (img_path(s), pid, camid).
train_sampler (str) – sampler name (default: RandomSampler).
batch_size (int, optional) – batch size. Default is 32.
num_instances (int, optional) – number of instances per identity in a batch (when using RandomIdentitySampler). Default is 4.
num_cams (int, optional) – number of cameras to sample in a batch (when using RandomDomainSampler). Default is 1.
num_datasets (int, optional) – number of datasets to sample in a batch (when using RandomDatasetSampler). Default is 1.
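A usage sketch, assuming dataset.train follows the tuple layout above:
from torchreid.data.sampler import build_train_sampler

# identity-balanced sampling: 32 images per batch, 4 instances per identity
sampler = build_train_sampler(
    dataset.train,
    'RandomIdentitySampler',
    batch_size=32,
    num_instances=4
)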
Transforms¶
- class torchreid.data.transforms.ColorAugmentation(p=0.5)[source]¶
Randomly alters the intensities of RGB channels.
- Reference:
Krizhevsky et al. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.
- Parameters
p (float, optional) – probability that this operation takes place. Default is 0.5.
- class torchreid.data.transforms.Random2DTranslation(height, width, p=0.5, interpolation=2)[source]¶
Randomly translates the input image with a probability.
Specifically, given a target shape (height, width), the input is first resized by a factor of 1.125, i.e. to (height*1.125, width*1.125), and then a random crop of the target size is taken. This operation is applied with probability p.
- Parameters
height (int) – target image height.
width (int) – target image width.
p (float, optional) – probability that this operation takes place. Default is 0.5.
interpolation (int, optional) – desired interpolation. Default is PIL.Image.BILINEAR.
- class torchreid.data.transforms.RandomErasing(probability=0.5, sl=0.02, sh=0.4, r1=0.3, mean=[0.4914, 0.4822, 0.4465])[source]¶
Randomly erases an image patch.
Origin: https://github.com/zhunzhong07/Random-Erasing
- Reference:
Zhong et al. Random Erasing Data Augmentation.
- Parameters
probability (float, optional) – probability that this operation takes place. Default is 0.5.
sl (float, optional) – min erasing area.
sh (float, optional) – max erasing area.
r1 (float, optional) – min aspect ratio.
mean (list, optional) – erasing value.
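These transforms compose with standard torchvision transforms; a sketch of a typical training pipeline (assuming Random2DTranslation operates on PIL images and RandomErasing on the normalized tensor):
import torchvision.transforms as T
from torchreid.data.transforms import Random2DTranslation, RandomErasing

transform_train = T.Compose([
    Random2DTranslation(256, 128, p=0.5),  # resize by 1.125x, then random crop
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    RandomErasing(probability=0.5)         # erase a random patch in the tensor
])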
- class torchreid.data.transforms.RandomPatch(prob_happen=0.5, pool_capacity=50000, min_sample_size=100, patch_min_area=0.01, patch_max_area=0.5, patch_min_ratio=0.1, prob_rotate=0.5, prob_flip_leftright=0.5)[source]¶
Random patch data augmentation.
There is a patch pool that stores randomly extracted patches from person images.
For each input image, RandomPatch
1. extracts a random patch and stores the patch in the patch pool;
2. randomly selects a patch from the patch pool and pastes it on the input (at a random position) to simulate occlusion.
- Reference:
Zhou et al. Omni-Scale Feature Learning for Person Re-Identification. ICCV, 2019.
Zhou et al. Learning Generalisable Omni-Scale Representations for Person Re-Identification. TPAMI, 2021.
- torchreid.data.transforms.build_transforms(height, width, transforms='random_flip', norm_mean=[0.485, 0.456, 0.406], norm_std=[0.229, 0.224, 0.225], **kwargs)[source]¶
Builds train and test transform functions.
- Parameters
height (int) – target image height.
width (int) – target image width.
transforms (str or list of str, optional) – transformations applied to model training. Default is ‘random_flip’.
norm_mean (list or None, optional) – normalization mean values. Default is ImageNet means.
norm_std (list or None, optional) – normalization standard deviation values. Default is ImageNet standard deviation values.
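A usage sketch, assuming the function returns a train transform followed by a test transform:
from torchreid.data.transforms import build_transforms

# random flip + random crop for training; the test transform is resize + normalize
transform_tr, transform_te = build_transforms(
    height=256,
    width=128,
    transforms=['random_flip', 'random_crop']
)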
Dataset¶
- class torchreid.data.datasets.dataset.Dataset(train, query, gallery, transform=None, k_tfm=1, mode='train', combineall=False, verbose=True, **kwargs)[source]¶
An abstract class representing a Dataset.
This is the base class for ImageDataset and VideoDataset.
- Parameters
train (list) – contains tuples of (img_path(s), pid, camid).
query (list) – contains tuples of (img_path(s), pid, camid).
gallery (list) – contains tuples of (img_path(s), pid, camid).
transform – transform function.
k_tfm (int) – number of times to apply augmentation to an image independently. If k_tfm > 1, the transform function will be applied k_tfm times to an image. This variable will only be useful for training and is currently valid for image datasets only.
mode (str) – ‘train’, ‘query’ or ‘gallery’.
combineall (bool) – combines train, query and gallery in a dataset for training.
verbose (bool) – show information.
- check_before_run(required_files)[source]¶
Checks if required files exist before going deeper.
- Parameters
required_files (str or list) – string file name(s).
- download_dataset(dataset_dir, dataset_url)[source]¶
Downloads and extracts dataset.
- Parameters
dataset_dir (str) – dataset directory.
dataset_url (str) – url to download dataset.
- get_num_cams(data)[source]¶
Returns the number of training cameras.
Each tuple in data contains (img_path(s), pid, camid, dsetid).
- get_num_datasets(data)[source]¶
Returns the number of datasets included.
Each tuple in data contains (img_path(s), pid, camid, dsetid).
- class torchreid.data.datasets.dataset.ImageDataset(train, query, gallery, **kwargs)[source]¶
A base class representing ImageDataset.
All other image datasets should subclass it.
__getitem__ returns an image given an index. It returns img, pid, camid and img_path, where img has shape (channel, height, width). As a result, data in each batch has shape (batch_size, channel, height, width).
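To add a new dataset, subclass ImageDataset and pass the three lists to the parent constructor; a hedged sketch with a hypothetical file layout:
import os.path as osp
from torchreid.data.datasets import ImageDataset

class NewDataset(ImageDataset):
    dataset_dir = 'new_dataset'  # hypothetical folder under root

    def __init__(self, root='', **kwargs):
        self.root = osp.abspath(osp.expanduser(root))
        self.dataset_dir = osp.join(self.root, self.dataset_dir)
        # each list contains (img_path, pid, camid) tuples
        train = [(osp.join(self.dataset_dir, 'a.jpg'), 0, 0)]
        query = [(osp.join(self.dataset_dir, 'b.jpg'), 0, 0)]
        gallery = [(osp.join(self.dataset_dir, 'c.jpg'), 0, 1)]
        super(NewDataset, self).__init__(train, query, gallery, **kwargs)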
- class torchreid.data.datasets.dataset.VideoDataset(train, query, gallery, seq_len=15, sample_method='evenly', **kwargs)[source]¶
A base class representing VideoDataset.
All other video datasets should subclass it.
__getitem__ returns a tracklet of images given an index. It returns imgs, pid and camid, where imgs has shape (seq_len, channel, height, width). As a result, data in each batch has shape (batch_size, seq_len, channel, height, width).
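Because of the extra seq_len dimension, downstream code typically flattens tracklets before the CNN forward pass; a sketch in which imgs and model are illustrative names:
# imgs: (batch_size, seq_len, channel, height, width) from a video train loader
b, s, c, h, w = imgs.size()
features = model(imgs.view(b * s, c, h, w))      # image-like forward pass
features = features.view(b, s, -1).mean(dim=1)   # average over the tracklet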
- torchreid.data.datasets.__init__.init_image_dataset(name, **kwargs)[source]¶
Initializes an image dataset.
- torchreid.data.datasets.__init__.init_video_dataset(name, **kwargs)[source]¶
Initializes a video dataset.
- torchreid.data.datasets.__init__.register_image_dataset(name, dataset)[source]¶
Registers a new image dataset.
- Parameters
name (str) – key corresponding to the new dataset.
dataset (Dataset) – the new dataset class.
Examples:
import torchreid
import NewDataset
torchreid.data.register_image_dataset('new_dataset', NewDataset)

# single dataset case
datamanager = torchreid.data.ImageDataManager(
    root='reid-data',
    sources='new_dataset'
)

# multiple dataset case
datamanager = torchreid.data.ImageDataManager(
    root='reid-data',
    sources=['new_dataset', 'dukemtmcreid']
)
- torchreid.data.datasets.__init__.register_video_dataset(name, dataset)[source]¶
Registers a new video dataset.
- Parameters
name (str) – key corresponding to the new dataset.
dataset (Dataset) – the new dataset class.
Examples:
import torchreid
import NewDataset
torchreid.data.register_video_dataset('new_dataset', NewDataset)

# single dataset case
datamanager = torchreid.data.VideoDataManager(
    root='reid-data',
    sources='new_dataset'
)

# multiple dataset case
datamanager = torchreid.data.VideoDataManager(
    root='reid-data',
    sources=['new_dataset', 'ilidsvid']
)
Image Datasets¶
- class torchreid.data.datasets.image.market1501.Market1501(root='', market1501_500k=False, **kwargs)[source]¶
Market1501.
- Reference:
Zheng et al. Scalable Person Re-identification: A Benchmark. ICCV 2015.
URL: http://www.liangzheng.org/Project/project_reid.html
- Dataset statistics:
identities: 1501 (+1 for background).
images: 12936 (train) + 3368 (query) + 15913 (gallery).
- class torchreid.data.datasets.image.cuhk03.CUHK03(root='', split_id=0, cuhk03_labeled=False, cuhk03_classic_split=False, **kwargs)[source]¶
CUHK03.
- Reference:
Li et al. DeepReID: Deep Filter Pairing Neural Network for Person Re-identification. CVPR 2014.
URL: http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html#!
- Dataset statistics:
identities: 1360.
images: 13164.
cameras: 6.
splits: 20 (classic).
- class torchreid.data.datasets.image.dukemtmcreid.DukeMTMCreID(root='', **kwargs)[source]¶
DukeMTMC-reID.
- Reference:
Ristani et al. Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. ECCVW 2016.
Zheng et al. Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in vitro. ICCV 2017.
URL: https://github.com/layumi/DukeMTMC-reID_evaluation
- Dataset statistics:
identities: 1404 (train + query).
images: 16522 (train) + 2228 (query) + 17661 (gallery).
cameras: 8.
- class torchreid.data.datasets.image.msmt17.MSMT17(root='', **kwargs)[source]¶
MSMT17.
- Reference:
Wei et al. Person Transfer GAN to Bridge Domain Gap for Person Re-Identification. CVPR 2018.
URL: http://www.pkuvmc.com/publications/msmt17.html
- Dataset statistics:
identities: 4101.
images: 32621 (train) + 11659 (query) + 82161 (gallery).
cameras: 15.
- class torchreid.data.datasets.image.viper.VIPeR(root='', split_id=0, **kwargs)[source]¶
VIPeR.
- Reference:
Gray et al. Evaluating appearance models for recognition, reacquisition, and tracking. PETS 2007.
URL: https://vision.soe.ucsc.edu/node/178
- Dataset statistics:
identities: 632.
images: 632 x 2 = 1264.
cameras: 2.
- class torchreid.data.datasets.image.grid.GRID(root='', split_id=0, **kwargs)[source]¶
GRID.
- Reference:
Loy et al. Multi-camera activity correlation analysis. CVPR 2009.
URL: http://personal.ie.cuhk.edu.hk/~ccloy/downloads_qmul_underground_reid.html
- Dataset statistics:
identities: 250.
images: 1275.
cameras: 8.
- class torchreid.data.datasets.image.cuhk01.CUHK01(root='', split_id=0, **kwargs)[source]¶
CUHK01.
- Reference:
Li et al. Human Reidentification with Transferred Metric Learning. ACCV 2012.
URL: http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html
- Dataset statistics:
identities: 971.
images: 3884.
cameras: 4.
Note: CUHK01 and CUHK02 overlap.
- class torchreid.data.datasets.image.ilids.iLIDS(root='', split_id=0, **kwargs)[source]¶
QMUL-iLIDS.
- Reference:
Zheng et al. Associating Groups of People. BMVC 2009.
- Dataset statistics:
identities: 119.
images: 476.
cameras: 8 (not explicitly provided).
- class torchreid.data.datasets.image.sensereid.SenseReID(root='', **kwargs)[source]¶
SenseReID.
This dataset is used for testing purposes only.
- Reference:
Zhao et al. Spindle Net: Person Re-identification with Human Body Region Guided Feature Decomposition and Fusion. CVPR 2017.
URL: https://drive.google.com/file/d/0B56OfSrVI8hubVJLTzkwV2VaOWM/view
- Dataset statistics:
query: 522 ids, 1040 images.
gallery: 1717 ids, 3388 images.
- class torchreid.data.datasets.image.prid.PRID(root='', split_id=0, **kwargs)[source]¶
PRID (single-shot version of prid-2011).
- Reference:
Hirzer et al. Person Re-Identification by Descriptive and Discriminative Classification. SCIA 2011.
URL: https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/
- Dataset statistics:
Two views.
View A captures 385 identities.
View B captures 749 identities.
200 identities appear in both views (indices range from 1 to 200).
Video Datasets¶
- class torchreid.data.datasets.video.mars.Mars(root='', **kwargs)[source]¶
MARS.
- Reference:
Zheng et al. MARS: A Video Benchmark for Large-Scale Person Re-identification. ECCV 2016.
URL: http://www.liangzheng.com.cn/Project/project_mars.html
- Dataset statistics:
identities: 1261.
tracklets: 8298 (train) + 1980 (query) + 9330 (gallery).
cameras: 6.
- class torchreid.data.datasets.video.ilidsvid.iLIDSVID(root='', split_id=0, **kwargs)[source]¶
iLIDS-VID.
- Reference:
Wang et al. Person Re-Identification by Video Ranking. ECCV 2014.
URL: http://www.eecs.qmul.ac.uk/~xiatian/downloads_qmul_iLIDS-VID_ReID_dataset.html
- Dataset statistics:
identities: 300.
tracklets: 600.
cameras: 2.
- class torchreid.data.datasets.video.prid2011.PRID2011(root='', split_id=0, **kwargs)[source]¶
PRID2011.
- Reference:
Hirzer et al. Person Re-Identification by Descriptive and Discriminative Classification. SCIA 2011.
URL: https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/
- Dataset statistics:
identities: 200.
tracklets: 400.
cameras: 2.
- class torchreid.data.datasets.video.dukemtmcvidreid.DukeMTMCVidReID(root='', min_seq_len=0, **kwargs)[source]¶
DukeMTMCVidReID.
- Reference:
Ristani et al. Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. ECCVW 2016.
Wu et al. Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning. CVPR 2018.
URL: https://github.com/Yu-Wu/DukeMTMC-VideoReID
- Dataset statistics:
identities: 702 (train) + 702 (test).
tracklets: 2196 (train) + 2636 (test).