Vehicle Recognition with Transfer Learning

Abstract: Use transfer learning to train a model that recognizes a car's make and model.

@[toc]

Overview

Dataset: Stanford Cars Dataset


Characteristics of the dataset:

  1. There is a pronounced class imbalance;
  2. Each class contains too few images to train an accurate classifier from scratch;

Given these characteristics, fine-tuning a model pretrained on ImageNet is a good approach. Below we work through the training task step by step, starting with data preparation.

Data Preparation

In this part we load the raw data, apply basic processing, and store everything in a unified format. For large datasets, HDF5 or MXNet's .rec format (with .lst list files) are common choices.

Storing the data this way removes the IO latency of opening every image individually, and the storage system's sequential reads make it possible to slice large datasets directly.
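
As a minimal sketch (assuming the HDF5 datasets are named "images" and "labels", as they will be when we build them below), slicing an HDF5 file with h5py turns 64 separate image reads into one sequential read:

import h5py

# read a contiguous batch of 64 images with a single sequential read
db = h5py.File("hdf5/train.hdf5", "r")
images = db["images"][0:64]   # shape: (64, 256, 256, 3)
labels = db["labels"][0:64]
db.close()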

Configuration

To keep the later processing steps tidy, create a configuration file named car.config to hold the relevant settings:

from os import path

# define the base path to the cars dataset
BASE_PATH = "Path-to-car-dataset"

# based on the base path, derive the images path and meta file path
IMAGES_PATH = path.sep.join([BASE_PATH, "car_ims"])
LABELS_PATH = path.sep.join([BASE_PATH, "complete_dataset.csv"])

# base output directory for derived files (assumed here to be the dataset directory)
MX_OUTPUT = BASE_PATH

# define the paths to the HDF5 files
TRAIN_HDF5 = path.sep.join([MX_OUTPUT, "hdf5/train.hdf5"])
VAL_HDF5 = path.sep.join([MX_OUTPUT, "hdf5/val.hdf5"])
TEST_HDF5 = path.sep.join([MX_OUTPUT, "hdf5/test.hdf5"])

# define the paths to the MXNet list files (used by the im2rec workflow below)
TRAIN_MX_LIST = path.sep.join([MX_OUTPUT, "lists/train.lst"])
VAL_MX_LIST = path.sep.join([MX_OUTPUT, "lists/val.lst"])
TEST_MX_LIST = path.sep.join([MX_OUTPUT, "lists/test.lst"])

# define the path for storing the mean R, G, B values
DATASET_MEAN = path.sep.join([BASE_PATH, "output/car_mean.json"])

# define the path to the output directory used for storing plots,
# classification reports, etc.
OUTPUT_PATH = "output"
MODEL_PATH = path.sep.join([OUTPUT_PATH,"inceptionv3_stanfordcar.hdf5"])
FIG_PATH = path.sep.join([OUTPUT_PATH,"inceptionv3_stanfordcar.png"])
JSON_PATH = path.sep.join([OUTPUT_PATH,"inceptionv3_stanfordcar.json"])

# define the path to the label encoder
LABEL_ENCODER_PATH = path.sep.join([BASE_PATH, "output/le.cpickle"])

# define the total number of make:model classes
NUM_CLASSES = 164

# define the percentage of validation and testing images relative
# to the number of training images
NUM_VAL_IMAGES = 0.15
NUM_TEST_IMAGES = 0.15

# define the batch size
BATCH_SIZE = 64

The configuration file records where the HDF5 files live, the paths to the raw data and its metadata file, where to store the RGB means, and where to write the plots and logs produced during training.
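
The scripts below access these settings through a config object. A minimal sketch, assuming car.config is saved as a module inside a config package (the import path is illustrative, not prescribed):

# hypothetical import path; adjust to wherever car.config lives in your project
from config import car_config as config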

Data Overview

  1. First, load the data description file complete_dataset.csv to inspect the data format:
import pandas as pd
# loading image paths and labels
df = pd.read_csv(config.LABELS_PATH)
df.head()
   Image Filename      Make   Model  Vehicle Type  Year
0  car_ims/000090.jpg  Acura  RL     Sedan         2012
1  car_ims/000091.jpg  Acura  RL     Sedan         2012
2  car_ims/000092.jpg  Acura  RL     Sedan         2012
3  car_ims/000093.jpg  Acura  RL     Sedan         2012
  2. Iterate over the file list, storing the image paths and sample labels separately. This experiment only uses the make and model attributes, which yields 164 classes in total:

    import os
    from sklearn.preprocessing import LabelEncoder

    trainPaths = []
    trainLabels = []

    # build the full image path and a "Make:Model" label for each row
    for (i, name) in enumerate(df["Image Filename"]):
        trainPaths.append(os.sep.join([config.IMAGES_PATH, name]))
        trainLabels.append("{}:{}".format(df.iloc[i]["Make"], df.iloc[i]["Model"]))

    # encode the string labels as integers
    le = LabelEncoder()
    trainLabels = le.fit_transform(trainLabels)
  3. Split the data into training, validation, and test sets at 70%/15%/15%:

    from sklearn.model_selection import train_test_split

    numVal = int(len(trainPaths) * 0.15)
    numTest = int(len(trainPaths) * 0.15)

    # perform stratified sampling from the training set to construct a validation set
    split = train_test_split(trainPaths, trainLabels, test_size=numVal,
        stratify=trainLabels)
    (trainPaths, valPaths, trainLabels, valLabels) = split

    # perform stratified sampling from the training set to construct a testing set
    split = train_test_split(trainPaths, trainLabels, test_size=numTest,
        stratify=trainLabels)
    (trainPaths, testPaths, trainLabels, testLabels) = split
  4. Initialize the bookkeeping for the three splits:

    # initialize the lists of RGB channel averages
    (R, G, B) = ([], [], [])

    # construct a list pairing the training, validation, and testing
    # image paths along with their corresponding labels and output HDF5
    # files
    datasets = [
        ("train", trainPaths, trainLabels, config.TRAIN_HDF5),
        ("val", valPaths, valLabels, config.VAL_HDF5),
        ("test", testPaths, testLabels, config.TEST_HDF5)]
  5. Loop over the datasets and write the images into the HDF5 files (a minimal sketch of the writer class appears after this list):

    # HDF5DatasetWriter and AspectAwarePreprocessor are custom helper
    # classes; adjust the imports to wherever they live in your project
    from pyimagesearch.io import HDF5DatasetWriter
    from pyimagesearch.preprocessing import AspectAwarePreprocessor
    import progressbar
    import cv2

    # resize images to (256, 256, 3) while respecting the aspect ratio
    aap = AspectAwarePreprocessor(256, 256)

    # loop over the dataset tuples
    for (dType, paths, labels, outputPath) in datasets:
        # create the HDF5 writer
        print("[INFO] building {}...".format(outputPath))
        writer = HDF5DatasetWriter((len(paths), 256, 256, 3), outputPath)

        # initialize the progress bar
        widgets = ["Building Dataset: ", progressbar.Percentage(), " ",
            progressbar.Bar(), " ", progressbar.ETA()]
        pbar = progressbar.ProgressBar(maxval=len(paths),
            widgets=widgets).start()

        # loop over the image paths
        for (i, (path, label)) in enumerate(zip(paths, labels)):
            try:
                # load the image from disk and resize it
                image = cv2.imread(path)
                image = aap.preprocess(image)

                # if we are building the training dataset, compute the
                # mean of each channel and update the respective lists
                if dType == "train":
                    (b, g, r) = cv2.mean(image)[:3]
                    R.append(r)
                    G.append(g)
                    B.append(b)

                # add the image and label to the HDF5 dataset
                writer.add([image], [label])
                pbar.update(i)
            except Exception:
                # report the offending sample and stop this split
                print(path)
                print(label)
                break

        # close the HDF5 writer
        pbar.finish()
        writer.close()

    We first resize every image to (256, 256, 3) before storing it, which is why a simple preprocessing step follows each file read.

  6. Serialize the RGB means to a separate file:

    import json
    import numpy as np

    # construct a dictionary of averages, then serialize the means to a
    # JSON file
    print("[INFO] serializing means...")
    D = {"R": np.mean(R), "G": np.mean(G), "B": np.mean(B)}
    f = open(config.DATASET_MEAN, "w")
    f.write(json.dumps(D))
    f.close()

At this point we have split the images into three sets and stored each in its own HDF5 file.

The details of HDF5 storage will be covered in an upcoming preprocessing post.
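
Until then, here is a minimal, hypothetical sketch of what the HDF5DatasetWriter used in step 5 has to do, inferred from how it is called above (the real class also buffers writes for speed):

import h5py

class HDF5DatasetWriter:
    def __init__(self, dims, outputPath):
        # pre-allocate fixed-size datasets for the images and labels
        self.db = h5py.File(outputPath, "w")
        self.images = self.db.create_dataset("images", dims, dtype="uint8")
        self.labels = self.db.create_dataset("labels", (dims[0],), dtype="int")
        self.idx = 0

    def add(self, rows, labels):
        # write the batch at the current index, then advance it
        n = len(rows)
        self.images[self.idx:self.idx + n] = rows
        self.labels[self.idx:self.idx + n] = labels
        self.idx += n

    def close(self):
        self.db.close()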

If you instead use MXNet's .lst and .rec files to build the dataset, continue from the train/val/test split in step 3 with the following steps:

  1. Build the dataset list (.lst) files:

    # construct a list pairing the training, validation, and testing
    # image paths along with their corresponding labels and output list
    # files
    datasets = [
        ("train", trainPaths, trainLabels, config.TRAIN_MX_LIST),
        ("val", valPaths, valLabels, config.VAL_MX_LIST),
        ("test", testPaths, testLabels, config.TEST_MX_LIST)]

    # loop over the dataset tuples
    for (dType, paths, labels, outputPath) in datasets:
        # open the output file for writing
        print("[INFO] building {}...".format(outputPath))
        f = open(outputPath, "w")

        # loop over each of the individual images + labels
        for (i, (path, label)) in enumerate(zip(paths, labels)):
            # write the image index, label, and image path to the file
            row = "\t".join([str(i), str(label), path])
            f.write("{}\n".format(row))

        # close the output file
        f.close()
  2. Serialize the label encoder for later use:

    import pickle

    f = open(config.LABEL_ENCODER_PATH, "wb")
    f.write(pickle.dumps(le))
    f.close()

  3. Use MXNet's im2rec tool to create the record (.rec) files:

    $ /dsvm/tools/mxnet/bin/im2rec ./raid/datasets/cars/lists/train.lst "" ./raid/datasets/cars/rec/train.rec resize=256 encoding='.jpg' quality=100

    $ /dsvm/tools/mxnet/bin/im2rec ./raid/datasets/cars/lists/test.lst "" ./raid/datasets/cars/rec/test.rec resize=256 encoding='.jpg' quality=100

    $ /dsvm/tools/mxnet/bin/im2rec ./raid/datasets/cars/lists/val.lst "" ./raid/datasets/cars/rec/val.rec resize=256 encoding='.jpg' quality=100
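
As a quick sanity check that the records are usable, a minimal sketch of reading a .rec file back with MXNet's ImageRecordIter (the crop size matches the 224x224 training input used later in this post):

import mxnet as mx

# iterate over the packed record file in batches of 64
trainIter = mx.io.ImageRecordIter(
    path_imgrec="./raid/datasets/cars/rec/train.rec",
    data_shape=(3, 224, 224),   # im2rec stored 256x256 images; crop to 224x224
    batch_size=64,
    rand_crop=True)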

Training the Transfer Learning Network

For the transfer learning model we use the InceptionV3 implementation that ships with Keras; see the Keras applications documentation for usage details. The table below compares the pretrained models available in Keras:

| Model | Size | Top-1 Accuracy | Top-5 Accuracy | Parameters | Depth |
| --- | --- | --- | --- | --- | --- |
| Xception | 88 MB | 0.790 | 0.945 | 22,910,480 | 126 |
| VGG16 | 528 MB | 0.715 | 0.901 | 138,357,544 | 23 |
| VGG19 | 549 MB | 0.727 | 0.910 | 143,667,240 | 26 |
| ResNet50 | 99 MB | 0.759 | 0.929 | 25,636,712 | 168 |
| InceptionV3 | 92 MB | 0.788 | 0.944 | 23,851,784 | 159 |
| InceptionResNetV2 | 215 MB | 0.804 | 0.953 | 55,873,736 | 572 |
| MobileNet | 17 MB | 0.665 | 0.871 | 4,253,864 | 88 |

Data Loading

Data loading involves a few key tasks: reading the RGB means file, then preprocessing the raw data through patch extraction, data augmentation, channel-mean subtraction, and conversion to arrays.

The preprocessors themselves are not the focus of this post and will be covered in a later update.
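
To make the pipeline concrete in the meantime, here is a minimal, hypothetical sketch of the MeanPreprocessor used below (the Simple/Patch/ImageToArray preprocessors follow the same single-method pattern):

import cv2

class MeanPreprocessor:
    def __init__(self, rMean, gMean, bMean):
        # per-channel means computed over the training set
        self.rMean = rMean
        self.gMean = gMean
        self.bMean = bMean

    def preprocess(self, image):
        # split into channels (OpenCV loads images in BGR order)
        (B, G, R) = cv2.split(image.astype("float32"))

        # subtract the training-set means from each channel
        R -= self.rMean
        G -= self.gMean
        B -= self.bMean

        # merge the channels back together
        return cv2.merge([B, G, R])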

import json
from keras.preprocessing.image import ImageDataGenerator
# the preprocessors and generator are custom classes; adjust the imports
# to wherever they live in your project
from pyimagesearch.preprocessing import SimplePreprocessor, PatchPreprocessor, MeanPreprocessor, ImageToArrayPreprocessor
from pyimagesearch.io import HDF5DatasetGenerator

# construct the training image generator for data augmentation
aug = ImageDataGenerator(rotation_range=20, zoom_range=0.15,
    width_shift_range=0.2, height_shift_range=0.2, shear_range=0.15,
    horizontal_flip=True, fill_mode="nearest")

# load the RGB means for the training set
means = json.loads(open(config.DATASET_MEAN).read())

# initialize the image preprocessors
sp = SimplePreprocessor(224, 224)
pp = PatchPreprocessor(224, 224)
mp = MeanPreprocessor(means["R"], means["G"], means["B"])
iap = ImageToArrayPreprocessor()

# initialize the training and validation dataset generators
trainGen = HDF5DatasetGenerator(config.TRAIN_HDF5, 64, aug=aug,
    preprocessors=[pp, mp, iap], classes=config.NUM_CLASSES)
valGen = HDF5DatasetGenerator(config.VAL_HDF5, 64,
    preprocessors=[sp, mp, iap], classes=config.NUM_CLASSES)

Model Design

The model design follows the example in the Keras documentation: load the pretrained InceptionV3 model without its top layers, then attach a new classification head:

from keras.applications.inception_v3 import InceptionV3
from keras.models import Model
from keras.layers import Input, Dense, GlobalAveragePooling2D
from keras.optimizers import SGD
from keras import backend as K

# load the Inception network, ensuring the head FC layer sets are left off
baseModel = InceptionV3(weights="imagenet", include_top=False,
    input_tensor=Input(shape=(224, 224, 3)))

# initialize the new head of the network, a set of FC layers
# followed by a softmax classifier
x = baseModel.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation="relu")(x)
headModel = Dense(config.NUM_CLASSES, activation="softmax")(x)

model = Model(inputs=baseModel.input, outputs=headModel)

# loop over all layers in the base model and freeze them so they
# will *not* be updated during the training process
for layer in baseModel.layers:
    layer.trainable = False

# compile our model (this needs to be done after setting our
# layers to be non-trainable)
print("[INFO] compiling model...")
opt = SGD(lr=0.005, momentum=0.9)
model.compile(loss="categorical_crossentropy", optimizer=opt,
    metrics=["accuracy"])

Note that in transfer learning the newly added head starts from randomly initialized weights, while the frozen base layers no longer change during training. The head therefore needs a warm-up phase to bring its parameters up to a reasonable level, and the learning rate must be kept fairly small during that phase.

Getting this right may take some trial and error.
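
Once the new head has converged, the Keras documentation example goes one step further and fine-tunes the top two inception blocks as well; a sketch following that example:

from keras.optimizers import SGD

# freeze the first 249 layers and unfreeze the rest; in Keras's
# InceptionV3 example, layer 249 marks the boundary of the top two
# inception blocks
for layer in model.layers[:249]:
    layer.trainable = False
for layer in model.layers[249:]:
    layer.trainable = True

# recompile with a low learning rate so the unfrozen layers change slowly
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9),
    loss="categorical_crossentropy", metrics=["accuracy"])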

Optimizing the Training Process

The training loop uses the "ctrl + c" method described by the author of Deep Learning for Computer Vision with Python: the training state can be checkpointed at any moment, and training resumed later with an adjusted learning rate.

import argparse
import logging

from keras.models import load_model
from keras import backend as K
# EpochCheckpoint and TrainingMonitor are custom callbacks; adjust the
# import to wherever they live in your project
from pyimagesearch.callbacks import EpochCheckpoint, TrainingMonitor

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-c", "--checkpoints", required=True,
    help="path to output checkpoint directory")
ap.add_argument("-m", "--model", type=str,
    help="path to *specific* model checkpoint to load")
ap.add_argument("-s", "--start-epoch", type=int, default=0,
    help="epoch to restart training at")
args = vars(ap.parse_args())

# set the logging level and output file
logging.basicConfig(level=logging.DEBUG,
    filename="training_{}.log".format(args["start_epoch"]), filemode="w")

if args["model"] is None:
    # no checkpoint supplied: define the pretrained InceptionV3 model and
    # warm up the new head exactly as in the previous step
    ...
else:
    # otherwise, load the checkpoint and update the learning rate
    print("[INFO] loading {}...".format(args["model"]))
    model = load_model(args["model"])
    print("[INFO] old learning rate: {}".format(K.get_value(model.optimizer.lr)))
    K.set_value(model.optimizer.lr, 1e-3)
    print("[INFO] new learning rate: {}".format(K.get_value(model.optimizer.lr)))

# construct the set of callbacks
callbacks = [
    EpochCheckpoint(args["checkpoints"], every=5,
        startAt=args["start_epoch"]),
    TrainingMonitor(config.FIG_PATH, jsonPath=config.JSON_PATH,
        startAt=args["start_epoch"])]

# train the network
print("[INFO] training network...")
model.fit_generator(
    trainGen.generator(),
    steps_per_epoch=trainGen.numImages // config.BATCH_SIZE,
    validation_data=valGen.generator(),
    validation_steps=valGen.numImages // config.BATCH_SIZE,
    epochs=100,
    max_queue_size=config.BATCH_SIZE * 2,
    callbacks=callbacks, verbose=1)
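
Assuming the script above is saved as train.py (the script name and checkpoint paths are illustrative), a typical ctrl + c session looks like:

# start training from scratch, checkpointing every 5 epochs
$ python train.py --checkpoints output/checkpoints

# stop with ctrl + c, then resume from a saved checkpoint with the new learning rate
$ python train.py --checkpoints output/checkpoints \
    --model output/checkpoints/epoch_25.hdf5 --start-epoch 25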

References

  1. Adrian Rosebrock, Deep Learning for Computer Vision with Python.