Vehicle Recognition with Transfer Learning

Abstract: Use transfer learning to train a model that recognizes car makes and models.

@[toc]

Overview

Dataset: Stanford Cars Dataset

[Figure: car]

Dataset characteristics:

  1. There is a pronounced class imbalance;
  2. Each class contains too few images to reach the baseline needed for accurate classification.

Given these characteristics, fine-tuning a model pre-trained on ImageNet via transfer learning is a good approach. Below we walk through the training task step by step, starting with data preparation.

Data Preparation

In this part we read in the raw data, perform basic processing, and store everything in a unified format. For large datasets, HDF5 or MXNet's .LST format is a common choice.

Storing the data this way avoids the I/O latency of opening each image individually, and lets us exploit sequential reads from the storage system to slice large datasets directly.
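For instance, once the images are packed into one HDF5 file, an entire batch can be sliced out in a single sequential read. A minimal sketch, assuming the h5py package and that the writer stores its arrays under "images" and "labels" keys:

import h5py

# open the packed dataset once, instead of opening thousands of image files
db = h5py.File("hdf5/train.hdf5", "r")

# slice a whole training batch directly out of storage in one read
batchImages = db["images"][0:64]
batchLabels = db["labels"][0:64]
print(batchImages.shape)  # (64, 256, 256, 3) with the layout used below

db.close()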

Configuration

To keep the later processing steps convenient, create a configuration file named car.config to hold the relevant settings:

from os import path

# define the base path to the cars dataset
BASE_PATH = "Path-to-car-dataset"

# based on the base path, derive the images path and meta file path
IMAGES_PATH = path.sep.join([BASE_PATH, "car_ims"])
LABELS_PATH = path.sep.join([BASE_PATH, "complete_dataset.csv"])

# base path for the derived dataset files
MX_OUTPUT = BASE_PATH

# define the paths to the HDF5 files
TRAIN_HDF5 = path.sep.join([MX_OUTPUT, "hdf5/train.hdf5"])
VAL_HDF5 = path.sep.join([MX_OUTPUT, "hdf5/val.hdf5"])
TEST_HDF5 = path.sep.join([MX_OUTPUT, "hdf5/test.hdf5"])

# define the paths to the MXNet list files (used by the optional
# .lst/.rec flow described below)
TRAIN_MX_LIST = path.sep.join([MX_OUTPUT, "lists/train.lst"])
VAL_MX_LIST = path.sep.join([MX_OUTPUT, "lists/val.lst"])
TEST_MX_LIST = path.sep.join([MX_OUTPUT, "lists/test.lst"])

# define the path for storing the mean R, G, B values
DATASET_MEAN = path.sep.join([BASE_PATH, "output/car_mean.json"])

# define the path to the output directory used for storing plots,
# classification reports, etc.
OUTPUT_PATH = "output"
MODEL_PATH = path.sep.join([OUTPUT_PATH, "inceptionv3_stanfordcar.hdf5"])
FIG_PATH = path.sep.join([OUTPUT_PATH, "inceptionv3_stanfordcar.png"])
JSON_PATH = path.sep.join([OUTPUT_PATH, "inceptionv3_stanfordcar.json"])

# define the path to the label encoder
LABEL_ENCODER_PATH = path.sep.join([BASE_PATH, "output/le.cpickle"])

# define the number of classes, plus the fractions of validation and
# testing images relative to the number of training images
NUM_CLASSES = 164
NUM_VAL_IMAGES = 0.15
NUM_TEST_IMAGES = 0.15

# define the batch size
BATCH_SIZE = 64

The configuration file covers the HDF5 file locations, the paths to the raw data and its annotation file, where the RGB means are stored, and where the plots and logs produced during training are written.

Data Overview

  1. First, load the annotation file complete_dataset.csv to get a sense of the data format:
import pandas as pd

# "config" refers to the car.config settings module defined above
# loading image paths and labels
df = pd.read_csv(config.LABELS_PATH)
df.head()
   Image Filename       Make   Model  Vehicle Type  Year
0  car_ims/000090.jpg   Acura  RL     Sedan         2012
1  car_ims/000091.jpg   Acura  RL     Sedan         2012
2  car_ims/000092.jpg   Acura  RL     Sedan         2012
3  car_ims/000093.jpg   Acura  RL     Sedan         2012
  2. Iterate over the file list, storing the file paths and sample labels separately. This experiment uses only the make and model as features, giving 164 classes in total:

    import os
    from sklearn.preprocessing import LabelEncoder

    trainPaths = []
    trainLabels = []

    # collect the full image path and a "Make:Model" label for each row
    for id, name in enumerate(df["Image Filename"]):
        trainPaths.append(os.sep.join([config.IMAGES_PATH, name]))
        trainLabels.append("{}:{}".format(df.iloc[id]["Make"],
            df.iloc[id]["Model"]))

    # encode the string labels as integers
    le = LabelEncoder()
    trainLabels = le.fit_transform(trainLabels)
  3. Split into training, validation, and test sets at 70%/15%/15%:

    from sklearn.model_selection import train_test_split

    numVal = int(len(trainPaths) * 0.15)
    numTest = int(len(trainPaths) * 0.15)

    # perform stratified sampling from the training set to construct
    # a validation set
    split = train_test_split(trainPaths, trainLabels, test_size=numVal,
        stratify=trainLabels)
    (trainPaths, valPaths, trainLabels, valLabels) = split

    # perform stratified sampling from the training set to construct
    # a testing set
    split = train_test_split(trainPaths, trainLabels, test_size=numTest,
        stratify=trainLabels)
    (trainPaths, testPaths, trainLabels, testLabels) = split
  4. Initialize the bookkeeping for the three splits:

    # initialize the lists of RGB channel averages
    (R, G, B) = ([], [], [])

    # construct a list pairing the training, validation, and testing
    # image paths along with their corresponding labels and output
    # HDF5 files
    datasets = [
        ("train", trainPaths, trainLabels, config.TRAIN_HDF5),
        ("val", valPaths, valLabels, config.VAL_HDF5),
        ("test", testPaths, testLabels, config.TEST_HDF5)]
  5. Loop over the splits and write each one to an HDF5 file:

    import cv2
    import progressbar
    # custom helper classes: an HDF5 writer and an aspect-aware resizer
    from HDF5DatasetWriter import HDF5DatasetWriter
    from AspectAwarePreprocessor import AspectAwarePreprocessor

    # resize images to (256, 256, 3), preserving the aspect ratio
    aap = AspectAwarePreprocessor(256, 256)

    # loop over the dataset tuples
    for (dType, paths, labels, outputPath) in datasets:
        # create the HDF5 writer
        print("[INFO] building {}...".format(outputPath))
        writer = HDF5DatasetWriter((len(paths), 256, 256, 3), outputPath)

        # initialize the progress bar
        widgets = ["Building Dataset: ", progressbar.Percentage(), " ",
            progressbar.Bar(), " ", progressbar.ETA()]
        pbar = progressbar.ProgressBar(maxval=len(paths),
            widgets=widgets).start()

        # loop over the image paths
        for (i, (path, label)) in enumerate(zip(paths, labels)):
            try:
                # load the image from disk and resize it
                image = cv2.imread(path)
                image = aap.preprocess(image)

                # if we are building the training dataset, compute the
                # mean of each channel in the image, then update the
                # respective lists
                if dType == "train":
                    (b, g, r) = cv2.mean(image)[:3]
                    R.append(r)
                    G.append(g)
                    B.append(b)

                # add the image and label to the HDF5 dataset
                writer.add([image], [label])
                pbar.update(i)
            except Exception:
                # report the offending sample and stop
                print(path)
                print(label)
                break

        # close the HDF5 writer
        pbar.finish()
        writer.close()

    Each image is first resized to (256, 256, 3) before storage, hence the simple preprocessing step right after the read (a sketch of this resize helper appears after this list).

  6. Store the RGB means in a separate file:

    import json
    import numpy as np

    # construct a dictionary of averages, then serialize the means to a
    # JSON file
    print("[INFO] serializing means...")
    D = {"R": np.mean(R), "G": np.mean(G), "B": np.mean(B)}
    f = open(config.DATASET_MEAN, "w")
    f.write(json.dumps(D))
    f.close()

At this point the images have been partitioned into three splits and written to three HDF5 files.
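The AspectAwarePreprocessor used in step 5 is a custom helper not shown in this post. A minimal sketch of an aspect-aware resize, assuming OpenCV and imutils: resize along the shorter side, then center-crop the other dimension.

import cv2
import imutils

class AspectAwarePreprocessor:
    def __init__(self, width, height, inter=cv2.INTER_AREA):
        self.width = width
        self.height = height
        self.inter = inter

    def preprocess(self, image):
        # resize along the shorter dimension, then center-crop the other,
        # so the target size is reached without distorting the aspect ratio
        (h, w) = image.shape[:2]
        if w < h:
            image = imutils.resize(image, width=self.width, inter=self.inter)
            dH = int((image.shape[0] - self.height) / 2.0)
            image = image[dH:image.shape[0] - dH, :]
        else:
            image = imutils.resize(image, height=self.height, inter=self.inter)
            dW = int((image.shape[1] - self.width) / 2.0)
            image = image[:, dW:image.shape[1] - dW]

        # crop rounding can leave us a pixel off, so force the exact size
        return cv2.resize(image, (self.width, self.height),
            interpolation=self.inter)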

Details of the HDF5 storage format will be covered in an upcoming preprocessing post.

To build the dataset with MXNet's .lst and .rec files instead, continue from step 3 above as follows:

  1. Build the dataset list (.lst) files:

    # construct a list pairing the training, validation, and testing
    # image paths along with their corresponding labels and output list
    # files
    datasets = [
        ("train", trainPaths, trainLabels, config.TRAIN_MX_LIST),
        ("val", valPaths, valLabels, config.VAL_MX_LIST),
        ("test", testPaths, testLabels, config.TEST_MX_LIST)]

    # loop over the dataset tuples
    for (dType, paths, labels, outputPath) in datasets:
        # open the output file for writing
        print("[INFO] building {}...".format(outputPath))
        f = open(outputPath, "w")

        # loop over each of the individual images + labels
        for (i, (path, label)) in enumerate(zip(paths, labels)):
            # write the image index, label, and output path to file
            row = "\t".join([str(i), str(label), path])
            f.write("{}\n".format(row))

        # close the output file
        f.close()
  2. Serialize the label encoder for later use:

    import pickle

    f = open(config.LABEL_ENCODER_PATH, "wb")
    f.write(pickle.dumps(le))
    f.close()

  3. Create the record (.rec) files with MXNet's im2rec tool:

    $ /dsvm/tools/mxnet/bin/im2rec ./raid/datasets/cars/lists/train.lst "" ./raid/datasets/cars/rec/train.rec resize=256 encoding='.jpg' quality=100

    $ /dsvm/tools/mxnet/bin/im2rec ./raid/datasets/cars/lists/test.lst "" ./raid/datasets/cars/rec/test.rec resize=256 encoding='.jpg' quality=100

    $ /dsvm/tools/mxnet/bin/im2rec ./raid/datasets/cars/lists/val.lst "" ./raid/datasets/cars/rec/val.rec resize=256 encoding='.jpg' quality=100
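The resulting .rec files can then be fed to MXNet directly. A minimal read-back sketch, assuming the mxnet Python package (the .rec paths match the im2rec commands above):

import mxnet as mx

# iterate over the packed record file in (3, 224, 224) batches
trainIter = mx.io.ImageRecordIter(
    path_imgrec="./raid/datasets/cars/rec/train.rec",
    data_shape=(3, 224, 224),
    batch_size=64,
    rand_crop=True,
    rand_mirror=True)

# each batch is a DataBatch with .data and .label lists
for batch in trainIter:
    print(batch.data[0].shape)  # (64, 3, 224, 224)
    break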

Training the Transfer Learning Network

For the transfer learning base we chose InceptionV3 as provided by Keras; see the Keras documentation for more usage details. The table below lists how the models shipped with Keras perform:

| Model | Size | Top-1 Accuracy | Top-5 Accuracy | Parameters | Depth |
| --- | --- | --- | --- | --- | --- |
| Xception | 88MB | 0.790 | 0.945 | 22,910,480 | 126 |
| VGG16 | 528MB | 0.715 | 0.901 | 138,357,544 | 23 |
| VGG19 | 549MB | 0.727 | 0.910 | 143,667,240 | 26 |
| ResNet50 | 99MB | 0.759 | 0.929 | 25,636,712 | 168 |
| InceptionV3 | 92MB | 0.788 | 0.944 | 23,851,784 | 159 |
| InceptionResNetV2 | 215MB | 0.804 | 0.953 | 55,873,736 | 572 |
| MobileNet | 17MB | 0.665 | 0.871 | 4,253,864 | 88 |

Data Loading

Data loading involves a few key tasks: reading the RGB mean file, then preprocessing the raw data with patch extraction, data augmentation, mean subtraction, and array conversion.

These steps are not the focus of this post and will be covered in later updates.

import json
from keras.preprocessing.image import ImageDataGenerator
# custom preprocessor and generator helper classes
from SimplePreprocessor import SimplePreprocessor
from PatchPreprocessor import PatchPreprocessor
from MeanPreprocessor import MeanPreprocessor
from ImageToArrayPreprocessor import ImageToArrayPreprocessor
from HDF5DatasetGenerator import HDF5DatasetGenerator

# construct the training image generator for data augmentation
aug = ImageDataGenerator(rotation_range=20, zoom_range=0.15,
    width_shift_range=0.2, height_shift_range=0.2, shear_range=0.15,
    horizontal_flip=True, fill_mode="nearest")

# load the RGB means for the training set
means = json.loads(open(config.DATASET_MEAN).read())

# initialize the image preprocessors
sp = SimplePreprocessor(224, 224)
pp = PatchPreprocessor(224, 224)
mp = MeanPreprocessor(means["R"], means["G"], means["B"])
iap = ImageToArrayPreprocessor()

# initialize the training and validation dataset generators
trainGen = HDF5DatasetGenerator(config.TRAIN_HDF5, 64, aug=aug,
    preprocessors=[pp, mp, iap], classes=config.NUM_CLASSES)
valGen = HDF5DatasetGenerator(config.VAL_HDF5, 64,
    preprocessors=[sp, mp, iap], classes=config.NUM_CLASSES)
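The preprocessors above are small custom helper classes that the upcoming preprocessing post will cover. As one example, a minimal sketch of what the mean-subtraction step could look like, assuming BGR images as returned by cv2:

import cv2

class MeanPreprocessor:
    def __init__(self, rMean, gMean, bMean):
        # store the per-channel means computed over the training set
        self.rMean = rMean
        self.gMean = gMean
        self.bMean = bMean

    def preprocess(self, image):
        # split into channels, subtract the dataset means, and merge back
        (B, G, R) = cv2.split(image.astype("float32"))
        R -= self.rMean
        G -= self.gMean
        B -= self.bMean
        return cv2.merge([B, G, R])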

Model Design

The model design follows the example in the Keras documentation: import the pre-trained InceptionV3 model without its top layers:

from keras.applications.inception_v3 import InceptionV3
from keras.models import Model
from keras.layers import Input, Dense, GlobalAveragePooling2D
from keras.optimizers import SGD
from keras import backend as K

# load the Inception network, ensuring the head FC layer sets are left off
baseModel = InceptionV3(weights="imagenet", include_top=False,
    input_tensor=Input(shape=(224, 224, 3)))

# initialize the new head of the network, a set of FC layers
# followed by a softmax classifier
x = baseModel.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation="relu")(x)
headModel = Dense(config.NUM_CLASSES, activation="softmax")(x)

model = Model(inputs=baseModel.input, outputs=headModel)

# loop over all layers in the base model and freeze them so they
# will *not* be updated during the training process
for layer in baseModel.layers:
    layer.trainable = False

# compile our model (this needs to be done after setting the
# layers to non-trainable)
print("[INFO] compiling model...")
opt = SGD(lr=0.005, momentum=0.9)
model.compile(loss="categorical_crossentropy", optimizer=opt,
    metrics=["accuracy"])

Note that in transfer learning the newly added head starts from random weights, while the frozen layers in front of it no longer change. The head therefore needs a warm-up phase to bring its parameters up to a reasonable level, and the learning rate must be kept fairly small during that phase.

This process may take several rounds of trial and error.
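Once the head has warmed up, deeper layers can be unfrozen and fine-tuned with an even smaller learning rate, following the fine-tuning example in the Keras documentation (a sketch; the cut-off index 249, i.e. the top two Inception blocks, comes from that example):

# unfreeze the top two Inception blocks and keep the rest frozen
for layer in model.layers[:249]:
    layer.trainable = False
for layer in model.layers[249:]:
    layer.trainable = True

# recompile with a low learning rate so the unfrozen weights change slowly
model.compile(loss="categorical_crossentropy",
    optimizer=SGD(lr=1e-4, momentum=0.9), metrics=["accuracy"])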

Optimizing the Training Process

The training procedure follows the ctrl+c method described by the author of Deep Learning for Computer Vision with Python: the training state can be checkpointed at any time, and training can resume with an adjusted learning rate.

import argparse
import json
import logging
import os

from keras.models import load_model
from keras import backend as K
# custom callbacks for checkpointing and monitoring
from EpochCheckpoint import EpochCheckpoint
from TrainingMonitor import TrainingMonitor

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-c", "--checkpoints", required=True,
    help="path to output checkpoint directory")
ap.add_argument("-m", "--model", type=str,
    help="path to *specific* model checkpoint to load")
ap.add_argument("-s", "--start-epoch", type=int, default=0,
    help="epoch to restart training at")
args = vars(ap.parse_args())

# set the logging level and output file
logging.basicConfig(level=logging.DEBUG,
    filename="training_{}.log".format(args["start_epoch"]), filemode="w")

if args["model"] is None:
    # define the pre-trained InceptionV3 model and warm up its new
    # head, exactly as in the previous step
    ...
else:
    # otherwise, load the checkpoint and update the learning rate
    print("[INFO] loading {}...".format(args["model"]))
    model = load_model(args["model"])
    print("[INFO] old learning rate: {}".format(
        K.get_value(model.optimizer.lr)))
    K.set_value(model.optimizer.lr, 1e-3)
    print("[INFO] new learning rate: {}".format(
        K.get_value(model.optimizer.lr)))

# construct the set of callbacks
callbacks = [
    EpochCheckpoint(args["checkpoints"], every=5,
        startAt=args["start_epoch"]),
    TrainingMonitor(config.FIG_PATH, jsonPath=config.JSON_PATH,
        startAt=args["start_epoch"])]

# train the network
print("[INFO] training network...")
model.fit_generator(
    trainGen.generator(),
    steps_per_epoch=trainGen.numImages // config.BATCH_SIZE,
    validation_data=valGen.generator(),
    validation_steps=valGen.numImages // config.BATCH_SIZE,
    epochs=100,
    max_queue_size=config.BATCH_SIZE * 2,
    callbacks=callbacks, verbose=1)

References

  1. Deep Learning for Computer Vision with Python

Brand Logo Detection Based on Machine Vision

Train a network that detects brand logos on the Flickr logo datasets, as a validation of machine-vision object recognition techniques.

@[toc]

Overview

I have recently been researching a project that uses machine vision to take stock of supermarket goods, and first needed a feasible technical approach to validate. The logo datasets published by Flickr make a good brand-logo recognition example; this post records the process of training a deep object-recognition network on them.


Datasets

Flickr provides three different logo dataset collections: Flickr Logos 27, FlickrLogos-32, and FlickrLogos-47. Let's look at the composition and structure of each:

  1. Flickr Logos 27 dataset

    • The training set contains 810 annotated images across 27 classes, 30 per class

    • The distractor set contains 4,207 logo images

    • The test set has 270 images, 5 per class, plus another 135 images belonging to none of the classes

    • The 27 classes are: Adidas, Apple, BMW, Citroen, Coca Cola, DHL, Fedex, Ferrari, Ford, Google, Heineken, HP, McDonalds, Mini, Nbc, Nike, Pepsi, Porsche, Puma, Red Bull, Sprite, Starbucks, Intel, Texaco, Unicef, Vodafone, and Yahoo

    • Download: available from the dataset's download page

    • Data format: the download includes a txt file describing the class and bounding-box location of each logo

      #  FileName ClassName subset   Coordinates(x1 y1 x2 y2)
      4763210295.jpg Adidas 1 91 288 125 306
      4763210295.jpg Adidas 1 182 63 229 94
      4763210295.jpg Adidas 1 192 291 225 306
      4763210295.jpg Adidas 1 285 61 317 79
      4763210295.jpg Adidas 1 285 298 324 329
      4763210295.jpg Adidas 1 377 292 421 324
      4763210295.jpg Adidas 1 383 55 416 76
      1230939811.jpg Adidas 2 129 326 257 423
      1230939811.jpg Adidas 2 137 336 243 395

      [Figure: dataset1_bboxes]

  2. Flickr Logos 32/47 dataset

    FlickrLogos-32 was designed for logo retrieval and multi-class logo detection and object recognition. However, the annotations for object detection were often incomplete, since only the most prominent logo instances were labelled.

    FlickrLogos-47 uses the same image corpus as FlickrLogos-32 but has been re-annotated specifically for the task of object detection and recognition.

    2.1 FlickrLogos-32

| Partition | Description | Images | #Images |
| --- | --- | --- | --- |
| P1 (training set) | Hand-picked images | 10 per class | 320 images |
| P2 (validation set) | Images showing at least a single logo under various views | 30 per class + 3000 non-logo images | 3960 images |
| P3 (test set = query set) | Images showing at least a single logo under various views | 30 per class + 3000 non-logo images | 3960 images |
| Total | | | 8240 images |
2.2 FlickrLogos-47

Summary

  • Although the Flickr logo datasets span many classes, each class provides only a limited number of samples, so the preprocessing stage needs data augmentation to enlarge the dataset;
  • Alternatively, borrowing from license-plate recognition, the cropped logos can be pasted onto backgrounds with different noise characteristics to synthesize extra training data (see the sketch after this list);
  • Because logo images carry limited visual features and come in small numbers, transfer learning from a model pre-trained on ImageNet is a good fit; this post discusses and implements that approach;
  • The three datasets target different functional and design requirements; FlickrLogos-47 has relatively better image quality, and the 32- and 47-class sets also ship semantic segmentation annotations;
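As a sketch of the synthesis idea in the second bullet (paste_logo is a hypothetical helper, not part of any dataset tooling): paste a cropped logo at a random position on a background image and keep the paste box as the ground-truth annotation.

import numpy as np

def paste_logo(background, logo):
    # choose a random top-left corner that keeps the logo inside the image
    (bh, bw) = background.shape[:2]
    (lh, lw) = logo.shape[:2]
    x = np.random.randint(0, bw - lw)
    y = np.random.randint(0, bh - lh)

    # paste the logo and return the synthetic image plus its bounding box
    out = background.copy()
    out[y:y + lh, x:x + lw] = logo
    return out, (x, y, x + lw, y + lh)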

Having surveyed the datasets, we follow the usual deep learning workflow: data preprocessing, model preparation, training, and validation. To keep things manageable, we use Flickr Logos 27 for this experiment.

Data Preparation

Data preparation relies on the following basic tools and libraries:

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import cv2
import imutils

Here,

  • numpy handles basic array manipulation;
  • pandas reads and analyzes the annotation file;
  • matplotlib displays the preprocessing results;
  • cv2, the Python binding for OpenCV, reads and analyzes images;
  • imutils is a handy library that covers basic image-processing needs

First, inspect flickr_logos_27_dataset_training_set_annotation.txt from the Flickr Logos 27 download to understand the basic image and class information:

df=pd.read_csv("./flickr_logos_27_dataset/flickr_logos_27_dataset_training_set_annotation.txt",sep=" ", header=None)
df.drop(df.columns[-1],axis=1, inplace=True)
df.columns=["Name","labels","subset","x1","y1","x2","y2"]
df.head()
Output:
             Name  labels  subset   x1   y1   x2   y2
0   144503924.jpg  Adidas       1   38   12  234  142
1  2451569770.jpg  Adidas       1  242  208  413  331
2   390321909.jpg  Adidas       1   13    5   89   60
3  4761260517.jpg  Adidas       1   43  122  358  354
4  4763210295.jpg  Adidas       1   83   63  130   93

The annotation file holds 4,536 records in total, yet only a thousand or so image files are actually provided, so many files must contain more than one logo.

len(df)
>>>: 4536
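A quick pandas check confirms this (a sketch; the exact counts depend on the download):

# number of distinct image files behind the 4536 annotation rows
print(df["Name"].nunique())

# how many annotation rows each file contributes
print(df.groupby("Name").size().sort_values(ascending=False).head())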

pandas also makes it easy to shuffle the records, which helps when splitting into training and test sets:

# shuffle the datasets
df = df.sample(frac=1).reset_index(drop=True)

To quickly check how the annotations look on the images, write a small function that draws a logo's bounding box:

def show_image(id):
    # read the image named in row "id" of the annotation dataframe
    image = os.path.join(
        "./flickr_logos_27_dataset/flickr_logos_27_dataset_images/",
        df.loc[id]["Name"])
    image = cv2.imread(image)
    plt.figure(8)
    plt.imshow(image)

    # overlay the annotated bounding box on the image
    currentAxis = plt.gca()
    rect = patches.Rectangle((df["x1"].iloc[id], df["y1"].iloc[id]),
        df["x2"].iloc[id] - df["x1"].iloc[id],
        df["y2"].iloc[id] - df["y1"].iloc[id],
        linewidth=2, edgecolor='r', facecolor='none')
    currentAxis.add_patch(rect)

It uses plt.gca() and matplotlib's patches module to overlay the rectangle on the image; calling cv2.rectangle directly would work just as well.

Pick a random annotation and see how it renders:

import random
id = random.randint(0,len(df))
show_image(id)

[Figure: BMW_sample]

Next, write a cropping routine that extracts every logo from its source image to form the training set, with simple preprocessing and resizing before saving:

def crop_img(id):
    # read the source image and slice out the annotated logo region
    image = os.path.join(
        "./flickr_logos_27_dataset/flickr_logos_27_dataset_images/",
        df.loc[id]["Name"])
    image = cv2.imread(image)
    crop_image = image[df["y1"].iloc[id]:df["y2"].iloc[id],
                       df["x1"].iloc[id]:df["x2"].iloc[id]]
    return crop_image

WIDTH = 64
HEIGHT = 64

for id, name in enumerate(df["Name"]):
    cropped_image = crop_img(id)
    try:
        resized_image = cv2.resize(cropped_image,
            (WIDTH, HEIGHT), interpolation=cv2.INTER_CUBIC)
    except Exception:
        # degenerate boxes produce empty crops that cannot be resized
        print(id)
        continue
    image_name = str(id) + "_" + df.iloc[id]["labels"] + ".jpg"
    cv2.imwrite(os.path.join("./flickr_logos_27_dataset/cropped/",
        image_name), resized_image)

A few points to note about the cropping step:

  1. Because of flaws in the annotation file itself, some abnormal records have to be discarded; for example, five records have x1 = x2 or y1 = y2, i.e. nothing is actually marked on the image (see the sketch after this list);
  2. The naive resize used here is not ideal; the aspect ratio should be preserved as much as possible to avoid obvious distortion;
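As referenced in the first point above, filtering the degenerate records is a one-line pandas operation (a sketch):

# drop annotations whose box has zero width or zero height
df = df[(df["x1"] != df["x2"]) & (df["y1"] != df["y2"])].reset_index(drop=True)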

Once processing finishes, the cropped images are stored in the cropped folder under file names of the form {id}_{label}.jpg.

A manual review of the crops still turns up some clearly broken samples; for example, several Puma images carry obvious annotation errors and need to be removed from the dataset:

[Figure: puma_error_img]

Splitting the Dataset

Read the cropped images back in for simple preprocessing and splitting:

data = []
labels = []
for img in os.listdir("./flickr_logos_27_dataset/cropped/"):
    img_file = cv2.imread(os.path.join(
        "./flickr_logos_27_dataset/cropped/", img))
    data.append(img_file)
    labels.append(img.split("_")[1].split(".")[0])
data = np.stack(data)
labels = np.stack(labels)

# scale pixel intensities to [0, 1]
data = data / 255

Convert the labels to a one-hot matrix:

from sklearn.preprocessing import LabelBinarizer
le = LabelBinarizer()
labels = le.fit_transform(labels)

Split the dataset:

from sklearn.model_selection import train_test_split

X, testX, y, testy = train_test_split(data, labels, test_size=0.1,
    stratify=labels, random_state=42)

Data Augmentation

Data augmentation is a common processing step in image work. It involves too much material to cover in this walkthrough, so we only show how to use Keras's ImageDataGenerator:

from keras.preprocessing.image import ImageDataGenerator

# construct the training image generator for data augmentation
aug = ImageDataGenerator(rotation_range=18, zoom_range=0.15,
    width_shift_range=0.2, height_shift_range=0.2, shear_range=0.15,
    horizontal_flip=True, fill_mode="nearest")

gen_flow = aug.flow(X, y, batch_size=64, seed=0)
validation = aug.flow(testX, testy, batch_size=32, seed=0)

Model Definition

Based on the data analysis above, we take two approaches to the network model: training a deep convolutional network from scratch, and fine-tuning an existing model with transfer learning.

Training a Network from Scratch

Since the problem is essentially object detection, a full solution involves both a classification subtask and a localization-regression subtask. We can simplify it by scanning the input image with a sliding window and classifying each window.

In practice things are far more complicated; it is hard to pick sliding-window sizes that suit real images, which is why mainstream detection models regress object locations using anchor boxes of several different sizes.

Because each logo image contains limited features, this experiment uses a simple LeNet-style convolutional network, shown below:

[Figure: model]

The body of the model stacks three CONV => RELU => POOL blocks to extract image features, then a fully connected layer plus a softmax classifier yields the final 27-way prediction.

from keras.models import Sequential
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dense
from keras.layers.core import Dropout
from keras import backend as K

# the 27 logo class names learned by the LabelBinarizer above
CLASSNAME = le.classes_

model = Sequential()
inputShape = (HEIGHT, WIDTH, 3)

# first set of CONV => RELU => POOL layers
model.add(Conv2D(16, (3, 3), padding="same", input_shape=inputShape))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

# second set of CONV => RELU => POOL layers
model.add(Conv2D(32, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

# third set of CONV => RELU => POOL layers
model.add(Conv2D(64, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

# first (and only) set of FC => RELU layers
model.add(Flatten())
model.add(Dense(500))
model.add(Activation("relu"))
model.add(Dropout(0.25))

# softmax classifier
model.add(Dense(len(CLASSNAME)))
model.add(Activation("softmax"))

Define the optimization objective:

from keras.optimizers import Adam, SGD, RMSprop

opt = RMSprop(lr=0.001, rho=0.9)
model.compile(loss="categorical_crossentropy", optimizer=opt,
    metrics=["accuracy"])

Train for 100 epochs:

history = model.fit_generator(
    gen_flow,
    steps_per_epoch=len(X) // 32,
    validation_data=validation,
    validation_steps=len(testX) // 32,
    epochs=100,
    verbose=1)

After 100 epochs the validation set reaches 99.89% accuracy, which basically meets the requirement. The accuracy and loss curves for the training and validation data are shown below:

[Figure: accuracy_curve]

[Figure: loss_curve]

Testing

plt.figure(figsize=(15, 40))
for i, test_img in enumerate(os.listdir("./test")):
    img = cv2.imread(os.path.join("./test", test_img))
    img = cv2.resize(img, (WIDTH, HEIGHT), interpolation=cv2.INTER_CUBIC)

    # scale to [0, 1] to match the training data, then predict
    x = np.expand_dims(img / 255, axis=0)
    result = model.predict(x)
    result = le.inverse_transform(result)

    plt.subplot(8, 4, i + 1)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    plt.imshow(img)
    plt.title('pred:' + str(result[0]))

[Figure: test_result]

Designing the Sliding Window and Image Pyramid

The sliding window traverses the image, while the image pyramid provides multi-scale transforms so that logos of many different sizes can all be detected accurately.

def sliding_window(image, step, ws):
    # slide a window across the image
    for y in range(0, image.shape[0] - ws[1], step):
        for x in range(0, image.shape[1] - ws[0], step):
            # yield the current window
            yield (x, y, image[y:y + ws[1], x:x + ws[0]])

def image_pyramid(image, scale=1.5, minSize=(64, 64)):
    # yield the original image
    yield image

    # keep looping over the image pyramid
    while True:
        # compute the dimensions of the next image in the pyramid
        w = int(image.shape[1] / scale)
        image = imutils.resize(image, width=w)

        # if the resized image does not meet the supplied minimum
        # size, then stop constructing the pyramid
        if image.shape[0] < minSize[1] or image.shape[1] < minSize[0]:
            break

        # yield the next image in the pyramid
        yield image

Image Pyramid

To detect targets at different scales, the original image is scaled down step by step and fed to the network each time. The drawback is the repeated resizing, which is tedious and slow.

We set the input image size to (150, 150); the sliding window matches the classifier input at (64, 64); the pyramid scale factor is 1.5, giving two extra downscaled versions of the original image; and the window step is 16 pixels.

# initialize variables used for the object detection procedure
INPUT_SIZE = (150, 150)
PYR_SCALE = 1.5
WIN_STEP = 16
ROI_SIZE = (64, 64)

labels = {}
CLASS_NAMES = list(le.classes_)

To simplify the later analysis, define a prediction helper that returns, per class, the windows whose prediction probability exceeds minProb:

def logo_prediction(model, batchROIs, batchLocs, labels, minProb=0.5,
        dims=(64, 64)):
    preds = model.predict(batchROIs)
    for i in range(0, len(preds)):
        prob = np.max(preds[i])
        if prob > minProb:
            index = np.argmax(preds[i])
            label = CLASS_NAMES[int(index)]

            # grab the coordinates of the sliding window for
            # the prediction and construct the bounding box
            (pX, pY) = batchLocs[i]
            box = (pX, pY, pX + dims[0], pY + dims[1])
            L = labels.get(label, [])
            L.append((box, prob))
            labels[label] = L
    return labels

Now iterate over every pyramid level and every sliding-window position, collecting predictions:

img_file = "./test/2.jpg"
orig = cv2.imread(img_file)
# resize the input image to be a square
resized = cv2.resize(orig, INPUT_SIZE, interpolation=cv2.INTER_CUBIC)

# initialize the batch ROIs and (x, y)-coordinates
batchROIs = None
batchLocs = []

# loop over the image pyramid
for image in image_pyramid(resized, scale=PYR_SCALE, minSize=ROI_SIZE):
    # loop over the sliding window locations of the current pyramid level
    for (x, y, roi) in sliding_window(image, WIN_STEP, ROI_SIZE):
        # take the ROI and pre-process it so we can later classify the
        # region with Keras
        roi = roi / 255
        roi = np.expand_dims(roi, axis=0)

        # if the batch is None, initialize it
        if batchROIs is None:
            batchROIs = roi
        # otherwise, add the ROI to the bottom of the batch
        else:
            batchROIs = np.vstack([batchROIs, roi])

        # add the (x, y)-coordinates of the sliding window to the batch
        batchLocs.append((x, y))

# classify the batch of ROIs
labels = logo_prediction(model, batchROIs, batchLocs, labels,
    minProb=0.9)

Only at this step did I realize that the training classes lack a crucial background class, so predictions on background regions will often go wrong. I'll retrain the network after collecting some background images. 😭

The last step applies non-maximum suppression to the predictions and displays them:

from imutils.object_detection import non_max_suppression

# loop over the labels for each of the detected objects in the image
for k in labels.keys():
    # clone the input image so we can draw on it
    clone = resized.copy()

    # loop over all bounding boxes for the label and draw them on the image
    for (box, prob) in labels[k]:
        (xA, yA, xB, yB) = box
        cv2.rectangle(clone, (xA, yA), (xB, yB), (0, 255, 0), 2)

    # grab the bounding boxes and associated probabilities for each
    # detection, then apply non-maxima suppression to suppress
    # weaker, overlapping detections
    boxes = np.array([p[0] for p in labels[k]])
    proba = np.array([p[1] for p in labels[k]])
    boxes = non_max_suppression(boxes, proba)

    # loop over the bounding boxes again, this time only drawing the
    # ones that were *not* suppressed
    for (xA, yA, xB, yB) in boxes:
        cv2.rectangle(clone, (xA, yA), (xB, yB), (0, 0, 255), 2)

    # show the output image
    print("[INFO] {}: {}".format(k, len(boxes)))
    plt.imshow(clone)

Non-maximum suppression is an important step in object detection; I'll write up the related concepts properly later.
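Until then, the core idea is easy to sketch (an illustrative version, not the imutils implementation): repeatedly keep the highest-scoring box and drop every remaining box whose IoU with it exceeds a threshold.

import numpy as np

def nms_sketch(boxes, probs, overlapThresh=0.3):
    # boxes: (N, 4) array of (x1, y1, x2, y2); probs: (N,) confidences
    idxs = np.argsort(probs)[::-1]
    keep = []
    while len(idxs) > 0:
        i = idxs[0]
        keep.append(i)

        # intersection of the kept box with every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[idxs[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[idxs[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[idxs[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[idxs[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)

        # intersection-over-union against the kept box
        areaI = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areaR = ((boxes[idxs[1:], 2] - boxes[idxs[1:], 0]) *
                 (boxes[idxs[1:], 3] - boxes[idxs[1:], 1]))
        iou = inter / (areaI + areaR - inter)

        # drop the boxes that overlap the kept box too strongly
        idxs = idxs[1:][iou <= overlapThresh]
    return boxes[keep]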

Optimizing an Object-Recognition Model with Transfer Learning

Although the method above is simple and easy to follow, it is very inefficient: each image needs many rounds of feature extraction and inference. Mainstream object-detection algorithms can usually run on live video streams, so this approach clearly won't do there. A later post will explore solving the recognition problem by applying transfer learning to an existing object-detection network.

Since this post is already long, the transfer learning implementation will appear as a separate post; a link will be inserted here.

The code for this post is available on Github.

Training a Binary Classifier to Check Shelves for Pepsi

Following a Github example, build an item-detection prototype by training a binary classifier:

  1. The only difference from the data preparation above is how the labels are set:
import os, cv2
import numpy as np

data = []
labels = []
HEIGHT = 64
WIDTH = 64
for img in os.listdir("./flickr_logos_27_dataset/cropped/"):
    img_file = cv2.imread(os.path.join(
        "./flickr_logos_27_dataset/cropped/", img))
    data.append(img_file)
    label = img.split("_")[1].split(".")[0]
    # collapse every non-Pepsi logo into a single negative class
    if label != "Pepsi":
        label = "Nop"
    labels.append(label)
data = np.stack(data)
labels = np.stack(labels)
  2. Since this is a binary classification problem, the final layer only needs a sigmoid classifier, and a LabelEncoder can serialize the labels to 0 or 1:

    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelEncoder
    lb = LabelEncoder()
    y = lb.fit_transform(labels)
  3. Data augmentation is the same as before, so it is not repeated. In the network, only the final output needs to change to a sigmoid activation, with binary_crossentropy as the objective:

    ...
    model.add(Activation("sigmoid"))
    ...
    model.compile(loss="binary_crossentropy", optimizer=opt,
        metrics=["accuracy"])

    With only two classes, the model converges easily and the final accuracy approaches 100%.

  4. Finally, scan the image with a sliding window and show the results with cv2:

    import time

    # the shelf image to scan; sample_path is set elsewhere
    img = cv2.imread(sample_path)

    # the window size matches the classifier input
    (winW, winH) = (64, 64)

    for (x, y, window) in sliding_window(img, 32, (winW, winH)):
        # if the window does not meet our desired window size, ignore it
        if window.shape[0] != winH or window.shape[1] != winW:
            continue

        # scale the window to the network input range and predict
        crop_img = cv2.resize(window, (64, 64))
        crop_img = crop_img / 255
        prediction = model.predict(crop_img.reshape(1, 64, 64, 3))
        if prediction > 0.5:
            pred = 'Pepsi'
        else:
            pred = ' '

        # draw the prediction and the current window on a copy of the image
        clone = img.copy()
        cv2.putText(clone, pred, (10, 30),
            cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
        cv2.rectangle(clone, (x, y), (x + winW, y + winH), (0, 255, 0), 2)
        cv2.imshow("Window", clone)

        cv2.waitKey(1)
        time.sleep(0.5)

Applications and Analysis of Logo Detection

DeepSense.ai describes one approach to logo-detection analytics: by analyzing how different brands' logos appear in a video, it measures when each logo shows up, how it is presented, and how effective the exposure is, and finally delivers a Logo Visibility Report to the client.

The main flow of the solution is shown below:

[Figure: logo_detection_overview]

A generated analysis report looks like this:

[Figure: logo_detection_report]

The video that was analyzed:

[Video: footage analyzed for the report]


References

  1. Flickr Logos 27 dataset
  2. Datasets: FlickrLogos-32 / FlickrLogos-47