This repository was archived by the owner on Sep 1, 2021. It is now read-only.

Commit 45b0e68

Finished example_4, improved example_3, cleanup
1 parent 44d526f commit 45b0e68

6 files changed, with 113 additions and 238 deletions.

README.md

Lines changed: 16 additions & 13 deletions
@@ -11,12 +11,11 @@ Before getting started, we have to download a dataset and generate a csv file co
 3. tar xf images.tar.gz
 4. tar xf annotations.tar.gz
 5. mv annotations/xmls/* images/
-6. Optionally for data augmentations: pip3 install imgaug
-7. python3 generate_dataset.py
+6. python3 generate_dataset.py

 # Single-object detection

-## Example 1: Find the dogs/cats
+## Example 1: Finding dogs/cats

 ### Architecture

@@ -35,7 +34,7 @@ We proceed in the same way to build the object detector:
 3. Add one/multiple/no convolution block (or `_inverted_res_block` for MobileNetv2)
 4. Add a convolution layer for the coordinates

-The code in this repository uses MobileNetv2 [1], because it is faster than other models and the performance can be adapted. For example, if alpha = 0.35 with 96x96 is not good enough, one can just increase both values (see [2] for a comparison). If you use another architecture, change `preprocess_input`.
+The code in this repository uses MobileNetv2, because it is faster than other models and the performance can be adapted. For example, if alpha = 0.35 with 96x96 is not good enough, one can just increase both values (see [here](https://github.com/keras-team/keras-applications/blob/master/keras_applications/mobilenet_v2.py) for a comparison). If you use another architecture, change `preprocess_input`.

 1. `python3 example_1/train.py`
 2. Adjust the WEIGHTS_FILE in `example_1/test.py` (given by the last script)
@@ -49,7 +48,7 @@ In the following images red is the predicted box, green is the ground truth:

 ![Image 2](https://i.imgur.com/ll9PNOF.jpg)

-## Example 2: Find the dogs/cats and distinguish classes
+## Example 2: Finding dogs/cats and distinguishing classes

 This time we have to run the scripts `example_2/train.py` and `example_2/test.py`.

@@ -73,12 +72,22 @@ In this example, we use a skip-net architecture similar to U-Net. For an in-dept

 ![Dog](https://lars76.github.io/assets/images/dog2.gif)

+## Example 4: YOLO-like detection
+
+### Architecture
+
+This example is based on the three YOLO papers. For an in-depth explanation see [this blog post](https://lars76.github.io/neural-networks/object-detection/obj-detection-from-scratch/).
+
+### Result
+
+![Multiple dogs](https://lars76.github.io/assets/images/multiple_dogs.jpg)
+
 # Guidelines

 ## Improve accuracy (IoU)

-- enable augmentations: set `AUGMENTATION=True` in generate_dataset.py and install *imgaug*.
-- better augmentations: increase `AUGMENTATION_PER_IMAGE` and try out different transformations.
+- enable augmentations: see `example_4` the same code can be added to the other examples
+- better augmentations: try out different values (flips, rotation etc.)
 - for MobileNetv1/2: increase `ALPHA` and `IMAGE_SIZE` in train_model.py
 - other architectures: increase `IMAGE_SIZE`
 - add more layers
@@ -98,9 +107,3 @@ In this example, we use a skip-net architecture similar to U-Net. For an in-dept
 - If the new dataset is small and not similar to ImageNet, freeze some layers.
 - If the new dataset is large, freeze no layers.
 - read http://cs231n.github.io/transfer-learning/
-
-# References
-
-[1] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen. *MobileNetV2: Inverted Residuals and Linear Bottlenecks*.
-
-[2] https://github.com/keras-team/keras-applications/blob/master/keras_applications/mobilenet_v2.py
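
The updated guideline mentions flips as one possible augmentation. The sketch below is not the augmentation code from `example_4` (which this diff does not show); it only illustrates, in plain Python, that the bounding box has to be mirrored together with the image. The function name `hflip_with_box`, the nested-list image layout, and the convention that `x1`/`y1` are exclusive edges are all assumptions made for illustration.

```python
def hflip_with_box(image, box):
    """Mirror an image left-to-right and mirror its bounding box with it.

    image: list of rows, each row a list of pixels (hypothetical layout).
    box:   (x0, y0, x1, y1) in pixels, with x1 and y1 as exclusive edges
           (an assumed convention, not taken from the repository).
    """
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    x0, y0, x1, y1 = box
    # The left edge becomes the right edge and vice versa; y is unchanged.
    return flipped, (width - x1, y0, width - x0, y1)


image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
flipped, box = hflip_with_box(image, (0, 0, 1, 2))
print(flipped)  # [[4, 3, 2, 1], [8, 7, 6, 5]]
print(box)      # (3, 0, 4, 2)
```

If boxes are stored as inclusive pixel indices instead, the mirrored edges would be `width - 1 - x1` and `width - 1 - x0`.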

example_3/test.py

Lines changed: 12 additions & 16 deletions
@@ -2,7 +2,7 @@
 import cv2
 import glob

-WEIGHTS_FILE = "model-0.34.h5"
+WEIGHTS_FILE = "model-0.91.h5"
 IMAGES = "images/*jpg"
 THRESHOLD = 0.5
 EPSILON = 0.02
@@ -13,28 +13,24 @@ def main():

     for filename in glob.glob(IMAGES):
         unscaled = cv2.imread(filename)
-        image = cv2.resize(unscaled, (IMAGE_WIDTH, IMAGE_HEIGHT))
+        image = cv2.resize(unscaled, (IMAGE_SIZE, IMAGE_SIZE))
         feat_scaled = preprocess_input(np.array(image, dtype=np.float32))

-        region = model.predict(x=np.array([feat_scaled]))[0]
+        region = np.squeeze(model.predict(feat_scaled[np.newaxis,:]))

-        output = np.zeros(unscaled.shape[:2], dtype=np.uint8)
-        for i in range(region.shape[1]):
-            for j in range(region.shape[0]):
-                if region[i][j] > THRESHOLD:
-                    x = int(CELL_WIDTH * j * unscaled.shape[1] / IMAGE_WIDTH)
-                    y = int(CELL_HEIGHT * i * unscaled.shape[0] / IMAGE_HEIGHT)
-                    x2 = int(CELL_WIDTH * (j + 1) * unscaled.shape[1] / IMAGE_WIDTH)
-                    y2 = int(CELL_HEIGHT * (i + 1) * unscaled.shape[0] / IMAGE_HEIGHT)
-                    #cv2.rectangle(unscaled, (x, y), (x2, y2), (0, 255, 0), 1)
+        output = np.zeros(region.shape, dtype=np.uint8)
+        output[region > 0.5] = 1

-                    output[y:y2,x:x2] = 1
-
-        _, contours, _ = cv2.findContours(output, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
+        contours, _ = cv2.findContours(output, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
         for cnt in contours:
             approx = cv2.approxPolyDP(cnt, EPSILON * cv2.arcLength(cnt, True), True)
             x, y, w, h = cv2.boundingRect(approx)
-            cv2.rectangle(unscaled, (x, y), (x + w, y + h), (0, 255, 0), 1)
+
+            x0 = np.rint(x * unscaled.shape[1] / output.shape[1]).astype(int)
+            x1 = np.rint((x + w) * unscaled.shape[1] / output.shape[1]).astype(int)
+            y0 = np.rint(y * unscaled.shape[0] / output.shape[0]).astype(int)
+            y1 = np.rint((y + h) * unscaled.shape[0] / output.shape[0]).astype(int)
+            cv2.rectangle(unscaled, (x0, y0), (x1, y1), (0, 255, 0), 1)

         cv2.imshow("image", unscaled)
         cv2.waitKey(0)
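
The new `example_3/test.py` thresholds the predicted grid directly and only scales the resulting box to the original image at the end. A minimal plain-Python sketch of that idea follows. It assumes a single object, so one bounding box over all cells above the threshold stands in for `cv2.findContours` and `cv2.boundingRect`; the helper names `mask_to_box` and `scale_box` are hypothetical. Python's `round` uses the same round-half-to-even rule as `np.rint`.

```python
def mask_to_box(region, threshold=0.5):
    """Return (x, y, w, h) covering all cells above threshold, or None."""
    hits = [(r, c)
            for r, row in enumerate(region)
            for c, value in enumerate(row)
            if value > threshold]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    x, y = min(cols), min(rows)
    # Width and height count cells, so the far edge is exclusive.
    return x, y, max(cols) - x + 1, max(rows) - y + 1


def scale_box(box, grid_shape, image_shape):
    """Map a grid-space (x, y, w, h) box to pixel corners (x0, y0, x1, y1)."""
    x, y, w, h = box
    grid_h, grid_w = grid_shape
    image_h, image_w = image_shape
    x0 = round(x * image_w / grid_w)
    x1 = round((x + w) * image_w / grid_w)
    y0 = round(y * image_h / grid_h)
    y1 = round((y + h) * image_h / grid_h)
    return x0, y0, x1, y1


# A 4x4 prediction grid with a 2x2 block of confident cells in the middle.
region = [[0.0, 0.0, 0.0, 0.0],
          [0.0, 0.9, 0.8, 0.0],
          [0.0, 0.7, 0.9, 0.0],
          [0.0, 0.0, 0.0, 0.0]]
grid_box = mask_to_box(region)
pixel_box = scale_box(grid_box, (4, 4), (100, 200))  # image is 100 high, 200 wide
print(grid_box)   # (1, 1, 2, 2)
print(pixel_box)  # (50, 25, 150, 75)
```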

example_3/train.py

Lines changed: 21 additions & 27 deletions
@@ -13,18 +13,15 @@
 from tensorflow.keras.losses import binary_crossentropy
 from tensorflow.keras.backend import epsilon

-
 # 0.35, 0.5, 0.75, 1.0
 ALPHA = 1.0

-IMAGE_HEIGHT = 224
-IMAGE_WIDTH = 224
-
-HEIGHT_CELLS = 28
-WIDTH_CELLS = 28
+GRID_SIZE = 28
+IMAGE_SIZE = 224

-CELL_WIDTH = IMAGE_WIDTH / WIDTH_CELLS
-CELL_HEIGHT = IMAGE_HEIGHT / HEIGHT_CELLS
+# first train with frozen weights, then fine tune
+TRAINABLE = False
+WEIGHTS = "model-0.89.h5"

 EPOCHS = 200
 BATCH_SIZE = 8
@@ -42,7 +39,7 @@ def __init__(self, csv_file):
         self.paths = []

         with open(csv_file, "r") as file:
-            self.mask = np.zeros((sum(1 for line in file), HEIGHT_CELLS, WIDTH_CELLS))
+            self.mask = np.zeros((sum(1 for line in file), GRID_SIZE, GRID_SIZE))
             file.seek(0)

             reader = csv.reader(file, delimiter=",")
@@ -53,18 +50,13 @@ def __init__(self, csv_file):

                 path, image_height, image_width, x0, y0, x1, y1, _, _ = row

-                x0 *= IMAGE_WIDTH / image_width
-                y0 *= IMAGE_HEIGHT / image_height
-                x1 *= IMAGE_WIDTH / image_width
-                y1 *= IMAGE_HEIGHT / image_height
+                cell_start_x = np.rint(((GRID_SIZE - 1) / image_width) * x0).astype(int)
+                cell_stop_x = np.rint(((GRID_SIZE - 1) / image_width) * x1).astype(int)

-                cell_start_x = max(math.ceil(x0 / CELL_WIDTH) - 1, 0)
-                cell_stop_x = min(math.ceil(x1 / CELL_WIDTH), WIDTH_CELLS) - 1
+                cell_start_y = np.rint(((GRID_SIZE - 1) / image_height) * y0).astype(int)
+                cell_stop_y = np.rint(((GRID_SIZE - 1) / image_height) * y1).astype(int)

-                cell_start_y = max(math.ceil(y0 / CELL_HEIGHT) - 1, 0)
-                cell_stop_y = min(math.ceil(y1 / CELL_HEIGHT), HEIGHT_CELLS) - 1
-
-                self.mask[index, cell_start_y:cell_stop_y+1, cell_start_x:cell_stop_x+1] = 1
+                self.mask[index, cell_start_y : cell_stop_y, cell_start_x : cell_stop_x] = 1

                 self.paths.append(path)

@@ -75,16 +67,16 @@ def __getitem__(self, idx):
         batch_paths = self.paths[idx * BATCH_SIZE:(idx + 1) * BATCH_SIZE]
         batch_masks = self.mask[idx * BATCH_SIZE:(idx + 1) * BATCH_SIZE]

-        batch_images = np.zeros((len(batch_paths), IMAGE_HEIGHT, IMAGE_WIDTH, 3), dtype=np.float32)
+        batch_images = np.zeros((len(batch_paths), IMAGE_SIZE, IMAGE_SIZE, 3), dtype=np.float32)
         for i, f in enumerate(batch_paths):
             img = Image.open(f)
-            img = img.resize((IMAGE_WIDTH, IMAGE_HEIGHT))
+            img = img.resize((IMAGE_SIZE, IMAGE_SIZE))
             img = img.convert('RGB')

             batch_images[i] = preprocess_input(np.array(img, dtype=np.float32))
             img.close()

-        return batch_images, batch_masks
+        return batch_images, batch_masks[:,:,:,np.newaxis]

 class Validation(Callback):
     def __init__(self, generator):
@@ -110,7 +102,7 @@ def on_epoch_end(self, epoch, logs):
         print(" - val_dice: {}".format(dice))

 def create_model(trainable=True):
-    model = MobileNetV2(input_shape=(IMAGE_HEIGHT, IMAGE_WIDTH, 3), include_top=False, alpha=ALPHA, weights="imagenet")
+    model = MobileNetV2(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), include_top=False, alpha=ALPHA, weights="imagenet")

     for layer in model.layers:
         layer.trainable = trainable
@@ -136,23 +128,25 @@ def create_model(trainable=True):
     x = Activation("relu")(x)

     x = Conv2D(1, kernel_size=1, activation="sigmoid")(x)
-    x = Reshape((HEIGHT_CELLS, WIDTH_CELLS))(x)

     return Model(inputs=model.input, outputs=x)

 def loss(y_true, y_pred):
     def dice_coefficient(y_true, y_pred):
-        numerator = 2 * tf.reduce_sum(y_true * y_pred)
-        denominator = tf.reduce_sum(y_true + y_pred)
+        numerator = 2 * tf.reduce_sum(y_true * y_pred, axis=-1)
+        denominator = tf.reduce_sum(y_true + y_pred, axis=-1)

         return numerator / (denominator + epsilon())

     return binary_crossentropy(y_true, y_pred) - tf.log(dice_coefficient(y_true, y_pred) + epsilon())

 def main():
-    model = create_model()
+    model = create_model(trainable=TRAINABLE)
     model.summary()

+    if TRAINABLE:
+        model.load_weights(WEIGHTS)
+
     train_datagen = DataGenerator(TRAIN_CSV)
     validation_datagen = Validation(generator=DataGenerator(VALIDATION_CSV))
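
The two central pieces of the `example_3/train.py` changes, rasterizing a box onto the `GRID_SIZE` grid and scoring masks with a Dice coefficient, can be tried in isolation. This plain-Python sketch follows the arithmetic in the diff, with two stated differences: it reduces the Dice sums over the whole mask, as the pre-commit version did, rather than over `axis=-1`, and `EPSILON = 1e-7` is an assumed stand-in for `tensorflow.keras.backend.epsilon()`. The function names are hypothetical.

```python
EPSILON = 1e-7  # assumed stand-in for tensorflow.keras.backend.epsilon()


def box_to_mask(box, image_width, image_height, grid_size):
    """Rasterize a pixel-space (x0, y0, x1, y1) box onto a square grid."""
    x0, y0, x1, y1 = box
    # Same scaling as the diff: pixel coordinates map to [0, grid_size - 1].
    start_x = round((grid_size - 1) / image_width * x0)
    stop_x = round((grid_size - 1) / image_width * x1)
    start_y = round((grid_size - 1) / image_height * y0)
    stop_y = round((grid_size - 1) / image_height * y1)

    mask = [[0.0] * grid_size for _ in range(grid_size)]
    # Stop indices are exclusive, mirroring mask[start_y:stop_y, start_x:stop_x].
    for row in range(start_y, stop_y):
        for col in range(start_x, stop_x):
            mask[row][col] = 1.0
    return mask


def dice_coefficient(y_true, y_pred):
    """Dice = 2 * sum(t * p) / (sum(t + p) + eps), summed over the whole mask."""
    pairs = [(t, p)
             for row_t, row_p in zip(y_true, y_pred)
             for t, p in zip(row_t, row_p)]
    numerator = 2 * sum(t * p for t, p in pairs)
    denominator = sum(t + p for t, p in pairs)
    return numerator / (denominator + EPSILON)


mask = box_to_mask((0, 0, 100, 100), image_width=100, image_height=100, grid_size=4)
empty = [[0.0] * 4 for _ in range(4)]
print(sum(map(sum, mask)))                       # 9.0
print(round(dice_coefficient(mask, mask), 6))    # 1.0
print(dice_coefficient(mask, empty))             # 0.0
```

Because the stop index is exclusive, a box that spans the full image fills only 3 of the 4 cells per axis here, and a box narrower than one cell produces an empty mask.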

example_4/test.py

Lines changed: 17 additions & 17 deletions
@@ -3,42 +3,42 @@
 import glob
 import numpy as np

-WEIGHTS_FILE = "model-0.37.h5"
+WEIGHTS_FILE = "model-0.51.h5"
 IMAGES = "images/*jpg"

 IOU_THRESHOLD = 0.5
 SCORE_THRESHOLD = 0.5
-MAX_OUTPUT_SIZE = 300
+MAX_OUTPUT_SIZE = 49

 def main():
     model = create_model()
     model.load_weights(WEIGHTS_FILE)

     for filename in glob.glob(IMAGES):
         unscaled = cv2.imread(filename)
-        image = cv2.resize(unscaled, (IMAGE_SIZE, IMAGE_SIZE))
-        feat_scaled = preprocess_input(np.array(image, dtype=np.float32))
+        img = cv2.resize(unscaled, (IMAGE_SIZE, IMAGE_SIZE))

-        pred = model.predict(x=np.array([feat_scaled]))[0]
-        height, width, y, x, score = pred[..., 0].flatten(), pred[..., 1].flatten(), pred[..., 2].flatten(), pred[..., 3].flatten(), pred[..., 4].flatten()
+        feat_scaled = preprocess_input(np.array(img, dtype=np.float32))
+
+        pred = np.squeeze(model.predict(feat_scaled[np.newaxis,:]))
+        height, width, y_f, x_f, score = [a.flatten() for a in np.split(pred, pred.shape[-1], axis=-1)]

         coords = np.arange(pred.shape[0] * pred.shape[1])
-        boxes = np.stack([coords // pred.shape[0] + y + 1, coords % pred.shape[1] + x + 1, height, width, score], axis=-1)
+        y = (y_f + coords // pred.shape[0]) / (pred.shape[0] - 1)
+        x = (x_f + coords % pred.shape[1]) / (pred.shape[1] - 1)
+
+        boxes = np.stack([y, x, height, width, score], axis=-1)
         boxes = boxes[np.where(boxes[...,-1] >= SCORE_THRESHOLD)]

         selected_indices = tf.image.non_max_suppression(boxes[...,:-1], boxes[...,-1], MAX_OUTPUT_SIZE, IOU_THRESHOLD)
         selected_indices = tf.Session().run(selected_indices)

-        for k in boxes[selected_indices]:
-            h = k[2] * unscaled.shape[0]
-            w = k[3] * unscaled.shape[1]
-
-            y0 = k[0] * unscaled.shape[0] / pred.shape[0] - h / 2
-            x0 = k[1] * unscaled.shape[1] / pred.shape[1] - w / 2
-            y1 = y0 + h
-            x1 = x0 + w
+        for y_c, x_c, h, w, _ in boxes[selected_indices]:
+            x0 = unscaled.shape[1] * (x_c - w / 2)
+            y0 = unscaled.shape[0] * (y_c - h / 2)
+            x1 = x0 + unscaled.shape[1] * w
+            y1 = y0 + unscaled.shape[0] * h

-            #cv2.rectangle(unscaled, (int(k[1] * unscaled.shape[0] / pred.shape[0]), int(k[0] * unscaled.shape[0] / pred.shape[0])), (int(10 + k[1] * unscaled.shape[0] / pred.shape[0]), int(10 + k[0] * unscaled.shape[0] / pred.shape[0])), (0, 0, 255), 1)
             cv2.rectangle(unscaled, (int(x0), int(y0)), (int(x1), int(y1)), (0, 255, 0), 1)

         cv2.imshow("image", unscaled)
@@ -47,4 +47,4 @@ def main():


 if __name__ == "__main__":
-    main()
+    main()
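
The decoding step in the new `example_4/test.py` turns each grid cell's offsets into a normalized box center and then into pixel corners. A plain-Python sketch of that arithmetic is shown below, assuming a square `G x G x 5` prediction with channels ordered `(height, width, y_f, x_f, score)` as in the diff. The helper names `decode_boxes` and `to_pixel_corners` are hypothetical, and non-max suppression is left out.

```python
def decode_boxes(pred, score_threshold=0.5):
    """Turn a G x G x 5 grid into normalized (y_c, x_c, h, w, score) boxes."""
    grid = len(pred)
    boxes = []
    for row in range(grid):
        for col in range(grid):
            height, width, y_f, x_f, score = pred[row][col]
            if score < score_threshold:
                continue
            # Cell index plus in-cell offset, normalized by (G - 1) as in the diff.
            y_c = (y_f + row) / (grid - 1)
            x_c = (x_f + col) / (grid - 1)
            boxes.append((y_c, x_c, height, width, score))
    return boxes


def to_pixel_corners(box, image_height, image_width):
    """Convert a normalized center/size box to pixel corners (x0, y0, x1, y1)."""
    y_c, x_c, h, w, _ = box
    x0 = image_width * (x_c - w / 2)
    y0 = image_height * (y_c - h / 2)
    return x0, y0, x0 + image_width * w, y0 + image_height * h


# A 3x3 grid where only the center cell predicts an object.
pred = [[[0.0] * 5 for _ in range(3)] for _ in range(3)]
pred[1][1] = [0.5, 0.5, 0.0, 0.0, 0.9]

boxes = decode_boxes(pred)
corners = [to_pixel_corners(b, image_height=100, image_width=200) for b in boxes]
print(boxes)    # [(0.5, 0.5, 0.5, 0.5, 0.9)]
print(corners)  # [(50.0, 25.0, 150.0, 75.0)]
```

One thing to keep in mind when adapting this: `tf.image.non_max_suppression` documents its `boxes` argument as `[y1, x1, y2, x2]` corner coordinates, so converting from center/size to corners before suppression may be worth considering.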
