
Excessive Redundant Bounding Boxes in PaliGemma2 Fine-Tuning for Detection Tasks #358

Open
@David-19940718

Description


Search before asking

  • I have searched the Roboflow Notebooks issues and found no similar bug report.

Notebook name

https://github.com/roboflow/notebooks/blob/main/notebooks/how-to-finetune-paligemma2-on-detection-dataset.ipynb

Bug

Thank you for your excellent work. While fine-tuning PaliGemma2 for a downstream detection task, I noticed that the trained model produces many redundant bounding boxes. Judging from the predictions, the model is able to detect the targets, but it keeps emitting additional bounding boxes until it hits the configured max-new-tokens limit. Could you provide any insights or suggestions on this issue? Here is an example of the raw model output for a single image:

<loc0400><loc0516><loc0652><loc0712> 7 of clubs ; <loc0292><loc0300><loc0584><loc0512> 8 of clubs ; <loc0406><loc0724><loc0708><loc1007> 5 of clubs ; <loc0216><loc0084><loc0528><loc0316> 6 of clubs ; <loc0400><loc0516><loc0648><loc0708> 6 of clubs ; <loc0292><loc0295><loc0580><loc0512> 8 of clubs ; <loc0412><loc0732><loc0701><loc1007> 4 of clubs ; <loc0208><loc0080><loc0528><loc0316> 5 of clubs ; <loc0756><loc0136><loc1023><loc0316> 10 of clubs ; <loc0000><loc0000><loc1023><loc1016> 9 of clubs ; <loc0000><loc0000><loc0580><loc0540> 10 of clubs ; <loc0416><loc0540><loc0644><loc0708> 8 of clubs ; <loc0756><loc0144><loc0880><loc0292> 5 of clubs ; <loc0756><loc0144><loc1023><loc0322> 2 of clubs ; <loc0756><loc0144><loc1023><loc0316> 9 of clubs ; <loc0756><loc0144><loc1023><loc0316> 5 of clubs ; <loc0756><loc0144><loc1023><loc0305> 5 of clubs ; <loc0756><loc0144><loc1023><loc0305> 5 of clubs ; <loc0756><loc0144><loc1023><loc0295> 5 of clubs ; <loc0756><loc0232><loc1023><loc0322> 5 of clubs ; <loc0738><loc0000><loc1023><loc0136> 5 of clubs ; <loc0756><loc0000><loc1023><loc0136> 5 of clubs ; <loc0756><loc0000><loc1023><loc0136> 5 of clubs ; <loc0000><loc0000><loc0580><loc0372> 10 of clubs ; <loc0738><loc0000><loc1023><loc0136> 5 of clubs ; <loc0738><loc0000><loc1023>
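For reference, the prediction above comes from a generate/decode loop roughly like the one below. This is only a sketch: the checkpoint id, image path, prompt, and token budget are placeholders rather than my exact setup.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

MODEL_ID = "google/paligemma2-3b-pt-448"  # placeholder; I load my fine-tuned checkpoint here
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("cards.jpg")  # placeholder test image
prompt = "<image>detect 5 of clubs ; 6 of clubs ; 7 of clubs ; 8 of clubs"  # detection prefix

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
prefix_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Greedy decoding keeps appending "<loc....> class ;" groups until the model
# emits EOS or exhausts max_new_tokens -- in my case it is always the token
# budget that stops it, which is why the output above ends mid-box.
text = processor.decode(generation[0][prefix_len:], skip_special_tokens=True)
print(text)
```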

Environment

  • Local
  • OS: Ubuntu 20.04
  • Python: 3.10.6
  • Transformers: 4.47.0

Minimal Reproducible Example

No response

Additional

Additionally, here is the terminal log output:

ubuntu@ubuntu:/ssd2/workspace/mllm/fine-tune-paligemma/Google-PaliGemma2-Finetune$ CUDA_VISIBLE_DEVICES=2,3 python train.py --lora --epochs 8
hyperparameters: remove_unused_columns=False, gradient_accumulation_steps=16, warmup_steps=2, weight_decay=1e-06, adam_beta2=0.999, logging_steps=50, optim=adamw_hf, save_strategy=steps, save_steps=200, save_total_limit=1, bf16=True, report_to=['tensorboard'], dataloader_pin_memory=False
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00,  3.00s/it]
trainable params: 11,876,352 || all params: 3,045,003,504 || trainable%: 0.3900
freezing vision model layers
freezing multi-modal projector
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 2.6005, 'grad_norm': 23.956716537475586, 'learning_rate': 1.7587939698492464e-05, 'epoch': 0.99}                                                                                                                   
{'loss': 1.9132, 'grad_norm': 23.055450439453125, 'learning_rate': 1.5075376884422112e-05, 'epoch': 1.97}                                                                                                                   
{'loss': 1.6768, 'grad_norm': 33.97260284423828, 'learning_rate': 1.256281407035176e-05, 'epoch': 2.95}                                                                                                                     
{'loss': 1.5855, 'grad_norm': 26.143875122070312, 'learning_rate': 1.0050251256281408e-05, 'epoch': 3.93}                                                                                                                   
{'loss': 1.5406, 'grad_norm': 24.072601318359375, 'learning_rate': 7.537688442211056e-06, 'epoch': 4.91}                                                                                                                    
{'loss': 1.515, 'grad_norm': 34.959720611572266, 'learning_rate': 5.025125628140704e-06, 'epoch': 5.89}                                                                                                                     
{'loss': 1.5009, 'grad_norm': 29.38210105895996, 'learning_rate': 2.512562814070352e-06, 'epoch': 6.87}                                                                                                                     
{'loss': 1.4799, 'grad_norm': 39.997161865234375, 'learning_rate': 0.0, 'epoch': 7.85}                                                                                                                                      
{'train_runtime': 5094.8047, 'train_samples_per_second': 1.273, 'train_steps_per_second': 0.079, 'train_loss': 1.726562728881836, 'epoch': 7.85}                                                                            
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 400/400 [1:24:54<00:00, 12.74s/it]
  0%|                                                                                                                                                                                                | 0/44 [00:00<?, ?it/s]The 'batch_size' attribute of HybridCache is deprecated and will be removed in v4.49. Use the more precisely named 'self.max_batch_size' attribute instead.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 44/44 [32:50<00:00, 44.77s/it]
map_result: MeanAveragePrecisionResult:
Metric target: MetricTarget.BOXES
Class agnostic: False
mAP @ 50:95: 0.4415
mAP @ 50:    0.4892
mAP @ 75:    0.4738
mAP scores: [0.48919323 0.48853667 0.48907614 0.47919466 0.47776663 0.47379989
 0.47379989 0.4553695  0.38925978 0.19944912]
IoU thresh: [0.5  0.55 0.6  0.65 0.7  0.75 0.8  0.85 0.9  0.95]
AP per class:
  0: [0.03280268 0.03280268 0.03280268 0.03280268 0.03280268 0.03280268
 0.03280268 0.03280268 0.03280268 0.02328249]
  1: [0.07497781 0.07497781 0.07497781 0.07497781 0.07497781 0.07497781
 0.07497781 0.07497781 0.07497781 0.02398884]
  2: [0.05993939 0.05993939 0.05993939 0.05993939 0.05993939 0.05993939
 0.05993939 0.05993939 0.04830372 0.03577281]
  3: [0.07545013 0.07545013 0.07545013 0.07545013 0.07545013 0.07545013
 0.07545013 0.07545013 0.07545013 0.04084158]
  4: [0.23377338 0.23377338 0.23377338 0.23377338 0.23377338 0.23377338
 0.23377338 0.23377338 0.23377338 0.12376238]
  5: [0.1980198 0.1980198 0.1980198 0.1980198 0.1980198 0.1980198 0.1980198
 0.1980198 0.1980198 0.1980198]
  6: [0.24752475 0.24752475 0.24752475 0.24752475 0.24752475 0.24752475
 0.24752475 0.24752475 0.24752475 0.24752475]
  7: [0.23883888 0.23883888 0.23883888 0.23883888 0.23883888 0.23883888
 0.23883888 0.09207921 0.09207921 0.        ]
  8: [0.330033   0.330033   0.330033   0.330033   0.330033   0.330033
 0.330033   0.330033   0.330033   0.08250825]
  9: [0.330033 0.330033 0.330033 0.330033 0.330033 0.330033 0.330033 0.330033
 0.330033 0.330033]
  10: [0.4950495 0.4950495 0.4950495 0.4950495 0.4950495 0.4950495 0.4950495
 0.4950495 0.4950495 0.4950495]
  11: [0.32850071 0.32850071 0.32850071 0.32850071 0.32850071 0.32850071
 0.32850071 0.32850071 0.12835926 0.04084158]
  12: [0.7029703 0.7029703 0.7029703 0.7029703 0.7029703 0.7029703 0.7029703
 0.7029703 0.7029703 0.       ]
  13: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
  14: [0.4950495 0.4950495 0.4950495 0.4950495 0.4950495 0.4950495 0.4950495
 0.4950495 0.4950495 0.       ]
  15: [0.32850071 0.32850071 0.32850071 0.32850071 0.32850071 0.32850071
 0.32850071 0.32850071 0.20226308 0.0466761 ]
  16: [0.46153408 0.42739274 0.34818482 0.34818482 0.27392739 0.27392739
 0.27392739 0.27392739 0.27392739 0.27392739]
  17: [0.34771334 0.34771334 0.34771334 0.34771334 0.34771334 0.34771334
 0.34771334 0.34771334 0.34771334 0.16184311]
  18: [0.34470678 0.34470678 0.34470678 0.34470678 0.34470678 0.34470678
 0.34470678 0.34470678 0.19455264 0.04158416]
  19: [0.11261932 0.11261932 0.11261932 0.11261932 0.11261932 0.11261932
 0.11261932 0.07263803 0.0090009  0.        ]
  20: [0.76661952 0.76661952 0.76661952 0.76661952 0.76661952 0.76661952
 0.76661952 0.76661952 0.47100424 0.04950495]
  21: [0.62164074 0.62164074 0.62164074 0.62164074 0.62164074 0.62164074
 0.62164074 0.62164074 0.62164074 0.04950495]
  22: [0.83168317 0.83168317 0.83168317 0.83168317 0.83168317 0.83168317
 0.83168317 0.83168317 0.61110325 0.        ]
  23: [0.71047105 0.71047105 0.71047105 0.47794779 0.47794779 0.47794779
 0.47794779 0.47794779 0.08550855 0.04950495]
  24: [0.51815182 0.51815182 0.51815182 0.51815182 0.51815182 0.51815182
 0.51815182 0.51815182 0.51815182 0.43894389]
  25: [0.71239981 0.71239981 0.71239981 0.71239981 0.71239981 0.71239981
 0.71239981 0.71239981 0.71239981 0.5709571 ]
  26: [0.6165732  0.6165732  0.6165732  0.6165732  0.6165732  0.6165732
 0.6165732  0.6165732  0.52602183 0.20660066]
  27: [0.72811567 0.72811567 0.72811567 0.44680182 0.44680182 0.44680182
 0.44680182 0.15558699 0.07260726 0.        ]
  28: [0.74422442 0.74422442 0.74422442 0.74422442 0.74422442 0.5379538
 0.5379538  0.5379538  0.5379538  0.34818482]
  29: [0.77310231 0.77310231 0.77310231 0.77310231 0.77310231 0.77310231
 0.77310231 0.77310231 0.77310231 0.14438944]
  30: [0.74014555 0.74014555 0.74014555 0.74014555 0.74014555 0.74014555
 0.74014555 0.74014555 0.74014555 0.17161716]
  31: [0.55941981 0.55941981 0.55941981 0.55941981 0.55941981 0.55941981
 0.55941981 0.55941981 0.55941981 0.01414427]
  32: [0.47854785 0.47854785 0.47854785 0.47854785 0.47854785 0.47854785
 0.47854785 0.47854785 0.47854785 0.22277228]
  33: [0.42285479 0.42285479 0.42285479 0.42285479 0.42285479 0.42285479
 0.42285479 0.42285479 0.24752475 0.25636492]
  34: [0.2491377  0.2491377  0.2491377  0.2491377  0.2491377  0.2491377
 0.2491377  0.10245912 0.10245912 0.04479019]
  35: [0.08392268 0.08392268 0.08392268 0.08392268 0.08392268 0.08392268
 0.08392268 0.08392268 0.08392268 0.01815182]
  36: [0.44554455 0.44554455 0.44554455 0.44554455 0.44554455 0.44554455
 0.44554455 0.44554455 0.44554455 0.0990099 ]
  37: [0.4950495 0.4950495 0.4950495 0.4950495 0.4950495 0.4950495 0.4950495
 0.4950495 0.4950495 1.       ]
  38: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
  39: [0.44059406 0.44059406 0.44059406 0.44059406 0.44059406 0.44059406
 0.44059406 0.44059406 0.4950495  0.08168317]
  40: [1.         1.         1.         1.         1.         1.
 1.         1.         1.         0.69059406]
  41: [0.17030549 0.17030549 0.17030549 0.17030549 0.17030549 0.17030549
 0.17030549 0.17030549 0.13645595 0.00634679]
  42: [0.8019802 0.8019802 0.8019802 0.8019802 0.8019802 0.8019802 0.8019802
 0.8019802 0.8019802 0.0990099]
  43: [0.91584158 0.91584158 0.80693069 0.80693069 0.80693069 0.80693069
 0.80693069 0.80693069 0.08168317 0.        ]
  44: [0.81848185 0.81848185 0.81848185 0.81848185 0.81848185 0.81848185
 0.81848185 0.81848185 0.81848185 0.1320132 ]
  45: [0.25990099 0.25990099 0.47607261 0.47607261 0.47607261 0.47607261
 0.47607261 0.14232673 0.14232673 0.14232673]
  46: [0.82791136 0.82791136 0.82791136 0.82791136 0.82791136 0.82791136
 0.82791136 0.82791136 0.82791136 0.42479962]
  47: [0.7019802  0.7019802  0.7019802  0.7019802  0.7019802  0.7019802
 0.7019802  0.7019802  0.06534653 0.        ]
  48: [0.09806695 0.09806695 0.09806695 0.09806695 0.09806695 0.09806695
 0.09806695 0.09806695 0.09806695 0.00884017]
  49: [0.62871287 0.62871287 0.62871287 0.62871287 0.62871287 0.62871287
 0.62871287 0.62871287 0.62871287 0.62871287]
  50: [0.71287129 0.71287129 0.71287129 0.71287129 0.71287129 0.71287129
 0.71287129 0.71287129 0.42574257 0.30693069]
  51: [0.12575994 0.12575994 0.12575994 0.12575994 0.12575994 0.12575994
 0.12575994 0.12575994 0.12575994 0.        ]

Small objects:
  MeanAveragePrecisionResult:
  Metric target: MetricTarget.BOXES
  Class agnostic: False
  mAP @ 50:95: 0.0000
  mAP @ 50:    0.0000
  mAP @ 75:    0.0000
  mAP scores: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
  IoU thresh: [0.5  0.55 0.6  0.65 0.7  0.75 0.8  0.85 0.9  0.95]
  AP per class:
    No results
  
Medium objects:
  MeanAveragePrecisionResult:
  Metric target: MetricTarget.BOXES
  Class agnostic: False
  mAP @ 50:95: 0.0000
  mAP @ 50:    0.0000
  mAP @ 75:    0.0000
  mAP scores: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
  IoU thresh: [0.5  0.55 0.6  0.65 0.7  0.75 0.8  0.85 0.9  0.95]
  AP per class:
    No results
  
Large objects:
  MeanAveragePrecisionResult:
  Metric target: MetricTarget.BOXES
  Class agnostic: False
  mAP @ 50:95: 0.4800
  mAP @ 50:    0.5283
  mAP @ 75:    0.5193
  mAP scores: [0.52829363 0.52765895 0.53252578 0.52467807 0.52340872 0.51931504
   0.51931504 0.49847883 0.42658422 0.20021746]
  IoU thresh: [0.5  0.55 0.6  0.65 0.7  0.75 0.8  0.85 0.9  0.95]
  AP per class:
    0: [0.03619003 0.03619003 0.03619003 0.03619003 0.03619003 0.03619003
   0.03619003 0.03619003 0.03619003 0.02657742]
    1: [0.05798595 0.05798595 0.05798595 0.05798595 0.05798595 0.05798595
   0.05798595 0.05798595 0.05798595 0.0256342 ]
    2: [0.04018435 0.04018435 0.04018435 0.04018435 0.04018435 0.04018435
   0.04018435 0.04018435 0.0289021  0.01678454]
    3: [0.11543189 0.11543189 0.11543189 0.11543189 0.11543189 0.11543189
   0.11543189 0.11543189 0.11543189 0.08168317]
    4: [0.26520509 0.26520509 0.26520509 0.26520509 0.26520509 0.26520509
   0.26520509 0.26520509 0.26520509 0.12376238]
    5: [0.24752475 0.24752475 0.24752475 0.24752475 0.24752475 0.24752475
   0.24752475 0.24752475 0.24752475 0.24752475]
    6: [0.330033 0.330033 0.330033 0.330033 0.330033 0.330033 0.330033 0.330033
   0.330033 0.330033]
    7: [0.56831683 0.56831683 0.56831683 0.56831683 0.56831683 0.56831683
   0.56831683 0.4019802  0.4019802  0.        ]
    8: [0.330033   0.330033   0.330033   0.330033   0.330033   0.330033
   0.330033   0.330033   0.330033   0.08250825]
    9: [0.330033 0.330033 0.330033 0.330033 0.330033 0.330033 0.330033 0.330033
   0.330033 0.330033]
    10: [0.4950495 0.4950495 0.4950495 0.4950495 0.4950495 0.4950495 0.4950495
   0.4950495 0.4950495 0.4950495]
    11: [0.62623762 0.62623762 0.62623762 0.62623762 0.62623762 0.62623762
   0.62623762 0.62623762 0.41831683 0.33663366]
    12: [0.75247525 0.75247525 0.75247525 0.75247525 0.75247525 0.75247525
   0.75247525 0.75247525 0.75247525 0.        ]
    13: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
    14: [0.4950495 0.4950495 0.4950495 0.4950495 0.4950495 0.4950495 0.4950495
   0.4950495 0.4950495 0.       ]
    15: [0.57953795 0.57953795 0.57953795 0.57953795 0.57953795 0.57953795
   0.57953795 0.57953795 0.1379538  0.03630363]
    16: [0.44494449 0.41194119 0.33993399 0.33993399 0.27392739 0.27392739
   0.27392739 0.27392739 0.27392739 0.27392739]
    17: [0.37600189 0.37600189 0.37600189 0.37600189 0.37600189 0.37600189
   0.37600189 0.37600189 0.37600189 0.16184311]
    18: [0.33200051 0.33200051 0.33200051 0.33200051 0.33200051 0.33200051
   0.33200051 0.33200051 0.18223141 0.03272827]
    19: [0.34899919 0.34899919 0.34899919 0.34899919 0.34899919 0.34899919
   0.34899919 0.19309074 0.0330033  0.        ]
    20: [0.81188119 0.81188119 0.81188119 0.81188119 0.81188119 0.81188119
   0.81188119 0.81188119 0.5049505  0.04950495]
    21: [0.48797737 0.48797737 0.48797737 0.48797737 0.48797737 0.48797737
   0.48797737 0.48797737 0.48797737 0.04950495]
    22: [0.74257426 0.74257426 0.74257426 0.74257426 0.74257426 0.74257426
   0.74257426 0.74257426 0.5675389  0.        ]
    23: [0.60003143 0.60003143 0.60003143 0.39336791 0.39336791 0.39336791
   0.39336791 0.39336791 0.07150715 0.04950495]
    24: [0.57001414 0.57001414 0.57001414 0.57001414 0.57001414 0.57001414
   0.57001414 0.57001414 0.57001414 0.49080622]
    25: [0.71239981 0.71239981 0.71239981 0.71239981 0.71239981 0.71239981
   0.71239981 0.71239981 0.71239981 0.5709571 ]
    26: [0.60591059 0.60591059 0.60591059 0.60591059 0.60591059 0.60591059
   0.60591059 0.60591059 0.51749175 0.20660066]
    27: [0.59619491 0.59619491 0.59619491 0.39477771 0.39477771 0.39477771
   0.39477771 0.11745292 0.08958039 0.        ]
    28: [0.77062706 0.77062706 0.77062706 0.77062706 0.77062706 0.55775578
   0.55775578 0.55775578 0.55775578 0.36138614]
    29: [0.77310231 0.77310231 0.77310231 0.77310231 0.77310231 0.77310231
   0.77310231 0.77310231 0.77310231 0.14438944]
    30: [0.73443344 0.73443344 0.73443344 0.73443344 0.73443344 0.73443344
   0.73443344 0.73443344 0.73443344 0.16984006]
    31: [0.6039604 0.6039604 0.6039604 0.6039604 0.6039604 0.6039604 0.6039604
   0.6039604 0.6039604 0.0330033]
    32: [0.51980198 0.51980198 0.51980198 0.51980198 0.51980198 0.51980198
   0.51980198 0.51980198 0.51980198 0.24339934]
    33: [0.41136256 0.41136256 0.41136256 0.41136256 0.41136256 0.41136256
   0.41136256 0.41136256 0.25636492 0.24752475]
    34: [0.26440296 0.26440296 0.26440296 0.26440296 0.26440296 0.26440296
   0.26440296 0.11423612 0.11423612 0.04696624]
    35: [0.15470297 0.15470297 0.15470297 0.15470297 0.15470297 0.15470297
   0.15470297 0.15470297 0.15470297 0.04084158]
    36: [0.41254125 0.41254125 0.41254125 0.41254125 0.41254125 0.41254125
   0.41254125 0.41254125 0.41254125 0.12376238]
    37: [1.        1.        1.        1.        1.        1.        1.
   1.        1.        0.4950495]
    38: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
    39: [0.7019802  0.7019802  0.7019802  0.7019802  0.7019802  0.7019802
   0.7019802  0.7019802  0.75643564 0.06534653]
    40: [1.         1.         1.         1.         1.         1.
   1.         1.         1.         0.69059406]
    41: [0.17030549 0.17030549 0.17030549 0.17030549 0.17030549 0.17030549
   0.17030549 0.17030549 0.13645595 0.00634679]
    42: [0.75247525 0.75247525 0.75247525 0.75247525 0.75247525 0.75247525
   0.75247525 0.75247525 0.75247525 0.0990099 ]
    43: [0.80693069 0.80693069 0.91584158 0.91584158 0.91584158 0.91584158
   0.91584158 0.91584158 0.08168317 0.        ]
    44: [0.81848185 0.81848185 0.81848185 0.81848185 0.81848185 0.81848185
   0.81848185 0.81848185 0.81848185 0.1320132 ]
    45: [0.25990099 0.25990099 0.47607261 0.47607261 0.47607261 0.47607261
   0.47607261 0.14232673 0.14232673 0.14232673]
    46: [0.87741631 0.87741631 0.87741631 0.87741631 0.87741631 0.87741631
   0.87741631 0.87741631 0.87741631 0.39179632]
    47: [0.75643564 0.75643564 0.75643564 0.75643564 0.75643564 0.75643564
   0.75643564 0.75643564 0.06534653 0.        ]
    48: [0.09806695 0.09806695 0.09806695 0.09806695 0.09806695 0.09806695
   0.09806695 0.09806695 0.09806695 0.00884017]
    49: [0.61103253 0.61103253 0.61103253 0.61103253 0.61103253 0.61103253
   0.61103253 0.61103253 0.61103253 0.61103253]
    50: [0.64686469 0.64686469 0.64686469 0.64686469 0.64686469 0.64686469
   0.64686469 0.64686469 0.45874587 0.33993399]
    51: [0.42822549 0.42822549 0.42822549 0.42822549 0.42822549 0.42822549
   0.42822549 0.42822549 0.42822549 0.        ]
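Since the mAP above is computed with supervision and every duplicate box counts as a false positive, one stop-gap is to deduplicate the parsed predictions with class-wise NMS before evaluation. The snippet below is only a sketch: `from_lmm` / `with_nms` are the supervision helpers I believe the notebook uses, the class list, resolution, and IoU threshold are placeholders, and the uniform confidence is a workaround because PaliGemma emits no scores.

```python
import numpy as np
import supervision as sv

CLASSES = ["5 of clubs", "6 of clubs", "7 of clubs", "8 of clubs"]  # truncated placeholder list
RESOLUTION_WH = (448, 448)  # placeholder; should match the image size fed to the model

# Truncated stand-in for the raw prediction string shown above.
raw = (
    "<loc0400><loc0516><loc0652><loc0712> 7 of clubs ; "
    "<loc0400><loc0516><loc0648><loc0708> 7 of clubs"
)

# Parse the "<loc_y1><loc_x1><loc_y2><loc_x2> label ;" groups into structured detections.
detections = sv.Detections.from_lmm(
    sv.LMM.PALIGEMMA, raw, resolution_wh=RESOLUTION_WH, classes=CLASSES
)

# PaliGemma emits no scores, so assign every box a uniform confidence before NMS.
detections.confidence = np.ones(len(detections))

# Class-wise NMS collapses the near-identical repeated boxes; 0.5 IoU is an arbitrary choice.
deduplicated = detections.with_nms(threshold=0.5, class_agnostic=False)
print(len(detections), "->", len(deduplicated), "boxes after NMS")
```

This only hides the symptom on the evaluation side; I'd still like to understand why the model never emits EOS on its own.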


Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
