Fix tie breaking and handle missing evals #8

piojanu · 2025-02-25T11:08:53Z

Previously, tie breaking could pick a suboptimal run (e.g. one with the best cost but an invalid program).
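A minimal sketch of the intended ordering, not the actual code in calculate_metrics.py. The `RunResult` fields (`success`, `valid_program`, `cost`) and the exact priority order are assumptions for illustration. The point is that cost should only break ties among runs that are otherwise equal:

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    # Hypothetical summary of one run on one task; field names are assumptions.
    success: bool
    valid_program: bool
    cost: float


def pick_best_run(runs):
    """Return the index of the best run.

    Assumed ordering: success first, then program validity, then lower cost.
    Cost only decides between runs that tie on the first two criteria.
    """
    return max(
        range(len(runs)),
        key=lambda i: (runs[i].success, runs[i].valid_program, -runs[i].cost),
    )


runs = [
    RunResult(success=False, valid_program=False, cost=0.01),  # cheapest, but invalid
    RunResult(success=False, valid_program=True, cost=0.05),
]
best = pick_best_run(runs)  # selects the valid program despite its higher cost
```

Comparing on cost alone would select the first run here, which is the failure mode described above.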

Missing evals can occur when the Docker-based evaluation run fails (see the examples in my comments below).

ronch99 · 2025-02-25T15:39:35Z

Hi @piojanu, thanks for bringing this to my attention! It looks like the script is an earlier development version that even has the variable names mixed up. I have pushed our correct script for this calculation, which aligns with your fix here.

However, I am not sure about the missing evals part. There isn't supposed to be any missing evaluation log. Would you mind explaining what issue you had with this part and why you want to add it? Thanks!

piojanu · 2025-02-28T07:54:27Z

I didn't debug it thoroughly, but whenever a Docker instance eval doesn't finish, an empty line is written for that eval in evaluations.jsonl. I suspect it has something to do with the visual judge timeout or the program execution timeout.
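As a rough sketch of handling such empty lines, assuming one line per task instance and a `success_rate` key in each record (both are assumptions, not confirmed by the script itself), blank lines can be kept as placeholders and counted as failures:

```python
import json


def load_evals(lines):
    """Parse JSONL lines, keeping exactly one entry per input line.

    Blank lines (evaluations that never finished) become None so that
    indices still line up with the task instances.
    """
    evals = []
    for line in lines:
        line = line.strip()
        evals.append(json.loads(line) if line else None)
    return evals


def is_success(entry):
    # A missing evaluation counts as a failure; "success_rate" is an assumed key.
    return bool(entry and entry.get("success_rate"))


sample = ['{"success_rate": 1}', "", '{"success_rate": 0}']
evals = load_evals(sample)
flags = [is_success(e) for e in evals]
```

Keeping a `None` placeholder rather than skipping the line preserves the alignment between evaluation records and task instances.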

piojanu · 2025-03-03T12:49:15Z

@ronch99 Here is an example error that causes the evaluation to fail and produce an empty line in evaluations.jsonl:

Traceback (most recent call last):
  File "/opt/miniconda3/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1863, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/opt/miniconda3/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/opt/miniconda3/lib/python3.10/site-packages/transformers/models/roberta/modeling_roberta.py", line 43, in <module>
    from ...modeling_utils import PreTrainedModel
  File "/opt/miniconda3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 53, in <module>
    from .loss.loss_utils import LOSS_MAPPING
  File "/opt/miniconda3/lib/python3.10/site-packages/transformers/loss/loss_utils.py", line 19, in <module>
    from .loss_deformable_detr import DeformableDetrForObjectDetectionLoss, DeformableDetrForSegmentationLoss
  File "/opt/miniconda3/lib/python3.10/site-packages/transformers/loss/loss_deformable_detr.py", line 4, in <module>
    from ..image_transforms import center_to_corners_format
  File "/opt/miniconda3/lib/python3.10/site-packages/transformers/image_transforms.py", line 22, in <module>
    from .image_utils import (
  File "/opt/miniconda3/lib/python3.10/site-packages/transformers/image_utils.py", line 84, in <module>
    import cv2
  File "/opt/miniconda3/lib/python3.10/site-packages/cv2/__init__.py", line 181, in <module>
    bootstrap()
  File "/opt/miniconda3/lib/python3.10/site-packages/cv2/__init__.py", line 153, in bootstrap
    native_module = importlib.import_module("cv2")
  File "/opt/miniconda3/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libGL.so.1: cannot open shared object file: No such file or directory

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/testbed/compute_scores.py", line 86, in <module>
    main()
  File "/testbed/compute_scores.py", line 78, in main
    output_tuple = compute_scores(example, eval_program_path, gold_program_path)
  File "/testbed/compute_scores.py", line 31, in compute_scores
    cbs = codebert_score(example['task_inst'], pred_fname, gold_fname)
  File "/testbed/compute_scores.py", line 23, in codebert_score
    return code_bert_score.score(cands=[pred_program], refs=[gold_program], lang='python', sources=[task_inst])[2].item()
  File "/opt/miniconda3/lib/python3.10/site-packages/code_bert_score/score.py", line 112, in score
    model = get_model(model_type, num_layers, all_layers)
  File "/opt/miniconda3/lib/python3.10/site-packages/code_bert_score/utils.py", line 145, in get_model
    model = AutoModel.from_pretrained(model_type)
  File "/opt/miniconda3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    model_class = _get_model_class(config, cls._model_mapping)
  File "/opt/miniconda3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 388, in _get_model_class
    supported_models = model_mapping[type(config)]
  File "/opt/miniconda3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 763, in __getitem__
    return self._load_attr_from_module(model_type, model_name)
  File "/opt/miniconda3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 777, in _load_attr_from_module
    return getattribute_from_module(self._modules[module_name], attr)
  File "/opt/miniconda3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 693, in getattribute_from_module
    if hasattr(module, attr):
  File "/opt/miniconda3/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1851, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/opt/miniconda3/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1865, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.roberta.modeling_roberta because of the following error (look up to see its traceback):
libGL.so.1: cannot open shared object file: No such file or directory
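The traceback shows `import cv2` failing because `libGL.so.1` cannot be loaded inside the container. A small diagnostic sketch (an illustration, not part of the benchmark code) can check whether the linker is able to locate the library before the evaluation starts:

```python
import ctypes.util


def shared_library_available(name):
    """Return True if the system can locate lib<name> (e.g. "GL" for libGL)."""
    return ctypes.util.find_library(name) is not None


# On Linux, "GL" corresponds to libGL.so.*; False suggests cv2 will fail to import.
has_libgl = shared_library_available("GL")
```

This only reports whether the library can be found; it does not guarantee that it loads. Whether the right fix is a system package in the image or a headless OpenCV build is not settled in this thread.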

Here is the example program generated by llama-70b that produced the error:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from biosppy.signals.ecg import ecg
from BioPsyKit import EcgProcessor

# Load ECG data and sampling rate
ecg_data = pd.read_pickle('benchmark/datasets/ecg_processing_data/ecg_data.pkl')
sampling_rate = float(open('benchmark/datasets/ecg_processing_data/sampling_rate.txt').read())

# Perform R peak detection and outlier correction
r_peaks, = ecg.ecg(signal=ecg_data['ecg'].values, sampling_rate=sampling_rate, show=False)

# Create EcgProcessor object
processor = EcgProcessor(r_peaks=r_peaks, sampling_rate=sampling_rate)

# Divide data into two phases
phase1_start = pd.to_datetime('2019-10-23 12:32:00+02:00')
phase1_end = pd.to_datetime('2019-10-23 12:35:00+02:00')
phase2_start = pd.to_datetime('2019-10-23 12:35:00+02:00')
phase2_end = pd.to_datetime('2019-10-23 12:38:00+02:00')

phase1_data = ecg_data[(ecg_data['time'] >= phase1_start) & (ecg_data['time'] <= phase1_end)]
phase2_data = ecg_data[(ecg_data['time'] >= phase2_start) & (ecg_data['time'] <= phase2_end)]

# Compute HRV for the second phase
hrv = processor.hrv_process(signal=phase2_data['ecg'].values)

# Create HRV plot
plt.figure(figsize=(10, 6))
plt.plot(hrv)
plt.xlabel('Time (s)')
plt.ylabel('Heart Rate Variability')
plt.title('HRV Plot for Phase 2')
plt.grid(True)
plt.savefig('pred_results/ecg_processing_vis2_pred_result.png', bbox_inches='tight')
plt.close()

ronch99 · 2025-03-03T15:52:48Z

@piojanu Thanks for the details! Have you tried changing the Dockerfile according to this post?

@flyhero99 or @btyu may be able to do a quick check as well.

piojanu · 2025-03-04T11:17:06Z

@ronch99 Nope, I didn't.

Here is another error, this time during the instance docker build:

...
2025-03-04 00:23:59,352 - INFO - Collecting osgeo (from -r /testbed/instance_requirements.txt (line 2))
2025-03-04 00:23:59,560 - INFO - Downloading osgeo-0.0.1.tar.gz (1.2 kB)
2025-03-04 00:23:59,562 - INFO - Preparing metadata (setup.py): started
2025-03-04 00:23:59,665 - INFO - Preparing metadata (setup.py): finished with status 'done'
2025-03-04 00:23:59,668 - INFO - Building wheels for collected packages: osgeo
2025-03-04 00:23:59,668 - INFO - Building wheel for osgeo (setup.py): started
2025-03-04 00:23:59,785 - INFO - Building wheel for osgeo (setup.py): finished with status 'error'
2025-03-04 00:23:59,788 - INFO - error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [47 lines of output]
      running bdist_wheel
      running build
      /opt/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:79: SetuptoolsDeprecationWarning: setup.py install is deprecated.
      !!

              ********************************************************************************
              Please avoid running ``setup.py`` directly.
              Instead, use pypa/build, pypa/installer or other
              standards-based tools.

              See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
              ********************************************************************************

      !!
        self.initialize_options()
      installing to build/bdist.linux-x86_64/wheel
      running install
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-8is3hcy_/osgeo_e82f99b8b68d4f92a473181274054635/setup.py", line 32, in <module>
          setup(
        File "/opt/miniconda3/lib/python3.10/site-packages/setuptools/__init__.py", line 117, in setup
          return distutils.core.setup(**attrs)
        File "/opt/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 186, in setup
          return run_commands(dist)
        File "/opt/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 202, in run_commands
          dist.run_commands()
        File "/opt/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 983, in run_commands
          self.run_command(cmd)
        File "/opt/miniconda3/lib/python3.10/site-packages/setuptools/dist.py", line 999, in run_command
          super().run_command(command)
        File "/opt/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 1002, in run_command
          cmd_obj.run()
        File "/opt/miniconda3/lib/python3.10/site-packages/setuptools/command/bdist_wheel.py", line 414, in run
          self.run_command("install")
        File "/opt/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 339, in run_command
          self.distribution.run_command(command)
        File "/opt/miniconda3/lib/python3.10/site-packages/setuptools/dist.py", line 999, in run_command
          super().run_command(command)
        File "/opt/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 1002, in run_command
          cmd_obj.run()
        File "/tmp/pip-install-8is3hcy_/osgeo_e82f99b8b68d4f92a473181274054635/setup.py", line 27, in run
          raise Exception(error_msg)
      Exception: In order to be able to run `from osgeo import gdal`,
      You were probably trying to install `gdal` by running `pip install osgeo`.
      Instead, you should either `pip install gdal` or replace `osgeo` with `gdal` in your requirements.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
2025-03-04 00:23:59,788 - INFO - ERROR: Failed building wheel for osgeo
2025-03-04 00:23:59,788 - INFO - Running setup.py clean for osgeo
2025-03-04 00:23:59,865 - INFO - Failed to build osgeo
2025-03-04 00:24:00,146 - INFO - ERROR: Failed to build installable wheels for some pyproject.toml based projects (osgeo)
2025-03-04 00:24:00,303 - INFO - ---> Removed intermediate container 2565c28f78cf
2025-03-04 00:24:00,304 - ERROR - Error: The command '/bin/sh -c /opt/miniconda3/bin/pip install --exists-action i --no-cache-dir -r /testbed/instance_requirements.txt' returned a non-zero code: 1
2025-03-04 00:24:00,304 - ERROR - docker.errors.BuildError during sab.eval.x86_64.53:latest: The command '/bin/sh -c /opt/miniconda3/bin/pip install --exists-action i --no-cache-dir -r /testbed/instance_requirements.txt' returned a non-zero code: 1
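The build log itself states the cause: `osgeo` is an import name, not the package to install, and the requirements should list `gdal` instead. A minimal sketch of translating import names to pip package names is below. How instance_requirements.txt is actually generated is not shown in this thread, so `IMPORT_TO_PIP` and `to_requirements` are hypothetical names:

```python
# Import names whose PyPI distribution has a different name.
# Only the osgeo -> gdal entry comes from the build log above; the other
# entries are well-known examples added for illustration.
IMPORT_TO_PIP = {
    "osgeo": "gdal",
    "cv2": "opencv-python",
    "sklearn": "scikit-learn",
    "PIL": "Pillow",
}


def to_requirements(import_names):
    """Map top-level import names to pip package names, dropping duplicates."""
    packages = []
    for name in import_names:
        package = IMPORT_TO_PIP.get(name, name)
        if package not in packages:
            packages.append(package)
    return packages


requirements = to_requirements(["numpy", "osgeo"])
```

Installing `gdal` from pip generally also needs the GDAL system libraries in the image, so the mapping alone may not be enough to make the build succeed.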

Here is the program generated by llama-70b:

import os
import numpy as np
from osgeo import gdal

# Define the classification dictionaries
landCover_classification = {
  11:10,#Open Water
  21:8,#Developed, Open Space
  22:7,#Developed, Low Intensity
  23:8,#Developed, Medium Intensity
  24:9,#Developed, High Intensity
  31:6,#Barren Land
  41:2,#Deciduous Forest
  42:1,#Evergreen Forest
  43:2,#Mixed Forest
  52:3,#Shrub/Scrub
  71:3,#Grassland/Herbaceous
  72:3,#Sedge/Herbaceous
  81:4,#Hay/Pasture
  82:6,#Cultivated Crops
  90:4,#Woody Wetlands
  95:4,#Emergent Herbaceous Wetlands
  255:10
}

protected_status_classification = {
  0:1,
  1:3,
  2:6,
  3:9,
  4:10,
  255:10
}

# Define the input and output file paths
input_landCover_file = 'benchmark/datasets/MountainLionNew/landCover.tif'
input_protected_status_file = 'benchmark/datasets/MountainLionNew/Protected_Status.tif'
output_landCover_file = 'pred_results/landCover_reclassified.tif'
output_protected_status_file = 'pred_results/protected_status_reclassified.tif'

# Open the input files
landCover_dataset = gdal.Open(input_landCover_file)
protected_status_dataset = gdal.Open(input_protected_status_file)

# Get the input data
landCover_data = landCover_dataset.GetRasterBand(1).ReadAsArray()
protected_status_data = protected_status_dataset.GetRasterBand(1).ReadAsArray()

# Reclassify the data
reclassified_landCover_data = np.vectorize(landCover_classification.get)(landCover_data)
reclassified_protected_status_data = np.vectorize(protected_status_classification.get)(protected_status_data)

# Create output files
driver = gdal.GetDriverByName('GTiff')
output_landCover_dataset = driver.CreateCopy(output_landCover_file, landCover_dataset, 1, [gdal.GDT_Byte])
output_protected_status_dataset = driver.CreateCopy(output_protected_status_file, protected_status_dataset, 1, [gdal.GDT_Byte])

# Write the reclassified data to the output files
output_landCover_dataset.GetRasterBand(1).WriteArray(reclassified_landCover_data)
output_protected_status_dataset.GetRasterBand(1).WriteArray(reclassified_protected_status_data)

# Close the datasets
landCover_dataset = None
protected_status_dataset = None
output_landCover_dataset = None
output_protected_status_dataset = None

Fix tie breaking and handle missing evals

5ce8705

piojanu mentioned this pull request Feb 25, 2025

Bug in the best run selection in calculate_metrics.py #9

Open
