refactor: Prepare nullable column by default #31

Open
wants to merge 4 commits into
base: main
from

Conversation

gab23r
Contributor

@gab23r gab23r commented May 5, 2025

Motivation

towards #20

Changes

Changing to nullable=False by default would be a breaking change. For now, this sets nullable to None and warns if nullable is not explicitly set.
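
A minimal sketch of the sentinel idea, for illustration only. The `Column` class below is a hypothetical stand-in and not dataframely's actual implementation, and falling back to `nullable=True` is an assumption inferred from the statement that switching to `False` would be breaking:

```python
import warnings
from typing import Optional


class Column:
    """Hypothetical stand-in for a column type; not dataframely's real class."""

    def __init__(self, *, nullable: Optional[bool] = None) -> None:
        if nullable is None:
            # None is a sentinel meaning "the caller did not choose a value".
            warnings.warn(
                "The 'nullable' argument was not explicitly set.",
                FutureWarning,
                stacklevel=2,
            )
            # Assumed legacy default, kept until the breaking release.
            nullable = True
        self.nullable = nullable
```

Callers who pass `nullable=True` or `nullable=False` explicitly never see the warning, which is what makes the later change of default safe for them.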

Question: how should we handle the warning in tests? Should I ignore it, or should I explicitly set nullable=True or nullable=False everywhere?

codecov bot commented May 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (54cbc75) to head (0ec0924).
Report is 16 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #31   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           36        39    +3     
  Lines         1788      1869   +81     
=========================================
+ Hits          1788      1869   +81     


@AndreasAlbertQC
Collaborator

Question: how should we handle the warning in tests? Should I ignore it, or should I explicitly set nullable=True or nullable=False everywhere?

I think most of the tests would be fine with nullable=False, so adding the explicit value there now would just mean removing it again soon (and it's a lot of lines).

I would propose to add dataframely/_deprecation.py and do this:

import os
import warnings
from collections.abc import Callable
from functools import wraps

TRUTHY_VALUES = ["1", "true"]


def skip_if(env: str) -> Callable:
    """Decorator to skip warnings based on environment variable.

    If the environment variable is equivalent to any of TRUTHY_VALUES, the wrapped
    function is skipped.
    """
    def decorator(fun: Callable) -> Callable:
        @wraps(fun)
        def wrapper() -> None:
            should_skip = os.getenv(env, "").lower() in TRUTHY_VALUES
            if should_skip:
                return
            fun()
        return wrapper
    return decorator



@skip_if(env="DATAFRAMELY_IGNORE_NULLABLE_DEFAULT")
def warn_nullable_default_change() -> None:
    ...  # the actual warning goes here

and then we can import warn_nullable_default_change for use in Column. I think this is a neat solution because:

  1. It provides a pattern for future deprecations
  2. We can just set the environment variable to True in tests
  3. It provides the user with an easy way to turn off the warnings if they don't care

WDYT?
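
To illustrate points 2 and 3, here is a self-contained sketch of the decorator applied to a concrete warning function. The warning text is shortened to the prefix quoted in this thread, and the `count_warnings` helper and `ENV_VAR` constant are hypothetical names added for the demonstration:

```python
import os
import warnings
from collections.abc import Callable
from functools import wraps

TRUTHY_VALUES = ["1", "true"]
ENV_VAR = "DATAFRAMELY_IGNORE_NULLABLE_DEFAULT"


def skip_if(env: str) -> Callable:
    def decorator(fun: Callable) -> Callable:
        @wraps(fun)
        def wrapper() -> None:
            # The variable is read on every call, so it can be toggled at runtime.
            if os.getenv(env, "").lower() in TRUTHY_VALUES:
                return
            fun()

        return wrapper

    return decorator


@skip_if(env=ENV_VAR)
def warn_nullable_default_change() -> None:
    warnings.warn(
        "The 'nullable' argument was not explicitly set.",
        FutureWarning,
        stacklevel=2,
    )


def count_warnings() -> int:
    """Hypothetical helper: call the warning function and count what is emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        warn_nullable_default_change()
    return len(caught)


os.environ[ENV_VAR] = "True"  # lowercased to "true" before the comparison
muted = count_warnings()
os.environ.pop(ENV_VAR)
audible = count_warnings()
```

Because the check happens inside `wrapper` rather than at import time, a test suite or a deployment can flip the variable without touching any code.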

@gab23r
Contributor Author

gab23r commented May 5, 2025

I don't have a strong opinion, but the more I think about it, the more I think we should go for the simplest solution: ignoring the warning in the tests via ini_options.
I will go forward with this, but I would be happy to revert if necessary.

@AndreasAlbertQC
Collaborator

@gab23r I like my version better, but I'm biased :D Let's get a tie breaker from @delsner or @borchero

@delsner
Member

delsner commented May 9, 2025

I'm in favor of @AndreasAlbertQC's suggestion as I think it's a nice pattern for future deprecations and it's easy to implement and control.

pyproject.toml Outdated
@@ -83,6 +83,7 @@ module = ["pyarrow.*"]
addopts = "--import-mode=importlib"
filterwarnings = [
"ignore:datetime.datetime.utcfromtimestamp\\(\\) is deprecated.*:DeprecationWarning",
"ignore:The 'nullable' argument was not explicitly set.*:FutureWarning",
Member

I don't want our tests to generate warnings. I know this is a bit of work and please let us know if we can support this effort but we should adapt all of our tests 😅

Contributor Author

OK, just to make sure we are aligned before starting the work: you want to explicitly add nullable=False everywhere in the test suite? And then remove it (or not) in the future breaking release, since it will be the default?

Member

I talked to @AndreasAlbertQC about this again yesterday evening and we converged on: don't bother, let's ignore the warning for now. In almost all test cases, it does not matter whether a column is nullable.

Collaborator

@AndreasAlbertQC AndreasAlbertQC left a comment

Hey @gab23r! It took @delsner, @borchero and me a little while to coordinate and agree on how we want to proceed. Sorry for the delay, it's our first time navigating this particular question in dataframely :)

Here's the plan we agreed on:

  1. We commit to making the actual breaking change only in a major release. We documented this commitment in docs: Document our approach to breaking changes #35. The exact timing of the major release is not set yet, but I think it's realistic to expect this in a few months.
  2. For the future warning in this PR, please implement the suggestion I made above for a warning function + environment-based skip. In docs: Document our approach to breaking changes #35, we also documented a new DATAFRAMELY_NO_FUTURE_WARNINGS environment variable that we'd like to use for this. The reasoning is that we want users to be able to disable the warning also at deployment time without having to write new code.
  3. For the dataframely tests, please just disable the warning in this PR. There is no need to migrate the tests. Our reasoning is that while we generally never want to disable warnings, future warnings coming out of our own code are a special case because a) future warnings never mean something is broken right now, only that it might break in the future, and b) our CI will naturally catch actual breakage once the breaking change comes in.
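
As a rough illustration of point 3, the snippet below shows the standard-library equivalent of the `ignore:...:FutureWarning` entry from the pyproject.toml diff earlier in this thread. The `emit` helper and the trailing message text are hypothetical; only the message prefix comes from this PR:

```python
import warnings

MESSAGE = "The 'nullable' argument was not explicitly set"


def emit(message: str) -> int:
    """Emit a FutureWarning under the targeted filter; return how many got through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Added after "always", so it sits at the front of the filter list and wins.
        warnings.filterwarnings(
            "ignore", message=MESSAGE + ".*", category=FutureWarning
        )
        warnings.warn(message, FutureWarning)
    return len(caught)


suppressed = emit(MESSAGE + "; hypothetical trailing text.")
unrelated = emit("Some other future change.")
```

The filter matches on the start of the message, so only this specific warning is silenced and any other `FutureWarning` still surfaces in the test run.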

Let me know if you're still up to implement these changes, otherwise I'm of course happy to help. Thanks for your patience.

@gab23r
Copy link
Contributor Author

gab23r commented May 14, 2025

This sounds really good to me! I will implement it!

Collaborator

@AndreasAlbertQC AndreasAlbertQC left a comment

Thanks @gab23r , almost there, I only have one more suggestion on the test.

Comment on lines +11 to +23
def test_integer_constructor_warns_about_nullable() -> None:
    original_value = os.environ.get("DATAFRAMELY_NO_FUTURE_WARNINGS")
    os.environ["DATAFRAMELY_NO_FUTURE_WARNINGS"] = "0"
    try:
        with pytest.warns(
            FutureWarning, match="The 'nullable' argument was not explicitly set"
        ):
            dy.Integer()
    finally:
        if original_value is not None:
            os.environ["DATAFRAMELY_NO_FUTURE_WARNINGS"] = original_value
        elif "DATAFRAMELY_NO_FUTURE_WARNINGS" in os.environ:
            del os.environ["DATAFRAMELY_NO_FUTURE_WARNINGS"]

Collaborator

Suggested change
def test_column_constructor_warns_about_nullable(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setenv("DATAFRAMELY_NO_FUTURE_WARNINGS", "")
    with pytest.warns(
        FutureWarning, match="The 'nullable' argument was not explicitly set"
    ):
        dy.Integer()


@pytest.mark.parametrize("env_var", ["1", "True", "true"])
def test_future_warning_skip(monkeypatch: pytest.MonkeyPatch, env_var: str) -> None:
    monkeypatch.setenv("DATAFRAMELY_NO_FUTURE_WARNINGS", env_var)
    # Elevates FutureWarning to an exception
    with warnings.catch_warnings():
        warnings.simplefilter("error", FutureWarning)
        dy.Integer()

How about splitting this test into two?
