refactor: Prepare nullable column by default #31

gab23r · 2025-05-05T09:44:07Z

Motivation

towards #20

Changes

Changing to nullable=False by default would be a breaking change. For now, this PR sets nullable to None by default and warns if nullable is not explicitly set.
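A minimal sketch of the approach, assuming a simplified, hypothetical `Column` class rather than dataframely's actual implementation: `None` acts as a sentinel meaning "not passed", the constructor emits a `FutureWarning`, and it then falls back to the current default of `nullable=True`. The warning text is illustrative; only its first sentence matches the filter used later in this PR.

```python
from __future__ import annotations

import warnings


class Column:
    """Hypothetical, simplified column type (not dataframely's real class)."""

    def __init__(self, nullable: bool | None = None) -> None:
        if nullable is None:
            # The caller did not pass `nullable`: warn about the planned change ...
            warnings.warn(
                "The 'nullable' argument was not explicitly set. "
                "Its default will change from True to False in a future release.",
                FutureWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            # ... but keep the current default so nothing breaks today.
            nullable = True
        self.nullable = nullable


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    implicit = Column()  # warns, falls back to nullable=True
    explicit = Column(nullable=False)  # explicit value: no warning

print(implicit.nullable, explicit.nullable, len(caught))  # True False 1
```

Using `None` instead of a real default is what makes "not set" distinguishable from an explicit `nullable=True`, so users who already pass the argument see no warning.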

Question: how should we handle the warnings in tests? Should I ignore them, or should I explicitly set nullable=True / nullable=False everywhere?

codecov · 2025-05-05T09:45:21Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (54cbc75) to head (0ec0924).
Report is 16 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##              main       #31   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           36        39    +3     
  Lines         1788      1869   +81     
=========================================
+ Hits          1788      1869   +81


AndreasAlbertQC · 2025-05-05T10:13:37Z

Question: how should we handle the warnings in tests? Should I ignore them, or should I explicitly set nullable=True / nullable=False everywhere?

I think most of the tests would be fine with nullable=False, so adding the argument explicitly now would mean removing it again soon (and it's a lot of lines).

I would propose adding dataframely/_deprecation.py with the following:

import os
import warnings
from collections.abc import Callable
from functools import wraps

TRUTHY_VALUES = ["1", "true"]


def skip_if(env: str) -> Callable:
    """Decorator to skip warnings based on environment variable.

    If the environment variable is equivalent to any of TRUTHY_VALUES, the wrapped
    function is skipped.
    """
    def decorator(fun: Callable) -> Callable:
        @wraps(fun)
        def wrapper() -> None:
            should_skip = os.getenv(env, "").lower() in TRUTHY_VALUES
            if should_skip:
                return
            fun()
        return wrapper
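    return decorator


@skip_if(env="DATAFRAMELY_IGNORE_NULLABLE_DEFAULT")
def warn_nullable_default_change() -> None:
    # The actual warning goes here (illustrative message):
    warnings.warn("The 'nullable' argument was not explicitly set.", FutureWarning)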

and then we can import warn_nullable_default_change for use in Column. I think this is a neat solution because:

It provides a pattern for future deprecations
We can just set the environment variable to True in tests
It provides the user with an easy way to turn off the warnings if they don't care
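Because the proposed `wrapper` reads the environment variable on every call rather than at import time, tests (or users) can flip it at any point and the next call honors it. A self-contained sketch that repeats the proposed decorator with an illustrative warning body and toggles the variable at runtime; `emitted_warning` is a hypothetical helper added only for the demo.

```python
import os
import warnings
from collections.abc import Callable
from functools import wraps

TRUTHY_VALUES = ["1", "true"]
ENV = "DATAFRAMELY_IGNORE_NULLABLE_DEFAULT"


def skip_if(env: str) -> Callable:
    """Skip the wrapped function if ``env`` is set to a truthy value."""

    def decorator(fun: Callable) -> Callable:
        @wraps(fun)
        def wrapper() -> None:
            # The variable is read on every call, not once at import time.
            if os.getenv(env, "").lower() in TRUTHY_VALUES:
                return
            fun()

        return wrapper

    return decorator


@skip_if(env=ENV)
def warn_nullable_default_change() -> None:
    warnings.warn("The 'nullable' argument was not explicitly set.", FutureWarning)


def emitted_warning() -> bool:
    """Return True if calling the function produced a FutureWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        warn_nullable_default_change()
    return any(issubclass(w.category, FutureWarning) for w in caught)


os.environ.pop(ENV, None)
print(emitted_warning())  # True: variable unset
os.environ[ENV] = "True"
print(emitted_warning())  # False: "True".lower() is in TRUTHY_VALUES
os.environ[ENV] = "0"
print(emitted_warning())  # True: "0" is not in TRUTHY_VALUES
del os.environ[ENV]
```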

WDYT?

gab23r · 2025-05-05T12:08:15Z

I don't have a strong opinion, but the more I think about it, the more I think we should go for the simplest solution: ignoring the warning in the tests via the pytest ini_options.
I will go ahead with this, but would be happy to revert if necessary.

AndreasAlbertQC · 2025-05-06T19:02:18Z

@gab23r I like my version better, but I'm biased :D Let's get a tie breaker from @delsner or @borchero

delsner · 2025-05-09T12:54:59Z

I'm in favor of @AndreasAlbertQC's suggestion as I think it's a nice pattern for future deprecations and it's easy to implement and control.

borchero · 2025-05-09T21:29:02Z

pyproject.toml

@@ -83,6 +83,7 @@ module = ["pyarrow.*"]
 addopts = "--import-mode=importlib"
 filterwarnings = [
  "ignore:datetime.datetime.utcfromtimestamp\\(\\) is deprecated.*:DeprecationWarning",
+  "ignore:The 'nullable' argument was not explicitly set.*:FutureWarning",


I don't want our tests to generate warnings. I know this is a bit of work, and please let us know if we can support this effort, but we should adapt all of our tests 😅

OK, just to make sure we are aligned before starting the work: you want to explicitly add nullable=False everywhere in the test suite? And then remove it (or not) in the future breaking release, since it will be the default?

I talked to @AndreasAlbertQC about this again yesterday evening and we converged on: don't bother, let's ignore the warning for now. In almost all test cases, it does not matter whether a column is nullable.

AndreasAlbertQC

Hey @gab23r! It took @delsner, @borchero and me a little while to coordinate and agree on how we want to proceed. Sorry for the delay, it's our first time navigating this particular question in dataframely :)

Here's the plan we agreed on:

We commit to making the actual breaking change only in a major release. We documented this commitment in docs: Document our approach to breaking changes #35. The exact timing of the major release is not set yet, but I think it's realistic to expect this in a few months.
For the future warning in this PR, please implement the suggestion I made above for a warning function + environment-based skip. In docs: Document our approach to breaking changes #35, we also documented a new DATAFRAMELY_NO_FUTURE_WARNINGS environment variable that we'd like to use for this. The reasoning is that we want users to be able to disable the warning also at deployment time without having to write new code.
For the dataframely tests, please just disable the warning in this PR. There is no need to migrate the tests. Our reasoning is that while we generally never want to disable warnings, future warnings coming out of our own code are a special case because a) future warnings never mean something is broken right now, only might be in the future and b) our CI will naturally catch actual breakage once the breaking change comes in.
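The deployment-time argument can be illustrated without dataframely: since the check runs at call time, the same program emits or suppresses the warning depending only on the environment it is launched with. This sketch runs a tiny stand-in script in a child process; the script is hypothetical, and the accepted values ("1"/"true", case-insensitive) are assumed from the skip_if proposal above rather than taken from #35.

```python
import os
import subprocess
import sys

ENV = "DATAFRAMELY_NO_FUTURE_WARNINGS"

# Hypothetical stand-in for an application that triggers the future warning.
CHILD = """
import os, warnings
if os.getenv("DATAFRAMELY_NO_FUTURE_WARNINGS", "").lower() not in ("1", "true"):
    warnings.warn("The 'nullable' argument was not explicitly set.", FutureWarning)
"""


def run_app(extra_env: dict[str, str]) -> str:
    """Run the stand-in app in a fresh interpreter and return its stderr."""
    env = {k: v for k, v in os.environ.items() if k != ENV}
    env.update(extra_env)
    # -I (isolated mode) ignores PYTHON* variables such as PYTHONWARNINGS,
    # so only our own variable decides whether the warning is shown.
    result = subprocess.run(
        [sys.executable, "-I", "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stderr


default_stderr = run_app({})
disabled_stderr = run_app({ENV: "1"})
print("FutureWarning" in default_stderr, "FutureWarning" in disabled_stderr)  # True False
```

No code changes are needed to silence the warning: setting the variable in the deployment environment is enough.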

Let me know if you're still up to implement these changes, otherwise I'm of course happy to help. Thanks for your patience.

gab23r · 2025-05-14T14:09:51Z

This sounds really good to me! I will implement it!

AndreasAlbertQC

Thanks @gab23r, almost there, I only have one more suggestion on the test.

AndreasAlbertQC · 2025-05-14T16:45:43Z

tests/test_deprecation.py

+def test_integer_constructor_warns_about_nullable() -> None:
+    original_value = os.environ.get("DATAFRAMELY_NO_FUTURE_WARNINGS")
+    os.environ["DATAFRAMELY_NO_FUTURE_WARNINGS"] = "0"
+    try:
+        with pytest.warns(
+            FutureWarning, match="The 'nullable' argument was not explicitly set"
+        ):
+            dy.Integer()
+    finally:
+        if original_value is not None:
+            os.environ["DATAFRAMELY_NO_FUTURE_WARNINGS"] = original_value
+        elif "DATAFRAMELY_NO_FUTURE_WARNINGS" in os.environ:
+            del os.environ["DATAFRAMELY_NO_FUTURE_WARNINGS"]


Suggested change

-def test_integer_constructor_warns_about_nullable() -> None:
-    original_value = os.environ.get("DATAFRAMELY_NO_FUTURE_WARNINGS")
-    os.environ["DATAFRAMELY_NO_FUTURE_WARNINGS"] = "0"
-    try:
-        with pytest.warns(
-            FutureWarning, match="The 'nullable' argument was not explicitly set"
-        ):
-            dy.Integer()
-    finally:
-        if original_value is not None:
-            os.environ["DATAFRAMELY_NO_FUTURE_WARNINGS"] = original_value
-        elif "DATAFRAMELY_NO_FUTURE_WARNINGS" in os.environ:
-            del os.environ["DATAFRAMELY_NO_FUTURE_WARNINGS"]
+def test_column_constructor_warns_about_nullable(monkeypatch: pytest.MonkeyPatch) -> None:
+    monkeypatch.setenv("DATAFRAMELY_NO_FUTURE_WARNINGS", "")
+    with pytest.warns(
+        FutureWarning, match="The 'nullable' argument was not explicitly set"
+    ):
+        dy.Integer()
+
+
+@pytest.mark.parametrize("env_var", ["1", "True", "true"])
+def test_future_warning_skip(monkeypatch: pytest.MonkeyPatch, env_var: str) -> None:
+    monkeypatch.setenv("DATAFRAMELY_NO_FUTURE_WARNINGS", env_var)
+    # Elevates FutureWarning to an exception
+    with warnings.catch_warnings():
+        warnings.simplefilter("error", FutureWarning)
+        dy.Integer()

How about splitting this test into two?
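The `try`/`finally` in the original test exists only to restore `DATAFRAMELY_NO_FUTURE_WARNINGS`; pytest's `monkeypatch.setenv` undoes the change automatically after each test, which is what lets the suggested tests drop that bookkeeping. Outside pytest, the standard library's `unittest.mock.patch.dict` gives the same save-and-restore behavior, so it is swapped in here to keep the sketch self-contained; `maybe_warn` is a hypothetical stand-in for `dy.Integer()`.

```python
import os
import warnings
from unittest import mock

ENV = "DATAFRAMELY_NO_FUTURE_WARNINGS"


def maybe_warn() -> None:
    """Hypothetical stand-in for dy.Integer() called without `nullable`."""
    if os.getenv(ENV, "").lower() not in ("1", "true"):
        warnings.warn("The 'nullable' argument was not explicitly set.", FutureWarning)


def warns() -> bool:
    """Return True if maybe_warn() emitted a FutureWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        maybe_warn()
    return any(issubclass(w.category, FutureWarning) for w in caught)


before = os.environ.get(ENV)

# Counterpart of monkeypatch.setenv(ENV, ""): the warning is emitted.
with mock.patch.dict(os.environ, {ENV: ""}):
    assert warns()

# Counterpart of the parametrized skip test: truthy values silence it.
for value in ["1", "True", "true"]:
    with mock.patch.dict(os.environ, {ENV: value}):
        assert not warns()

# patch.dict restored the environment, so no try/finally was needed.
assert os.environ.get(ENV) == before
```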

nullable default to None

2af9ac0

gab23r requested review from borchero, AndreasAlbertQC and delsner as code owners May 5, 2025 09:44

ignore warning

0ddcc74

borchero reviewed May 9, 2025


AndreasAlbertQC requested changes May 14, 2025


gabriel.g.robin added 2 commits May 14, 2025 14:33

use DATAFRAMELY_NO_FUTURE_WARNINGS

f10c852

add test future warning

0ec0924

AndreasAlbertQC requested changes May 14, 2025


gab23r commented May 5, 2025

codecov bot commented May 5, 2025

AndreasAlbertQC commented May 5, 2025

gab23r commented May 5, 2025

AndreasAlbertQC commented May 6, 2025

delsner commented May 9, 2025

borchero May 9, 2025

gab23r May 14, 2025

borchero May 14, 2025

AndreasAlbertQC left a comment

gab23r commented May 14, 2025

AndreasAlbertQC left a comment

AndreasAlbertQC May 14, 2025
