
Stepwise LR scheduler #20211


Closed

Conversation

01AbhiSingh
Contributor

@01AbhiSingh 01AbhiSingh commented Aug 18, 2024

What does this PR do?

Fixes #17544

Hi @awaelchli. Can you please verify the changes I made? If they are correct, I will also take up and fix any failing tests.

Before submitting
  • Was this discussed/agreed via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

📚 Documentation preview 📚: https://pytorch-lightning--20211.org.readthedocs.build/en/20211/

01AbhiSingh and others added 13 commits July 23, 2024 20:03
for more information, see https://pre-commit.ci

for more information, see https://pre-commit.ci
@github-actions github-actions bot added the pl (Generic label for PyTorch Lightning package) label Aug 18, 2024

codecov bot commented Aug 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79%. Comparing base (ea59e40) to head (337c1c2).
Report is 43 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (ea59e40) and HEAD (337c1c2).

HEAD has 2247 fewer uploads than BASE.

Flag               BASE (ea59e40)   HEAD (337c1c2)
cpu                           525               24
lightning_fabric               67                0
pytest                        266                0
python3.9                     132                6
lightning                     394               18
python3.10                     66                3
python3.11                    132                6
python3.12.7                  195                9
gpu                             2                0
pytorch2.1                     99                9
pytest-full                   261               24
pytorch2.2.2                   33                3
pytorch_lightning              66                6
pytorch2.3                     33                3
pytorch2.4.1                   31                3
pytorch2.5.1                   65                6
Additional details and impacted files
@@            Coverage Diff            @@
##           master   #20211     +/-   ##
=========================================
- Coverage      88%      79%     -9%     
=========================================
  Files         267      264      -3     
  Lines       23380    23325     -55     
=========================================
- Hits        20481    18366   -2115     
- Misses       2899     4959   +2060     

01AbhiSingh and others added 4 commits August 21, 2024 12:10

@01AbhiSingh
Contributor Author

Hi @Borda. Do I need to make any changes to the PR?

@lantiga
Collaborator

lantiga commented Oct 7, 2024

This looks good, thank you for the contribution @01AbhiSingh

Ideally we could add a test to verify the behavior described in #17544. The current test suite doesn't detect this change, which is usually a sign of insufficient coverage. Would you be willing to contribute such a test?

@01AbhiSingh
Contributor Author

Yes, sure let me look into it.

@01AbhiSingh
Contributor Author

Hi @lantiga, do you want a new test written from scratch, or should I make the changes in a preexisting file? All the tests currently pass, so I can't tell which test needs changing; if it's a preexisting file, it would be very helpful if you could point me to the specific test.

@lantiga
Collaborator

lantiga commented Nov 12, 2024

hey @01AbhiSingh sorry for the wait

You can take inspiration from:

def test_lr_scheduler_epoch_step_frequency(mocked_sched, check_val_every_n_epoch, tmp_path):

and add a new test where scheduling goes across epoch boundaries. Maybe @falckt can help too?
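The counting logic such a test would assert can be sketched in plain Python (numbers are illustrative; the real test drives a Trainer with a mocked scheduler):

```python
def expected_scheduler_calls(max_epochs: int, batches_per_epoch: int, frequency: int) -> int:
    """Times an {"interval": "step", "frequency": f} scheduler should fire
    when the step counter runs across epoch boundaries instead of resetting."""
    total_steps = max_epochs * batches_per_epoch
    return total_steps // frequency

# 3 epochs x 7 batches = 21 global steps; frequency 5 fires at steps 5, 10, 15, 20
assert expected_scheduler_calls(3, 7, 5) == 4
# the interesting case is a frequency that does not divide the per-epoch batch
# count: a reset-per-epoch implementation would fire only once per epoch (3 times)
```

Picking a frequency that does not divide the per-epoch batch count is what distinguishes the two behaviors.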

@lantiga
Collaborator

lantiga commented Dec 11, 2024

Hey @01AbhiSingh can you import LightningModule here?

https://github.com/Lightning-AI/pytorch-lightning/pull/20211/files#diff-3c3f104dbdd06271c9e6e6d4fdf61398458148412401dd55a9bac1e9b5f913a8R19

Change:

from lightning.pytorch import Trainer

to

from lightning.pytorch import Trainer, LightningModule

this should fix the failing test

@01AbhiSingh
Contributor Author

Yeah, my bad. Forgot to add it even after seeing it. Done, please check.

@01AbhiSingh
Contributor Author

https://github.com/Lightning-AI/pytorch-lightning/actions/runs/12291356552/job/34299991507?pr=20211#:~:text=FAILED%20utilities/test_data.py%3A%3Atest_update_dataloader_typerror_custom_exception%20%2D%20AssertionError%3A%20Regex%20pattern%20did%20not%20match.

This is the test that is currently failing.

def train_dataloader(self):
    # Create a simple dataset for testing
    x = torch.randn(21, 32)  # 21 samples -> 7 batches of size 3
    y = torch.randn(21, 2)
    return DataLoader(TensorDataset(x, y), batch_size=3)

Should I add this and try running the test again?

@lantiga
Collaborator

lantiga commented Dec 12, 2024

Go for it : )

You can also run this kind of test locally with pytest tests/tests_pytorch/<test_file>.py::<name_of_test> to make things quicker on your end. This test in particular can be run on any machine (and you can use Lightning Studios for free if you want to run on GPUs, of course)

@01AbhiSingh
Contributor Author

01AbhiSingh commented Dec 12, 2024

Go for it : )

You can also run this kind of test locally with pytest tests/tests_pytorch/<test_file>.py::<name_of_test> to make things quicker on your end. This test in particular can be run on any machine (and you can use Lightning Studios for free if you want to run on GPUs, of course)

I actually tried running the test locally with the method you suggested, but this error keeps showing up: ERROR: file or directory not found: tests/tests_pytorch/test_optimizers.py. Anyway, I am trying to solve this problem in my local env.

Edit: I've solved this problem, will now update the PR only when it's running perfectly on my local environment. Thanks :)

Another Edit 😝 : updated the PR please check

01AbhiSingh and others added 8 commits December 12, 2024 19:43

for more information, see https://pre-commit.ci
…pytorch-lightning into stepwiseLRscheduler
for more information, see https://pre-commit.ci
@01AbhiSingh
Contributor Author

The test passes in my local environment but not in the repo's CI for this PR.

@mergify mergify bot added the has conflicts label Feb 3, 2025
@01AbhiSingh
Contributor Author

I think this time it is all done. Can you please check once? @lantiga

Collaborator

@lantiga lantiga left a comment

Looks good, added a couple of comments

trainer.fit(model)

# Debug print statements
print(f"Mocked scheduler step calls: {mocked_sched.call_count}")
Collaborator

Please remove the debug statements; I'd just convert them to asserts that compare the values with the expected ones.

def training_step(self, batch, batch_idx):
    # Add print statement to track batch index and global step
    if hasattr(self, 'trainer'):
        print(f"Batch idx: {batch_idx}, Global step: {self.trainer.global_step}")
Collaborator

Print statements in tests are not super helpful; just use asserts so the test will break if we don't get the expected value here.


# Assert that the scheduler was called the expected number of times
# Allow for a small difference due to environment or rounding discrepancies
assert abs(mocked_sched.call_count - expected_steps) <= 1, (
Collaborator

I'm not sure why there should be rounding discrepancies. Shouldn't this be fully deterministic?
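The count should indeed be exactly computable from the run configuration; a sketch of the arithmetic (hypothetical values):

```python
# quantities fixed up front by the run configuration (illustrative values)
max_epochs = 2
limit_train_batches = 7  # batches per epoch
frequency = 3            # "frequency" in the lr_scheduler config

# every term is a fixed integer, so the expected call count is exact
# and the assertion needs no +/- 1 tolerance
expected_steps = (max_epochs * limit_train_batches) // frequency
assert expected_steps == 4
```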

Contributor Author

Actually, the test was passing in my local environment but not in the CI/CD pipeline for some reason. I forgot to change it later. Let me correct it ASAP.

@mergify mergify bot removed the has conflicts label Feb 3, 2025

stale bot commented Apr 16, 2025

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help, see our docs: https://lightning.ai/docs/pytorch/latest/generated/CONTRIBUTING.html#pull-request or ask for the assistance of a core contributor here or on Discord. Thank you for your contributions.

@stale stale bot added the won't fix (This will not be worked on) label Apr 16, 2025

stale bot commented Apr 27, 2025

This pull request is going to be closed. Please feel free to reopen it or create a new one based on top of the 'master' branch.

@stale stale bot closed this Apr 27, 2025
Labels
pl: Generic label for PyTorch Lightning package
waiting on author: Waiting on user action, correction, or update
won't fix: This will not be worked on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants