Skip to content

feat: add WES models with unit tests #35

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

Karanjot786
Copy link
Member

@Karanjot786 Karanjot786 commented Mar 18, 2025

This PR enhances type safety and data validation for WES by introducing Pydantic models conforming to GA4GH schemas. Comprehensive unit tests ensure integrity and functionality, building consistency with the existing TES models.


Changes

Added Models:

  • wes_models.py:
    • Includes WES-related Pydantic models such as:
      • WESState
      • WESOutputs
      • WESLog
      • WESTaskLog
      • WESRunRequest
      • WESData

Added Unit Tests:

  • test_wes_models.py:
    • Covers:
      • Valid instances
      • Validation errors
      • Edge cases

Updated Package Initialization:

  • __init__.py:
    • Added imports for WES models.
    • Defined __all__ to specify the public API.

Documentation

  • Comprehensive docstrings added for all public classes and methods in wes_models.py to improve readability and maintainability.

Linting and Testing

  • Resolved all linting errors with Ruff:
    • Ensured proper import ordering.
    • Addressed datetime validation to handle strings and datetime objects.
  • Achieved full coverage with successful unit tests.

Attribution

  • Co-authored by @SalihuDickson.
  • Approach based on existing TES models to maintain architectural consistency.

Notes

  • This PR focuses on adding WES models and related unit tests.
  • Follow-up tasks:
    • Integrate these models into wes_converter.py.
    • Add related tests in future PRs for focused reviews.

Summary by Sourcery

Adds Pydantic models for WES (Workflow Execution Service) based on GA4GH schemas, enhancing type safety and data validation. Includes unit tests to ensure model integrity and functionality, maintaining consistency with existing TES models.

New Features:

  • Introduces WES-related Pydantic models including WESState, WESOutputs, WESLog, WESTaskLog, WESRunRequest, and WESData.
  • Adds unit tests for WES models, covering valid instances, validation errors, and edge cases.
  • Defines all in init.py to specify the public API for WES models.
  • Adds comprehensive docstrings for all public classes and methods in wes_models.py to improve readability and maintainability.
  • Adds rfc3339-validator to validate datetime format

Tests:

  • Adds unit tests for WES models, covering valid instances, validation errors, and edge cases.

Copy link

sourcery-ai bot commented Mar 18, 2025

Reviewer's Guide by Sourcery

This PR introduces Pydantic models for WES components, mirroring the structure of existing TES models. It includes comprehensive unit tests to ensure proper validation and functionality. The package initialization was also updated to include the new WES models.

Class diagram for WES Models

classDiagram
    class WESState {
        UNKNOWN
        QUEUED
        INITIALIZING
        RUNNING
        PAUSED
        COMPLETE
        EXECUTOR_ERROR
        SYSTEM_ERROR
        CANCELLED
        CANCELING
        PREEMPTED
    }
    class WESOutputs {
        +str location
        +str name
    }
    class WESLog {
        +Optional[str] name
        +Optional[str] start_time
        +Optional[str] end_time
        +Optional[List[str]] cmd
        +Optional[str] stdout
        +Optional[str] stderr
        +Optional[int] exit_code
        +Optional[List[str]] system_logs
        +validate_datetime()
        +truncate_system_logs()
    }
    class WESTaskLog {
        +str id
        +Optional[str] tes_uri
        +str name
    }
    WESLog <|-- WESTaskLog
    class WESRunRequest {
        +dict[str, str] workflow_params
        +str workflow_type
        +str workflow_type_version
        +Optional[dict[str, str]] tags
        +Optional[dict[str, str]] workflow_engine_parameters
        +Optional[str] workflow_engine
        +Optional[str] workflow_engine_version
        +str workflow_url
        +validate_workflow_engine()
    }
    class WESData {
        +str run_id
        +Optional[WESRunRequest] request
        +Optional[WESState] state
        +Optional[WESLog] run_log
        +Optional[str] task_logs_url
        +Optional[List[Union[WESLog, WESTaskLog]]] task_logs
        +dict[str, str] outputs
        +check_deprecated_fields()
    }
    WESData -- WESRunRequest : has
    WESData -- WESState : has
    WESData -- WESLog : has
    WESData -- WESTaskLog : has

    note for WESLog "start_time and end_time must be in RFC 3339 format"
    note for WESData "task_logs is deprecated, use task_logs_url instead"
Loading

File-Level Changes

Change Details Files
Addition of Pydantic models for WES (Workflow Execution Service) components, mirroring the structure of existing TES (Task Execution Service) models.
  • Created WESState enum representing workflow states.
  • Defined WESOutputs model for workflow outputs.
  • Implemented WESLog model for workflow run logs.
  • Created WESTaskLog model for individual task logs.
  • Defined WESRunRequest model for workflow run requests.
  • Implemented WESData model to encapsulate WES run data.
crategen/models/wes_models.py
Implementation of unit tests for the newly added WES models, ensuring proper validation and functionality.
  • Developed tests for valid instances of each model.
  • Implemented tests for validation errors and edge cases.
  • Ensured comprehensive coverage of all model functionalities.
tests/unit/test_wes_models.py
Update package initialization to include the new WES models.
  • Added imports for all WES models.
  • Updated __all__ to include WES models in the public API.
crategen/models/__init__.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!
  • Generate a plan of action for an issue: Comment @sourcery-ai plan on
    an issue to generate a plan of action for it.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Karanjot786 - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider using datetime.datetime objects instead of strings for start_time and end_time in the WESLog model to improve type safety.
  • The truncate_system_logs validator in WESLog could be simplified using value[:MAX_SYSTEM_LOGS] without the conditional.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟡 Testing: 2 issues found
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

assert wes_data.run_log == run_log
assert wes_data.task_logs_url.startswith("https://")
assert wes_data.task_logs == [task_log]
assert all(url.startswith("https://") for url in wes_data.outputs.values())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Consider testing different URL schemes in outputs

The current test assumes all URLs in outputs start with "https://". It would be more robust to test with different URL schemes (e.g., "s3://", "file://") or even relative paths to ensure the model handles them correctly. This would improve test coverage for different storage scenarios.

@Karanjot786 Karanjot786 requested review from ahembal and uniqueg and removed request for ahembal March 18, 2025 06:13
@uniqueg
Copy link
Member

uniqueg commented Mar 19, 2025

Thanks a lot @Karanjot786 - please address the Sourcery AI comments, they are all reasonable, then ask me and @SalihuDickson for a re-review 🙏

@Karanjot786
Copy link
Member Author

@uniqueg @SalihuDickson - Could you please review the changes?

@uniqueg uniqueg requested a review from SalihuDickson March 24, 2025 06:16
Copy link
Member

@uniqueg uniqueg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Karanjot786. Looks okay to me - but I would prefer for @SalihuDickson to have a look as well first, before merging.

@SalihuDickson
Copy link
Contributor

Thanks @Karanjot786. Looks okay to me - but I would prefer for @SalihuDickson to have a look as well first, before merging.

Sure, @uniqueg. Sorry for the delayed response. I've been extremely busy, but I'll take a look at it in the next day or two.

MAX_SYSTEM_LOGS = 2


class WESState(str, Enum):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the class names i think its best to just use the exact names listed in the WES docs. This would ensure that the references match exactly and make all this much simpler to read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll update the class names so they exactly match the names used in the official WES documentation. This change will improve readability and ensure consistency with the reference.


- **UNKNOWN**: The state of the workflow is unknown.
- **QUEUED**: The workflow is queued.
- **INITIALIZING**: The workflow is initializing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment I made on the class names applies here as well. In the instances where a short and clear description has already been made available in the docs, there is really no need to write something different, you should just copy the first line or first sentence, like you did with the UNKNOWN and QUEUED attributes above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll modify the descriptions for the enum values to use the first sentence from the docs directly. This way, the descriptions for states like UNKNOWN, QUEUED, and INITIALIZING will be consistent with the official definitions.

PREEMPTED = "PREEMPTED"


class WESOutputs(BaseModel):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Please adjust the class name here as well, to match with the one listed in the reference docs.
  2. Where did you get the attributes for the output? The reference you cited seems to be made up. And the output specification does not match what is in the docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that the reference for outputs seems off. I will review the official WES docs and adjust the class name and its attributes accordingly. The updated model will correctly reflect the output specifications as per the reference.


- **name** (`Optional[str]`): The task or workflow name.
- **cmd** (`Optional[list[str]]`): The command line that was executed.
- **start_time** (`Optional[str]`): When the command started executing, in RFC 3339 format.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the description in the docs, we don't want conflicting descriptions. Do the same for all the descriptions for the attributes here that don't match.

But please note, you don't need to copy the entire description, just the first sentence should be okay. Unless the first sentence is not descriptive enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all models (e.g., WESLog, WESTaskLog, etc.), I’ll replace my current descriptions with the first sentence from the official documentation. This ensures no conflicts arise between our descriptions and the official docs.

name: str


class WESLog(BaseModel):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please adjust the class name to match the reference in the docs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the WESTaskLog class name to match exactly what is specified in the WES reference documentation.

name: str = Field(...)


class WESRunRequest(BaseModel):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class should match the name in the listed reference

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll verify name is required and that other fields conform exactly to the WES specification.


**Attributes:**

- **workflow_params** (`Optional[dict[str, str]]`): The workflow run parameterizations (JSON encoded),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does the description say optional when it is required?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll correct the docstring for workflow_params to reflect that it’s a required field.

return values


class WESData(BaseModel):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjust the class name to match the name of the model as listed in the reference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll ensure the root validators properly return values and follow best practices.

This attribute is deprecated.
- **outputs** (`dict[str, str]`): The outputs of the WES run.

**Reference:** https://ga4gh.github.io/workflow-execution-service-schemas/docs/#tag/run_model
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this reference seems to be made up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll rename WESData to match the docs and revise the outputs description if necessary.


**Attributes:**

- **run_id** (`str`): The unique identifier for the WES run.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with the descriptions applies here as well. Check the descriptions for all the other attributes as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll confirm that run_id aligns with the official specs and update if needed.

Copy link
Contributor

@SalihuDickson SalihuDickson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work so far @Karanjot786, please take a look at these changes and resubmit for review.

Do not forget to ask for clarification if there is something you don't understand or that you are not sure of. Making changes you're unsure of will only create more work for reviewers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should only be testing the validation functions. Most of these tests are testing pydantic itself, you don't want to be testing external packages, such tests are completely unnecessary and considered bad practice.

Please rewrite this module and submit for another review.

@Karanjot786
Copy link
Member Author

Hi @SalihuDickson and @uniqueg,

I've rewritten the module based on your feedback.

Update WES Models to Conform to GA4GH Spec

  • Renamed classes to match official documentation:

    • WESStateState
    • WESLogLog
    • WESTaskLogTaskLog
    • WESRunRequestRunRequest
    • Added new Run model to represent a workflow run.
  • Updated docstrings to use concise, one-sentence descriptions directly from the GA4GH WES documentation.

  • Removed truncation logic for system_logs since the spec does not impose any limits.

  • Ensured RFC 3339 datetime validation is correctly implemented in Log.

  • Added a root validator in RunRequest to enforce that workflow_engine is provided when workflow_engine_version is set.

  • Updated tests in test_wes_models.py to focus on our custom validation and model behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants