-
Notifications
You must be signed in to change notification settings - Fork 2
feat: add WES models with unit tests #35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Reviewer's Guide by SourceryThis PR introduces Pydantic models for WES components, mirroring the structure of existing TES models. It includes comprehensive unit tests to ensure proper validation and functionality. The package initialization was also updated to include the new WES models. Class diagram for WES ModelsclassDiagram
class WESState {
UNKNOWN
QUEUED
INITIALIZING
RUNNING
PAUSED
COMPLETE
EXECUTOR_ERROR
SYSTEM_ERROR
CANCELLED
CANCELING
PREEMPTED
}
class WESOutputs {
+str location
+str name
}
class WESLog {
+Optional[str] name
+Optional[str] start_time
+Optional[str] end_time
+Optional[List[str]] cmd
+Optional[str] stdout
+Optional[str] stderr
+Optional[int] exit_code
+Optional[List[str]] system_logs
+validate_datetime()
+truncate_system_logs()
}
class WESTaskLog {
+str id
+Optional[str] tes_uri
+str name
}
WESLog <|-- WESTaskLog
class WESRunRequest {
+dict[str, str] workflow_params
+str workflow_type
+str workflow_type_version
+Optional[dict[str, str]] tags
+Optional[dict[str, str]] workflow_engine_parameters
+Optional[str] workflow_engine
+Optional[str] workflow_engine_version
+str workflow_url
+validate_workflow_engine()
}
class WESData {
+str run_id
+Optional[WESRunRequest] request
+Optional[WESState] state
+Optional[WESLog] run_log
+Optional[str] task_logs_url
+Optional[List[Union[WESLog, WESTaskLog]]] task_logs
+dict[str, str] outputs
+check_deprecated_fields()
}
WESData -- WESRunRequest : has
WESData -- WESState : has
WESData -- WESLog : has
WESData -- WESTaskLog : has
note for WESLog "start_time and end_time must be in RFC 3339 format"
note for WESData "task_logs is deprecated, use task_logs_url instead"
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @Karanjot786 - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider using
datetime.datetime
objects instead of strings forstart_time
andend_time
in theWESLog
model to improve type safety. - The
truncate_system_logs
validator inWESLog
could be simplified usingvalue[:MAX_SYSTEM_LOGS]
without the conditional.
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟡 Testing: 2 issues found
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
tests/unit/test_wes_models.py
Outdated
assert wes_data.run_log == run_log | ||
assert wes_data.task_logs_url.startswith("https://") | ||
assert wes_data.task_logs == [task_log] | ||
assert all(url.startswith("https://") for url in wes_data.outputs.values()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Consider testing different URL schemes in outputs
The current test assumes all URLs in outputs
start with "https://". It would be more robust to test with different URL schemes (e.g., "s3://", "file://") or even relative paths to ensure the model handles them correctly. This would improve test coverage for different storage scenarios.
Thanks a lot @Karanjot786 - please address the Sourcery AI comments, they are all reasonable, then ask me and @SalihuDickson for a re-review 🙏 |
@uniqueg @SalihuDickson - Could you please review the changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Karanjot786. Looks okay to me - but I would prefer for @SalihuDickson to have a look as well first, before merging.
Sure, @uniqueg. Sorry for the delayed response. I've been extremely busy, but I'll take a look at it in the next day or two. |
crategen/models/wes_models.py
Outdated
MAX_SYSTEM_LOGS = 2 | ||
|
||
|
||
class WESState(str, Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the class names i think its best to just use the exact names listed in the WES docs. This would ensure that the references match exactly and make all this much simpler to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll update the class names so they exactly match the names used in the official WES documentation. This change will improve readability and ensure consistency with the reference.
crategen/models/wes_models.py
Outdated
|
||
- **UNKNOWN**: The state of the workflow is unknown. | ||
- **QUEUED**: The workflow is queued. | ||
- **INITIALIZING**: The workflow is initializing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment I made on the class names applies here as well. In the instances where a short and clear description has already been made available in the docs, there is really no need to write something different, you should just copy the first line or first sentence, like you did with the UNKNOWN and QUEUED attributes above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll modify the descriptions for the enum values to use the first sentence from the docs directly. This way, the descriptions for states like UNKNOWN, QUEUED, and INITIALIZING will be consistent with the official definitions.
crategen/models/wes_models.py
Outdated
PREEMPTED = "PREEMPTED" | ||
|
||
|
||
class WESOutputs(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Please adjust the class name here as well, to match with the one listed in the reference docs.
- Where did you get the attributes for the output? The reference you cited seems to be made up. And the output specification does not match what is in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that the reference for outputs seems off. I will review the official WES docs and adjust the class name and its attributes accordingly. The updated model will correctly reflect the output specifications as per the reference.
crategen/models/wes_models.py
Outdated
|
||
- **name** (`Optional[str]`): The task or workflow name. | ||
- **cmd** (`Optional[list[str]]`): The command line that was executed. | ||
- **start_time** (`Optional[str]`): When the command started executing, in RFC 3339 format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the description in the docs, we don't want conflicting descriptions. Do the same for all the descriptions for the attributes here that don't match.
But please note, you don't need to copy the entire description, just the first sentence should be okay. Unless the first sentence is not descriptive enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all models (e.g., WESLog, WESTaskLog, etc.), I’ll replace my current descriptions with the first sentence from the official documentation. This ensures no conflicts arise between our descriptions and the official docs.
crategen/models/wes_models.py
Outdated
name: str | ||
|
||
|
||
class WESLog(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please adjust the class name to match the reference in the docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update the WESTaskLog
class name to match exactly what is specified in the WES reference documentation.
crategen/models/wes_models.py
Outdated
name: str = Field(...) | ||
|
||
|
||
class WESRunRequest(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class should match the name in the listed reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll verify name
is required and that other fields conform exactly to the WES specification.
crategen/models/wes_models.py
Outdated
|
||
**Attributes:** | ||
|
||
- **workflow_params** (`Optional[dict[str, str]]`): The workflow run parameterizations (JSON encoded), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does the description say optional when it is required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll correct the docstring for workflow_params
to reflect that it’s a required field.
crategen/models/wes_models.py
Outdated
return values | ||
|
||
|
||
class WESData(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adjust the class name to match the name of the model as listed in the reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll ensure the root validators properly return values and follow best practices.
crategen/models/wes_models.py
Outdated
This attribute is deprecated. | ||
- **outputs** (`dict[str, str]`): The outputs of the WES run. | ||
|
||
**Reference:** https://ga4gh.github.io/workflow-execution-service-schemas/docs/#tag/run_model |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this reference seems to be made up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll rename WESData
to match the docs and revise the outputs
description if necessary.
crategen/models/wes_models.py
Outdated
|
||
**Attributes:** | ||
|
||
- **run_id** (`str`): The unique identifier for the WES run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with the descriptions applies here as well. Check the descriptions for all the other attributes as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll confirm that run_id
aligns with the official specs and update if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work so far @Karanjot786, please take a look at these changes and resubmit for review.
Do not forget to ask for clarification if there is something you don't understand or that you are not sure of. Making changes you're unsure of will only create more work for reviewers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should only be testing the validation functions. Most of these tests are testing pydantic itself, you don't want to be testing external packages, such tests are completely unnecessary and considered bad practice.
Please rewrite this module and submit for another review.
Hi @SalihuDickson and @uniqueg, I've rewritten the module based on your feedback. Update WES Models to Conform to GA4GH Spec
|
This PR enhances type safety and data validation for WES by introducing Pydantic models conforming to GA4GH schemas. Comprehensive unit tests ensure integrity and functionality, building consistency with the existing TES models.
Changes
Added Models:
wes_models.py
:WESState
WESOutputs
WESLog
WESTaskLog
WESRunRequest
WESData
Added Unit Tests:
test_wes_models.py
:Updated Package Initialization:
__init__.py
:__all__
to specify the public API.Documentation
wes_models.py
to improve readability and maintainability.Linting and Testing
datetime
objects.Attribution
Notes
wes_converter.py
.Summary by Sourcery
Adds Pydantic models for WES (Workflow Execution Service) based on GA4GH schemas, enhancing type safety and data validation. Includes unit tests to ensure model integrity and functionality, maintaining consistency with existing TES models.
New Features:
Tests: