Skip to content

[ENG-7839] [ENG-7766] Add celery task to update Datacite metadata + Use new GV API endpoint to retrieve node's verified links #11117

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

Vlad0n20
Copy link
Contributor

@Vlad0n20 Vlad0n20 commented Apr 29, 2025

Purpose

  • Add celery task to update Datacite metadata
  • Use new GV API endpoint to retrieve node's verified links

Changes

See diff

QA Notes

N/A

Documentation

N/A

Side Effects

N/A

Ticket

https://openscience.atlassian.net/browse/ENG-7839
https://openscience.atlassian.net/browse/ENG-7766

@Vlad0n20 Vlad0n20 changed the base branch from develop to feature/verified-resource-linking April 29, 2025 14:27
@Vlad0n20 Vlad0n20 force-pushed the feature/ENG-7839 branch from af64e30 to 2633477 Compare May 1, 2025 12:12
@cslzchen cslzchen changed the title [ENG-7839] Add celery task to update metadata [ENG-7839] [ENG-7766] Add celery task to update Datacite metadata + Use new GV API endpoint to retrieve node's verified links May 1, 2025
Copy link
Collaborator

@cslzchen cslzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌟

Comment on lines 23 to 25
sentry.log_message('Updating DOI with verified links for guid',
extra_data={'guid': target_guid},
level=logging.INFO)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, we don't need to log to sentry about the update. We can simply do info or debug logging.

Comment on lines 33 to 35
sentry.log_message('DOI metadata with verified links updated for guid',
extra_data={'guid': target_guid},
level=logging.INFO)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, use logging instead of sentry.

Comment on lines 36 to 40
except Exception as exc:
sentry.log_message('Failed to update DOI metadata with verified links',
extra_data={'guid': target_guid, 'error': str(exc)},
level=logging.ERROR)
raise self.retry(exc=exc)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 60 to 61
if isinstance(metadata_record_xml, bytes):
metadata_record_xml = metadata_record_xml.decode('utf-8')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, what is the issue that has led you to do this extra check? Maybe our changes caused the build_metadata to behave differently?

And if we have to do this, I think we have some helpers called ensure_str and ensure_bytes that we can use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this for local testing and forgot to remove it.

Comment on lines 60 to 66
def get_links(node_guid, requesting_user=None):
return iterate_gv_results(
ADDONS_ENDPOINT.format(addon_type=AddonType.LINK) + f'/{node_guid}/verified-links',
requesting_user=requesting_user,
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we already have this implemented (which was newly merged yesterday morning, see https://github.com/CenterForOpenScience/osf.io/pull/11113/files#diff-8d04c47c31d3869aca32c85256dc4dc08e69fd836971299d95b8386503348668), please take a look and I think the new get_verified_links() should provide what you need.

from osf.models import AbstractNode

if isinstance(osf_item, AbstractNode):
gv_verified_link_list = osf_item.get_links()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, use get_verified_links()

for item in gv_verified_link_list:
verified_link, resource_type = item.get('target_url', None), item.get('resource_type', None)
if verified_link and resource_type:
if verified_link and isinstance(verified_link, str) and smells_like_iri(verified_link):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

smells_like_iri() already checks for str type, so if verified_link and smells_like_iri(verified_link) should be good.

'resourceTypeGeneral': resource_type.title()
})
else:
logger.warning('skipping non-URL verified link "%s"', verified_link)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be more useful if logged to sentry. And add which node it is.

It is probably nicer if we keep a list of skipped ones and send only one sentry message after the for loop.

@@ -2351,6 +2355,9 @@ def is_fork_of(self, other):
def is_registration_of(self, other):
return self.is_derived_from(other, 'registered_from')

def get_links(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, we should already have this after recent merge.

Comment on lines +2010 to +2003
from website.identifiers.tasks import update_doi_metadata_with_verified_links
if self.get_identifier('doi') and bool(self.IDENTIFIER_UPDATE_FIELDS.intersection(set(saved_fields))):
update_doi_metadata_with_verified_links(self._id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if we have to do this? Is it because the identifier update is never automatically?

I think self.update_or_enqueue_on_node_updated(user_id, first_save, saved_fields) should have done this? Please double check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it because, as I remembered, you asked me to do it in the previous PR for metadata updating

ihorsokhanexoft and others added 21 commits May 5, 2025 12:43
…nScience#11098)

## Purpose
When user has a deleted node where he is the only admin, osf admin can't gdpr delete this user because of this error


## Changes
Ignore this message for deleted nodes. Added a test

## Ticket
https://openscience.atlassian.net/browse/ENG-7557
…Admin App (CenterForOpenScience#11105)

## Purpose
Implement Force Archive for admin UI based on `osf/management/commands/force_archive.py`

## Changes
BE:
1. remove globals from force archive management command to make it thread-safe (usable in views)
2. Implement CheckArchiveStatusRegistrationsView and ForceArchiveRegistrationsView
3. RestartStuckRegistrationsView is replaced with ForceArchiveRegistrationsView

FE:
1. Added Check archive status button
2. Added Force Archive button
3. Updated Restart Stuck Registration button (now it'll open the same modal as Force Archive to choose parameters but dry-mode is not available)

## QA Notes
- Check archive status is displayed if node is registration
- Force Archive button is displayed if registration is not archived and not stuck
- Restart/Remove Stuck Registration are displayed if registration is not archived and stuck

## Ticket
https://openscience.atlassian.net/browse/ENG-7270
## Purpose
`Preprint` inherits from `VersionedGuidMixin` that inherits from `GuidMixin` thus `coerce_guid` tries to fetch guid from `guids` attribute instead of `versioned_guids`

## Changes
Added if statement

## Ticket
https://openscience.atlassian.net/browse/ENG-7798
## Purpose
Fix bug with citations for a new preprint version.
The problem was in the process of creating a version of the preprint, so also added the manage command to fix already created versions

## Changes
Fix bug with citations for a new preprint version

## Ticket
https://openscience.atlassian.net/browse/ENG-7263
double-quote-string-fixer
## Purpose
Fix preprint updating

## Changes
Add a manage command to fix why_no_data and has_data_links for broken preprints due to an unknown bug on FE (Futa described it in the comments to the ticket)

## Ticket
https://openscience.atlassian.net/browse/ENG-7503
clean up code we don't need anymore (and avoid error messages when CenterForOpenScience/SHARE#859 )

contains CenterForOpenScience#11012
…h date uploaded) via the admin app (CenterForOpenScience#11097)

## Purpose
Add "Create new version 1" button on admin UI

## Changes
- Add button to admin UI
- Extend existing preprint versioning functionality
- Move permission validation logic from serializer to model level to keep everything in one place
    - add required_permission decorator (with ignore_permission functionality) that checks if user has enough permission to call method

## QA Notes
Expected behavior:
1. Admin goes the preprint admin page
2. Clicks "Create new version 1", a modal asking to choose date will be displayed
3. If date is valid and file versions exists:
    - a new version 1 will be created
    - admin user will be redirected to v1
    - optionally, admin user can make new preprint published by clicking "Make published" button

(!) Preprint guid should point to the latest version, not the newly created one.
(!) Admin should be able to create new versions of a preprint even if  they are not a contributor.

## Ticket
https://openscience.atlassian.net/browse/ENG-7716
<!-- Before submit your Pull Request, make sure you picked
     the right branch:

     - For hotfixes, select "master" as the target branch
     - For new features, select "develop" as the target branch
     - For release feature fixes, select the relevant release branch (release/X.Y.Z) as the target branch -->
## Purpose
fix for non-contributor admin

## Changes
- add ignore_permission to set_license
- prevent creating new preprint version when serializer raises error

https://openscience.atlassian.net/browse/ENG-7716
## Purpose
Fix the errors in the management command

## Changes
Remove useless parameter



## Ticket
https://openscience.atlassian.net/browse/ENG-7263
…h date uploaded) via the admin app (CenterForOpenScience#11118)

## Purpose
improve exception handling and minor fixes

## Changes
- improve exception handling
- supplemental node permission error
- set subjects from relationship to keep valid hierarchy
- ignore unfinished/unpublished versions

## Ticket
https://openscience.atlassian.net/browse/ENG-7716
@Vlad0n20 Vlad0n20 force-pushed the feature/ENG-7839 branch from 49986c4 to dfdf156 Compare May 5, 2025 11:25
Copy link
Collaborator

@cslzchen cslzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this PR now has develop code which hasn't been merged into feature/verified-resource-linking. It's very likely you rebased from or merged the latest develop. Please fix the commit.

I suggest you create a new PR from our feature branch and cherry pick your own commits. Otherwise, we may run into a case where your PR will reverse develop stuff you merged when the feature PR is merged into develop.

e.g. Here is current status

  • current feature branch: F->D1
  • current develop: D2->D1
  • your current feature branch: A->Modified D2->F->D1

The following two will have the same diff in GitHub:

  • What we need: expected feature branch: A->F->D1
  • What we should avoid: A->Modified D2 Removed->Modified D2->F->D1

@Vlad0n20 Vlad0n20 closed this May 6, 2025
@Vlad0n20 Vlad0n20 deleted the feature/ENG-7839 branch May 6, 2025 11:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants