Skip to content

HTTP-122 Retry for source lookup table #148

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

maciejmaciejko-gid
Copy link

Description

Retry feature for source lookup table.

PR Checklist

@grzegorz8 grzegorz8 requested a review from MarekMaj March 27, 2025 07:21
@grzegorz8 grzegorz8 changed the title Retry for source lookup table HTTP-122 Retry for source lookup table Mar 27, 2025
@grzegorz8 grzegorz8 linked an issue Mar 27, 2025 that may be closed by this pull request
@grzegorz8 grzegorz8 added enhancement New feature or request and removed enhancement New feature or request labels Mar 27, 2025
@grzegorz8
Copy link
Member

Hey @davidradl ! Please take a look at this PR as well. Your feedback would be appreciated.

@@ -452,33 +508,42 @@ be requested if the current time is later than the cached token expiry time minu
## Table API Connector Options
### HTTP TableLookup Source

| Option | Required | Description/Value |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very difficult to see what has changed. Please could you amend the change so it is minimal especially for this table

Copy link
Author

@maciejmaciejko-gid maciejmaciejko-gid Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zrzut ekranu 2025-03-28 o 13 55 02
Please turn on 'hide whitespace' option. The table was auto-formatted to keep the same column size for all rows.

CHANGELOG.md Outdated
@@ -2,6 +2,10 @@

## [Unreleased]

- Added support for auto-retry for source table. Auto retry on IOException and user-defined http codes - parameter `gid.connector.http.source.lookup.retry-codes`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normally one pr would be one line here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, set as one item with subitems

README.md Outdated
#### Retry strategy
User can choose retry strategy type for source table:
- fixed-delay - http request will be re-sent after specified delay
- exponential-delay - request will be re-sent with exponential backoff strategy, limited to max-retries attempts.
Copy link
Contributor

@davidradl davidradl Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that the config option is lookup.max-retries (I suggest using the exact config parameter name) - do we need a separate config for max-retries for sinks?

It would be worth defining exactly what we mean by exponential backoff strategy.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, good idea. I added explanation

`setProperty' method from Sink's builder. The property names are:
- `gid.connector.http.sink.error.code` and `gid.connector.http.source.lookup.error.code` used to defined HTTP status code value that should be treated as error for example 404.
`setProperty' method from Sink's builder. The property name are:
- `gid.connector.http.sink.error.code` used to defined HTTP status code value that should be treated as error for example 404.
Many status codes can be defined in one value, where each code should be separated with comma, for example:
`401, 402, 403`. User can use this property also to define a type code mask. In that case, all codes from given HTTP response type will be treated as errors.
An example of such a mask would be `3XX, 4XX, 5XX`. In this case, all 300s, 400s and 500s status codes will be treated as errors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is X the only mask character?

Copy link
Author

@maciejmaciejko-gid maciejmaciejko-gid Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's part of sink configuration, which wasn't changed. It allows only [1-5]XX or exact http code value. I reimplemented handling of http response code only for source table, which is connected with retry feature.

README.md Outdated
Many status codes can be defined in one value, where each code should be separated with comma, for example:
`401, 402, 403`. In this example, codes 401, 402 and 403 would not be interpreted as error codes.

### Source table
Http source requires success codes defined in parameter: `gid.connector.http.source.lookup.success-codes`. That list should contains all http status codes
which are considered as success response. It may be 200 (ok) as well as 404 (not found). The first one is standard response and its content should be deserialized/parsed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: response -> responses

I am not sure what we mean by " It may be 200 (ok) as well as 404 (not found). The first one is standard response and its content should be deserialized/parsed." Is 200 and 404 the defaults or recommended settings?

README.md Outdated
### Source table
Http source requires success codes defined in parameter: `gid.connector.http.source.lookup.success-codes`. That list should contains all http status codes
which are considered as success response. It may be 200 (ok) as well as 404 (not found). The first one is standard response and its content should be deserialized/parsed.
Processing of 404 request's content may be skipped by adding it to parameter `gid.connector.http.source.lookup.ignored-response-codes`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does skipped mean here - fail the job?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The section was edited. Could you check if it's clear now?


## TODO

### HTTP TableLookup Source
- Think about Retry Policy for Http Request
Copy link
Contributor

@davidradl davidradl Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this a TODO? I think the TODOs are reminders to developers - rather than a user checklist.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it looks like reminders. I just removed implemented feature from that. Do you want me to remove the whole section TODO?

</dependency>
<!-- Add logging framework, to produce console output when running in the IDE. -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have the logging change in a separate PR - it is easier to track the history then please

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change is needed anyways?
Its because of resilence4j?

One of the "rule of thumbs" when we were starting this connector was to try not add any external libraries to the connector, that my or may not clash with any user code -> i.e that is why we use Java's 11 http client.

You need the resilence4j for retry functionality right?
Which in essence is -> schedule a task on Java's scheduled thread executor and make sure to do a good job around error/exception handling.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dedicated lib for retries might be an overkill now, but I think we can benefit in long term. The library provides Rate Limiter or Circuit Breaker. Both features might be worth adding. Or at least Rate Limiter.

Copy link
Author

@maciejmaciejko-gid maciejmaciejko-gid Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Why this change is needed anyways?
Its because of resilence4j?"

Yes, I had to change to compile project with resilence4j. Notice that Flink use the same API:
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/advanced/logging/

"You need the resilence4j for retry functionality right?"
I thought it's better to use mature library instead of reimplementing it. From the other side,as you said, it's additional dependency. Below part resilience4j dependencies based on mvn dependency:tree:

[INFO] +- io.github.resilience4j:resilience4j-retry:jar:1.7.1:compile
[INFO] |  +- io.vavr:vavr:jar:0.10.2:compile
[INFO] |  |  \- io.vavr:vavr-match:jar:0.10.2:compile
[INFO] |  \- io.github.resilience4j:resilience4j-core:jar:1.7.1:compile

Do you think it's ok to add them? Another option is to shadow them.

private final Retry retry;

@Builder
HttpClientWithRetry(HttpClient httpClient,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering what happens with OIDC, the short lived bearer token may need to be regenerated if the retries occur after the token has expired). Is this regeneration check done for the retries?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The request is created only once, but OIDC processor (responsible for setting bearer token in request) is called on every retry.

Copy link
Collaborator

@kristoffSC kristoffSC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR.
I have few questions though that would be great if you could address them:

Why you need 3rd party library to implement retry? Shouldn't be that hard to use vanilla Java for that.

Btw, do you know how it works under the hood?
Does retry blocks the Flink;s processing or is it asynchronous?

Why you choose to use exceptions to signal that retry is needed? Why not status object or a flag?

3:
There are quite few format changes, why? Could you double check if you have your IDE properly set?

Please start all test methods from "should".

</dependency>
<!-- Add logging framework, to produce console output when running in the IDE. -->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change is needed anyways?
Its because of resilence4j?

One of the "rule of thumbs" when we were starting this connector was to try not add any external libraries to the connector, that my or may not clash with any user code -> i.e that is why we use Java's 11 http client.

You need the resilence4j for retry functionality right?
Which in essence is -> schedule a task on Java's scheduled thread executor and make sure to do a good job around error/exception handling.

SOURCE_LOOKUP_HTTP_RETRY_CODES,
SOURCE_LOOKUP_HTTP_IGNORED_RESPONSE_CODES,

SOURCE_LOOKUP_CONNECTION_TIMEOUT // TODO: add request timeout from properties
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Format change, why?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added blank lines to make it more readable (groups like retry, oidc, cache, core etc.).

} catch (RetryHttpRequestException retryException) {
throw retryException.getCausedBy();
}
} catch (IOException | InterruptedException | HttpStatusCodeValidationFailedException e) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why IOException and InterruptedException are special here?
Why you need then on the send signature?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are checked exceptions from HttpClient.send method. I don't want to repack them.

"Incorrect response code: " + response.statusCode(), response);
if (responseChecker.isTemporalError(response)) {
log.debug("Retrying... Received response with code {} for request {}", response.statusCode(), request);
throw new RetryHttpRequestException(validationFailedException);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need an exception to communicate a retry is needed?
I would say ok, if that would be an exception from Java's client but here you throwing your own exceptuion.

I'm really not a fan of "communication via exceptions". Exception are very costly... plus this looks not right to me.
Better would be some status object, flag etc.

Similar thing was discussed here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. You're right. It's better to retry based on response. I fixed the code.

readme refactor
HttpClientWithRetry reimplemented (retry based on code, without throwing exception)
'gid.connector.http.source.lookup.ignored-response-codes' = '404'
)
```
All 200s codes and 404 are considered as successful (`success-codes`). These responses won't cause retry or job failure. 404 response is also listed in `ignored-response-codes` parameter, what means content body will be ignored. Http with 404 code will produce just empty record. Notice that 404 has to be specified in both `success-codes` and `ignored-response-codes`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to take that burden from the user in the future. 404 seems like a good candidate for a default config

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarekMaj @grzegorz8
The 404 status may indicates that destination URL is wrong or resource doesn't exists. Marking 404 as successful response may hide configuration errors. Due to that I didn't set this as default value.
Do you think 404 should be successful by default?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is one way of simplifying the configuration burden. There still should be an option to overwrite that config by the user
Another idea: given that Notice that ignored-response-codes has to be a subset of success-codes., maybe defining code (for example 404) as ignored shouldn't require including the same code in success codes?

@maciejmaciejko-gid
Copy link
Author

@kristoffSC @davidradl @MarekMaj
Could you review this PR once again? I adjusted code after review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement lookup retry mechanism
5 participants