HTTP-122 Retry for source lookup table #148

maciejmaciejko-gid · 2025-03-26T17:19:22Z

Description

Retry feature for source lookup table.

PR Checklist

Tests added
[X ] Changelog updated

grzegorz8 · 2025-03-28T09:54:36Z

Hey @davidradl ! Please take a look at this PR as well. Your feedback would be appreciated.

davidradl · 2025-03-28T11:55:42Z

README.md

@@ -452,33 +508,42 @@ be requested if the current time is later than the cached token expiry time minu
 ## Table API Connector Options
 ### HTTP TableLookup Source

-| Option                                                        | Required | Description/Value                                                                                                                                                                                                                                                                                                                                                 |


It is very difficult to see what has changed here. Could you please amend the change so that it is minimal, especially for this table?

Please turn on the 'hide whitespace' option. The table was auto-formatted to keep the same column width for all rows.

davidradl · 2025-03-28T11:59:52Z

CHANGELOG.md

@@ -2,6 +2,10 @@

 ## [Unreleased]

+- Added support for auto-retry for source table. Auto retry on IOException and user-defined http codes - parameter `gid.connector.http.source.lookup.retry-codes`.


Normally one PR would be one line here.

OK, set as one item with sub-items.

davidradl · 2025-03-28T12:12:29Z

README.md

+#### Retry strategy
+User can choose retry strategy type for source table:
+- fixed-delay - http request will be re-sent after specified delay
+- exponential-delay - request will be re-sent with exponential backoff strategy, limited to max-retries attempts.


I see that the config option is lookup.max-retries (I suggest using the exact config parameter name) - do we need a separate config for max-retries for sinks?

It would be worth defining exactly what we mean by exponential backoff strategy.

OK, good idea. I added an explanation.
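
For readers of this thread, the difference between the two strategy names can be sketched in plain Java. This is only an illustration of the general technique, not the connector's implementation: the class name, the parameters (`initialDelay`, `multiplier`, `maxDelay`) and the capping behaviour are assumptions.

```java
import java.time.Duration;

/** Hypothetical helper; names and parameters are illustrative, not the connector's API. */
final class BackoffSketch {

    /** fixed-delay: every retry waits the same amount of time. */
    static Duration fixedDelay(Duration delay, int attempt) {
        return delay;
    }

    /**
     * exponential-delay: the wait grows by `multiplier` on each attempt,
     * i.e. initialDelay * multiplier^(attempt - 1), capped at maxDelay.
     * `attempt` is 1-based.
     */
    static Duration exponentialDelay(Duration initialDelay, double multiplier,
                                     Duration maxDelay, int attempt) {
        double millis = initialDelay.toMillis() * Math.pow(multiplier, attempt - 1);
        long capped = (long) Math.min(millis, (double) maxDelay.toMillis());
        return Duration.ofMillis(capped);
    }
}
```

Under these assumptions, an initial delay of 1 s, a multiplier of 2 and a 10 s cap would give waits of 1 s, 2 s, 4 s, 8 s and 10 s for attempts 1 to 5.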

davidradl · 2025-03-28T12:16:51Z

README.md

-`setProperty' method from Sink's builder. The property names are:
- `gid.connector.http.sink.error.code` and `gid.connector.http.source.lookup.error.code` used to defined HTTP status code value that should be treated as error for example 404.
+`setProperty' method from Sink's builder. The property name are:
+- `gid.connector.http.sink.error.code` used to defined HTTP status code value that should be treated as error for example 404.
 Many status codes can be defined in one value, where each code should be separated with comma, for example:
 `401, 402, 403`. User can use this property also to define a type code mask. In that case, all codes from given HTTP response type will be treated as errors.
 An example of such a mask would be `3XX, 4XX, 5XX`. In this case, all 300s, 400s and 500s status codes will be treated as errors.


Is X the only mask character?

It's part of the sink configuration, which wasn't changed. It allows only [1-5]XX or an exact HTTP code value. I reimplemented the handling of HTTP response codes only for the source table, which is connected with the retry feature.
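
To make the rule stated above concrete ("only [1-5]XX or an exact HTTP code value"), here is a minimal sketch of such a parser. It is not the connector's actual parser; the class and method names are hypothetical, and accepting lower-case `x` is an assumption.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical parser sketch; not the connector's real implementation. */
final class StatusCodeMaskSketch {

    /** Expands a value such as "404, 5XX" into the set of matching status codes. */
    static Set<Integer> parse(String value) {
        Set<Integer> codes = new HashSet<>();
        for (String raw : value.split(",")) {
            String token = raw.trim().toUpperCase(Locale.ROOT);
            if (token.isEmpty()) {
                continue;
            }
            if (token.matches("[1-5]XX")) {
                // Type mask: "4XX" covers 400..499.
                int base = (token.charAt(0) - '0') * 100;
                for (int code = base; code < base + 100; code++) {
                    codes.add(code);
                }
            } else if (token.matches("[1-5]\\d{2}")) {
                // Exact code such as "404".
                codes.add(Integer.parseInt(token));
            } else {
                throw new IllegalArgumentException(
                        "Unsupported status code or mask: " + raw.trim());
            }
        }
        return codes;
    }
}
```

In this sketch a partial mask such as `4X4` is rejected rather than silently expanded, which matches the answer that `X` is only meaningful in the `[1-5]XX` form.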

davidradl · 2025-03-28T12:18:52Z

README.md

   Many status codes can be defined in one value, where each code should be separated with comma, for example:
  `401, 402, 403`. In this example, codes 401, 402 and 403 would not be interpreted as error codes.

+### Source table
+Http source requires success codes defined in parameter: `gid.connector.http.source.lookup.success-codes`. That list should contains all http status codes
+which are considered as success response. It may be 200 (ok) as well as 404 (not found). The first one is standard response and its content should be deserialized/parsed.


nit: response -> responses

I am not sure what we mean by "It may be 200 (ok) as well as 404 (not found). The first one is standard response and its content should be deserialized/parsed." Are 200 and 404 the defaults or the recommended settings?

davidradl · 2025-03-28T12:19:28Z

README.md

+### Source table
+Http source requires success codes defined in parameter: `gid.connector.http.source.lookup.success-codes`. That list should contains all http status codes
+which are considered as success response. It may be 200 (ok) as well as 404 (not found). The first one is standard response and its content should be deserialized/parsed.
+Processing of 404 request's content may be skipped by adding it to parameter `gid.connector.http.source.lookup.ignored-response-codes`.


What does skipped mean here - fail the job?

The section was edited. Could you check if it's clear now?

davidradl · 2025-03-28T12:22:05Z

README.md


 ## TODO

 ### HTTP TableLookup Source
- Think about Retry Policy for Http Request


Why is this a TODO? I think the TODOs are reminders to developers rather than a user checklist.

Yes, they look like reminders. I just removed the implemented feature from the list. Do you want me to remove the whole TODO section?

davidradl · 2025-03-28T12:23:29Z

pom.xml

    </dependency>
+    <!-- Add logging framework, to produce console output when running in the IDE. -->


Can we have the logging change in a separate PR, please? It is easier to track the history that way.

Why is this change needed anyway?
Is it because of resilience4j?

One of the rules of thumb when we were starting this connector was to try not to add any external libraries to the connector that may or may not clash with user code. That is why we use Java 11's HTTP client.

You need resilience4j for the retry functionality, right?
Which in essence is: schedule a task on Java's scheduled thread executor and make sure to do a good job around error/exception handling.

A dedicated lib for retries might be overkill now, but I think we can benefit in the long term. The library also provides a Rate Limiter and a Circuit Breaker. Both features might be worth adding, or at least the Rate Limiter.

"Why is this change needed anyway?
Is it because of resilience4j?"

Yes, I had to change it to compile the project with resilience4j. Note that Flink uses the same API:
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/advanced/logging/

"You need resilience4j for the retry functionality, right?"
I thought it was better to use a mature library instead of reimplementing it. On the other hand, as you said, it is an additional dependency. Below is the resilience4j part of the output of mvn dependency:tree:

[INFO] +- io.github.resilience4j:resilience4j-retry:jar:1.7.1:compile
[INFO] |  +- io.vavr:vavr:jar:0.10.2:compile
[INFO] |  |  \- io.vavr:vavr-match:jar:0.10.2:compile
[INFO] |  \- io.github.resilience4j:resilience4j-core:jar:1.7.1:compile

Do you think it's OK to add them? Another option is to shade them.
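
For comparison, the library-free alternative described above ("schedule a task on Java's scheduled thread executor") could look roughly like the sketch below. It is not code from this PR: the class name, the fixed delay between attempts and the choice to retry on any exception are assumptions made for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a library-free retry; not code from this PR. */
final class ScheduledRetrySketch {

    /**
     * Runs `task` on `scheduler`, re-scheduling it after `delayMillis`
     * until it succeeds or `maxAttempts` attempts have been made.
     */
    static <T> CompletableFuture<T> retry(Callable<T> task, int maxAttempts, long delayMillis,
                                          ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(task, 1, maxAttempts, delayMillis, scheduler, result);
        return result;
    }

    private static <T> void attempt(Callable<T> task, int attemptNo, int maxAttempts,
                                    long delayMillis, ScheduledExecutorService scheduler,
                                    CompletableFuture<T> result) {
        // The first attempt runs immediately; later attempts wait for the delay.
        long delay = attemptNo == 1 ? 0 : delayMillis;
        try {
            scheduler.schedule(() -> {
                try {
                    result.complete(task.call());
                } catch (Exception failure) {
                    if (attemptNo >= maxAttempts) {
                        // Out of attempts: surface the last failure to the caller.
                        result.completeExceptionally(failure);
                    } else {
                        attempt(task, attemptNo + 1, maxAttempts, delayMillis, scheduler, result);
                    }
                }
            }, delay, TimeUnit.MILLISECONDS);
        } catch (RejectedExecutionException rejected) {
            // Scheduler was shut down: fail the future instead of leaving it pending.
            result.completeExceptionally(rejected);
        }
    }
}
```

The sketch is asynchronous: the caller gets a `CompletableFuture` and decides whether to block on it. It covers only the scheduling and error propagation, and leaves out backoff, metrics and rate limiting, which is where a dedicated library would add value.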

davidradl · 2025-03-28T12:26:41Z

src/main/java/com/getindata/connectors/http/internal/retry/HttpClientWithRetry.java

+    private final Retry retry;
+
+    @Builder
+    HttpClientWithRetry(HttpClient httpClient,


I am wondering what happens with OIDC: the short-lived bearer token may need to be regenerated if the retries occur after the token has expired. Is this regeneration check done for the retries?

The request is created only once, but the OIDC processor (responsible for setting the bearer token in the request) is called on every retry.

kristoffSC

Thanks for this PR.
I have a few questions that it would be great if you could address:

1. Why do you need a 3rd-party library to implement retry? It shouldn't be that hard to use vanilla Java for that.
   By the way, do you know how it works under the hood? Does a retry block Flink's processing, or is it asynchronous?
2. Why did you choose to use exceptions to signal that a retry is needed? Why not a status object or a flag?
3. There are quite a few format changes. Why? Could you double-check that your IDE is set up properly?
4. Please start all test method names with "should".

kristoffSC · 2025-03-30T19:06:25Z

...n/java/com/getindata/connectors/http/internal/table/lookup/HttpLookupTableSourceFactory.java

+                SOURCE_LOOKUP_HTTP_RETRY_CODES,
+                SOURCE_LOOKUP_HTTP_IGNORED_RESPONSE_CODES,
+
+                SOURCE_LOOKUP_CONNECTION_TIMEOUT        // TODO: add request timeout from properties


Format change, why?

I added blank lines to make it more readable (groups like retry, oidc, cache, core etc.).

kristoffSC · 2025-03-30T19:15:54Z

src/main/java/com/getindata/connectors/http/internal/retry/HttpClientWithRetry.java

+            } catch (RetryHttpRequestException retryException) {
+                throw retryException.getCausedBy();
+            }
+        } catch (IOException | InterruptedException | HttpStatusCodeValidationFailedException e) {


Why are IOException and InterruptedException special here?
Why do you need them on the send signature?

These are checked exceptions thrown by the HttpClient.send method. I don't want to rewrap them.

kristoffSC · 2025-03-30T19:19:38Z

src/main/java/com/getindata/connectors/http/internal/retry/HttpClientWithRetry.java

+                "Incorrect response code: " + response.statusCode(), response);
+        if (responseChecker.isTemporalError(response)) {
+            log.debug("Retrying... Received response with code {} for request {}", response.statusCode(), request);
+            throw new RetryHttpRequestException(validationFailedException);


Why do we need an exception to communicate that a retry is needed?
I would say OK if it were an exception from Java's client, but here you are throwing your own exception.

I'm really not a fan of "communication via exceptions". Exceptions are very costly, and this does not look right to me.
Some status object, flag, etc. would be better.

A similar thing was discussed here.

Good catch. You're right. It's better to retry based on the response. I fixed the code.
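
The agreed direction (decide on a retry from the response itself rather than from a thrown exception) can be sketched as follows. This is an illustration only, not the reimplemented `HttpClientWithRetry`: the class name is hypothetical, the request is reduced to a supplier of status codes, and the delay between attempts is omitted.

```java
import java.util.function.IntPredicate;
import java.util.function.IntSupplier;

/** Hypothetical sketch: the retry decision is read from the status code, not from an exception. */
final class StatusBasedRetrySketch {

    /**
     * Calls `sendRequest` up to `maxAttempts` times and returns the last status code.
     * `isTemporalError` plays the role of the configured retry codes.
     */
    static int sendWithRetry(IntSupplier sendRequest, IntPredicate isTemporalError,
                             int maxAttempts) {
        int status = sendRequest.getAsInt();
        for (int attempt = 1; attempt < maxAttempts && isTemporalError.test(status); attempt++) {
            // No exception is thrown to trigger the retry; the loop condition decides.
            status = sendRequest.getAsInt();
        }
        return status;
    }
}
```

The caller still receives the final status code when all attempts are used up, so validation of the last response stays in one place.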

readme refactor HttpClientWithRetry reimplemented (retry based on code, without throwing exception)

MarekMaj · 2025-04-01T09:29:14Z

README.md

+  'gid.connector.http.source.lookup.ignored-response-codes' = '404'
+)
+```
+All 200s codes and 404 are considered as successful (`success-codes`). These responses won't cause retry or job failure. 404 response is also listed in `ignored-response-codes` parameter, what means content body will be ignored. Http with 404 code will produce just empty record. Notice that 404 has to be specified in both `success-codes` and `ignored-response-codes`.


I think we may want to take that burden off the user in the future. 404 seems like a good candidate for a default config.

@maciejmaciejko-gid ?

@MarekMaj @grzegorz8
The 404 status may indicate that the destination URL is wrong or that the resource doesn't exist. Marking 404 as a successful response may hide configuration errors. Because of that, I didn't set it as a default value.
Do you think 404 should be successful by default?

That is one way of reducing the configuration burden. There should still be an option for the user to override that config.
Another idea: given the note that "ignored-response-codes has to be a subset of success-codes", maybe defining a code (for example 404) as ignored shouldn't require including the same code in the success codes?
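
To summarise how the three code lists discussed in this thread interact, here is a small sketch. It follows the README text quoted above (success codes cause neither retry nor failure, ignored codes produce an empty record and must also be success codes); the class, the enum and the assumption that any other non-retry code fails the lookup are hypothetical.

```java
import java.util.Set;

/** Hypothetical classifier sketch; not the connector's real response checker. */
final class LookupResponseClassifierSketch {

    enum Outcome { PARSE_BODY, EMPTY_RECORD, RETRY, FAIL }

    static Outcome classify(int status, Set<Integer> successCodes,
                            Set<Integer> ignoredCodes, Set<Integer> retryCodes) {
        if (!successCodes.containsAll(ignoredCodes)) {
            throw new IllegalArgumentException(
                    "ignored-response-codes must be a subset of success-codes");
        }
        if (successCodes.contains(status)) {
            // Ignored codes are successful, but their body is skipped.
            return ignoredCodes.contains(status) ? Outcome.EMPTY_RECORD : Outcome.PARSE_BODY;
        }
        // Assumption: anything that is neither success nor retryable is an error.
        return retryCodes.contains(status) ? Outcome.RETRY : Outcome.FAIL;
    }
}
```

The suggestion above would amount to dropping the subset check and treating every ignored code as implicitly successful.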

maciejmaciejko-gid · 2025-04-22T09:14:36Z

@kristoffSC @davidradl @MarekMaj
Could you review this PR once again? I adjusted the code after the review.

maciejmaciejko-gid added 2 commits March 26, 2025 16:35

retry for source lookup table

610d4b9

changelog, readme, refactor

16ffc35

maciejmaciejko-gid requested review from grzegorz8 and kristoffSC March 26, 2025 17:19

grzegorz8 requested a review from MarekMaj March 27, 2025 07:21

grzegorz8 changed the title ~~Retry for source lookup table~~ HTTP-122 Retry for source lookup table Mar 27, 2025

grzegorz8 linked an issue Mar 27, 2025 that may be closed by this pull request

Implement lookup retry mechanism #122

Open

grzegorz8 added and then removed the enhancement label Mar 27, 2025

grzegorz8 mentioned this pull request Mar 27, 2025

HTTP-122 Add lookup retries #129

Closed

grzegorz8 requested changes Mar 27, 2025

maciejmaciejko-gid added 5 commits March 27, 2025 13:38

refactor after review

a547bde

increasing code coverage

e0c7bcc

Merge branch 'main' into feature/retries

b67567a

increasing code coverage

680723b

exposing retry metrics

9485f27

grzegorz8 reviewed Mar 28, 2025

logging optimisation

853d23d

grzegorz8 reviewed Mar 28, 2025

grzegorz8 reviewed Mar 28, 2025

maciejmaciejko-gid added 2 commits March 28, 2025 10:59

removed unused metrics

0c5097b

handling of max-retries param, readme fix, metric names convention

75eaa2d

grzegorz8 approved these changes Mar 28, 2025