Skip to content

CSHARP-994 Add AsyncEnumerable support to RowSet #617

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

verdie-g
Copy link
Contributor

@verdie-g verdie-g commented Apr 12, 2025

Any async method in Mapper can block the thread if there are more than one page in RowSet. This can easily starve the thread pool and obliterate the performance of your app.

The proper fix it to make RowSet implement IAsyncEnumerable. The latter is only available in .NET Core 3 though. I could make the driver target this version but it reached end of life 7 years ago. According to versionsof.net/core, the oldest active .NET version is 8 so that's what I picked.

The PR does not compile right now because the .NET 8 target generates a bunch of obsolete warnings (upgraded to errors) about System.Runtime.ConstrainedExecution and TLS1.X. Also, tests are missing.

I would like to get some first feedbacks before resuming the work, especially around adding this new target framework. I believe it's time to add new targets as .NET Standard is now 8 years old.

@joao-r-reis
Copy link
Collaborator

I agree that making RowSet implement IAsyncEnumerable would be ideal but that will require some discussion about .NET targets so if we're more concerned about Mapper methods internally enumerating through RowSet in a "sync" manner then we could change that without having to rely on IAsyncEnumerable, we would just have to make the mapper iterate "manually" through the rowset in an async way. This would be the cleanest way to fix this without getting into an extended discussion about moving to newer .NET targets.

@joao-r-reis
Copy link
Collaborator

joao-r-reis commented Apr 13, 2025

Irt .NET targets it would make more sense to move from .NET Standard 2.0 to .NET Standard 2.1 as that would allow us to keep supporting .NET platforms that support .NET standard while at the same time being able to leverage features like IAsyncEnumerable

@joao-r-reis
Copy link
Collaborator

Also the mapper not relying on IAsyncEnumerable will also make it so that issue is fixed in .NET Framework as well (async streams are not supported on .NET Framework)

@verdie-g
Copy link
Contributor Author

we would just have to make the mapper iterate "manually" through the rowset in an async way.

Task<IEnumerable<T>> FetchAsync<T>(CqlQueryOptions queryOptions = null);

This is tricky because FetchAsync returns an IEnumerable. So, it tranfers the responsibility to enumerate the rows to the caller. And since they only have a synchronous API, all they can do is to block the thread.

In my company, I have applied some work arounds where the performance impact was alarming. The first one was to greatly increase the page size so all the rows would fit on a single page. That greatly impacted the GC of the client but it also impacted the server and I had to find another work around when the team operating cassandra came to us. The second one was this

var rows  = await _session.ExecuteAsync(stmt, profile);

while (!rows.IsFullyFetched)
{
    await rows.FetchMoreResultsAsync();
}

return rows.Select(r =>
{
    var x = r.GetValue<int>(0);
    var y = r.GetValue<string>(1);
    var z = r.GetValue<byte[]>(2);

    return new MyObject(x, y, z);
}).ToArray();

This seems to be fix the thread starvation too and is more friendly to the GC. Though, all the rows must be in memory at some point.

But anyway it's just a work-around that most developers won't know about.

Also the mapper not relying on IAsyncEnumerable will also make it so that issue is fixed in .NET Framework as well

One good thing about that IEnumerable API is that all the rows don't have to be loaded in memory at the same time. The developer can chose to load them at their own pace. I'm not sure we can achieve the same in async without IAsyncEnumerable.

implement IAsyncEnumerable would be ideal but that will require some discussion about .NET targets
Irt .NET targets it would make more sense to move from .NET Standard 2.0 to .NET Standard 2.1

I believe a serious discussion should be had around which .NET version to target. I'm sure you have many clients still using .NET Framework but double targeting the LTS could allow developers using more recent features of .NET while still support old ones.

@joao-r-reis
Copy link
Collaborator

joao-r-reis commented Apr 15, 2025

In my company, I have applied some work arounds where the performance impact was alarming. The first one was to greatly increase the page size so all the rows would fit on a single page. That greatly impacted the GC of the client but it also impacted the server and I had to find another work around when the team operating cassandra came to us. The second one was this

Have you tried manual paging as described in the manual? That should fix the issue but it's a bit more code than relying on automatic paging. This is generally the best approach if you're concerned about thread starvation and GC pressure.


I think we can move from netstandard2.0 to netstandard2.1 and use conditional compilation to make RowSet implement IAsyncEnumerable in netstandard2.1, if you're interested in working on this I'd say go for it

@verdie-g
Copy link
Contributor Author

Have you tried manual paging as described in the manual?

Thx, I'll give it a try!

I think we can move from netstandard2.0 to netstandard2.1

Done

@verdie-g verdie-g marked this pull request as ready for review April 15, 2025 23:57
@joao-r-reis joao-r-reis changed the title CSHARP-994: fix thread starvation in Mapper CSHARP-994 Add AsyncEnumerable support to RowSet Apr 16, 2025
Copy link
Collaborator

@joao-r-reis joao-r-reis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change 👍 , I've left a few comments that need to be addressed.

Justfyi I'm not sure when we will be able to get a new release out so in the mean time it might be better for you to use manual paging on your app to fix your current issues with thread starvation

Assert.AreEqual(pageSize, ids.Count);
//Retrieve the next page
var rs2 = Session.Execute(ps.Bind().SetAutoPage(false).SetPagingState(rs.PagingState));
Assert.Null(rs2.PagingState);
var ids2 = rs2.Select(r => r.GetValue<Guid>("id")).ToList();
var ids2 = Enumerable.Select(rs2, r => r.GetValue<Guid>("id")).ToList();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RowSet now implements both IEnumerable and IAsyncEnumerable which both have a select method. I double checked if it was an issue with my AsyncEnumerable implementation but if you create a new .NET 10 project (that has AsyncEnumerable built-in) and have a class that implements both interface, you'll have this error too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the driver project only targets netstandard 2.1 so it will never have the built in AsyncEnumerable right? Or is this an issue if a client application targeting net10 installs the driver which uses netstandard2.1 and then tries to call Select on the RowSet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the driver project only targets netstandard 2.1 so it will never have the built in AsyncEnumerable right?

Correct, but I provided my own implementation of AsyncEnumerable. It's internal but internals are visible to the test assemblies.

Or is this an issue if a client application targeting net10 installs the driver which uses netstandard2.1 and then tries to call Select on the RowSet?

Yes this could be an issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is this an issue if a client application targeting net10 installs the driver which uses netstandard2.1 and then tries to call Select on the RowSet?

Yes this could be an issue.

The custom AsyncEnumerable implementation is internal, how would it affect applications installing it?

Copy link
Collaborator

@joao-r-reis joao-r-reis Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the driver project only targets netstandard 2.1 so it will never have the built in AsyncEnumerable right?

Correct, but I provided my own implementation of AsyncEnumerable. It's internal but internals are visible to the test assemblies.

But the test assemblies don't target net10 currently so it shouldn't be an issue now right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will indeed not impact projects referencing Cassandra because it's internal. It will impact Cassandra + test projects because they all have access to the this internal class.

.Select(r => new object[] { r.GetValue<int>(0), r.GetValue<string>(1) })
.OrderBy(arr => (int)arr[0])
.ToArray();
var result = Enumerable.Select(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -128,7 +128,7 @@ public void RandomValuesTest()

var selectQuery = $"SELECT id, timeuuid_sample, {GetToDateFunction()}(timeuuid_sample) FROM {AllTypesTableName} LIMIT 10000";
Assert.DoesNotThrow(() =>
Session.Execute(selectQuery).Select(r => r.GetValue<TimeUuid>("timeuuid_sample")).ToArray());
Enumerable.Select(Session.Execute(selectQuery), r => r.GetValue<TimeUuid>("timeuuid_sample")).ToArray());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using System.Threading.Tasks;

#if !NETSTANDARD2_1_OR_GREATER
#pragma warning disable CS1998 // Async method lacks 'await' operators and will run synchronously
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because FromRowSetAsync is marked as async but never awaits if it's NETFRAMEWORK?
I'd prefer to have FromRowSet and FromRowSetAsync separate so someone doesn't get tricked in the future into using this assuming it's always async when it's actually not in the case of NETFRAMEWORK

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will make us have more conditions in the places where this method is being called but hopefully they aren't that many

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because FromRowSetAsync is marked as async but never awaits if it's NETFRAMEWORK?

Exactly.

I'd prefer to have FromRowSet and FromRowSetAsync separate

Done

@@ -24,6 +24,10 @@
using Cassandra.SessionManagement;
using Cassandra.Tasks;

#if !NETSTANDARD2_1_OR_GREATER
#pragma warning disable CS1998 // Async method lacks 'await' operators and will run synchronously
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same scenario as the previous one, I prefer to have methods separate so it's more clear when things are async and when they aren't

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this case, since they are public methods, I didn't add new methods but made their implementations differ depending on the target framework.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this #pragma if you make the async word as part of the conditional block in the methods where this applies.

e.g.

            return ExecuteAsyncAndAdapt(cql, 
#if !NETFRAMEWORK
            async (s, rs) =>
            {
                var row = await rs.FirstOrDefaultAsync().ConfigureAwait(false);
#else
            (s, rs) =>
            {
                var row = rs.FirstOrDefault();
#endif

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's what I went for. It's a little more tricky because the return statement should also be changed (return VS return Task.FromResult)

@verdie-g verdie-g requested a review from joao-r-reis April 17, 2025 03:39
@joao-r-reis
Copy link
Collaborator

Reading through this page I see there is a community maintained implementation of these methods, shouldn't we use this package instead?

@joao-r-reis
Copy link
Collaborator

They even have the workaround listed there for net10 users

@verdie-g
Copy link
Contributor Author

Reading through this page I see there is a community maintained implementation of these methods, shouldn't we use this package instead?

I think it's important to keep the dependencies to a minimum. Here we are talking about ~100 lines of very easy code so I would greatly prefer that over an extra dependency.

@verdie-g
Copy link
Contributor Author

@joao-r-reis is that PR still on your radar?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants