Race condition if two worker instances attempt to initialize clean database in parallel #496

Open
@benjie


This only affects the first run of worker against your DB, so if you already have worker running you're fine.

If you run 10 graphile-worker instances against a clean database all at the same time, there's a good chance that in this code:

worker/src/migrate.ts

Lines 117 to 137 in 436e299

  for (let attempts = 0; attempts < 2; attempts++) {
    try {
      const {
        rows: [row],
      } = await event.client.query(
        `select current_setting('server_version_num') as server_version_num,
        (select id from ${escapedWorkerSchema}.migrations order by id desc limit 1) as id,
        (select id from ${escapedWorkerSchema}.migrations where breaking is true order by id desc limit 1) as biggest_breaking_id;`,
      );
      latestMigration = row.id;
      latestBreakingMigration = row.biggest_breaking_id;
      event.postgresVersion = checkPostgresVersion(row.server_version_num);
    } catch (e) {
      if (attempts === 0 && (e.code === "42P01" || e.code === "42703")) {
        await installSchema(compiledSharedOptions, event);
      } else {
        throw e;
      }
    }
  }

two or more will generate the requisite 42P01/42703 error (undefined_table/undefined_column), and each will attempt to install the schema, but only one can succeed. The others will throw an error and exit.

There are two bugs here:

  1. the installSchema call should be wrapped in try/catch so that errors thrown there trigger a retry of the whole loop
  2. there's no sleep(), so it'll retry instantly, which can be painful - we should use randomized back-off
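
A rough sketch of what both fixes could look like together, not the actual patch. Every name here (backoffDelayMs, sleep, isMissingSchemaError, readStateWithInstall) is hypothetical and not part of graphile-worker, and the attempt count and delay bounds are arbitrary assumptions:

```typescript
// Hypothetical helpers for illustration; none of these are graphile-worker APIs.

/** Randomized ("full jitter") back-off: a delay in [0, min(maxMs, baseMs * 2^attempt)). */
function backoffDelayMs(
  attempt: number,
  baseMs = 50,
  maxMs = 2000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Postgres error codes: 42P01 = undefined_table, 42703 = undefined_column.
const isMissingSchemaError = (e: unknown): boolean => {
  const code = (e as { code?: string } | null)?.code;
  return code === "42P01" || code === "42703";
};

/**
 * Runs `readState`; if it fails because the schema is missing, runs `install`
 * and goes round the loop again. An error thrown by `install` (for example
 * because another instance won the race) does not abort: we sleep for a
 * randomized delay and then re-check the state.
 */
async function readStateWithInstall<T>(
  readState: () => Promise<T>,
  install: () => Promise<void>,
  maxAttempts = 5,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await readState();
    } catch (e) {
      // Anything other than "schema not there yet" is a real failure.
      if (!isMissingSchemaError(e)) throw e;
      lastError = e;
    }
    try {
      await install();
    } catch (e) {
      // Probably lost the race; remember the error and back off before retrying.
      lastError = e;
      await wait(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

In migrate.ts terms, readState would presumably wrap the query above and install would wrap installSchema(compiledSharedOptions, event). If installation keeps failing for a genuine reason, the last error is still thrown once the attempts run out.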
