Skip to content

fix(glue): Support create_table for S3 Tables federated databases#3058

Open
jamesbornholt wants to merge 2 commits intoapache:mainfrom
jamesbornholt:fix-glue-create-table-s3tables
Open

fix(glue): Support create_table for S3 Tables federated databases#3058
jamesbornholt wants to merge 2 commits intoapache:mainfrom
jamesbornholt:fix-glue-create-table-s3tables

Conversation

@jamesbornholt
Copy link

Rationale for this change

GlueCatalog.create_table() fails for S3 Tables because it tries to write Iceberg metadata to a warehouse-derived location before registering the table in Glue. S3 Tables manages storage internally, so the location is not known until the table exists in the service.

Detect S3 Tables federated databases (FederatedDatabase.ConnectionType == "aws:s3tables") and use a two-phase create: first create a minimal table entry in Glue to have S3 Tables allocate storage, then write Iceberg metadata to the managed location and update the Glue table with the metadata pointer. On failure, clean up the table entry.

This follows the same "pre-create then update" pattern used by the Java GlueCatalog when LakeFormation is enabled
(GlueTableOperations.createGlueTempTableIfNecessary).

Are these changes tested?

moto doesn't mock with enough fidelity to test this directly. I've tested locally against both S3 Tables and vanilla Glue Iceberg and confirmed CreateTable still works on both paths.

Are there any user-facing changes?

No.

GlueCatalog.create_table() fails for S3 Tables because it tries to
write Iceberg metadata to a warehouse-derived location before
registering the table in Glue. S3 Tables manages storage internally,
so the location is not known until the table exists in the service.

Detect S3 Tables federated databases (FederatedDatabase.ConnectionType
== "aws:s3tables") and use a two-phase create: first create a minimal
table entry in Glue to have S3 Tables allocate storage, then write
Iceberg metadata to the managed location and update the Glue table
with the metadata pointer. On failure, clean up the table entry.

This follows the same "pre-create then update" pattern used by the
Java GlueCatalog when LakeFormation is enabled
(GlueTableOperations.createGlueTempTableIfNecessary).
@jamesbornholt jamesbornholt force-pushed the fix-glue-create-table-s3tables branch from 5633f4e to 9708909 Compare February 17, 2026 07:31
@kevinjqliu
Copy link
Contributor

thanks for the PR! do you know of a way to test this change?

Add three moto-based tests for the S3 Tables code path in GlueCatalog:
- create_table succeeds and uses the table warehouse location
- create_table rejects an explicit location for S3 Tables databases
- create_table raises TableAlreadyExistsError for duplicate tables

Moto does not support FederatedDatabase or S3 Tables storage allocation,
so the tests patch FakeDatabase.as_dict to return FederatedDatabase and
FakeTable.__init__ to inject a StorageDescriptor.Location.
@jamesbornholt jamesbornholt force-pushed the fix-glue-create-table-s3tables branch from 0c4f375 to 4f9299e Compare February 17, 2026 18:23
@jamesbornholt
Copy link
Author

I took a stab at writing some tests. It requires monkeypatching moto to emulate Glue's behavior here, which is not super elegant, but it captures the right behavior.

@kevinjqliu
Copy link
Contributor

@geruh could you take a look when you get a chance?

Copy link
Contributor

@geruh geruh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising this @jamesbornholt! I left a few comments. At a high level, Looks like we can already both load and commit against s3t tables federated to glue. This would just fill the create_table gap.

Initially I was a bit hesitant about adding support for this use case as it requires 4/5 api calls and in the non s3t federated case it adds an additional call. Although I don't think there is any way around this.

WDYT?

return False
database = database_response["Database"]
federated = database.get("FederatedDatabase", {})
return (federated.get("ConnectionType") or "").lower() == "aws:s3tables"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we extract this


def _create_table_s3tables(
self,
identifier: str | Identifier,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we have some redundancy with these params passed in: ident, db_name, tbl_name maybe we can derive?

except Exception:
# Clean up the table created in step 1.
try:
self.glue.delete_table(DatabaseName=database_name, Name=table_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

)
raise

return self.load_table(identifier=identifier)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could probably avoid the last load table here with the update table response

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants