fix(glue): Support create_table for S3 Tables federated databases#3058
fix(glue): Support create_table for S3 Tables federated databases#3058jamesbornholt wants to merge 2 commits intoapache:mainfrom
Conversation
GlueCatalog.create_table() fails for S3 Tables because it tries to write Iceberg metadata to a warehouse-derived location before registering the table in Glue. S3 Tables manages storage internally, so the location is not known until the table exists in the service. Detect S3 Tables federated databases (FederatedDatabase.ConnectionType == "aws:s3tables") and use a two-phase create: first create a minimal table entry in Glue to have S3 Tables allocate storage, then write Iceberg metadata to the managed location and update the Glue table with the metadata pointer. On failure, clean up the table entry. This follows the same "pre-create then update" pattern used by the Java GlueCatalog when LakeFormation is enabled (GlueTableOperations.createGlueTempTableIfNecessary).
5633f4e to
9708909
Compare
|
thanks for the PR! do you know of a way to test this change? |
Add three moto-based tests for the S3 Tables code path in GlueCatalog: - create_table succeeds and uses the table warehouse location - create_table rejects an explicit location for S3 Tables databases - create_table raises TableAlreadyExistsError for duplicate tables Moto does not support FederatedDatabase or S3 Tables storage allocation, so the tests patch FakeDatabase.as_dict to return FederatedDatabase and FakeTable.__init__ to inject a StorageDescriptor.Location.
0c4f375 to
4f9299e
Compare
|
I took a stab at writing some tests. It requires monkeypatching moto to emulate Glue's behavior here, which is not super elegant, but it captures the right behavior. |
|
@geruh could you take a look when you get a chance? |
geruh
left a comment
There was a problem hiding this comment.
Thanks for raising this @jamesbornholt! I left a few comments. At a high level, Looks like we can already both load and commit against s3t tables federated to glue. This would just fill the create_table gap.
Initially I was a bit hesitant about adding support for this use case as it requires 4/5 api calls and in the non s3t federated case it adds an additional call. Although I don't think there is any way around this.
WDYT?
| return False | ||
| database = database_response["Database"] | ||
| federated = database.get("FederatedDatabase", {}) | ||
| return (federated.get("ConnectionType") or "").lower() == "aws:s3tables" |
|
|
||
| def _create_table_s3tables( | ||
| self, | ||
| identifier: str | Identifier, |
There was a problem hiding this comment.
nit: we have some redundancy with these params passed in: ident, db_name, tbl_name maybe we can derive?
| except Exception: | ||
| # Clean up the table created in step 1. | ||
| try: | ||
| self.glue.delete_table(DatabaseName=database_name, Name=table_name) |
| ) | ||
| raise | ||
|
|
||
| return self.load_table(identifier=identifier) |
There was a problem hiding this comment.
we could probably avoid the last load table here with the update table response
Rationale for this change
GlueCatalog.create_table() fails for S3 Tables because it tries to write Iceberg metadata to a warehouse-derived location before registering the table in Glue. S3 Tables manages storage internally, so the location is not known until the table exists in the service.
Detect S3 Tables federated databases (FederatedDatabase.ConnectionType == "aws:s3tables") and use a two-phase create: first create a minimal table entry in Glue to have S3 Tables allocate storage, then write Iceberg metadata to the managed location and update the Glue table with the metadata pointer. On failure, clean up the table entry.
This follows the same "pre-create then update" pattern used by the Java GlueCatalog when LakeFormation is enabled
(GlueTableOperations.createGlueTempTableIfNecessary).
Are these changes tested?
moto doesn't mock with enough fidelity to test this directly. I've tested locally against both S3 Tables and vanilla Glue Iceberg and confirmed CreateTable still works on both paths.
Are there any user-facing changes?
No.