Description
Expected behavior
- SQLAlchemy reflection on an Iceberg table should not emit warnings.
partition
should not be returned as an index on an Iceberg table.
Actual behavior
-
When performing reflection on an Iceberg table, SQLAlchemy's
Inspector._reflect_indexes()
method displays a series of warnings, one for each of the columns in the Iceberg table's$partitions
metadata table:/Users/ppatterson/src/trino-indexes/test.py:8: SAWarning: index key 'partition' was not located in columns for table 'drivestats' metadata.reflect(bind=engine) /Users/ppatterson/src/trino-indexes/test.py:8: SAWarning: index key 'record_count' was not located in columns for table 'drivestats' metadata.reflect(bind=engine) /Users/ppatterson/src/trino-indexes/test.py:8: SAWarning: index key 'file_count' was not located in columns for table 'drivestats' metadata.reflect(bind=engine) /Users/ppatterson/src/trino-indexes/test.py:8: SAWarning: index key 'total_size' was not located in columns for table 'drivestats' metadata.reflect(bind=engine) /Users/ppatterson/src/trino-indexes/test.py:8: SAWarning: index key 'data' was not located in columns for table 'drivestats' metadata.reflect(bind=engine)
-
After reflection, the table object for the Iceberg table contains an index named
partition
with no columns.
The above problems are caused by _get_partitions()
(called by TrinoDialect.get_indexes()
), which assumes that, if SELECT * FROM {schema}."{table_name}$partitions"
succeeds, then the table is a Hive table.
This assumption is not correct - the query also succeeds for Iceberg tables, but the returned columns are not indexes on the Iceberg table. The client should probably query system.metadata.catalogs
to get the connector_name
for the catalog and check that it is "hive"
.
Steps To Reproduce
The following test app causes the warnings to be shown. pip install trino[sqlalchemy]
before running it.
from sqlalchemy import create_engine, MetaData
# "iceberg" is an Iceberg catalog containing a single namespace, "drivestats",
# which contains a single table, "drivestats".
engine = create_engine('trino://admin@localhost:8080/iceberg/drivestats')
metadata = MetaData()
print('Calling reflect()')
metadata.reflect(engine)
print('\nListing tables:')
# What tables can we see?
for t in metadata.sorted_tables:
print(f'Table name: {t.name}')
# What indexes are on the table?
for idx in metadata.tables[t.name].indexes:
print(f'\tIndex name: {idx.name}')
# What are the index columns?
for c in idx.columns:
print(f'\t\tColumn name: {c}')
Expected output:
Calling reflect()
Listing tables:
Table name: drivestats
Current output:
[Note that the warnings are sent to stderr
, so they typically appear out of sequence with the app output, which is sent to stdout
.]
Calling reflect()
Listing tables:
Table name: drivestats
Index name: partition
/Users/ppatterson/src/trino-indexes/test.py:10: SAWarning: index key 'partition' was not located in columns for table 'drivestats'
metadata.reflect(bind=engine)
/Users/ppatterson/src/trino-indexes/test.py:10: SAWarning: index key 'record_count' was not located in columns for table 'drivestats'
metadata.reflect(bind=engine)
/Users/ppatterson/src/trino-indexes/test.py:10: SAWarning: index key 'file_count' was not located in columns for table 'drivestats'
metadata.reflect(bind=engine)
/Users/ppatterson/src/trino-indexes/test.py:10: SAWarning: index key 'total_size' was not located in columns for table 'drivestats'
metadata.reflect(bind=engine)
/Users/ppatterson/src/trino-indexes/test.py:10: SAWarning: index key 'data' was not located in columns for table 'drivestats'
metadata.reflect(bind=engine)
Log output
n/a
Operating System
RHEL 9.5 (Plow) in Docker
Trino Python client version
0.332.0
Trino Server version
467
Python version
Python 3.12.2
Are you willing to submit PR?
- Yes I am willing to submit a PR!