
AttributeError: 'NoneType' object has no attribute 'to_epsg' #181


Hi there, I'm encountering some issues while trying to stack Sentinel-1 GRD data from Planetary Computer at weiji14/zen3geo#62, and I'm wondering whether it's an issue on the STAC metadata side (cf. #152) or something to fix in stackstac.

Here's an MCWE (minimal complete working example), using pystac=1.4.0, planetary-computer=0.4.7, stackstac=0.4.3:

import stackstac
import pystac
import planetary_computer
import xarray as xr

# %%
item_urls: list = [
    "https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-1-grd/items/S1A_IW_GRDH_1SDV_20220320T230514_20220320T230548_042411_050E99",
    "https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-1-grd/items/S1A_IW_GRDH_1SDV_20220308T230513_20220308T230548_042236_0508AF",
]

# Load the individual item metadata and sign the assets
items: list[pystac.Item] = [pystac.Item.from_file(item_url) for item_url in item_urls]
signed_items: list[pystac.Item] = [planetary_computer.sign(item) for item in items]

# Stack Sentinel-1 GRD files
dataarray: xr.DataArray = stackstac.stack(
    items=signed_items,
    epsg=32647,
    resolution=30,
    bounds_latlon=[99.933681, -0.009951, 100.065765, 0.147054],  # W, S, E, N
)
assert dataarray.crs == "epsg:32647"
print(dataarray)

The output xarray repr looks OK, with dimensions (time: 2, band: 2, y: 579, x: 491):

<xarray.DataArray 'stackstac-950602eb423dd1d439106f6794699f05' (time: 2,
                                                                band: 2,
                                                                y: 579, x: 491)>
dask.array<fetch_raster_window, shape=(2, 2, 579, 491), dtype=float64, chunksize=(1, 1, 579, 491), chunktype=numpy.ndarray>
Coordinates: (12/37)
  * time                                   (time) datetime64[ns] 2022-03-08T2...
    id                                     (time) <U62 'S1A_IW_GRDH_1SDV_2022...
  * band                                   (band) <U2 'vh' 'vv'
  * x                                      (x) float64 6.039e+05 ... 6.186e+05
  * y                                      (y) float64 1.626e+04 ... -1.08e+03
    end_datetime                           (time) <U32 '2022-03-08 23:05:48.2...
    ...                                     ...
    s1:resolution                          <U4 'high'
    s1:product_timeliness                  <U8 'Fast-24h'
    sat:relative_orbit                     int64 164
    description                            (band) <U145 'Amplitude of signal ...
    title                                  (band) <U41 'VH: vertical transmit...
    epsg                                   int64 32647
Attributes:
    spec:        RasterSpec(epsg=32647, bounds=(603870, -1110, 618600, 16260)...
    crs:         epsg:32647
    transform:   | 30.00, 0.00, 603870.00|\n| 0.00,-30.00, 16260.00|\n| 0.00,...
    resolution:  30

But when I try to run dataarray.compute(), an AttributeError: 'NoneType' object has no attribute 'to_epsg' pops up. Here's the full traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [11], in <cell line: 1>()
----> 1 dataarray.compute()

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/xarray/core/dataarray.py:993, in DataArray.compute(self, **kwargs)
    974 """Manually trigger loading of this array's data from disk or a
    975 remote source into memory and return a new array. The original is
    976 left unaltered.
   (...)
    990 dask.compute
    991 """
    992 new = self.copy(deep=False)
--> 993 return new.load(**kwargs)

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/xarray/core/dataarray.py:967, in DataArray.load(self, **kwargs)
    949 def load(self: T_DataArray, **kwargs) -> T_DataArray:
    950     """Manually trigger loading of this array's data from disk or a
    951     remote source into memory and return this array.
    952 
   (...)
    965     dask.compute
    966     """
--> 967     ds = self._to_temp_dataset().load(**kwargs)
    968     new = self._from_temp_dataset(ds)
    969     self._variable = new._variable

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/xarray/core/dataset.py:733, in Dataset.load(self, **kwargs)
    730 import dask.array as da
    732 # evaluate all the dask arrays simultaneously
--> 733 evaluated_data = da.compute(*lazy_data.values(), **kwargs)
    735 for k, data in zip(lazy_data, evaluated_data):
    736     self.variables[k].data = data

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/dask/base.py:602, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    599     keys.append(x.__dask_keys__())
    600     postcomputes.append(x.__dask_postcompute__())
--> 602 results = schedule(dsk, keys, **kwargs)
    603 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/dask/threaded.py:89, in get(dsk, keys, cache, num_workers, pool, **kwargs)
     86     elif isinstance(pool, multiprocessing.pool.Pool):
     87         pool = MultiprocessingPoolExecutor(pool)
---> 89 results = get_async(
     90     pool.submit,
     91     pool._max_workers,
     92     dsk,
     93     keys,
     94     cache=cache,
     95     get_id=_thread_get_id,
     96     pack_exception=pack_exception,
     97     **kwargs,
     98 )
    100 # Cleanup pools associated to dead threads
    101 with pools_lock:

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/dask/local.py:511, in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
    509         _execute_task(task, data)  # Re-execute locally
    510     else:
--> 511         raise_exception(exc, tb)
    512 res, worker_id = loads(res_info)
    513 state["cache"][key] = res

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/dask/local.py:319, in reraise(exc, tb)
    317 if exc.__traceback__ is not tb:
    318     raise exc.with_traceback(tb)
--> 319 raise exc

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/dask/local.py:224, in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    222 try:
    223     task, data = loads(task_info)
--> 224     result = _execute_task(task, data)
    225     id = get_id()
    226     result = dumps((result, id))

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/dask/core.py:119, in _execute_task(arg, cache, dsk)
    115     func, args = arg[0], arg[1:]
    116     # Note: Don't assign the subtask results to a variable. numpy detects
    117     # temporaries by their reference count and can execute certain
    118     # operations in-place.
--> 119     return func(*(_execute_task(a, cache) for a in args))
    120 elif not ishashable(arg):
    121     return arg

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/dask/optimization.py:990, in SubgraphCallable.__call__(self, *args)
    988 if not len(args) == len(self.inkeys):
    989     raise ValueError("Expected %d args, got %d" % (len(self.inkeys), len(args)))
--> 990 return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/dask/core.py:149, in get(dsk, out, cache)
    147 for key in toposort(dsk):
    148     task = dsk[key]
--> 149     result = _execute_task(task, cache)
    150     cache[key] = result
    151 result = _execute_task(out, cache)

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/dask/core.py:119, in _execute_task(arg, cache, dsk)
    115     func, args = arg[0], arg[1:]
    116     # Note: Don't assign the subtask results to a variable. numpy detects
    117     # temporaries by their reference count and can execute certain
    118     # operations in-place.
--> 119     return func(*(_execute_task(a, cache) for a in args))
    120 elif not ishashable(arg):
    121     return arg

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/stackstac/to_dask.py:185, in fetch_raster_window(reader_table, slices, dtype, fill_value)
    178 # Only read if the window we're fetching actually overlaps with the asset
    179 if windows.intersect(current_window, asset_window):
    180     # NOTE: when there are multiple assets, we _could_ parallelize these reads with our own threadpool.
    181     # However, that would probably increase memory usage, since the internal, thread-local GDAL datasets
    182     # would end up copied to even more threads.
    183 
    184     # TODO when the Reader won't be rescaling, support passing `output` to avoid the copy?
--> 185     data = reader.read(current_window)
    187     if all_empty:
    188         # Turn `output` from a broadcast-trick array to a real array, so it's writeable
    189         if (
    190             np.isnan(data)
    191             if np.isnan(fill_value)
    192             else np.equal(data, fill_value)
    193         ).all():
    194             # Unless the data we just read is all empty anyway

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/stackstac/rio_reader.py:385, in AutoParallelRioReader.read(self, window, **kwargs)
    384 def read(self, window: Window, **kwargs) -> np.ndarray:
--> 385     reader = self.dataset
    386     try:
    387         result = reader.read(
    388             window=window,
    389             masked=True,
   (...)
    392             **kwargs,
    393         )

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/stackstac/rio_reader.py:381, in AutoParallelRioReader.dataset(self)
    379 with self._dataset_lock:
    380     if self._dataset is None:
--> 381         self._dataset = self._open()
    382     return self._dataset

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/stackstac/rio_reader.py:348, in AutoParallelRioReader._open(self)
    340     raise RuntimeError(
    341         f"Assets must have exactly 1 band, but file {self.url!r} has {ds.count}. "
    342         "We can't currently handle multi-band rasters (each band has to be "
    343         "a separate STAC asset), so you'll need to exclude this asset from your analysis."
    344     )
    346 # Only make a VRT if the dataset doesn't match the spatial spec we want
    347 if self.spec.vrt_params != {
--> 348     "crs": ds.crs.to_epsg(),
    349     "transform": ds.transform,
    350     "height": ds.height,
    351     "width": ds.width,
    352 }:
    353     with self.gdal_env.open_vrt:
    354         vrt = WarpedVRT(
    355             ds,
    356             sharing=False,
    357             resampling=self.resampling,
    358             **self.spec.vrt_params,
    359         )

AttributeError: 'NoneType' object has no attribute 'to_epsg'

So I verified that the stacked Sentinel-1 dataarray does have a crs attribute of 'epsg:32647', but it seems like ds here is something else? I did try to step through this code:

if self.spec.vrt_params != {
    "crs": ds.crs.to_epsg(),
    "transform": ds.transform,
    "height": ds.height,
    "width": ds.width,
}:
    with self.gdal_env.open_vrt:
        vrt = WarpedVRT(
            ds,
            sharing=False,
            resampling=self.resampling,
            **self.spec.vrt_params,
        )
else:
    logger.info(f"Skipping VRT for {self.url!r}")
    vrt = None

but am having trouble understanding what ds represents, or how ds.crs can be set to a proper coordinate reference system value. My guess is that the crs needs to be set in the STAC Item metadata? Or perhaps the if-statement needs to be revised.
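
For what it's worth, here's a quick diagnostic one could run on the raw asset to see why ds.crs might end up as None. This is just a minimal sketch for illustration; it assumes the signed href from the MCWE above can be opened directly with rasterio, and the variable names are my own:

import rasterio

# Open the signed Sentinel-1 GRD asset directly and inspect its georeferencing.
# (signed_items comes from the MCWE above; the "vv" asset is just an example.)
url = signed_items[1].assets["vv"].href
with rasterio.open(url) as ds:
    print("crs:", ds.crs)             # if this is None, ds.crs.to_epsg() raises AttributeError
    print("transform:", ds.transform)
    gcps, gcp_crs = ds.gcps           # GCP-based georeferencing, if any
    print("gcps:", len(gcps), "gcp crs:", gcp_crs)

If the GeoTIFF turns out to be georeferenced via ground control points (GCPs) only, that would explain ds.crs being None, and the if-statement above (or the WarpedVRT construction) would presumably need to handle that case.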

Oh, and just to make sure the Sentinel-1 GRD STAC Items are readable, I did try reading one of the assets using rioxarray=0.11.1:

import rioxarray

url: str = signed_items[1].assets["vv"].href
dataarray = rioxarray.open_rasterio(filename=url, overview_level=5)
dataarray = dataarray.compute()
print(dataarray)
# <xarray.DataArray (band: 1, y: 361, x: 396)>
# array([[[  0, 224, 259, ...,  45,   0,   0],
#         [  0, 243, 286, ...,  44,   0,   0],
#         [  0, 274, 248, ...,  43,   0,   0],
#         ...,
#         [  0,   0,   0, ...,  34,  36,  36],
#         [  0,   0,   0, ...,  33,  35,  34],
#         [  0,   0,   0, ...,  17,  17,  17]]], dtype=uint16)
# Coordinates:
#   * band         (band) int64 1
#     spatial_ref  int64 0
# Dimensions without coordinates: y, x
# Attributes:
#     _FillValue:    0.0
#     scale_factor:  1.0
#     add_offset:    0.0

Here's the output from rioxarray.show_versions() for completeness, to show my GDAL and other geo-library versions:

rioxarray (0.11.1) deps:
  rasterio: 1.3.0
    xarray: 2022.3.0
      GDAL: 3.5.0
      GEOS: 3.10.2
      PROJ: 9.0.0
 PROJ DATA: ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/rasterio/proj_data
 GDAL DATA: None

Other python deps:
     scipy: 1.9.0
    pyproj: 3.3.1

System:
    python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:36:39) [GCC 10.4.0]
executable: ~/mambaforge/envs/zen3geo/bin/python
   machine: Linux-5.17.0-1016-oem-x86_64-with-glibc2.35

P.S. Thanks for this amazing library, really like the design and it's been a pleasure to use 😄
