Description
Hi there, just encountering some issues while trying to stack Sentinel-1 GRD data from Planetary Computer at weiji14/zen3geo#62, and I'm wondering if it's an issue on the STAC metadata side (c.f. #152), or something to fix in stackstac.

Here's an MCWE, using `pystac=1.4.0`, `planetary-computer=0.4.7`, and `stackstac=0.4.3`:
```python
import stackstac
import pystac
import planetary_computer
import xarray as xr

# %%
item_urls: list = [
    "https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-1-grd/items/S1A_IW_GRDH_1SDV_20220320T230514_20220320T230548_042411_050E99",
    "https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-1-grd/items/S1A_IW_GRDH_1SDV_20220308T230513_20220308T230548_042236_0508AF",
]

# Load the individual item metadata and sign the assets
items: list[pystac.Item] = [pystac.Item.from_file(item_url) for item_url in item_urls]
signed_items: list[pystac.Item] = [planetary_computer.sign(item) for item in items]

# Stack Sentinel-1 GRD files
dataarray: xr.DataArray = stackstac.stack(
    items=signed_items,
    epsg=32647,
    resolution=30,
    bounds_latlon=[99.933681, -0.009951, 100.065765, 0.147054],  # W, S, E, N
)
assert dataarray.crs == "epsg:32647"
print(dataarray)
```
The output xarray repr looks ok, with dimensions `(time: 2, band: 2, y: 579, x: 491)`:
```
<xarray.DataArray 'stackstac-950602eb423dd1d439106f6794699f05' (time: 2,
                                                                band: 2,
                                                                y: 579, x: 491)>
dask.array<fetch_raster_window, shape=(2, 2, 579, 491), dtype=float64, chunksize=(1, 1, 579, 491), chunktype=numpy.ndarray>
Coordinates: (12/37)
  * time                   (time) datetime64[ns] 2022-03-08T2...
    id                     (time) <U62 'S1A_IW_GRDH_1SDV_2022...
  * band                   (band) <U2 'vh' 'vv'
  * x                      (x) float64 6.039e+05 ... 6.186e+05
  * y                      (y) float64 1.626e+04 ... -1.08e+03
    end_datetime           (time) <U32 '2022-03-08 23:05:48.2...
    ...                     ...
    s1:resolution          <U4 'high'
    s1:product_timeliness  <U8 'Fast-24h'
    sat:relative_orbit     int64 164
    description            (band) <U145 'Amplitude of signal ...
    title                  (band) <U41 'VH: vertical transmit...
    epsg                   int64 32647
Attributes:
    spec:        RasterSpec(epsg=32647, bounds=(603870, -1110, 618600, 16260)...
    crs:         epsg:32647
    transform:   | 30.00, 0.00, 603870.00|\n| 0.00,-30.00, 16260.00|\n| 0.00,...
    resolution:  30
```
But when I try to run `dataarray.compute()`, an `AttributeError: 'NoneType' object has no attribute 'to_epsg'` is raised. Here's the full traceback:
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [11], in <cell line: 1>()
----> 1 dataarray.compute()

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/xarray/core/dataarray.py:993, in DataArray.compute(self, **kwargs)
    974 """Manually trigger loading of this array's data from disk or a
    975 remote source into memory and return a new array. The original is
    976 left unaltered.
   (...)
    990 dask.compute
    991 """
    992 new = self.copy(deep=False)
--> 993 return new.load(**kwargs)

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/xarray/core/dataarray.py:967, in DataArray.load(self, **kwargs)
    949 def load(self: T_DataArray, **kwargs) -> T_DataArray:
    950     """Manually trigger loading of this array's data from disk or a
    951     remote source into memory and return this array.
    952
   (...)
    965     dask.compute
    966     """
--> 967     ds = self._to_temp_dataset().load(**kwargs)
    968     new = self._from_temp_dataset(ds)
    969     self._variable = new._variable

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/xarray/core/dataset.py:733, in Dataset.load(self, **kwargs)
    730 import dask.array as da
    732 # evaluate all the dask arrays simultaneously
--> 733 evaluated_data = da.compute(*lazy_data.values(), **kwargs)
    735 for k, data in zip(lazy_data, evaluated_data):
    736     self.variables[k].data = data

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/dask/base.py:602, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    599     keys.append(x.__dask_keys__())
    600     postcomputes.append(x.__dask_postcompute__())
--> 602 results = schedule(dsk, keys, **kwargs)
    603 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/dask/threaded.py:89, in get(dsk, keys, cache, num_workers, pool, **kwargs)
     86     elif isinstance(pool, multiprocessing.pool.Pool):
     87         pool = MultiprocessingPoolExecutor(pool)
---> 89 results = get_async(
     90     pool.submit,
     91     pool._max_workers,
     92     dsk,
     93     keys,
     94     cache=cache,
     95     get_id=_thread_get_id,
     96     pack_exception=pack_exception,
     97     **kwargs,
     98 )
    100 # Cleanup pools associated to dead threads
    101 with pools_lock:

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/dask/local.py:511, in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
    509         _execute_task(task, data)  # Re-execute locally
    510     else:
--> 511         raise_exception(exc, tb)
    512 res, worker_id = loads(res_info)
    513 state["cache"][key] = res

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/dask/local.py:319, in reraise(exc, tb)
    317 if exc.__traceback__ is not tb:
    318     raise exc.with_traceback(tb)
--> 319 raise exc

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/dask/local.py:224, in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    222 try:
    223     task, data = loads(task_info)
--> 224     result = _execute_task(task, data)
    225     id = get_id()
    226     result = dumps((result, id))

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/dask/core.py:119, in _execute_task(arg, cache, dsk)
    115     func, args = arg[0], arg[1:]
    116     # Note: Don't assign the subtask results to a variable. numpy detects
    117     # temporaries by their reference count and can execute certain
    118     # operations in-place.
--> 119     return func(*(_execute_task(a, cache) for a in args))
    120 elif not ishashable(arg):
    121     return arg

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/dask/optimization.py:990, in SubgraphCallable.__call__(self, *args)
    988 if not len(args) == len(self.inkeys):
    989     raise ValueError("Expected %d args, got %d" % (len(self.inkeys), len(args)))
--> 990 return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/dask/core.py:149, in get(dsk, out, cache)
    147 for key in toposort(dsk):
    148     task = dsk[key]
--> 149     result = _execute_task(task, cache)
    150     cache[key] = result
    151 result = _execute_task(out, cache)

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/dask/core.py:119, in _execute_task(arg, cache, dsk)
    115     func, args = arg[0], arg[1:]
    116     # Note: Don't assign the subtask results to a variable. numpy detects
    117     # temporaries by their reference count and can execute certain
    118     # operations in-place.
--> 119     return func(*(_execute_task(a, cache) for a in args))
    120 elif not ishashable(arg):
    121     return arg

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/stackstac/to_dask.py:185, in fetch_raster_window(reader_table, slices, dtype, fill_value)
    178 # Only read if the window we're fetching actually overlaps with the asset
    179 if windows.intersect(current_window, asset_window):
    180     # NOTE: when there are multiple assets, we _could_ parallelize these reads with our own threadpool.
    181     # However, that would probably increase memory usage, since the internal, thread-local GDAL datasets
    182     # would end up copied to even more threads.
    183
    184     # TODO when the Reader won't be rescaling, support passing `output` to avoid the copy?
--> 185     data = reader.read(current_window)
    187 if all_empty:
    188     # Turn `output` from a broadcast-trick array to a real array, so it's writeable
    189     if (
    190         np.isnan(data)
    191         if np.isnan(fill_value)
    192         else np.equal(data, fill_value)
    193     ).all():
    194         # Unless the data we just read is all empty anyway

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/stackstac/rio_reader.py:385, in AutoParallelRioReader.read(self, window, **kwargs)
    384 def read(self, window: Window, **kwargs) -> np.ndarray:
--> 385     reader = self.dataset
    386     try:
    387         result = reader.read(
    388             window=window,
    389             masked=True,
   (...)
    392             **kwargs,
    393         )

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/stackstac/rio_reader.py:381, in AutoParallelRioReader.dataset(self)
    379 with self._dataset_lock:
    380     if self._dataset is None:
--> 381         self._dataset = self._open()
    382     return self._dataset

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/stackstac/rio_reader.py:348, in AutoParallelRioReader._open(self)
    340     raise RuntimeError(
    341         f"Assets must have exactly 1 band, but file {self.url!r} has {ds.count}. "
    342         "We can't currently handle multi-band rasters (each band has to be "
    343         "a separate STAC asset), so you'll need to exclude this asset from your analysis."
    344     )
    346 # Only make a VRT if the dataset doesn't match the spatial spec we want
    347 if self.spec.vrt_params != {
--> 348     "crs": ds.crs.to_epsg(),
    349     "transform": ds.transform,
    350     "height": ds.height,
    351     "width": ds.width,
    352 }:
    353     with self.gdal_env.open_vrt:
    354         vrt = WarpedVRT(
    355             ds,
    356             sharing=False,
    357             resampling=self.resampling,
    358             **self.spec.vrt_params,
    359         )

AttributeError: 'NoneType' object has no attribute 'to_epsg'
```
So I verified that the stacked Sentinel-1 `dataarray` does have a `crs` attribute like `'epsg:32647'`, but it seems like `ds` is something else? I did try to step through the code at stackstac/rio_reader.py, lines 347 to 362 (commit 9106708), but am having trouble understanding what `ds` represents, or how `ds.crs` can be set to a proper coordinate reference system value. My guess is that the crs needs to be set in the STAC Item metadata? Or perhaps the if-statement needs to be revised.
Oh, and just to make sure the Sentinel-1 GRD STAC Items are readable, I did try reading one using `rioxarray=0.11.1`:
```python
import rioxarray

url: str = signed_items[1].assets["vv"].href
dataarray = rioxarray.open_rasterio(filename=url, overview_level=5)
dataarray = dataarray.compute()
print(dataarray)
# <xarray.DataArray (band: 1, y: 361, x: 396)>
# array([[[  0, 224, 259, ...,  45,   0,   0],
#         [  0, 243, 286, ...,  44,   0,   0],
#         [  0, 274, 248, ...,  43,   0,   0],
#         ...,
#         [  0,   0,   0, ...,  34,  36,  36],
#         [  0,   0,   0, ...,  33,  35,  34],
#         [  0,   0,   0, ...,  17,  17,  17]]], dtype=uint16)
# Coordinates:
#   * band         (band) int64 1
#     spatial_ref  int64 0
# Dimensions without coordinates: y, x
# Attributes:
#     _FillValue:    0.0
#     scale_factor:  1.0
#     add_offset:    0.0
```
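On the STAC-metadata angle (c.f. #152), one thing that might be worth checking is whether the items advertise a CRS via the projection extension at all, since that metadata is what stackstac uses before it ever opens the file. A sketch of that lookup over hypothetical, stripped-down property dicts (I haven't confirmed what the real Planetary Computer items carry):

```python
def advertised_epsg(properties: dict):
    """Return the EPSG code a STAC item advertises via the projection
    extension's `proj:epsg` field, or None if it carries none."""
    return properties.get("proj:epsg")


# Hypothetical property dicts, not the real Planetary Computer metadata:
assert advertised_epsg({"proj:epsg": 32647}) == 32647
assert advertised_epsg({"s1:resolution": "high"}) is None
```

If the real items turn out to lack the `proj:*` fields, that would point back at the STAC-metadata side of the question.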
Here's the output from `rioxarray.show_versions()` for completeness, to show my GDAL and other geo-library versions:
```
rioxarray (0.11.1) deps:
  rasterio: 1.3.0
    xarray: 2022.3.0
      GDAL: 3.5.0
      GEOS: 3.10.2
      PROJ: 9.0.0
 PROJ DATA: ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/rasterio/proj_data
 GDAL DATA: None

Other python deps:
     scipy: 1.9.0
    pyproj: 3.3.1

System:
    python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:36:39) [GCC 10.4.0]
executable: ~/mambaforge/envs/zen3geo/bin/python
   machine: Linux-5.17.0-1016-oem-x86_64-with-glibc2.35
```
P.S. Thanks for this amazing library, really like the design and it's been a pleasure to use 😄