Commit 91619c9

Incorporate Vandana's feedback about narrative
1 parent 41b8c5f commit 91619c9

1 file changed

+62
-51
lines changed

tutorials/cloud_access/euclid-cloud-access.md

+62-51
@@ -23,7 +23,7 @@ By the end of this tutorial, you will:
2323

2424
+++
2525

26-
## Introduction
26+
## 1. Introduction
2727
Euclid launched in July 2023 as a European Space Agency (ESA) mission with involvement by NASA. The primary science goals of Euclid are to better understand the composition and evolution of the dark Universe. The Euclid mission is providing space-based imaging and spectroscopy as well as supporting ground-based imaging to achieve these primary goals. These data will be archived by multiple global repositories, including IRSA, where they will support transformational work in many areas of astrophysics.
2828

2929
Euclid Quick Release 1 (Q1) consists of ~30 TB of imaging, spectroscopy, and catalogs covering four non-contiguous fields: Euclid Deep Field North (22.9 sq deg), Euclid Deep Field Fornax (12.1 sq deg), Euclid Deep Field South (28.1 sq deg), and LDN1641.
@@ -32,8 +32,12 @@ Euclid Q1 data were released on-premises at IPAC and in the cloud via Amazon Web
3232

3333
+++
3434

35-
## Imports
36-
- TODO: fill the imports explaination
35+
## 2. Imports
36+
- `s3fs` for browsing S3 buckets
37+
- `astropy` for handling coordinates, units, FITS I/O, tables, images, etc.
38+
- `astroquery` for querying Euclid data products from IRSA
39+
- `matplotlib` for visualization
40+
- `json` for decoding JSON strings
3741

3842
```{code-cell} ipython3
3943
# Uncomment the next line to install dependencies if needed.
@@ -54,52 +58,53 @@ from matplotlib import pyplot as plt
5458
import json
5559
```
5660

57-
## Browse Euclid QR1 Bucket
61+
## 3. Browse Euclid Q1 cloud-hosted data
5862

5963
```{code-cell} ipython3
60-
BUCKET_NAME='nasa-irsa-euclid-q1' # internal to IPAC until public release (use LAN or VPN w/ Tunnel-all)
64+
BUCKET_NAME = 'nasa-irsa-euclid-q1'
6165
```
6266

67+
[s3fs](https://s3fs.readthedocs.io/en/latest/) provides a filesystem-like Python interface for AWS S3 buckets. First we create an S3 client; `anon=True` requests anonymous (unauthenticated) access:
68+
6369
```{code-cell} ipython3
6470
s3 = s3fs.S3FileSystem(anon=True)
6571
```
6672

67-
TODO: link s3fs docs
73+
Then we list the `q1` directory that contains Euclid Q1 data products:
6874

6975
```{code-cell} ipython3
7076
s3.ls(f'{BUCKET_NAME}/q1')
7177
```
7278

73-
## Find images for a coordinate search
74-
75-
+++
76-
77-
### Locate MER images in the bucket
79+
Let's navigate to MER images (available as FITS files):
7880

7981
```{code-cell} ipython3
80-
s3.ls(f'{BUCKET_NAME}/q1/MER')
82+
s3.ls(f'{BUCKET_NAME}/q1/MER')[:10]  # list only the first 10 entries to keep the output short
8183
```
8284

8385
```{code-cell} ipython3
84-
s3.ls(f'{BUCKET_NAME}/q1/MER/102018211')
86+
s3.ls(f'{BUCKET_NAME}/q1/MER/102018211') # pick any tile ID from above
8587
```
8688

8789
```{code-cell} ipython3
88-
s3.ls(f'{BUCKET_NAME}/q1/MER/102018211/VIS')
90+
s3.ls(f'{BUCKET_NAME}/q1/MER/102018211/VIS') # pick any instrument from above
8991
```
9092

91-
As per doc specification, we need `MER/{tile_id}/{instrument}/EUC_MER_BGSUB-MOSAIC*.fits` for displaying background-subtracted mosiac images. But these images are stored under TILE IDs so first we need to find TILE ID for a coordinate search we are interested in. We will use astroquery (in next section) to retrieve FITS file paths for our coordinates.
93+
As per the "Browsable Directories" section in the [user guide](https://irsa.ipac.caltech.edu/data/Euclid/docs/euclid_archive_at_irsa_user_guide.pdf), background-subtracted mosaic images are stored as `MER/{tile_id}/{instrument}/EUC_MER_BGSUB-MOSAIC*.fits`. Because these images are organized by tile ID, we first need to find the tile ID(s) covering the coordinates we are interested in. In the next section, we use astroquery to do a spatial search that returns the FITS file paths for our coordinates.
9294

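To make the path convention above concrete, here is a minimal sketch that assembles the `MER/{tile_id}/{instrument}` prefix and keeps only the entries of a directory listing that match `EUC_MER_BGSUB-MOSAIC*.fits`. The helper names and the listing are hypothetical: the file names are invented stand-ins for what `s3.ls()` might return, not real Q1 products.

```python
from fnmatch import fnmatchcase

BUCKET_NAME = 'nasa-irsa-euclid-q1'


def mosaic_prefix(tile_id, instrument, bucket=BUCKET_NAME):
    """Build the bucket prefix for one tile and instrument (hypothetical helper)."""
    return f'{bucket}/q1/MER/{tile_id}/{instrument}'


def select_bgsub_mosaics(paths):
    """Keep only background-subtracted mosaic FITS files from a listing."""
    pattern = 'EUC_MER_BGSUB-MOSAIC*.fits'
    # Match on the file name only (the part after the last '/'), case-sensitively
    return [p for p in paths if fnmatchcase(p.rsplit('/', 1)[-1], pattern)]


# Hypothetical listing: these file names are made up for illustration
prefix = mosaic_prefix(102018211, 'VIS')
listing = [
    f'{prefix}/EUC_MER_BGSUB-MOSAIC-VIS_EXAMPLE.fits',
    f'{prefix}/EUC_MER_MOSAIC-VIS-RMS_EXAMPLE.fits',
    f'{prefix}/EUC_MER_MOSAIC-VIS-FLAG_EXAMPLE.fits',
]
print(select_bgsub_mosaics(listing))
```

With a live connection, an s3fs filesystem's `glob()` method could apply the same wildcard pattern to the bucket directly instead of filtering a listing by hand.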
9395
+++
9496

95-
### Get image file paths for a coordinate search of interest
97+
## 4. Do a spatial search for MER mosaics
98+
99+
Pick a target and search radius:
96100

97101
```{code-cell} ipython3
98-
coord = SkyCoord.from_name("TYC 4429-1677-1")
102+
target_name = 'TYC 4429-1677-1'
103+
coord = SkyCoord.from_name(target_name)
99104
search_radius = 10 * u.arcsec
100105
```
101106

102-
List all Simple Image Access (SIA) collections for IRSA.
107+
List all Simple Image Access (SIA) collections for IRSA:
103108

104109
```{code-cell} ipython3
105110
collections = Irsa.list_collections(servicetype='SIA')
@@ -112,25 +117,29 @@ Filter to only those containing "euclid":
112117
collections[['euclid' in v for v in collections['collection']]]
113118
```
114119

115-
As per "Data Products Overview" in [user guide](https://irsa.ipac.caltech.edu/data/Euclid/docs/euclid_archive_at_irsa_user_guide.pdf) and the above table, we identify that MER Mosiacs are available as follows:
120+
As per the "Data Products Overview" section in the [user guide](https://irsa.ipac.caltech.edu/data/Euclid/docs/euclid_archive_at_irsa_user_guide.pdf) and the above table, MER mosaics are available in the following collection:
116121

117122
```{code-cell} ipython3
118123
img_collection = 'euclid_DpdMerBksMosaic'
119124
```
120125

126+
Now query this collection for our target's coordinates and search radius:
127+
121128
```{code-cell} ipython3
122129
img_tbl = Irsa.query_sia(pos=(coord, search_radius), collection=img_collection).to_table()
123130
img_tbl
124131
```
125132

126-
Now we narrow it down to the images with science dataproduct subtype and Euclid facility:
133+
Let's narrow the results down to Euclid science images, i.e. rows whose `facility_name` is `Euclid` and whose `dataproduct_subtype` is `science`:
127134

128135
```{code-cell} ipython3
129-
euclid_sci_img_tbl = img_tbl[[row['facility_name']=='Euclid' and row['dataproduct_subtype']=='science' for row in img_tbl]]
136+
euclid_sci_img_tbl = img_tbl[[row['facility_name']=='Euclid'
137+
and row['dataproduct_subtype']=='science'
138+
for row in img_tbl]]
130139
euclid_sci_img_tbl
131140
```
132141

133-
We can see there's a `cloud_access` column that gives us the location info of the image files we are interested in. So let's extract the S3 bucket file path from it.
142+
The `cloud_access` column gives the cloud location of each image file we are interested in, so let's extract the S3 bucket file path from it:
134143

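The `cloud_access` values are JSON strings, which is why `json` is imported. As a sketch, assuming each value is a JSON object with the bucket name and object key nested under an `aws` entry, the S3 path can be pulled out with `json.loads`. The sample string and the helper `s3_path_from_cloud_access` are hypothetical; this is not the notebook's `get_s3_fpath`, whose body is not shown in this diff.

```python
import json

# Hypothetical cloud_access value; the nesting under "aws" is an assumption
sample_cloud_access = (
    '{"aws": {"bucket_name": "nasa-irsa-euclid-q1", '
    '"key": "q1/MER/102018211/VIS/EXAMPLE.fits", "region": "us-east-1"}}'
)


def s3_path_from_cloud_access(cloud_access):
    """Return 'bucket/key' from a cloud_access JSON string (assumed layout)."""
    info = json.loads(cloud_access)['aws']  # decode str -> dict, then pick AWS info
    return f"{info['bucket_name']}/{info['key']}"


print(s3_path_from_cloud_access(sample_cloud_access))
```

Returning `bucket/key` without a scheme matches the form `s3.ls()` accepts; prefix it with `s3://` when passing the path to `fits.open`.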
135144
```{code-cell} ipython3
136145
def get_s3_fpath(cloud_access):
@@ -156,7 +165,7 @@ def get_filter_name(instrument, bandpass):
156165
[get_filter_name(row['instrument_name'], row['energy_bandpassname']) for row in euclid_sci_img_tbl]
157166
```
158167

159-
## Retrieve image cutouts from the cloud
168+
## 5. Efficiently retrieve mosaic cutouts
160169
These image files are very large (~1.4 GB), so we use astropy's lazy loading of FITS files to read only the parts we need. (See [Obtaining subsets from cloud-hosted FITS files](https://docs.astropy.org/en/stable/io/fits/usage/cloud.html#fits-io-cloud).)
161170

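Lazy loading pays off because only the requested pixels need to be read. A minimal sketch of the bookkeeping, assuming a square cutout centred on a known zero-based pixel position: compute row and column slices clipped to the image edges. The helper name, image shape, and cutout size are made-up values for illustration. In practice the centre pixel would come from the image WCS (e.g. astropy's `WCS.world_to_pixel`).

```python
def cutout_slices(center_xy, size, shape):
    """Return (row_slice, col_slice) for a size x size cutout around center_xy.

    center_xy is (x, y) in zero-based pixel coordinates; shape is the
    (n_rows, n_cols) of the full image. Bounds are clipped to the image.
    """
    x, y = (int(round(c)) for c in center_xy)
    half = size // 2
    extra = size % 2  # odd sizes get one extra pixel on the upper side
    n_rows, n_cols = shape
    rows = slice(max(y - half, 0), min(y + half + extra, n_rows))
    cols = slice(max(x - half, 0), min(x + half + extra, n_cols))
    return rows, cols


# Hypothetical example: a 100-pixel cutout near the bottom edge of a 1000x1000 image
rows, cols = cutout_slices((500.4, 10.0), 100, (1000, 1000))
print(rows, cols)
```

Slices like these can be used with an image HDU's `.section` attribute (e.g. `hdu.section[rows, cols]`), which astropy documents as the way to read just a subset of a cloud-hosted FITS image.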
162171
```{code-cell} ipython3
@@ -194,37 +203,40 @@ for idx, ax in enumerate(axes.flat):
194203
plt.tight_layout()
195204
```
196205

197-
## Find objects for the coordinates of our interest
198-
199-
+++
200-
201-
### Locate MER catalogs in the bucket
206+
## 6. Find the MER catalog for a given tile
207+
Let's navigate to the MER catalogs in the Euclid Q1 bucket:
202208

203209
```{code-cell} ipython3
204210
s3.ls(f'{BUCKET_NAME}/q1/catalogs')
205211
```
206212

207213
```{code-cell} ipython3
208-
s3.ls(f'{BUCKET_NAME}/q1/catalogs/MER_FINAL_CATALOG')
214+
s3.ls(f'{BUCKET_NAME}/q1/catalogs/MER_FINAL_CATALOG')[:10]  # list only the first 10 entries to keep the output short
209215
```
210216

211217
```{code-cell} ipython3
212-
s3.ls(f'{BUCKET_NAME}/q1/catalogs/MER_FINAL_CATALOG/102018211')
218+
mer_tile_id = 102160339 # from the image paths for the target we picked
219+
s3.ls(f'{BUCKET_NAME}/q1/catalogs/MER_FINAL_CATALOG/{mer_tile_id}')
213220
```
214221

215-
As per doc specification, we need `catalogs/MER_FINAL_CATALOG/{tile_id}/EUC_MER_FINAL-CAT*.fits` for listing the objects catalogued. But we only need to find objects in our coordinates of interest so we will use astroquery to do a spatial search in MER catalog (combined for all tiles).
222+
As per the "Browsable Directories" section in the [user guide](https://irsa.ipac.caltech.edu/data/Euclid/docs/euclid_archive_at_irsa_user_guide.pdf), the objects catalogued in each tile are listed in `catalogs/MER_FINAL_CATALOG/{tile_id}/EUC_MER_FINAL-CAT*.fits`. We could read that FITS file as a table and filter on its RA and Dec columns to find the object ID(s) for our target, but that is an expensive operation. Instead, we will use astroquery (in the next section) to do a spatial search in the MER catalog provided by IRSA.
223+
224+
```{note}
225+
Once the catalogs are available as Parquet files in the cloud, we can efficiently do spatial filtering directly on the cloud-hosted file to identify object ID(s) for our target. But for the time being, we can use catalog VO services through astroquery to do the same.
226+
```
216227

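For intuition about what such a spatial filter does, here is a sketch of the client-side approach described above: keep catalog rows whose angular distance from the target is within a radius, using the haversine formula. The rows, IDs, and coordinates are invented for illustration, and the helper names are hypothetical. On a real MER catalog table, astropy's `SkyCoord.separation` would be the more natural tool.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (arcsec) between two (ra, dec) points given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically stable for small separations
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600


def objects_within(rows, ra0, dec0, radius_arcsec):
    """Return object IDs of rows lying within radius_arcsec of (ra0, dec0)."""
    return [r['object_id'] for r in rows
            if angular_sep_arcsec(r['ra'], r['dec'], ra0, dec0) <= radius_arcsec]


# Hypothetical catalog rows (IDs and positions are made up)
rows = [
    {'object_id': 1, 'ra': 10.0, 'dec': 20.0},
    {'object_id': 2, 'ra': 10.0, 'dec': 20.001},  # 3.6 arcsec north
    {'object_id': 3, 'ra': 10.01, 'dec': 20.0},   # ~34 arcsec east
]
print(objects_within(rows, 10.0, 20.0, 5))
```

This scans every row, which is exactly why doing it on a full-tile catalog file is costly compared to a server-side cone search.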
217228
+++
218229

219-
### Get object IDs for the coordinates of our interest
230+
## 7. Find the MER Object ID for our target
231+
First, list all catalogs provided by IRSA and then filter them to the Euclid ones:
220232

221233
```{code-cell} ipython3
222-
tbl_catalogs = Irsa.list_catalogs(full=True).to_table()
223-
len(tbl_catalogs)
234+
catalogs = Irsa.list_catalogs(full=True).to_table()
235+
len(catalogs)
224236
```
225237

226238
```{code-cell} ipython3
227-
tbl_catalogs[['euclid' in v for v in tbl_catalogs['schema_name']]]
239+
catalogs[['euclid' in v for v in catalogs['schema_name']]]
228240
```
229241

230242
From this table, we can extract the MER catalog name. We also see several other interesting catalogs; let's also extract the name of the spectral file association catalog for retrieving spectra later.
@@ -234,17 +246,13 @@ euclid_mer_catalog = 'euclid_q1_mer_catalogue'
234246
euclid_spec_association_catalog = 'euclid.objectid_spectrafile_association_q1'
235247
```
236248

237-
Now, we do a TAP search with spatial constraints for our coordinates. We use cone of 5 arcsec around our source to pinpoint its object ID in Euclid catalog.
249+
Now we do a region search within a 5 arcsec cone around our target to pinpoint its object ID in the Euclid MER catalog:
238250

239251
```{code-cell} ipython3
240-
search_radius = (5 * u.arcsec).to('deg')
252+
search_radius = 5 * u.arcsec
241253
242-
adql_query = f"SELECT * \
243-
FROM {euclid_mer_catalog} \
244-
WHERE CONTAINS(POINT('ICRS', ra, dec), \
245-
CIRCLE('ICRS', {coord.ra.deg}, {coord.dec.deg}, {search_radius.value})) = 1"
246-
247-
mer_catalog_tbl = Irsa.query_tap(query=adql_query).to_table()
254+
mer_catalog_tbl = Irsa.query_region(coordinates=coord, spatial='Cone',
255+
catalog=euclid_mer_catalog, radius=search_radius)
248256
mer_catalog_tbl
249257
```
250258

@@ -253,8 +261,8 @@ object_id = int(mer_catalog_tbl['object_id'][0])
253261
object_id
254262
```
255263

256-
## Find spectra for the coordinates of our interest
257-
Using the object ID(s) we extracted above, we can narrow down the spectral file association catalog to identify spectra file path(s).
264+
## 8. Find the spectrum of an object in the MER catalog
265+
Using the object ID(s) we extracted above, we can narrow down the spectral file association catalog to identify the spectrum file path(s). We do this with the following TAP search:
258266

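Since more than one object ID may be of interest, here is a sketch of how such a TAP query could be generalised with an ADQL `IN` list. The helper name is hypothetical, the column name `objectid` is an assumption (check the association table's columns before relying on it), and the IDs are placeholders.

```python
def spectra_association_query(table, object_ids, id_column='objectid'):
    """Build an ADQL query selecting association rows for the given object IDs.

    id_column is an assumed column name. IDs are cast to int so that only
    integers end up in the query string.
    """
    ids = [int(i) for i in object_ids]
    if not ids:
        raise ValueError('object_ids must not be empty')
    id_list = ', '.join(str(i) for i in ids)
    return f'SELECT * FROM {table} WHERE {id_column} IN ({id_list})'


query = spectra_association_query(
    'euclid.objectid_spectrafile_association_q1', [123, 456])  # placeholder IDs
print(query)
```

The resulting string could be passed to `Irsa.query_tap(...)` in the same way as the single-object query.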
259267
```{code-cell} ipython3
260268
adql_query = f"SELECT * FROM {euclid_spec_association_catalog} \
@@ -264,7 +272,11 @@ spec_association_tbl = Irsa.query_tap(adql_query).to_table()
264272
spec_association_tbl
265273
```
266274

267-
We can see the `uri` column that gives us location of spectra file on IBE, we can map it to S3 bucket key to retrieve spectra file from the cloud. This is a very big FITS spectra file with multiple extensions where each extension contains spectrum of one object. The `hdu` column gives us the extension number for our object. So let's extract both of these.
275+
```{warning}
276+
If you picked a target other than the one this notebook uses, it's possible that there is no spectrum associated with your target's object ID. In that case, `spec_association_tbl` will contain 0 rows.
277+
```
278+
279+
In the above table, the `uri` column gives the location of the spectrum file on IBE. We can map it to an S3 bucket key to retrieve the spectrum file from the cloud. This is a very large FITS file with multiple extensions, where each extension contains the spectrum of one object. The `hdu` column gives the extension number for our object, so let's extract both of these values.
268280

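The mapping described above amounts to stripping the IBE path prefix. As a sketch, assuming `uri` is a relative IBE path that starts with `ibe/data/euclid/` and whose remainder matches the object key in the bucket, the prefix can be removed only at the start of the string (`str.replace` would also rewrite a match found elsewhere). The helper name and sample row are hypothetical; the file name is invented.

```python
IBE_PREFIX = 'ibe/data/euclid/'


def spectrum_location(row, bucket='nasa-irsa-euclid-q1'):
    """Return (s3_url, hdu_index) for one row of the association table.

    Assumes row['uri'] starts with IBE_PREFIX and that the rest of the
    path is the object key in the S3 bucket.
    """
    uri = row['uri']
    if not uri.startswith(IBE_PREFIX):
        raise ValueError(f'unexpected uri: {uri}')
    key = uri[len(IBE_PREFIX):]  # strip the prefix only at the start
    return f's3://{bucket}/{key}', int(row['hdu'])


# Hypothetical row mimicking the uri and hdu columns
row = {'uri': 'ibe/data/euclid/q1/SIR/EXAMPLE/EXAMPLE_SPECTRA.fits', 'hdu': '42'}
print(spectrum_location(row))
```

Raising on an unexpected prefix surfaces a path-format change early instead of producing a key that silently does not exist in the bucket.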
269281
```{code-cell} ipython3
270282
spec_fpath_key = spec_association_tbl['uri'][0].replace('ibe/data/euclid/', '')
@@ -276,8 +288,7 @@ object_hdu_idx = int(spec_association_tbl['hdu'][0])
276288
object_hdu_idx
277289
```
278290

279-
## Retrieve spectrum from the cloud
280-
Again we use astropy's lazy-loading capability of FITS to only retrieve the spectrum table of our object from the S3 bucket.
291+
Again, we use astropy's lazy loading of FITS files to retrieve only our object's spectrum table from the S3 bucket.
281292

282293
```{code-cell} ipython3
283294
with fits.open(f's3://{BUCKET_NAME}/{spec_fpath_key}', fsspec_kwargs={'anon': True}) as hdul:
@@ -294,13 +305,13 @@ plt.plot(spec_tbl['WAVELENGTH'], spec_tbl['SIGNAL'])
294305
plt.xlabel(spec_tbl['WAVELENGTH'].unit.to_string('latex_inline'))
295306
plt.ylabel(spec_tbl['SIGNAL'].unit.to_string('latex_inline'))
296307
297-
plt.title(f'Euclid Object ID: {object_id}');
308+
plt.title(f'Spectrum of Target: {target_name}\n(Euclid Object ID: {object_id})');
298309
```
299310

300311
## About this Notebook
301312

302-
**Author:** Jaladh Singhal (IRSA Developer) in conjunction with Tiffany Meshkat, Vandana Desai, Brigitta Sipőcz, and the IPAC Science Platform team
313+
**Author:** Jaladh Singhal (IRSA Developer) in conjunction with Vandana Desai, Brigitta Sipőcz, Tiffany Meshkat and the IPAC Science Platform team
303314

304-
**Updated:** 2025-03-13
315+
**Updated:** 2025-03-17
305316

306317
**Contact:** the [IRSA Helpdesk](https://irsa.ipac.caltech.edu/docs/help_desk.html) with questions or to report problems.
