Quickstart
To get started with LSDB we will demonstrate a very common workflow: crossmatching a small set of objects of interest with a large survey catalog (Gaia) stored in HiPSCat format, applying cuts to the data, and saving the final result.
The first thing you need to do is to import our package.
[1]:
import lsdb
Create a Pandas DataFrame with the equatorial coordinates (right ascension and declination, in degrees) of your objects.
[2]:
import pandas as pd
# The coordinates (ra, dec) for our objects of interest
objects = [(180.080612, 9.507394), (179.884664, 10.479632), (179.790319, 9.551745)]
objects_df = pd.DataFrame(objects, columns=["ra", "dec"])
objects_df
[2]:
ra | dec | |
---|---|---|
0 | 180.080612 | 9.507394 |
1 | 179.884664 | 10.479632 |
2 | 179.790319 | 9.551745 |
Now that the data is in a DataFrame, you can create an LSDB catalog from it, which represents the data in HiPSCat format.
[3]:
my_object_catalog = lsdb.from_dataframe(objects_df, catalog_name="my_object_catalog", catalog_type="object")
my_object_catalog
[3]:
ra | dec | Norder | Dir | Npix | |
---|---|---|---|---|---|
npartitions=1 | |||||
6917529027641081856 | double[pyarrow] | double[pyarrow] | uint8[pyarrow] | uint64[pyarrow] | uint64[pyarrow] |
18446744073709551615 | ... | ... | ... | ... | ... |
Next, read the catalog we want to crossmatch with. Because we are downloading it from a web source, we need to install an additional package (aiohttp). If your catalog is in local storage, you can call read_hipscat directly.
[4]:
!pip install aiohttp --quiet
In this tutorial we will read a small cone region of Gaia DR3 (0.5 degree radius, so 1 degree across), one that should contain our objects. LSDB typically reads into memory only the minimal amount of data our workflow needs, and providing it with spatial information up front helps it identify which files to read.
[5]:
from lsdb.core.search import ConeSearch
gaia_path = "https://epyc.astro.washington.edu/~lincc-frameworks/hipscat_surveys/gaia_dr3/gaia"
gaia = lsdb.read_hipscat(gaia_path, search_filter=ConeSearch(ra=180, dec=10, radius_arcsec=0.5 * 3600))
gaia
[5]:
solution_id | designation | source_id | random_index | ref_epoch | ra | ra_error | dec | dec_error | parallax | parallax_error | parallax_over_error | pm | pmra | pmra_error | pmdec | pmdec_error | ra_dec_corr | ra_parallax_corr | ra_pmra_corr | ra_pmdec_corr | dec_parallax_corr | dec_pmra_corr | dec_pmdec_corr | parallax_pmra_corr | parallax_pmdec_corr | pmra_pmdec_corr | astrometric_n_obs_al | astrometric_n_obs_ac | astrometric_n_good_obs_al | astrometric_n_bad_obs_al | astrometric_gof_al | astrometric_chi2_al | astrometric_excess_noise | astrometric_excess_noise_sig | astrometric_params_solved | astrometric_primary_flag | nu_eff_used_in_astrometry | pseudocolour | pseudocolour_error | ra_pseudocolour_corr | dec_pseudocolour_corr | parallax_pseudocolour_corr | pmra_pseudocolour_corr | pmdec_pseudocolour_corr | astrometric_matched_transits | visibility_periods_used | astrometric_sigma5d_max | matched_transits | new_matched_transits | matched_transits_removed | ipd_gof_harmonic_amplitude | ipd_gof_harmonic_phase | ipd_frac_multi_peak | ipd_frac_odd_win | ruwe | scan_direction_strength_k1 | scan_direction_strength_k2 | scan_direction_strength_k3 | scan_direction_strength_k4 | scan_direction_mean_k1 | scan_direction_mean_k2 | scan_direction_mean_k3 | scan_direction_mean_k4 | duplicated_source | phot_g_n_obs | phot_g_mean_flux | phot_g_mean_flux_error | phot_g_mean_flux_over_error | phot_g_mean_mag | phot_bp_n_obs | phot_bp_mean_flux | phot_bp_mean_flux_error | phot_bp_mean_flux_over_error | phot_bp_mean_mag | phot_rp_n_obs | phot_rp_mean_flux | phot_rp_mean_flux_error | phot_rp_mean_flux_over_error | phot_rp_mean_mag | phot_bp_rp_excess_factor | phot_bp_n_contaminated_transits | phot_bp_n_blended_transits | phot_rp_n_contaminated_transits | phot_rp_n_blended_transits | phot_proc_mode | bp_rp | bp_g | g_rp | radial_velocity | radial_velocity_error | rv_method_used | rv_nb_transits | rv_nb_deblended_transits | rv_visibility_periods_used | rv_expected_sig_to_noise | 
rv_renormalised_gof | rv_chisq_pvalue | rv_time_duration | rv_amplitude_robust | rv_template_teff | rv_template_logg | rv_template_fe_h | rv_atm_param_origin | vbroad | vbroad_error | vbroad_nb_transits | grvs_mag | grvs_mag_error | grvs_mag_nb_transits | rvs_spec_sig_to_noise | phot_variable_flag | l | b | ecl_lon | ecl_lat | in_qso_candidates | in_galaxy_candidates | non_single_star | has_xp_continuous | has_xp_sampled | has_rvs | has_epoch_photometry | has_epoch_rv | has_mcmc_gspphot | has_mcmc_msc | in_andromeda_survey | classprob_dsc_combmod_quasar | classprob_dsc_combmod_galaxy | classprob_dsc_combmod_star | teff_gspphot | teff_gspphot_lower | teff_gspphot_upper | logg_gspphot | logg_gspphot_lower | logg_gspphot_upper | mh_gspphot | mh_gspphot_lower | mh_gspphot_upper | distance_gspphot | distance_gspphot_lower | distance_gspphot_upper | azero_gspphot | azero_gspphot_lower | azero_gspphot_upper | ag_gspphot | ag_gspphot_lower | ag_gspphot_upper | ebpminrp_gspphot | ebpminrp_gspphot_lower | ebpminrp_gspphot_upper | libname_gspphot | Norder | Dir | Npix | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
npartitions=1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7782220156096217088 | int64[pyarrow] | string[pyarrow] | int64[pyarrow] | int64[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | int64[pyarrow] | int64[pyarrow] | int64[pyarrow] | int64[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | int64[pyarrow] | bool[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | int64[pyarrow] | int64[pyarrow] | double[pyarrow] | int64[pyarrow] | int64[pyarrow] | int64[pyarrow] | double[pyarrow] | double[pyarrow] | int64[pyarrow] | int64[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | bool[pyarrow] | int64[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | int64[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | int64[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | 
string[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | bool[pyarrow] | bool[pyarrow] | int64[pyarrow] | bool[pyarrow] | bool[pyarrow] | bool[pyarrow] | bool[pyarrow] | bool[pyarrow] | bool[pyarrow] | bool[pyarrow] | bool[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | string[pyarrow] | uint8[pyarrow] | uint64[pyarrow] | uint64[pyarrow] |
18446744073709551615 | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
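To make the cone filter concrete, here is a minimal standalone sketch (plain Python, not LSDB's implementation) that checks whether each of our three objects lies within 0.5 degrees (1800 arcseconds) of the cone center at (ra=180, dec=10). The helper names `separation_arcsec` and `in_cone` are hypothetical and exist only for this illustration.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation between two sky positions (degrees in, arcseconds out)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays accurate for small separations
    h = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600


def in_cone(ra, dec, center_ra, center_dec, radius_arcsec):
    """Return True if (ra, dec) falls inside the cone (hypothetical helper)."""
    return separation_arcsec(ra, dec, center_ra, center_dec) <= radius_arcsec


# Same objects and cone parameters as in the tutorial cells above
objects = [(180.080612, 9.507394), (179.884664, 10.479632), (179.790319, 9.551745)]
radius_arcsec = 0.5 * 3600  # 0.5 degrees expressed in arcseconds

inside = [in_cone(ra, dec, 180, 10, radius_arcsec) for ra, dec in objects]
print(inside)
```

Each entry of `inside` tells us whether the corresponding object can find a counterpart in the filtered Gaia region at all.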
Let’s crossmatch them! As a result we will have a catalog with the objects from Gaia that match our initial objects of interest, according to a specified maximum distance. We will use the default K nearest neighbors algorithm with k=1 and a maximum separation distance of 1 degree (3600 arcseconds).
[6]:
result = my_object_catalog.crossmatch(gaia, n_neighbors=1, radius_arcsec=1 * 3600, require_right_margin=False)
/home/docs/checkouts/readthedocs.org/user_builds/lsdb/envs/latest/lib/python3.10/site-packages/lsdb/dask/crossmatch_catalog_data.py:117: RuntimeWarning: Right catalog does not have a margin cache. Results may be incomplete and/or inaccurate.
warnings.warn(
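The idea behind a nearest-neighbor crossmatch with a maximum radius can be sketched in a few lines of plain Python. This is a brute-force illustration under simplifying assumptions, not how LSDB performs the operation: the function `crossmatch_nearest` is hypothetical, and the coordinates in `left` and `right` are made up for the example.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation between two sky positions (degrees in, arcseconds out)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600


def crossmatch_nearest(left, right, radius_arcsec):
    """For each left object, keep its single nearest right object if within the radius.

    Returns a list of (left_index, right_index, distance_arcsec) tuples.
    """
    if not right:
        return []
    matches = []
    for i, (ra, dec) in enumerate(left):
        # Brute force: compute the distance to every right object and keep the smallest
        dist, j = min(
            (separation_arcsec(ra, dec, r_ra, r_dec), j)
            for j, (r_ra, r_dec) in enumerate(right)
        )
        if dist <= radius_arcsec:
            matches.append((i, j, dist))
    return matches


# Made-up coordinates: the first left object has neighbors 2 and 30 arcseconds away,
# the second left object is roughly a degree away from everything
left = [(180.0, 10.0), (181.0, 10.0)]
right = [(180.0, 10.0 + 2 / 3600), (180.0, 10.0 + 30 / 3600)]

matches = crossmatch_nearest(left, right, radius_arcsec=5)
print(matches)
```

With a 5 arcsecond radius only the first left object finds a match. Widening the radius admits more distant counterparts, which is why the `_dist_arcsec` column in the result is worth inspecting.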
Now let’s say we wish to apply a cut to our data to get all the objects with a mean magnitude in the G band greater than 18 and a total number of observations AL greater than 200. We can build a query expression and filter the catalog. Because of the crossmatch, the names of the Gaia columns in the query need to include the name of the catalog as a suffix, _gaia.
[7]:
result = result.query("phot_g_mean_mag_gaia > 18 and astrometric_n_obs_al_gaia > 200")
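The suffixes exist because both catalogs share column names such as ra and dec, so the joined table needs a way to tell them apart. The sketch below imitates that renaming with plain dictionaries. It is only an illustration: `add_suffix` is a hypothetical helper, and the Gaia values in `right_row` are invented.

```python
def add_suffix(row, suffix):
    """Return a copy of the row with the suffix appended to every column name."""
    return {f"{name}{suffix}": value for name, value in row.items()}


# One row from each side of a match; the right-hand values are invented
left_row = {"ra": 180.080612, "dec": 9.507394}
right_row = {
    "ra": 180.0806,
    "dec": 9.5074,
    "phot_g_mean_mag": 18.5,
    "astrometric_n_obs_al": 250,
}

# Without suffixes the two "ra" and "dec" entries would overwrite each other
joined = {
    **add_suffix(left_row, "_my_object_catalog"),
    **add_suffix(right_row, "_gaia"),
}

# The same cut as in the query expression, written against the suffixed names
keep = joined["phot_g_mean_mag_gaia"] > 18 and joined["astrometric_n_obs_al_gaia"] > 200
print(sorted(joined), keep)
```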
Our workflow takes advantage of Dask’s “lazy” evaluation. This means that so far we have only been defining a set of tasks, which will be executed at our command. When that happens, data will be read from storage into memory and operations will be distributed among the available workers for parallel computation. To trigger this we will call compute on the catalog that resulted from the crossmatch. The final result will be presented in a Pandas DataFrame.
[8]:
result.compute()
[8]:
ra_my_object_catalog | dec_my_object_catalog | Norder_my_object_catalog | Dir_my_object_catalog | Npix_my_object_catalog | solution_id_gaia | designation_gaia | source_id_gaia | random_index_gaia | ref_epoch_gaia | ... | ag_gspphot_lower_gaia | ag_gspphot_upper_gaia | ebpminrp_gspphot_gaia | ebpminrp_gspphot_lower_gaia | ebpminrp_gspphot_upper_gaia | libname_gspphot_gaia | Norder_gaia | Dir_gaia | Npix_gaia | _dist_arcsec | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
_hipscat_index | |||||||||||||||||||||
7800225727302860800 | 180.080612 | 9.507394 | 0 | 0 | 6 | 1636148068921376768 | Gaia DR3 3900112810537137536 | 3900112810537137536 | 272226905 | 2016.0 | ... | 0.0022 | 0.0161 | 0.0039 | 0.0012 | 0.0088 | MARCS | 2 | 0 | 108 | 34.558381 |
1 rows × 161 columns
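For readers new to lazy evaluation, the following toy sketch shows the general pattern: building the pipeline records what to do, and nothing runs until compute is called. It is a stand-in written in plain Python, not Dask itself, and the class `LazyValue` and the functions `load` and `cut` are hypothetical.

```python
class LazyValue:
    """Toy lazy task: stores a function and its inputs, and runs only on compute()."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def compute(self):
        # Resolve any upstream lazy inputs first, then run this task
        resolved = [a.compute() if isinstance(a, LazyValue) else a for a in self.args]
        return self.func(*resolved)


calls = []  # records which steps have actually executed


def load():
    calls.append("load")
    return [17.2, 18.4, 19.1]  # invented magnitudes


def cut(mags):
    calls.append("cut")
    return [m for m in mags if m > 18]


# Defining the pipeline executes nothing
pipeline = LazyValue(cut, LazyValue(load))
calls_before = list(calls)

# Work happens only here
result = pipeline.compute()
print(calls_before, calls, result)
```

Before compute is called the `calls` list is still empty, which mirrors how the crossmatch and query cells above return immediately without reading any catalog data.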
You will want to save your resulting catalog to disk, especially if it is too large to fit in memory or if you will need to use it later on. The catalog exposes the to_hipscat API for that: you just need to provide a path for the target base directory and a catalog name.
[9]:
result.to_hipscat("lsdb_catalogs", catalog_name="my_object_catalog_x_gaia")