lsdb#

Package Contents#

Classes#

Catalog

LSDB Catalog DataFrame to perform analysis of sky catalogs and efficient spatial operations.

class Catalog(ddf: dask.dataframe.core.DataFrame, ddf_pixel_map: lsdb.types.DaskDFPixelMap, hc_structure: hipscat.catalog.Catalog, margin: lsdb.catalog.margin_catalog.MarginCatalog | None = None)[source]#

Bases: lsdb.catalog.dataset.healpix_dataset.HealpixDataset

LSDB Catalog DataFrame to perform analysis of sky catalogs and efficient spatial operations.
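A Catalog is usually obtained by loading an existing HiPSCat catalog rather than by calling the constructor directly. A minimal sketch, assuming the catalog lives at the hypothetical path "./my_catalog" and is loaded with lsdb.read_hipscat:

    import lsdb

    # Lazily load a HiPSCat-format catalog; "./my_catalog" is a hypothetical path.
    catalog = lsdb.read_hipscat("./my_catalog")

    # The structure and metadata of the catalog are available on hc_structure.
    print(catalog.hc_structure.catalog_name)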

hc_structure#

hipscat.Catalog object representing the structure and metadata of the HiPSCat catalog

hc_structure: hipscat.catalog.Catalog#

head(n: int = 5) pandas.DataFrame[source]#

Returns a few rows of data for previewing purposes.

Parameters:

n (int) – The number of desired rows.

Returns:

A pandas DataFrame with up to n rows of data.
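For example, a quick look at the first rows (a sketch, assuming catalog is an already-loaded Catalog):

    # Compute and return up to 10 rows as a pandas DataFrame for inspection.
    preview = catalog.head(n=10)
    print(preview)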

query(expr: str) Catalog[source]#

Filters the catalog using a complex query expression

Parameters:

expr (str) – Query expression to evaluate. Column names that are not valid Python variable names should be wrapped in backticks, and variable values can be injected using f-strings. The use of ‘@’ to reference variables is not supported. More information about query expressions is available in the pandas DataFrame.query documentation.

Returns:

A catalog containing the rows of the original catalog that satisfy the query expression
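A sketch of a query that combines a backticked column name with a value injected through an f-string (the columns mag r and flux_err are hypothetical):

    max_mag = 20.0

    # `mag r` is not a valid Python identifier, so it is wrapped in backticks;
    # the magnitude threshold is injected with an f-string.
    bright = catalog.query(f"`mag r` < {max_mag} and flux_err > 0")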

assign(**kwargs) Catalog[source]#

Assigns new columns to a catalog

Parameters:

**kwargs – Arguments to pass to the assign method. This dictionary should contain the column names as keys and either a function or a 1-D Dask array as their corresponding value.

Returns:

The catalog containing both the old columns and the newly created columns
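A sketch that adds a derived color column, assuming hypothetical magnitude columns mag_g and mag_r exist in the catalog:

    # Each keyword becomes a new column; here the value is a function that is
    # applied to the pandas DataFrame of every partition.
    with_color = catalog.assign(g_minus_r=lambda df: df["mag_g"] - df["mag_r"])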

crossmatch(other: Catalog, suffixes: Tuple[str, str] | None = None, algorithm: Type[lsdb.core.crossmatch.abstract_crossmatch_algorithm.AbstractCrossmatchAlgorithm] | lsdb.core.crossmatch.crossmatch_algorithms.BuiltInCrossmatchAlgorithm = BuiltInCrossmatchAlgorithm.KD_TREE, output_catalog_name: str | None = None, **kwargs) Catalog[source]#

Perform a cross-match between two catalogs

The pixels from each catalog are aligned via a PixelAlignment, and cross-matching is performed on each pair of overlapping pixels. The resulting catalog will have partitions matching an inner pixel alignment - using pixels that have overlap in both input catalogs and taking the smallest of any overlapping pixels.

The resulting catalog will be partitioned using the left catalog’s ra and dec, and each row will keep the index of the corresponding row in the left catalog.

Parameters:
  • other (Catalog) – The right catalog to cross-match against

  • suffixes (Tuple[str, str]) – A pair of suffixes to be appended to the end of each column name when they are joined. Default: uses the name of the catalog for the suffix

  • algorithm (BuiltInCrossmatchAlgorithm | Type[AbstractCrossmatchAlgorithm]) –

    The algorithm to use to perform the crossmatch. Can be either a string to specify one of the built-in cross-matching methods, or a custom method defined by subclassing AbstractCrossmatchAlgorithm.

    Built-in methods:
    • kd_tree: find the k-nearest neighbors using a kd_tree

    Custom function:

    To specify a custom function, write a class that subclasses the AbstractCrossmatchAlgorithm class, and overwrite the crossmatch function.

    The function should be able to perform a crossmatch on two pandas DataFrames from a HEALPix pixel of each catalog. It should return a DataFrame with the combined set of columns from the input DataFrames, with the appropriate suffixes, and, optionally, a set of extra columns generated by the crossmatch algorithm. These extra columns, with their respective data types, are declared in AbstractCrossmatchAlgorithm.extra_columns as an empty pandas DataFrame. As an example, the KdTreeCrossmatch algorithm outputs a “_dist_arcsec” column with the distance between data points. Its extra_columns attribute is specified as follows:

    pd.DataFrame({"_dist_arcsec": pd.Series(dtype=np.dtype("float64"))})
    

    The class will have been initialized with the following parameters, which the crossmatch function should use:

    • left: pd.DataFrame,

    • right: pd.DataFrame,

    • left_order: int,

    • left_pixel: int,

    • right_order: int,

    • right_pixel: int,

    • left_metadata: hc.catalog.Catalog,

    • right_metadata: hc.catalog.Catalog,

    • right_margin_hc_structure: hc.margin.MarginCatalog,

    • suffixes: Tuple[str, str]

    You may add additional keyword arguments to the crossmatch function definition; users can then pass them in as kwargs to the Catalog.crossmatch method.

  • output_catalog_name (str) – The name of the resulting catalog. Default: {left_name}_x_{right_name}

Returns:

A Catalog with the data from the left and right catalogs merged with one row for each pair of neighbors found from cross-matching.

The resulting table contains all columns from the left and right catalogs with their respective suffixes and, whenever specified, a set of extra columns generated by the crossmatch algorithm.
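A usage sketch of a kd-tree cross-match between two previously loaded catalogs (the catalog variables, suffixes, and output name are illustrative):

    # Match every object in left_cat to its nearest neighbour in right_cat
    # using the built-in kd-tree algorithm.
    matched = left_cat.crossmatch(
        right_cat,
        suffixes=("_left", "_right"),
        algorithm="kd_tree",
        output_catalog_name="left_x_right",
    )

    # The kd-tree algorithm adds a "_dist_arcsec" column with the separation
    # between each pair of matched points.
    print(matched.head()["_dist_arcsec"])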

cone_search(ra: float, dec: float, radius_arcsec: float, fine: bool = True) Catalog[source]#

Perform a cone search to filter the catalog.

Filters to points within radius great circle distance to the point specified by ra and dec in degrees. Filters partitions in the catalog to those that have some overlap with the cone.

Parameters:
  • ra (float) – Right Ascension of the center of the cone in degrees

  • dec (float) – Declination of the center of the cone in degrees

  • radius_arcsec (float) – Radius of the cone in arcseconds

  • fine (bool) – If True, the individual points are filtered in addition to the partitions; if False, only partition-level filtering is applied. Defaults to True.

Returns:

A new Catalog containing the points filtered to those within the cone, and the partitions that overlap the cone.
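A sketch selecting objects within 10 arcseconds of a position, assuming catalog is an already-loaded Catalog:

    # Keep only points within 10 arcsec of (ra, dec) = (49.9, 41.5) degrees;
    # partitions that do not overlap the cone are dropped entirely.
    cone = catalog.cone_search(ra=49.9, dec=41.5, radius_arcsec=10.0)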

box(ra: Tuple[float, float] | None = None, dec: Tuple[float, float] | None = None, fine: bool = True) Catalog[source]#

Performs filtering according to right ascension and declination ranges.

Filters to points within the region specified in degrees. Filters partitions in the catalog to those that have some overlap with the region.

Parameters:
  • ra (Tuple[float, float]) – The right ascension minimum and maximum values.

  • dec (Tuple[float, float]) – The declination minimum and maximum values.

  • fine (bool) – If True, the individual points are filtered in addition to the partitions; if False, only partition-level filtering is applied. Defaults to True.

Returns:

A new catalog containing the points filtered to those within the region, and the partitions that have some overlap with it.
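A sketch selecting a small rectangular region of the sky:

    # Keep points with 49 <= ra <= 51 and 41 <= dec <= 42 (degrees); only
    # partitions overlapping this region are kept.
    region = catalog.box(ra=(49.0, 51.0), dec=(41.0, 42.0))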

polygon_search(vertices: List[Tuple[float, float]], fine: bool = True) Catalog[source]#

Perform a polygonal search to filter the catalog.

Filters to points within the polygonal region specified in ra and dec, in degrees. Filters partitions in the catalog to those that have some overlap with the region.

Parameters:
  • vertices (List[Tuple[float, float]]) – The list of vertices of the polygon to filter pixels with, as a list of (ra, dec) coordinates, in degrees.

  • fine (bool) – If True, the individual points are filtered in addition to the partitions; if False, only partition-level filtering is applied. Defaults to True.

Returns:

A new catalog containing the points filtered to those within the polygonal region, and the partitions that have some overlap with it.
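A sketch selecting the points inside a quadrilateral region:

    # Vertices of the polygon as (ra, dec) pairs, in degrees.
    vertices = [(50.0, 41.0), (52.0, 41.0), (52.0, 43.0), (50.0, 43.0)]

    polygon = catalog.polygon_search(vertices)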

index_search(ids, catalog_index: HCIndexCatalog, fine: bool = True) Catalog[source]#

Find rows by ids (or other values indexed by a catalog index).

Filters partitions in the catalog to those that could contain the ids requested. Filters to points that have matching values in the id field.

NB: This requires a previously-computed catalog index table.

Parameters:
  • ids – Values to search for.

  • catalog_index (HCIndexCatalog) – A pre-computed hipscat index catalog.

  • fine (bool) – If True, the individual points are filtered in addition to the partitions; if False, only partition-level filtering is applied. Defaults to True.

Returns:

A new Catalog containing the points filtered to those matching the ids.
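A sketch, assuming this search is exposed as an index_search method and that an index catalog built on a hypothetical objectId column has already been loaded:

    # objectid_index is assumed to be a previously computed and loaded
    # HCIndexCatalog for the "objectId" column of this catalog.
    subset = catalog.index_search([87679, 87680], objectid_index)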

order_search(min_order: int = 0, max_order: int | None = None) Catalog[source]#

Filter the catalog by HEALPix order.

Parameters:
  • min_order (int) – Minimum HEALPix order to select. Defaults to 0.

  • max_order (int) – Maximum HEALPix order to select. Defaults to maximum catalog order.

Returns:

A new Catalog containing only the pixels of orders specified (inclusive)
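A sketch, assuming this filter is exposed as an order_search method:

    # Keep only the partitions whose HEALPix order is between 3 and 6 (inclusive).
    coarse = catalog.order_search(min_order=3, max_order=6)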

search(search: AbstractSearch, fine: bool = True) Catalog[source]#

Find rows using a reusable search algorithm.

Filters partitions in the catalog to those that match some rough criteria. Filters to points that match some finer criteria.

Parameters:
  • search (AbstractSearch) – Instance of AbstractSearch.

  • fine (bool) – If True, the individual points are filtered in addition to the partitions; if False, only partition-level filtering is applied. Defaults to True.

Returns:

A new Catalog containing the points filtered to those matching the search parameters.
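A sketch, assuming the built-in search objects live in lsdb.core.search and that ConeSearch takes the same parameters as the cone search above:

    from lsdb.core.search import ConeSearch  # assumed import location

    # Build the search definition once and reuse it across catalogs.
    cone = ConeSearch(ra=49.9, dec=41.5, radius_arcsec=10.0)
    result = catalog.search(cone)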

merge(other: Catalog, how: str = 'inner', on: str | List | None = None, left_on: str | List | None = None, right_on: str | List | None = None, left_index: bool = False, right_index: bool = False, suffixes: Tuple[str, str] | None = None) dask.dataframe.core.DataFrame[source]#

Performs a merge of two catalog DataFrames

More information about pandas merge is available in the pandas DataFrame.merge documentation.

Parameters:
  • other (Catalog) – The right catalog to merge with.

  • how (str) – How to handle the merge of the two catalogs. One of {‘left’, ‘right’, ‘outer’, ‘inner’}, defaults to ‘inner’.

  • on (str | List) – Column or index names to join on. Defaults to the intersection of columns in both Dataframes if on is None and not merging on indexes.

  • left_on (str | List) – Column to join on the left Dataframe. Lists are supported if their length is one.

  • right_on (str | List) – Column to join on the right Dataframe. Lists are supported if their length is one.

  • left_index (bool) – Use the index of the left Dataframe as the join key. Defaults to False.

  • right_index (bool) – Use the index of the right Dataframe as the join key. Defaults to False.

  • suffixes (Tuple[str, str]) – A pair of suffixes to be appended to the end of each column name when they are joined. Defaults to using the name of the catalog for the suffix.

Returns:

A new Dask Dataframe containing the data points that result from the merge of the two catalogs.
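A sketch merging two catalogs on a shared, hypothetical object_id column:

    # Merge on a shared column. The result is a Dask DataFrame, not a Catalog,
    # so the HiPSCat partitioning metadata is not carried along.
    merged_ddf = catalog.merge(
        other_catalog,
        how="inner",
        on="object_id",
        suffixes=("_a", "_b"),
    )
    merged = merged_ddf.compute()  # materialize as a pandas DataFrame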

join(other: Catalog, left_on: str | None = None, right_on: str | None = None, through: lsdb.catalog.association_catalog.AssociationCatalog | None = None, suffixes: Tuple[str, str] | None = None, output_catalog_name: str | None = None) Catalog[source]#

Perform a spatial join with another catalog

Joins two catalogs together on a shared column value, merging rows where they match. The operation only joins data from matching partitions, and does not join rows that have a matching column value but are in separate partitions in the sky. For a more general join, see the merge function.

Parameters:
  • other (Catalog) – the right catalog to join to

  • left_on (str) – the name of the column in the left catalog to join on

  • right_on (str) – the name of the column in the right catalog to join on

  • suffixes (Tuple[str,str]) – suffixes to apply to the columns of each table

  • output_catalog_name (str) – The name of the resulting catalog to be stored in metadata

Returns:

A new catalog with the columns from each of the input catalogs with their respective suffixes added, and the rows merged on the specified columns.
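A sketch joining two catalogs on a shared, hypothetical source_id column:

    # Spatial join: only rows from overlapping partitions with matching
    # source_id values are combined into the output catalog.
    joined = catalog_a.join(
        catalog_b,
        left_on="source_id",
        right_on="source_id",
        suffixes=("_a", "_b"),
        output_catalog_name="a_join_b",
    )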