Converter

config.py

Module to work with the converter configuration.

get_bands_config(bases_config)

get_unique_id(config: DataFrame, _id: int) → DataFrame

Access the group of rows in the configuration for the given ID.

Parameters:

config (DataFrame) – A DataFrame containing the configuration columns and values.
_id (int) – An identifier of a set of configurations.

Returns:

A DataFrame containing the configuration values.

Return type:

DataFrame
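The row selection described above can be illustrated with plain pandas. This is only a sketch, not the module's implementation: the column name `uniqueId`, the other columns, and the helper `select_rows_by_id` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical configuration table; the column names are assumptions.
config = pd.DataFrame({
    "uniqueId": [56, 56, 57],
    "dimension": [55, 55, 55],
    "range": ["bp", "rp", "bp"],
})


def select_rows_by_id(config: pd.DataFrame, _id: int, id_column: str = "uniqueId") -> pd.DataFrame:
    """Return the rows of `config` whose ID column equals `_id`."""
    return config.loc[config[id_column] == _id]


# Select every configuration row that belongs to ID 56.
subset = select_rows_by_id(config, 56)
print(subset)
```

A boolean mask keeps the result a DataFrame even when a single row matches, which agrees with the documented return type.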

parse_config(xml_file: Path | str) → namedtuple

Parse a configuration file in XML format and return a named tuple holding the value of every tag present in the file.

Parameters:

xml_file (Union[Path, str]) – The path to the XML file to parse.

Returns:

A named tuple containing the configuration settings.

Return type:

namedtuple
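One way to turn the tags of an XML file into a named tuple is sketched below with the standard library. The helper `parse_flat_xml`, the tag names, and the flat file layout are assumptions made for illustration; the real configuration schema is not described here.

```python
import tempfile
import xml.etree.ElementTree as ET
from collections import namedtuple
from pathlib import Path


def parse_flat_xml(xml_file):
    """Collect the text of each direct child tag of the root into a named tuple."""
    root = ET.parse(str(xml_file)).getroot()
    values = {child.tag: (child.text or "").strip() for child in root}
    Config = namedtuple("Config", list(values.keys()))
    return Config(**values)


# Write a small made-up file to demonstrate the idea.
xml_text = "<config><dimension>55</dimension><range>bp</range></config>"
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.xml"
    path.write_text(xml_text)
    settings = parse_flat_xml(path)

print(settings.dimension, settings.range)
```

All values come back as strings in this sketch; any type conversion would have to be added by the caller.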

converter.py

Module for the converter functionality.

convert(input_object: list | Path | str, sampling: ndarray | None = numpy.linspace(0.0, 60.0, 600), truncation: bool = False, with_correlation: bool = False, output_path: Path | str = '.', output_file: str = 'output_spectra', output_format: str | None = None, save_file: bool = True, username: str | None = None, password: str | None = None) → (DataFrame, ndarray)

Conversion utility: converts the input internally calibrated mean spectra from the continuous representation to a sampled form. The sampling grid can be defined by the user; otherwise a default grid is adopted. Optionally, the continuous representation can be truncated by dropping the basis functions (and corresponding coefficients) that were not considered significant given the errors on the reconstructed mean spectra.

Parameters:

input_object (list/Path/str) – Path to the file containing the mean spectra as downloaded from the archive in their continuous representation, a list of source IDs (string or long), or a pandas DataFrame.
sampling (ndarray) – 1D array containing the desired sampling in pseudo-wavelengths.
truncation (bool) – Toggle truncation of the set of bases. The level of truncation to be applied is defined by the recommended value in the input files.
with_correlation (bool) – Whether correlation information should be generated.
output_path (Path/str) – Path where to save the output data.
output_file (str) – Name of the output file without extension (e.g. ‘my_file’).
output_format (str) – Desired output format (e.g. ‘csv’). If no format is given, the output file format will be the same as that of the input file.
save_file (bool) – Whether to save the output in a file. If False, output_format and output_file will be ignored.
username (str) – Cosmos username, only suggested when input_object is a list or ADQL query.
password (str) – Cosmos password, only suggested when input_object is a list or ADQL query.

Returns:

A tuple containing:

DataFrame: The values for all sampled spectra.
ndarray: The sampling used to convert the input spectra (user-provided or default).

Return type:

tuple

Raises:

ValueError – If the sampling is out of the expected boundaries.
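The sampling argument is an ordinary 1D NumPy array, so a custom grid can be built the same way as the default one. The sketch below constructs both and runs a hypothetical range check; the helper `check_sampling` and the bounds passed to it are illustrative assumptions, not the limits that convert actually enforces, and the file name in the final comment is made up.

```python
import numpy as np

# Same grid as the default shown in the convert signature.
default_sampling = np.linspace(0.0, 60.0, 600)

# A coarser user-defined grid in pseudo-wavelength (step of 0.5).
custom_sampling = np.linspace(0.0, 60.0, 121)


def check_sampling(sampling, lower, upper):
    """Hypothetical helper: raise ValueError if the grid is not 1D or leaves [lower, upper]."""
    sampling = np.asarray(sampling, dtype=float)
    if sampling.ndim != 1:
        raise ValueError("sampling must be a 1D array")
    if sampling.min() < lower or sampling.max() > upper:
        raise ValueError(f"sampling must lie within [{lower}, {upper}]")
    return sampling


# The bounds here are illustrative only.
check_sampling(custom_sampling, lower=0.0, upper=60.0)

# The grid would then be passed to the converter, for example:
# spectra, used_sampling = convert("input_file.csv", sampling=custom_sampling, save_file=False)
```

Unpacking the result into two names mirrors the documented return value, a tuple holding the sampled spectra and the grid that was used.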

get_design_matrices(sampling: ndarray, bases_config: DataFrame) → dict

Get the design matrices corresponding to the input bases.

Parameters:

sampling (ndarray) – 1D array containing the sampling grid.
bases_config (NamedTuple) – An object containing the configuration for all sets of basis functions.

Returns:

The design matrices for the input list of bases.

Return type:

dict
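In general, a design matrix holds each basis function evaluated at every point of the sampling grid, so that multiplying a coefficient vector by it yields the sampled curve. The sketch below illustrates that idea with monomials as a stand-in basis. The actual basis functions, the matrix orientation, and the dictionary keys used by get_design_matrices are not described here, so everything in the example is an assumption.

```python
import numpy as np


def monomial_design_matrix(sampling, n_bases):
    """Rows are the stand-in basis functions x**k (k = 0..n_bases-1) on the grid."""
    sampling = np.asarray(sampling, dtype=float)
    return np.vstack([sampling**k for k in range(n_bases)])


sampling = np.linspace(0.0, 1.0, 5)

# One matrix per basis ID; the IDs 56 and 57 are made up.
design_matrices = {
    56: monomial_design_matrix(sampling, 3),
    57: monomial_design_matrix(sampling, 2),
}

# Made-up coefficients: the product evaluates 1 + 2x + 3x^2 on the grid.
coefficients = np.array([1.0, 2.0, 3.0])
sampled = coefficients @ design_matrices[56]
print(sampled)
```

Building the matrices once per basis ID means the same matrix can be reused for every spectrum that shares that basis and sampling grid.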

get_unique_basis_ids(parsed_input_data: DataFrame) → set

Get the IDs of the unique bases required to sample all spectra in the input files.

Parameters:

parsed_input_data (DataFrame) – Pandas DataFrame populated with the content of the file containing the mean spectra in continuous representation.

Returns:

A set containing all the required unique basis function IDs.

Return type:

set
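Collecting the distinct basis IDs from a table of spectra can be sketched with pandas as follows. The column names `bp_basis_function_id` and `rp_basis_function_id`, the sample values, and the helper `collect_unique_ids` are assumptions for the example, not a description of how get_unique_basis_ids is implemented.

```python
import pandas as pd

# Made-up input table; the column names are assumptions.
parsed_input_data = pd.DataFrame({
    "source_id": [1, 2, 3],
    "bp_basis_function_id": [56, 56, 56],
    "rp_basis_function_id": [57, 57, 58],
})


def collect_unique_ids(df, id_columns):
    """Return the set of distinct values found across the given ID columns."""
    ids = set()
    for column in id_columns:
        ids.update(df[column].unique().tolist())
    return ids


unique_ids = collect_unique_ids(
    parsed_input_data, ["bp_basis_function_id", "rp_basis_function_id"]
)
print(unique_ids)
```

Returning a set removes duplicates across columns as well as within them, so each basis appears only once.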