The package currently includes four different functionalities: a calibrator, a converter, a synthetic photometry generator, and a plotter. Two further functionalities, a simulator and a linefinder, are under development and will be released soon.

Input data types

Note: The following information does not apply to the plotter functionality.

The functions in GaiaXPy accept several kinds of input. Those currently implemented are files, lists, ADQL queries, and pandas DataFrames.


Files

The functions accept input files with the extensions csv, ecsv, fits, and xml. These files contain XP continuous raw data as extracted from the Gaia Archive.


Lists

Lists are accepted only by calibrate, convert, and generate. Each list must contain source IDs, given either as strings or as integers.

When a list is passed to one of the tools, the function will internally request the required data for the given sources from the Gaia Archive.

Passing Cosmos credentials (username and password) is optional.
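Because source IDs may be given as integers or as strings, it can be convenient to normalise them before building the list. The sketch below uses a hypothetical helper, normalise_source_ids, which is not part of GaiaXPy; it only illustrates the two accepted element types.

```python
def normalise_source_ids(raw_ids):
    """Return source IDs as a list of ints (hypothetical helper, not part of GaiaXPy)."""
    normalised = []
    for raw in raw_ids:
        if isinstance(raw, bool):
            raise TypeError("booleans are not valid source IDs")
        if isinstance(raw, int):
            normalised.append(raw)
        elif isinstance(raw, str) and raw.strip().isdigit():
            # Strings made only of digits are converted to integers.
            normalised.append(int(raw.strip()))
        else:
            raise ValueError(f"not a valid source ID: {raw!r}")
    return normalised


# Mixed input: one integer and one string holding digits.
sources = normalise_source_ids([1234567890, "9876543210"])
print(sources)
```

The resulting list can then be passed to calibrate, convert, or generate like any other list of source IDs.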

ADQL queries

ADQL queries are accepted only by calibrate, convert, and generate. Queries need to be passed as strings (e.g., "select TOP 100 source_id from gaiadr3.gaia_source where has_xp_continuous = 'True'").

Passing Cosmos credentials (username and password) is optional.
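Since a query is just a string, it can be assembled programmatically. The following is a minimal sketch that restricts the example query above to a given set of sources; build_xp_query is a hypothetical helper introduced for illustration, and only the table and column names come from the example query.

```python
def build_xp_query(source_ids):
    """Build an ADQL query string for the given source IDs (hypothetical helper)."""
    if not source_ids:
        raise ValueError("at least one source ID is required")
    # int() raises for non-numeric text, so arbitrary strings cannot
    # end up inside the query.
    id_list = ", ".join(str(int(source_id)) for source_id in source_ids)
    return (
        "SELECT source_id FROM gaiadr3.gaia_source "
        f"WHERE has_xp_continuous = 'True' AND source_id IN ({id_list})"
    )


query = build_xp_query([1234567890, "9876543210"])
print(query)
```

The string stored in query can be passed in place of a file or list to the functions that accept ADQL queries.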


DataFrames

DataFrames are accepted by all the available tools and will work as long as the column names in the DataFrame match those used in the files extracted from the Gaia Archive.

Some tools require the columns bp_coefficient_correlations and rp_coefficient_correlations. If the data in these two columns come as arrays, as is the case for the data served from the archive, they are converted to matrices using the function array_to_symmetric_matrix. No changes are made if the data in these columns already correspond to matrices.
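The idea behind this conversion can be pictured with a short NumPy sketch. It is an illustration only, not a reimplementation of array_to_symmetric_matrix: the function name flat_to_symmetric is hypothetical, and the sketch assumes that the array lists the elements above the diagonal row by row and that the diagonal is 1, which may differ from the ordering actually used by the archive.

```python
import numpy as np


def flat_to_symmetric(values, size):
    """Build a size x size symmetric matrix with a unit diagonal.

    Illustrative only: assumes `values` lists the elements above the
    diagonal row by row, which may differ from the archive's ordering.
    """
    values = np.asarray(values, dtype=float)
    expected = size * (size - 1) // 2
    if values.shape != (expected,):
        raise ValueError(f"expected {expected} values, got shape {values.shape}")
    matrix = np.eye(size)
    rows, cols = np.triu_indices(size, k=1)
    matrix[rows, cols] = values  # fill the upper triangle
    matrix[cols, rows] = values  # mirror into the lower triangle
    return matrix


matrix = flat_to_symmetric([0.1, 0.2, 0.3], size=3)
print(matrix)
```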

Generic usage

This section shows how to pass different types of input to a generic function in the package, which represents any of the available functions (calibrate, convert, etc.), and some considerations on output and storage.


In the following code snippet generic_function should be replaced by the name of the function you wish to invoke (calibrate, convert, etc.).


from gaiaxpy import generic_function

# Passing a file
input_file = 'path/to/input/file.extension'
output_data = generic_function(input_file)

# Passing a DataFrame
import pandas as pd
input_file = 'path/to/input/file.extension'
read_df = pd.read_csv(input_file, float_precision='round_trip')
# The data can be modified as long as the names of the columns and the types remain the same.
output_data = generic_function(read_df)

# Passing a list
sources = [1234567890, 9876543210]  # Or ['1234567890', '9876543210'] as strings
output_data = generic_function(sources)


Depending on the function being executed, the output is either a single variable holding the data, or two variables: one for the data and one for the sampling.

from gaiaxpy import generic_function

input_file = 'path/to/input/file.extension'

# Returning one output variable
output_data = generic_function(input_file)

# Returning two variables, for functions that also return a sampling
output_data, sampling = generic_function(input_file)


The functions have the option save_file which is set to True by default.

The output file has the same extension as the input file unless the user chooses a different output format. For inputs that have no extension, such as lists and DataFrames, csv is used by default. The option output_format allows the data to be stored in the formats avro, csv, ecsv, fits, and xml.
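The rule for choosing the format can be summarised in a few lines. The helper below, resolve_output_format, is hypothetical and only mirrors the behaviour described in this section. It treats every string input as a file path, so ADQL queries are out of scope, and accepting a leading dot in the format name is a choice made for this sketch.

```python
from pathlib import Path

SUPPORTED_FORMATS = {"avro", "csv", "ecsv", "fits", "xml"}


def resolve_output_format(input_object, output_format=None):
    """Mimic the documented rule for choosing the output format (sketch only)."""
    if output_format is not None:
        chosen = output_format  # an explicit choice always wins
    elif isinstance(input_object, (str, Path)):
        chosen = Path(input_object).suffix  # files keep their own extension
    else:
        chosen = "csv"  # lists and DataFrames have no extension
    chosen = chosen.lower().lstrip(".")
    if chosen not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {chosen!r}")
    return chosen


print(resolve_output_format("path/to/spectra.fits"))
print(resolve_output_format([1234567890, 9876543210]))
print(resolve_output_format("path/to/spectra.csv", output_format="xml"))
```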

Depending on the format chosen to store the data, the functions will create one or two files. The formats fits and xml will create one file that contains both the data and the sampling. However, the formats avro and csv will generate two files, one for each of the output variables. In this case, the name of the sampling file will include the suffix _sampling.
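To anticipate which files a call will write, the description above can be turned into a small lookup. This is a sketch under stated assumptions: expected_output_files is a hypothetical helper, and placing the _sampling suffix right before the extension is an assumption about the exact file name. Formats whose behaviour is not described above raise an error rather than guess.

```python
from pathlib import Path

SINGLE_FILE_FORMATS = {"fits", "xml"}  # data and sampling share one file
TWO_FILE_FORMATS = {"avro", "csv"}     # sampling goes to a second file


def expected_output_files(output_path, output_file, output_format):
    """List the files a call is expected to write (sketch, not GaiaXPy code)."""
    base = Path(output_path)
    data_file = base / f"{output_file}.{output_format}"
    if output_format in SINGLE_FILE_FORMATS:
        return [data_file]
    if output_format in TWO_FILE_FORMATS:
        # Assumption: the suffix is placed right before the extension.
        return [data_file, base / f"{output_file}_sampling.{output_format}"]
    raise ValueError(f"behaviour not described for format: {output_format!r}")


files = expected_output_files("my/path", "my_output_name", "csv")
print([path.as_posix() for path in files])
```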

from gaiaxpy import generic_function

input_file = 'path/to/input/file.extension'
output_data = generic_function(input_file, output_path='my/path', output_file='my_output_name', output_format='fits')

If the function accepts a sampling, it must be a NumPy array passed through the option sampling.

import numpy as np
from gaiaxpy import generic_function

input_file = 'path/to/input/file.extension'
output_data, output_sampling = generic_function(input_file, sampling=np.linspace(0, 100, 1000))


If an output file with the same name as an existing one is created, the data of the previous file will be automatically overwritten.
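Because existing files are overwritten without warning, one option is to pick a free name before calling the function. The helper below, free_output_name, is hypothetical and not part of GaiaXPy. It checks only the main data file, not a possible sampling file, and appends a counter until the name is unused.

```python
import tempfile
from pathlib import Path


def free_output_name(output_path, output_file, extension):
    """Return a file name (without extension) that does not exist yet.

    Hypothetical helper: appends _1, _2, ... until the name is free.
    """
    directory = Path(output_path)
    candidate = output_file
    counter = 1
    while (directory / f"{candidate}.{extension}").exists():
        candidate = f"{output_file}_{counter}"
        counter += 1
    return candidate


# Demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as directory:
    first = free_output_name(directory, "my_output_name", "csv")
    (Path(directory) / "my_output_name.csv").write_text("placeholder")
    second = free_output_name(directory, "my_output_name", "csv")

print(first, second)
```

The returned name can then be passed through the option output_file.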

Note on TOPCAT

TOPCAT can read the FITS and XML output files of the calibrator, converter, and generator. It is possible to plot their contents using TOPCAT.

The TOPCAT feature used to generate these plots is the XYArray Layer Control.

A tutorial on how to work with TOPCAT is available here.


Calibrator

The function calibrate returns a DataFrame of calibrated spectra and a NumPy array with the sampling. The default output file name is 'output_spectra', but a different one can be chosen through the option output_file.

import numpy as np
from gaiaxpy import calibrate

mean_spectrum_file = 'path/to/mean_spectrum_with_correlation.csv'
calibrated_df, sampling = calibrate(mean_spectrum_file, sampling=np.arange(336, 1021, 2), save_file=False)

The default sampling is np.arange(336, 1021, 2); however, in order to improve the resolution at the blue end, the log-scale sampling numpy.geomspace(330, 1049.9999999999, 361) is proposed as an alternative.
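The difference between the two samplings can be inspected directly with NumPy. The sketch below assumes the sampling values are wavelengths in nm and simply compares the spacing between consecutive points at both ends of the range; it does not call GaiaXPy.

```python
import numpy as np

linear = np.arange(336, 1021, 2)                     # default sampling
log_scale = np.geomspace(330, 1049.9999999999, 361)  # proposed alternative

# Spacing between consecutive sampling points.
linear_steps = np.diff(linear)
log_steps = np.diff(log_scale)

print("points:", linear.size, log_scale.size)
print("linear step:", linear_steps.min(), linear_steps.max())
print("log-scale step at the blue end:", round(float(log_steps[0]), 3))
print("log-scale step at the red end:", round(float(log_steps[-1]), 3))
```

The log-scale sampling is finer than the default at short wavelengths and coarser at long wavelengths. Either array can be passed to calibrate through the option sampling.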

All the available options can be found in calibrate.


Converter

The function convert returns a DataFrame where each row corresponds to a converted spectrum, and a NumPy array with the sampling.

from gaiaxpy import convert

mean_spectrum_file = 'path/to/mean_spectrum_with_correlation.csv'
converted_data, sampling = convert(mean_spectrum_file, save_file=False)

The default sampling for convert is numpy.linspace(0, 60, 600). A different sampling can be passed through the option sampling, as in the following example.

import numpy as np
from gaiaxpy import convert

mean_spectrum_file = 'path/to/mean_spectrum_with_correlation.csv'
converted_data, sampling = convert(mean_spectrum_file, sampling=np.linspace(0, 70, 1000), output_file='my_output_name', output_format='.xml')

All the available options can be found in convert.

Synthetic photometry generator

The synthetic photometry utility uses the method generate to return a DataFrame with the generated synthetic photometry results. Magnitudes, fluxes and flux errors are computed for each filter. The synthetic fluxes are given in units of W nm⁻¹ m⁻² for photometric systems in VEGAMAG and W Hz⁻¹ m⁻² for systems in AB. See also Gaia Collaboration, Montegriffo et al. 2022, Gaia Data Release 3: The Galaxy in your preferred colours. Synthetic photometry from Gaia low-resolution spectra.
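As a reminder of how a flux in W Hz⁻¹ m⁻² relates to an AB magnitude, the sketch below applies the conventional AB definition with a zero point of about 3631 Jy. It is a generic illustration, not the computation performed by generate, and the zero points used internally by GaiaXPy may differ; ab_magnitude_from_flux is a hypothetical name.

```python
import math

# Conventional AB zero point: about 3631 Jy, with 1 Jy = 1e-26 W Hz^-1 m^-2.
AB_ZERO_FLUX = 3631e-26


def ab_magnitude_from_flux(flux_nu):
    """Convert a flux density in W Hz^-1 m^-2 to an AB magnitude."""
    if flux_nu <= 0:
        raise ValueError("flux must be positive to define a magnitude")
    return -2.5 * math.log10(flux_nu / AB_ZERO_FLUX)


# A source ten times fainter than the zero point is 2.5 magnitudes fainter.
print(ab_magnitude_from_flux(3631e-27))
```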

from gaiaxpy import generate, PhotometricSystem

mean_spectrum_file = 'path/to/mean_spectrum_with_correlation.csv'
phot_system = PhotometricSystem.JKC
generated_data = generate(mean_spectrum_file, phot_system, save_file=False)

This table lists the available systems providing references for the passband definitions. The last column indicates the presence of a standardised version of the same set of filters (see Gaia Collaboration, Montegriffo et al. 2022 for details). The asterisk for the HST WFC3 UVIS and ACS WFC systems indicates that only a small selection (f438w, f606w, f814w) of the bands in these two systems have been standardised using the HUGS catalogue (Nardiello, D., et al. 2018, The Hubble Space Telescope UV Legacy Survey of Galactic Globular Clusters - XVII. Public Catalogue Release, 481, 3382–3393). These are available as HST_HUGS in GaiaXPy. No ultraviolet band is provided in the standardised version of the Stromgren system (this is also indicated with an asterisk).

The complete list of the systems included in the package can also be obtained as follows:

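from gaiaxpy import PhotometricSystem

# Method name assumed from the GaiaXPy API reference; check it against the installed version.
PhotometricSystem.get_available_systems()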


Photometric systems requests

Users can request the addition of other photometric systems by raising an issue via GitHub. The main conditions for adding a new system are the following:

  • Only passbands that are fully enclosed in the Gaia BP/RP wavelength range [330, 1050] nm can be reproduced.

  • Requests need to be properly justified. For example, it would be pointless to include a specific set of passbands that is used at a given telescope to approximate the JKC or SDSS systems, because synthetic magnitudes/fluxes (standardised or non-standardised) in these systems can already be obtained with GaiaXPy. On the other hand, it would be useful to include a set of passbands adopted by an existing or forthcoming survey that intends to provide magnitudes in its own “natural” photometric system, or a set aimed at tracing a specific feature/characteristic of the available XP spectra that is not covered by the passbands already included.

  • The newly added systems will be publicly available to all GaiaXPy users.

  • The new system to be added is specified as follows:

    • one CSV file per passband, containing the following columns: wavelength in nm or Angstrom, total response in arbitrary units.

    • it must be clearly specified if the transmission curves are photonic curves or energy curves (see, e.g., Bessell & Murphy 2012).

    • it must be clearly specified if the desired magnitudes are VEGAMAG or AB mag.

    • a reference for the source of all the above info (especially the transmission curves) must be provided.
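Before raising a request, the first condition can be checked locally. The sketch below reads a passband given as CSV text and tests whether its non-zero response lies inside the [330, 1050] nm range. It rests on several assumptions: the file has two columns without a header (wavelength, total response), "fully enclosed" is read as "no response above a threshold outside the range", and passband_is_enclosed is a hypothetical helper, not part of GaiaXPy.

```python
import csv
import io

XP_RANGE_NM = (330.0, 1050.0)


def passband_is_enclosed(csv_text, unit="nm", threshold=0.0):
    """Check that the response above `threshold` lies inside the XP range.

    Assumes two header-less columns: wavelength, total response.
    """
    scale = {"nm": 1.0, "angstrom": 0.1}[unit.lower()]  # convert to nm
    wavelengths = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        wavelength, response = float(row[0]) * scale, float(row[1])
        if response > threshold:
            wavelengths.append(wavelength)
    if not wavelengths:
        raise ValueError("passband has no response above the threshold")
    return XP_RANGE_NM[0] <= min(wavelengths) and max(wavelengths) <= XP_RANGE_NM[1]


inside = "400,0.0\n450,0.8\n500,0.0\n"  # wavelengths in nm
outside = "3000,0.5\n3500,0.9\n"        # wavelengths in Angstrom (300 to 350 nm)
print(passband_is_enclosed(inside))
print(passband_is_enclosed(outside, unit="angstrom"))
```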

All the available options for this method can be found in generate.

Downloadable SVO systems

The Spanish Virtual Observatory (SVO) provides additional files that can be downloaded from their webpage and then loaded into GaiaXPy version 2.0.0 or later.

These files contain additional photometric systems from which synthetic photometry can be generated in the same way it is done with the built-in GaiaXPy systems.

A tutorial on how to use this functionality is available in the Tutorials section of the main GaiaXPy webpage.


Plotter

This functionality plots the output of the calibrator and the converter. It receives the output DataFrame and the output sampling.

from gaiaxpy import plot_spectra
plot_spectra(output_data, sampling=output_sampling, multi=False, show_plot=True, output_path='/path')

When the parameter multi is set to True, all the results are plotted in a single image, whereas False generates one plot per spectrum in the data. The parameter show_plot displays the images when set to True. If an output_path is provided, the plots are saved automatically.

All the available options are described in plot_spectra.