oceanum.datamesh.Connector

class oceanum.datamesh.Connector(token=None, service='https://datamesh.oceanum.io', _gateway=None, user=None, session_duration=None, verify=True)

Datamesh connector class.

All datamesh operations are methods of this class.

Attributes

host

Datamesh host

Methods

__init__(token=None, service='https://datamesh.oceanum.io', _gateway=None, user=None, session_duration=None, verify=True)

Datamesh connector constructor

Parameters:
  • token (string) – Your datamesh access token. Defaults to os.environ.get("DATAMESH_TOKEN", None).

  • service (string) – The datamesh service url. Defaults to os.environ.get("DATAMESH_SERVICE", "https://datamesh.oceanum.io").

  • user (string, optional) – Optional user identifier to be sent in the header for datamesh authentication. Defaults to None.

  • session_duration (float, optional) – The desired length of time for acquired datamesh sessions, in seconds. Defaults to 3600 seconds.

  • verify (bool, optional) – Whether to verify the datamesh server certificate. Defaults to True.

Raises:

ValueError – Missing or invalid arguments
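The documented defaults can be spelled out explicitly when assembling the constructor arguments. The following is a minimal sketch rather than the library's own logic: `connector_kwargs` and `make_connector` are hypothetical helper names, and the import path assumes `Connector` is exposed from `oceanum.datamesh`, as the title of this page suggests.

```python
import os


def connector_kwargs(env=None):
    """Build Connector keyword arguments from environment-style settings.

    Mirrors the defaults documented above. `env` is any mapping and
    falls back to os.environ when omitted.
    """
    env = os.environ if env is None else env
    return {
        "token": env.get("DATAMESH_TOKEN"),
        "service": env.get("DATAMESH_SERVICE", "https://datamesh.oceanum.io"),
        "session_duration": 3600.0,  # seconds, the documented default
        "verify": True,  # keep server certificate verification on
    }


def make_connector(**overrides):
    """Create a Connector, letting explicit keyword overrides win."""
    # Imported lazily so this module loads even without the package installed.
    from oceanum.datamesh import Connector

    kwargs = connector_kwargs()
    kwargs.update(overrides)
    return Connector(**kwargs)
```

Calling `make_connector(user="analyst")` would then pass the environment-derived token and service along with the extra `user` value. As documented, the constructor raises ValueError for missing or invalid arguments.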

delete_datasource(datasource_id)

Delete a datasource from datamesh. This will delete the datamesh registration and any stored data.

Parameters:

datasource_id (string) – Unique datasource id

Returns:

True if the datasource was successfully deleted

Return type:

boolean

async delete_datasource_async(datasource_id)

Asynchronously delete a datasource from datamesh. This will delete the datamesh registration and any stored data.

Parameters:

datasource_id (string) – Unique datasource id

Returns:

True if the datasource was successfully deleted

Return type:

boolean
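Because the asynchronous variant returns a coroutine, several deletions can be awaited together. The sketch below uses a hypothetical helper, `delete_many`, and assumes only the `delete_datasource_async` method documented above. It relies on `asyncio.gather` with `return_exceptions=True`, so one failed deletion does not cancel the others.

```python
import asyncio


async def delete_many(connector, datasource_ids):
    """Delete several datasources concurrently.

    Returns a dict mapping each id to the method's return value, or to
    the exception raised for that id.
    """
    ids = list(datasource_ids)
    results = await asyncio.gather(
        *(connector.delete_datasource_async(ds_id) for ds_id in ids),
        return_exceptions=True,  # collect failures instead of raising early
    )
    return dict(zip(ids, results))
```

From synchronous code this could be driven with `asyncio.run(delete_many(connector, ["id-1", "id-2"]))`, where the ids are placeholders. Since deletion also removes stored data, checking each entry of the result before proceeding is prudent.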

get_catalog(search=None, timefilter=None, geofilter=None, limit=None)

Get datamesh catalog

Parameters:
  • search (string, optional) – Search string for filtering datasources

  • timefilter (Union[oceanum.datamesh.query.TimeFilter, list], optional) – Time filter as a valid Query TimeFilter or a list of [start, end]

  • geofilter (Union[oceanum.datamesh.query.GeoFilter, dict, shapely.geometry], optional) – Spatial filter as a valid Query GeoFilter, a GeoJSON geometry as a dict, or a shapely geometry

  • limit (int, optional) – Limit the number of datasources returned. Defaults to None.

Returns:

A datamesh catalog instance

Return type:

oceanum.datamesh.Catalog
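The filter arguments can be assembled as plain Python values before calling `get_catalog`. This is a minimal sketch: `bbox_to_geojson` and `catalog_filters` are hypothetical helpers, and passing the start and end times as ISO 8601 strings is an assumption, since the text above only states that a list of [start, end] is accepted.

```python
def bbox_to_geojson(west, south, east, north):
    """Return a GeoJSON Polygon dict covering a lon/lat bounding box."""
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],  # GeoJSON rings are closed: last point repeats the first
    ]
    return {"type": "Polygon", "coordinates": [ring]}


def catalog_filters(search=None, start=None, end=None, bbox=None, limit=None):
    """Assemble keyword arguments for get_catalog, skipping unset filters."""
    kwargs = {}
    if search is not None:
        kwargs["search"] = search
    if start is not None and end is not None:
        kwargs["timefilter"] = [start, end]
    if bbox is not None:
        kwargs["geofilter"] = bbox_to_geojson(*bbox)
    if limit is not None:
        kwargs["limit"] = limit
    return kwargs
```

A call might then look like `connector.get_catalog(**catalog_filters(search="wave", start="2020-01-01", end="2020-02-01", bbox=(165, -48, 180, -34), limit=10))`, with all values chosen purely for illustration.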

async get_catalog_async(search=None, timefilter=None, geofilter=None)

Get datamesh catalog asynchronously

Parameters:
  • search (string, optional) – Search string for filtering datasources

  • timefilter (Union[oceanum.datamesh.query.TimeFilter, list], optional) – Time filter as a valid Query TimeFilter or a list of [start, end]

  • geofilter (Union[oceanum.datamesh.query.GeoFilter, dict, shapely.geometry], optional) – Spatial filter as a valid Query GeoFilter, a GeoJSON geometry as a dict, or a shapely geometry

Returns:

A datamesh catalog instance

Return type:

Coroutine<oceanum.datamesh.Catalog>

get_datasource(datasource_id)

Get a Datasource instance from the datamesh. This does not load the actual data.

Parameters:

datasource_id (string) – Unique datasource id

Returns:

A datasource instance

Return type:

oceanum.datamesh.Datasource

Raises:

DatameshConnectError – Datasource cannot be found or is not authorized for the datamesh key

async get_datasource_async(datasource_id)

Get a Datasource instance from the datamesh asynchronously. This does not load the actual data.

Parameters:
  • datasource_id (string) – Unique datasource id

  • loop – event loop. default=None will use asyncio.get_running_loop()

  • executor – concurrent.futures.Executor instance. default=None will use the default executor

Returns:

A datasource instance

Return type:

Coroutine<oceanum.datamesh.Datasource>

Raises:

DatameshConnectError – Datasource cannot be found or is not authorized for the datamesh key

load_datasource(datasource_id, parameters={}, use_dask=False)

Load a datasource into the work environment. For datasources which load into DataFrames or GeoDataFrames, this returns an in-memory instance of the DataFrame. For datasources which load into an xarray Dataset, an open zarr-backed dataset is returned.

Parameters:
  • datasource_id (string) – Unique datasource id

  • parameters (dict) – Additional datasource parameters

  • use_dask (bool, optional) – Load datasource as a dask enabled datasource if possible. Defaults to False.

Returns:

The datasource container

Return type:

Union[pandas.DataFrame, geopandas.GeoDataFrame, xarray.Dataset]
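Since the return type depends on the datasource, calling code may need to branch on what it received. The sketch below classifies the container by the package that defines its type, which avoids importing pandas, geopandas or xarray just to run `isinstance` checks. `container_kind` and `load_and_describe` are hypothetical helper names, not part of the library.

```python
def container_kind(container):
    """Classify a loaded container by the top-level package of its type."""
    package = type(container).__module__.split(".")[0]
    kinds = {
        "pandas": "dataframe",
        "geopandas": "geodataframe",
        "xarray": "dataset",
    }
    return kinds.get(package, "unknown")


def load_and_describe(connector, datasource_id, use_dask=False):
    """Load a datasource and return (kind, container)."""
    data = connector.load_datasource(datasource_id, use_dask=use_dask)
    return container_kind(data), data
```

A caller could, for example, treat a "dataset" result as lazily loaded and backed by zarr, while treating the two DataFrame kinds as already in memory, in line with the description above.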

async load_datasource_async(datasource_id, parameters={}, use_dask=False)

Load a datasource asynchronously into the work environment

Parameters:
  • datasource_id (string) – Unique datasource id

  • parameters (dict) – Additional datasource parameters

  • use_dask (bool, optional) – Load datasource as a dask enabled datasource if possible. Defaults to False.

  • loop – event loop. default=None will use asyncio.get_running_loop()

  • executor – concurrent.futures.Executor instance. default=None will use the default executor

Returns:

The datasource container

Return type:

Coroutine<Union[pandas.DataFrame, geopandas.GeoDataFrame, xarray.Dataset]>

query(query=None, *, use_dask=False, cache_timeout=0, **query_keys)

Make a datamesh query

Parameters:

query (Union[oceanum.datamesh.Query, dict]) – Datamesh query as a query object or a valid query dictionary

Kwargs:

  • use_dask (bool, optional) – Load datasource as a dask enabled datasource if possible. Defaults to False.

  • cache_timeout (int, optional) – Local cache timeout in seconds. Defaults to 0 (no local cache). Only applies if use_dask=False. Will return an identical query from a local cache if available with an age of less than cache_timeout seconds. Does not check for more recent data on the server.

  • **query_keys – Keyword form of query, for example datamesh.query(datasource="my_datasource")

Returns:

The datasource container

Return type:

Union[pandas.DataFrame, geopandas.GeoDataFrame, xarray.Dataset]
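The interaction between `cache_timeout` and `use_dask` can be made explicit in a thin wrapper. This is a minimal sketch with a hypothetical helper name, `query_datasource`. It uses the dictionary form of the query with only the `datasource` key, the one key shown in the text above. Rejecting the combination of a cache timeout with `use_dask=True` is this sketch's own choice, not documented library behaviour.

```python
def query_datasource(connector, datasource_id, max_age_s=0, use_dask=False):
    """Query a datasource by id, optionally reusing a recent local result."""
    if use_dask and max_age_s:
        # The local cache is documented to apply only when use_dask=False,
        # so this sketch refuses the combination rather than ignore it.
        raise ValueError("cache_timeout has no effect when use_dask=True")
    return connector.query(
        {"datasource": datasource_id},  # dictionary form of the query
        use_dask=use_dask,
        cache_timeout=max_age_s,
    )
```

The keyword form described above would express the same request as `connector.query(datasource="my_datasource", cache_timeout=600)`. Either way, a cached result is returned without checking the server for newer data.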

async query_async(query, *, use_dask=False, cache_timeout=0, **query_keys)

Make a datamesh query asynchronously

Parameters:

query (Union[oceanum.datamesh.Query, dict]) – Datamesh query as a query object or a valid query dictionary

Kwargs:

  • use_dask (bool, optional) – Load datasource as a dask enabled datasource if possible. Defaults to False.

  • cache_timeout (int, optional) – Local cache timeout in seconds. Defaults to 0 (no local cache). Only applies if use_dask=False. Will return an identical query from a local cache if available with an age of less than cache_timeout seconds. Does not check for more recent data on the server.

  • loop – event loop. default=None will use asyncio.get_running_loop()

  • executor – concurrent.futures.Executor instance. default=None will use the default executor

  • **query_keys – Keyword form of query, for example datamesh.query(datasource="my_datasource")

Returns:

The datasource container

Return type:

Coroutine<Union[pandas.DataFrame, geopandas.GeoDataFrame, xarray.Dataset]>
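When many queries are issued at once, it can help to cap how many are in flight. The sketch below bounds concurrency with an `asyncio.Semaphore`. `query_all` and its `max_concurrent` parameter are hypothetical, and the only library call it assumes is `query_async` as documented above.

```python
import asyncio


async def query_all(connector, queries, max_concurrent=4):
    """Run several queries concurrently, at most `max_concurrent` at a time.

    Results are returned in the same order as `queries`.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(query):
        # Wait for a free slot before starting the request.
        async with semaphore:
            return await connector.query_async(query)

    return await asyncio.gather(*(run_one(q) for q in queries))
```

A suitable limit depends on the service and on the size of each result, so the default of 4 here is arbitrary.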

update_metadata(datasource_id, **properties)

Update the metadata of a datasource in datamesh

Parameters:
  • datasource_id (string) – Unique datasource id

  • **properties – Additional properties for the datasource - see oceanum.datamesh.Datasource constructor

Returns:

The datasource instance that was updated

Return type:

oceanum.datamesh.Datasource

async update_metadata_async(datasource_id, **properties)

Update the metadata of a datasource in datamesh asynchronously

Parameters:
  • datasource_id (string) – Unique datasource id

  • **properties – Additional properties for the datasource - see oceanum.datamesh.Datasource constructor

Returns:

The datasource instance that was updated

Return type:

Coroutine<oceanum.datamesh.Datasource>

write_datasource(datasource_id, data, geometry=None, geom=None, append=None, overwrite=False, index=None, crs=None, **properties)

Write a datasource to datamesh from the work environment

Parameters:
  • datasource_id (string) – Unique datasource id

  • data (Union[pandas.DataFrame, geopandas.GeoDataFrame, xarray.Dataset, None]) – The data to be written to datamesh. If data is None, just update metadata properties.

  • geom (oceanum.datasource.Geometry, optional) – GeoJSON geometry of the datasource in WGS84 if crs=None, otherwise in the specified crs. If not provided, the geometry will be inferred from the data if possible. default=None

  • coordinates (Dict[oceanum.datasource.Coordinates,str], optional) – Coordinate mapping for xarray datasets. default=None

  • append (string, optional) – Coordinate to append on. default=None

  • overwrite (bool, optional) – Overwrite existing datasource. default=False

  • crs (Union[string,int], optional) – Coordinate reference system for the datasource if not WGS84. The geom argument is also assumed to be in this CRS. default=None

  • **properties – Additional properties for the datasource - see oceanum.datamesh.Datasource

Returns:

The datasource instance that was written to

Return type:

oceanum.datamesh.Datasource
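The `append` and `overwrite` arguments express different intents, so a small wrapper can make the choice explicit at the call site. In this minimal sketch `write_data` is a hypothetical helper. Treating the two options as mutually exclusive is a policy of the sketch, not something the text above states about the library.

```python
def write_data(connector, datasource_id, data, append_coord=None, replace=False, **properties):
    """Write data to a datasource, either appending or replacing.

    Passing data=None updates only the metadata properties, as
    described for write_datasource above.
    """
    if append_coord is not None and replace:
        # Policy of this sketch, not a documented library rule.
        raise ValueError("choose either append_coord or replace, not both")
    return connector.write_datasource(
        datasource_id,
        data,
        append=append_coord,  # coordinate to append on, or None
        overwrite=replace,
        **properties,
    )
```

For a time series this might be called as `write_data(connector, "my_datasource", df, append_coord="time")`, where both the datasource id and the coordinate name "time" are placeholders.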

async write_datasource_async(datasource_id, data, append=None, overwrite=False, **properties)

Write a datasource to datamesh from the work environment asynchronously

Parameters:
  • datasource_id (string) – Unique datasource id

  • data (Union[pandas.DataFrame, geopandas.GeoDataFrame, xarray.Dataset, None]) – The data to be written to datamesh. If data is None, just update metadata properties.

  • geom (oceanum.datasource.Geometry) – GeoJSON geometry of the datasource

  • append (string, optional) – Coordinate to append on. default=None

  • overwrite (bool, optional) – Overwrite existing datasource. default=False

  • **properties – Additional properties for the datasource - see oceanum.datamesh.Datasource constructor

Returns:

The datasource instance that was written to

Return type:

Coroutine<oceanum.datamesh.Datasource>