lamindb.core.DataFrameAnnotator
- class lamindb.core.DataFrameAnnotator(df, columns=FieldAttr(Feature.name), categoricals=None, using=None, verbosity='hint', organism=None)
Bases: object
Annotation flow for a DataFrame object.
- Parameters:
  - df (DataFrame) – The DataFrame object to annotate.
  - columns (DeferredAttribute, default: FieldAttr(Feature.name)) – The registry field used to validate the DataFrame's column names.
  - categoricals (dict[str, DeferredAttribute] | None, default: None) – A dictionary mapping categorical column names to the registry fields their values are validated against.
  - using (str | None, default: None) – The reference instance containing the registries to validate against.
  - verbosity (str, default: 'hint') – The verbosity level.
  - organism (str | None, default: None) – The organism name.
Examples
>>> import lamindb as ln
>>> import bionty as bt
>>> annotate = ln.Annotate.from_df(
...     df,
...     categoricals={
...         "cell_type_ontology_id": bt.CellType.ontology_id,
...         "donor_id": ln.ULabel.name,
...     },
... )
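The `categoricals` mapping pairs each categorical column with the registry field its values must match. The sketch below models that idea in plain Python, assuming a registry field can be approximated by a set of allowed values and a DataFrame by a dict of lists. `find_unvalidated` and the sample data are hypothetical and not part of lamindb.

```python
def find_unvalidated(table, categoricals):
    """Return, per categorical column, the values missing from its registry.

    `table` maps column names to lists of values (a stand-in for a DataFrame);
    `categoricals` maps column names to sets of allowed values (a stand-in for
    registry fields such as bt.CellType.ontology_id). Hypothetical helper.
    """
    missing = {}
    for column, allowed in categoricals.items():
        # Deduplicate while keeping first-seen order.
        values = dict.fromkeys(table.get(column, []))
        unvalidated = [v for v in values if v not in allowed]
        if unvalidated:
            missing[column] = unvalidated
    return missing


df = {
    "cell_type_ontology_id": ["CL:0000084", "CL:0000236", "CL:0000084"],
    "donor_id": ["D1", "D2", "D3"],
}
registries = {
    "cell_type_ontology_id": {"CL:0000084", "CL:0000236"},
    "donor_id": {"D1", "D2"},
}
print(find_unvalidated(df, registries))  # {'donor_id': ['D3']}
```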
Attributes
- fields: dict
Return the registry fields that the columns are validated against.
Methods
- add_new_from(key, organism=None, **kwargs)
Add validated and new categories.
- Parameters:
  - key (str) – The key referencing the slot in the DataFrame from which to draw terms.
  - organism (str | None, default: None) – The organism name.
  - **kwargs – Additional keyword arguments to pass to the registry model.
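To illustrate the difference between "validated" and "new" categories, here is a minimal sketch in plain Python. It assumes, as a simplification, that a category counts as validated when it is found in a public reference, and that the instance registry and the public reference can each be modeled as a set. `add_validated_and_new` and the sample values are hypothetical, not lamindb code.

```python
def add_validated_and_new(values, registry, public_reference):
    """Register every value not yet in `registry`; return (validated, new).

    Hypothetical model: "validated" values are found in `public_reference`,
    "new" values are found nowhere and are registered as-is.
    """
    # Deduplicate while keeping order, then skip values already registered.
    pending = [v for v in dict.fromkeys(values) if v not in registry]
    validated = [v for v in pending if v in public_reference]
    new = [v for v in pending if v not in public_reference]
    registry.update(pending)  # both groups are added
    return validated, new


registry = {"T cell"}
public = {"T cell", "B cell", "monocyte"}
validated, new = add_validated_and_new(
    ["T cell", "B cell", "my custom cell"], registry, public
)
print(validated, new)  # ['B cell'] ['my custom cell']
```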
- add_new_from_columns(organism=None, **kwargs)
Add validated and new column names to their registry.
- Parameters:
  - organism (str | None, default: None) – The organism name.
  - **kwargs – Additional keyword arguments to pass to the registry model.
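The column-level counterpart works on column names rather than cell values. This sketch assumes the registry behind the `columns` field (by default `Feature.name`) can be modeled as a set of registered names; `add_new_column_names` is a hypothetical helper.

```python
def add_new_column_names(column_names, feature_registry):
    """Register column names missing from `feature_registry`; return them.

    Hypothetical model of registering DataFrame column names as features.
    """
    added = [name for name in column_names if name not in feature_registry]
    feature_registry.update(added)
    return added


features = {"cell_type_ontology_id"}
added = add_new_column_names(
    ["cell_type_ontology_id", "donor_id", "n_genes"], features
)
print(added)  # ['donor_id', 'n_genes']
```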
- add_validated_from(key, organism=None)
Add validated categories.
- Parameters:
  - key (str) – The key referencing the slot in the DataFrame.
  - organism (str | None, default: None) – The organism name.
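Unlike `add_new_from`, this method adds only validated categories. The sketch below models that under the same simplifying assumption: a value is validated if it appears in a public reference, and unresolved values are left out of the registry for the caller to handle. `add_validated_only` is hypothetical.

```python
def add_validated_only(values, registry, public_reference):
    """Register only values found in `public_reference`; return the rest.

    Hypothetical model: values missing from both the registry and the public
    reference are NOT registered and are returned as still unresolved.
    """
    pending = [v for v in dict.fromkeys(values) if v not in registry]
    for value in pending:
        if value in public_reference:
            registry.add(value)
    return [v for v in pending if v not in public_reference]


registry = {"T cell"}
public = {"T cell", "B cell"}
remaining = add_validated_only(["T cell", "B cell", "my custom cell"], registry, public)
print(sorted(registry), remaining)  # ['B cell', 'T cell'] ['my custom cell']
```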
- clean_up_failed_runs()
Clean up previous failed runs that did not save any outputs.
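As a rough illustration, this sketch assumes a "failed" run is an earlier run of the same transform that saved no outputs, and that the current run is always kept. The `Run` dataclass, its fields, and `drop_failed_runs` are hypothetical stand-ins, not lamindb's `Run` record.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical stand-in for a run record."""
    run_id: int
    transform: str
    outputs: list = field(default_factory=list)


def drop_failed_runs(runs, current):
    """Keep the current run, runs of other transforms, and runs with outputs."""
    return [
        run
        for run in runs
        if run is current or run.transform != current.transform or run.outputs
    ]


runs = [
    Run(1, "annotate.ipynb"),                          # earlier run, no outputs
    Run(2, "annotate.ipynb", outputs=["artifact-a"]),  # saved an output
    Run(3, "other.py"),                                # different transform
    Run(4, "annotate.ipynb"),                          # the current run
]
kept = drop_failed_runs(runs, current=runs[3])
print([run.run_id for run in kept])  # [2, 3, 4]
```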
- lookup(using=None)
Look up categories.
- Parameters:
  - using (str | None, default: None) – The instance where the lookup is performed. If None (default), the lookup is performed on the instance specified in the "using" parameter of this annotator. If "public", the lookup is performed on the public reference.
- Return type:
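Lookups are useful for picking exact category spellings via attribute access and tab completion. The sketch below assumes category names are exposed as lowercase attributes with non-alphanumeric runs replaced by underscores; `make_lookup` and that naming rule are hypothetical, not lamindb's actual lookup object.

```python
import re
from types import SimpleNamespace


def make_lookup(categories):
    """Expose categories as attributes, e.g. "T cell" -> lookup.t_cell."""
    attrs = {}
    for value in categories:
        # Lowercase, then collapse runs of non-word characters into "_".
        name = re.sub(r"\W+", "_", value.lower()).strip("_")
        if not name or name[0].isdigit():
            name = f"_{name}"  # keep the attribute name a valid identifier
        attrs[name] = value
    return SimpleNamespace(**attrs)


cell_types = make_lookup(["T cell", "B cell", "CD4-positive, alpha-beta T cell"])
print(cell_types.t_cell)                          # T cell
print(cell_types.cd4_positive_alpha_beta_t_cell)  # CD4-positive, alpha-beta T cell
```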
- save_artifact(description=None, **kwargs)
Save the validated DataFrame and metadata.
- Parameters:
  - description (str | None, default: None) – Description of the DataFrame object.
  - **kwargs – Object-level metadata.
- Return type:
- Returns:
A saved artifact record.
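Because this method saves the *validated* DataFrame, a natural guard is to refuse saving until validation has passed. This sketch assumes that behaviour; whether lamindb raises or only warns in that case is not stated here. `NotValidatedError` and `save_validated` are hypothetical, and the returned dict merely stands in for an artifact record.

```python
class NotValidatedError(Exception):
    """Hypothetical error raised when saving before validation succeeds."""


def save_validated(data, validated, description=None, **metadata):
    """Return a record-like dict for `data`, refusing if it is not validated."""
    if not validated:
        raise NotValidatedError("run validate() until it returns True before saving")
    # **metadata collects object-level key/value annotations.
    return {"data": data, "description": description, "metadata": metadata}


record = save_validated(
    {"donor_id": ["D1"]}, validated=True, description="donor table", project="pilot"
)
print(record["metadata"])  # {'project': 'pilot'}
```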
- save_collection(artifact, name, description=None, reference=None, reference_type=None)
Save a collection from one or several artifacts.
- Parameters:
  - artifact (Artifact | Iterable[Artifact]) – One or several saved Artifacts.
  - name (str) – Title of the publication.
  - description (str | None, default: None) – Description of the publication.
  - reference (str | None, default: None) – Accession number (e.g., GSE#, E-MTAB#).
  - reference_type (str | None, default: None) – Source type (e.g., GEO, ArrayExpress, SRA).
- Return type:
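The `artifact` parameter accepts either a single Artifact or an iterable of them. A common way to handle such a union is to wrap the single case in a list, as in this sketch. The `Artifact` class here is a hypothetical stand-in, `build_collection` is a hypothetical helper, and "GSE000000" is a placeholder accession.

```python
class Artifact:
    """Hypothetical stand-in for a saved artifact record."""

    def __init__(self, key):
        self.key = key


def build_collection(artifact, name, description=None,
                     reference=None, reference_type=None):
    """Group one artifact or an iterable of artifacts into a collection dict."""
    # Wrap a single Artifact; materialize any other iterable (e.g. a generator).
    artifacts = [artifact] if isinstance(artifact, Artifact) else list(artifact)
    if not artifacts:
        raise ValueError("a collection needs at least one artifact")
    return {
        "name": name,
        "description": description,
        "reference": reference,
        "reference_type": reference_type,
        "artifacts": artifacts,
    }


single = build_collection(Artifact("counts.parquet"), name="Pilot study")
multi = build_collection(
    (Artifact(k) for k in ["a.parquet", "b.parquet"]),
    name="Pilot study",
    reference="GSE000000",
    reference_type="GEO",
)
print(len(single["artifacts"]), len(multi["artifacts"]))  # 1 2
```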
- validate(organism=None)
Validate variables and categorical observations.
- Return type:
bool
- Returns:
Whether the DataFrame is validated.
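Conceptually, validation passes only when both the column names and every categorical value are registered. This sketch models that with sets and prints which items failed; `validate_table`, its messages, and the sample data are hypothetical and do not reproduce lamindb's logging.

```python
def validate_table(table, feature_registry, categoricals):
    """Return True only if every column name and categorical value is registered."""
    ok = True
    unknown_columns = [name for name in table if name not in feature_registry]
    if unknown_columns:
        print(f"unvalidated columns: {unknown_columns}")
        ok = False
    for column, allowed in categoricals.items():
        unknown_values = sorted(set(table.get(column, [])) - allowed)
        if unknown_values:
            print(f"unvalidated values in {column!r}: {unknown_values}")
            ok = False
    return ok


df = {"donor_id": ["D1", "D2"], "n_genes": [812, 1033]}
features = {"donor_id", "n_genes"}
print(validate_table(df, features, {"donor_id": {"D1", "D2"}}))  # True
print(validate_table(df, features, {"donor_id": {"D1"}}))        # warns about D2, then False
```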