Query & search registries

Find & access data using registries.

Setup

!lamin init --storage ./mydata
💡 connected lamindb: testuser1/mydata
import lamindb as ln

ln.settings.verbosity = "info"
💡 connected lamindb: testuser1/mydata

We’ll need some toy data:

ln.Artifact(ln.core.datasets.file_jpg_paradisi05(), description="My image").save()
ln.Artifact.from_df(ln.core.datasets.df_iris(), description="The iris collection").save()
ln.Artifact(ln.core.datasets.file_fastq(), description="My fastq").save()
❗ no run & transform get linked, consider calling ln.track()
✅ storing artifact 'r6QrYNk2tYOJNCeE8OGU' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/r6QrYNk2tYOJNCeE8OGU.jpg'
❗ no run & transform get linked, consider calling ln.track()
✅ storing artifact 'kYLiZUoFQoyEHYUNZScq' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/kYLiZUoFQoyEHYUNZScq.parquet'
❗ no run & transform get linked, consider calling ln.track()
✅ storing artifact 'lhO04hY93JAfnFQXgKwV' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/lhO04hY93JAfnFQXgKwV.fastq.gz'
Artifact(uid='lhO04hY93JAfnFQXgKwV', description='My fastq', suffix='.fastq.gz', type='dataset', size=20, hash='hi7ZmAzz8sfMd3vIQr-57Q', hash_type='md5', visibility=1, key_is_virtual=True, created_by_id=1, storage_id=1, updated_at='2024-06-19 16:17:13 UTC')

Look up metadata

For registries that hold no more than about 100k records, a lookup object is a convenient way to select a record.

Consider the User registry:

users = ln.User.lookup(field="handle")

With auto-complete, we find a user:

user = users.testuser1
user
User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at='2024-06-19 16:17:11 UTC')

Note

You can also auto-complete in a dictionary:

users_dict = ln.User.lookup().dict()
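
To make the idea of a lookup object concrete, here is a minimal sketch in plain Python. It is not how LaminDB implements `lookup()`: the helper `make_lookup` and the sample records are hypothetical. It builds a namedtuple whose attribute names are derived from a chosen field, which is what lets an editor offer them for auto-completion.

```python
import re
from collections import namedtuple


def make_lookup(records, field):
    """Build an object whose attribute names come from the `field` values.

    Hypothetical helper for illustration; not how lamindb implements lookup().
    """
    # normalize each value into a valid identifier, e.g. "Test User" -> "test_user"
    keys = [
        re.sub(r"\W+", "_", str(record[field])).strip("_").lower()
        for record in records
    ]
    Lookup = namedtuple("Lookup", keys)
    return Lookup(*records)


# made-up records standing in for rows of a user registry
records = [
    {"handle": "testuser1", "name": "Test User1"},
    {"handle": "testuser2", "name": "Test User2"},
]

users = make_lookup(records, field="handle")
user = users.testuser1        # attribute access, so auto-complete can list the handles
users_dict = users._asdict()  # the same records keyed by handle
print(user["name"])
print(list(users_dict))
```

Because every record becomes an attribute, this approach only stays practical for registries of moderate size.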

Filter by metadata

Filter for all artifacts created by a user:

ln.Artifact.filter(created_by=user).df()
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 r6QrYNk2tYOJNCeE8OGU None My image None .jpg dataset None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-06-19 16:17:13.125355+00:00
2 kYLiZUoFQoyEHYUNZScq None The iris collection None .parquet dataset DataFrame 5629 ah24lV9Ncc8nPL0MumEsdw md5 None None 1 True 1 None None 1 2024-06-19 16:17:13.265187+00:00
3 lhO04hY93JAfnFQXgKwV None My fastq None .fastq.gz dataset None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-06-19 16:17:13.272914+00:00

To access the results encoded in a filter statement, execute its return value with one of:

  • .df(): A pandas DataFrame with each record stored as a row.

  • .all(): An indexable django QuerySet.

  • .list(): A list of records.

  • .one(): Exactly one record. Raises an error if there is none or more than one.

  • .one_or_none(): Either one record or None if there is no query result.
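
The difference between `.one()` and `.one_or_none()` can be sketched on a plain Python list. This is an illustration only, assuming SQLAlchemy-like semantics in which more than one match is also an error; the two functions below are hypothetical stand-ins, not LaminDB code.

```python
def one(results):
    """Return the single element of `results`; raise if there is none or several."""
    if len(results) == 0:
        raise LookupError("no result found")
    if len(results) > 1:
        raise LookupError("multiple results found")
    return results[0]


def one_or_none(results):
    """Return the single element, or None if `results` is empty; raise if several."""
    if len(results) == 0:
        return None
    if len(results) > 1:
        raise LookupError("multiple results found")
    return results[0]


print(one(["My image"]))
print(one_or_none([]))
```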

Note

filter() returns a QuerySet.

The registries in LaminDB are Django models, so any Django query works. LaminDB extends Django’s API for data scientists.

Under the hood, any .filter() call translates into a SQL select statement.

.one() and .one_or_none() are two parts of LaminDB’s API that are borrowed from SQLAlchemy.
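
To illustrate the translation into SQL, the sketch below runs a hand-written select statement against an in-memory SQLite table using Python's built-in `sqlite3` module. The table layout and the sample rows are simplified assumptions, and the SQL that Django actually generates will differ in its details.

```python
import sqlite3

# in-memory stand-in for the artifact table; the table and column names are
# assumptions loosely based on the DataFrame columns shown above
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE artifact ("
    "id INTEGER PRIMARY KEY, description TEXT, suffix TEXT, created_by_id INTEGER)"
)
con.executemany(
    "INSERT INTO artifact (description, suffix, created_by_id) VALUES (?, ?, ?)",
    [
        ("My image", ".jpg", 1),
        ("The iris collection", ".parquet", 1),
        ("Someone else's file", ".jpg", 2),  # made-up row owned by another user
    ],
)

# roughly what a filter(created_by=user) call asks the database for,
# assuming `user` has the primary key 1
sql = "SELECT id, description FROM artifact WHERE created_by_id = ? ORDER BY id"
rows = con.execute(sql, (1,)).fetchall()
print(rows)
```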

Search for metadata

ln.Artifact.search("iris").df()
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
2 kYLiZUoFQoyEHYUNZScq None The iris collection None .parquet dataset DataFrame 5629 ah24lV9Ncc8nPL0MumEsdw md5 None None 1 True 1 None None 1 2024-06-19 16:17:13.265187+00:00

Let us create 500 notebook objects with fake titles and save them:

ln.save(
    [
        ln.Transform(name=title, type="notebook")
        for title in ln.core.datasets.fake_bio_notebook_titles(n=500)
    ]
)

We can now search for any combination of terms:

ln.Transform.search("intestine").df().head()
uid version name key description type reference reference_type latest_report_id source_code_id created_by_id updated_at
id
1 549CThD1JsS9ucxN None Igg2 intestinal intestine cluster. None None notebook None None None None 1 2024-06-19 16:17:18.417329+00:00
9 GUIgBqSZVIzVbI0i None Intestine IgG4 IgA study IgG3 result. None None notebook None None None None 1 2024-06-19 16:17:18.418670+00:00
13 X1oFmAFdiellRw57 None Intestine Brunner's gland IgG3 candidate IgA S... None None notebook None None None None 1 2024-06-19 16:17:18.419302+00:00
22 r8uOjrJEiNSWqLvy None Igg4 Ovaries Vulva intestine visualize IgD Cho... None None notebook None None None None 1 2024-06-19 16:17:18.420753+00:00
42 TBepG1gJSqKcHtxQ None Iga IgG4 result Bowman's gland intestine IgG4 ... None None notebook None None None None 1 2024-06-19 16:17:18.423941+00:00

Leverage relations

Django has a double-underscore syntax to filter based on related tables.

This syntax enables you to traverse several layers of relations:

ln.Artifact.filter(run__created_by__handle__startswith="testuse").df()
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id

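The filter selects artifacts by the user who created the run that generated them. The result is empty here because the toy artifacts were saved without a linked run (see the ln.track() warnings above).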

(Under the hood, in the SQL database, it’s joining the artifact table with the run and the user table.)
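
A hand-written version of such a join, as a sketch with Python's built-in `sqlite3` module. The table names (`users`, `run`, `artifact`), their columns, and the sample rows are simplified assumptions, not LaminDB's actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE run (id INTEGER PRIMARY KEY, created_by_id INTEGER REFERENCES users(id));
    CREATE TABLE artifact (
        id INTEGER PRIMARY KEY,
        description TEXT,
        run_id INTEGER REFERENCES run(id)
    );
    INSERT INTO users (id, handle) VALUES (1, 'testuser1');
    INSERT INTO run (id, created_by_id) VALUES (1, 1);
    INSERT INTO artifact (id, description, run_id) VALUES (1, 'tracked artifact', 1);
    INSERT INTO artifact (id, description, run_id) VALUES (2, 'untracked artifact', NULL);
    """
)

# artifact -> run -> users, mirroring run__created_by__handle__startswith
sql = """
    SELECT artifact.id, artifact.description
    FROM artifact
    JOIN run ON artifact.run_id = run.id
    JOIN users ON run.created_by_id = users.id
    WHERE users.handle LIKE ?
    ORDER BY artifact.id
"""
# the untracked artifact has no run, so the inner join drops it
rows = con.execute(sql, ("testuse%",)).fetchall()
print(rows)
```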

Beyond __startswith, Django supports about two dozen field comparators of the form field__comparator=value.

Here are some of them.

and

ln.Artifact.filter(suffix=".jpg", created_by=user).df()
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 r6QrYNk2tYOJNCeE8OGU None My image None .jpg dataset None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-06-19 16:17:13.125355+00:00

less than / greater than

Or subset to artifacts smaller than 10 kB. Numeric comparisons work with keyword arguments too, via comparators such as __lt (less than) and __gt (greater than).

ln.Artifact.filter(created_by=user, size__lt=1e4).df()
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
2 kYLiZUoFQoyEHYUNZScq None The iris collection None .parquet dataset DataFrame 5629 ah24lV9Ncc8nPL0MumEsdw md5 None None 1 True 1 None None 1 2024-06-19 16:17:13.265187+00:00
3 lhO04hY93JAfnFQXgKwV None My fastq None .fastq.gz dataset None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-06-19 16:17:13.272914+00:00
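
The `size__lt=1e4` keyword follows the `field__comparator=value` pattern. As a rough mental model, each comparator corresponds to a SQL operator. The sketch below translates a handful of them into a parameterized WHERE clause. The helper `to_where` is hypothetical and covers only a small subset of what Django supports.

```python
# a few Django-style comparators and the SQL fragment each one roughly maps to
COMPARATORS = {
    "exact": ("{column} = ?", lambda value: value),
    "lt": ("{column} < ?", lambda value: value),
    "gt": ("{column} > ?", lambda value: value),
    "startswith": ("{column} LIKE ?", lambda value: f"{value}%"),
    "contains": ("{column} LIKE ?", lambda value: f"%{value}%"),
}


def to_where(**lookups):
    """Translate field__comparator=value keywords into a WHERE clause and parameters.

    Hypothetical helper: it handles a single field per keyword and does not
    traverse relations or escape LIKE wildcards inside values.
    """
    clauses, params = [], []
    for key, value in lookups.items():
        # a keyword without "__" is an implicit exact match
        column, _, comparator = key.partition("__")
        template, convert = COMPARATORS[comparator or "exact"]
        clauses.append(template.format(column=column))
        params.append(convert(value))
    return " AND ".join(clauses), params


where, params = to_where(created_by_id=1, size__lt=10_000)
print(where)   # created_by_id = ? AND size < ?
print(params)  # [1, 10000]
```

Several keywords in one call are combined with AND, which matches the behavior shown in the "and" example.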

or

from django.db.models import Q

ln.Artifact.filter().filter(Q(suffix=".jpg") | Q(suffix=".fastq.gz")).df()
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 r6QrYNk2tYOJNCeE8OGU None My image None .jpg dataset None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-06-19 16:17:13.125355+00:00
3 lhO04hY93JAfnFQXgKwV None My fastq None .fastq.gz dataset None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-06-19 16:17:13.272914+00:00

in

ln.Artifact.filter(suffix__in=[".jpg", ".fastq.gz"]).df()
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 r6QrYNk2tYOJNCeE8OGU None My image None .jpg dataset None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-06-19 16:17:13.125355+00:00
3 lhO04hY93JAfnFQXgKwV None My fastq None .fastq.gz dataset None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-06-19 16:17:13.272914+00:00
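
The `in` comparator and the `Q`-based `or` select the same rows here. In SQL terms, one is an `IN` list and the other a chain of `OR` conditions on the same column. A small sketch with Python's built-in `sqlite3` module, using an assumed single-column stand-in for the artifact table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE artifact (id INTEGER PRIMARY KEY, suffix TEXT)")
con.executemany(
    "INSERT INTO artifact (suffix) VALUES (?)",
    [(".jpg",), (".parquet",), (".fastq.gz",)],
)

suffixes = (".jpg", ".fastq.gz")

# OR of two equality conditions, as with Q(...) | Q(...)
with_or = con.execute(
    "SELECT id FROM artifact WHERE suffix = ? OR suffix = ? ORDER BY id", suffixes
).fetchall()

# membership test, as with suffix__in=[...]
with_in = con.execute(
    "SELECT id FROM artifact WHERE suffix IN (?, ?) ORDER BY id", suffixes
).fetchall()

print(with_or == with_in)
```

`Q` objects remain the more general tool, since they can also combine conditions on different fields.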

order by

ln.Artifact.filter().order_by("-updated_at").df()
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
3 lhO04hY93JAfnFQXgKwV None My fastq None .fastq.gz dataset None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-06-19 16:17:13.272914+00:00
2 kYLiZUoFQoyEHYUNZScq None The iris collection None .parquet dataset DataFrame 5629 ah24lV9Ncc8nPL0MumEsdw md5 None None 1 True 1 None None 1 2024-06-19 16:17:13.265187+00:00
1 r6QrYNk2tYOJNCeE8OGU None My image None .jpg dataset None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-06-19 16:17:13.125355+00:00

contains

ln.Transform.filter(name__contains="search").df().head(10)
uid version name key description type reference reference_type latest_report_id source_code_id created_by_id updated_at
id
8 DgPauDDek6ing5ET None Ige IgG IgG3 IgG1 Joints research Choroid plexus. None None notebook None None None None 1 2024-06-19 16:17:18.418513+00:00
10 kmU7BpSb8D6hU09H None Research visualize research classify. None None notebook None None None None 1 2024-06-19 16:17:18.418828+00:00
12 8PeyNQl6OssoaKtz None Bowman'S Gland research IgD result investigate... None None notebook None None None None 1 2024-06-19 16:17:18.419144+00:00
27 6dDm2YSAI9iRPkcG None Hyalocyte Basal cell (stem cell) IgG1 research... None None notebook None None None None 1 2024-06-19 16:17:18.421551+00:00
30 WZygMSTXG8wnOTmS None Igg4 IgG4 research Ovaries cluster IgD IgG2. None None notebook None None None None 1 2024-06-19 16:17:18.422047+00:00
31 fpS6OO10CUafEfta None Joints Outer root sheath IgG4 IgG4 research. None None notebook None None None None 1 2024-06-19 16:17:18.422205+00:00
35 Fai70OPERcURJ1NU None Research IgD IgG3 IgG4 efficiency efficiency. None None notebook None None None None 1 2024-06-19 16:17:18.422838+00:00
44 YcQPJ3OfBcekuYxW None Olfactory Ensheathing Cells efficiency IgG3 re... None None notebook None None None None 1 2024-06-19 16:17:18.424256+00:00
52 Lwt3m7ftFsKvFsRo None Investigate investigate IgA Planum semilunatum... None None notebook None None None None 1 2024-06-19 16:17:18.425513+00:00
68 pFaHp3msWeEqt4iO None Research study Joints visualize IgG IgG2. None None notebook None None None None 1 2024-06-19 16:17:18.428042+00:00

And case-insensitive:

ln.Transform.filter(name__icontains="Search").df().head(10)
uid version name key description type reference reference_type latest_report_id source_code_id created_by_id updated_at
id
8 DgPauDDek6ing5ET None Ige IgG IgG3 IgG1 Joints research Choroid plexus. None None notebook None None None None 1 2024-06-19 16:17:18.418513+00:00
10 kmU7BpSb8D6hU09H None Research visualize research classify. None None notebook None None None None 1 2024-06-19 16:17:18.418828+00:00
12 8PeyNQl6OssoaKtz None Bowman'S Gland research IgD result investigate... None None notebook None None None None 1 2024-06-19 16:17:18.419144+00:00
27 6dDm2YSAI9iRPkcG None Hyalocyte Basal cell (stem cell) IgG1 research... None None notebook None None None None 1 2024-06-19 16:17:18.421551+00:00
30 WZygMSTXG8wnOTmS None Igg4 IgG4 research Ovaries cluster IgD IgG2. None None notebook None None None None 1 2024-06-19 16:17:18.422047+00:00
31 fpS6OO10CUafEfta None Joints Outer root sheath IgG4 IgG4 research. None None notebook None None None None 1 2024-06-19 16:17:18.422205+00:00
35 Fai70OPERcURJ1NU None Research IgD IgG3 IgG4 efficiency efficiency. None None notebook None None None None 1 2024-06-19 16:17:18.422838+00:00
44 YcQPJ3OfBcekuYxW None Olfactory Ensheathing Cells efficiency IgG3 re... None None notebook None None None None 1 2024-06-19 16:17:18.424256+00:00
52 Lwt3m7ftFsKvFsRo None Investigate investigate IgA Planum semilunatum... None None notebook None None None None 1 2024-06-19 16:17:18.425513+00:00
68 pFaHp3msWeEqt4iO None Research study Joints visualize IgG IgG2. None None notebook None None None None 1 2024-06-19 16:17:18.428042+00:00
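
Both queries return the same rows here, since every matching name contains the lowercase substring "search" (inside "research" or "Research"). There is a further subtlety, assuming this local instance is backed by SQLite: Django's documentation notes that on SQLite `contains` behaves like `icontains`, because SQLite's `LIKE` is case-insensitive for ASCII characters by default. The sketch below contrasts SQLite's `LIKE` with Python's case-sensitive substring test, using made-up names.

```python
import sqlite3

names = ["Research study", "SEARCH index", "Intestine cluster"]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE transform (name TEXT)")
con.executemany("INSERT INTO transform (name) VALUES (?)", [(n,) for n in names])

# SQLite's LIKE ignores ASCII case by default, so "SEARCH index" matches too
like_rows = [
    row[0]
    for row in con.execute(
        "SELECT name FROM transform WHERE name LIKE ? ORDER BY name", ("%search%",)
    )
]

# a case-sensitive substring test in Python only finds the lowercase occurrence
case_sensitive = [n for n in names if "search" in n]
# lowercasing both sides gives the case-insensitive behavior
case_insensitive = [n for n in names if "search" in n.lower()]

print(like_rows)
print(case_sensitive)
print(case_insensitive)
```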

startswith

ln.Transform.filter(name__startswith="Research").df()
uid version name key description type reference reference_type latest_report_id source_code_id created_by_id updated_at
id
10 kmU7BpSb8D6hU09H None Research visualize research classify. None None notebook None None None None 1 2024-06-19 16:17:18.418828+00:00
35 Fai70OPERcURJ1NU None Research IgD IgG3 IgG4 efficiency efficiency. None None notebook None None None None 1 2024-06-19 16:17:18.422838+00:00
68 pFaHp3msWeEqt4iO None Research study Joints visualize IgG IgG2. None None notebook None None None None 1 2024-06-19 16:17:18.428042+00:00
125 veUslvVshfEwCZxN None Research Joints IgA visualize. None None notebook None None None None 1 2024-06-19 16:17:18.439955+00:00
184 qY8yEpzUfHsenHfF None Research Brunner's gland Joints IgA. None None notebook None None None None 1 2024-06-19 16:17:18.451583+00:00
229 nqet7PB0VbrowWLe None Research IgA Heat-sensitive sensory neurons Br... None None notebook None None None None 1 2024-06-19 16:17:18.460955+00:00
277 RExU4WoIdO7gbMCS None Research IgE Ovaries IgG4. None None notebook None None None None 1 2024-06-19 16:17:18.468234+00:00
278 XkGP6liA4US87xC3 None Research efficiency IgG3 IgG3. None None notebook None None None None 1 2024-06-19 16:17:18.468383+00:00
281 dTEwcclYMZlwCol9 None Research intestinal candidate IgD IgG4 intesti... None None notebook None None None None 1 2024-06-19 16:17:18.468828+00:00
315 A9Rvyc360aOAGY2s None Research Schwann cells intestinal investigate ... None None notebook None None None None 1 2024-06-19 16:17:18.476506+00:00
379 gjP3hspK2mEGOd4A None Research Hyalocyte research intestinal IgA. None None notebook None None None None 1 2024-06-19 16:17:18.486090+00:00
388 QYomthPtw2WZF201 None Research Planum semilunatum epithelial cell of... None None notebook None None None None 1 2024-06-19 16:17:18.490041+00:00
472 oXY8Hk8yBOTW55WG None Research Bowman's gland IgG3 cluster IgG4. None None notebook None None None None 1 2024-06-19 16:17:18.505119+00:00
499 dKmu5NDLiDyq10Dy None Research Bowman's gland IgG4 Basal cell (stem ... None None notebook None None None None 1 2024-06-19 16:17:18.509206+00:00
# clean up test instance
!lamin delete --force mydata
!rm -r mydata