Using Ontologies and Identifiers with NWB
Neurophysiology data is full of terms that mean something specific outside of your file: the
species of a subject, the institution that collected the data, the researchers who ran the
experiment, or the brain region a probe was implanted in. Writing these as free text ("mouse",
"the Allen Institute", "V1") is easy to do but hard to compute on — different files spell
the same thing different ways, and a reader has no authoritative reference for what exactly was
meant.
External resources solve this by linking a term in your file to a standardized entry in an
external ontology, registry, or atlas — for example linking the species
"Mus musculus" to its entry in the NCBI Taxonomy. This makes your annotations unambiguous,
machine-readable, and interoperable: tools can group, search, and compare data across files and
labs because everyone points at the same canonical identifier.
In NWB, these links are stored using HDMF’s HERD (HDMF External Resources Data) structure,
which records, for each annotation, the term as it appears in your file together with a compact
identifier (entity_id) and a resolvable URL (entity_uri) for the external entry.
How to add external resources to an NWB file
There are two complementary ways to connect NWB data to external terms, both provided by HDMF:
- HERD lets you attach references to existing values in a file, recording that a given attribute or column value corresponds to a specific external term. See the HERD tutorial for a walkthrough of HERD.add_ref.
- TermSet lets you validate values as you write them, constraining a field to terms drawn from a chosen ontology. See the TermSet tutorial, and the PyNWB How to Configure Term Validations tutorial for configuring term validation across a file.
The rest of this page covers a question that comes up with both approaches: once you have picked
an external term, what exactly should go in the entity_id and entity_uri fields?
Choosing entity_id and entity_uri
When you annotate data with an external resource using
HERD.add_ref, each reference records two
fields that identify the external term:
- entity_id: A compact identifier (a CURIE) of the form prefix:identifier (e.g. NCBITaxon:10090). The prefix names the registry or ontology and the identifier is the term’s accession within it.
- entity_uri: The full URL that the entity_id resolves to, i.e. a persistent, dereferenceable web address for that exact term.
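To make the prefix:identifier form concrete, here is a minimal sketch that splits a CURIE into its two parts. The helper name `split_curie` is hypothetical and is not part of HDMF or PyNWB. It only checks the shape of the string, not whether the prefix is actually registered anywhere.

```python
def split_curie(curie: str) -> tuple[str, str]:
    """Split a CURIE such as 'NCBITaxon:10090' into (prefix, identifier)."""
    # Split at the first colon only; identifiers may themselves contain colons.
    prefix, sep, identifier = curie.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(f"not of the form prefix:identifier: {curie!r}")
    # A full URL ("http://...") also contains a colon but is not a CURIE.
    if identifier.startswith("//"):
        raise ValueError(f"looks like a URL rather than a CURIE: {curie!r}")
    return prefix, identifier


print(split_curie("NCBITaxon:10090"))  # ('NCBITaxon', '10090')
```

A check like this can catch the common mistake of putting the full URL into entity_id instead of entity_uri.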
Recommended practice
- Use a CURIE for entity_id. Prefer an identifier whose prefix is registered with bioregistry.io. The Bioregistry is a comprehensive registry of prefixes that maps each CURIE to a canonical, resolvable URL, which avoids the ambiguity of the many overlapping identifier schemes (e.g. NCBITaxon vs. taxonomy vs. NCBI_TAXON).
- Use the resolved URL for entity_uri. The entity_uri should be the URL that the CURIE resolves to. You can look this up by resolving the CURIE through the Bioregistry: visiting https://bioregistry.io/<entity_id> (for example https://bioregistry.io/NCBITaxon:10090) redirects to the canonical provider URL, which is the value to store in entity_uri.
Keeping entity_id and entity_uri consistent in this way means a reader can both
recognize the registry from the compact entity_id and dereference the entity_uri to land
on an authoritative description of the term.
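The Bioregistry lookup described above can be sketched in Python using only the standard library. Both function names are hypothetical helpers, not part of HDMF, PyNWB, or the Bioregistry client. `resolve_entity_uri` needs network access and returns the final URL after all redirects, which may differ from the first redirect target if the provider redirects further. Treat its result as a candidate to check by hand rather than a value to store blindly.

```python
import urllib.request

BIOREGISTRY = "https://bioregistry.io/"


def bioregistry_url(entity_id: str) -> str:
    """Return the Bioregistry resolver URL for a CURIE."""
    return BIOREGISTRY + entity_id


def resolve_entity_uri(entity_id: str, timeout: float = 10.0) -> str:
    """Follow the Bioregistry redirect and return the final URL.

    Requires network access; urlopen follows HTTP redirects by default.
    """
    with urllib.request.urlopen(bioregistry_url(entity_id), timeout=timeout) as response:
        return response.geturl()


print(bioregistry_url("NCBITaxon:10090"))  # https://bioregistry.io/NCBITaxon:10090
```

Calling `resolve_entity_uri("NCBITaxon:10090")` on a networked machine gives a candidate entity_uri to compare with the provider's documented URL for that term.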
Commonly used registries
All of the registries below are registered with the Bioregistry. The entity_uri column shows
the canonical URL the example entity_id resolves to.
| Prefix | Use for | Common NWB field(s) | Example entity_id | Example entity_uri |
|---|---|---|---|---|
| NCBITaxon | Species | Subject.species | NCBITaxon:10090 | http://purl.obolibrary.org/obo/NCBITaxon_10090 |
|  | Organizations / institutions |  |  |  |
|  | People (researchers) |  |  |  |
|  | Brain regions (cross-species) | Brain-region location fields [1] |  |  |
|  | Brain regions (Allen Mouse Brain Atlas) | Brain-region location fields [1] |  |  |
|  | Brain regions (Allen Human Brain Atlas) | Brain-region location fields [1] |  |  |
|  | Dandisets | (identifies the dataset as a whole) |  |  |
Example
# the species of the subject, mapped to NCBI Taxonomy
# (assumes `herd` is an existing HERD instance and `nwbfile` is an NWBFile with a subject)
herd.add_ref(
    container=nwbfile.subject,
    attribute="species",
    key="Mus musculus",
    entity_id="NCBITaxon:10090",
    entity_uri="http://purl.obolibrary.org/obo/NCBITaxon_10090",
)
Resources without individually resolvable URLs
Some resources do not provide a dereferenceable URL for each individual term. For example, many brain atlases (such as the macaque D99 atlas) publish a single document or download for the whole atlas rather than one persistent URL per region.
In that case:
- Put the URL of the resource as a whole in entity_uri (e.g. the atlas’s landing or download page).
- Put the resource’s identifier for the specific term (for example, the brain area ID used by the atlas) in entity_id.
This keeps every reference dereferenceable to something authoritative (the resource) while still recording the precise term identifier, even when a per-term URL does not exist.
# a region from an atlas that has no per-region URL: identify the region by its
# atlas-specific ID and point entity_uri at the atlas itself
# (assumes `electrodes_table` is the file's electrodes table with a "location" column)
herd.add_ref(
    container=electrodes_table,
    attribute="location",
    key="area_42",
    entity_id="42",
    entity_uri="https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/nonhuman/macaque_tempatl/atlas_d99v2.html",
)
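When several regions from the same atlas need annotating, the pattern above can be wrapped in a small helper. This is an illustrative sketch only: `atlas_ref_kwargs` is a hypothetical function, not an HDMF API, and the keys and region IDs in `regions` are made-up placeholders. The helper only builds the keyword arguments; passing them to `herd.add_ref` together with `container` and `attribute` is left to the caller.

```python
# URL of the atlas as a whole, reused for every region (taken from the example above).
D99_URL = "https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/nonhuman/macaque_tempatl/atlas_d99v2.html"


def atlas_ref_kwargs(key: str, region_id: str, atlas_url: str) -> dict:
    """Build the term-specific arguments for one region of an atlas without per-region URLs."""
    return {
        "key": key,               # the value as it appears in the file
        "entity_id": region_id,   # the atlas's own ID for the region
        "entity_uri": atlas_url,  # the atlas landing or download page
    }


# Placeholder mapping from file values to atlas region IDs (hypothetical data).
regions = {"area_42": "42", "area_7": "7"}
refs = [atlas_ref_kwargs(key, region_id, D99_URL) for key, region_id in regions.items()]
for ref in refs:
    print(ref["key"], ref["entity_id"])
```

Each dictionary could then be unpacked into a call such as `herd.add_ref(container=electrodes_table, attribute="location", **ref)`.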
See also
HERD for the full API, and
HERD.add_ref for adding references.