Using Ontologies and Identifiers with NWB

Neurophysiology data is full of terms that mean something specific outside of your file: the species of a subject, the institution that collected the data, the researchers who ran the experiment, or the brain region a probe was implanted in. Writing these as free text ("mouse", "the Allen Institute", "V1") is easy but hard to compute on: different files spell the same thing in different ways, and a reader has no authoritative reference for exactly what was meant.

External resources solve this by linking a term in your file to a standardized entry in an external ontology, registry, or atlas — for example linking the species "Mus musculus" to its entry in the NCBI Taxonomy. This makes your annotations unambiguous, machine-readable, and interoperable: tools can group, search, and compare data across files and labs because everyone points at the same canonical identifier.

In NWB, these links are stored using HDMF’s HERD (HDMF External Resources Data) structure, which records, for each annotation, the term as it appears in your file together with a compact identifier (entity_id) and a resolvable URL (entity_uri) for the external entry.

How to add external resources to an NWB file

There are two complementary ways to connect NWB data to external terms, both provided by HDMF:

  • HERD lets you attach references to existing values in a file — recording that a given attribute or column value corresponds to a specific external term. See the HERD tutorial for a walkthrough of HERD.add_ref.

  • TermSet lets you validate values as you write them, constraining a field to terms drawn from a chosen ontology. See the TermSet tutorial, and the PyNWB How to Configure Term Validations tutorial for configuring term validation across a file.

The rest of this page covers a question that comes up with both approaches: once you have picked an external term, what exactly should go in the entity_id and entity_uri fields?

Choosing entity_id and entity_uri

When you annotate data with an external resource using HERD.add_ref, each reference records two fields that identify the external term:

entity_id

A compact identifier (a CURIE) of the form prefix:identifier (e.g. NCBITaxon:10090). The prefix names the registry or ontology and the identifier is the term’s accession within it.

entity_uri

The full URL that the entity_id resolves to — a persistent, dereferenceable web address for that exact term.
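
To make the prefix:identifier structure concrete, here is a minimal sketch that splits a CURIE at its first colon. The helper name split_curie is hypothetical and is not part of HDMF or PyNWB; it only illustrates how the two parts of an entity_id relate.

```python
from typing import Tuple


def split_curie(curie: str) -> Tuple[str, str]:
    """Split a CURIE such as 'NCBITaxon:10090' into (prefix, identifier)."""
    # partition() splits at the first colon only, so identifiers that
    # contain further colons are kept intact.
    prefix, sep, identifier = curie.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(f"not of the form prefix:identifier: {curie!r}")
    return prefix, identifier


print(split_curie("NCBITaxon:10090"))  # ('NCBITaxon', '10090')
```

A value with no prefix (for example a bare atlas region ID) raises ValueError here, which can be a useful signal that the value is not a CURIE.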

Commonly used registries

All of the registries below are registered with the Bioregistry. The entity_uri column shows the canonical URL the example entity_id resolves to.

Prefix     Use for                                  Common NWB field(s)                  Example entity_id          Example entity_uri
---------  ---------------------------------------  -----------------------------------  -------------------------  ---------------------------------------------------
NCBITaxon  Species                                  Subject.species                      NCBITaxon:10090            http://purl.obolibrary.org/obo/NCBITaxon_10090
ROR        Organizations / institutions             NWBFile.institution                  ROR:013meh722              https://ror.org/013meh722
ORCID      People (researchers)                     NWBFile.experimenter                 ORCID:0000-0002-1825-0097  https://orcid.org/0000-0002-1825-0097
UBERON     Brain regions (cross-species)            Brain-region location fields [1]     UBERON:0001950             http://purl.obolibrary.org/obo/UBERON_0001950
MBA        Brain regions (Allen Mouse Brain Atlas)  Brain-region location fields [1]     MBA:385                    https://purl.brain-bican.org/ontology/mbao/MBA_385
HBA        Brain regions (Allen Human Brain Atlas)  Brain-region location fields [1]     HBA:4005                   https://purl.brain-bican.org/ontology/hbao/HBA_4005
DANDI      Dandisets                                (identifies the dataset as a whole)  DANDI:000015               https://dandiarchive.org/dandiset/000015
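
The relationship between each example entity_id and its entity_uri in the table above can be sketched as a simple prefix-to-template lookup. The URL patterns below are inferred from the example rows only, which is an assumption and not an official registry lookup, and the names URI_TEMPLATES and expand_curie are hypothetical. For anything beyond these examples, check the registry's own documentation.

```python
# URL patterns inferred from the example rows in the table (assumption).
URI_TEMPLATES = {
    "NCBITaxon": "http://purl.obolibrary.org/obo/NCBITaxon_{id}",
    "ROR": "https://ror.org/{id}",
    "ORCID": "https://orcid.org/{id}",
    "UBERON": "http://purl.obolibrary.org/obo/UBERON_{id}",
    "MBA": "https://purl.brain-bican.org/ontology/mbao/MBA_{id}",
    "HBA": "https://purl.brain-bican.org/ontology/hbao/HBA_{id}",
    "DANDI": "https://dandiarchive.org/dandiset/{id}",
}


def expand_curie(entity_id: str) -> str:
    """Build an entity_uri from an entity_id using the templates above."""
    prefix, sep, identifier = entity_id.partition(":")
    if not sep or not identifier:
        raise ValueError(f"not of the form prefix:identifier: {entity_id!r}")
    if prefix not in URI_TEMPLATES:
        raise KeyError(f"no URI template known for prefix {prefix!r}")
    return URI_TEMPLATES[prefix].format(id=identifier)


print(expand_curie("NCBITaxon:10090"))
print(expand_curie("ROR:013meh722"))
```

Keeping the templates in one place means the entity_id and entity_uri passed to HERD.add_ref cannot drift apart through a typo in one of them.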

Example

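from hdmf.common import HERD

# assumes `nwbfile` is an existing NWBFile whose subject has species="Mus musculus"
herd = HERD()

# the species of the subject, mapped to NCBI Taxonomy
herd.add_ref(
    container=nwbfile.subject,
    attribute="species",
    key="Mus musculus",
    entity_id="NCBITaxon:10090",
    entity_uri="http://purl.obolibrary.org/obo/NCBITaxon_10090",
)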

Resources without individually resolvable URLs

Some resources do not provide a dereferenceable URL for each individual term. For example, many brain atlases (such as the macaque D99 atlas) publish a single document or download for the whole atlas rather than one persistent URL per region.

In that case:

  • Put the URL of the resource as a whole in entity_uri (e.g. the atlas’s landing or download page).

  • Put the resource’s own identifier for the specific term in entity_id, for example the brain area ID used by the atlas. This may be a bare ID rather than a prefix:identifier CURIE.

This keeps every reference dereferenceable to something authoritative (the resource) while still recording the precise term identifier, even when a per-term URL does not exist.

# a region from an atlas that has no per-region URL: identify the region by its
# atlas-specific ID and point entity_uri at the atlas itself
herd.add_ref(
    container=electrodes_table,
    attribute="location",
    key="area_42",
    entity_id="42",
    entity_uri="https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/nonhuman/macaque_tempatl/atlas_d99v2.html",
)
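
The fallback rule in this section can be sketched as a small helper: use the per-term URL when one exists, otherwise fall back to the URL of the resource as a whole. The function name reference_fields and its parameters are hypothetical and not part of HDMF. The atlas URL is the one from the example above.

```python
from typing import Optional

# Atlas landing page taken from the example above.
D99_ATLAS_URL = (
    "https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/nonhuman/"
    "macaque_tempatl/atlas_d99v2.html"
)


def reference_fields(
    term_id: str, resource_url: str, term_url: Optional[str] = None
) -> dict:
    """Build the entity_id / entity_uri pair for one external term.

    Uses the per-term URL when one is given; otherwise falls back to the
    URL of the resource as a whole.
    """
    return {"entity_id": term_id, "entity_uri": term_url or resource_url}


# No per-region URL exists, so entity_uri points at the atlas itself.
fields = reference_fields("42", resource_url=D99_ATLAS_URL)
print(fields["entity_id"], fields["entity_uri"])
```

The resulting dictionary uses the same field names as HERD.add_ref, so it could be unpacked into that call alongside container, attribute, and key.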

See also

HERD for the full API, and HERD.add_ref for adding references.