The Spec API

pynwb defines a spec API: a set of classes that help you generate a valid NWB extension. The NWB Specification Language defines a structure for data and metadata using Groups, Datasets, Attributes, and Links. These structures are mapped onto NWBGroupSpec, NWBDatasetSpec, NWBAttributeSpec, and NWBLinkSpec, respectively. Here, we describe each of these classes in detail and demonstrate how to use them to create custom neurodata types.

Group Specifications

Most neurodata types are Groups, which act like a directory or folder within the NWB file. A Group can contain Datasets, Attributes, Links, and/or other Groups. Groups are specified with the NWBGroupSpec class, which provides a Python API for specifying the structure of an NWB Group.

from pynwb.spec import NWBGroupSpec

spec = NWBGroupSpec(
    neurodata_type_def='MyType',
    neurodata_type_inc='NWBDataInterface',
    doc='A custom NWB type',
    name='quux',
    attributes=[...],
    datasets=[...],
    groups=[...],
    links=[...]
)

neurodata_type_def and neurodata_type_inc define the neurodata type with the following rules:

  • neurodata_type_def declares the name of the neurodata type.

  • neurodata_type_inc indicates what data type you are extending (Groups must extend Groups, and Datasets must extend Datasets).

  • To define a new neurodata type that does not extend an existing type, use neurodata_type_inc=NWBContainer for a group or neurodata_type_inc=NWBData for a dataset. NWBContainer and NWBData are base types for NWB.

  • To use a type that has already been defined, use neurodata_type_inc and not neurodata_type_def.

  • You can define a group that is not a neurodata type by omitting both neurodata_type_def and neurodata_type_inc.

Tip

Although you are not required to, defining new groups as neurodata types has several advantages. Neurodata types can be reused in multiple places in the schema and can be the target of links, while groups that are not neurodata types cannot. You can also place multiple groups of the same neurodata type within the same group, whereas a group that is not a neurodata type can occur at most once. Most of the time, we recommend making a group a neurodata type. It is also generally better to extend your neurodata type from an existing type: look through the NWB schema to see whether a core neurodata type would work as a base for your new type. If no existing type works, consider extending NWBDataInterface, which allows you to add the object to a processing module.

Tip

New neurodata types should always be declared at the top level of the schema rather than nested inside other type definitions. That is, when creating a new neurodata type, place it at the top level of your schema and then include it at the appropriate location via neurodata_type_inc. This approach greatly simplifies the management of types.

For more information about the options available when specifying a Group, see the API docs for NWBGroupSpec.

Dataset Specifications

All larger blocks of numeric or text data should be stored in Datasets. Specifying datasets is done with NWBDatasetSpec.

from pynwb.spec import NWBDatasetSpec

spec = NWBDatasetSpec(
    doc='A custom NWB type',
    name='qux',
    shape=(None, None),
    attributes=[...]
)

neurodata_type_def, neurodata_type_inc, doc, name, default_name, linkable, quantity, and attributes all work the same as they do in NWBGroupSpec, described in the previous section.

dtype defines the type of the data, which can be a basic type, compound type, or reference type. See a list of dtype options as part of the specification language docs. Basic types can be defined as string objects and more complex types via NWBDtypeSpec and RefSpec.

shape is a specification defining the allowable shapes for the dataset. See the shape specification as part of the specification language docs. None is mapped to null, meaning any length is allowed in that dimension. If no shape is provided, it is assumed that the dataset is only a single element.

If the dataset is a single element (scalar) that represents metadata, consider using an Attribute (see below) to store the data more efficiently instead. However, note that a Dataset can have Attributes, whereas an Attribute cannot have Attributes of its own. dims provides labels for each dimension of shape.

Using datasets to specify tables

Row-based tables can be specified using NWBDtypeSpec. To specify a table, provide a list of NWBDtypeSpec objects to the dtype argument.

from pynwb.spec import NWBAttributeSpec, NWBDatasetSpec, NWBDtypeSpec

spec = NWBDatasetSpec(
    doc='A custom NWB type',
    name='qux',
    attributes=[
        NWBAttributeSpec(name='baz', doc='a value for baz', dtype='text'),
    ],
    dtype=[
        NWBDtypeSpec(name='foo', doc='a column for foo', dtype='int'),
        NWBDtypeSpec(name='bar', doc='a column for bar', dtype='float')
    ]
)

Tip

Column-based tables are also possible and more flexible. See the documentation for DynamicTable.

Attribute Specifications

Attributes are small metadata objects describing the nature and/or intended usage of a Group or Dataset. Attributes are defined in the attributes field of a NWBGroupSpec or NWBDatasetSpec. attributes takes a list of NWBAttributeSpec objects.

from pynwb.spec import NWBAttributeSpec

spec = NWBAttributeSpec(
    name='bar',
    doc='a value for bar',
    dtype='float'
)

NWBAttributeSpec has arguments very similar to NWBDatasetSpec. A key difference is that an attribute cannot be a neurodata type, i.e., the neurodata_type_def and neurodata_type_inc keys are not allowed. The only way to match an object with a spec is through the name of the attribute, so name is required. You cannot have multiple attributes on a single group/dataset that correspond to the same NWBAttributeSpec, since they would have to have the same name. Therefore, instead of specifying a quantity, an attribute has a required field, which takes a boolean value. Another key difference between datasets and attributes is that attributes cannot have attributes of their own.

Tip

Dataset or Attribute? It is often possible to store data as either a Dataset or an Attribute. Our best advice is to keep Attributes small. In HDF5, the typical size limit for attributes is 64 KiB; if an attribute is going to store more than 64 KiB, make it a Dataset. Attributes are also more efficient for storing very small data, such as scalars. However, attributes cannot have attributes of their own, and in HDF5, I/O filters, such as compression and chunking, cannot be applied to attributes.