nwb_project_analytics.gitstats module

Module for querying GitHub repos

class nwb_project_analytics.gitstats.GitHubRepoInfo(repo)

Bases: object

Helper class to get information about a repo from GitHub

Variables:

repo – a GitRepo tuple with the owner and name of the repo

static collect_all_release_names_and_date(repos: dict, cache_dir: str, read_cache: bool = True, write_cache: bool = True)

get_release_names_and_dates(**kwargs)

Get names and dates of releases

Parameters:

kwargs – Additional keyword arguments to be passed to self.get_releases

Returns:

Tuple with the list of names as strings and the list of dates as datetime objects

get_releases(use_cache=True)

Get the last 100 releases for the given repo

NOTE: GitHub uses pagination. Here we set the number of items per page to 100, which should usually fit all releases, but in the future we may need to iterate over pages to get all the releases, not just the latest 100. Possible implementation: https://gist.github.com/victorbordo/5581fdfb89ed93bf3eb2b478529b9e38

Parameters:

use_cache – If set to True then return the cached results if computed previously. In this case the per_page parameter will be ignored

Raises:

Error if response is not Ok, e.g., if the GitHub request limit is exceeded.

Returns:

List of dicts with the release data
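The note above mentions that repos with more than 100 releases would require iterating over pages. A minimal sketch of such a loop follows. It is not part of this module: fetch_page is a hypothetical callable supplied by the caller that returns the list of release dicts for one page, for example by requesting the GitHub releases endpoint with the page and per_page query parameters. A stand-in is used here so the sketch runs without network access.

```python
from typing import Callable, Dict, List


def collect_all_pages(fetch_page: Callable[[int, int], List[Dict]],
                      per_page: int = 100,
                      max_pages: int = 50) -> List[Dict]:
    """Collect items from a paginated endpoint until a short page is returned.

    fetch_page(page, per_page) is a hypothetical callable, not a function
    of nwb_project_analytics.
    """
    items: List[Dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, per_page)
        items.extend(batch)
        # A page with fewer than per_page items is the last one
        if len(batch) < per_page:
            break
    return items


# Stand-in data and fetcher so the sketch runs without calling GitHub
_fake_releases = [{"tag_name": f"v0.{i}.0"} for i in range(250)]


def fake_fetch_page(page: int, per_page: int) -> List[Dict]:
    start = (page - 1) * per_page
    return _fake_releases[start:start + per_page]


all_releases = collect_all_pages(fake_fetch_page)
```

The max_pages guard is an assumption added to keep the loop bounded if the endpoint misbehaves.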

static get_version_jump_from_tags(tags)

Get the version jumps from the release tags, assuming the tags follow semantic versioning

Returns:

OrderedDict
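The exact layout of the returned OrderedDict is not described here. As an illustration only, the sketch below assumes that a "version jump" means whether the major, minor, or patch component changed between consecutive tags. The function name version_jumps and the returned layout are hypothetical and may differ from what get_version_jump_from_tags produces.

```python
from collections import OrderedDict
import re


def version_jumps(tags):
    """Classify the jump between consecutive semantic-version tags.

    Assumption: tags are ordered oldest to newest and look like
    '1.2.3' or 'v1.2.3'; tags that do not match are skipped.
    """
    pattern = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)")
    jumps = OrderedDict()
    previous = None
    for tag in tags:
        match = pattern.match(tag)
        if match is None:
            continue
        current = tuple(int(part) for part in match.groups())
        if previous is not None:
            if current[0] != previous[0]:
                jumps[tag] = "major"
            elif current[1] != previous[1]:
                jumps[tag] = "minor"
            elif current[2] != previous[2]:
                jumps[tag] = "patch"
        previous = current
    return jumps


jumps = version_jumps(["1.0.0", "1.0.1", "1.1.0", "not-a-version", "v2.0.0"])
```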

static releases_from_nwb(cache_dir: str, read_cache: bool = True, write_cache: bool = True)

class nwb_project_analytics.gitstats.GitRepo(owner: str, repo: str, mainbranch: str, docs: str | None = None, logo: str | None = None, startdate: datetime | None = None)

Bases: tuple

Named tuple with basic information about a GitHub repository

static compute_issue_time_of_first_response(issue)

For a given GitHub issue compute the time to first response based on the issue’s timeline
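The idea can be sketched on plain data. The example below assumes the timeline has already been reduced to (actor_login, timestamp) pairs and treats the first event by anyone other than the issue author as the first response. Both the data layout and the helper name time_to_first_response are hypothetical; the method itself works on the issue object and may apply different rules.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def time_to_first_response(created_at: datetime,
                           author: str,
                           events: List[Tuple[str, datetime]]) -> Optional[timedelta]:
    """Return the delay until the first event by someone other than the author.

    events is a list of (actor_login, timestamp) pairs. None is returned
    when nobody other than the author has acted on the issue.
    """
    responses = [when for actor, when in events
                 if actor != author and when >= created_at]
    if not responses:
        return None
    return min(responses) - created_at


opened = datetime(2023, 1, 1, 9, 0)
timeline = [
    ("reporter", datetime(2023, 1, 1, 9, 5)),    # author edits own issue
    ("maintainer", datetime(2023, 1, 2, 9, 0)),  # first response
    ("maintainer", datetime(2023, 1, 3, 9, 0)),
]
delay = time_to_first_response(opened, "reporter", timeline)
```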

docs: str

Online documentation for the software

get_commits_as_dataframe(since, github_obj, tqdm)

Get a dataframe for all commits with updates later than the given date

Parameters:
  • since – Datetime object with the date of the oldest commit to retrieve

  • github_obj – PyGithub github.Github object to use for retrieving commits

  • tqdm – Supply the tqdm progress bar class to use

Returns:

Pandas DataFrame with the commits data

get_issues_as_dataframe(since, github_obj, tqdm=None)

Get a dataframe for all issues with updates later than the given date

Parameters:
  • since – Datetime object with the date of the oldest issue to retrieve

  • github_obj – PyGithub github.Github object to use for retrieving issues

  • tqdm – Supply the tqdm progress bar class to use

Returns:

Pandas DataFrame with the issue data

property github_issues_url

URL for GitHub issues page

property github_path

https path for the git repo

property github_pulls_url

URL for GitHub pull requests page

logo: str

URL with the PNG of the logo for the repository

mainbranch: str

The main branch of the repository

owner: str

Owner of the repo on GitHub

repo: str

Name of the repository

startdate: datetime

Some repos start from forks, so we want to track statistics starting from this date rather than the beginning of time

class nwb_project_analytics.gitstats.GitRepos(*arg, **kw)

Bases: OrderedDict

Dict where the keys are names of codes and the values are GitRepo objects

get_info_objects()

Get an OrderedDict of GitHubRepoInfo objects from the repos

static merge(o1, o2)

Merge two GitRepo dicts and return a new GitRepos dict with the combined items

class nwb_project_analytics.gitstats.IssueLabel(label: str, description: str, color: str)

Bases: tuple

Named tuple describing a label for issues on a Git repository.

color: str

Hex code of the color for the label

description: str

Description of the label

label: str

Label of the issue, usually of the form <type>: <level>. <type> indicates the general area the label is used for, e.g., to assign a category, priority, or topic to an issue. <level> then indicates the importance or sub-category within the given <type>, e.g., critical, high, medium, low level as part of the priority type

property level

Get the level of the issue, indicating the importance or sub-category of the label within the given self.type, e.g., critical, high, medium, low level as part of the priority type.

Returns:

str with the level or None in case the label does not have a level (e.g., if the label does not contain a “:” to separate the type and level).

property rgb

Color code converted to RGB

Returns:

Tuple of ints with (red, green, blue) color values

property type

Get the type of the issue label indicating the general area the label is used for, e.g., to assign a category, priority, or topic to an issue.

Returns:

str with the type or None in case the label does not have a category (i.e., if the label does not contain a “:” to separate the type and level).
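The type, level, and rgb properties described above can be illustrated with a small stand-in class. LabelSketch is hypothetical and written only from the descriptions in this section; the actual IssueLabel implementation may differ in details such as whitespace handling.

```python
from typing import NamedTuple, Optional, Tuple


class LabelSketch(NamedTuple):
    """Hypothetical stand-in for IssueLabel, for illustration only."""
    label: str
    description: str
    color: str

    @property
    def type(self) -> Optional[str]:
        # Text before the first ':' or None if there is no separator
        if ":" not in self.label:
            return None
        return self.label.split(":", 1)[0].strip()

    @property
    def level(self) -> Optional[str]:
        # Text after the first ':' or None if there is no separator
        if ":" not in self.label:
            return None
        return self.label.split(":", 1)[1].strip()

    @property
    def rgb(self) -> Tuple[int, ...]:
        # Convert a hex code such as '#ee0701' to (red, green, blue) ints
        hex_digits = self.color.lstrip("#")
        return tuple(int(hex_digits[i:i + 2], 16) for i in (0, 2, 4))


bug = LabelSketch("category: bug", "errors in the code or code behavior", "#ee0701")
plain = LabelSketch("question", "label without a type separator", "#ffffff")
```

Here bug.type is "category", bug.level is "bug", and bug.rgb is (238, 7, 1), while plain.type and plain.level are both None.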

class nwb_project_analytics.gitstats.IssueLabels(*arg, **kw)

Bases: OrderedDict

OrderedDict where the keys are names of issues labels and the values are IssueLabel objects

property colors

Get a list of all color hex codes used

get_by_type(label_type)

Get a new IssueLabels dict with just the labels of the given type

property levels

Get a list of all level strings used in labels (may include None)

static merge(o1, o2)

Merge two IssueLabels dicts and return a new IssueLabels dict with the combined items

property rgbs

Get a list of all rgb color codes used

property types

Get a list of all type strings used in labels (may include None)

class nwb_project_analytics.gitstats.NWBGitInfo

Bases: object

Class for storing basic information about NWB repositories

class property CORE_API_REPOS

Dictionary with the main NWB git repos related to the user APIs.

CORE_DEVELOPERS = ['rly', 'bendichter', 'oruebel', 'ajtritt', 'ln-vidrio', 'mavaylon1', 'CodyCBakerPhD', 'stephprince', 'lawrence-mbf', 'dependabot[bot]', 'nwb-bot', 'hdmf-bot', 'pre-commit-ci[bot]']

List of names of the core developers of NWB overall. These are used, e.g., when analyzing issue stats as core developer issues should not count against user issues.

GIT_REPOS = {'HDMF': GitRepo(owner='hdmf-dev', repo='hdmf', mainbranch='dev', docs='https://hdmf.readthedocs.io', logo='https://raw.githubusercontent.com/hdmf-dev/hdmf/dev/docs/source/hdmf_logo.png', startdate=datetime.datetime(2019, 3, 13, 0, 0)), 'HDMF_Common_Schema': GitRepo(owner='hdmf-dev', repo='hdmf-common-schema', mainbranch='main', docs='https://hdmf-common-schema.readthedocs.io', logo=None, startdate=None), 'HDMF_DocUtils': GitRepo(owner='hdmf-dev', repo='hdmf-docutils', mainbranch='main', docs=None, logo=None, startdate=None), 'HDMF_Schema_Language': GitRepo(owner='hdmf-dev', repo='hdmf-schema-language', mainbranch='main', docs='https://hdmf-schema-language.readthedocs.io/', logo=None, startdate=None), 'HDMF_Zarr': GitRepo(owner='hdmf-dev', repo='hdmf-zarr', mainbranch='dev', docs='https://hdmf-zarr.readthedocs.io', logo='https://raw.githubusercontent.com/hdmf-dev/hdmf-zarr/dev/docs/source/figures/logo_hdmf_zarr.png', startdate=None), 'Hackathons': GitRepo(owner='NeurodataWithoutBorders', repo='nwb_hackathons', mainbranch='main', docs='https://neurodatawithoutborders.github.io/nwb_hackathons/', logo=None, startdate=None), 'MatNWB': GitRepo(owner='NeurodataWithoutBorders', repo='matnwb', mainbranch='master', docs='https://neurodatawithoutborders.github.io/matnwb/', logo='https://raw.githubusercontent.com/NeurodataWithoutBorders/matnwb/master/logo/logo_matnwb.png', startdate=None), 'NDX_Catalog': GitRepo(owner='nwb-extensions', repo='nwb-extensions.github.io', mainbranch='main', docs='https://nwb-extensions.github.io/', logo='https://github.com/nwb-extensions/nwb-extensions.github.io/blob/main/images/ndx-logo-text.png', startdate=None), 'NDX_Extension_Smithy': GitRepo(owner='nwb-extensions', repo='nwb-extensions-smithy', mainbranch='master', docs=None, logo=None, startdate=datetime.datetime(2019, 4, 25, 0, 0)), 'NDX_Staged_Extensions': GitRepo(owner='nwb-extensions', repo='staged-extensions', mainbranch='master', docs=None, logo=None, startdate=None), 
'NDX_Template': GitRepo(owner='nwb-extensions', repo='ndx-template', mainbranch='main', docs='https://nwb-overview.readthedocs.io/en/latest/extensions_tutorial/2_create_extension_spec_walkthrough.html', logo=None, startdate=None), 'NWBInspector': GitRepo(owner='NeurodataWithoutBorders', repo='nwbinspector', mainbranch='dev', docs='https://nwbinspector.readthedocs.io', logo='https://raw.githubusercontent.com/NeurodataWithoutBorders/nwbinspector/dev/docs/logo/logo.png', startdate=None), 'NWBWidgets': GitRepo(owner='NeurodataWithoutBorders', repo='nwb-jupyter-widgets', mainbranch='master', docs=None, logo='https://user-images.githubusercontent.com/844306/254117081-f20b8c26-79c7-4c1c-a3b5-b49ecf8cce5d.png', startdate=None), 'NWB_Benchmarks': GitRepo(owner='NeurodataWithoutBorders', repo='nwb_benchmarks', mainbranch='main', docs=None, logo=None, startdate=None), 'NWB_GUIDE': GitRepo(owner='NeurodataWithoutBorders', repo='nwb-guide', mainbranch='main', docs='https://github.com/NeurodataWithoutBorders/nwb-guide', logo='https://raw.githubusercontent.com/NeurodataWithoutBorders/nwb-guide/main/src/renderer/assets/img/logo-guide-draft-transparent-tight.png', startdate=datetime.datetime(2022, 11, 21, 0, 0)), 'NWB_Overview': GitRepo(owner='NeurodataWithoutBorders', repo='nwb-overview', mainbranch='main', docs='https://nwb-overview.readthedocs.io', logo=None, startdate=None), 'NWB_Project_Analytics': GitRepo(owner='NeurodataWithoutBorders', repo='nwb-project-analytics', mainbranch='main', docs='https://github.com/NeurodataWithoutBorders/nwb-project-analytics', logo=None, startdate=None), 'NWB_Schema': GitRepo(owner='NeurodataWithoutBorders', repo='nwb-schema', mainbranch='dev', docs='https://nwb-schema.readthedocs.io', logo=None, startdate=None), 'NWB_Schema_Language': GitRepo(owner='NeurodataWithoutBorders', repo='nwb-schema-language', mainbranch='main', docs='https://schema-language.readthedocs.io', logo=None, startdate=None), 'NeuroConv': GitRepo(owner='catalystneuro', 
repo='neuroconv', mainbranch='main', docs='https://neuroconv.readthedocs.io', logo='https://github.com/catalystneuro/neuroconv/blob/main/docs/img/neuroconv_logo.png', startdate=None), 'PyNWB': GitRepo(owner='NeurodataWithoutBorders', repo='pynwb', mainbranch='dev', docs='https://pynwb.readthedocs.io', logo='https://raw.githubusercontent.com/NeurodataWithoutBorders/pynwb/dev/docs/source/figures/logo_pynwb.png', startdate=None)}

Dictionary with main NWB git repositories. The values are GitRepo tuples with the owner and repo name.

HDMF_START_DATE = datetime.datetime(2019, 3, 13, 0, 0)

HDMF was originally part of PyNWB. As such, code statistics for HDMF before this start date reflect both PyNWB and HDMF and will result in duplicate counting of code stats if PyNWB and HDMF are shown together. For HDMF, 2019-03-13 coincides with the removal of HDMF from PyNWB with PR #850 and the release of HDMF 1.0. For plotting, 2019-03-13 is therefore a good date to start considering HDMF stats to avoid duplication of code in statistics, even though the HDMF repo has existed on GitHub since 2019-01-23T23:48:27Z, which could alternatively be considered as the start date. Older dates will include code history carried over from PyNWB to HDMF. Set to None to consider the full history of HDMF but, as mentioned, this will lead to some duplicate counting of code before 2019-03-13.
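As a rough illustration of how such a start date can be applied, the sketch below drops dated records that fall before the start date and keeps everything when the start date is None. The helper filter_from_start and the (date, value) record layout are assumptions made for this example and are not part of the module.

```python
from datetime import datetime

HDMF_START_DATE = datetime(2019, 3, 13)


def filter_from_start(records, startdate):
    """Keep (date, value) records on or after startdate.

    A startdate of None means the full history is kept.
    """
    if startdate is None:
        return list(records)
    return [(date, value) for date, value in records if date >= startdate]


# Hypothetical code-size records as (date, lines_of_code) pairs
stats = [
    (datetime(2019, 1, 23), 40000),  # history carried over from PyNWB
    (datetime(2019, 3, 13), 12000),
    (datetime(2019, 6, 1), 13000),
]
hdmf_only = filter_from_start(stats, HDMF_START_DATE)
full_history = filter_from_start(stats, None)
```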

MISSING_RELEASE_TAGS = {'MatNWB': [('0.1.0b', datetime.datetime(2017, 11, 11, 0, 0))], 'NWB_Schema': [('2.0.0', datetime.datetime(2019, 1, 19, 0, 0)), ('2.0.0b', datetime.datetime(2017, 11, 11, 0, 0))]}

List of early releases that are missing a tag on GitHub
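One way such entries could be combined with the names and dates returned by get_release_names_and_dates is sketched below. The helper add_missing_releases is hypothetical, and how the module itself consumes MISSING_RELEASE_TAGS is not stated here.

```python
from datetime import datetime


def add_missing_releases(names, dates, extra):
    """Merge (name, date) pairs from extra into parallel name/date lists.

    The result is sorted by date, oldest first.
    """
    combined = list(zip(names, dates)) + list(extra)
    combined.sort(key=lambda item: item[1])
    merged_names = [name for name, _ in combined]
    merged_dates = [date for _, date in combined]
    return merged_names, merged_dates


# Entry copied from MISSING_RELEASE_TAGS['MatNWB'] above
missing_matnwb = [("0.1.0b", datetime(2017, 11, 11))]

# Hypothetical tagged releases as they might come back from GitHub
tagged_names = ["2.0.0", "2.1.0"]
tagged_dates = [datetime(2019, 2, 1), datetime(2019, 9, 1)]

names, dates = add_missing_releases(tagged_names, tagged_dates, missing_matnwb)
```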

NWB1_DEPRECATION_DATE = datetime.datetime(2016, 8, 1, 0, 0)

Date when to declare the NWB 1.0 APIs as deprecated. The 3rd Hackathon was held on July 31 to August 1, 2017 at Janelia Farm, in Ashburn, Virginia, which marks the date when NWB 2.0 was officially accepted as the follow-up to NWB 1.0. NWB 1.0 as a project ended about 1 year before that.

NWB1_GIT_REPOS = {'NWB_1.x_Matlab': GitRepo(owner='NeurodataWithoutBorders', repo='api-matlab', mainbranch='dev', docs=None, logo=None, startdate=None), 'NWB_1.x_Python': GitRepo(owner='NeurodataWithoutBorders', repo='api-python', mainbranch='dev', docs=None, logo=None, startdate=None)}

Dictionary with main NWB 1.x git repositories. The values are GitRepo tuples with the owner and repo name.

NWB2_BETA_RELEASE = datetime.datetime(2017, 11, 11, 0, 0)

Date of the first official beta release of NWB 2 as part of SfN 2017

NWB2_FIRST_STABLE_RELEASE = datetime.datetime(2019, 1, 19, 0, 0)

Date of the first official stable release of NWB 2.0

NWB2_START_DATE = datetime.datetime(2016, 8, 31, 0, 0)

Date of the first release of PyNWB on the NWB GitHub. While some initial work was ongoing before that date, this was the first public release of code related to NWB 2.x

NWB_EXTENSION_SMITHY_START_DATE = datetime.datetime(2019, 4, 25, 0, 0)

NWB_Extension_Smithy is a fork with changes. We therefore should count only the sizes after the fork date, which based on https://api.github.com/repos/nwb-extensions/nwb-extensions-smithy is 2019-04-25T20:56:02Z

NWB_GUIDE_START_DATE = datetime.datetime(2022, 11, 21, 0, 0)

NWB GUIDE was forked from SODA so we want to start tracking stats starting from that date

STANDARD_ISSUE_LABELS = {'category: bug': IssueLabel(label='category: bug', description='errors in the code or code behavior', color='#ee0701'), 'category: enhancement': IssueLabel(label='category: enhancement', description='improvements of code or code behavior', color='#1D76DB'), 'category: proposal': IssueLabel(label='category: proposal', description='discussion of proposed enhancements or new features', color='#dddddd'), 'compatibility: breaking change': IssueLabel(label='compatibility: breaking change', description='fixes or enhancements that will break schema or API compatibility', color='#B24AD1'), 'help wanted: deep dive': IssueLabel(label='help wanted: deep dive', description='request for community contributions that will involve many parts of the code base', color='#0E8A16'), 'help wanted: good first issue': IssueLabel(label='help wanted: good first issue', description='request for community contributions that are good for new contributors', color='#0E8A16'), 'priority: critical': IssueLabel(label='priority: critical', description='impacts proper operation or use of core function of NWB or the software', color='#a0140c'), 'priority: high': IssueLabel(label='priority: high', description='impacts proper operation or use of feature important to most users', color='#D93F0B'), 'priority: low': IssueLabel(label='priority: low', description='alternative solution already working and/or relevant to only specific user(s)', color='#FEF2C0'), 'priority: medium': IssueLabel(label='priority: medium', description='non-critical problem and/or affecting only a small set of NWB users', color='#FBCA04'), 'priority: wontfix': IssueLabel(label='priority: wontfix', description='will not be fixed due to low priority and/or conflict with other feature/priority', color='#ffffff'), 'topic: docs': IssueLabel(label='topic: docs', description='Issues related to documentation', color='#D4C5F9'), 'topic: testing': IssueLabel(label='topic: testing', description='Issues related to testing', color='#D4C5F9')}