Why AMIA worries about assessing biomedical data repositories

The American Medical Informatics Association would like to see the National Institutes of Health consider different metrics to assess data quality and completeness when measuring the value of biomedical digital data repositories.

“In the era of precision medicine, biomedical data repositories are the lifeblood of clinical research,” said AMIA President and CEO Doug Fridsma, MD. “As we begin to re-engineer the clinical research enterprise, it will be critical to objectively differentiate between quality and inferior repositories.”

NIH issued a request for information (RFI) on metrics and repositories. AMIA responded with a letter to Francis Collins, MD, NIH’s director, arguing against a “one-size fits all” set of value measures, which it considers insufficient for assessing the value of the wide array of current and future repositories.


In its RFI, the agency asked for industry feedback on both qualitative and quantitative metrics, such as utilization, impact, quality of service and governance.

“AMIA supports the identified domains of utilization, impact, quality of service and governance,” states the letter to NIH. “However, we recommend NIH view these domains with varying degrees of importance.”

Specifically, AMIA made the following recommendations related to the four domains:

• Utilization: “Overall we do not believe utilization data, regardless of how it is captured, should be a strong indication of value for deposition repositories or knowledge bases. Counts of accesses or downloads may be easily measured, but they are at best proxies for desired outcomes. We encourage NIH to focus on measures that assess data stores in terms of meaningful scientific impact.”

• Impact: “We believe this category of metrics reveal more about a repository’s value than other metrics, and we encourage NIH to look for both innovative and comparable measures. AMIA supports the listed examples to measure impact; however, we recommend that NIH be cognizant of the difficulty in capturing proximal and, more importantly, distal outcomes measures—especially given that such measures are likely to be separated by indeterminate time periods.”

• Quality of service: AMIA supports the listed measures and would encourage RFI reviewers to view quality of service for repositories much as they would for other consumer-facing web-based properties, such as Amazon or GitHub.

• Governance: Experience to date indicates that several best practices are emerging from leading biomedical repositories around development of standard policies, processes and transparency. Data deposition policies, dataset descriptions and transparency around funders, advisors and operations are all important hallmarks of 21st-century biomedical research. Insofar as repositories have specific kinds of policies in place and are transparent with regard to management and operations, AMIA would not encourage NIH to be overly prescriptive with expectations of legal requirements. NIH should articulate the kinds of information and policies that are expected to be made available, rather than articulate the contents of those policies.

As the size and volume of biomedical data continue to grow, AMIA believes that repositories will play a critical role in enabling and promoting biomedical research. But utilization, as measured by downloads or log-ins, ranks low as a proxy for value, according to Fridsma. By contrast, impact measures, such as patents and publications that draw on repository data, are more useful metrics of value.

Further, AMIA contends that repositories are only useful if they contain quality data, including metadata. The association would also like to see NIH focus additional attention on data veracity and completeness.
