Ambitious NIH project aims to improve data for AI-fueled research

‘Bridge to Artificial Intelligence’ program seeks to create complete data sets and identify best practices to support personalized medicine research.


An initiative funded by the National Institutes of Health seeks to improve the quality of data for meaningful AI-assisted research.

The National Institutes of Health has kicked off an ambitious multiyear effort to identify best practices for data preparation. The goal is to pave the way for robust research using artificial intelligence that could lead to truly personalized medicine.

The “Bridge to Artificial Intelligence,” or Bridge2AI, program recently announced seven awards totaling $30 million to create a “Bridge Center” and four “grand challenge” data generation projects.

These initiatives are designed as initial steps toward “changing the culture of science” by emphasizing the ethical sourcing of data for research from diverse sources, explains Grace Peng, Ph.D., co-coordinator of the new NIH program. The agency expects to eventually invest $130 million in the project over four years if full funding is approved.

Incomplete data

Today, most of the medical data available is incomplete and thus cannot be used for meaningful AI-assisted research that could lead to breakthrough treatment discoveries, Peng says. Available data, whether it’s from electronic health records, X-ray images or voice recordings, generally lacks context – including such details as social issues surrounding the data collected, the type of devices used to collect the data or even the positioning of the patient when a radiological image was taken.

The new project will strive to create “flagship” data sets and best practices for machine learning analysis. The data sets will provide the equivalent of a “nutritional label” that includes the missing details, Peng explains. This will help researchers, for example, account for potential data collection biases based on social or economic issues, she says.
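As a rough illustration of the “nutritional label” idea, the sketch below records the kinds of context Peng describes alongside a data set and reports which details are still missing. It is a minimal sketch only: the class name, field names and sample values are hypothetical and do not represent an NIH or Bridge2AI schema.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class DatasetLabel:
    """Hypothetical 'nutritional label' for one data set (not an NIH schema)."""
    name: str
    data_type: str                 # e.g. "X-ray image" or "voice recording"
    collection_device: str = ""    # type of device used to collect the data
    patient_positioning: str = ""  # e.g. patient position during imaging
    social_context: str = ""       # social or economic notes on collection

    def missing_context(self) -> List[str]:
        """Return the names of fields that were left empty."""
        return [key for key, value in asdict(self).items() if value == ""]


# Illustrative values only; no real data set is described here.
label = DatasetLabel(
    name="chest-xray-demo",
    data_type="X-ray image",
    collection_device="portable unit",
)
print(label.missing_context())
```

In this example the label flags the patient positioning and social context fields as missing, the kind of gap a researcher would want to know about before training a model on the data.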

The project also will strive to develop other best practices, including ways to recruit patients and obtain their consent to share data for research while preserving privacy, the co-coordinator says. In addition, it will develop automated tools to accelerate the creation of “findable, accessible, interoperable and reusable” ethically sourced data sets, the NIH reports.

Using AI for meaningful research toward achieving personalized medicine requires gathering complete data sets “to allow machines to ‘think’ about the data in an intelligent way,” attempting to mimic the way humans infer details about the data, Peng explains.

She stresses that the project is designed to “instill a culture of ethical inquiry throughout the scientific process.” The focus, she adds, is on “data preparation” for next-generation AI models, paving the way for data mining and, ultimately, new scientific discoveries that could change the practice of medicine.

“The program will ensure its tools and data do not perpetuate inequities or ethical problems that may occur during data collection and analysis,” according to the NIH. “Researchers will create guidance and standards for the development of ethically sourced, state-of-the-art, AI-ready data sets that have the potential to help solve some of the most pressing challenges in human health, such as uncovering how genetic, behavioral and environmental factors influence a person’s physical condition throughout their life.”

Overcoming AI’s limitations

Although AI is already used in biomedical research and healthcare, its widespread adoption has been limited, in part, by the challenges of applying AI to diverse data types, the NIH reports.

“This is because routinely collected biomedical and behavioral data sets are often insufficient, meaning they lack important contextual information about the data type, collection conditions or other parameters. Without this information, AI technologies cannot accurately analyze and interpret data.”

Another concern, the NIH notes, is that AI technologies that rely on inadequate data may inadvertently incorporate bias or inequities “unless careful attention is paid to the social and ethical contexts in which the data is collected.” That’s why scientists are now working to develop well-described and ethically created data sets, standards and best practices.

Seven projects

The NIH has issued four awards for data-generation projects that will create biomedical and behavioral data sets ready to be used by AI technologies. They’ll also create data standards and tools as well as training materials “that promote a culture of diversity and the use of ethical practices throughout the data-generation process.”

In addition, the NIH issued three awards for work on creating a “Bridge Center” for integration, dissemination and evaluation of the project’s research activities. The center will disseminate products, best practices and training materials.

“The Bridge2AI program will tap into the power of AI to lead the way toward insights that can ultimately inform clinical decisions and individualize care,” NIH notes in a blog.

“Data generation remains among the greatest challenges that must be resolved for AI to have a real-world impact on medicine,” the blog points out.
