IBM Corp. and four major drug manufacturers have donated to the National Institutes of Health a huge database of chemical data extracted from patents and scientific literature.

The database, populated by AstraZeneca, Bristol-Myers Squibb, DuPont and Pfizer, contains more than 2.4 million chemical compounds extracted from 4.7 million patents, as well as 11 million biomedical journal abstracts from 1976 to 2000. "The publicly available chemical data can be used by researchers worldwide to gain new insights and enable new areas of research," according to an IBM announcement. "It will also help researchers save time by more efficiently finding information buried in millions of pages of patent documents."

The donated data came from multiple databases and was extracted and consolidated into a more uniform format. IBM used an in-house tool, the Strategic IP Insight Platform, or SIIP, to extract the data. The tool was originally developed to organize IBM's own trove of patents and was later extended for outside purposes, starting with this donation of life sciences data. IBM is now making SIIP commercially available for use in multiple industries.

The chemical data and literature are being donated to the National Center for Biotechnology Information and the Computer-Aided Drug Design Group of the National Cancer Institute. They will be incorporated into NCBI's PubChem public database and two National Cancer Institute tools: the Chemical Structure Lookup Service and the Chemical Identifier Resolver.

The content will be housed at … A video is available here, and information on the commercially available SIIP is here.

