Big data takes aim at fighting sudden infant deaths

Bereaved father, Microsoft data scientists join with team at Seattle Children’s Hospital to seek insights.


John Kahan has a poignant photo in his office at Microsoft, where he oversees customer data and analytics. The image shows Kahan, his wife and three daughters celebrating the birth of a boy, his reddish-blond hair hidden by a hat.

A few hours after the photo was taken, Kahan received a phone call he still has trouble recounting without choking up—his baby son, Aaron, had stopped breathing. He died a few days later, a victim of Sudden Infant Death Syndrome.



Last year, with the 13th anniversary of Aaron's death approaching, Kahan resolved to honor what would have been his only son's bar mitzvah by climbing Mount Kilimanjaro to raise money and awareness for SIDS research. When he returned from the climb, his team had a surprise for him: they had been crunching numbers on infant deaths in the U.S. and using data-analysis algorithms to try to find new ways to reduce the number of babies lost to SIDS each year.

To date, the data scientists have put in about 500 hours of their own time. Kahan's employer, through its giving arm Microsoft Philanthropies, contributed free cloud hosting and software tools for their work.

Now, deploying analysis and data visualization tools that can identify trends, the team has found promising leads in combating SIDS. The technology helped uncover various correlations, for example linking early prenatal care with a lower rate of deaths. The work also provides more information on such known SIDS risk factors as maternal tobacco use.

"Aaron died 13 years ago. In 13 years, we have not really improved this," said Kahan, who also lobbies Congress to preserve health research funding and open up medical data sets for research. "Which basically means roughly 52,000 children in the U.S. have died, and you have parents like us that sit there and go 'I don't know why.' "

Microsoft is partnering with a research team at Seattle Children's Hospital, led by neuroscientist Nino Ramirez. With access to a lab where they can do things like test different factors on slices of mouse-brain tissue, Ramirez's team is examining which avenues hold up when researched further. Promising work will be published in medical and data science journals with a view to influencing clinical practice.

"The processing power in the cloud, the visualization capabilities and the ability to take data science algorithms at scale and be able to at lightning speed look at correlations—there's no way in God's green earth you could have done that 15 years ago, and even if you could, it would have been massive IBM mainframes all over the place, and you'd be waiting for the output," Kahan says.

Kahan, a former IBM executive, was six months into a new job at Microsoft when his wife Heather, also a former IBM executive, gave birth to Aaron. Because the family knew almost no one in Seattle, then-Microsoft CEO Steve Ballmer and Chief Financial Officer John Connors and their staffs helped with funeral arrangements and medical bill coordination and made sure the family was OK. There had been no indication of trouble during the pregnancy, and a subsequent autopsy showed no reason for Aaron's death. Kahan is re-opening those autopsy results to see if the new work can shed light on what went wrong.

Juan Miguel Lavista, a principal data scientist who works for Kahan, was the parent of a week-old baby girl when he walked into Kahan's office in 2013 and asked about the picture of the baby on his desk. Lavista assumed it was one of Kahan's daughters until Kahan told him about Aaron. Now, Lavista is leading the SIDS project, along with people like Urszula Chajewska, who earlier in her career used machine learning to help ferret out malfunctioning equipment in Intel chip fabrication plants.

Normally, companies like Microsoft use these tools to optimize sales or track their businesses, but they are equally useful for finding breakthroughs in healthcare. "The work we do at Microsoft is very different from the work on SIDS, but from a data science perspective, it's not any different," Lavista says.

Each year in the U.S., six in 1,000 children die in their first year of life, Kahan says, and one of those six dies from unexplained causes. In the early 1990s, the Back to Sleep campaign encouraged parents to stop putting babies to sleep on their stomachs and led to a significant drop in the number of deaths. Since then, however, the rate of unexplained infant deaths has remained steady.
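The rates Kahan cites can be checked against his earlier figure of roughly 52,000 deaths over 13 years. The sketch below is a back-of-the-envelope calculation only; the annual birth count is a round assumption that does not appear in the article.

```python
# Figures quoted in the article.
INFANT_DEATHS_PER_1000 = 6   # deaths in the first year of life per 1,000 children
UNEXPLAINED_SHARE = 1 / 6    # "one of those six" dies from unexplained causes
YEARS = 13                   # years since Aaron's death

# Assumption, not from the article: a round figure for annual U.S. births.
ASSUMED_BIRTHS_PER_YEAR = 4_000_000


def unexplained_deaths_per_year(births: int) -> float:
    """Unexplained infant deaths per year implied by the quoted rates."""
    return births * (INFANT_DEATHS_PER_1000 / 1000) * UNEXPLAINED_SHARE


per_year = unexplained_deaths_per_year(ASSUMED_BIRTHS_PER_YEAR)
total = per_year * YEARS
print(f"{per_year:,.0f} per year, {total:,.0f} over {YEARS} years")
```

With that assumed birth count, the quoted rates work out to about 4,000 unexplained deaths a year, or about 52,000 over 13 years, which is in line with Kahan's number.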

SIDS is not one condition but rather a confluence of factors that make some babies more vulnerable during a critical period of development, Ramirez says. The syndrome is most common during a baby's second month of life and typically doesn't happen after infants reach their first birthday. With SIDS, some factor prevents the baby from getting enough air, and where a normal child would wake up, these babies do not. Finding the factors that combine in these cases, or identifying which babies are most at risk, can help doctors and parents alter risk factors and more closely monitor babies.

Conventional SIDS studies typically comprise only a few hundred cases. By contrast, the Microsoft team has the ability to mine massive datasets collected by the U.S. Centers for Disease Control and Prevention and look for correlations that would be hard to see across a smaller pool of families. It's an approach that machine learning and artificial intelligence experts have already been applying to the treatment of cancer and other diseases.

The CDC database has 90 columns of information on every child born in the U.S. between 2004 and 2010—29 million records in all. It notes the medical care of the mother during pregnancy, race, education, income and other factors. When a baby dies, that information is captured, too. The Microsoft data scientists created an interactive web showing the relationship between every single variable related to babies and parents and the correlation to SIDS.
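To make the idea of relating every variable to SIDS concrete, here is a minimal sketch that computes the outcome rate for each category of each column. It is not the Microsoft team's method, which the article does not describe. The records are invented toy data with unrealistically high rates, and the field names are hypothetical rather than actual CDC column names.

```python
from collections import defaultdict

# Toy records invented for illustration; field names are hypothetical
# and the rates are far higher than anything in real data.
records = [
    {"prenatal_start": "first_trimester", "smoker": "no", "sids": False},
    {"prenatal_start": "first_trimester", "smoker": "no", "sids": False},
    {"prenatal_start": "first_trimester", "smoker": "yes", "sids": False},
    {"prenatal_start": "first_trimester", "smoker": "no", "sids": False},
    {"prenatal_start": "later", "smoker": "yes", "sids": True},
    {"prenatal_start": "later", "smoker": "no", "sids": False},
    {"prenatal_start": "later", "smoker": "yes", "sids": False},
    {"prenatal_start": "first_trimester", "smoker": "yes", "sids": True},
]


def rate_by_category(rows, column, outcome="sids"):
    """Return {category: share of rows with the outcome} for one column."""
    totals = defaultdict(int)
    cases = defaultdict(int)
    for row in rows:
        totals[row[column]] += 1
        cases[row[column]] += int(row[outcome])
    return {category: cases[category] / totals[category] for category in totals}


def scan_columns(rows, outcome="sids"):
    """Apply rate_by_category to every column other than the outcome."""
    columns = [name for name in rows[0] if name != outcome]
    return {name: rate_by_category(rows, name, outcome) for name in columns}


for column, rates in scan_columns(records).items():
    print(column, rates)
```

A comparison like this only surfaces associations. Deciding which ones reflect a real mechanism is the job the article assigns to Ramirez's lab.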

One discovery is that women who access prenatal care in their first trimester have a lower-than-average risk of giving birth to a baby that dies of SIDS. Starting prenatal care later than that is associated with a 30 to 40 percent higher risk. The reason may not be the medical care alone, Chajewska says. Rather, it may be that the doctor visit serves to convince pregnant women to do things like quit smoking or take vitamins. Either way, the data help policymakers more precisely weigh the cost of things like free prenatal care against the impact.
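A figure like "30 to 40 percent" is usually expressed as a relative risk: the death rate in one group divided by the rate in a comparison group. The sketch below shows that calculation. The counts are placeholders chosen to land inside the quoted range; they are not taken from the CDC data.

```python
def relative_risk(cases_exposed, total_exposed, cases_unexposed, total_unexposed):
    """Ratio of the outcome rate in the exposed group to the unexposed group."""
    if total_exposed <= 0 or total_unexposed <= 0:
        raise ValueError("group sizes must be positive")
    if cases_unexposed <= 0:
        raise ValueError("comparison group needs at least one case")
    risk_exposed = cases_exposed / total_exposed
    risk_unexposed = cases_unexposed / total_unexposed
    return risk_exposed / risk_unexposed


# Placeholder counts, invented for illustration only.
# "Exposed" here means prenatal care that began after the first trimester.
rr = relative_risk(
    cases_exposed=27, total_exposed=20_000,
    cases_unexposed=80, total_unexposed=80_000,
)
percent_increase = (rr - 1) * 100
print(f"relative risk {rr:.2f}, about {percent_increase:.0f}% higher")
```

A relative risk says nothing about cause, which matches Chajewska's caution that the visit itself may not be what lowers the risk.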

"You can go to the politicians and say this is how many more kids are dying because of this. Now, suddenly, it's real," says Ramirez, who directs the hospital's Center for Integrative Brain Research. "If you meet a parent, even if they are 80 years old, it never goes out of their mind the trauma that's created, and here we're talking about millions [of babies]. How can we ignore this?"

The data also show the number of prenatal doctor visits that correlates with the lowest SIDS rate. Similarly, researchers have known for years that mothers who smoke while pregnant give birth to babies with a higher rate of SIDS. But the Microsoft team's data show how much the risk increases with every cigarette smoked daily. Because quitting can be hard, this shows that it can make a difference to at least persuade pregnant women to cut back on how much they smoke.

Ultimately, Ramirez wants to create an online worksheet that doctors can fill out for each pregnant patient to get a view into her risk factors for SIDS. Right now, the risks discussed with patients are much more general and based on things like race and age, and can't combine many disparate pieces of information into a richer view of probabilities.
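One simple way such a worksheet could combine factors is to multiply per-factor relative risks. The sketch below is purely illustrative and is not Ramirez's design, which the article does not detail. It assumes the factors act independently, which real data may not support. The worksheet fields are hypothetical, and the per-cigarette multiplier is an arbitrary placeholder because the article does not report the value.

```python
from dataclasses import dataclass

# Midpoint of the 30 to 40 percent range quoted in the article.
LATE_CARE_MULTIPLIER = 1.35


@dataclass
class Worksheet:
    """Hypothetical worksheet fields; a real form would have many more."""
    late_prenatal_care: bool  # care began after the first trimester
    cigarettes_per_day: int


def combined_relative_risk(sheet: Worksheet, per_cigarette_multiplier: float) -> float:
    """Multiply per-factor relative risks, assuming the factors are independent.

    per_cigarette_multiplier must be supplied by the caller because the
    article does not report it.
    """
    if sheet.cigarettes_per_day < 0:
        raise ValueError("cigarettes_per_day cannot be negative")
    risk = 1.0
    if sheet.late_prenatal_care:
        risk *= LATE_CARE_MULTIPLIER
    risk *= per_cigarette_multiplier ** sheet.cigarettes_per_day
    return risk


# 1.05 is an arbitrary placeholder, not a measured value.
baseline = combined_relative_risk(Worksheet(False, 0), 1.05)
example = combined_relative_risk(Worksheet(True, 5), 1.05)
print(f"baseline {baseline:.2f}, example {example:.2f}")
```

The output is a risk relative to a patient with none of the listed factors, not an absolute probability. A production tool would more likely use a model fitted to the full dataset so that interactions between factors are captured.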

"The (SIDS) field is not that big, and it started with pediatricians, but none of them have a background in data science," Ramirez says. "They have their own databases and they try to bring things together, but the professional look at the data doesn't exist. It exists in genetics and cancer research and it has transformed the field. What we are starting here has transformed a little the SIDS field. "
