
Scientists use machine learning to spot sources of Salmonella

Salmonella typhimurium. Credit: Volker Brinkmann.

Scientists at the University of Georgia Center for Food Safety in Griffin have created a machine-learning system that could lead to quicker identification of the animal source of certain Salmonella outbreaks.

The research, which was published in the January 2019 issue of Emerging Infectious Diseases, was led by Prof Xiangyu Deng and postdoctoral associate Shaokang Zhang, both working at the centre.

Their team analysed more than a thousand genomes to predict the animal sources, especially livestock, of Salmonella Typhimurium.

According to the Foodborne Disease Outbreak Surveillance System, nearly 3,000 outbreaks of foodborne illness were reported in the United States from 2009 to 2015. Of those, 30 per cent were caused by different serotypes of Salmonella, including Typhimurium, Deng said.

“We had at least three outbreaks of Typhimurium, or its close variant, in 2018. These outbreaks were linked to chicken, chicken salad and dried coconut,” he said. “There are more than 2,600 serotypes of Salmonella, and Typhimurium is just one of them, but since the 1960s, about a quarter of Salmonella isolates linked to outbreaks reported to U.S. national surveillance are Typhimurium.”

The scientists used a machine-learning algorithm called Random Forest. It was trained on more than 1,300 S. Typhimurium genomes from known sources and learned to predict certain animal sources of other S. Typhimurium genomes.
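To make the idea concrete, here is a minimal toy sketch of a random forest in plain Python. It is not the researchers' code. It assumes each genome has been reduced to a short list of 0/1 flags marking whether a genetic feature is present, and the feature columns, labels and training rows are all invented for illustration. Each tree is grown on a bootstrap resample of the genomes, considers a random subset of features at every split, and the forest predicts by majority vote.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of labels (0.0 means a pure group)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, rng, n_features, depth=0, max_depth=5):
    """Grow one tree; a leaf is a label, a node is (feature, absent, present)."""
    if len(set(y)) == 1 or depth == max_depth:
        return majority(y)
    best = None
    # Only a random subset of features is considered at each split.
    for f in rng.sample(range(len(X[0])), n_features):
        absent = [i for i, row in enumerate(X) if row[f] == 0]
        present = [i for i, row in enumerate(X) if row[f] == 1]
        if not absent or not present:
            continue  # this feature is constant here, so it cannot split
        score = (len(absent) * gini([y[i] for i in absent])
                 + len(present) * gini([y[i] for i in present])) / len(y)
        if best is None or score < best[0]:
            best = (score, f, absent, present)
    if best is None:
        return majority(y)
    _, f, absent, present = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          rng, n_features, depth + 1, max_depth)

    return (f, grow(absent), grow(present))

def fit_forest(X, y, n_trees=25, n_features=2, seed=0):
    """Train each tree on a bootstrap resample of the genomes."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 rng, n_features))
    return forest

def predict(forest, row):
    """Return the majority-vote label and the share of trees that voted for it."""
    votes = Counter()
    for node in forest:
        while isinstance(node, tuple):
            f, absent_branch, present_branch = node
            node = present_branch if row[f] else absent_branch
        votes[node] += 1
    label, count = votes.most_common(1)[0]
    return label, count / len(forest)

# Invented toy data: columns 0-2 are hypothetical marker features,
# column 3 is noise that carries no information about the source.
X = [
    [1, 0, 0, 0], [1, 0, 0, 1], [1, 0, 0, 0], [1, 0, 0, 1],
    [0, 1, 0, 0], [0, 1, 0, 1], [0, 1, 0, 0], [0, 1, 0, 1],
    [0, 0, 1, 0], [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 1, 1],
]
y = ["poultry"] * 4 + ["swine"] * 4 + ["bovine"] * 4

forest = fit_forest(X, y)
label, vote_share = predict(forest, [1, 0, 0, 1])
print(label, vote_share)
```

A real classifier of this kind would work with thousands of features per genome and would normally rely on an established library rather than a hand-written forest.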

In order to conduct this study, the scientists used Salmonella Typhimurium genomes from three major surveillance and monitoring programs: the CDC’s PulseNet network; the FDA’s GenomeTrakr database of sources in the United States, Europe, South America, Asia and Africa; and retail meat isolates from the FDA arm of the National Antimicrobial Resistance Monitoring System.

“With so many genomes, machine learning is a natural choice to deal with all these data,” Deng said. “We used this big collection of Typhimurium genomes as the training set to build the classifier, which predicts the source of the Typhimurium isolate by interrogating thousands of genetic features of its genome.”

The system predicted the animal source of S. Typhimurium with an overall accuracy of 83 per cent. The classifier performed best in predicting poultry and swine sources, followed by bovine and wild bird sources. The system was also able to detect whether each of its predictions was precise or imprecise. When a prediction was precise, it was accurate about 92 per cent of the time, Deng said.
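The article does not say how the system separates precise from imprecise predictions. One plausible approach, sketched below purely as an assumption, is to treat a prediction as precise when the classifier's confidence in its top choice clears a threshold, and then report accuracy separately for that subset. The records, the 0.7 threshold and the function names are all invented for illustration and are not the study's data or method.

```python
def accuracy(records):
    """Fraction of records whose predicted source matches the true source."""
    return sum(true == pred for true, pred, _ in records) / len(records)

def split_by_confidence(records, threshold):
    """Split records into (precise, imprecise) by the classifier's confidence."""
    precise = [r for r in records if r[2] >= threshold]
    imprecise = [r for r in records if r[2] < threshold]
    return precise, imprecise

# Invented records: (true source, predicted source, confidence in the prediction).
records = [
    ("poultry", "poultry", 0.95),
    ("poultry", "poultry", 0.90),
    ("swine", "swine", 0.85),
    ("swine", "poultry", 0.45),
    ("bovine", "bovine", 0.80),
    ("bovine", "swine", 0.50),
    ("wild bird", "wild bird", 0.55),
    ("wild bird", "wild bird", 0.75),
]

THRESHOLD = 0.7  # hypothetical cut-off, not taken from the study
precise, imprecise = split_by_confidence(records, THRESHOLD)

overall_accuracy = accuracy(records)   # 6 of 8 correct -> 0.75
precise_accuracy = accuracy(precise)   # 5 of 5 correct -> 1.0
print(overall_accuracy, precise_accuracy, len(imprecise))
```

In this toy data the confident subset scores higher than the full set, the same pattern as the gap between the overall and precise-only accuracy figures reported above.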

The system currently has limitations, Deng explained. For example, it cannot yet predict seafood as a source, and it has difficulty predicting the source of Salmonella strains that “jump around among different animals.”

However, he added that this first attempt could later be extended and improved.

“I’d call this approach a proof of concept,” he said. “It will get better as more genomes from various sources become available.”

