New machine learning tool could provide answers to some of life’s most fascinating questions

Researchers from the University of Waterloo have created new software that can provide conclusive answers to some of the world’s most interesting questions.

The tool combines supervised machine learning with digital signal processing (ML-DSP), and it is designed to finally answer questions such as how many different species exist on Earth and in the oceans, or how existing, newly-discovered, and extinct species are related to each other.

The program could also benefit personalised medicine by identifying the specific strain of a virus, allowing targeted drugs to be developed and prescribed to treat it.

ML-DSP is an alignment-free software tool which transforms a DNA sequence into a digital (numerical) signal and uses digital signal processing methods to distinguish and process these signals.
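To make the idea concrete, here is a minimal Python sketch of the general pipeline described above: encode a DNA sequence as numbers, take its discrete Fourier transform, and compare the resulting magnitude spectra. It is an illustration only, not the ML-DSP implementation. The integer mapping in `NUCLEOTIDE_VALUES`, the Pearson-based distance, the nearest-neighbour rule, and all function names are assumptions made for this example; the article does not say which encoding, distance, or classifier the tool uses. The sketch also assumes all sequences have the same length.

```python
import cmath
import math

# Hypothetical numerical encoding (an assumption for this sketch).
NUCLEOTIDE_VALUES = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def dna_to_signal(sequence):
    """Turn a DNA string into a list of numbers, one per nucleotide."""
    return [NUCLEOTIDE_VALUES[base] for base in sequence.upper()]


def magnitude_spectrum(signal):
    """Return the magnitudes of the discrete Fourier transform of the signal."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coefficient = sum(
            value * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, value in enumerate(signal)
        )
        spectrum.append(abs(coefficient))
    return spectrum


def pearson_distance(a, b):
    """Map the Pearson correlation of two equal-length vectors to [0, 1]."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    variance_a = sum((x - mean_a) ** 2 for x in a)
    variance_b = sum((y - mean_b) ** 2 for y in b)
    correlation = covariance / math.sqrt(variance_a * variance_b)
    return (1.0 - correlation) / 2.0


def nearest_label(query, labelled_sequences):
    """Return the label whose reference spectrum is closest to the query's.

    A simple nearest-neighbour rule stands in here for the supervised
    learning step (an assumption for this sketch).
    """
    query_spectrum = magnitude_spectrum(dna_to_signal(query))
    return min(
        labelled_sequences,
        key=lambda label: pearson_distance(
            query_spectrum,
            magnitude_spectrum(dna_to_signal(labelled_sequences[label])),
        ),
    )


# Made-up reference sequences, purely for illustration.
references = {"group_a": "ACGTACGTACGT", "group_b": "AATTAATTGGCC"}
predicted = nearest_label("ACGTACGTACGT", references)
```

No sequence alignment happens at any point in this sketch: sequences are compared only through their frequency-domain representations, which is what "alignment-free" refers to.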

“With this method even if we only have small fragments of DNA we can still classify DNA sequences, regardless of their origin, or whether they are natural, synthetic, or computer-generated,” said Lila Kari, a professor in Waterloo’s Faculty of Mathematics.

“Another important potential application of this tool is in the healthcare sector, as in this era of personalised medicine we can classify viruses and customise the treatment of a particular patient depending on the specific strain of the virus that affects them.”

The new software was tested against other leading classification software tools on two small benchmark datasets and one large dataset of 4,322 vertebrate mitochondrial genomes.

“Our results show that ML-DSP overwhelmingly outperforms alignment-based software in terms of processing time, while having classification accuracies that are comparable in the case of small datasets and superior in the case of large datasets,” Kari said. “Compared with other alignment-free software, ML-DSP has significantly better classification accuracy and is overall faster.”

The paper containing the new findings is titled “ML-DSP: Machine Learning with Digital Signal Processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels” and was recently published in the journal BMC Genomics.

In it, the authors also conducted preliminary experiments showing the potential of ML-DSP to be used for other datasets. The system successfully classified 4,271 complete dengue virus genomes into subtypes with 100 per cent accuracy, and 4,710 bacterial genomes into divisions with 95.5 per cent accuracy.

Lila Kari authored the study, together with Western University PhD candidate Gurjit Randhawa and Dr Kathleen Hill, an Associate Professor in the Department of Biology at Western University.
