AI startup Primer study finds over 40,000 scientists overlooked by Wikipedia

A new study by artificial intelligence (AI) start-up Primer has revealed that over 40,000 prominent scientists aren’t featured on Wikipedia, with most of them being women.

Conducted in order to demonstrate Primer’s proficiency in Natural Language Processing (NLP), the machine learning program behind the research analysed 30,000 English Wikipedia articles about scientists, their Wikidata entries, and over 3 million sentences from news documents describing them and their work.

It also revealed another issue of online knowledge: information decay. For most of those 30,000 scientists who are on English Wikipedia, the machine learning platform identified relevant information that was missing from their articles.

The tool is called Quicksilver. According to Primer’s Director of Science John Bohannon, it would be able to discover and analyse the equivalent of ‘500 million news articles, 39 million scientific papers, all of Wikipedia, and then write 70,000 biographical summaries of scientists.’

It could therefore also be used to maintain Wikipedia entries and keep them updated. Among other examples, Bohannon cites Aleksandr Kogan, the Moldovan-born data scientist known for having developed the app behind the Cambridge Analytica scandal.

His page stopped being updated in mid-April, which means that later developments, including the fact that he also accessed Twitter data, have yet to be added.

Although Bohannon suggests using Quicksilver to manage and update Wikipedia, he stressed that such tools must act as assistants to a human-led process.

“The human editors of the most important source of public information can be supported by machine learning. Algorithms are already used to detect vandalism and identify underpopulated articles. But the machines can do much more.”

To show the results of their new platform, Primer published a sample of 100 short Quicksilver-generated summaries of scientists missing from Wikipedia. On their site, they challenge the public: ‘We’re curious how long it will take before someone creates their articles.’

BY SHACK15

Co-working space and blog dedicated to all things data science.
