Semantic natural language processing and philosophy of science

On 2015 February 18, James Overton visited the STREAM research group in Montreal, where he presented his research into what scientists are doing when they give an explanation for something. Many accounts of scientific explanation have been offered by philosophers of science over the years, but Overton’s differs in that he set out to establish his account by actually examining the scientific literature. Specifically, he took a year’s worth of papers from the journal Science, converted them to unformatted text, and then parsed them using the Python Natural Language Toolkit.

Overton’s methods combined an analysis of word frequencies with a random sampling of sentences that seem to be offering explanations, to see what sorts of data are used to justify what sorts of claims. The most shocking result, at least for me, was that the word “law” was almost never used in the sample that Dr Overton described. That’s not to say that there is no discussion of natural laws at all, but given how much space the description of laws takes up in most accounts of scientific explanation, this seemed a very striking finding.
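
To make the two steps concrete, here is a minimal sketch of counting word frequencies and drawing a random sample of explanation-like sentences. It is not Dr Overton’s code: it uses only the Python standard library instead of the Natural Language Toolkit, the tiny corpus is invented, and the list of cue words is my own assumption about what might flag an explanation.

```python
import random
import re
from collections import Counter

# Invented stand-in corpus; the real study used a year of papers from Science.
CORPUS = (
    "The mutation reduces binding because the pocket is blocked. "
    "We measured expression in twelve samples. "
    "This result explains why the cells fail to divide. "
    "No general law was invoked to account for the effect."
)

# Assumed cue words for explanation-like sentences (illustrative only).
CUES = {"because", "explains", "explain", "therefore", "account"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def explanation_candidates(text):
    """Return the sentences that contain at least one cue word."""
    return [s for s in split_sentences(text) if CUES & set(tokenize(s))]


def sample_sentences(sentences, k, seed=0):
    """Draw up to k sentences at random, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(sentences, min(k, len(sentences)))


word_counts = Counter(tokenize(CORPUS))
candidates = explanation_candidates(CORPUS)
print("occurrences of 'law':", word_counts["law"])
print("candidate sentences:", len(candidates))
print(sample_sentences(candidates, 2))
```

On a real corpus the naive sentence splitter would stumble over abbreviations and citations, which is one reason to reach for a proper toolkit; the sampled sentences would then be read by hand to see what is being used to justify what.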

This technique is very versatile and could be applied to a number of projects, from exploring the nature of scientific explanation, as Dr Overton has done, to a simpler project analysing the frequency of phrases like “sorafenib showed a modest effect” or “adverse events were manageable,” and seeing whether there is any relationship between the words chosen and the result being described.
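
The simpler phrase-counting project might look something like the sketch below. Everything in it is hypothetical: the three records are made-up sentences about an unnamed "Drug A" and "Drug B", and the outcome labels ("met" and "not_met") are an assumed coding of whether a trial reached its primary endpoint. It only shows the shape of the tabulation, not any real finding.

```python
import re
from collections import defaultdict

# Toy records invented for illustration: (text, assumed outcome label).
RECORDS = [
    ("Drug A showed a modest effect on survival.", "not_met"),
    ("Adverse events were manageable in most patients.", "met"),
    ("Drug B showed a modest effect; adverse events were manageable.", "not_met"),
]

PHRASES = ["modest effect", "adverse events were manageable"]


def count_phrase(text, phrase):
    """Count case-insensitive, whole-word occurrences of phrase in text."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def tabulate(records, phrases):
    """Map each phrase to a dict of {outcome label: total occurrences}."""
    table = {p: defaultdict(int) for p in phrases}
    for text, outcome in records:
        for p in phrases:
            table[p][outcome] += count_phrase(text, p)
    return {p: dict(counts) for p, counts in table.items()}


table = tabulate(RECORDS, PHRASES)
for phrase, counts in table.items():
    print(phrase, counts)
```

With enough real abstracts, a table like this could be the starting point for asking whether a phrase such as “modest effect” turns up more often alongside one kind of result than another.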
