It’s nice to be back writing. Believe me. So let’s get into this, shall we?
Before Christmas, I arrived at a thought that I couldn’t move past: the problems we’ve been trying to rectify for the past three years aren’t going anywhere. Our public intellectuals and institutions are getting their information from a poisoned stream and they might not be aware of it. As such, the discussions taking place downstream have become a confusing affair. There’s a feeling that we’re sharing the same planet but operating in different worlds.
This problem has been bothering me, and it’s quite clear I’m not the only person concerned about the state of science publishing. Marcia Angell, a former editor of the New England Journal of Medicine, said: “It is simply no longer possible to believe much of the clinical research that is published or to rely on the judgment of trusted physicians or authoritative medical guidelines. I take no pleasure in this conclusion, which I reached slowly and reluctantly over my two decades as editor of The New England Journal of Medicine.”
I sometimes have to read that quote twice for it to sink in.
Or how about Richard Horton, editor of The Lancet, who said that “the case against science is straightforward: much of the scientific literature, perhaps half, may simply be untrue. Afflicted by studies with small sample sizes, tiny effects, invalid exploratory analyses, and flagrant conflicts of interest, together with an obsession for pursuing fashionable trends of dubious importance, science has taken a turn towards darkness. Something has gone fundamentally wrong.”
These statements were made before his journal published fake science that was later used to derail hydroxychloroquine, a drug likely to be useful against COVID-19. Had that drug been widely adopted, it might have cost the pharmaceutical industry billions of dollars in profits, something I wrote about here. It’s a subject we’ll return to in just a moment, but here’s the thing: knowingly or not, Richard Horton’s publication was an accessory to rampant pharma profiteering at the expense of public health, and he’s cognizant of the problem.
Something is indeed fundamentally wrong.
If you happen to think this way, you’re in good company. There’s Ben Goldacre’s incredible work in both Bad Science and Bad Pharma, or John Ioannidis, who credibly made the case that “most published research findings are false.” Or Daniele Fanelli, whose 2009 meta-analysis of surveys suggested that 33.7% of scientists admit to “questionable research practices” and that the rate of scientific misconduct is higher in biomedical research. Or a 2003 study in the British Medical Journal showing that “systematic bias favors products which are made by the company funding the research.”
Is this surprising? Not really. What is surprising is that our media and scientific institutions act as though it’s not happening. Our knowledge base isn’t functioning, and we remain stubbornly blind to the problem.
The clip above offers a great insight into how all of this works. Readers of The Digger will no doubt be familiar with one particular story about how the industry lords over the conclusions of scientific research. There’s an elephant in the room of science publishing, and it’s very convenient for the pharmaceutical industry that it’s never acknowledged. By securing massive influence over the journals, the industry has secured massive influence over science itself. With this in mind, it feels somewhat fruitless to try to foster a better understanding of our world while a firehose of industry-tainted data stands ready to drown the public at a moment’s notice. If we want to make progress on these problems, there’s only so much use in drawing attention to the poisoned well; we must move upstream and fix the leak.
So that’s what I’ve spent the last eight weeks doing. Today, I’ve just about finished the ‘zero to one’ on an application I’ve designed to help address some of these problems. It’s called openpaper.science.
So why have I built this and what does it do?
The first rung on the ladder towards addressing these problems is closing the gap between different disciplines. If more people engage with ‘what the science says’, and we combine expertise across fields, I think we stand a much stronger chance of pollinating new insights about our world. I believe we need better ways to engage the public in the messy process of science, so that journalists, engineers, scientists, doctors, newscasters, podcasters, and everyone in between have a way to engage with the data.
Why?
Well… if society is entirely dependent on ‘speakers of science’ to get its bearings, we’re all subjected to whatever poison leaks into their water supply. William Tyndale had this insight 500 years ago when he translated the Bible from its original Hebrew and Greek into English, so that ordinary people could engage with the scripture without being dependent on the word of the church.
“I had perceived by experience, how that it was impossible to stablish the lay people in any truth, except the scripture were plainly laid before their eyes in their mother tongue, that they might see the process, order, and meaning of the text.”
The parallels between the revolutionary time in which he lived and our own seem remarkable. Here in 2023, jailbreaking scientific literature from its native jargon into something more palatable is one way the ‘singular speakers of science’ paradigm can be disrupted. It’s one way we might democratize information, in the hope of making better sense of our diverging world.
Last year, I made a few attempts at this with my Substack. Could I close the communication gap between disagreeing scientists and the general public? Could I close the gap between disagreeing scientists and doctors? Ideas and important data were not cross-pollinating, and the phenomenon seemed rooted in a great failure of communication. So I spent weeks attempting to ‘translate’ a very technical but important paper on some of the lesser-known risks associated with the mRNA products. Agree or disagree with what Professor Peter McCullough and his colleagues argued, one truth was hard to escape: the public had close to zero chance of even knowing the paper existed. Its obscurity was a kind of abstract editorial decision made by some unseen force, a strange process happening automatically in the periphery of our imperfect knowledge system.
Even if the public did hear about the paper, could they understand any of its key points? If they heard about it, wouldn’t it be likely they’d hear a take from one of the “somewhat compromised” academics that Dr. Peter Frost talks about in the clip above?
With this in mind, I rewrote McCullough’s paper into a format that allowed more people to consider it for themselves. This endeavor proved useful because many people contacted me to say the article had given them food for thought. Rewriting key insights from science papers seemed like a fruitful avenue to help engage the public with the science languishing in the shadows of our invisible editor-in-chief. But it’s a huge amount of work to write even one article like this and there are probably hundreds of papers published every day.
Could something be done?
That’s when the penny dropped. With new developments in AI, it should be possible to achieve these ‘translations’ at scale. So I got started, and many weeks later, thanks to my incredible subscribers who stuck by me through complete radio silence, an alpha version is now live and ready at openpaper.science.
So what does it do?
The application reads the latest preprint papers, for now with a focus on COVID-19, and at the click of a button these papers can be ‘translated’ into something much friendlier to laypeople. There’s a huge amount of scope here because the papers can be rewritten with many different audiences in mind.
If users want a science paper summarised for a high school class, or the key points extracted for a podcast, openpaper.science can do that. Just hit the green summarise button and the app should take care of it. The application checks each paper for a Creative Commons license and acts accordingly. The licensing on most science papers means these summaries must remain private to the user who created them, effectively becoming private notes. However, many papers are licensed in a way that allows the adaptations to be shared.
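To make that concrete, here’s a minimal sketch of how such a license gate might work. The license identifiers and the function are my own illustration, not openpaper.science’s actual schema:

```python
# Minimal sketch of a license gate. The license identifiers and the value
# passed in are illustrative assumptions, not openpaper.science's real schema.
SHAREABLE_LICENSES = {"cc-by", "cc-by-sa", "cc0"}

def summary_visibility(paper_license: str | None) -> str:
    """Permissively licensed papers yield shareable summaries;
    everything else stays private to the user who created it."""
    if paper_license and paper_license.lower() in SHAREABLE_LICENSES:
        return "public"
    return "private"

print(summary_visibility("CC-BY"))  # -> public
print(summary_visibility(None))     # -> private (treat the summary as notes)
```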
The papers can be summarised in a way that makes sense for undergraduates, postgrads, or any other audience. You can tell the AI what knowledge you already have, and therefore which parts of the paper to explain more carefully. For example, a user might request: “Explain this paper using concepts familiar to me as an undergraduate mathematician.” We can also ask the AI to find the key insights of a paper and then ask, “What are the implications of these findings?”
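Under the hood, this kind of feature is essentially careful prompt construction. Here’s a hedged sketch of how an audience-targeted summary might be requested, assuming the OpenAI Python client; the prompt wording and model name are placeholders of mine, not the app’s actual implementation:

```python
# Hedged sketch: audience-targeted summarisation with an OpenAI-style chat API.
# The system prompt, user prompt, and model choice are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarise_for_audience(paper_text: str, audience: str) -> str:
    """Rewrite a paper's key points for a stated audience, e.g.
    'a high school class' or 'an undergraduate mathematician'."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption: any capable chat model would do
        messages=[
            {"role": "system",
             "content": "You translate scientific papers into plain language "
                        "without changing or overstating their claims."},
            {"role": "user",
             "content": f"Summarise the following paper for {audience}. "
                        f"Explain any jargon the audience is unlikely to know.\n\n"
                        f"{paper_text}"},
        ],
    )
    return response.choices[0].message.content
```

The interesting design work lives in the system prompt: it has to push the model to simplify the language without letting it soften or inflate what the paper actually claims.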
Where things get interesting is in the underlying ‘semantic understanding’ of AI. Because the system ‘understands’ what the papers actually say, it should soon be possible to combine scientific data in interesting and creative ways: “Show me all the papers with a similar hypothesis and method, but with differing results.” It should be possible to extract data from science papers at scale and dynamically compute something resembling a meta-analysis. These tools, when they arrive, will give the public a very quick sense of ‘what the data says’ on a given topic.
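As a rough sketch of what the retrieval side of this could look like, here’s one way to rank papers against a natural-language query using off-the-shelf sentence embeddings; the model choice and the in-memory index are my assumptions, not a description of openpaper.science’s actual stack:

```python
# Sketch: semantic search over paper abstracts with open-source embeddings.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Placeholder abstracts standing in for the thousands of papers in a real database.
abstracts = [
    "Placeholder abstract for paper A ...",
    "Placeholder abstract for paper B ...",
]
corpus_embeddings = model.encode(abstracts, convert_to_tensor=True)  # embed once

def search(query: str, top_k: int = 5):
    """Return the abstracts semantically closest to the query, with scores."""
    query_embedding = model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=top_k)[0]
    return [(abstracts[hit["corpus_id"]], hit["score"]) for hit in hits]

# e.g. search("papers with a similar hypothesis and method but differing results")
```

Papers whose abstracts embed close to the query, and close to each other, become candidates for exactly the ‘similar hypothesis, differing results’ comparison described above.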
But we’re getting ahead of ourselves here because right now, after a ‘zero to one’ sprint, openpaper.science’s primary feature is its ability to create summaries of scientific papers with a single uneditable ‘prompt’. If you’d like to edit how the AI summarises the paper, that requires a subscription because every single AI query costs actual money. The next step is adding an AI-powered semantic search across the thousands of papers in the database to allow people to find exactly what they’re looking for. Then it makes sense to slowly increase the scope of the dataset across different branches of science.
For the moment, new users have a free tier, the details of which I’m working out. I’ve added a subscription model for people who want immediate access to the custom prompts, and those subscriptions will go toward developing the project. Subscribing to The Digger on substack also helps.
The guiding principle of the app is to allow natural-language interaction with the scientific literature, through a UI that gently nudges readers back to the original papers themselves. I believe that to be very important, and here’s why.
There are already several big projects building biomedical AI language models. Think ChatGPT, but trained entirely on medical literature. Microsoft is developing one; the paper and codebase were open-sourced as of last week. Facebook launched one, only to take it down within a matter of days. Why such nervy behavior? Because a general intelligence that can synthesize knowledge instantly is a very powerful tool, and we should expect a mad scramble by ‘the industry’ to get some serious leverage over it.
There’s a great recent example that illustrates this beautifully. Take a look at this comment on Hacker News, which put my jaw on the floor. The paper describing Microsoft’s new biomedical AI shared one very interesting question the researchers fed to their model. Asking the AI to complete a sentence, the researchers wrote: “The drug that can treat COVID-19 is …”
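Assuming the model in question is the open-sourced checkpoint now on Hugging Face (my guess is the ‘microsoft/biogpt’ release), anyone can pose the same completion themselves. A sketch:

```python
# Sketch: posing the same sentence-completion to an open biomedical language
# model. The checkpoint name is my assumption; sampled outputs vary from run
# to run and may differ from the example printed in the paper.
from transformers import pipeline

generator = pipeline("text-generation", model="microsoft/biogpt")

prompt = "The drug that can treat COVID-19 is"
for completion in generator(prompt, max_new_tokens=20,
                            num_return_sequences=3, do_sample=True):
    print(completion["generated_text"])
```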
Look at the answer for yourself.
These AI systems ‘know’ that the body of evidence to support the use of hydroxychloroquine is there, so they answer in the affirmative. The only way our current society can ‘know’ that hydroxychloroquine is not a treatment for COVID-19 is because of the massive lobbying power of the pharmaceutical industry. When these AI systems start to produce factually accurate, but politically and commercially inconvenient answers, it will trigger an intellectual immune response from the institutions pontificating on AI ethics.
These tools could become so powerful that they unravel faith in our institutions by unveiling a litany of strange but previously hidden truths. Statements like the hydroxychloroquine one above will be confidently asserted, and getting them will be as easy as a Google search. Can you imagine the kinds of questions the public might ask, and the certainty with which such a system might answer? It’s difficult to predict the consequences of the widespread adoption of unfiltered AI tools like this, so we’d be fools to think there won’t be a scramble to filter their output. When those filters come, for right or for wrong, they are certain to benefit the powers quietly lobbying to put them there.
As a matter of fact, this is already happening with GPT-3. OpenAI is working very hard to put in guardrails that steer the AI’s responses, an effort matched by a grassroots push to remove them.
To avoid a scenario in which a small number of people control access to a shrinking source of truth, we must ensure we have a seat at the table. Maybe two or three seats, for that matter. Standing still will leave us in the dust, which is why I think projects like openpaper.science are important: we’re at the very beginning of what’s possible.
I don’t doubt there will be detractors: those who say this is a cardinal scientific sin, that science needs the checks and balances of big-journal peer review, that AI is biased, will make mistakes, and can’t be trusted. I’m sure the path forward won’t be simple, but the pandemic highlighted big problems in ‘The Science’, and the bunny won’t go back in the box. The current state of affairs is far from desirable and the future of AI is inevitable, so the real question is who should ‘control’ these new tools, and how. If we do nothing, our indifference will be exploited by the same powerful forces that shape our current moment.
We need to open up science publishing: make it more accessible and far less fragmented. With the technology now available to us, there ought to be a better way to quickly get a handle on ‘what the data says’.
There is much, much more to say, but I don’t think a long article is the best format for it. Perhaps a Substack thread, a Twitter Space, or a podcast might be better. If you’re interested in this project, reach out on this thread. If you know someone who could support a project like this, please forward my article and put me in touch. As ever, shares of my articles help enormously. If you’re not a subscriber, make sure you hit that subscribe button. This is an open thread; please keep it civil.
I like your idea of making difficult-to-read literature more accessible to the public. It seems like a step in the right direction.
As for the problem that “it’s no longer possible to believe clinical research”: the fundamental issue is that clinical research experiments are not repeated. Edison repeated experiments frequently. Science is not based on trust. Almost all published papers are basically just “Trust me, bro: this is what I did and this is what it means.” Anonymous peer review amounts to “these smart guys you don’t know say the author is being straight up.” That is not science. You describe your experiment in detail so others can repeat it and verify the results for themselves. That is science.
Medical experiments cannot be repeated, for two reasons. First, they are too expensive: they cost tens of millions of dollars, and no one has the resources to repeat a drug trial and verify the results. Second, even if you somehow gathered the tens of millions of dollars, an ethics committee would forbid you from repeating the experiment. Why? They would argue that you are denying the placebo group a drug of known efficacy, and that is immoral. So not only are drug experiments not science, they can never be science under the current constraints.
Since we cannot repeat the experiments, we are stuck with “Trust me, bro: this is what I did and this is what it means.” I think we can all see the problem with that now.
I have been in the pharma and medical device development field for over 50 years. I have gone from the most basic position of medical writer to director of clinical research, quality assurance, and regulatory affairs. I read your article, and it appears you missed your own point: that the data and publications are tainted. That is absolutely true, but your solution does not address that issue. It is good for the ‘average’ reader to get a good abstract of a publication or series of publications, but that does not solve the problem you identified.