Photo by ThisisEngineering RAEng on Unsplash
Scientific progress is notoriously hard to measure. At first glance, we seem to be making great strides:
However, when we look at the effort required to make those discoveries as well as their downstream impacts, we see a different story:
All the innovation and technology we enjoy today is the result of science. Vaccines, cars, planes, cancer treatments, you name it. Before the Wright brothers, flying “like a bird” was thought to be impossible. Now, we complain about the TSA making us remove our shoes.
It’s entirely possible that speeding up the rate of scientific progress can make the difference between intergalactic space travel being possible in our lifetime or not.
In the beginning, science was more local. Most fields of science didn’t exist yet. Of those that did, they didn’t have many sub-fields. Scientific communities were smaller. If you wanted to be a scientist in a particular domain, you moved to wherever those scientists were, and you apprenticed under them. When you had questions, you asked.
Today, science is happening at a massive scale, which has had a few negative effects:
Information overload. The sheer volume of discoveries is unbelievable: about 2.5 million scientific papers are published each year. The more information we have, the more important it is to structure and index it properly. Ask most scientists how they keep up with all the new ideas nowadays, and they'll tell you there are simply too many to follow.
Ian Goodfellow (an esteemed deep learning researcher) said this about keeping up:
Not very long ago I followed almost everything in deep learning, especially while I was writing the textbook. Today that does not seem feasible, and I really only follow topics that are clearly relevant to my own research.
Knowledge fragmentation. Specialization is a natural consequence of completing a PhD. Ideas build on each other. Domains split into subdomains, which split into sub-subdomains, and so on. As a result, even scientists in the same general field can have trouble accessing and understanding each other's work.
Simplicity is not incentivized. More papers to review means journals and conferences take shortcuts when deciding how remarkable a discovery is. Making your discovery seem more innovative and complex can help you get published in the more prestigious journals and conferences.
A metaphor from Chris Olah, a research scientist at Google Brain, on conducting science:
Achieving a research-level understanding of most topics is like climbing a mountain. Aspiring researchers must struggle to understand vast bodies of work that came before them, to learn techniques, and to gain intuition. Upon reaching the top, the new researcher begins doing novel work, throwing new stones onto the top of the mountain and making it a little taller for whoever comes next.
Today, the mountains are taller than ever.
In general, the journey to become (and stay) a scientist requires sifting through and ingesting far more information. There are several tools, like Google and Wikipedia, that help alleviate this problem but don’t solve it completely.
So wait, what's wrong with just Googling what you want to find?
A few things:
And what about Wikipedia?
Wikipedia does help with 2, 3, and 4, but misses two knowledge graph properties: granularity and normalization.
Granularity: Wikipedia is biased towards a consistent level of detail across articles, whatever the subject. If you just want a summary, it’s too verbose: you won’t read most of the page. If you want details, it’s too high level: you’ll have to go elsewhere for a more tailored source. With knowledge graphs, you can continuously zoom in and out depending on what level of abstraction you need.
Normalization: Part of Wikipedia’s verbosity is due to information duplication. If you go to the Wikipedia page on pigs, it tells you that pigs can “acquire human influenza”. You can also find the same information on the Wikipedia page about the flu: “Influenza may also affect other animals, including pigs...” In one-off use cases, like a quick Google search, this isn’t too big of a deal. However, if you’re traversing from node to node (or page to page), this redundancy becomes a bigger factor and degrades the user experience. By “normalizing” the content (i.e. removing redundancy), that problem goes away, as the sketch below illustrates.
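One hypothetical way to picture normalization in code (the node names, fact ids, and structure here are purely illustrative, not any existing tool's schema): each fact is stored exactly once, and every node that needs it simply points at it.

```python
# Each fact lives in exactly one place; nodes reference facts by id, so a
# traversal from "Pig" to "Influenza" never surfaces duplicated text.
# (Names and structure are illustrative only.)

facts = {
    "f1": "Pigs can acquire human influenza.",
}

nodes = {
    "Pig": {"facts": ["f1"]},
    "Influenza": {"facts": ["f1"]},
}

def facts_for(node_name):
    """Resolve a node's fact references to the single stored text."""
    return [facts[fact_id] for fact_id in nodes[node_name]["facts"]]

print(facts_for("Pig"))        # ['Pigs can acquire human influenza.']
print(facts_for("Influenza"))  # same underlying entry, nothing duplicated
```

Editing the fact in one place updates it everywhere it appears, which is exactly the “single source of truth” property that a Wikipedia-style page-per-topic layout can’t give you.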
Let’s brainstorm some qualities this tool would ideally have:
Storing knowledge this way can result in a whole range of applications. A few examples:
To provide a rough sketch of what I have in mind, here’s what a portion of such a graph representing programming languages could look like:
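One hypothetical way to encode such a fragment is as an adjacency list; the node and relation names below are purely illustrative.

```python
# A hypothetical fragment of a programming-languages knowledge graph,
# stored as an adjacency list of (relation, target) edges.

graph = {
    "Programming language": [("has_paradigm", "Object-oriented"),
                             ("has_paradigm", "Functional")],
    "Object-oriented":      [("example", "Java"), ("example", "Python")],
    "Functional":           [("example", "Haskell"), ("example", "Python")],
}

def neighbors(node):
    """Outgoing edges of a node; leaves like 'Java' simply have none."""
    return graph.get(node, [])

# "Zooming in": start at the broad concept and follow edges to specifics.
for relation, target in neighbors("Programming language"):
    print("Programming language", relation, target)
    for sub_relation, leaf in neighbors(target):
        print("   ", target, sub_relation, leaf)
```

Notice that “Python” hangs off both paradigms without being stored twice, and that you can stop traversing at whatever depth matches the level of detail you want.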
Imagine if every expert built their own knowledge graph with the properties specified above.
Because of the “single source of truth” property for every idea, combining them with other knowledge graphs becomes more straightforward. Duplicate ideas are merged, new ideas are coalesced, and contradictory ideas are distilled into commonalities connected by separate branches.
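As a rough sketch of that merge step, assuming both graphs map node names to sets of facts (just one possible representation), duplicate ideas collapse automatically; genuinely contradictory ideas would need more machinery than is shown here.

```python
# Merging two graphs that map node names to sets of facts: shared nodes
# are combined, and duplicate facts collapse because facts live in sets.
# (Representation is assumed for illustration; contradiction handling
# is deliberately left out.)

def merge(graph_a, graph_b):
    """Combine two knowledge graphs, keeping one copy of each shared idea."""
    merged = {node: set(facts) for node, facts in graph_a.items()}
    for node, facts in graph_b.items():
        merged.setdefault(node, set()).update(facts)
    return merged

alice = {"Pig": {"Pigs can acquire human influenza."}}
bob = {
    "Pig": {"Pigs can acquire human influenza.", "Pigs are omnivores."},
    "Influenza": {"Influenza is a viral infection."},
}

print(merge(alice, bob))
# {'Pig': {'Pigs can acquire human influenza.', 'Pigs are omnivores.'},
#  'Influenza': {'Influenza is a viral infection.'}}
```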
Eventually, we will have one massive, centralized knowledge graph that the entire world can access.
Thank you to the Compound Writing members who reviewed this post: Joel Christiansen, Stew Fortier, Ross Gordon, Gian Segato, Nick deWilde, Tom White, and Michael Shafer.