The Problem of Medical Knowledge Scale

Any program designed to serve as a medical decision-support system contains a store of medical knowledge. Depending on the breadth of its clinical domain, the number of records in its database can range from a few to many thousands. The question is whether existing systems can store the requisite number of facts at a reasonable cost and retrieve them efficiently and effectively.

To answer this question, we must first ask how much a computer program must "know" before it knows all of general internal medicine. Any calculation of this sort is necessarily highly speculative, but it seems certain that the program must have available at least the body of information contained in a standard textbook of medicine. Each of the two most widely used textbooks of medicine, Harrison's Principles of Internal Medicine and Cecil-Loeb's Textbook of Internal Medicine, appears to contain on the order of 200,000 facts. This estimate, however, far understates the total amount of information relevant to the practice of internal medicine. It is clear, for example, that the clinician draws on a fund of basic science information that does not appear in such a textbook. To account for this body of data, we double our estimate to a total of 400,000 facts. Finally, there is a considerable body of information about the real world (life insurance examinations, army physicals, time of day, and seasons of the year) which, we estimate, requires knowledge of another 100,000 facts. This brings us to a total of 500,000 facts. If we now double this value to allow for possible underestimates, we arrive at an upper bound of approximately one million facts as the core body of information in general internal medicine.

The core knowledge embodied in the approximately ten separate subspecialties of internal medicine is, of course, considerably larger. To estimate the volume of clinical information basic to the entire domain of subspecialty medicine, we first estimated the number of facts in textbooks of nephrology, cardiology, and hematology. Each of these subspecialty treatises contains on the order of 60,000 facts. Assuming that the texts for the remaining subspecialties are of similar size, we estimate that the core body of information in all ten medical subspecialty texts combined is about 600,000 facts.
If we assume that approximately one third of this information is duplicated among the specialty fields, we arrive at a total body of 400,000 facts, a value approximately twice the textbook-based estimate of 200,000 facts for general internal medicine. Applying the same ratio of textbook facts to other kinds of relevant information that we used for general medicine, and again correcting for possible underestimates, we calculate that the core of information in the subspecialties of internal medicine does not exceed two million facts.
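
The arithmetic behind these two upper bounds can be retraced in a few lines. The following Python sketch is only an illustration: the function and constant names are hypothetical, and it assumes that "the same ratio" means the overall multiplier from textbook facts to the upper bound for general medicine (one million divided by 200,000, or a factor of five). All numerical inputs are the estimates given in the text.

```python
# Estimates taken from the text; all names here are illustrative, not from the source.
TEXTBOOK_FACTS = 200_000               # facts in one standard textbook of medicine
REAL_WORLD_FACTS = 100_000             # facts about the real world
SUBSPECIALTY_COUNT = 10                # approximate number of subspecialties
FACTS_PER_SUBSPECIALTY_TEXT = 60_000   # facts in one subspecialty treatise


def general_medicine_upper_bound(textbook_facts=TEXTBOOK_FACTS,
                                 real_world_facts=REAL_WORLD_FACTS):
    """Upper bound on core facts in general internal medicine."""
    with_basic_science = 2 * textbook_facts           # double for basic science
    with_real_world = with_basic_science + real_world_facts
    return 2 * with_real_world                        # double for underestimates


def subspecialty_textbook_facts(count=SUBSPECIALTY_COUNT,
                                per_text=FACTS_PER_SUBSPECIALTY_TEXT):
    """Distinct facts across all subspecialty texts, less one third duplication."""
    combined = count * per_text
    return combined - combined // 3


def subspecialty_upper_bound():
    """Upper bound on core facts in the subspecialties.

    Assumption: "the same ratio" is read as the overall multiplier applied
    to the textbook figure in the general medicine estimate.
    """
    ratio = general_medicine_upper_bound() // TEXTBOOK_FACTS
    return ratio * subspecialty_textbook_facts()


if __name__ == "__main__":
    print("General internal medicine:", general_medicine_upper_bound())
    print("Subspecialty textbook facts:", subspecialty_textbook_facts())
    print("Subspecialties upper bound:", subspecialty_upper_bound())
```

Under these assumptions the sketch reproduces the figures above: one million facts for general internal medicine, 400,000 distinct textbook facts for the subspecialties, and two million facts as the subspecialty upper bound. Changing any single input shows how sensitive the bounds are to the admittedly speculative estimates.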