A Model of Differential Diagnosis in Internal Medicine
This article presents ideas aimed at the design and development of a heuristic method to formalize differential diagnosis, with special emphasis on the MedFrame/CADIAG-IV methodology. The method was originally proposed by an experienced clinician in a draft text. The following passages describing the differential diagnosis process are taken (and translated) from the original German text:
“Given is the physician’s current level of expertise in her or his medical domain. This expertise comprises a certain number of diseases and manifestations of medical problems (symptoms, lab tests, findings…) as well as knowledge about the relationships (causal, statistical, empirical…) between them. If the physician is confronted with an actual case, she or he matches the actual findings of the patient to the pool of findings in her or his personal experience and takes into consideration a number of possible diseases. Subsequently, this set of possible diseases is explored further by differential diagnosis. This process can be described as follows:
1. Consciously or unconsciously, the physician takes into consideration the strength of confirmation of the patient’s findings for the distinct diagnoses. The strength of confirmation as assessed by the physician depends on her or his personal experience for several reasons:
1.1. The more diseases she or he associates with a certain finding, the lower the strength of confirmation of that finding for any particular disease.
1.2. If a finding is observed in only one disease, its presence confirms the respective disease.
1.3. Obviously, the assessment of the strength of confirmation is subjective since it depends on the overall number of diseases in the physician’s personal experience.
Mathematically, the strength of confirmation is proportional to the frequency of occurrence of the actual finding with the disease under consideration and inversely proportional to the overall frequency of the finding in the whole spectrum of diseases.
2. On the other hand, since the frequencies of occurrence of findings with a certain disease vary widely, the physician also takes this quantity into consideration.
2.1. The more often a certain finding occurs in a certain disease, the more likely is the disease.
2.2. If a finding obligatorily occurs with a disease and is not found in the actual case, the respective disease can be excluded.
2.3. Again, the assessment of the frequency of occurrence is subjective since it depends on the physician’s personal experience.”
We are now in a position to discuss the concepts of strength of confirmation and frequency of occurrence. Much effort in medical informatics research has been spent on determining statistical quantities that describe the strength of confirmation and the frequency of occurrence (sensitivity) of different manifestations in certain diseases. As a point of departure, a good way to examine the frequentist approach to probability is to calculate strength of confirmation values using $2 \times 2$ contingency tables. The four quadrants of a $2 \times 2$ contingency table represent (a) the number of individuals who show the symptom $S_i$ and are diagnosed with the disease $D_j$, (b) the number of individuals who show the symptom $S_i$ and do not suffer from the disease $D_j$, (c) the number of individuals who do not show the symptom $S_i$ but suffer from the disease $D_j$, and (d) the number of individuals who neither show the symptom $S_i$ nor suffer from the disease $D_j$. The frequentist strength of confirmation value can be derived from this table as follows:
$$\text{strength of confirmation}(S_i, D_j) = \frac{a}{a + b}$$
In analogy, the frequency of occurrence value is given by:
$$\text{frequency of occurrence}(S_i, D_j) = \frac{a}{a + c}$$
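As a minimal sketch of the two quantities, assuming the usual reading of the $2 \times 2$ table (strength of confirmation as a / (a + b), frequency of occurrence as a / (a + c)), the calculation could look as follows. The class name, the function names, and the counts are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """Counts of a 2x2 table for one symptom and one disease."""
    a: int  # symptom present, disease present
    b: int  # symptom present, disease absent
    c: int  # symptom absent, disease present
    d: int  # symptom absent, disease absent


def strength_of_confirmation(t: ContingencyTable) -> float:
    """a / (a + b): share of symptom carriers who have the disease."""
    return t.a / (t.a + t.b)


def frequency_of_occurrence(t: ContingencyTable) -> float:
    """a / (a + c): share of diseased individuals who show the symptom."""
    return t.a / (t.a + t.c)


# Hypothetical counts, chosen only for illustration.
example = ContingencyTable(a=40, b=10, c=60, d=890)
print(strength_of_confirmation(example))  # 0.8
print(frequency_of_occurrence(example))   # 0.4
```

With these counts the symptom is found in fewer than half of the diseased individuals, yet most individuals who show it have the disease, so the two values can differ considerably.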
Obviously, both values reflect a positive relationship (association) between a symptom $S_i$ and a disease $D_j$. However, medical knowledge also comprises negative associations between medical concepts (e.g., if a patient with abdominal pain does not have fever, the diagnostic hypothesis appendicitis becomes less likely, though it is still possible, i.e., not excluded). Accordingly, a consideration of negative evidence is highly desirable to model the differential diagnosis process (cf. the MYCIN approach). A closer look at the figure reveals that the information about negative associations is implicitly contained in the $2 \times 2$ contingency table (see the figure above).
The frequentist “negative” strength of confirmation value (termed strength of exclusion value hereafter) can be derived from the $2 \times 2$ contingency table as follows:
$$\text{strength of exclusion}(S_i, D_j) = \frac{d}{c + d}$$
whereas the “negative” frequency of occurrence value is given by:
$$\text{negative frequency of occurrence}(S_i, D_j) = \frac{d}{b + d}$$
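The negative quantities can be sketched in the same way. The reading used here is an assumption made by analogy with the positive case: the strength of exclusion is taken as d / (c + d), the share of symptom-free individuals who do not have the disease, and the negative frequency of occurrence as d / (b + d), the share of disease-free individuals who do not show the symptom. Function names and counts are hypothetical.

```python
def strength_of_exclusion(c: int, d: int) -> float:
    """Assumed reading: d / (c + d), computed over individuals without the symptom."""
    return d / (c + d)


def negative_frequency_of_occurrence(b: int, d: int) -> float:
    """Assumed reading: d / (b + d), computed over individuals without the disease."""
    return d / (b + d)


# Hypothetical counts for the quadrants b, c and d of the 2x2 table.
b, c, d = 10, 60, 890
print(strength_of_exclusion(c, d))
print(negative_frequency_of_occurrence(b, d))
```

Both functions use only the quadrants that the positive values leave partly unused, which illustrates why the negative association is already contained in the same table.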
However, introducing negative and positive relationships alone would not be sufficient to formalize the negative evidence of medical concepts. Rather, we assign two values to every medical entity: strength of evidence and strength of counterevidence. Both quantities are fuzzy numbers in $[0, 1]$, such that a value of 0 means that there is no evidence (or counterevidence) regarding the respective medical fact, while a value of 1 is interpreted as proof (or exclusion). Intermediate values denote evidence that is not sufficient to prove or exclude the concept.
These two values are independent of each other, and it may occur that both evidence and counterevidence have a value greater than 0 (see the figure above, which also shows how the extreme cases are to be interpreted). This need not be a contradiction as long as at least one of these values is less than 1. However, the interpretation of a situation in which both evidence and counterevidence are found will depend on the specific case; such a decision should not be made automatically but should be left to the physician.
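The pair of values can be illustrated with a small data structure. This is only a sketch of the interpretation described in the text; the class name and the returned labels are hypothetical, and the conflicting case is deliberately not resolved by the code but flagged for the physician.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidencePair:
    """Strength of evidence and counterevidence, each in [0, 1]."""
    evidence: float
    counterevidence: float

    def __post_init__(self) -> None:
        for value in (self.evidence, self.counterevidence):
            if not 0.0 <= value <= 1.0:
                raise ValueError("values must lie in [0, 1]")

    def interpret(self) -> str:
        """Return a hypothetical label for the combination of both values."""
        if self.evidence == 1.0 and self.counterevidence == 1.0:
            return "contradiction"
        if self.evidence > 0.0 and self.counterevidence > 0.0:
            # Both kinds of evidence are present: no automatic decision.
            return "conflicting, left to the physician"
        if self.evidence == 1.0:
            return "proven"
        if self.counterevidence == 1.0:
            return "excluded"
        if self.evidence == 0.0 and self.counterevidence == 0.0:
            return "no information"
        return "partial evidence" if self.evidence > 0.0 else "partial counterevidence"


print(EvidencePair(0.7, 0.0).interpret())
print(EvidencePair(0.7, 0.3).interpret())
```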
Despite the merits of this approach, the major impediment to an actual implementation is the problem of providing an unambiguous definition of $\bar{D}_j$, the complement of the disease $D_j$. Obviously, the number of patients not suffering from a disease $D_j$, e.g., acute viral hepatitis, who show a certain symptom, e.g., pruritus, is different in a department of gastroenterology than in a department of dermatology. Furthermore, it is counterintuitive to assume that in a small sub-speciality of medicine (with only a few known diseases) the strength of confirmation of a certain disease by an unspecific (though in this case highly discriminative) symptom $S_i$, e.g., fever, should be assessed as being inversely proportional to the frequency of the symptom in all known diseases of all medical domains. The resulting strength of confirmation values would be systematically too low.
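One possible approach uses an extended model of the $2 \times 2$ contingency table. Here, the “complementary” diseases that constitute $\bar{D}_j$ are explicitly listed. The main advantage of this approach is that the diffuse concept of $\bar{D}_j$ can be treated more transparently. Accordingly, with $b_k$ denoting the number of individuals who show the symptom $S_i$ and suffer from the complementary disease $D_k$, the calculation of the strength of confirmation can be reformulated as:

$$\text{strength of confirmation}(S_i, D_j) = \frac{a}{a + \sum_{k \neq j} b_k}$$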
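A minimal sketch of the extended table, assuming that the single count of symptom carriers without the disease is replaced by one count per explicitly listed complementary disease. The function name, the disease labels, and the counts are hypothetical.

```python
from typing import Mapping


def strength_of_confirmation_extended(a: int, complementary: Mapping[str, int]) -> float:
    """a / (a + sum of symptom counts over the listed complementary diseases)."""
    return a / (a + sum(complementary.values()))


# Hypothetical counts of individuals who show the symptom, per complementary disease.
complementary_counts = {"complementary disease 1": 6, "complementary disease 2": 4}
print(strength_of_confirmation_extended(40, complementary_counts))  # 0.8
```

Listing the complementary diseases makes explicit which population the denominator refers to, which is the transparency gain the extended table is meant to provide.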

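If the absolute numbers in the $2 \times 2$ contingency table are replaced by conditionally independent probabilities $P(S_i \mid D_k)$ and the diseases $D_1, \dots, D_n$ are assumed to be disjoint entities, the frequentist interpretation of the strength of confirmation value can be derived from Bayes’ rule:

$$P(D_j \mid S_i) = \frac{P(S_i \mid D_j)\, P(D_j)}{\sum_{k=1}^{n} P(S_i \mid D_k)\, P(D_k)}$$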

It should be noted that these assumptions require an estimation of the a priori probabilities of the diseases. However, in our experience, the experts have less difficulty estimating the $P(S_i \mid D_j)$ and $P(D_j)$ values than assessing the $P(D_j \mid S_i)$ values. Because one of the major assumptions of MedFrame/CADIAG-IV is that very rare diseases should be treated in the same way as frequent disorders (i.e., their relative importance is equal), the method neglects the differing a priori probabilities of the diseases and assumes that all a priori probabilities are equal. Under this assumption the priors cancel, and the calculation is simplified even further:

$$\text{strength of confirmation}(S_i, D_j) = \frac{P(S_i \mid D_j)}{\sum_{k=1}^{n} P(S_i \mid D_k)}$$
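The effect of the equal-prior assumption can be sketched with a direct application of Bayes’ rule over a set of disjoint diseases. The function name, the disease labels, and all probabilities below are hypothetical; when no priors are passed, uniform priors are assumed, and they cancel in the ratio.

```python
from typing import Mapping, Optional


def posterior(
    target: str,
    likelihoods: Mapping[str, float],
    priors: Optional[Mapping[str, float]] = None,
) -> float:
    """P(target | symptom) by Bayes' rule over disjoint, exhaustive diseases."""
    if priors is None:
        # Equal a priori probabilities: every disease gets the same weight.
        priors = {disease: 1.0 / len(likelihoods) for disease in likelihoods}
    numerator = likelihoods[target] * priors[target]
    denominator = sum(likelihoods[d] * priors[d] for d in likelihoods)
    return numerator / denominator


# Hypothetical values of P(symptom | disease) for three diseases.
p_symptom = {"D1": 0.8, "D2": 0.1, "D3": 0.1}

print(posterior("D1", p_symptom))                                      # equal priors
print(posterior("D1", p_symptom, {"D1": 0.2, "D2": 0.4, "D3": 0.4}))  # D1 is rarer
```

With equal priors the result depends only on the likelihoods, so a rare disease is not penalized; with explicit priors the same symptom confirms the rarer disease less strongly.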

Another assumption of this approach that has not been mentioned so far is that the set of diagnoses has to be exhaustive (at least for the medical domain under consideration). This prerequisite is imposed by Bayes’ rule. An approximation to this assumption (which can hardly be maintained in a realistic scenario) can be achieved by introducing a weight $w_i$ that denotes the relative importance (frequency) of a finding $S_i$ in the medical domain under consideration compared to the overall importance (frequency) of the finding in all medical domains. As an illustration, the finding rhagades, a banal and unspecific manifestation that can be observed quite frequently in certain liver diseases, is of little significance in the differential diagnosis of liver diseases. By assigning a weight of 0.1 to the finding rhagades, which may be interpreted as the ratio of its frequency of occurrence in liver diseases to its frequency of occurrence in all diseases, the otherwise too high strength of confirmation value (e.g., for acute alcoholic hepatitis) can be reduced:

$$\text{strength of confirmation}(S_i, D_j) = w_i \cdot \frac{P(S_i \mid D_j)}{\sum_{k=1}^{n} P(S_i \mid D_k)}$$
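A minimal sketch of the weighting, assuming that the weight is applied as a simple multiplier to the equal-prior strength of confirmation. This is consistent with reading the weight as the ratio of the finding’s frequency within the domain to its frequency in all diseases, but the multiplicative form, the function name, and the likelihood values are assumptions. Only the weight of 0.1 for rhagades is taken from the text.

```python
from typing import Mapping


def weighted_strength_of_confirmation(
    target: str, likelihoods: Mapping[str, float], weight: float
) -> float:
    """Equal-prior strength of confirmation, scaled by a domain weight in [0, 1]."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * likelihoods[target] / sum(likelihoods.values())


# Hypothetical P(rhagades | disease) within the domain of liver diseases.
p_rhagades = {"acute alcoholic hepatitis": 0.6, "other liver disease": 0.2}

unweighted = weighted_strength_of_confirmation("acute alcoholic hepatitis", p_rhagades, 1.0)
weighted = weighted_strength_of_confirmation("acute alcoholic hepatitis", p_rhagades, 0.1)
print(unweighted, weighted)
```

Within the domain alone the finding appears to confirm the disease strongly; the weight scales the value down to reflect that the finding is common outside the domain as well.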

A further extension of how to deal with $\bar{D}_j$ is based on a model of multiple diagnostic levels or categories, referred to in medicine as a nosology. It can be argued that the main objective of the strength of confirmation values is to discriminate between diagnostic hypotheses within a certain differential diagnostic group. As can easily be seen, the natural candidates for differential diagnosis are groups of diseases that are related to each other by etiology, anatomical structure, or other criteria. If the groups of differential diagnosis candidates are treated as distinct sub-domains of medical knowledge, each of them can be handled in the way described above. To summarize, a taxonomy of disease categories can be used to develop partial characterizations of clinical problems. The use of such a hierarchical structure would enable the development of differential diagnoses in a top-down fashion, with higher-level nodes of the hierarchy acting as milestones in the diagnostic process. This structure has the inherent advantage of permitting a clinical problem to be conceptualized in the most general terms consistent with the data, with the concept being refined as additional evidence is developed.
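The effect of restricting the calculation to one differential diagnostic group can be sketched with a two-level nosology. The grouping, the skin diseases, and all probabilities are hypothetical; the equal-prior strength of confirmation is used, once over the diseases of the group only and once over all listed diseases.

```python
from typing import Dict, List

# Hypothetical two-level nosology: category -> diseases of that diagnostic group.
NOSOLOGY: Dict[str, List[str]] = {
    "liver diseases": ["acute viral hepatitis", "acute alcoholic hepatitis"],
    "skin diseases": ["atopic dermatitis", "psoriasis"],
}

# Hypothetical P(pruritus | disease) values.
P_PRURITUS: Dict[str, float] = {
    "acute viral hepatitis": 0.3,
    "acute alcoholic hepatitis": 0.1,
    "atopic dermatitis": 0.9,
    "psoriasis": 0.6,
}


def group_of(disease: str) -> str:
    """Return the category that lists the given disease."""
    for category, members in NOSOLOGY.items():
        if disease in members:
            return category
    raise KeyError(disease)


def confirmation_within_group(disease: str) -> float:
    """Equal-prior strength of confirmation among the diseases of one group."""
    members = NOSOLOGY[group_of(disease)]
    return P_PRURITUS[disease] / sum(P_PRURITUS[d] for d in members)


def confirmation_over_all(disease: str) -> float:
    """Equal-prior strength of confirmation over every listed disease."""
    return P_PRURITUS[disease] / sum(P_PRURITUS.values())


print(confirmation_within_group("acute viral hepatitis"))
print(confirmation_over_all("acute viral hepatitis"))
```

Within the group of liver diseases the finding discriminates clearly between the two hypotheses, whereas over all diseases its value is diluted by the skin diseases, which is the systematic underestimation the grouping is meant to avoid.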




