Mathematical Models for Medical Diagnosis

Over the past decades, medical informatics has devoted much attention to mathematical models of medical diagnosis, drawing on various methods of decision-making and reasoning under uncertainty. A brief discussion of these models, based on the conceptual models of medical diagnosis, follows:

  1. Probabilistic Models: One of the first approaches proposed for decision making in the medical field was Bayes’ formula. The basic Bayesian formulation assumes that there is a single cause of the patient’s problems, and that it must be one of a set of known hypotheses. Furthermore, it is assumed that the findings or symptoms associated with diseases are conditionally independent. Given an a priori probability distribution over a set of possible diseases, and a conditional probability distribution telling how the outcome of a test or examination depends on the diseases, Bayes’ theorem allows the a posteriori probability to be computed once the outcome of the test is known. Let $D = \{D_1, \dots, D_n\}$ be the set of possible diagnoses. It is assumed that this set is exhaustive (i.e., no other diagnoses are possible) and that the diagnoses are mutually exclusive. These assumptions assure that:

    $$\sum_{i=1}^{n} P(D_i) = 1, \qquad P(D_i \wedge D_j) = 0 \quad \text{for } i \neq j.$$

    The assumption of exhaustiveness requires that the model be complete for the set of diagnostic problems to which it is applied. Violations of this requirement can lead to nonsense results, because the model insists that one of its hypotheses must be correct. Unless “other diagnoses” is a valid hypothesis, a problem inappropriately brought to a program that assumes exhaustiveness of its hypothesis set will be inappropriately diagnosed. But “other diagnoses” is a difficult hypothesis to characterize: what is its prior likelihood, and what are its predicted manifestations? The second assumption, that only one disease is present at a time (mutual exclusivity), is often erroneous in real cases, especially if the patient is suffering from a chronic disease that is accompanied by an acute disorder. The conditional probability of a symptom S given D is simply the probability that S occurs when D occurs. It is written as P(S|D) and is defined as $P(S|D) = P(S \wedge D) / P(D)$. Bayes’ rule is then:

    $$P(D_i|S) = \frac{P(S|D_i)\,P(D_i)}{P(S)}$$

    where, by the assumption of exhaustiveness (S must be caused by one of the $D_i$), we have:

    $$P(S) = \sum_{j=1}^{n} P(S|D_j)\,P(D_j)$$

    When Bayesian inference involves many possible observations (symptoms), it is common (though not necessarily correct) to assume conditional independence. Mathematically, two symptoms $S_1$ and $S_2$ are conditionally independent given a disease D if and only if:

    $$P(S_1 \wedge S_2 \,|\, D) = P(S_1|D)\,P(S_2|D)$$

    Intuitively, the symptoms are conditionally independent if they are linked only through the diseases that cause them, but are otherwise unrelated. If the symptoms are not conditionally independent, one has to use the joint conditional probabilities, which are then no longer just the product of the individual ones. Thus, if a large number of symptoms are conditionally dependent, a vast number of joint conditional probabilities must be estimated. The ILIAD software is one system that uses a frame-based version of the Bayes model.
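    The single-disease Bayesian computation described above can be sketched in a few lines of Python. This is only an illustration under the stated assumptions (exhaustive, mutually exclusive diagnoses and conditionally independent symptoms); the function name, the disease and symptom names, and all probabilities are hypothetical and not taken from any real system.

```python
from math import prod

def naive_bayes_posterior(priors, likelihoods, findings):
    """Return P(D_i | findings) for every disease D_i.

    priors:      {disease: P(D_i)}, assumed exhaustive and mutually exclusive
    likelihoods: {disease: {symptom: P(symptom present | D_i)}}
    findings:    {symptom: True if present, False if absent}
    """
    joint = {}
    for disease, prior in priors.items():
        # Conditional independence: multiply the per-symptom likelihoods.
        terms = (
            likelihoods[disease][s] if present else 1.0 - likelihoods[disease][s]
            for s, present in findings.items()
        )
        joint[disease] = prior * prod(terms)
    # Exhaustiveness: P(findings) is the sum over all diseases.
    evidence = sum(joint.values())
    return {disease: value / evidence for disease, value in joint.items()}

# Hypothetical numbers, for illustration only.
priors = {"flu": 0.10, "cold": 0.30, "healthy": 0.60}
likelihoods = {
    "flu": {"fever": 0.90, "cough": 0.80},
    "cold": {"fever": 0.20, "cough": 0.70},
    "healthy": {"fever": 0.01, "cough": 0.05},
}
posterior = naive_bayes_posterior(priors, likelihoods, {"fever": True, "cough": True})
```

    With these made-up numbers the most probable diagnosis is "flu", even though its prior is the lowest of the three, because both observed symptoms are far more likely under that hypothesis.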
    The restrictive assumptions of the Bayesian approach described above (exactly one disease, and all symptoms conditionally independent) make effective computation possible, but also make the models unrealistic for many real-world medical problems. Several research groups therefore searched for alternative Bayesian approaches. An especially appropriate formulation of the probabilistic inference problem was worked out by Pearl in the early 1980s. The result of this research is now known as the theory of Bayesian networks. A Bayesian network for a given domain represents the joint probability distribution over the set of variables of the domain as a set of local distributions combined with a set of conditional independence assertions, which together allow the global joint probability distribution to be constructed from the local distributions. According to this theory, if the nodes of a network are $X_1, \dots, X_n$ and the set of parent nodes of $X_i$ is $\mathrm{pa}(X_i)$, then the probability of any particular set of values for all the nodes is:

    $$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \,|\, \mathrm{pa}(X_i))$$

    where, for each variable $X_i$, the parent set $\mathrm{pa}(X_i) \subseteq \{X_1, \dots, X_{i-1}\}$ is a set of variables that renders $X_i$ and $\{X_1, \dots, X_{i-1}\} \setminus \mathrm{pa}(X_i)$ conditionally independent. Usually, one is not interested in the probability of a state of the whole network in which all nodes have specific values, but in a partial state in which only some nodes have specific values. In this case, it is necessary to sum over all possible states of those nodes whose values are not known. Thus, if the values of the first k of n nodes are not known, then:

    $$P(X_{k+1}, \dots, X_n) = \sum_{X_1, \dots, X_k} \prod_{i=1}^{n} P(X_i \,|\, \mathrm{pa}(X_i))$$

    The sum is taken over all possible combinations of values of the unknown nodes $X_1, \dots, X_k$, and each term of the sum is computed from the factorization above. Exact computation in large Bayesian networks is hard, because the number of terms grows exponentially with the number of unknown nodes, and therefore many approximation algorithms have been developed. In fact, the evaluation of large Bayesian networks is still an area of much research. So far, only a few large medical diagnostic applications have been successfully implemented. One example is QMR-DT, the probabilistic reformulation of the Internist-1/QMR knowledge base, in which uncertainty is represented by a Bayesian network.
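    The factorization and the summation over unknown nodes can be illustrated with a brute-force enumeration. The sketch below is not how large networks are evaluated in practice (its cost is exponential in the number of unobserved nodes); the three-node network, its node names, and its probabilities are hypothetical.

```python
from itertools import product

# Hypothetical network: Smoker -> Bronchitis -> Cough (all nodes Boolean).
# Each entry maps a node to (parents, {parent values: P(node is True)}).
NETWORK = {
    "Smoker": ((), {(): 0.3}),
    "Bronchitis": (("Smoker",), {(True,): 0.25, (False,): 0.05}),
    "Cough": (("Bronchitis",), {(True,): 0.8, (False,): 0.1}),
}

def joint_probability(assignment):
    """Product of P(X_i | parents of X_i) for a full assignment of all nodes."""
    p = 1.0
    for node, (parents, cpt) in NETWORK.items():
        p_true = cpt[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p

def marginal(evidence):
    """P(evidence): sum the joint over all value combinations of the other nodes."""
    hidden = [n for n in NETWORK if n not in evidence]
    total = 0.0
    for values in product([True, False], repeat=len(hidden)):
        total += joint_probability({**evidence, **dict(zip(hidden, values))})
    return total

def query(node, evidence):
    """P(node is True | evidence), computed as a ratio of two marginals."""
    return marginal({**evidence, node: True}) / marginal(evidence)

p_cough = marginal({"Cough": True})
p_bronchitis_given_cough = query("Bronchitis", {"Cough": True})
```

    Each call to `marginal` visits 2^k assignments for k unobserved Boolean nodes, which is why exact enumeration does not scale and approximation algorithms are used for large networks.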
  2. Evidence Theoretical Models: As already pointed out, the use of Bayesian methods requires either large amounts of valid data or numerous approximations and assumptions, e.g., exhaustiveness and mutual exclusivity of the hypotheses, and statistical conditional independence of the observations. The success of Bayesian systems therefore depends to a large extent on the availability of good data. Unfortunately, vast portions of medical knowledge rest on so few data and so much imperfect knowledge (e.g., very rare congenital syndromes) that a rigorous probabilistic analysis, the ideal standard by which to judge the rationality of a physician’s decisions, is not possible. As a result, some systems adopted non-probabilistic, less formalized reasoning models based on evidence theory. In effect, these models are approximations to conditional probability. Such techniques are not exact, but since the quantities used reflect judgmental (and thus highly subjective) knowledge, a rigorous application of Bayes’ theorem would not necessarily produce more accurate results either. One of the most famous systems based on evidence theory is MYCIN, which is nowadays often referred to as the “grandfather” of expert systems. MYCIN was developed as a consultation system for selecting antibiotic therapy and for the appropriate management of patients with infections. Its model of inexact reasoning employs a notion of evidential strength that is similar to the concept of confirmation. The degree of confirmation of hypothesis h on the basis of evidence e is written C[h,e]. This roughly parallels the notation for conditional probability, P(h|e). However, manipulating quantitative confirmation values as though they were probabilities quickly leads to apparent inconsistencies or paradoxes. In particular, it is counterintuitive to suggest that the confirmation of the negation of a hypothesis, C[¬h,e], equals 1-C[h,e]; it is therefore assumed that disconfirmation is different from confirmation and must be dealt with separately. This property of the logic of confirmation is essential in all models using an evidence theoretical approach. In MYCIN, the measures of belief and disbelief (MB and MD) were chosen as the confirmation and disconfirmation measures.
    * MB[h,e]=x means “the measure of increased belief in the hypothesis h, based on the evidence e, is x.”
    * MD[h,e]=y means “the measure of increased disbelief in the hypothesis h, based on the evidence e, is y.”
    The evidence e need not be an observed event, but may be a hypothesis (itself subject to confirmation). In accordance with subjective probability theory, the expert’s personal probability P(h) reflects his or her belief in h at any given time. Thus 1-P(h) can be viewed as an estimate of the expert’s disbelief regarding the truth of h. If P(h|e) is greater than P(h), the observation of e increases the expert’s belief in h while decreasing the disbelief regarding the truth of h. In this case, the increased belief in h is given by the ratio:

    $$MB[h,e] = \frac{P(h|e) - P(h)}{1 - P(h)}$$

    On the other hand, if P(h|e) is less than P(h), the decreased belief (disbelief) in h is given by:

    $$MD[h,e] = \frac{P(h) - P(h|e)}{P(h)}$$

    In addition to these two units of measurement, a third measure, termed the Certainty Factor (CF), is defined by combining MB and MD:

    $$CF[h,e] = MB[h,e] - MD[h,e]$$

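    The certainty factor is an artifact for combining degrees of belief and disbelief into a single number. This number is needed in order to facilitate comparisons of the evidential strength of competing hypotheses. A formal specification of the above definitions of MB[h,e] and MD[h,e] in terms of conditional and a priori probabilities can be written as:

    $$MB[h,e] = \begin{cases} 1 & \text{if } P(h) = 1 \\ \dfrac{\max[P(h|e), P(h)] - P(h)}{1 - P(h)} & \text{otherwise} \end{cases}$$

    $$MD[h,e] = \begin{cases} 1 & \text{if } P(h) = 0 \\ \dfrac{P(h) - \min[P(h|e), P(h)]}{P(h)} & \text{otherwise} \end{cases}$$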


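    When the a priori belief in a hypothesis is small (i.e., P(h) is close to zero), the CF of a hypothesis confirmed by evidence e is approximately equal to its conditional probability on that evidence:

    $$CF[h,e] = MB[h,e] - MD[h,e] = \frac{P(h|e) - P(h)}{1 - P(h)} - 0 \approx P(h|e)$$

    where MD[h,e] = 0 in this case, because e confirms h. This observation suggests that confirmation, to the extent that it is adequately represented by CFs, is close to conditional probability (in certain cases), although it still defies analysis as a probability measure. If two rules (two items of evidence) yield two certainty factors X and Y for a hypothesis h, these are combined by:

    $$CF_{\text{combined}}(X, Y) = \begin{cases} X + Y - XY & \text{if } X, Y > 0 \\ X + Y + XY & \text{if } X, Y < 0 \\ \dfrac{X + Y}{1 - \min(|X|, |Y|)} & \text{otherwise} \end{cases}$$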

    As pointed out by Shortliffe, the CF combines knowledge of both P(h) and P(h|e). Since experts often have trouble stating P(h) and P(h|e) in quantitative terms, there is reason to believe that a CF that weights both numbers into a single measure is actually a more natural, intuitive concept (e.g., “I don’t know what the probability is that every patient with increased serum bilirubin level suffers from acute viral hepatitis, but every time I see a case of acute viral hepatitis with increased serum bilirubin level my belief is increased by x that this finding can be found in all cases of this disease”).
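    The certainty-factor arithmetic can be made concrete with a short Python sketch. It assumes the probabilistic definitions of MB and MD given in the text and the commonly cited three-case rule for combining two certainty factors; the function names and the example numbers are hypothetical, and the code is not taken from MYCIN itself.

```python
def measure_of_belief(p_h, p_h_given_e):
    """MB[h,e]: relative increase in belief in h caused by evidence e."""
    if p_h == 1.0:
        return 1.0
    return (max(p_h_given_e, p_h) - p_h) / (1.0 - p_h)

def measure_of_disbelief(p_h, p_h_given_e):
    """MD[h,e]: relative decrease in belief in h caused by evidence e."""
    if p_h == 0.0:
        return 1.0
    return (p_h - min(p_h_given_e, p_h)) / p_h

def certainty_factor(p_h, p_h_given_e):
    """CF[h,e] = MB[h,e] - MD[h,e], a value in [-1, 1]."""
    return measure_of_belief(p_h, p_h_given_e) - measure_of_disbelief(p_h, p_h_given_e)

def combine(x, y):
    """Combine two certainty factors for the same hypothesis.

    Undefined (division by zero) when one factor is +1 and the other is -1.
    """
    if x >= 0 and y >= 0:
        return x + y * (1 - x)
    if x <= 0 and y <= 0:
        return x + y * (1 + x)
    return (x + y) / (1 - min(abs(x), abs(y)))

# With a small prior, the CF is close to the conditional probability.
cf_rare = certainty_factor(p_h=0.01, p_h_given_e=0.60)

# Two confirming rules reinforce each other; a conflicting rule pulls the CF back.
cf_both_confirm = combine(0.6, 0.5)
cf_conflict = combine(0.6, -0.5)
```

    Two confirming factors of 0.6 and 0.5 combine to 0.8, which is larger than either but still below 1, whereas a confirming 0.6 and a disconfirming -0.5 combine to 0.2.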
  3. Fuzzy Set Theory and Fuzzy Logic: L. Zadeh’s theory of fuzzy subsets is an attempt at a mathematical theory of vagueness and uncertainty. Fuzzy set theory is employed to define vague medical entities as fuzzy sets and provides the means for approximate reasoning by adopting the fuzzy compositional rule of inference to calculate the membership grades of patient’s findings to diseases. The relationships between findings and diseases are described by Frequency of Occurrence and Strength of Confirmation values of either statistical or judgmental origin. Furthermore, complex combinations of findings can be evaluated by means of fuzzy logical connectives and are defined with respect to their relationships to diseases.
    In a fuzzy-based model, medical entities are not simply present or absent. When the findings of an actual patient are assessed, each of them is assigned a degree of membership in a set of prototypical conceptual medical entities. The relationship between findings and diseases is described by the fuzzy relation:

    $$R^{o}_{SD} \subset S \times D, \qquad \mu_{R^{o}_{SD}}(S_i, D_j) \in [0, 1],$$

    denoting the presence (frequency of occurrence) relationship between the elements of the non-fuzzy sets of findings S and diseases D. In addition, the confirmation relationship between the findings and diseases is defined as the fuzzy relation:

    $$R^{c}_{SD} \subset S \times D, \qquad \mu_{R^{c}_{SD}}(S_i, D_j) \in [0, 1].$$

    The fuzzy relation $R_S$ denotes the fuzzy set of the assessed findings of the actual patient after the examination procedure:

    $$R_S \subset P \times S, \qquad \mu_{R_S}(p, S_i) \in [0, 1],$$

    with P denoting the non-fuzzy set of patients. The reasoning mechanism consists of calculating three different fuzzy indications by means of the (max-min) composition of fuzzy relations:
    1. Composition for $S_i D_j$ (hypotheses and confirmation):

    $$\mu_{R^{1}_{PD}}(p, D_j) = \max_{S_i} \min\left[\mu_{R_S}(p, S_i),\ \mu_{R^{c}_{SD}}(S_i, D_j)\right]$$

    2. Composition for $S_i D_j$ (exclusion by present symptoms):

    $$\mu_{R^{2}_{PD}}(p, D_j) = \max_{S_i} \min\left[\mu_{R_S}(p, S_i),\ 1 - \mu_{R^{o}_{SD}}(S_i, D_j)\right]$$

    3. Composition for $S_i D_j$ (exclusion by absent symptoms):

    $$\mu_{R^{3}_{PD}}(p, D_j) = \max_{S_i} \min\left[1 - \mu_{R_S}(p, S_i),\ \mu_{R^{o}_{SD}}(S_i, D_j)\right]$$

    where $p \in P$, $S_i \in S$ and $D_j \in D$. All $D_j$ for which $\mu_{R^{1}_{PD}}(p, D_j) = 1$ are assumed to be proven, whereas all $D_j$ for which $\mu_{R^{2}_{PD}}(p, D_j) = 1$ or $\mu_{R^{3}_{PD}}(p, D_j) = 1$ are excluded. All diagnoses $D_j$ where $\varepsilon \le \mu_{R^{1}_{PD}}(p, D_j) < 1$ are treated as diagnostic hypotheses. The boundary value $\varepsilon$ is a heuristic value which excludes diagnoses with very low evidence. Because the values $\mu$ are independent of the number of pieces of evidence that support the diagnostic hypotheses $D_j$, a heuristic function was introduced which considers the number of symptoms, present or absent to a certain degree, that show supportive relationships to the diagnostic hypotheses $D_j$. It calculates support scores $SS_{D_j}$, according to which all diagnostic hypotheses are ranked in descending order.

    This un-normalized function considers the degrees of compatibility of the given symptoms and weights the associated frequency of occurrence and strength of confirmation according to the pre-assigned weights alpha=0.09 and beta=0.91. With these weights, the strength of confirmation influences the support scores about ten times more strongly than the frequency of occurrence. CADIAG-II, a medical diagnostic system based on these methods of fuzzy reasoning, has been successfully applied to several sub-domains of general internal medicine and radiology.
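    The three fuzzy indications can be sketched with a max-min composition, the usual form of the fuzzy compositional rule of inference. The code below assumes that confirmation uses the confirmation relation directly, that exclusion by present symptoms uses the complement of the occurrence relation, and that exclusion by absent symptoms uses the complement of the patient's membership grades; the function names, symptoms, diseases, and membership grades are all hypothetical and do not come from CADIAG-II's knowledge base.

```python
def compose(finding_grades, relation, disease):
    """Max-min composition: max over symptoms of min(finding grade, relation grade)."""
    return max(min(g, relation[s][disease]) for s, g in finding_grades.items())

def indications(patient, occurrence, confirmation, disease):
    """Compute the three fuzzy indications for one patient and one disease."""
    # Complements: degree to which a symptom is absent / does not occur with a disease.
    absent = {s: 1.0 - g for s, g in patient.items()}
    not_occurring = {
        s: {d: 1.0 - v for d, v in row.items()} for s, row in occurrence.items()
    }
    return {
        "confirmation": compose(patient, confirmation, disease),
        "excluded_by_present": compose(patient, not_occurring, disease),
        "excluded_by_absent": compose(absent, occurrence, disease),
    }

# Hypothetical membership grades in [0, 1].
patient = {"fever": 0.8, "joint_pain": 0.0}
occurrence = {
    "fever": {"flu": 0.9, "arthritis": 0.2},
    "joint_pain": {"flu": 0.3, "arthritis": 1.0},
}
confirmation = {
    "fever": {"flu": 0.4, "arthritis": 0.1},
    "joint_pain": {"flu": 0.1, "arthritis": 0.7},
}

flu = indications(patient, occurrence, confirmation, "flu")
arthritis = indications(patient, occurrence, confirmation, "arthritis")
```

    In this toy example "arthritis" is excluded by an absent symptom: joint pain always occurs with it (occurrence grade 1.0) but is fully absent in the patient, so the third indication reaches 1. "flu" remains a hypothesis with a confirmation grade of 0.4.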
  4. Case-Based Reasoning: It has been observed that physicians often relate the present cases to those seen in the past. From this observation in medicine and other fields grew the paradigm of Case-Based Reasoning (CBR). The underlying idea is the assumption that similar problems have similar solutions; so if a suitable measure of similarity exists, the new case can be related to one or more similar past cases in an appropriately indexed database. Though this assumption is not always true, it holds for many practical domains.
    [Figure: The Case-Based Reasoning Cycle]
    The figure shows the case-based reasoning cycle. As depicted in this figure, a case-based reasoner performs two main tasks. The first is retrieval, which is the search for, or the calculation of, the most similar cases. If the case base is rather small, a sequential calculation is possible; otherwise faster non-sequential indexing or classification algorithms should be applied. Much research has been undertaken on this task in recent years, and it has become comparatively easy to find sophisticated CBR retrieval algorithms adequate for nearly every sort of application problem. The second task, adaptation (reuse and revision), means modifying the solutions of former similar cases to fit the current one. If there are no important differences between the current case and a similar case, a simple solution transfer is sufficient. Sometimes only a few substitutions are required, but sometimes the adaptation is a very complicated process. So far, no general adaptation methods or algorithms have been developed; adaptation is still entirely domain dependent. One of the earliest medical expert systems that uses CBR techniques is CASEY, which deals with heart failure diagnosis. The system uses three steps: a search for similar cases; a determination of the differences, and the evidence for them, between the current case and a similar case; and a transfer of the diagnosis of the similar case to the current case or, if the differences between the two cases are too important, an attempt to explain and modify the diagnosis. If no similar case can be found or if all modification attempts fail, CASEY uses a rule-based domain theory. Still, the weakness of case-based reasoning for medical diagnosis remains the difficulty of producing an appropriate metric for deciding that two cases are similar enough in the aspects that matter for solving the problem at hand. One way of overcoming this problem is to narrow the domain while keeping the database as large as possible, so that there are always similar, if not identical, cases to use for reasoning. Thus, all of the case-based programs in the literature are in restricted domains.
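    The retrieval task and the simplest form of reuse (a direct solution transfer) can be sketched as follows. This is a minimal illustration with a sequential search and a weighted similarity measure; it is not CASEY's algorithm, and the feature names, weights, threshold, and cases are hypothetical. Features are assumed to be scaled to [0, 1].

```python
from dataclasses import dataclass

@dataclass
class Case:
    features: dict   # feature name -> value scaled to [0, 1]
    diagnosis: str

def similarity(a, b, weights):
    """Weighted similarity in [0, 1]: 1 minus the normalized weighted distance."""
    total = sum(weights.values())
    distance = sum(w * abs(a[f] - b[f]) for f, w in weights.items())
    return 1.0 - distance / total

def retrieve(case_base, query, weights, k=1):
    """Sequential retrieval: score every stored case and return the k most similar."""
    ranked = sorted(
        case_base,
        key=lambda c: similarity(c.features, query, weights),
        reverse=True,
    )
    return ranked[:k]

def reuse(case_base, query, weights, threshold=0.8):
    """Transfer the best case's diagnosis only if it is similar enough."""
    best = retrieve(case_base, query, weights, k=1)[0]
    score = similarity(best.features, query, weights)
    return (best.diagnosis if score >= threshold else None), score

# Hypothetical case base and weights.
case_base = [
    Case({"ejection_fraction": 0.30, "edema": 1.0}, "heart failure"),
    Case({"ejection_fraction": 0.60, "edema": 0.0}, "no heart failure"),
]
weights = {"ejection_fraction": 2.0, "edema": 1.0}

close_result = reuse(case_base, {"ejection_fraction": 0.35, "edema": 1.0}, weights)
far_result = reuse(case_base, {"ejection_fraction": 0.45, "edema": 0.5}, weights)
```

    The first query is close to a stored case, so its diagnosis is transferred. The second query falls below the similarity threshold and yields no diagnosis, which is the point at which a real system would need adaptation or a fallback such as a rule-based domain theory.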