In order to evaluate the accuracy of the frequency prediction heuristics, we
manually estimated the values of the arc-weights for the
ASN.1 specification of an e-mail protocol (X.400 P1) (Hoschka, 1995).
We compare these manual frequency estimates with the performance of three
different prediction heuristics:
Figure 1 compares the results of the manual frequency prediction with each of the prediction heuristics for the X.400 benchmark. We first analyze the global quality of the heuristic predictions when compared to the manual prediction. Looking at the most frequently used types to the left of the graphs, we see that in all heuristics the top two most frequently used types are predicted with excellent accuracy.
The largest error is introduced by the array heuristic. With this heuristic, a type that is rarely used in practice is put into the top 20% of the most frequently used types.
This shows that at least for X.400 P1 the assumption that many types will be transmitted as fields of an array does not hold. The definition of an array type does not necessarily mean that an array will actually be transmitted. Array types are used rather to indicate that the arity of a field can be bigger than one, even when it is equal to one in most practical cases. Given that the array construct is the equivalent to a loop in an interface definition, this is an important important observation. It contradicts the hypothesis used for writing optimizers in general-purpose compilers that a program spends most of its time in loops, and that therefore loop optimizations are more important than others.
The excellent prediction results for type frequencies in X.400 P1 are due to the fact that electronic mail messages contain a "central" data type, namely the e-mail address. Addresses are contained in many places of the message: in the sender field, in the recipient field and in the array tracing the message's route through the e-mail transmission system. Consequently, the types making up the address fields are referenced by many other types in the ASN.1 specification. Many distributed applications contain such central data types. We repeated the experiment with two other ASN.1 specification, the X.500 directory service and the Z39.50 information retrieval protocol, and found that the "central" data structure (directory name and record) always ended up in the top 20% of the most frequent types.
Figure 1: Experimental results of comparing prediction heuristics (X.400 P1 benchmark)