# H. Wellens

## Applications of Procrustes analysis in orthodontic diagnosis and treatment planning

*21-03-2019*

A scientific essay in Medical Sciences

DOCTORAL THESIS defended in public on the 21st of March 2019

**SUMMARY**

In **chapter 1** the background and rationale of this PhD study are explained. Orthodontists still struggle to efficiently distill diagnostic data from cephalometric images and to demonstrate the usefulness of lateral cephalometry in orthodontics. Many explanations have been offered for the difficulties associated with analyzing lateral cephalograms. These can broadly be categorized as technical problems, which together complicate accurate and reliable identification of landmarks, and analytical problems associated with the choice and definition of (reference) landmarks and normative values. Another, often overlooked category is geometrical distortion, which might be defined as ‘the geometric phenomenon allowing patients with seemingly identical mandibulomaxillary relationships to exhibit markedly different cephalometric values’. The central premise of this thesis is that the role of geometrical distortion in lateral cephalometry can be understood more clearly by considering the land-surveyor analogy. Land surveyors also make extensive use of angular and linear measurements to determine, among other things, property boundaries. However, when attempting to apply some of our traditional approaches for assessing sagittal intermaxillary relationships (the ANB angle and Wits appraisal) to the cadastral survey, it becomes intuitively clear that inter-individual variation in the location of the associated reference landmarks and planes essentially forms the basis of geometrical distortion. The absence of any tools to avoid positional uncertainty of the reference structures essentially precludes any definitive statement as to the validity of the resulting measurements. The land-surveyor analogy suggests that geometric distortion might constitute one of lateral cephalometry’s more fundamental problems: what exactly is the relevance of landmarking error in lateral cephalometry if the appropriateness of the measurement’s reference landmarks is doubtful to begin with?
Secondly, the traditional approach to problem-solving in cephalometrics, simply moving the ‘cephalometric measuring-tripod’ to a different set of landmarks or planes from which to perform linear or angular measurements, would seem to serve little purpose other than shifting positional uncertainty from one set of reference landmarks to the next. Land surveyors cleverly avoid this measurement conundrum by employing a fixed, external reference frame, consisting of benchmarks (i.e. points whose location has been determined highly accurately in three spatial dimensions). This raises the question of whether lateral cephalometry could potentially benefit from imitating the land surveyors’ approach, by performing the measurements from a fixed reference frame instead of an inter-individually variable one.

**Chapter 2** provides a relatively simple test of the potential merits of the proposed method by comparing the traditional cephalometric measurements to those obtained when exchanging the patient’s reference points and/or planes with those of a Procrustes superimposed template (the 12-year male-female Bolton template): the ‘normalized’ measurements. The conventional and normalized values were calculated in 71 patients (26 males: mean age 13.1 years, SD 1.1 years; 45 females: mean age 14.6 years, SD 8.2 years). The measurements involved were the ANB angle and Wits appraisal, the individualized ANB angle according to Hussels and Nanda, Järvinen’s floating norm, the APDI (antero-posterior dysplasia index, introduced by Kim and Vietas), perpendicular projections of points A and B onto Hall-Scott’s maxillomandibular bisector, similar projections onto the palatal plane (as proposed by Ferrazzani), onto the Frankfort horizontal plane (introduced by Chang), and onto the SN-line (Taylor), as well as Downs’ AB plane angle. A considerable increase was observed in the correlation between the ‘normalized’ measurements, compared with their conventional counterparts. As an example, the correlation between the conventional ANB angle and Wits appraisal was a moderate ρ=0.624, compared to ρ=0.972 for the normalized measurements. The increased correspondence between the normalized analyses improved the chances of both tests agreeing on the patient’s sagittal discrepancy. Although correlation is no true measure of diagnostic performance, the improved correlations thus decreased the possibility of differing, or even totally opposing, diagnostic outcomes resulting from their application to nonborderline patients. The proposed methodology therefore seemed to merit further investigation.
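
The idea of a ‘normalized’ measurement can be illustrated with a minimal sketch. It assumes 2D landmarks stored as complex numbers, a profile facing the positive x-axis with y pointing upward, and an ordinary least-squares Procrustes fit over all shared landmarks; the thesis summary does not state which landmarks or fitting details were used, so these choices, the function names and the coordinates are illustrative assumptions only.

```python
import cmath
import math

def anb_angle(n, a, b):
    """Signed angle in degrees at N, measured from line N-B to line N-A.

    Landmarks are complex numbers x + yj. With the profile facing +x and
    y pointing upward (an assumption), a positive value means A lies ahead of B.
    """
    return math.degrees(cmath.phase((a - n) / (b - n)))

def fit_template(template, patient):
    """Least-squares similarity fit (shift, scale, rotate) of template onto patient."""
    names = sorted(set(template) & set(patient))
    t_mean = sum(template[k] for k in names) / len(names)
    p_mean = sum(patient[k] for k in names) / len(names)
    num = sum((patient[k] - p_mean) * (template[k] - t_mean).conjugate() for k in names)
    den = sum(abs(template[k] - t_mean) ** 2 for k in names)
    beta = num / den  # complex factor: modulus = scale, argument = rotation
    return {k: p_mean + beta * (z - t_mean) for k, z in template.items()}

# Made-up coordinates in mm: NOT Bolton template data and NOT patient data.
template = {"S": -68 - 20j, "N": 0j, "A": 0 - 55j, "B": -6 - 95j, "Me": -8 - 112j}
patient = {"S": -66 - 25j, "N": 2 + 1j, "A": 1 - 54j, "B": -9 - 93j, "Me": -12 - 110j}

fitted = fit_template(template, patient)

# Conventional: the patient's own nasion is the reference landmark.
conventional = anb_angle(patient["N"], patient["A"], patient["B"])
# Normalized: the superimposed template's nasion replaces the patient's.
normalized = anb_angle(fitted["N"], patient["A"], patient["B"])
print(f"conventional ANB: {conventional:.1f} deg, normalized ANB: {normalized:.1f} deg")
```

In this sketch only the reference landmark N is exchanged, while points A and B remain the patient’s own, which mirrors the description of swapping reference points for those of the superimposed template.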

The decision to use the Bolton 12-year male-female averaged template in determining the normalized measurements was quite arbitrary and raises the question of whether a more population-specific reference frame could be developed, suitable for North-European patients. Further questions arose: should a different reference frame be applied for male and female patients, or for adults versus children? This required scrutinizing and characterizing the patterns of craniofacial variation of the target population, as described in **chapter 3**. One hundred and seventy-eight orthodontic patients (79 male and 99 female) between the ages of 7.5 and 40 years were included. Sixteen skeletal landmarks were digitized in each patient, after which the resulting configurations were subjected to generalized Procrustes superimposition. The male and female subgroups were tested for differences in mean shapes and ontogenetic trajectories. The latter pertain to changes in shape resulting from changes in size. Size, in this context, serves as a proxy for ‘growth and development’. Shape variability was characterized using principal component analysis, applied to the Procrustes superimposed landmark configurations. Furthermore, six different scenarios for craniofacial modularity were tested. The results showed that there were no significant differences between the male and female Procrustes mean shapes (*p*=0.33), although males were on average found to be significantly larger (*p*<0.001). Mild sexual ontogenetic allometric divergence was noted, although the spherical scatter of the male-female point clouds limited the ability to draw definitive conclusions, probably as a result of the cross-sectional nature of the patient sample, without clearly separated age classes. The same sphericity likewise obscured shape differences between older and younger individuals.
When controlling for the effects of allometry, the male-female shape difference became statistically highly significant, albeit clinically still very subtle and probably insignificant. Principal component analysis indicated that of the four retained biologically interpretable components, the two most important sources of variability were vertical shape variation (i.e. dolichofacial vs. brachyfacial growth patterns) and sagittal relationships (maxillary prognathism vs. mandibular retrognathism, and vice versa). Additionally, the presence of an anterior and posterior craniofacial columnar module was confirmed, separated by the pterygomaxillary plane, as proposed by Enlow. These modules can be further subdivided into four sub-modules, involving the posterior skull base, the ethmomaxillary complex, a pharyngeal module, and the anterior part of the jaws. In conclusion, this study provided a population-specific reference frame (the pooled sample mean shape) and quantified the associated shape variation. It also provided evidence that, at least for diagnostic purposes, the use of a single, pooled reference frame would seem to make sense. Chapters two and three suggested that the geometric morphometric framework, involving generalized Procrustes superimposition (GPS) and principal component analysis (PCA), might be helpful in solving some of lateral cephalometry’s more fundamental problems (i.e. geometric distortion). It was not quite clear, however, how our traditional cephalometric measures, such as ANB angle, Wits appraisal, or GoGnSN angle, relate to this new tool.
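
As a rough illustration of the GPS and PCA steps named above, the following sketch aligns 2D landmark configurations (complex numbers) by removing location, centroid size and rotation, and then estimates the first principal component by power iteration. It runs on synthetic configurations generated in the code; the function names, the five-landmark shape and the use of power iteration instead of a full eigen-decomposition are assumptions made for the example, not the procedure or software used in the thesis.

```python
import math
import random

def center_and_scale(config):
    """Remove location and size; return (unit-size shape, centroid size)."""
    centroid = sum(config) / len(config)
    centered = [z - centroid for z in config]
    size = math.sqrt(sum(abs(z) ** 2 for z in centered))
    return [z / size for z in centered], size

def rotate_onto(shape, target):
    """Rotate a centered shape to its least-squares best match with target."""
    r = sum(t * s.conjugate() for s, t in zip(shape, target))
    return [s * r / abs(r) for s in shape]

def generalized_procrustes(configs, n_iter=50, tol=1e-12):
    """Iteratively align all shapes to their (re-normalized) mean shape."""
    shapes, sizes = zip(*(center_and_scale(c) for c in configs))
    shapes, mean = list(shapes), shapes[0]
    for _ in range(n_iter):
        shapes = [rotate_onto(s, mean) for s in shapes]
        average = [sum(col) / len(shapes) for col in zip(*shapes)]
        new_mean, _ = center_and_scale(average)
        change = sum(abs(a - b) ** 2 for a, b in zip(new_mean, mean))
        mean = new_mean
        if change < tol:
            break
    return shapes, mean, list(sizes)

def first_pc(shapes, n_iter=500):
    """First principal component of aligned coordinates via power iteration."""
    flat = [[c for z in s for c in (z.real, z.imag)] for s in shapes]
    n, d = len(flat), len(flat[0])
    col_mean = [sum(r[j] for r in flat) / n for j in range(d)]
    rows = [[r[j] - col_mean[j] for j in range(d)] for r in flat]
    total_var = sum(x * x for r in rows for x in r) / (n - 1)
    v, eig = rows[0][:], 0.0  # start inside the span of the data
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(r, v)) for r in rows]
        w = [sum(s * r[j] for s, r in zip(scores, rows)) / (n - 1) for j in range(d)]
        eig = math.sqrt(sum(x * x for x in w))  # converges to the top eigenvalue
        v = [x / eig for x in w]
    return v, eig / total_var

# Synthetic demo data: one made-up shape, perturbed and randomly posed.
rng = random.Random(0)
base = [0j, 4 + 0j, 5 + 3j, 1 + 4j, -2 + 2j]
configs = []
for _ in range(12):
    noisy = [z + complex(rng.gauss(0, 0.2), rng.gauss(0, 0.2)) for z in base]
    pose = complex(rng.uniform(0.5, 2), rng.uniform(-1, 1))  # scale and rotation
    configs.append([pose * z + complex(3, -2) for z in noisy])

aligned, mean_shape, sizes = generalized_procrustes(configs)
pc1, explained = first_pc(aligned)
print(f"PC1 explains {explained:.0%} of the shape variance in the synthetic sample")
```

The centroid sizes returned alongside the aligned shapes correspond to the ‘size’ variable that the text uses as a proxy for growth and development; regressing shape on that size would be the starting point for an allometry analysis, which is not shown here.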

**Chapter 4** aimed to demonstrate how the hitherto unclear relationship between the shape space defined by the first two principal components (resulting from the PCA) and the aforementioned traditional cephalometric measures may be established, and to elucidate possible clinical applications thereof. In the process, it was hoped the proposed methodology would provide further support for the land-surveyor analogy, by quantifying and visually demonstrating how the traditional lateral cephalometric measures represent compound (and often convoluted) measures of craniofacial shape. Two hundred lateral cephalograms were digitized, after which the resulting landmark configurations were subjected to GPS and PCA. The sample mean shape was then deformed along/parallel to principal components (PCs) 1 and 2, recording the resulting ANB, Wits, and GoGnSN values at each location. This allowed calculating trajectories through the PC1–PC2 space connecting locations with identical values. These were finally utilized to renormalize the PC1–PC2 space. Intriguingly, the resulting Wits appraisal trajectories were almost straight and parallel to PC1. Those for the ANB angle were angled approximately 20 degrees downward relative to PC1, with a more accentuated curvature. The GoGnSN curves were mildly angled relative to the PC2 axis, their curvature increasing slightly with increasing PC1 scores. The trajectories’ curvature and slope, and the way these change over the PC1–PC2 plane, provide further evidence of the often quite complex nature of the craniofacial traits measured by the traditional cephalometric measures. By combining the aforementioned trajectories, it was possible to delineate the region of the PC1–PC2 shape space which would be regarded as normodivergent and skeletal Class I according to traditional lateral cephalometry and to contrast this to those defined by the GPS-PCA approach.
Geometric distortion could be avoided by assigning patients the ANB, Wits, or GoGnSN value of the sample mean shape, deformed to the patient’s position within the PC1–PC2 plot. As mentioned above, correlation calculations clearly do not represent a true measure of diagnostic performance. Although encouraging, the results in chapter two therefore provided only indirect evidence of the proposed method’s merits. Diagnostic performance is usually established using Receiver Operating Characteristic (ROC) curve analysis, which plots the sensitivity (or true-positive ratio) versus 1-specificity (or false-positive ratio) for a full range of possible values of the diagnostic test’s cut-off value. The area under the resulting curve serves as a measure of diagnostic performance: the larger the surface area under the curve (the closer the curve approaches the upper left corner of the graph), the more powerful the test. One of the biggest hurdles in the application of ROC curve analysis in lateral cephalometry has always been the fact that it requires a gold standard, providing the correct answer to the diagnostic question. Until recently, the latter simply did not seem to exist; a problem for which the introduction of the geometric morphometric framework in orthodontics might have provided a convenient solution, as evident from chapter three. Another potential problem is ROC curve analysis’ dichotomous nature, requiring clearly discernible health states in order to provide the black-or-white diagnostic result required to determine the test’s diagnostic power, which would seem to align poorly with the continuous spectrum of facial variation present in the (orthodontic patient) population. Additionally, as evident from the curved, sloped trajectories presented in chapter four, the application of a single, static gold-standard cut-off value for the metric under investigation would seem to make little sense. A floating-norm approach to these cut-off points would seem more appropriate.
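
The ROC construction described above can be sketched in a few lines. The example sweeps the test’s cut-off over all observed values, records sensitivity against 1-specificity, and integrates the curve with the trapezoidal rule. The ANB values and the gold-standard labels are invented for illustration, and the function names are hypothetical; the sketch also assumes that higher test values indicate the condition.

```python
def roc_points(scores, labels):
    """(1 - specificity, sensitivity) pairs for every possible cut-off.

    labels: True where the gold standard says the condition is present.
    A case is called positive when its score is >= the cut-off.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cut-off above every score: nobody is positive
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= cut and l)
        fp = sum(1 for s, l in zip(scores, labels) if s >= cut and not l)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Invented example: ANB values (degrees) and a hypothetical gold standard.
anb_values = [0.5, 1.0, 2.0, 2.5, 3.0, 4.0, 4.5, 5.5, 6.0, 7.0]
gold_class_ii = [False, False, False, True, False, False, True, True, True, True]

curve = roc_points(anb_values, gold_class_ii)
print(f"area under the ROC curve: {auc(curve):.2f}")
```

An area of 0.5 corresponds to a test that performs no better than chance, and an area of 1 to a test that separates the two gold-standard groups perfectly.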

The aim of **chapter 5** therefore was to assess the diagnostic performance of both the conventional and normalized version of the ANB angle and Wits appraisal using an extended version of Receiver Operating Characteristic (ROC) analysis which renders ROC surfaces, instead of curves. The required ‘gold standard’ was derived statistically, by applying generalized Procrustes superimposition (GPS) and principal component analysis (PCA) to the digitized landmarks, and ordering patients based upon their PC2 scores. The patient sample of chapter four was revisited, consisting of 200 lateral cephalograms (107 males, mean age: 12.8 years, SD: 2.2; 93 females, mean age: 13.2 years, SD: 1.7), which were subjected to GPS and PCA. Upon calculating the conventional and normalized ANB and Wits values, ROC surfaces were constructed by varying not just the cephalometric test’s cut-off value within each ROC curve, but also the gold-standard cut-off value over different ROC curves in 220 steps between minus two and two standard deviations along PC2. The volume under the resulting ROC surfaces (VUS) served as a measure of overall diagnostic performance. The statistical significance of the volume differences was determined using permutation tests (1000 rounds, with replacement). Intriguingly, the diagnostic performance of the conventional ANB and Wits was remarkably similar (81.1 and 80.75% VUS, respectively, *p*>0.05). Normalizing the measurements improved all VUS highly significantly (91 and 87.2%, respectively, *p*<0.001). A potentially confusing consequence of also changing the gold-standard cut-off value in ROC surface analysis is that, in doing so, the conventional ROC curve analysis’ three-class problem (Class I, II or III) is effectively turned into a two-class one (less Class II-more Class III, or more Class II-less Class III).
Hence only one ROC curve is reported per measure in ROC surface analyses (which combined create a surface) instead of the usual two for ROC curve analysis (one for Class II/I and one for Class I/III). The conclusion of chapter five was that the conventional ANB and Wits measures of sagittal discrepancy do not differ in their diagnostic performance. Normalizing the measurements did seem to have some merit since the improvements were significant, albeit perhaps not spectacular. The latter may be explained quite simply by the fact that the first two principal components explained a considerable amount, but still only part, of the variability in craniofacial shape.
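
One way to picture the ROC surface is as a stack of ROC curves, one per gold-standard cut-off along PC2. The sketch below makes the simplifying assumption that, for equally spaced gold-standard cut-offs, the volume under the surface can be approximated by the mean area under those curves; the thesis summary does not spell out the exact computation, so this is an interpretation rather than the author’s method. The data are synthetic, the function names are hypothetical, and the orientation of PC2 (whose sign is arbitrary in PCA) is assumed to match that of the test values.

```python
import random
import statistics

def auc(test_values, labels):
    """Area under the ROC curve as the probability that a positive case
    has a higher test value than a negative case (ties count as half)."""
    pos = [v for v, l in zip(test_values, labels) if l]
    neg = [v for v, l in zip(test_values, labels) if not l]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def volume_under_surface(test_values, pc_scores, n_steps=220, sd_range=2.0):
    """Mean AUC over gold-standard cut-offs from -sd_range to +sd_range SD."""
    centre = statistics.mean(pc_scores)
    sd = statistics.stdev(pc_scores)
    areas = []
    for i in range(n_steps):
        z = -sd_range + 2 * sd_range * i / (n_steps - 1)
        labels = [p > centre + z * sd for p in pc_scores]
        if all(labels) or not any(labels):
            continue  # no ROC curve exists when one group is empty
        areas.append(auc(test_values, labels))
    return sum(areas) / len(areas)

# Synthetic demo: a test value that follows PC2 with added noise.
rng = random.Random(1)
pc2 = [rng.gauss(0, 1) for _ in range(60)]
noisy_test = [2 * p + rng.gauss(0, 1) for p in pc2]

vus = volume_under_surface(noisy_test, pc2)
print(f"approximate volume under the ROC surface: {vus:.2f}")
```

Because every gold-standard cut-off splits the sample into only two groups, each slice of the surface is an ordinary two-class ROC curve, which matches the remark above that the three-class problem is reduced to a two-class one.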

**Chapter 6** discusses some methodological issues and outlines future perspectives.