LEARNING-BASED LANDMARK ESTIMATION OF 3D BODY SCANS by AHMED BARUWA A THESIS Presented to the Department of Computer Science and the Division of Graduate Studies of the University of Oregon in partial fulfillment of the requirements for the degree of Master of Science December 2023

THESIS APPROVAL PAGE
Student: Ahmed Baruwa
Title: Learning-Based Landmark Estimation of 3D Body Scans
This thesis has been accepted and approved in partial fulfillment of the requirements for the Master of Science degree in the Department of Computer Science by:
Daniel Lowd, Chair
Humphrey Shi, Core Member
Susan Sokolowski, Core Member
Jacob Searcy, Core Member
and Krista Chronister, Vice Provost of Graduate Studies
Original approval signatures are on file with the University of Oregon Division of Graduate Studies.
Degree awarded December 2023

© 2023 Ahmed Baruwa

THESIS ABSTRACT
Ahmed Baruwa
Master of Science
Computer Science
December 2023
Title: Learning-Based Landmark Estimation of 3D Body Scans
The use of anatomical landmarks spans a diverse set of applications because they are essential for understanding the human body. Several research studies have examined the correlation between body shape variations and human performance. Anatomical landmarks are useful for taking anthropometric measures that can be used to characterize body geometries that relate to human performance. In this thesis, we compare parametric models of the human body developed from two machine learning methods - a convolutional neural network (CNN) and Lasso regression - to serve as tools for scalable anthropometric measurement. The models were trained on two publicly available labeled body scan datasets: the Civilian American and European Surface Anthropometry Resource (CAESAR) and the Shape Retrieval Contest (SHREC) dataset. The models were used to localize human body landmarks in several poses. This work provides a scalable approach for collecting anthropometric measures.

CURRICULUM VITAE
NAME OF AUTHOR: Ahmed Baruwa
GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED:
University of Oregon
Obafemi Awolowo University
DEGREES AWARDED:
Master of Science, Computer Science, 2023, University of Oregon
Bachelor of Science, Electronic and Electrical Engineering, 2019, Obafemi Awolowo University
AREAS OF SPECIAL INTEREST:
Data Science
PROFESSIONAL EXPERIENCE:
Data Analyst, Interswitch, 6 months
Research Engineering Intern, InstaDeep, 4 months
Data Scientist, KPMG, 10 months
GRANTS, AWARDS AND HONORS:
PUBLICATIONS:

ACKNOWLEDGEMENTS
First, I would like to express my deepest appreciation to Professor Daniel Lowd, my advisor, whose guidance was instrumental to the success of this project. I am also incredibly grateful to Professor Susan Sokolowski for her unwavering support throughout my program. Her passion, foresight, and commitment to excellence were truly inspiring. The invaluable support she provided during the critical stages of my program played a pivotal role in my achievements. Furthermore, I extend my sincere gratitude to Professor Jacob Searcy, from whom I gained extensive knowledge about the ever-growing field of data science, and who shared the SMPL models discussed in this project. His willingness to address my numerous inquiries with satisfactory answers was truly invaluable. I am indebted to my closest friends, Wemimo Ayannubi, Fatai Balogun, Seun Fadugba, Jon Rabourn, and Shama Sama, who stood by me steadfastly throughout my program.
Their steadfast support and encouragement were indispensable, and I am grateful to the University of Oregon for bringing us together. To my family, I owe an immeasurable amount of gratitude for their unwavering belief in me. Their late-night calls and well wishes were vital in keeping me motivated and determined. This work was supported by the Wu Tsai Human Performance Alliance and the Joe and Clara Tsai Foundation.

TABLE OF CONTENTS
Chapter Page
I. INTRODUCTION . . . . . . . . . . . . 12
Problem Statement . . . . . . . . . . . . 12
A Parametric Model of the Human Body . . . . . . . . . . . . 13
Significance of Our Work . . . . . . . . . . . . 13
II. LITERATURE REVIEW . . . . . . . . . . . . 14
III. METHODOLOGY . . . . . . . . . . . . 17
Skinned Multi-Person Linear Model (SMPL) . . . . . . . . . . . . 18
Atomistic CNN Model . . . . . . . . . . . . 18
Holistic CNN Model . . . . . . . . . . . . 19
Spatial Transformation with the T-Network . . . . . . . . . . . . 20
Least Absolute Shrinkage and Selection Operator (Lasso) Regression . . 21
IV. RESULTS AND DISCUSSION . . . . . . . . . . . . 22
Datasets . . . . . . . . . . . . 22
Civilian American and European Surface Anthropometry Resource . . . 22
Shape Retrieval Contest (SHREC) 2014 . . . . . . . . . . . . 23
Training Machine Learning Models to Fit Landmarks on 3D Point Clouds . . 25
Implementation Details . . . . . . . . . . . . 26
Holistic CNN versus Atomistic CNN Training Setting . . . . . . . . . . . . 26
The Effect of Dataset Sizes on Landmarking Quality . . . . . . . . . . . . 29
V. CONCLUSION . . . . . . . . . . . . 30
Future Work . . . . . . . . . . . . 30
APPENDICES
A. . . . . . . . . . . . . 32
B. . . . . . . . . . . . . 40
REFERENCES CITED . . . . . . . . . . . . 51

LIST OF FIGURES
Figure Page
1. The three postures in the CAESAR database . . . . . . . . . . . . 23
2. Average squared Euclidean distance between predicted landmarks and ground truth landmarks, comparing holistic CNN and atomistic CNN models on CAESAR data . . . . . . . . . . . . 28
3. Average Euclidean distance between predicted and ground truth landmarks for holistic CNN against atomistic CNN models on SHREC data . . . . . . . . . . . . 28

LIST OF TABLES
Table Page
1. Comparative analysis of measurement errors in Lasso regression and atomistic CNN models on the SHREC dataset. All measurements are in centimeters, representing Euclidean distances between predicted landmark coordinates and ground truth coordinates. . . . . . . . . . . . . 24
2. Comparative analysis of measurement errors in the holistic CNN model with and without the T-Network on the SHREC dataset. All measurements are in centimeters, representing Euclidean distances between predicted landmark coordinates and ground truth coordinates. . . . . . . . . . . . . 24
3. Atomistic CNN vs Surface-to-Surface Registration (STS) for SHREC . . . . . . . . . . . . 25
4. Euclidean distances between predicted and ground truth landmarks for Lasso regression and atomistic CNN models across 74 landmark locations on the CAESAR test set. All measurements are expressed in centimeters. . . . . . . . . . . . . 32
5. Euclidean distances between predicted and ground truth landmarks for holistic CNN with and without T-Network . . . . . . . . . . . . 36
6. Atomistic CNN models for SHREC using varied training set sizes . . . . . . . . . . . . 40
7. Lasso regression models for SHREC using varied training set sizes . . . . . . . . . . . . 41
8. Holistic CNN models for SHREC using varied training set sizes . . . . . . . . . . . . 41
9. Lasso regression models for CAESAR using varied training set sizes . . . . . . . . . . . . 42
10. Holistic CNN models for CAESAR using varied training set sizes . . . . . . . . . . . . 45
11. Atomistic CNN models for CAESAR using varied training set sizes . . . . . . . . . . . . 48

CHAPTER I
INTRODUCTION
Anatomical landmarks are essential for taking anthropometric measurements of the human body. Scholars in the field of human nutrition have proposed a relationship between human performance and 3D body shape metrics such as fat composition (Ng et al., 2016). Quantitative measures such as anatomical landmark locations provide an objective way of understanding human health and performance. Functional product designers require 3D body measurements to develop 2D product patterns or blueprints that fit the human body appropriately. However, many existing 3D body scans are not landmarked, and designers cannot use these scans to create products required for safety and performance. Accurate 3D measurements of the human body are necessary to facilitate the design of good products.

Problem Statement
Manual approaches to identifying anatomical landmarks, such as palpation and hand-picking software, have been used in the past. In this thesis, we investigate the use of machine learning methods to automate the process of landmarking 3D bodies. Automating this process would benefit both medical research and functional product design by making landmark data far faster to obtain.

A Parametric Model of the Human Body
With the advent of artificial intelligence and the amount of storage now available for recording data in numerous formats, it is possible to fit parametric models using iterative means to approximate variations in data.
We derive parametric models of the body using a large database of human 3D body scans. Through this research, we expect to make the process of taking measurements around the human body scalable, which in turn will improve our understanding of the shape of the human body. In this work, we consider two approaches to scalability - the first uses convolutional filters to identify locations on the body, and the second uses a linear approach to fitting models of the human body.

Significance of Our Work
In our research, we develop machine learning models that localize landmarks on the human body using raw point cloud features. Most approaches instead model the human body using hand-engineered features like kernel signatures [Aubry et al., 2011, Yang et al., 2018]. We train two different models - a CNN and a Lasso regression model - and compare the relative performance of an end-to-end neural network to that of a linear machine learning model on 3D scans of the human body. Using our methods, body landmarks can be identified rapidly on multiple point clouds at a time, a scalable alternative to manual procedures.

CHAPTER II
LITERATURE REVIEW
The body geometry of humans reflects a lot about them, and deductions can be made about the nature of an individual from their body shape. A Body Shape Index (ABSI), shown in equation 2.1, is a formula that is often used to predict the risk of premature mortality using three variables - waist circumference (WC), body mass index (BMI), and height:

ABSI = \frac{WC}{BMI^{2/3} \cdot Height^{1/2}}    (2.1)

As an illustration, an adult with WC = 0.90 m, BMI = 25 kg/m^2, and height = 1.75 m has ABSI = 0.90 / (25^{2/3} x 1.75^{1/2}) ≈ 0.080.

Grant et al. [2017] derived a relationship between the ABSI of subjects and mortality from all causes, including cardiovascular (CVD) conditions and cancer, using data collected across 4056 Australian adults. Thomas et al. [2013] introduced a new, height-independent metric, the Body Roundness Index (BRI), as an alternative to body mass index for identifying people at risk of elevated visceral adipose tissue (VAT) volume. The authors discovered this relationship through a study conducted on three quantitative resources: anthropometric measurements of human subjects found in the Third National Health and Nutrition Examination Survey (NHANES III), MRI-measured VAT data of subjects at the St. Luke's Roosevelt Hospital New York Nutrition Obesity Research Center (NORC), and MRI-measured VAT of subjects pooled from several studies conducted at Christian-Albrechts University in Kiel, Germany.

Kuijk et al. [2019] identified shape-dependent risk factors that can cause posterior cruciate ligament (PCL) injuries by carefully studying the knee shapes in Rosenberg radiographs of 94 patients with ruptured PCLs. Sitko et al. [2023] examined anthropometric variations among road cyclists with different performance levels. Additionally, an experiment was conducted to evaluate whether anthropometric measurements could be indicative of the physiological markers commonly used to categorize road cyclists by performance level. The researchers categorized 46 cyclists into groups based on their VO2 max levels - recreationally trained, trained, well-trained, and professional cyclists. Graded exercise tests were conducted, and comprehensive anthropometric assessments were completed as part of the study.
The length of the leg is often used as an index of human physical attractiveness, among other qualities like the nutritional status of infants, health status, and fecundity. Kiire [2016] conducted extensive research to investigate the relationship between leg-to-body ratio (LBR) and attractiveness in humans. In that study, 40 male and 40 female Japanese subjects were invited to rate, on a scale of 1 to 7, the attractiveness of 22 human subjects of mixed sexes; the experiments showed that the measured participants with LBRs closest to the mean LBR were rated most attractive by the human judges.

Machine learning (ML) has seen widespread success in diverse domains such as speech recognition [Baevski et al., 2020] and self-playing agents [Silver et al., 2016]. ML continues to show promise in applications that require automatic landmarking of the human body, and several studies have demonstrated its capabilities for the task. Pinte et al. [2021] used an ML model to localize electrode positions from Magnetic Resonance Imaging (MRI) scans by pre-training it on Ultrashort Echo time (UTE) sequences of MRI images. Hargreaves et al. [2021] performed forensic facial reconstruction on fossil bones using a generative deep learning algorithm trained on a limited amount of learning data. Grishchenko et al. [2022] created "BlazePose GHUM Holistic", a lightweight neural network pipeline for estimating 3D landmarks from monocular images. Giachetti et al. [2014] organized a point-localization contest for automatic landmarking on body scans; however, the contest methods relied on hand-engineered features and a small dataset. In this work, we used ML to automatically identify the 3D coordinates of landmarks in point cloud data from 5000 3D body scans in the 2002 Civilian American and European Surface Anthropometry Resource (CAESAR) database and the SHREC 2014 database. This was done by training a deep neural network on large databases of scans, where useful features were extracted from the data and mapped to landmark locations that can be referenced for anthropometric measurements.

CHAPTER III
METHODOLOGY
Point clouds are a rich, yet sparse, surface representation of non-rigid bodies. Each point cloud is an array of points in 3D space with several fields per point, such as the x, y, z coordinates and the r, g, b color values. Due to the sparsity of point clouds, these data points are often widely dispersed, leading to inherent challenges in accurately representing an object's surface and structure. As a result, careful attention must be devoted to the precise measurement of each attribute to ensure the fidelity of the resulting representation. Notably distinct from other surface representations, point clouds present a unique set of challenges and advantages. Their sparsity requires specialized techniques for efficient storage, processing, and analysis. Moreover, the characteristics of point clouds make them especially suitable for scenarios involving irregular or non-rigid structures, where conventional mesh-based representations may prove less effective. Point clouds have found diverse applications, ranging from 3D object reconstruction and augmented reality to autonomous navigation and environmental monitoring. This chapter discusses two machine learning approaches used to model the parametric relationship between the structural features of human point cloud data and geometric landmark locations using large hand-labeled point cloud datasets.
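To make this data layout concrete, the following minimal NumPy sketch shows one way a single scan can be converted to meters and padded to a fixed size for batching, in the spirit of the preprocessing described in the sections below; the function and the sentinel value are illustrative assumptions, not code from the thesis.

```python
import numpy as np

def pad_point_cloud(points_mm, max_points, pad_value=0.0):
    """Convert one scan from millimeters to meters and pad to a fixed size.

    points_mm: (n, 3) array of x, y, z coordinates recorded in millimeters.
    Returns a (max_points, 3) array plus a boolean mask marking which rows
    are real points rather than padding.
    """
    points_m = np.asarray(points_mm, dtype=np.float32) / 1000.0
    n = points_m.shape[0]
    padded = np.full((max_points, 3), pad_value, dtype=np.float32)
    padded[:n] = points_m
    mask = np.zeros(max_points, dtype=bool)
    mask[:n] = True
    return padded, mask

# A CAESAR-sized scan (~200K vertices) padded to a uniform 256,000 points.
cloud, mask = pad_point_cloud(np.random.rand(200_000, 3) * 1800.0, 256_000)
```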
We compared two machine learning models for fitting point clouds - a convolutional neural network (CNN) and a Lasso regressor. For the CNN, we compared two different training settings - the atomistic CNN model, which learns an individual landmark with a fixed set of parameters, and the holistic CNN model, which learns multiple landmarks with a fixed set of parameters. Our CNN models are adapted from PointNet [Charles et al., 2017], a 30-layer neural network with 2.8 million parameters, which was trained on the ModelNet40 dataset [Wu et al., 2014] for object classification and the ShapeNet dataset [Chang et al., 2015] for part segmentation. PointNet was demonstrated to be capable of solving three tasks - object classification, part segmentation, and semantic segmentation. It is a deep neural network that works efficiently with point cloud data by exploiting the permutation invariance property of point clouds through convolution and max-pooling. The CNN models used in this work are a slight modification of PointNet: we use the output scores of the network as landmark coordinates.

Skinned Multi-Person Linear Model (SMPL)
SMPL models [Bogo et al., 2016] are parameterized models that capture pose and shape variation in 3D bodies. With SMPL models it is possible to obtain a prior that represents a large set of data by minimizing the reconstruction error between the model and every point cloud object. We utilized SMPL to extract representations of the point cloud data, expressed in a low-dimensional space. Each SMPL point cloud fit contained 10,475 points, resulting in an input tensor of size (10,475 x 3) per sample for both the CAESAR and SHREC datasets.

Atomistic CNN Model
In the atomistic CNN training setting, we first preprocessed the dataset and then fitted a convolutional neural network one landmark at a time. During training, the input to the neural network was a batch of 3D body scan point clouds. One of the preprocessing steps was to normalize the point cloud coordinates to meters, as the CAESAR data had mixed metric scales. We padded the point clouds to a uniform size of 256,000 points for CAESAR and 50,000 for SHREC. The resulting tensor was fed to a masking layer, which informs the network of the variable point set sizes of the point clouds in the batch during training. The tensor was next passed through a stack of 1D convolution layers, where each convolution used 128 (1x1) filters. The convolution stack was followed by a global max-pooling layer to aggregate point features and reduce the tensor to a single dimension, effectively making the network invariant to permutations of the input point cloud. The pooled features were then passed through two fully-connected (FC) layers; the first FC layer had 512 channels and the second had three. In total, the atomistic CNN model has 101,123 parameters. All hidden layers were equipped with Leaky ReLU non-linearities. Training was done by optimizing the mean squared error (MSE) objective between predicted and ground truth landmarks using Adam at a learning rate of 1e-3.
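The thesis does not include the training code, so the following is a minimal PyTorch sketch of an architecture consistent with this description. The framework choice and the use of exactly three convolution layers are assumptions, although three 128-filter (1x1) convolutions plus the two FC layers do reproduce the stated total of 101,123 parameters.

```python
import torch
import torch.nn as nn

class AtomisticCNN(nn.Module):
    """Per-landmark regressor: shared (1x1) convolutions over points,
    masked global max-pooling, then two fully-connected layers."""

    def __init__(self):
        super().__init__()
        # Three 1x1 Conv1d layers with 128 filters each (an assumed depth
        # that matches the 101,123-parameter total stated above).
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 128, kernel_size=1), nn.LeakyReLU(),
            nn.Conv1d(128, 128, kernel_size=1), nn.LeakyReLU(),
            nn.Conv1d(128, 128, kernel_size=1), nn.LeakyReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(128, 512), nn.LeakyReLU(),  # first FC layer, 512 channels
            nn.Linear(512, 3),                    # second FC layer: x, y, z
        )

    def forward(self, points, mask):
        # points: (batch, 3, n_points) padded point clouds, in meters.
        # mask:   (batch, n_points), True for real points, False for padding;
        # this plays the role of the masking layer described above.
        feats = self.point_mlp(points)                        # (B, 128, N)
        feats = feats.masked_fill(~mask.unsqueeze(1), float("-inf"))
        global_feat = feats.max(dim=2).values                 # (B, 128)
        return self.head(global_feat)                         # (B, 3)

model = AtomisticCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # MSE between predicted and labeled landmarks
```

Masking the padded rows with negative infinity before the max-pool ensures the global feature depends only on real points, which is one straightforward way to realize the variable-size behavior described above.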
Holistic CNN Model
A unified model that approximates multiple landmark locations might be a fast and transferable alternative to models that can only learn one landmark location at a time. Here, the point cloud undergoes an input transformation and a feature transformation, each performed by a T-network, to align the point cloud in the input space and the embedding space respectively. The T-network comprises three convolution layers followed by three linear layers, thereby increasing the capacity of the model; it learns a transformation matrix that effectively aligns the point cloud. This architecture contains two stages of transformation - a 3x3 input transformation matrix and a 64x64 feature transformation matrix, the latter aligning the learned point features during training. Global max-pooling then reduces the feature tensor by one dimension, and the resulting global features are fed into two fully-connected layers that predict the landmark locations. The output dimension of the entire model is 3C, where C is the number of landmarks. The holistic CNN model has a total of 3,518,247 parameters.

Spatial Transformation with the T-Network
The T-network, derived from [Khan et al., 2022], is an innovative neural network architecture designed to improve the processing of 3D point cloud information through learned spatial transformations. The T-network with its spatial transformer module has demonstrated its effectiveness in various tasks, such as 3D object recognition, scene understanding, and point cloud segmentation [Charles et al., 2017]. By leveraging learned spatial transformations, the T-network enhances the processing of point cloud data, leading to improved results in tasks that require robust spatial reasoning. Its main advantage for point cloud data is that it can handle varying orientations, positions, and scales, equipping the model to landmark point clouds and recognize objects or structures despite different spatial arrangements.

The T-network was specifically featured in the holistic CNN architecture to capture global context and relationships between points in the point cloud. Pointwise classification, by contrast, focuses on classifying individual points within a point cloud independently; the spatial relationships between points are less critical, and the model can often rely on local features. Since the T-network's strength lies in capturing global relationships, it might introduce unnecessary complexity to pointwise landmarking without providing substantial benefits.

Least Absolute Shrinkage and Selection Operator (Lasso) Regression
We fitted a Lasso regression model on raw point cloud features. A crucial preprocessing step was replacing missing point cloud features using mean imputation. We fitted three regressors to represent the x, y, and z coordinate labels. While the CNN models were robust to the unordered nature of the point cloud data and could approximate the landmark locations to a decent level of accuracy, the Lasso regression models struggled: they are linear models with neither non-linear activation units nor permutation-invariant layers. However, because SMPL fits offer a fixed point ordering, the originally large point cloud data could be down-sampled by a factor of 25 while preserving the spatial information of the points.
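As a concrete illustration of this pipeline, here is a short scikit-learn sketch that mean-imputes missing features and fits one Lasso regressor per coordinate. The file names and the regularization strength are hypothetical placeholders, not the thesis's actual configuration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso

# Hypothetical inputs: SMPL fits flattened to (n_scans, n_points * 3),
# and one landmark's ground-truth coordinates of shape (n_scans, 3).
X = np.load("smpl_vertices_flat.npy")   # placeholder file name
Y = np.load("landmark_xyz.npy")         # placeholder file name

# Replace missing point cloud features with the column mean.
X = SimpleImputer(strategy="mean").fit_transform(X)

# One Lasso regressor per coordinate label (x, y, z); alpha is assumed.
models = [Lasso(alpha=0.01).fit(X, Y[:, c]) for c in range(3)]

# Predicted landmark locations, shape (n_scans, 3).
predictions = np.column_stack([m.predict(X) for m in models])
```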
Tables 1 and 2 show the results of landmarking on the SHREC dataset; the Appendix shows the results for CAESAR.

CHAPTER IV
RESULTS AND DISCUSSION
In this chapter, we describe the observations made through the analyses of the methods discussed in the previous chapter.

Datasets
Civilian American and European Surface Anthropometry Resource (CAESAR)
The CAESAR project [Robinette and Daanen, 1999] is a large-scale survey carried out to facilitate the design of apparel and equipment. 3D scans were collected from 5000 individuals (male and female) between the ages of 18 and 65 in three countries - the United States of America, the Netherlands, and Italy - to study the common variations in the human body. By understanding common body shape variations in humans, product designers and engineers can develop discrete sizing for apparel, workstations, and vehicular manufacturing. Because the survey was taken with 3D cameras and the exposure of the different parts of the human body varies with posture, each subject's data was collected in three separate poses, depicted in figure 1: the Standing Posture (A-pose), in which the individual stands with the arms slightly abducted away from the body and the digits pointing downwards; the Seated Comfortable Working Posture (B-pose), in which the subjects place their arms on their thighs; and the Seated Coverage Posture (C-pose), in which the subject raises both arms, digits pointing upward, with the forearms in a horizontal plane forming a right angle at the elbow and also at the knee joint. Although this last pose is much less reproducible, it was deemed to create better surface coverage of the underarms [Brunsman et al., 1997].

FIGURE 1. The three postures in the CAESAR database

A total of 74 landmark positions were recorded per subject per pose in the database. The Cyberware WB4 and Vitronic 3D scanners were used to take measurements in North America and Europe respectively. Each 3D scan was represented as a water-tight mesh averaging 200K vertices. We selected CAESAR partly because it is the largest commercially available database of human 3D scans to date.

Shape Retrieval Contest (SHREC) 2014
The SHREC dataset was collected for a contest in shape retrieval and pattern recognition in which participants were asked to identify six landmarks on 3D body scans using modern geometry and pattern recognition methods [Giachetti et al., 2014]. The scans were acquired using a structured light 3D body scanner (Breuckmann BodyScan). The dataset is split into fifty 3D scans for training and fifty for testing. The participants were tasked to reproduce the coordinates of each of the six landmarks on the 3D body scan. The landmarks were manually annotated on the scans using the point-picking tool of the Meshlab software [Cignoni et al., 2008].

TABLE 1. Comparative analysis of measurement errors in Lasso regression and atomistic CNN models on the SHREC dataset. All measurements are in centimeters, representing Euclidean distances between predicted landmark coordinates and ground truth coordinates.
LANDMARK | Lasso: mean median 95% | Atomistic CNN: mean median 95%
ACROMIALE | 6.63 6.50 9.49 | 2.38 2.11 4.64
ILIOCRISTALE | 2.57 2.41 5.64 | 3.17 3.09 5.81
RADIALE | 1.21 0.97 2.69 | 3.61 2.97 7.32
STYLION | 5.32 5.18 7.50 | 2.85 2.58 5.32
TIBIALE LATERALE | 2.87 2.68 5.32 | 2.51 2.30 4.95
TROCHANTERION | 3.11 3.09 6.88 | 3.12 2.62 4.29

TABLE 2. Comparative analysis of measurement errors in the holistic CNN model with and without the T-Network on the SHREC dataset.
All measurements are in centimeters, representing Euclidean distances between predicted landmark coordinates and ground truth coordinates.
LANDMARK | Holistic CNN (w/ T-net): mean median 95% | Holistic CNN (w/o T-net): mean median 95%
ACROMIALE | 2.70 2.66 4.00 | 4.14 3.35 7.43
ILIOCRISTALE | 3.21 3.54 4.79 | 3.65 3.65 5.95
RADIALE | 5.21 5.80 6.05 | 6.25 6.21 9.09
STYLION | 5.01 5.01 7.69 | 5.23 4.47 10.10
TIBIALE LATERALE | 2.58 2.44 3.59 | 2.82 2.89 4.60
TROCHANTERION | 3.04 3.10 4.54 | 3.11 2.45 4.93

TABLE 3. Comparative analysis of measurement errors in the atomistic CNN model and Surface-to-Surface Registration (STS) on the SHREC dataset. All measurements are in centimeters, representing Euclidean distances between predicted landmark coordinates and ground truth coordinates.
LANDMARK | STS: mean median | Atomistic CNN: mean median
ACROMIALE | 1.24 1.16 | 2.38 2.11
ILIOCRISTALE | 2.29 2.00 | 3.17 3.09
RADIALE | 2.72 2.31 | 3.61 2.97
STYLION | 2.47 1.72 | 2.85 2.58
TIBIALE LATERALE | 1.56 1.55 | 2.51 2.30
TROCHANTERION | 2.58 2.52 | 3.12 2.62

Training Machine Learning Models to Fit Landmarks on 3D Point Clouds
First, we trained a convolutional neural network (CNN) to fit the landmarks. Next, we fitted a Lasso regression model on the datasets. We report performance in terms of the average Euclidean distances between the predicted and ground truth landmark positions on the SHREC dataset, shown in Tables 1 and 2. While the atomistic CNN model outperformed the Lasso regression model on four of the six SHREC landmarks, it performed worse on two particular landmarks, "Iliocristale" and "Radiale", as shown in Table 1. However, the reverse is the case for CAESAR: the Lasso regression model outperformed the atomistic CNN model on 50 out of 74 CAESAR landmarks. Our observation on the use of the T-Network in the CNN architecture is that it improves model performance by up to 53% on the SHREC dataset, as shown in Table 2. In Table 3, we compare our best-performing model on SHREC (the atomistic CNN) with surface-to-surface registration (STS) [Giachetti et al., 2014] and observe that the atomistic CNN errors are relatively close to those of STS on two landmarks, "Radiale" and "Trochanterion". The STS method outperformed the atomistic CNN by a margin of about 1.8 cm on the other landmarks.

A crucial preprocessing step was normalizing the measurements, converting those originally recorded in millimeters to meters, as 25% of the CAESAR measurements were recorded in millimeters. Tables 4 and 5 show the landmarking errors made by the various models on the CAESAR dataset, with the atomistic CNN having the best landmarking performance on the test set for both datasets. We observed that even though the holistic CNN model fitted on the SHREC dataset performed much better on all landmarks when the T-Network was used, this was not always the case on the CAESAR dataset; nevertheless, the holistic CNN model performed better on the majority of the CAESAR landmarks when a T-Network was used. We split the CAESAR dataset into training and test sets in a ratio of 80%/20%, and used a 67%/33% split on the SHREC dataset.

Implementation Details
All experiments were run on the University of Oregon's supercomputer, TALAPAS. Training and testing were done on a Tesla V100 machine. Fitting atomistic CNN models took about 2 hours per landmark on CAESAR and about 30 minutes on SHREC; training a holistic model took 12 hours on CAESAR and about a minute on SHREC. All line plots were made with Plotly Chart Studio (plotly.com). All 3D mesh figures were generated using PyVista.
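The error statistics reported in these tables (mean, median, and 95th percentile of the Euclidean distance) are straightforward to compute; below is a small NumPy sketch of that computation, with array shapes assumed for illustration.

```python
import numpy as np

def landmark_error_stats(pred, truth):
    """Per-landmark Euclidean error statistics, as reported in the tables.

    pred, truth: arrays of shape (n_scans, n_landmarks, 3), in centimeters.
    Returns mean, median, and 95th-percentile error per landmark.
    """
    dists = np.linalg.norm(pred - truth, axis=-1)  # (n_scans, n_landmarks)
    return (dists.mean(axis=0),
            np.median(dists, axis=0),
            np.percentile(dists, 95, axis=0))
```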
Holistic CNN versus Atomistic CNN Training Setting
The CNN models were trained to minimize the mean squared error (MSE) between the predicted landmarks and the manually labeled landmarks. We observed that training an (atomistic) CNN model for each landmark yielded estimations with smaller mean squared errors than a single (holistic) CNN model fitted to represent all landmark locations. This is evident in the scatter and bar plots shown in figures 2 and 3. Figure 2 is a scatter plot comparing the landmark estimation errors of a holistic CNN model against those of an atomistic CNN model on the CAESAR dataset. The majority of the points in the scatter plot lie above the "holistic CNN = atomistic CNN" line, which signifies that the holistic CNN model produced Euclidean distances significantly larger than those of the atomistic CNN model; this might be due to the unrelatedness of the coordinate distributions across the landmark positions within the point cloud data. Figure 3 is a bar plot of the two training settings on SHREC. Comparing the average Euclidean distances between predicted and ground truth landmarks on each dataset, the holistic CNN model makes larger prediction errors in each case. From the results achieved during experimentation, one can argue that training an atomistic model per landmark is a better fit in practice than a single model for all landmarks.

FIGURE 2. Average squared Euclidean distance between predicted landmarks and ground truth landmarks, comparing holistic CNN and atomistic CNN models on CAESAR data

FIGURE 3. Average Euclidean distance between predicted and ground truth landmarks for holistic CNN against atomistic CNN models on SHREC data

The Effect of Dataset Sizes on Landmarking Quality
We conducted an analysis to assess the influence of training set size by training models on both the CAESAR and SHREC datasets using only a fraction of the original training data. The results of this analysis are presented in Tables 6 to 11, with all results obtained from CAESAR included in Appendix B. For each model, we formed three training subsets containing 50%, 25%, and 10% of the training set, and then tested on the entire test set to evaluate the impact of different training set sizes on test performance. Our observations revealed that models trained on larger datasets consistently yielded lower test errors for all SHREC models. However, in the case of CAESAR, when we applied a Lasso regression model to these varying training set sizes, the performance remained relatively similar across all landmarks. For atomistic CNN models, landmarking errors increased as the training set size decreased, while holistic CNN models exhibited slight improvements as we reduced the training set sizes. This suggests that the atomistic CNN models might achieve even better performance if trained on larger datasets.

CHAPTER V
CONCLUSION
Through advancements in computational geometry and machine learning, several methods that estimate the positions of points in point cloud data have been devised. The goal of this thesis project was to develop and compare machine learning methods for landmark estimation on 3D point cloud data, towards health diagnostics and the functional design of products that perform with safety and efficiency.
We developed a scalable method for landmarking human body scans that can be improved upon to create a system that reliably takes measurements around 3D body meshes for commercial purposes. In this work, we created a tool that can predict landmark locations on the human body in multiple poses. Chapter III discusses the methods we used in training our neural network models, and we identified scenarios where training models in a particular fashion produces models that are less error-prone.

Future Work
Due to the graphical nature of 3D meshes, a tool that can automatically estimate point locations and also take surface measurements on human point cloud data could be built by exploring other methods like graph neural networks (GNNs) [Hamilton et al., 2017] and vision transformers [Dosovitskiy et al., 2021]. Ensembling multiple models into one unified model would hypothetically be an advancement over the methods described in this thesis. In future work, ML models will be used to take anthropometric measurements and to train new landmarks - ones different from those in the CAESAR dataset - to enable custom measurement capabilities related to human performance and functional product creation. This research also demonstrated that notoriously tedious anthropometric measurement procedures could be expedited to develop more relevant 2D blueprints, enabling the efficient commercialization of functional products that fit better, improving performance, comfort, and safety. Additionally, future work could explore the use of unsupervised machine learning algorithms to train models for localizing landmarks on the human body.

APPENDIX A
TABLE 4. Euclidean distances between predicted and ground truth landmarks for Lasso Regression and Atomistic CNN models across 74 landmark locations on the CAESAR test set. All measurements are expressed in centimeters.
Landmark Lasso Regression Atomistic CNN mean median 95% mean median 95% 10th Rib Midspine 3.73 3.39 7.30 3.05 1.91 6.13 Butt Block 2.88 2.15 3.20 2.14 3.22 4.95 Cervicale 3.27 3.12 6.18 4.49 4.49 5.51 Crotch 3.46 3.31 6.32 1.79 1.09 4.97 Lt. 10th Rib 4.40 4.30 8.08 1.67 1.58 2.61 Lt. Acromion 3.96 3.65 7.25 4.56 4.55 5.56 Lt. ASIS 4.79 4.44 8.00 2.46 2.34 3.56 Lt. Axilla, Ant 3.91 3.73 7.45 4.21 5.33 8.33 Lt. Axilla, Post. 4.02 3.78 7.51 3.86 5.40 7.14 Lt. Calcaneous, Post. 1.81 1.66 3.48 1.23 2.85 3.36 Lt. Clavicale 3.33 3.10 6.23 3.73 3.34 4.61 Lt. Dactylion 4.22 4.01 7.73 1.33 1.33 1.63 Lt. Digit II 2.30 2.03 4.57 4.41 4.38 5.37 Lt. Femoral Lateral Epicn 3.05 2.85 5.84 4.39 1.43 1.74 Lt. Femoral Medial Epicn 3.79 3.67 6.55 1.42 1.42 1.72 Lt. Gonion 3.44 3.27 6.44 6.24 6.23 7.60 Lt. Humeral Lateral Epicn 4.18 3.95 7.47 3.40 3.37 4.33 Lt. Humeral Medial Epicn 4.01 3.75 7.80 2.22 2.21 2.92 Lt. Iliocristale 4.13 3.85 7.50 1.38 1.30 2.05 Lt. Infraorbitale 3.60 3.30 6.87 8.92 8.92 10.74 Lt. Knee Crease 2.96 2.87 5.51 1.43 1.41 1.70 Lt. Lateral Malleolus 1.83 1.66 3.79 3.75 3.73 4.56 Lt. Medial Malleolus 5.47 5.54 6.84 3.63 3.62 4.43 Lt. Metacarpal-Phal. II 4.07 3.89 7.36 1.03 1.03 1.26 Lt. Metacarpal-Phal. V 4.13 3.93 7.48 1.91 1.91 1.12 Lt. Metatarsal-Phal. I 2.06 1.86 4.07 4.17 4.15 5.11 Lt. Metatarsal-Phal. V 1.91 1.74 3.88 4.25 4.22 5.19 Lt.
Olecranon 4.16 3.72 7.68 2.69 2.66 3.51 Lt. PSIS 4.25 4.05 7.68 6.00 5.72 9.69 Lt. Radial Styloid 4.27 4.11 7.80 7.83 7.80 9.62 Lt. Radiale 4.13 3.91 7.48 3.57 3.55 4.45 Lt. Sphyrion 5.81 5.80 7.31 3.75 3.73 4.55 Lt. Thelion/Bustpoint 4.13 3.87 7.56 2.23 3.35 4.06 Lt. Tragion 3.55 3.35 6.88 2.83 3.82 2.994 Lt. Trochanterion 4.05 3.79 7.27 2.78 2.70 3.951 Lt. Ulnar Styloid 4.25 3.98 8.00 2.77 2.76 4.94 Nuchale 3.71 3.53 7.02 3.69 2.68 3.85 Rt. 10th Rib 4.28 3.99 7.52 2.57 1.47 3.49 33 TABLE 4. Euclidean distances between predicted and ground truth landmark between Lasso Regression and Atomistic CNN models across 74 landmark locations on CAESAR test set. All measurements are expressed in centimeters. Landmark Lasso Regression Atomistic CNN mean median 95% mean median 95% Rt. ASIS 3.80 3.58 7.20 2.37 2.25 3.44 Rt. Acromion 3.82 3.57 7.08 4.32 4.28 5.25 Rt. Axilla, Ant 3.73 3.47 6.83 3.12 2.87 3.91 Rt. Axilla, Post. 3.90 3.62 7.42 2.55 3.30 4.07 Rt. Calcaneous, Post. 1.79 1.63 3.46 2.72 2.06 2.91 Rt. Clavicale 3.27 3.07 6.08 3.70 3.67 4.63 Rt. Dactylion 4.28 4.09 7.76 1.31 1.30 1.58 Rt. Digit II 2.38 2.22 4.41 4.38 4.36 5.35 Rt. Femoral Lateral Epicn 3.08 2.90 5.76 1.43 1.41 1.75 Rt. Femoral Medial Epicn 4.05 3.91 7.07 3.39 3.87 4.31 Rt. Gonion 3.45 3.30 6.51 6.19 6.21 7.59 Rt. Humeral Lateral Epicn 4.14 3.92 7.60 3.15 3.11 4.08 Rt. Humeral Medial Epicn 3.93 3.77 7.18 2.05 2.01 0.27 Rt. Iliocristale 4.00 3.79 7.31 1.32 1.25 1.98 Rt. Infraorbitale 3.62 3.32 6.76 8.94 8.94 10.74 Rt. Knee Crease 3.01 2.85 5.51 1.43 1.41 1.70 Rt. Lateral Malleolus 1.94 1.77 3.74 3.75 3.73 4.56 Rt. Medial Malleolus 2.82 2.70 4.83 3.61 3.59 4.43 Rt. Metacarpal-Phal. II 4.15 3.93 7.54 6.94 6.89 9.07 Rt. Metacarpal-Phal. V 4.19 3.94 7.62 4.07 4.90 5.05 34 TABLE 4. Euclidean distances between predicted and ground truth landmark between Lasso Regression and Atomistic CNN models across 74 landmark locations on CAESAR test set. All measurements are expressed in centimeters. Landmark Lasso Regression Atomistic CNN mean median 95% mean median 95% Rt. Metatarsal-Phal. I 2.19 2.02 4.21 4.17 4.16 5.12 Rt. Metatarsal-Phal. V 2.07 1.92 3.96 4.24 4.22 5.19 Rt. Olecranon 4.11 3.97 7.41 2.49 2.46 3.29 Rt. PSIS 4.01 3.80 7.09 6.10 5.73 9.91 Rt. Radial Styloid 4.20 3.90 7.68 7.47 7.42 9.11 Rt. Radiale 4.12 3.81 7.42 3.27 3.25 4.21 Rt. Sphyrion 3.46 3.33 5.30 3.73 3.73 4.58 Rt. Thelion/Bustpoint 3.94 3.62 7.22 3.15 3.34 5.22 Rt. Tragion 3.54 3.25 6.64 8.25 8.27 9.92 Rt. Trochanterion 4.08 3.87 7.49 2.74 2.65 3.82 Rt. Ulnar Styloid 4.22 3.91 7.53 7.62 7.59 9.29 Sellion 3.74 3.46 7.11 9.94 9.92 11.99 Substernale 3.93 3.76 7.33 1.96 1.81 3.05 Supramenton 3.62 3.34 6.66 7.10 7.05 8.78 Suprasternale 3.28 3.07 6.32 3.56 3.52 4.53 Waist, Preferred, Post. 4.16 3.87 8.09 4.33 4.01 5.13 35 TABLE 5. Euclidean distances between predicted and ground truth landmark the holistic CNN with and without T-Network across 74 landmark locations on CAESAR test set. All measurements are expressed in centimeters. Landmark Holistic CNN (w/o T-net) Holistic CNN (w/ T-net) mean median 95% mean median 95% 10th Rib Midspine 3.77 3.48 7.00 3.51 2.90 7.89 Butt Block 4.12 3.29 4.61 2.63 3.22 3.41 Cervicale 4.26 3.90 7.86 4.78 2.45 13.21 Crotch 3.08 2.85 6.04 2.70 1.97 8.14 Lt. 10th Rib 3.88 3.63 7.34 3.74 3.34 7.92 Lt. Acromion 3.63 3.35 6.73 3.18 2.74 7.06 Lt. ASIS 4.00 3.92 8.06 4.33 2.50 12.18 Lt. Axilla, Ant 4.03 3.70 8.01 3.36 2.48 8.50 Lt. Axilla, Post. 4.14 3.85 7.83 3.91 2.78 9.59 Lt. Calcaneous, Post. 1.60 1.51 2.97 7.18 2.69 19.55 Lt. 
Clavicale 4.19 3.95 8.18 3.95 2.29 10.75 Lt. Dactylion 5.06 4.65 9.85 4.96 3.78 11.79 Lt. Digit II 1.66 1.51 3.16 8.07 2.76 21.99 Lt. Femoral Lateral Epicn 2.53 2.45 4.58 3.62 2.70 9.25 Lt. Femoral Medial Epicn 2.52 2.42 4.94 3.58 2.77 8.87 Lt. Gonion 4.44 4.07 8.76 5.02 2.55 14.04 Lt. Humeral Lateral Epicn 3.82 3.55 7.36 3.60 2.60 8.30 Lt. Humeral Medial Epicn 3.85 3.65 7.24 3.61 2.76 7.83 Lt. Iliocristale 3.95 3.76 6.96 3.55 2.98 7.79 Lt. Infraorbitale 4.65 4.20 8.92 5.46 2.73 15.04 36 TABLE 5. Euclidean distances between predicted and ground truth landmark the holistic CNN with and without T-Network across 74 landmark locations on CAESAR test set. All measurements are expressed in centimeters. Landmark Holistic CNN (w/o T-net) Holistic CNN (w/ T-net) mean median 95% mean median 95% Lt. Knee Crease 2.48 2.35 4.52 4.07 2.76 10.83 Lt. Lateral Malleolus 1.50 1.37 2.76 7.50 2.73 20.69 Lt. Medial Malleolus 1.63 1.60 2.99 5.88 2.32 17.03 Lt. Metacarpal-Phal. II 4.27 3.86 8.46 3.55 2.83 8.18 Lt. Metacarpal-Phal. V 4.07 3.67 8.19 3.77 2.80 9.03 Lt. Metatarsal-Phal. I 1.65 1.59 3.04 7.37 2.61 20.44 Lt. Metatarsal-Phal. V 1.49 1.43 2.77 7.74 2.66 20.94 Lt. Olecranon 3.84 3.52 7.18 3.61 2.61 8.15 Lt. PSIS 3.90 3.63 7.18 3.42 2.90 7.87 Lt. Radial Styloid 3.92 3.46 8.12 3.59 2.78 8.66 Lt. Radiale 3.81 3.60 7.52 3.34 2.43 7.43 Lt. Sphyrion 1.64 1.62 3.08 6.09 2.35 17.19 Lt. Thelion/Bustpoint 4.09 3.76 7.68 3.59 2.73 8.66 Lt. Tragion 4.63 4.30 8.77 5.13 2.53 14.35 Lt. Trochanterion 3.80 3.50 7.06 3.30 2.76 7.44 Lt. Ulnar Styloid 3.81 3.45 7.56 3.50 2.68 8.31 Nuchale 4.61 4.18 8.96 5.58 2.96 14.92 Rt. 10th Rib 4.05 3.89 7.92 3.52 3.12 7.65 Rt. ASIS 3.61 3.31 6.77 3.17 2.67 7.25 Rt. Acromion 4.18 3.90 8.05 4.13 2.62 10.90 37 TABLE 5. Euclidean distances between predicted and ground truth landmark the holistic CNN with and without T-Network across 74 landmark locations on CAESAR test set. All measurements are expressed in centimeters. Landmark Holistic CNN (w/o T-net) Holistic CNN (w/ T-net) mean median 95% mean median 95% Rt. Axilla, Ant 4.06 3.66 7.89 3.67 2.71 8.95 Rt. Axilla, Post. 3.93 3.70 7.64 3.78 2.62 9.10 Rt. Calcaneous, Post. 1.62 1.47 3.20 7.31 2.81 20.38 Rt. Clavicale 4.17 3.88 8.02 3.50 2.13 9.55 Rt. Dactylion 4.93 4.58 9.23 5.17 3.74 13.02 Rt. Digit II 1.83 1.74 3.48 7.60 2.67 20.73 Rt. Femoral Lateral Epicn 2.65 2.49 4.89 4.06 2.85 10.70 Rt. Femoral Medial Epicn 2.41 2.29 4.61 3.54 2.77 9.62 Rt. Gonion 4.68 4.29 8.66 5.14 2.60 14.20 Rt. Humeral Lateral Epicn 3.69 3.41 6.64 3.50 2.59 7.81 Rt. Humeral Medial Epicn 3.72 3.54 3.27 2.85 6.96 Rt. Iliocristale 3.90 3.74 6.93 3.28 2.67 7.57 Rt. Infraorbitale 4.71 4.27 8.94 5.48 2.79 15.30 Rt. Knee Crease 2.46 2.32 4.53 3.93 2.61 10.75 Rt. Lateral Malleolus 1.70 1.55 3.06 6.41 2.47 17.35 Rt. Medial Malleolus 1.69 1.58 3.18 7.01 2.67 19.38 Rt. Metacarpal Phal. II 4.12 3.74 7.84 3.73 2.81 9.39 Rt. Metacarpal-Phal. V 4.04 3.66 8.03 3.48 2.82 8.16 Rt. Metatarsal-Phal. I 1.72 1.61 3.24 7.54 2.73 20.52 Rt. Metatarsal-Phal. V 1.77 1.67 3.42 8.16 2.88 22.04 38 TABLE 5. Euclidean distances between predicted and ground truth landmark the holistic CNN with and without T-Network across 74 landmark locations on CAESAR test set. All measurements are expressed in centimeters. Landmark Holistic CNN (w/o T-net) Holistic CNN (w/ T-net) mean median 95% mean median 95% Rt. Olecranon 3.80 3.47 7.01 3.65 2.84 8.22 Rt. PSIS 3.58 3.26 6.75 3.55 2.98 8.44 Rt. Radial Styloid 3.81 3.45 7.75 3.44 2.75 8.03 Rt. Radiale 3.76 3.48 6.70 3.32 2.54 7.33 Rt. 
Sphyrion 1.72 1.60 3.41 6.27 2.56 17.20 Rt. Thelion/Bustpoint 3.99 3.71 7.77 3.51 2.64 8.59 Rt. Tragion 4.74 4.46 8.83 5.62 2.64 15.65 Rt. Trochanterion 3.67 3.48 6.94 3.38 2.82 7.82 Rt. Ulnar Styloid 3.80 3.43 7.81 3.73 2.91 8.85 Sellion 4.74 4.38 8.93 5.51 2.91 15.14 Substernale 3.89 3.57 7.50 3.24 2.71 7.13 Supramenton 4.50 4.22 8.75 4.49 2.55 12.34 Suprasternale 4.19 3.98 8.08 3.56 2.21 9.66 Waist, Preferred, Post. 3.63 3.33 6.89 3.58 3.02 8.39 39 APPENDIX B ACROMIALE ILIOCRISTALE RADIALE Landmark 50% 25% 10% 50% 25% 10% 50% 25% 10% Mean 3.14 3.07 4.60 3.53 3.81 3.87 3.65 5.17 6.39 Median 3.12 3.01 4.48 3.33 3.29 3.71 3.25 4.66 5.60 95th Percentile 5.56 5.11 8.23 7.60 7.62 7.38 7.05 11.13 11.84 STYLION TIBIALE LATERALE TROCHANTERION Landmark 50% 25% 10% 50% 25% 10% 50% 25% 10% Mean 3.50 4.65 7.66 3.07 3.72 4.17 3.43 3.90 4.88 Median 2.77 4.43 7.07 2.99 3.41 3.82 3.28 3.61 4.65 95th Percentile 7.09 8.94 14.23 5.48 6.05 7.31 6.81 8.19 8.49 TABLE 6. Test errors from Atomistic CNN models trained on SHREC using varied training set sizes - 50%, 25% and 10% of the original training set. All measurements are expressed in centimeters, representing Euclidean distances between predicted landmark coordinates and ground truth coordinates. 40 ACROMIALE ILIOCRISTALE RADIALE Landmark 50% 25% 10% 50% 25% 10% 50% 25% 10% Mean 4.30 7.14 12.73 4.92 2.48 6.71 4.45 10.38 23.54 Median 4.26 6.89 12.95 4.93 2.27 6.17 4.36 10.48 23.10 95th Percentile 7.08 10.08 15.04 8.79 6.59 10.18 7.29 13.79 29.06 STYLION TIBIALE LATERALE TROCHANTERION Landmark 50% 25% 10% 50% 25% 10% 50% 25% 10% Mean 7.53 6.71 32.88 11.5 27.52 20.22 2.46 17.49 16.21 Median 7.49 6.71 32.97 11.26 27.52 20.05 1.88 17.37 16.23 95th Percentile 10.10 8.54 37.40 13.89 30.98 23.61 6.67 21.69 20.04 TABLE 7. Test errors from Lasso regression models trained on SHREC using varied training set sizes - 50%, 25% and 10% of the original training set. All measurements are expressed in centimeters, representing Euclidean distances between predicted landmark coordinates and ground truth coordinates. ACROMIALE ILIOCRISTALE RADIALE Landmark 50% 25% 10% 50% 25% 10% 50% 25% 10% Mean 6.77 6.03 12.73 1.14 2.48 6.71 10.52 10.38 23.54 Median 5.02 6.89 12.95 3.09 2.27 6.17 11.52 10.48 23.10 95th Percentile 9.47 10.08 15.04 3.39 6.59 10.18 12.59 13.79 29.06 STYLION TIBIALE LATERALE TROCHANTERION Landmark 50% 25% 10% 50% 25% 10% 50% 25% 10% Mean 7.83 16.71 32.88 5.44 27.52 20.22 9.93 17.49 16.21 Median 8.00 11.09 32.97 6.28 27.52 20.05 11.06 17.37 16.23 95th Percentile 10.80 18.54 37.40 9.44 30.98 23.61 19.93 21.69 20.04 TABLE 8. Test errors from holistic CNN models trained on SHREC using varied training set sizes - 50%, 25%, and 10% of the original training set. All measurements are expressed in centimeters, representing Euclidean distances between predicted landmark coordinates and ground truth coordinates. 41 42 TABLE 9. Table 9. Test errors from Lasso Regression models trained on CAESAR using varied training set sizes - 50%, 25% and 10% of the original training set. All measurements are expressed in centimeters, representing Euclidean distances between predicted landmark coordinates and ground truth coordinates. 
Lasso (50%) Lasso (25%) Lasso (10%) Landmark Names Mean Median 95th Percentile Mean Median 95th Percentile Mean Median 95th Percentile 10th Rib Midspine 3.73 3.42 7.42 3.71 3.41 7.48 3.70 3.41 7.53 Butt Block 4.20 3.89 8.13 4.20 3.89 8.13 4.10 3.81 7.96 Cervicale 3.27 3.11 6.18 3.26 3.05 6.10 3.34 3.19 6.27 Crotch 3.45 3.29 6.30 3.49 3.30 6.47 3.55 3.36 6.62 Lt. 10th Rib 4.39 4.32 8.10 4.40 4.28 7.98 4.35 4.24 7.86 Lt. ASIS 3.96 3.62 7.32 3.95 3.68 7.32 3.92 3.57 7.15 Lt. Acromion 4.71 4.34 8.19 4.77 4.53 7.99 4.66 4.49 7.67 Lt. Axilla, Ant 3.88 3.69 7.43 3.91 3.73 7.42 3.80 3.56 7.24 Lt. Axilla, Post. 4.02 3.79 7.52 4.05 3.79 7.58 3.90 3.62 7.04 Lt. Calcaneous, Post. 1.81 1.67 3.47 1.80 1.65 3.50 1.82 1.65 3.45 Lt. Clavicale 3.33 3.10 6.20 3.34 3.15 6.23 3.31 3.10 6.02 Lt. Dactylion 4.13 3.92 7.49 4.17 3.92 7.59 4.26 4.02 7.86 Lt. Digit II 2.30 2.03 4.56 2.30 2.05 4.54 2.31 2.04 4.55 Lt. Femoral Lateral Epicn 3.04 2.85 5.86 3.04 2.86 5.78 3.01 2.85 5.79 Lt. Femoral Medial Epicn 4.04 3.95 7.02 4.16 4.03 6.99 3.85 3.74 6.71 Lt. Gonion 3.43 3.22 6.47 3.50 3.37 6.50 3.40 3.21 6.37 Lt. Humeral Lateral Epicn 4.19 3.95 7.52 4.18 3.99 7.45 4.17 3.98 7.42 Lt. Humeral Medial Epicn 4.02 3.76 7.76 4.01 3.74 7.74 4.01 3.67 7.68 Lt. Iliocristale 4.14 3.86 7.44 4.13 3.86 7.41 4.07 3.78 7.48 Lt. Infraorbitale 3.61 3.33 6.88 3.58 3.35 6.67 3.54 3.30 6.68 Lt. Knee Crease 2.96 2.87 5.40 2.95 2.88 5.46 2.91 2.81 5.38 Lt. Lateral Malleolus 1.83 1.66 3.79 1.83 1.65 3.78 1.84 1.66 3.76 Lt. Medial Malleolus 5.16 5.21 6.52 4.74 4.74 6.02 5.48 5.54 6.84 Lt. Metacarpal-Phal. II 4.00 3.86 7.20 4.03 3.88 7.39 4.06 3.87 7.22 Lt. Metacarpal-Phal. V 4.07 3.88 7.48 4.12 3.91 7.63 4.04 3.87 7.20 43 TABLE 9. Test errors from Lasso Regression models trained on CAESAR using varied training set sizes - 50%, 25% and 10% of the original training set. All measurements are expressed in centimeters, representing Euclidean distances between predicted landmark coordinates and ground truth coordinates. Lasso (50%) Lasso (25%) Lasso (10%) Landmark Names Mean Median 95th Percentile Mean Median 95th Percentile Mean Median 95th Percentile Lt. Metatarsal-Phal. I 2.06 1.86 4.06 2.06 1.88 4.16 2.06 1.88 4.14 Lt. Metatarsal-Phal. V 1.91 1.74 3.88 1.91 1.75 3.74 1.91 1.73 3.76 Lt. Olecranon 4.17 3.78 7.69 4.17 3.77 7.59 4.14 3.76 7.57 Lt. PSIS 4.26 4.05 7.74 4.23 3.98 7.65 4.19 3.87 7.71 Lt. Radial Styloid 4.22 4.09 7.74 4.16 4.05 7.70 4.24 4.15 7.69 Lt. Radiale 4.14 3.88 7.54 4.14 3.95 7.50 4.12 3.93 7.48 Lt. Sphyrion 5.81 5.80 7.30 6.96 6.97 8.44 3.25 3.17 4.80 Lt. Thelion/Bustpoint 4.12 3.87 7.65 4.13 3.90 7.47 4.10 3.88 7.48 Lt. Tragion 3.54 3.34 6.93 3.55 3.34 6.75 3.53 3.36 6.85 Lt. Trochanterion 4.06 3.76 7.18 4.02 3.82 7.38 4.01 3.77 7.26 Lt. Ulnar Styloid 4.23 3.99 8.05 4.19 3.95 7.77 4.21 3.98 7.75 Nuchale 3.77 3.54 7.16 3.68 3.48 6.86 3.65 3.42 7.02 Rt. 10th Rib 4.28 4.01 7.51 4.28 4.01 7.47 4.21 3.95 7.37 Rt. ASIS 3.80 3.58 7.14 3.79 3.53 7.16 3.76 3.49 7.25 Rt. Acromion 3.84 3.60 7.10 3.81 3.53 6.98 3.80 3.59 6.94 Rt. Axilla, Ant 3.75 3.47 6.90 3.74 3.44 6.80 3.71 3.43 6.82 Rt. Axilla, Post. 3.95 3.66 7.52 4.07 3.75 7.81 3.93 3.88 7.29 Rt. Calcaneous, Post. 1.79 1.63 3.44 1.79 1.60 3.48 1.80 1.63 3.48 Rt. Clavicale 3.28 3.07 6.05 3.28 3.07 5.93 3.25 3.06 5.90 Rt. Dactylion 4.26 4.08 7.69 4.04 3.85 7.32 4.26 4.12 7.70 Rt. Digit II 2.38 2.22 4.40 2.37 2.23 4.44 2.37 2.21 4.40 Rt. Femoral Lateral Epicn 3.08 2.91 5.70 3.07 2.91 5.63 3.06 2.87 5.55 Rt. Femoral Medial Epicn 4.24 4.13 7.32 4.70 4.56 8.00 4.11 3.98 6.98 Rt. 
Gonion 3.46 3.32 6.48 3.45 3.28 6.45 3.43 3.24 6.57 44 TABLE 9. Test errors from Lasso Regression models trained on CAESAR using varied training set sizes - 50%, 25% and 10% of the original training set. All measurements are expressed in centimeters, representing Euclidean distances between predicted landmark coordinates and ground truth coordinates. Lasso (50%) Lasso (25%) Lasso (10%) Landmark Names Mean Median 95th Percentile Mean Median 95th Percentile Mean Median 95th Percentile Rt. Humeral Lateral Epicn 4.14 3.91 7.57 4.14 3.92 7.52 4.16 3.89 7.52 Rt. Humeral Medial Epicn 3.93 3.74 7.16 3.94 3.73 7.18 3.96 3.75 7.35 Rt. Iliocristale 4.00 3.85 7.33 4.00 3.76 7.30 3.94 3.72 7.33 Rt. Infraorbitale 3.62 3.33 6.83 3.57 3.28 6.65 3.52 3.25 6.56 Rt. Knee Crease 2.98 2.75 5.46 3.00 2.84 5.23 2.94 2.83 5.37 Rt. Lateral Malleolus 1.94 1.77 3.73 1.94 1.76 3.65 1.94 1.76 3.78 Rt. Medial Malleolus 2.61 2.50 4.55 3.42 3.35 5.36 3.36 3.28 5.27 Rt. Metacarpal Phal. II 4.18 3.94 7.55 3.83 3.63 6.90 3.86 3.73 6.98 Rt. Metacarpal-Phal. V 4.24 3.98 7.72 3.98 3.73 7.29 4.02 3.79 7.33 Rt. Metatarsal-Phal. I 2.20 2.02 4.19 2.19 2.04 4.20 2.19 2.02 4.21 Rt. Metatarsal-Phal. V 2.07 1.92 3.96 2.06 1.91 3.98 2.06 1.93 3.96 Rt. Olecranon 4.11 4.01 7.46 4.10 3.99 7.42 4.13 4.01 7.57 Rt. PSIS 4.02 3.85 7.12 3.99 3.75 7.06 3.96 3.68 7.04 Rt. Radial Styloid 4.33 3.95 7.87 3.95 3.62 7.20 4.02 3.76 7.25 Rt. Radiale 4.12 3.84 7.40 4.11 3.83 7.35 4.14 3.87 7.36 Rt. Sphyrion 3.99 3.87 5.75 3.63 3.52 5.52 2.71 2.61 4.51 Rt. Thelion/Bustpoint 3.95 3.61 7.13 3.92 3.62 7.20 3.91 3.55 7.36 Rt. Tragion 3.55 3.30 6.72 3.48 3.26 6.55 3.41 3.16 6.39 Rt. Trochanterion 4.08 3.85 7.47 4.06 3.89 7.42 4.04 3.83 7.42 Rt. Ulnar Styloid 4.35 4.02 7.90 4.04 3.78 7.15 4.13 3.81 7.43 Sellion 3.75 3.49 7.20 3.64 3.39 6.83 3.59 3.40 6.88 Substernale 3.94 3.75 7.38 3.97 3.79 7.23 3.92 3.71 7.13 Supramenton 3.62 3.38 6.72 3.62 3.35 6.61 3.60 3.38 6.71 Suprasternale 3.28 3.06 6.35 3.28 3.11 6.26 3.26 3.12 6.20 Waist, Preferred, Post. 4.17 3.87 8.10 4.13 3.82 8.00 4.12 3.82 7.97 45 TABLE 10. Test errors from holistic CNN models trained on CAESAR using varied training set sizes, 50% - 25% and 10% of the original training set. All measurements are expressed in centimeters, representing Euclidean distances between predicted landmark coordinates and ground truth coordinates. Holistic CNN (50%) Holistic CNN (25%) Holistic CNN (10%) Landmark Names Mean Median 95th Percentile Mean Median 95th Percentile Mean Median 95th Percentile 10th Rib Midspine 7.72 7.75 13.01 6.12 5.77 11.56 6.51 6.15 11.87 Butt Block 9.31 9.82 11.43 8.33 9.04 10.70 8.13 8.55 14.31 Cervicale 12.24 11.99 20.18 12.80 13.10 20.83 12.32 12.34 21.32 Crotch 6.93 6.66 12.11 6.47 6.07 11.83 6.39 5.94 11.75 Lt. 10th Rib 6.61 6.41 11.55 6.16 5.99 10.86 7.47 7.44 12.55 Lt. ASIS 6.59 6.39 10.46 6.70 6.54 10.96 7.34 6.97 12.47 Lt. Acromion 13.41 14.39 22.16 11.90 12.47 20.52 10.10 10.71 17.17 Lt. Axilla, Ant 8.98 8.87 15.60 9.97 9.87 16.84 9.54 9.46 16.26 Lt. Axilla, Post. 10.38 10.49 17.00 11.55 11.55 17.90 9.86 9.70 16.03 Lt. Calcaneous, Post. 24.99 26.64 30.42 21.52 21.79 27.82 18.42 20.54 24.42 Lt. Clavicale 13.09 13.27 20.64 9.00 8.61 16.53 8.92 8.34 18.58 Lt. Dactylion 16.21 16.15 21.74 13.51 13.17 19.63 13.28 13.58 20.28 Lt. Digit II 26.27 27.23 31.42 21.58 22.29 27.01 22.87 26.03 28.13 Lt. Femoral Lateral Epicn 15.57 15.75 19.74 11.57 11.55 15.55 11.10 11.26 16.39 Lt. Femoral Medial Epicn 14.72 15.64 20.13 11.32 11.46 16.75 12.91 13.92 19.23 Lt. 
Gonion 14.23 14.69 22.57 12.14 12.19 19.87 13.11 13.10 21.46 Lt. Humeral Lateral Epicn 9.34 9.31 15.01 10.89 10.78 15.45 8.37 8.11 14.15 Lt. Humeral Medial Epicn 8.29 8.17 13.51 9.19 8.89 14.41 8.94 8.79 14.98 Lt. Iliocristale 8.05 7.92 12.65 6.54 6.24 11.04 7.59 7.36 12.56 Lt. Infraorbitale 13.80 13.87 22.86 13.74 14.14 21.90 12.94 12.81 21.37 Lt. Knee Crease 13.61 13.93 18.60 12.13 12.06 16.72 12.60 13.23 19.14 Lt. Lateral Malleolus 24.99 26.50 30.04 22.48 22.75 29.58 18.29 20.29 23.98 Lt. Medial Malleolus 21.01 22.83 27.44 20.37 21.72 27.75 17.44 19.55 24.16 Lt. Metacarpal-Phal. II 12.64 12.67 17.64 12.87 12.89 17.79 12.69 13.11 19.36 Lt. Metacarpal-Phal. V 12.91 12.96 18.31 11.55 11.43 16.45 11.23 11.41 17.08 46 TABLE 10. Test errors from holistic CNN models trained on CAESAR using varied training set sizes - 50%, 25% and 10% of the original training set. All measurements are expressed in centimeters, representing Euclidean distances between predicted landmark coordinates and ground truth coordinates. Holistic CNN (50%) Holistic CNN (25%) Holistic CNN (10%) Landmark Names Mean Median 95th Percentile Mean Median 95th Percentile Mean Median 95th Percentile Lt. Metatarsal-Phal. I 26.49 27.87 31.37 24.38 25.18 30.50 20.41 23.11 25.26 Lt. Metatarsal-Phal. V 26.05 27.52 31.10 23.27 23.52 29.55 23.13 24.97 28.99 Lt. Olecranon 9.70 9.88 15.29 9.58 9.34 14.65 7.88 7.56 13.01 Lt. PSIS 6.31 6.10 10.72 6.02 5.63 11.07 7.24 6.69 13.33 Lt. Radial Styloid 10.06 9.99 14.51 11.08 11.06 16.37 9.44 9.40 14.83 Lt. Radiale 11.34 11.26 17.15 9.17 8.82 14.62 10.67 10.46 16.52 Lt. Sphyrion 21.43 22.60 28.17 22.27 23.02 29.98 17.66 20.16 24.83 Lt. Thelion/Bustpoint 8.57 8.52 14.75 7.48 7.26 13.13 8.98 8.91 14.67 Lt. Tragion 16.07 16.42 24.19 14.52 14.69 22.73 14.23 14.40 22.53 Lt. Trochanterion 7.00 6.94 11.08 6.80 6.57 11.58 8.02 7.89 13.47 Lt. Ulnar Styloid 12.05 12.10 16.42 9.94 9.89 14.69 9.91 10.02 16.16 Nuchale 16.01 16.13 25.61 14.64 15.10 22.98 12.68 13.01 20.35 Rt. 10th Rib 6.14 5.92 11.22 6.60 6.47 11.28 6.10 5.75 11.50 Rt. ASIS 6.36 6.00 11.10 5.69 5.39 10.21 5.60 4.96 11.20 Rt. Acromion 11.72 11.78 19.15 10.40 10.51 17.13 10.24 9.89 18.53 Rt. Axilla, Ant 10.07 10.33 16.47 8.05 7.66 13.54 8.68 8.03 16.59 Rt. Axilla, Post. 11.00 11.12 17.64 10.30 10.59 16.58 9.19 8.88 16.76 Rt. Calcaneous, Post. 24.78 26.26 30.44 20.43 20.96 26.39 19.40 21.61 24.39 Rt. Clavicale 11.19 11.13 18.83 10.27 10.24 17.47 10.31 9.83 18.03 Rt. Dactylion 14.63 14.59 20.38 13.57 13.34 19.64 13.08 13.40 20.16 Rt. Digit II 25.70 26.28 31.45 21.07 21.12 27.09 19.82 22.41 24.09 Rt. Femoral Lateral Epicn 12.37 12.56 16.47 14.04 13.85 20.18 11.17 11.55 16.30 Rt. Femoral Medial Epicn 12.50 13.02 17.35 12.01 12.42 17.90 11.91 12.60 17.77 Rt. Gonion 12.61 13.11 20.54 11.08 10.84 20.05 12.92 13.18 19.56 Rt. Humeral Lateral Epicn 9.12 9.02 14.31 10.13 9.95 15.35 9.41 8.90 16.18 47 TABLE 10. Test errors from holistic CNN models trained on CAESAR using varied training set sizes - 50%, 25% and 10% of the original training set. All measurements are expressed in centimeters, representing Euclidean distances between predicted landmark coordinates and ground truth coordinates. Holistic CNN (50%) Holistic CNN (25%) Holistic CNN (10%) Landmark Names Mean Median 95th Percentile Mean Median 95th Percentile Mean Median 95th Percentile Rt. Humeral Medial Epicn 8.75 8.53 13.86 7.96 7.67 12.40 6.92 5.96 14.75 Rt. Iliocristale 6.51 6.27 11.14 6.99 6.71 11.44 6.28 5.79 11.07 Rt. Infraorbitale 16.73 17.37 26.07 14.93 14.96 22.98 12.38 12.19 21.43 Rt. 
Rt. Knee Crease | 13.43 13.78 18.60 | 13.63 13.52 19.13 | 12.15 12.46 18.66
Rt. Lateral Malleolus | 22.84 24.02 28.36 | 23.09 23.70 28.89 | 16.92 18.65 22.56
Rt. Medial Malleolus | 22.20 23.82 27.76 | 20.53 21.69 27.11 | 16.25 18.10 21.12
Rt. Metacarpal-Phal. II | 12.76 12.76 17.88 | 11.66 11.50 17.27 | 11.13 11.10 17.22
Rt. Metacarpal-Phal. V | 11.69 11.68 16.84 | 10.80 10.78 15.60 | 10.34 10.28 16.63
Rt. Metatarsal-Phal. I | 24.89 26.05 29.84 | 24.97 25.62 32.54 | 20.23 22.35 26.63
Rt. Metatarsal-Phal. V | 23.65 25.27 28.81 | 22.74 22.89 29.69 | 17.93 19.69 24.11
Rt. Olecranon | 10.77 10.72 15.86 | 9.20 8.84 14.64 | 7.57 7.03 14.42
Rt. PSIS | 7.27 6.98 12.14 | 7.44 7.08 12.77 | 7.96 7.55 13.38
Rt. Radial Styloid | 10.51 10.35 15.22 | 9.24 9.23 14.51 | 7.99 7.80 13.34
Rt. Radiale | 9.13 8.95 13.88 | 8.29 8.14 12.91 | 10.22 10.18 15.09
Rt. Sphyrion | 23.16 24.87 29.62 | 20.10 21.20 26.33 | 18.77 21.74 24.76
Rt. Thelion/Bustpoint | 8.66 8.57 14.39 | 8.42 7.97 14.93 | 9.71 9.50 16.32
Rt. Tragion | 15.25 15.98 23.90 | 14.70 14.48 22.73 | 12.55 12.65 20.00
Rt. Trochanterion | 8.09 7.81 12.74 | 8.41 8.17 13.39 | 7.08 6.62 12.28
Rt. Ulnar Styloid | 11.69 11.86 16.59 | 10.26 10.18 15.33 | 8.71 8.70 14.13
Sellion | 17.63 18.18 27.23 | 12.91 12.76 22.07 | 12.35 12.09 22.38
Substernale | 7.38 7.22 13.24 | 6.23 5.70 11.90 | 7.12 6.80 12.70
Supramenton | 15.40 15.87 24.57 | 11.02 10.93 19.33 | 11.48 11.39 19.48
Suprasternale | 12.47 12.75 19.91 | 10.75 10.87 18.40 | 8.25 7.00 19.52
Waist, Preferred, Post. | 7.88 7.72 12.47 | 7.65 7.44 12.35 | 7.28 7.06 12.15

TABLE 11. Test errors from atomistic CNN models trained on CAESAR using varied training set sizes - 50%, 25% and 10% of the original training set. All measurements are expressed in centimeters, representing Euclidean distances between predicted landmark coordinates and ground truth coordinates.

Landmark Names | Atomistic CNN (50%): Mean Median 95th Percentile | Atomistic CNN (25%): Mean Median 95th Percentile | Atomistic CNN (10%): Mean Median 95th Percentile
10th Rib Midspine | 2.01 1.50 7.21 | 2.25 1.11 7.51 | 2.90 1.73 9.22
Butt Block | 2.93 3.64 6.04 | 2.26 3.09 9.63 | 5.32 4.01 8.31
Cervicale | 1.90 1.61 3.69 | 1.01 1.73 2.95 | 1.36 1.97 3.68
Crotch | 1.56 2.92 3.86 | 1.73 1.10 4.39 | 2.42 1.54 6.38
Lt. 10th Rib | 2.74 1.80 8.56 | 3.28 2.09 10.32 | 3.93 2.58 12.38
Lt. ASIS | 1.77 0.96 5.26 | 5.21 4.00 13.11 | 3.49 2.28 10.20
Lt. Acromion | 1.26 1.02 2.97 | 1.59 1.22 4.71 | 1.94 1.35 5.19
Lt. Axilla, Ant | 3.32 1.35 4.02 | 1.30 2.59 3.06 | 1.89 2.31 3.09
Lt. Axilla, Post. | 1.93 2.07 3.01 | 1.96 2.38 3.92 | 2.05 3.91 5.03
Lt. Calcaneous, Post. | 2.05 1.82 2.95 | 1.68 2.08 3.01 | 2.14 2.80 4.01
Lt. Clavicale | 1.24 0.94 3.30 | 1.09 0.86 2.80 | 0.89 0.61 2.52
Lt. Dactylion | 1.95 1.49 5.21 | 2.26 1.75 5.73 | 2.23 1.75 5.95
Lt. Digit II | 0.91 0.72 2.12 | 1.27 1.04 3.14 | 1.38 0.92 4.18
Lt. Femoral Lateral Epicn | 1.16 0.83 3.32 | 1.23 0.85 3.37 | 1.55 1.08 4.60
Lt. Femoral Medial Epicn | 1.51 1.21 3.61 | 1.19 0.83 3.61 | 2.66 1.96 7.38
Lt. Gonion | 1.13 0.88 2.96 | 1.10 0.79 3.16 | 0.96 0.69 2.52
Lt. Humeral Lateral Epicn | 0.99 0.68 2.87 | 1.43 1.78 4.21 | 1.53 1.14 4.33
Lt. Humeral Medial Epicn | 1.34 1.03 3.55 | 1.30 0.87 3.85 | 1.56 1.11 4.83
Lt. Iliocristale | 1.90 1.34 5.49 | 1.87 1.21 5.44 | 2.67 1.77 7.99
Lt. Infraorbitale | 0.82 0.63 1.96 | 0.57 0.35 1.65 | 0.73 0.45 2.44
Lt. Knee Crease | 1.38 1.15 3.55 | 0.79 0.53 2.55 | 3.17 2.70 7.01
Lt. Lateral Malleolus | 0.66 0.53 1.69 | 1.17 1.01 2.59 | 1.09 0.83 2.97
Lt. Medial Malleolus | 0.60 0.46 1.64 | 1.14 0.96 2.76 | 1.38 1.11 3.48
Lt. Metacarpal-Phal. II | 1.45 1.13 3.76 | 1.29 0.91 3.70 | 2.33 1.78 6.27
Lt. Metacarpal-Phal. V | 2.04 1.41 5.61 | 1.77 1.47 4.05 | 1.64 1.05 5.20
Lt. Metatarsal-Phal. I | 0.92 0.72 2.21 | 1.00 0.79 2.33 | 0.84 0.64 2.12
Lt. Metatarsal-Phal. V | 1.57 1.41 2.49 | 1.44 1.29 2.19 | 1.29 0.88 3.87
Lt. Olecranon | 1.54 1.17 4.33 | 1.01 0.68 3.09 | 1.79 1.34 5.10
Lt. PSIS | 3.06 1.67 10.32 | 2.86 1.63 9.07 | 4.60 2.71 14.85
Lt. Radial Styloid | 1.30 0.84 4.04 | 2.28 1.80 5.98 | 1.61 1.08 4.99
Lt. Radiale | 1.10 0.82 3.08 | 1.23 0.92 3.36 | 1.48 0.99 4.14
Lt. Sphyrion | 1.10 0.96 2.41 | 0.82 0.68 1.97 | 1.33 0.99 3.68
Lt. Thelion/Bustpoint | 2.05 3.31 4.79 | 1.96 2.65 2.90 | 2.03 3.10 3.53
Lt. Tragion | 0.79 0.62 1.90 | 0.53 0.35 1.43 | 0.96 0.73 2.54
Lt. Trochanterion | 3.24 2.25 10.02 | 2.26 1.49 7.32 | 3.38 2.02 11.00
Lt. Ulnar Styloid | 0.88 0.59 2.59 | 1.27 0.95 3.41 | 1.83 1.38 4.98
Nuchale | 1.66 1.16 4.79 | 1.41 0.92 4.31 | 2.34 1.69 6.53
Rt. 10th Rib | 2.95 1.88 9.46 | 3.25 1.82 11.46 | 5.22 3.86 14.39
Rt. ASIS | 1.72 1.15 5.09 | 2.14 1.24 6.27 | 2.92 1.93 8.63
Rt. Acromion | 0.78 0.54 2.34 | 1.29 1.00 3.20 | 2.71 2.11 7.38
Rt. Axilla, Ant | 1.69 2.38 2.91 | 1.72 1.04 2.81 | 1.38 1.92 2.14
Rt. Axilla, Post. | 1.82 3.01 4.94 | 1.09 2.34 3.09 | 2.49 3.01 3.39
Rt. Calcaneous, Post. | 2.71 2.64 4.85 | 1.55 2.99 3.52 | 3.24 3.43 4.39
Rt. Clavicale | 0.99 0.79 2.60 | 2.34 1.97 5.34 | 1.22 0.96 3.28
Rt. Dactylion | 2.76 1.76 6.28 | 3.71 2.45 7.39 | 3.74 2.41 8.63
Rt. Digit II | 1.88 1.63 3.94 | 1.26 1.07 2.66 | 1.10 0.78 3.07
Rt. Femoral Lateral Epicn | 1.07 0.68 3.37 | 1.41 0.86 4.39 | 2.23 1.66 6.26
Rt. Femoral Medial Epicn | 2.53 1.89 3.17 | 2.89 3.29 4.19 | 3.01 3.71 4.08
Rt. Gonion | 1.63 1.19 4.68 | 1.06 0.77 2.80 | 0.83 0.58 2.28
Rt. Humeral Lateral Epicn | 1.11 0.81 3.02 | 2.45 1.93 6.25 | 1.55 1.16 4.10
Rt. Humeral Medial Epicn | 1.35 1.02 3.53 | 1.69 1.12 5.17 | 1.81 1.38 5.02
Rt. Iliocristale | 2.06 1.51 5.76 | 2.34 1.65 6.06 | 2.45 1.64 7.37
Rt. Infraorbitale | 1.04 0.80 2.75 | 1.15 0.79 3.08 | 1.04 0.67 2.81
Rt. Knee Crease | 0.70 0.51 1.67 | 1.14 0.79 3.06 | 1.30 0.88 3.75
Rt. Lateral Malleolus | 0.65 0.49 1.70 | 0.83 0.68 2.13 | 1.37 1.08 3.45
Rt. Medial Malleolus | 0.73 0.59 1.65 | 0.72 0.51 1.86 | 1.16 0.89 3.06
Rt. Metacarpal-Phal. II | 1.64 1.22 4.43 | 1.26 0.90 3.60 | 2.03 1.51 5.92
Rt. Metacarpal-Phal. V | 1.28 0.99 3.44 | 1.34 0.98 3.88 | 1.94 1.38 5.61
Rt. Metatarsal-Phal. I | 0.65 0.54 1.66 | 1.74 1.44 3.90 | 1.09 0.84 2.62
Rt. Metatarsal-Phal. V | 0.66 0.50 1.72 | 0.76 0.51 2.16 | 1.33 1.08 3.14
Rt. Olecranon | 1.14 0.75 2.99 | 1.03 0.59 2.99 | 1.42 0.83 4.27
Rt. PSIS | 3.42 2.03 10.53 | 3.30 1.98 9.61 | 4.97 2.94 14.88
Rt. Radial Styloid | 1.09 0.76 3.35 | 1.83 1.52 4.42 | 2.17 1.57 5.82
Rt. Radiale | 0.70 0.44 2.19 | 1.49 1.16 3.64 | 1.16 0.79 3.20
Rt. Sphyrion | 0.92 0.81 1.86 | 1.17 1.00 2.43 | 1.95 1.58 4.96
Rt. Thelion/Bustpoint | 1.38 1.09 2.63 | 1.05 3.05 3.19 | 2.59 1.67 3.91
Rt. Tragion | 1.85 2.65 3.07 | 3.67 1.47 4.74 | 1.76 1.54 2.16
Rt. Trochanterion | 3.26 2.43 8.94 | 4.36 3.34 11.23 | 3.36 2.16 10.38
Rt. Ulnar Styloid | 2.21 1.60 6.37 | 1.06 1.70 3.25 | 2.20 1.44 6.40
Sellion | 2.47 2.05 6.25 | 1.78 1.53 2.07 | 1.23 0.88 3.41
Substernale | 1.36 0.94 3.95 | 1.51 0.92 4.84 | 1.98 1.33 6.27
Supramenton | 0.79 0.50 2.38 | 1.57 1.25 3.83 | 1.11 0.71 3.36
Suprasternale | 1.14 0.87 3.01 | 1.23 0.94 3.40 | 1.12 0.84 2.95
Waist, Preferred, Post. | 2.34 3.12 3.69 | 2.66 2.08 2.90 | 1.94 2.28 3.03
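Each row in Tables 9 through 11 summarizes the per-scan Euclidean error for one landmark. As a minimal illustrative sketch only - not the implementation used in this thesis, with the array names, shapes, and landmark list assumed for the example - the per-landmark Mean, Median, and 95th Percentile statistics could be computed as follows:

    # Illustrative sketch, not the thesis implementation. Assumes `pred` and
    # `gt` are (num_scans, num_landmarks, 3) arrays of predicted and ground
    # truth landmark coordinates in centimeters, and `names` is the list of
    # landmark names in column order.
    import numpy as np

    def landmark_error_stats(pred, gt, names):
        # Euclidean distance between predicted and ground truth coordinates,
        # per scan and per landmark: shape (num_scans, num_landmarks).
        dists = np.linalg.norm(pred - gt, axis=-1)
        return {
            name: (dists[:, j].mean(),              # Mean
                   np.median(dists[:, j]),          # Median
                   np.percentile(dists[:, j], 95))  # 95th Percentile
            for j, name in enumerate(names)
        }

Each returned tuple corresponds to one (Mean, Median, 95th Percentile) group in the tables above.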
REFERENCES CITED

Mathieu Aubry, Ulrich Schlickewei, and Daniel Cremers. The wave kernel signature: A quantum mechanical approach to shape analysis. In IEEE International Conference on Computer Vision Workshops, 2011.

Huanyu Yang, Kuangrong Hao, and Yongsheng Ding. Semantic segmentation of human model using heat kernel and geodesic distance. 2018. doi: 10.1155/2018/7974340. URL https://doi.org/10.1155/2018/7974340.

Janet F. Grant, Catherine R. Chittleborough, Zumin Shi, and Anne W. Taylor. The association between a body shape index and mortality: Results from an Australian cohort. 2017. URL https://doi.org/10.1371/journal.pone.0181244.

Diana M. Thomas, Carl Bredlau, Anja Bosy-Westphal, Manfred Mueller, Wei Shen, Dympna Gallagher, Yuna Maeda, Andrew McDougall, Courtney M. Peterson, Eric Ravussin, and Steven B. Heymsfield. Relationships between body roundness with body fat and visceral adipose tissue emerging from a new geometrical model. 2013.

KSR Van Kuijk, M Reijman, SMA Bierma-Zeinstra, JH Waarsing, and DE Meuffels. Posterior cruciate ligament injury is influenced by intercondylar shape and size of tibial eminence. pages 1058-1062, 2019. doi: 10.1302/0301-620X.101B9.BJJ-2018-1567.R1.

Sebastian Sitko, Rafel Cirer-Sastre, Nuria Garatachea, and Isaac López-Laval. Anthropometric characteristics of road cyclists of different performance levels. Applied Sciences, 13(1), 2023. ISSN 2076-3417. doi: 10.3390/app13010224. URL https://www.mdpi.com/2076-3417/13/1/224.

Satoru Kiire. Effect of leg-to-body ratio on body shape attractiveness. Archives of Sexual Behavior, 2016. doi: 10.1007/s10508-015-0635-9.

Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli. wav2vec 2.0: A framework for self-supervised learning of speech representations. 2020.

David Silver, Aja Huang, Christopher J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529:484-503, 2016. URL http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html.

Caroline Pinte, Mathis Fleury, and Pierre Maurel. Deep learning-based localization of EEG electrodes within MRI acquisitions. Frontiers in Neurology, 12, 2021. doi: 10.3389/fneur.2021.644278. URL https://www.frontiersin.org/articles/10.3389/fneur.2021.644278.

Mitchell Hargreaves, David Ting, Stephen Bajan, Kamron Bhavnagri, Richard Bassed, and Xiaojun Chang. A generative deep learning approach for forensic facial reconstruction. pages 1-7, November 2021. doi: 10.1109/DICTA52665.2021.9647290.

Ivan Grishchenko, Valentin Bazarevsky, Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Zanfir, Richard Yee, Karthik Raveendran, Matsvei Zhdanovich, Matthias Grundmann, and Cristian Sminchisescu. BlazePose GHUM Holistic: Real-time 3D human landmarks and pose estimation. 2022.
Andrea Giachetti, Emanuele Mazzi, Francesco Piscitelli, M Aono, A Ben Hamza, T Bonis, Peter Claes, A Godil, C Li, M Ovsjanikov, et al. SHREC'14 track: Automatic location of landmarks used in manual anthropometry. In Eurographics Workshop on 3D Object Retrieval, pages 93-100, 2014.

R. Q. Charles, H. Su, M. Kaichun, and L. J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 77-85, 2017. doi: 10.1109/CVPR.2017.16.

Zhirong Wu, Shuran Song, Aditya Khosla, Xiaoou Tang, and Jianxiong Xiao. 3D ShapeNets for 2.5D object recognition and next-best-view prediction. CoRR, abs/1406.5670, 2014. URL http://arxiv.org/abs/1406.5670.

Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. ShapeNet: An information-rich 3D model repository. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1-9. IEEE, 2015.

Federica Bogo, Angjoo Kanazawa, Christoph Lassner, Peter Gehler, Javier Romero, and Michael J. Black. Keep It SMPL: Automatic estimation of 3D human pose and shape from a single image. 2016.

Tariq M. Khan, Antonio Robles-Kelly, and Syed S. Naqvi. T-Net: A resource-constrained tiny convolutional neural network for medical image segmentation. pages 1799-1808, 2022. doi: 10.1109/WACV51458.2022.00186.

Kathleen M. Robinette and Hein A. M. Daanen. The CAESAR project: A 3-D surface anthropometry survey. pages 380-386, 1999.

Matthew A. Brunsman, Hein A. M. Daanen, and Kathleen M. Robinette. Optimal postures and positioning for human body scanning. In Proceedings of the International Conference on Recent Advances in 3-D Digital Imaging and Modeling, pages 266-273, 1997.

Paolo Cignoni, Massimiliano Corsini, and Guido Ranzuglia. MeshLab: An open-source 3D mesh processing system. ERCIM News, 2008(73), 2008.

William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, pages 1025-1035, Red Hook, NY, USA, 2017. Curran Associates Inc. ISBN 9781510860964.

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. 2021.