Automated segmentation of medial temporal lobe subregions on in vivo T1-weighted MRI in early stages of Alzheimer's disease

Medial temporal lobe (MTL) substructures are the earliest regions affected by neurofibrillary tangle pathology and thus are promising biomarkers for Alzheimer's disease (AD). However, automatic segmentation of the MTL using only T1-weighted (T1w) magnetic resonance imaging (MRI) is challenging due to the large anatomical variability of the MTL cortex and the confound of the dura mater, which is commonly segmented as gray matter by state-of-the-art algorithms because the two have similar intensity in T1w MRI. To address these challenges, we developed a novel atlas set, consisting of 15 cognitively normal older adults and 14 patients with mild cognitive impairment with a label explicitly assigned to the dura, that can be used by the multiatlas automated pipeline (Automatic Segmentation of Hippocampal Subfields [ASHS-T1]) for the segmentation of MTL subregions, including anterior/posterior hippocampus, entorhinal cortex (ERC), Brodmann areas (BA) 35 and 36, and parahippocampal cortex on T1w MRI. Cross-validation experiments indicated good segmentation accuracy of ASHS-T1 and that the dura can be reliably separated from the cortex (6.5% mislabeled as gray matter). Conversely, FreeSurfer segmented the majority of the dura mater (62.4%) as gray matter and the degree of dura mislabeling decreased with increasing


| INTRODUCTION
The medial temporal lobe (MTL) is the site of several neurodegenerative pathologies, most notably neurofibrillary tangle (NFT) pathology, a hallmark of Alzheimer's disease (AD), which is thought to first affect the transentorhinal cortex before it spreads to the entorhinal cortex (ERC) and cornu ammonis region 1 of the hippocampus (Braak & Braak, 1991, 1995; Ding, Van Hoesen, Cassell, & Poremba, 2009). As NFT pathology is closely related to neuron and synapse loss (Bobinski et al., 1997; Braak & Braak, 1991; Fukutani et al., 1995), certain MTL subregions may show early and selective atrophy and serve as imaging biomarkers in the early stages of AD. In fact, a recent in vivo magnetic resonance imaging (MRI) study showed selective atrophy in Brodmann area 35 (BA35), a region that approximates the transentorhinal region, in individuals with preclinical AD compared to controls (Wolk et al., 2017). These subregions are also of interest because they are thought to subserve different cognitive functions, such as recollection and familiarity (Wolk & Dickerson, 2011; Yonelinas et al., 2007), and are part of two dissociable MTL networks, where the anterior hippocampus, ERC and perirhinal cortex (PRC) are part of the anterior MTL network and the posterior hippocampus and parahippocampal cortex (PHC) are part of the posterior MTL network (Ranganath & Ritchey, 2012). These networks are also thought to be affected in the early stages of AD.
Fine-grained measurement of subregions of the MTL has therefore received increasing attention in recent years, with many studies utilizing high-resolution T2-weighted (T2w) MRI images, often with ~0.4 × 0.4 mm² in-plane resolution (Bender et al., 2018; Ekstrom et al., 2009; Mueller et al., 2007; Olsen et al., 2013; Preston et al., 2010; Yushkevich, Amaral, et al., 2015; Zeineh, Engel, Thompson, & Bookheimer, 2001). The advantage of these images is that they allow for improved visualization of MTL structures that are less visible on T1-weighted (T1w) MRI scans. For example, the stratum radiatum lacunosum moleculare, an important boundary between certain subfields of the hippocampus, and the dura mater, which is part of the meninges, can be visualized with T2w scans (Figure 1). Clear visualization of the dura in these T2w MRI images allows for accurate segmentation of important adjacent MTL subregions, in contrast to T1w MRI images, in which the dura has similar intensity as gray matter (Xie et al., 2016).
In our prior work (Yushkevich, Pluta, et al., 2015), we developed a multiatlas segmentation software package, called Automatic Segmentation of Hippocampal Subfields (ASHS, https://www.nitrc.org/projects/ashs), that can reliably segment MTL subregions in T2w MRI (the pipeline that works with T2w MRI is referred to as ASHS-T2 in this article). Even though these T2w MRI images have advantages over T1w MRI images, large data sets of T1w MRI scans are available, and analyzing these data sets would provide more power to test hypotheses of interest. Additionally, T1w images often have higher resolution in the through-plane direction, which helps in better resolving the folding and branching of sulci, important for the segmentation of these MTL cortical regions.
Available methods for the parcellation of MTL subregions on T1w MRI include several manual approaches (Kivisaari, Probst, & Taylor, 2013; Malykhin, Bouchard, Camicioli, & Coupland, 2008). An advantage of these manual approaches is that they often take into account the anatomical variability of the MTL cortex, which has multiple distinct anatomical subtypes, defined by the folding and branching patterns of the collateral sulcus (CS), that greatly affect the location of the borders between MTL cortices (Ding & Van Hoesen, 2010). For example, when the CS is deep, BA35 is located at the medial bank of the CS, whereas when the CS is shallow, BA35 also occupies the fundus and lateral bank. However, manual segmentation is infeasible for larger data sets like the Alzheimer's Disease Neuroimaging Initiative (ADNI), which includes thousands of MRI scans. Among the automated methods available for MTL subregion segmentation on T1w MRI, the specialized modules for MTL provided by FreeSurfer (Fischl, 2012) are among the most widely used in the literature in older populations (Delli Pizzi et al., 2016; Lehmann et al., 2010; Mah, Binns, & Steffens, 2015; Mishra et al., 2017; Pasquini et al., 2016). FreeSurfer includes a module for labeling hippocampal subfields and hippocampal lamina based on an ex vivo atlas (Iglesias et al., 2015). However, we have previously argued that standard resolution T1w MRI scans do not provide sufficient resolution for the visualization of the inner structure of the hippocampus and the parcellation of the hippocampal subfields (Wisse, Biessels, & Geerlings, 2014). FreeSurfer also provides separate specialized modules for ERC (Fischl et al., 2009) and PRC segmentation (Augustinack et al., 2013) based on spatial probability maps derived from ex vivo imaging. However, since these probability maps are defined in the space of a single template, this approach does not directly account for different subtypes of the MTL cortex.
Another important issue for T1w MRI scans, as mentioned above, relates to the visualization of the dura mater. In the MTL, a large proportion of the ERC and parts of the PRC are located directly adjacent to the dura and often appear merged with parts of the dura in T1w MRI (red arrows in Figure 1). To the best of our knowledge, none of the automatic analysis pipelines for MTL cortices using T1w MRI alone have addressed this confound, and the dura is often segmented as part of the gray matter by the state-of-the-art image processing algorithms, including FreeSurfer (the third column in Figure 1). This likely leads to an error in the quantification of ERC and PRC, which potentially confounds the findings of research studies. In healthy individuals for whom there is little or no cerebrospinal fluid (CSF) separating the dura from the cortex, parts of the dura may be mistakenly labeled as cortex, whereas in patient groups with severe gray matter atrophy, the presence of such CSF would lead automatic methods to correctly exclude dura from the cortex label. This would potentially lead to a systematic bias in the estimation of group differences. To correct for this error, some studies performed manual correction either based on empirical rules or using T2w MRI of the same subject (e.g., the Human Connectome Project; Glasser et al., 2013). However, manual correction is labor intensive and T2w MRI scans are not always available. This is therefore not a feasible solution for large data sets consisting of only T1w MRI scans.
We hypothesize that the MTL cortex can be reliably and automatically separated from the dura mater using only the T1w MRI, even though there is only limited contrast between them. There are important features in T1w MRI that could be informative of the boundary between the dura and the cortex, for example, (a) the thin layer of CSF between the dura and the cortex can be visualized in some subjects (green arrow in Figure 1-b1) and (b) there are portions of the dura near the brain stem and inferior to the sulcus that are not merged with the cortex (white arrows in Figure 1). These features make it possible to infer the boundary between the cortex and the dura, even if it is not completely visible. Indeed, we recently developed a new atlas set that can be used by an established multiatlas segmentation framework (Yushkevich, Pluta, et al., 2015). Used together with a superresolution (SR) technique (Manjón et al., 2010), it is able to reliably segment ERC and PRC in T1w MRI while explicitly accounting for the confound of the dura mater and for the existence of multiple MTL cortex anatomical variants (Xie et al., 2016). To account for dura confounds, the atlas set for this pipeline was created using pairs of T1w and T2w scans in the same subjects, with the T2w previously used as an atlas set in the T2w-based MTL subregion segmentation approach by Yushkevich, Pluta, et al. (2015). The T2w images in the atlas set include labels for the ERC and subdivisions of the PRC, that is, BA35 and BA36, generated based on a segmentation protocol that takes anatomical variability of the CS into account and was developed in consultation with the neuroanatomist SLD, who has specific expertise in anatomical variability of the CS (Ding & Van Hoesen, 2010). The segmentations of this atlas set were transformed into the T1 space after coregistration with the T2w MRI of the same subject and manually edited to account for any residual misregistration errors. Additionally, the T1w atlas set segmentations were extended with a dura label informed by the aligned T2w MRI of the same subject, which helps in accurately locating the boundary between dura and the MTL cortex. Evaluation of this pipeline indicated that a large portion of the dura was assigned the correct label in our pipeline but not in other methods [FreeSurfer (Fischl, 2012) and ANTs (Avants, Epstein, Grossman, & Gee, 2008)], which included a large portion of the dura in the gray matter label. Cross-validation experiments showed promising segmentation accuracy of the automatic segmentation relative to manual segmentation for the MTL cortical regions [the Dice similarity coefficient (DSC; Dice, 1945) is close to that obtained using the T2w MRI (the ASHS-T2 pipeline, range from 0.671 to 0.755), in which the boundary between the cortex and the dura can be visualized]. Moreover, the clinical utility of the pipeline was evaluated by examining the statistical power in discriminating controls from amnestic mild cognitive impairment (aMCI) patients, and indicated that BA35, in absolute terms, had the greatest area under the curve among the MTL cortex subregions, which is consistent with the Braak staging in the MTL (Braak & Braak, 1995).

Figure 1. The dura mater (indicated by purple lines) has similar intensity as gray matter in T1w MRI (the first column) but can be easily separated from the cortex in T2w MRI (the hypointense layer in the second column). It is often segmented as part of the cortex by state-of-the-art algorithms, for example, FreeSurfer (the third column). A thin layer of CSF (green arrow) can be visualized in some subjects (second row, b) but not in the others (first row, a). White arrows point to the portions of the dura that are not merged with the cortex. CSF, cerebrospinal fluid; MRI, magnetic resonance imaging; T1w MRI, T1-weighted MRI; T2w MRI, T2-weighted MRI
This article extends this recent work, which was published in conference proceedings, with a richer set of MTL subregion measurements and additional experiments. We have extended our label set to include the PHC and the hippocampus, including a subdivision into anterior and posterior hippocampus. We also provide thickness values in addition to volumes for the MTL cortices, because thickness measures are less sensitive to border placement and are therefore less affected by this likely source of segmentation error. We have improved the registration between the T1w and T2w MRI scans, allowing for a closer alignment that required less editing of the transformed segmentations in the T1w space. We performed cross-validation experiments to assess the accuracy of the automatic segmentation relative to the manual one, and compared our pipeline with FreeSurfer version 6.0 (Fischl, 2012) to evaluate how the different methods label dura in T1w MRI. We further evaluated the performance of our pipeline in scans from ADNI phases GO and 2 by comparing MTL subregional volumes and thickness in amyloid negative controls with individuals with preclinical AD, prodromal AD and AD dementia. In addition, the atlas and software developed in this article are made publicly available (https://sites.google.com/view/ashs-dox/home). Finally, we have also provided an easy-to-use cloud-based service built into the ITK-SNAP image segmentation tool (Yushkevich et al., 2006) that allows users to execute our pipeline on a remote server with a few mouse clicks. The cloud-based service is briefly summarized in Supplementary Material A. A detailed tutorial of our cloud-based service is available at https://sites.google.com/view/ashs-dox/cloud-ashs/overview.
The atlas set used in this study consists of 15 cognitively normal older controls (NC) and 14 aMCI patients.
These participants were recruited from the Penn Memory Center/Alzheimer's Disease Center (PMC/ADC) at the University of Pennsylvania. Diagnosis of aMCI was made following established criteria (Petersen, 2004; Petersen et al., 2009; Winblad et al., 2004). This study was approved by the Institutional Review Board of the University of Pennsylvania and informed consent was provided by all subjects. This is the same atlas set that was used by Yushkevich, Pluta, et al. (2015) and Xie et al. (2017) to develop the atlas set using both T1w MRI and high-resolution T2w MRI. To avoid confusion, the atlas set developed in this study will be referred to as the PMC-T1 atlas and the one used in Yushkevich, Amaral, et al. (2015), Yushkevich, Pluta, et al. (2015), and Xie et al. (2017) will be referred to as the PMC-T2 atlas. Demographic and mini-mental state examination (MMSE) data for the aMCI and NC groups are shown in Table 1.
participant is determined by thresholding a summary measure of Florbetapir standardized uptake value ratio (SUVR) derived from Florbetapir PET using a threshold of SUVR ≥ 1.11 (Landau et al., 2012). The summary Florbetapir SUVR measure came from publicly available processed data on the ADNI website, which was calculated by taking the mean SUVR of a set of regions typically associated with increased uptake in AD and using cerebellar gray matter as the reference region [details described in Landau et al. (2012)]. In total, 667 participants were included and grouped into Aβ negative (Aβ−) controls, preclinical AD (Aβ positive controls), early prodromal AD (Aβ positive early MCI), late prodromal AD (Aβ positive late MCI) and dementia patients (Aβ positive AD). Four subjects' T1w MRI scans suffered from severe motion and thus were excluded from this study. Table 2 summarizes the characteristics of the remaining 663 subjects.
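The grouping rule described above can be illustrated with a minimal sketch. The diagnosis codes ("CN", "EMCI", "LMCI", "AD") and the function names are hypothetical and are not taken from the ADNI tables; the sketch also assumes that participants who fit none of the five groups are simply left unassigned.

```python
from typing import Optional

# Summary Florbetapir SUVR cutoff stated in the text (Landau et al., 2012).
SUVR_THRESHOLD = 1.11

# Hypothetical diagnosis codes mapped to the group names used in the text.
GROUP_IF_AMYLOID_POSITIVE = {
    "CN": "preclinical AD",
    "EMCI": "early prodromal AD",
    "LMCI": "late prodromal AD",
    "AD": "AD dementia",
}


def is_amyloid_positive(suvr: float) -> bool:
    """Amyloid status by thresholding the summary SUVR (SUVR >= 1.11)."""
    return suvr >= SUVR_THRESHOLD


def assign_group(diagnosis: str, suvr: float) -> Optional[str]:
    """Return the study group, or None if the participant fits none of the five groups."""
    if is_amyloid_positive(suvr):
        return GROUP_IF_AMYLOID_POSITIVE.get(diagnosis)
    # Among amyloid-negative participants, only controls form a study group.
    return "amyloid-negative control" if diagnosis == "CN" else None
```

For instance, under these assumptions a control with a summary SUVR of exactly 1.11 would fall into the preclinical AD group, because the threshold is inclusive.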

| ADNI imaging protocol
The MRI imaging protocols of the ADNI study that were used to acquire the T1w MRI scans were previously described in Jack et al. (2008) and Leow et al. (2006). For Florbetapir PET, images were acquired in a 20 min PET brain scan session (four frames of 5 min duration) after a 50-min uptake phase following injection of 10 mCi of tracer.

| Manual segmentation of the MTL subregions in T1w MRI
The procedure of manual segmentation can be divided into two steps: manual segmentation of the MTL cortex and the hippocampus. Manual segmentation of the MTL cortex was initialized with the manual segmentations of the PMC-T2 atlas set (in the space of the T2w MRI) propagated to the space of the T1w MRI. Information from both T1w and T2w MRI scans of the same subject was taken into account during manual segmentation, which is crucial for separating dura from the cortex. For the hippocampus, an automatic segmentation was first generated and followed by manual editing. All edits and segmentations were performed using ITK-SNAP (Yushkevich et al., 2006).

| Segmentation of MTL cortex and dura
Manual segmentations of the MTL cortex from the PMC-T2 atlas set from Yushkevich, Pluta, et al. (2015) and Xie et al. (2017) were propagated to the space of the aligned T1w MRI, followed by manual edits and addition of the dura label (more details regarding the segmentation protocol for the MTL cortex can be found in these two citations). Figure 2 shows three examples that illustrate the workflow. Details are described below.
Alignment between the T2w MRI and the T1w MRI of the same subject was performed following the steps below:
1. The T1w MRI was rigidly aligned to the T2w MRI using ANTs (http://stnava.github.io/ANTs/) with mutual information as the similarity metric.
2. The T1w MRI was upsampled to 0.5 × 0.5 × 1.0 mm³ by applying a patch-based SR technique (Manjón et al., 2010) for the purpose of bringing the resolution of the T1w MRI closer to that of the T2w MRI. Also, the SR upsampling increases the contrast between the dura and gray matter in T1w MRI so that the boundary between them can be better visualized.
3. neighbor interpolation, respectively. The purpose of this step is to make the voxel size of the T2w MRI and SR T1w MRI similar.
4. The interpolated T2w MRI was cropped based on its manual segmentation with a margin of 10 voxels in all directions. This is done separately for left and right hemispheres.

5. From experiments, we found that global rigid intermodality registration [Step (1)] is not sufficient to accurately align the bilateral MTL regions, which is probably due to small local spatial distortion of the two modalities. In order to generate better alignment of the MTL between the two modalities, for each hemisphere, affine registration was performed between the SR T1w MRI and the cropped upsampled T2w MRI, initialized with the rigid transformation between the whole brain T1w MRI and T2w MRI obtained in Step (1). This additional local affine registration, which was not included in the prior work (Xie et al., 2016), is essential for accurate alignment of the MTL between the two modalities.
6. The SR T1w MRI was transformed and resampled to the cropped upsampled T2w MRI space (referred to as registered SR T1w MRI), in which manual segmentations of the MTL cortex and the hippocampus were performed.
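The cropping in Step (4) amounts to taking the bounding box of the manual segmentation and padding it by 10 voxels in every direction. A minimal sketch is given below; the function name and the clamping of the box to the image extent are assumptions for illustration, and the actual pipeline performs this step with its own image-processing tools.

```python
from typing import Iterable, List, Sequence, Tuple

Voxel = Tuple[int, int, int]


def crop_box(
    labeled_voxels: Iterable[Voxel],
    image_shape: Sequence[int],
    margin: int = 10,
) -> List[Tuple[int, int]]:
    """Per-axis (start, stop) index ranges enclosing the labeled voxels plus a margin.

    `stop` is exclusive. The box is clamped to the image extent (an assumption).
    """
    voxels = list(labeled_voxels)
    if not voxels:
        raise ValueError("segmentation contains no labeled voxels")
    box = []
    for axis, size in enumerate(image_shape):
        coords = [voxel[axis] for voxel in voxels]
        start = max(min(coords) - margin, 0)
        stop = min(max(coords) + margin + 1, size)
        box.append((start, stop))
    return box
```

As in the text, the function would be called once per hemisphere, with the labeled voxels of the left and right segmentations passed separately.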
After registration of the T2w and T1w MRI described above, the MTL regions of both modalities are well aligned, as shown in the first two images of each example in Figure 2. Labels of the MTL cortex, including cortical labels (ERC, BA35, BA36, PHC) and sulcus labels (CS and occipitotemporal sulcus [OTS]), were copied over to the registered SR T1w MRI (the third and the fourth images of each example in Figure 2). Because of slight differences in appearance between T1w and T2w MRI and small errors in registration due to the highly anisotropic voxel size of T2w MRI, intermodality registration and the upsampling of both modalities, the labels were checked and manually edited to correctly match the border with the white matter, CSF and dura. For these edits, both the T1w and T2w MRI of the same subject were opened side by side in ITK-SNAP so that boundaries could be determined using information from both modalities (the fifth image of each example in Figure 2). Note that only the outer borders with surrounding regions were adjusted, not the borders between the different MTL cortices. Only the last slice of the ERC was adjusted, as a transition slice, extending half the length of one slice anterior (note that these two slices translate to one slice on the T2w MRI). This is similar to the procedure in Berron et al. (2017). BA35 and BA36 (same as in original protocol) and the most posterior slice is fourth most posterior 1.3 mm slice of the hippocampus (was second most posterior 2.6 mm slice in original protocol). All subjects were visually checked and the segmentations were adjusted to match these boundary rules. Any given label needed to be extended by at most two slices, where borders were matched to adjacent slices. In none of the cases did the anatomy change dramatically from one slice to the next, making these adjustments feasible.
Importantly, along the full length of the MTL cortex, a label for the dura mater was assigned to the voxels inferior to the corrected MTL cortex labels that have gray appearance in the registered SR T1w MRI and dark appearance in the resampled T2w MRI. Of note, the segmentation of the dura was informed by the registered T2w MRI, from which the boundary between dura and the cortex can more easily be identified. This is especially crucial for the situation when dura is completely attached to the cortex and is difficult to visualize in T1w MRI (Example 1 in Figure 2). In some cases, a thin layer of CSF between the dura and gray matter is visible in SR T1w MRI (green arrow in Example 2 in Figure 2), that is, a layer of voxels that have much darker intensity between the dura and gray matter in SR T1w MRI, which helps guide the dura segmentation. The CSF voxels were assigned a miscellaneous label. Moreover, in some cases, this layer of CSF is not visible; however, the dura is not completely attached to the cortex either (Example 3 in Figure 2). The portion of the dura near the brain stem and inferior to the CS that is not adjacent to the cortex (white arrows in Figure 2) also provides clues for automatic and manual segmentation of the dura. The anterior and posterior extents of the dura are limited to the slices with MTL cortex labels (ERC, BA35, BA36, and PHC).

| Segmentation of the hippocampus
The European Alzheimer's Disease Consortium and ADNI harmonized protocol (HarP; Frisoni et al., 2015) is a well-
Yushkevich, Pluta, et al. (2015) and summarized briefly in the following steps:
1. An unbiased whole brain population template is built using the T1w MRI of all the subjects.
2. The region of interest (ROI) of each hemisphere was identified by averaging the corresponding manual segmentations that are warped to the space of the template.
3. Each SR T1w MRI and the corresponding segmentation were warped to the space of the template and cropped around the ROI.
4. Pairwise registrations between all the subjects were performed between the warped and cropped scans.
5. Label fusion was performed for each atlas in its native space using the rest of the atlases as candidates.
6. AdaBoost classifiers were trained to learn the systematic error between the automatic segmentation and the manual segmentations.

| Application of ASHS-T1 atlas to new images
Once the ASHS-T1 atlas is trained, we can use the ASHS segmentation pipeline to automatically segment the T1w MRI scan of a new subject. Different from the pipeline described in Yushkevich, Pluta, et al. (2015), the proposed pipeline only takes the T1w MRI scan as input and does not require the T2w MRI scan. In brief, it involves the following steps:
1. The T1w MRI of the target subject is first upsampled to 0.5 × 0.5 × 1 mm³ using the SR technique (Manjón et al., 2010).
2. The ROIs around the left and right MTL are identified in the target SR T1w image by registering to a whole-brain template generated in the training pipeline.
3. For each target ROI, the corresponding ROIs in the atlas set are registered to it using ANTs with normalized cross-correlation metric (Avants et al., 2008).
4. Atlas labels are then warped to the target ROI and combined using the joint label fusion algorithm.
5. The process is repeated in a bootstrapping fashion, where the initial segmentation of the target structures is used to initialize affine alignment between the atlas and target ROIs. This bootstrapping results in fewer failed atlas-to-target registrations and better overall segmentation accuracy. The automatic segmentation generated from this step is referred to as the "Heur" output. (The name "Heur" stands for heuristic rules that can be specified in ASHS. In our prior work on T2-weighted MRI (Yushkevich, Pluta, et al., 2015), we applied some heuristics to cut off anterior/posterior parts of cortical labels. Although no heuristic rules were used in this study, we keep the naming convention the same to be consistent with the outputs of the ASHS software.)
6. Two Adaboost classifiers, which were trained on shape features (the output referred to as the "NoGray") or shape and gray-scale intensity features ("UseGray") to correct for systematic errors generated in the multiatlas label fusion step, are applied to further improve the automatic segmentation. Since the classifiers were trained on the images of the atlas set, they may not generalize well to images acquired with different MRI imaging protocols. Therefore, using the "UseGray" output is only recommended if the target T1w MRI scan is acquired with a similar protocol as the atlas set.
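To give a sense of what the label fusion in Step (4) does, the sketch below combines warped atlas labels by unweighted majority voting. This is a deliberately simplified stand-in and not the joint label fusion algorithm used by ASHS, which weights atlases locally rather than counting votes equally. The function name and the tie-breaking rule are assumptions for illustration.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(warped_atlas_labels: Sequence[Sequence[int]]) -> List[int]:
    """Fuse candidate segmentations voxel by voxel with an unweighted majority vote.

    Each inner sequence holds the labels of one warped atlas, in the same voxel order.
    """
    if not warped_atlas_labels:
        raise ValueError("at least one atlas segmentation is required")
    n_voxels = len(warped_atlas_labels[0])
    if any(len(labels) != n_voxels for labels in warped_atlas_labels):
        raise ValueError("all atlas segmentations must cover the same voxels")

    fused = []
    for votes in zip(*warped_atlas_labels):
        counts = Counter(votes)
        top = max(counts.values())
        # Ties go to the smallest label id: arbitrary, but deterministic.
        fused.append(min(label for label, n in counts.items() if n == top))
    return fused
```

A weighted scheme such as joint label fusion replaces the equal votes with per-voxel weights derived from the intensity similarity between each atlas and the target.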
Final bilateral automatic segmentations are generated in the target SR T1w MRI space. For the atlas set of 29 subjects, the automatic segmentation in the space of the SR T1w MRI was generated in a leave-one-out manner using the remaining 28 subjects as atlases. The segmentation accuracy of the "UseGray" output is reported in Table 3 and those of all the three outputs ("Heur", "NoGray," and "UseGray") are also computed and reported in Table S1.
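The leave-one-out scheme can be written down in a few lines. The sketch below only enumerates the folds; the subject identifiers and the function name are hypothetical, and the segmentation itself would be run by the ASHS software.

```python
from typing import Iterable, Iterator, List, Tuple


def leave_one_out(subject_ids: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (target, atlases) pairs in which each subject is segmented using all others."""
    subjects = list(subject_ids)
    for index, target in enumerate(subjects):
        atlases = subjects[:index] + subjects[index + 1:]
        yield target, atlases


# Hypothetical identifiers for the 29 atlas subjects.
folds = list(leave_one_out(f"subject_{i:02d}" for i in range(29)))
```

With 29 subjects this produces 29 folds, each using the remaining 28 subjects as atlases.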
When segmenting the baseline scans of the ADNI cohort, the whole 29-subject ASHS-T1 atlas set was used. Because of the difference in imaging protocol between the ADNI and the PMC-T1 atlas set, it is not appropriate to use the "UseGray" output. When quality control (Section 2.4.3) was performed, we observed that the dura and MTL cortex segmentation is of better quality when using the "Heur" output compared to the "NoGray" and thus the former is used.
The segmentation accuracy in terms of DSC between automatic and manual segmentations of "Heur" is comparable but slightly lower (1.5% maximum DSC) than "UseGray" shown in Table S1. Volumetric and thickness (see Section 2.5.3) measurements of bilateral anterior/posterior hippocampus, ERC, BA35, BA36, and PHC were extracted for each subject.

| Quality control
The quality of all the automatic segmentations generated by ASHS-T1 was visually checked. The pipeline successfully labeled the baseline  12 aMCI) and the corresponding manual ICV segmentations. The manual labels in this atlas set were generated with the guidance of the coregistered computed tomography (CT) scans of the same subjects. Since the boundary between the skull and the soft tissue is clear in CT scans, we were able to obtain a relatively accurate manual segmentation of the ICV. Supplementary Material C describes the details of the automatic ICV segmentation pipeline.
2.5.2 | Cross-validation experiment in the atlas set in the space of the T2w MRI (ASHS-T2)
To compare the segmentation accuracy of the MTL cortices of the proposed pipeline that only utilizes T1w MRI to that using both T1w and T2w MRI (Yushkevich, Amaral, et al., 2015; Yushkevich, Pluta, et al., 2015), leave-one-out cross-validation was also performed using the PMC-T2 atlas (comparisons were performed between the automatic and manual segmentations in the space of the T2w MRI). The same experiment has been done in Yushkevich, Pluta, et al. (2015). However, since we have updated the ASHS software (ASHS version 2.0.0 rather than 1.0.0, https://www.nitrc.org/frs/?group_id=370) and the atlas manual segmentation [the PHC and OTS labels were added as described in Xie et al. (2017)], the results are slightly different from those in Yushkevich, Pluta, et al. (2015). Note that we did not perform this analysis for the hippocampus, as the segmentation protocols for the T1w and T2w hippocampus were different.

| Thickness measures of the MTL cortices extracted from the ASHS-T1 automatic segmentation
For MTL cortices, thickness measures may be more appropriate compared to volume because they are less sensitive to uncertainty in boundary estimation between cortical regions. A multitemplate thickness analysis pipeline (Xie et al., 2014, 2017)

| Volume and thickness measures of hippocampus, ERC, and PRC using FreeSurfer
In order to compare the volume and thickness measurements extracted from the proposed pipeline to those from an established paradigm for T1w MRI, FreeSurfer version 6.0 (Fischl, 2012) was applied to the T1w MRI scans of both the 29 subjects in the atlas set and the ADNI data set. Volume measurements of the hippocampus were extracted from the "aseg.stats" file, and volume and thickness measures of the ERC and PRC were extracted from the "lh.BA_exvivo.thresh.stats" and "rh.BA_exvivo.thresh.stats" files. The location of the ERC and PRC is estimated using a probabilistic framework with templates constructed from ex vivo atlases described in Fischl et al. (2009) and Augustinack et al. (2013), respectively.

| Statistical analysis
All statistical analyses in this article are two-tailed with significance levels of p = .05 unless stated otherwise. Bilateral measurements of each subregion were averaged.

| Analysis of demographic and MMSE data
To test for differences in demographic and MMSE data between diagnostic groups, that is, aMCI versus NC in the PMC atlas set and each patient-control pair in the ADNI data set, an independent two-sample t test (age), Wilcoxon rank sum tests (education, MMSE), and a contingency χ² test (gender) were performed.

| Evaluate the accuracy of the automatic segmentation
To evaluate the automated segmentations generated by ASHS-T1 and ASHS-T2, the average DSC (Dice, 1945) between the leave-one-out automatic segmentations and the corresponding manual segmentations of each image in the PMC atlas sets was computed. In addition, we computed the intraclass correlation (ICC) between volume measurements of the MTL subregions extracted from the automatic segmentations in the PMC-T1 atlas set and those obtained using the edited manual segmentations in T1w MRI space. To compare the ICC for the ASHS-T1 pipeline with that of the ASHS-T2 pipeline, a similar analysis was performed for the MTL cortex labels (ERC, BA35, BA36, and PHC) for the PMC-T2 atlas set as well. ICC was computed using the "icc" function of the R package psy 1.1, which implements the method described in Shrout and Fleiss (1979).
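For reference, the DSC of a label is twice the number of voxels that carry the label in both segmentations, divided by the sum of the label's voxel counts in the two segmentations. A minimal sketch on flattened label arrays follows; the function name and the convention of returning NaN when the label is absent from both segmentations are assumptions for illustration.

```python
import math
from typing import Sequence


def dice_coefficient(
    auto_labels: Sequence[int],
    manual_labels: Sequence[int],
    label: int,
) -> float:
    """DSC = 2 |A and M| / (|A| + |M|) for one label on voxel-aligned segmentations."""
    if len(auto_labels) != len(manual_labels):
        raise ValueError("segmentations must have the same number of voxels")
    n_auto = n_manual = n_both = 0
    for auto, manual in zip(auto_labels, manual_labels):
        in_auto = auto == label
        in_manual = manual == label
        n_auto += in_auto
        n_manual += in_manual
        n_both += in_auto and in_manual
    if n_auto + n_manual == 0:
        return math.nan  # label absent from both segmentations: DSC is undefined
    return 2.0 * n_both / (n_auto + n_manual)
```

The average DSC reported for a subregion would then be the mean of this value over the leave-one-out segmentations of all atlas subjects.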

| Group analysis between patients and Aβ− controls in ADNI
To evaluate the clinical utility, each of the four patient groups was compared to Aβ− controls separately. For each volume measure, a general linear model with group membership as the factor of interest and age and ICV as covariates was fitted to obtain the t statistic for the control-patient contrast. A Bonferroni-corrected significance level (p < .05/10) was used to determine significant effects. For thickness measures, a similar analysis was performed, but only age was used as a covariate and the Bonferroni-corrected significance level was set to p < .05/6.
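The corrected thresholds follow directly from dividing the nominal level by the divisor given in the text: .05/10 = .005 for volume measures and .05/6 ≈ .0083 for thickness measures. The small sketch below makes the comparison explicit; the function names are hypothetical.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def is_significant(p_value: float, n_tests: int, alpha: float = 0.05) -> bool:
    """True if p_value falls below the Bonferroni-corrected level."""
    return p_value < bonferroni_alpha(alpha, n_tests)


VOLUME_ALPHA = bonferroni_alpha(0.05, 10)     # divisor used for volume measures
THICKNESS_ALPHA = bonferroni_alpha(0.05, 6)   # divisor used for thickness measures
```

Under these thresholds, a p value of .006 would count as significant for a thickness measure but not for a volume measure.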

| EVALUATION EXPERIMENTS AND RESULTS
We first evaluated the accuracy of the automatic segmentation of ASHS-T1 against the manual segmentation in the space of the T1w MRI and compared the performance of ASHS-T1 with that of ASHS-T2 (Section 3.1). Then, we investigated the extent to which an established analysis method for T1w MRI, that is, FreeSurfer, mislabels the dura mater as cortex (Section 3.2). Finally, to demonstrate the clinical utility of the proposed pipeline, we compared the volume and thickness measures extracted using the proposed pipeline between patients and controls using a large data set from the ADNI and compared this with FreeSurfer (Section 3.3).

| Evaluate accuracy of the automatic segmentation with manual segmentation
Primary validation of segmentation accuracy was performed on the set of 29 subjects from the PMC atlas for whom T1w MRI, T2w MRI, and both automatic and manual segmentations of the SR T1w MRI and T2w MRI are available.
The DSC results and volume measurements are summarized in Table 3. The segmentation accuracy of ASHS-T1 is comparable to that of ASHS-T2 (Yushkevich, Pluta, et al., 2015). The DSC of the proposed pipeline in segmenting ERC (DSC = 0.76) is slightly lower than that in T2w MRI (DSC = 0.79), which could be due to the limited ability to resolve gray matter boundaries because of the lower resolution and the confound of dura in T1w MRI. Importantly, we observe that the volume of the dura mater is larger than that of the ERC and BA35, indicating that segmenting dura mater as cortex could significantly confound volume measures of subregions in the MTL cortex.
From the ICC results, as reported in Table 3

| Dura mislabeling as cortex
In this section, we performed experiments to test the two hypotheses that were introduced in the Introduction, that is, (a) the MTL cortex is commonly oversegmented by FreeSurfer because of the mislabeling of the adjacent dura mater and (b) the degree of dura mislabeling as cortex by FreeSurfer is different between patients and controls.
To test the first hypothesis, among subjects in the PMC-T1 atlas set, we first resampled the FreeSurfer whole brain segmentations to the space of the SR T1w MRI and then computed the average percentage of voxels labeled as dura in the manual segmentations that were mislabeled as gray matter or other by the proposed pipeline and FreeSurfer. The results, shown in Table 4, support the notion that a large proportion of dura (62.4%) is segmented as gray matter by FreeSurfer. We note that FreeSurfer does not have a specific label for the dura and therefore has to label the dura voxels as something else; including them in the gray matter introduces error into cortical thickness computations. On the other hand, the majority (71.9%) of dura voxels are correctly labeled by the proposed pipeline; only 6.5% of them are labeled as gray matter, and the amount of dura mislabeling as cortex is not significantly different between aMCI and NC (6.8 ± 3.1% vs. 6.2 ± 4.2%, p > .1, two-sample t test).
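The mislabeling measure described above can be sketched as follows: restrict attention to the voxels labeled as dura in the manual segmentation and tabulate which label the automatic method assigned to each of them. The function name `dura_label_breakdown`, the string labels, and the toy data are assumptions for illustration; real label maps would be 3D images resampled to a common space.

```python
from collections import Counter


def dura_label_breakdown(manual, auto, dura_label="dura"):
    """Percentage of manually labeled dura voxels assigned to each
    automatic label.

    `manual` and `auto` are flat, voxel-aligned sequences of labels
    (a hypothetical representation used only in this sketch).
    """
    if len(manual) != len(auto):
        raise ValueError("label maps must contain the same number of voxels")
    counts = Counter(a for m, a in zip(manual, auto) if m == dura_label)
    n_dura = sum(counts.values())
    if n_dura == 0:
        raise ValueError("no dura voxels in the manual segmentation")
    return {label: 100.0 * c / n_dura for label, c in counts.items()}


# Toy subject: 8 manual dura voxels followed by 2 gray matter voxels.
manual = ["dura"] * 8 + ["gm"] * 2
auto = ["dura"] * 5 + ["gm"] * 1 + ["bg"] * 2 + ["gm"] * 2
breakdown = dura_label_breakdown(manual, auto)
# breakdown["gm"] is the percentage of dura mislabeled as gray matter.
```

The per-subject percentages would then be averaged across subjects to obtain group-level values of the kind reported in Table 4.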
The second hypothesis can be tested using the ADNI data set with controls and patients at different stages of AD. Since manual segmentation of the MTL cortex and dura is not available in the ADNI data set, the degree of dura mislabeling as gray matter by FreeSurfer is computed using the automatic ASHS-T1 segmentation, that is, the average percentage of voxels labeled as dura by the ASHS-T1 that are labeled as gray matter. We believe this is a suitable measure because of the following evidence:
1. In the PMC-T1 atlas set, we computed the degree of dura mislabeling as gray matter by FreeSurfer relative to the dura label in the automatic segmentations generated by ASHS-T1 and relative to the dura label from the manual segmentations. These measurements were highly correlated (Pearson correlation r = .946, p = 9.3e-15), shown in Figure S2.
2. In the PMC-T1 atlas set, no significant differences were observed between aMCI and controls in segmentation accuracy of dura (DSC reported in Table 3, 0.74 vs. 0.76) or for mislabeling of dura as cortex (6.8 vs. 6.2%) using the automatic dura segmentations generated by ASHS-T1. Therefore, it is unlikely that it will introduce bias between patients and controls.
3. All the segmentations of the ADNI subjects generated by the T1 pipeline used in this analysis were visually checked and only segmentations that have high-quality MTL cortex segmentation were used in this analysis and thus the bias induced by segmentation errors is limited.
Figure 4 summarizes the percentage of dura voxels segmented as gray matter by FreeSurfer in Aβ− controls and the four patient groups.
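The agreement check in the first point of evidence above amounts to a Pearson correlation between two per-subject measures: FreeSurfer dura mislabeling relative to the manual dura label and relative to the ASHS-T1 dura label. A minimal sketch of that computation is given below. The function name `pearson_r` and the per-subject values are hypothetical and are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired samples with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Made-up per-subject mislabeling percentages (illustration only).
relative_to_manual = [55.0, 60.0, 62.0, 70.0, 48.0]
relative_to_ashs = [57.0, 61.0, 66.0, 71.0, 50.0]
r = pearson_r(relative_to_manual, relative_to_ashs)
```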
The amount of dura mislabeling as cortex decreases with increasing disease severity, probably due to the more distinct separation between the MTL cortex and the dura (Figure 4). Two-sample t tests showed that the proportion of mislabeling differs significantly between Aβ− controls and patients at the early prodromal AD, late prodromal AD, and dementia stages. Since manual segmentation of the ADNI data set is not available, it is not feasible to evaluate the amount of dura mislabeling as cortex by the proposed method. However, since we did not see a large difference in dura mislabeling as cortex between aMCI and controls in the PMC-T1 atlas set (0.6%), it seems unlikely that the observed large differences in FreeSurfer dura mislabeling between groups (3.5, 6.5, and 8.6% between controls and patients at the early prodromal AD, late prodromal AD, and dementia stages, respectively) are mainly due to imperfect automatic segmentation by ASHS-T1.
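For readers who want to see the group comparison spelled out, the sketch below computes a two-sample t statistic for the difference in mean mislabeling between two groups. The text does not state whether equal variances were assumed; the pooled-variance (Student) form used here is an assumption of this example, as are the function name `two_sample_t` and the toy values. Converting the statistic to a p value requires the t distribution, which is not in the Python standard library and is therefore omitted.

```python
import math
from statistics import mean, variance


def two_sample_t(group_a, group_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom.

    Assumes equal variances in the two groups (an assumption of this
    sketch, not something stated in the text).
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least 2 observations")
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Toy values only; not data from this study.
controls = [1.0, 2.0, 3.0]
patients = [4.0, 5.0, 6.0]
t_stat, dof = two_sample_t(controls, patients)
```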

| MTL atrophy in early stages of AD in ADNI
We compared the volume and thickness measures extracted using the proposed pipeline between patients at different stages of AD and Aβ− controls in ADNI and performed a comparison with FreeSurfer. To make this comparison fair, we report here the ASHS-T1 results on the full ADNI data set, without excluding subjects based on the quality control procedure described in Section 2.4.3. However, excluding the subjects with poor ASHS-T1 segmentation quality did not significantly alter the comparison with FreeSurfer, as shown in Table S2.

TABLE 4 Comparisons of different analysis methods in labeling the dura mater in the PMC atlas set

Method     % of dura voxels in manual segmentation labeled as
           Dura          Gray matter    Background and CSF
ASHS-T1    71.9 ± 6.4    6.5 ± 3.7      21.6 ± 5.9

TABLE 5 Statistical analysis results using volumetric measurements, adjusted for age and intracranial volume, in discriminating patient groups from Aβ− controls in ADNI. Measurements that survived Bonferroni correction (p < .05/10) are highlighted in bold

thickness were consistently about 50% thicker than the corresponding measurements (ERC and BA35) by ASHS-T1, which is probably due to the mislabeling of dura as cortex. In addition, the mislabeling of dura seems to introduce instability in FreeSurfer measurements of the MTL cortex in Aβ− controls and early stages of AD (preclinical and early prodromal AD). For example, FreeSurfer ERC volume decreased from Aβ− controls (802.5 mm³) to preclinical AD (768.2 mm³) but became slightly higher in early prodromal AD (804.1 mm³). Also, FreeSurfer volume and thickness measurements were more variable (higher SD) than the corresponding measurements generated by the proposed pipeline.

| DISCUSSION
In this article, we present an automatic segmentation pipeline for T1w MRI for measuring MTL subregions that accounts for the confound of dura and the variable anatomy of the MTL cortex. The cross-validation accuracy of ASHS-T1 relative to manual segmentation was relatively high, with DSC ranging from 0.71 to 0.93. The segmentation accuracy of the ASHS-T1 pipeline is comparable to that of our T2 pipeline (except for ERC, for which the accuracy is slightly lower, as shown in Table 3). Cross-validation experiments in the PMC-T1 atlas showed that ASHS-T1 can reliably separate dura from gray matter, mislabeling only 6.5% of the dura as gray matter, whereas FreeSurfer mislabels 62.4% of dura as gray matter, leading to about 50% thicker cortex in ERC and PRC. In the ADNI data set, we showed that the degree of dura mislabeling in FreeSurfer decreases with increasing disease severity, indicating a bias where the cortex is oversegmented to a larger extent in Aβ− controls than in patients. This could potentially lead to an overestimation of group differences in later stages of the disease. Finally, in the ADNI data set, we demonstrated that our pipeline picks up changes in early prodromal AD in the MTL, including in ERC and BA35, which agrees with the known progression of NFT pathology, but also in the posterior hippocampus. Moreover, the volume and thickness loss become more severe and widespread with increasing disease severity.
The ASHS-T1 pipeline has several unique aspects and strengths. First, it provides granular measures of the MTL, including subdivision of the PRC and hippocampus, for T1w MRI. It could therefore be very useful in clinical trials and large-scale studies (e.g., ADNI) in older populations in the interrogation of, for example, AD or age-related effects on the MTL, the anterior and posterior MTL networks and memory processes that differentially depend on these regions. In contrast to most previous methods for T1w MRI, our multiatlas approach for labeling MTL cortical regions takes into account the anatomical variability of the MTL cortex, which is known to influence the locations of borders between MTL cortical regions. The accuracy of the reported DSC values in the same range (Hu, Coupé, Pruessner, & Collins, 2014; Kim, Caldairou, Bernasconi, & Bernasconi, 2018). With regard to the hippocampus, our pipeline performs comparably to state-of-the-art methods (Collins & Pruessner, 2010; Coupé et al., 2011; Leung et al., 2010; Platero & Tobar, 2016). In addition, to the best of our knowledge, this is the first automated pipeline that directly labels dura when segmenting MTL subre-
(Chan et al., 2001; Davies, Halliday, Xuereb, Kril, & Hodges, 2009
In light of the above-described strengths and limitations, there are certain guidelines that should be followed when using ASHS-T1. Careful assessment of the MRI scans and segmentations is important, with common segmentation errors involving minor mislabeling of the dura and infrequent mislabeling of the lateral aspect of the hippocampus, which was observed in a small number of ADNI subjects. Because of the composition of the PMC-T1 atlas set, the most appropriate target population is older adults and MCI patients. However, we also applied the atlas to images of patients with early AD dementia, and careful quality assessment indicated that the atlas performed well in this population.
This matches our recent findings that varying the composition of an atlas set between only controls, only MCI and/or AD patients, or a mixture of the two groups did not significantly affect segmentation accuracy (Xie et al., 2018). However, care is warranted when this atlas and pipeline are applied to other populations, including other ages and diseases, or to very different imaging protocols. When this atlas is applied to images acquired on a different platform or with a different MRI protocol, it is recommended to use the "Heur" output (Step 5 in Section 2.4.2).
To assess the clinical validity and utility of our pipeline, we applied it to the ADNI data set and compared different stages of AD with Aβ− controls on MTL subregional volume and thickness. Compared to the Aβ− controls, we observed a trend difference in BA35 thickness in preclinical AD (Aβ+ controls), a significant difference in ERC volume, BA35 volume and thickness and posterior and total hippocampal volume in early prodromal AD and in all regions in late prodromal AD and dementia. The observed earliest effect on BA35 is consistent with the earliest accumulation of NFT pathology in this region (Braak & Braak, 1991, 1995; Ding et al., 2009). A recent study in a different, only partially overlapping, subset of ADNI showed a similar, but significant, decrease in BA35 thickness in preclinical AD (Wolk et al., 2017) using T2w MRI. The difference in significance may be due to more reliable segmentation of the MTL cortex because of a better contrast and separation of dura in T2w MRI as compared to T1w MRI. In light of the recently published A/T/N model (Jack et al., 2016), in future work, it will be interesting to further select cases who are also tau-positive and investigate whether these subjects show increased neurodegeneration in BA35.
The spreading of atrophy to adjacent ERC and hippocampus in early prodromal AD also matches the known spreading of NFT pathology (Braak & Braak, 1991, 1995; Ding et al., 2009) and other studies investigating MTL atrophy patterns in the early stages (Killiany et al., 2002; Krumm et al., 2016; Olsen et al., 2017; Stoub, Rogalski, Leurgans, Bennett, & deToledo-Morrell, 2010; Xu et al., 2000; Yushkevich, Pluta, et al., 2015). The volume loss in posterior hippocampus, rather than anterior hippocampus, was surprising, given that pathology starts in BA35, part of the PRC, which is thought to be more strongly connected to the anterior hippocampus, at least in the primate MTL (Aggleton, 2012) [although some inconsistency in the literature exists (Witter, Van Hoesen, & Amaral, 1989)]. One might therefore speculate that the anterior hippocampus could be affected earlier than the posterior hippocampus in AD. Only a few studies investigated atrophy in the anterior and posterior hippocampus in MCI, where one study reported specific atrophy in anterior regions (Martin, Smith, Collins, Schmitt, & Gold, 2010), but another did not (Greene & Killiany, 2012). Moreover, a qualitative inspection of studies using shape analysis of the hippocampus to investigate granular effects of MCI shows inconsistent findings not clearly pointing towards an anterior-to-posterior gradient of atrophy in MCI (Apostolova et al., 2012; Chételat et al., 2008; Qiu et al., 2009). Additionally, tractography studies in primates indicate that the posterior hippocampus is more strongly connected with the PHC, which is in turn connected via the cingulum bundle with regions such as the posterior cingulate cortex and precuneus [this has also been supported by fMRI studies (Aggleton, 2012; Das et al., 2014; Mufson & Pandya, 1984; Poppenk, Evensmoen, Moscovitch, & Nadel, 2013)], which have been indicated recently to show the earliest amyloid pathology (Palmqvist et al., 2017).
This amyloid pathology, which is likely already present for years by the time subjects reach the early MCI stage, may have indirectly affected posterior hippocampal integrity. Moreover, the posterior hippocampus is part of the posterior MTL network (Ranganath & Ritchey, 2012), which has been found to already show atrophy in early MCI (Das, Mancuso, Olson, Arnold, & Wolk, 2016).
In general, FreeSurfer performed fairly similarly in this data set, which consists only of T1w MRI scans, in characterizing the MTL atrophy pattern across the different AD stages, finding morphometric changes in PRC and hippocampus in early prodromal AD and increasing atrophy, including in ERC, at later stages. The most evident difference in the early stages is a lack of significant ERC volume or thickness loss in early prodromal AD using FreeSurfer. In fact, when looking carefully at the ERC volume measures, a fluctuation can be observed: ERC volume loss is observed in preclinical AD compared to controls, but then an increase is observed in early prodromal AD, where ERC volumes again match those in the control group. This may be due to mislabeling of dura as ERC, which may introduce additional noise. Given that ERC atrophy is expected to be subtle at this stage, and that a bias with regard to the dura mislabeling was observed at later disease stages, the inclusion of dura in the ERC label may lead only to increased measurement error. Surprisingly, even though we observed a bias in FreeSurfer of decreasing mislabeling of dura, this did not lead to larger effect sizes for the differences between late prodromal AD or dementia and controls. Perhaps this effect is counterbalanced by some other features of the labels, for example, the effect size may be weakened by the larger anterior extent of ERC other. Having a granular label of BA35 rather than including it in ERC or a larger PRC label is advantageous, especially in the earliest stages of AD, where NFT pathology is only thought to affect the transentorhinal cortex, which approximates our BA35 label, and a small portion of the lateral ERC. We did observe BA35 thinning in preclinical AD compared to Aβ− controls with our pipeline, although only at a trend level, which could potentially be due to the heterogeneity in disease severity of the preclinical group.

| CONCLUSIONS
In conclusion, we presented a reliable automated pipeline for obtaining granular measures of MTL subregions in T1w MRI, explicitly accounting for the confound of the dura. We demonstrated the clinical utility of this approach by showing atrophy of early Braak regions in early prodromal AD which becomes more severe and widespread in later stages. These findings should be replicated in other cohorts.
Interesting and important future directions include establishing change in MTL regions over time, as longitudinal atrophy is more closely linked to clinical status and is important for tracking disease progression or as a potential marker in clinical trials, and establishing the association with, for example, tau-PET uptake to better understand the drivers of neurodegeneration. This pipeline could be particularly useful for investigating tau-PET tracers that show high uptake in the dura. We hope that this publicly available atlas and software, including a cloud-based service (https://sites.google.com/view/ashs-dox/home and https://sites.google.com/view/ashs-dox/cloud-ashs/overview), will serve the scientific community and enable the interrogation of the role of the MTL in aging, dementia, and cognition.