Reply on RC2

The report by Crick et al. presents new information, in the form of sulfur isotope composition (33S and 34S) of volcanic eruption-sourced sulfate, on a number of volcanic events dated at about 74,000 years BP in two Antarctic ice cores. The sulfur isotope data show unambiguously that most of these events are explosive eruptions in the low latitudes that injected substantial amounts of sulfurous gases into the stratosphere (above the ozone layer). This is based on the findings in the early 2000s (e.g., Savarino et al. 2003; Baroni et al. 2007) that sulfate formed in the stratosphere from oxidation of certain sulfur species (mainly SO2) possesses nonzero sulfur mass-independent fractionation (S-MIF) signatures. The experimental procedures used in this study, including ice core sampling (multiple samples in an event), sulfur isotope ratio measurement, and correction of isotope contribution from non-volcanic sulfate background in the calculation of S-MIF in ice core samples containing both background and volcanic sulfate, follow previously tested and verified methodology and, therefore, the data appear to be robust and of high quality.

The above summary (by me) indicates that this study confirms results from previous work and adds significant new information. However, the main objective of this study appears to be to identify, or to narrow the range of, the signal of the Toba eruption in ice cores. This intent is also suggested by the title of the paper ("the 74 ka Toba eruption"). Unfortunately, sulfur isotope signatures, similar to contemporaneous bipolar sulfate events, do not provide indisputable evidence of a specific eruption, even for Toba. Unlike tephra matching, unambiguous S-MIF data are not a "smoking gun".
While we agree that we cannot provide a smoking gun for Toba without tephra, we can use sulfur isotopes to rule out candidate eruptions based on a muted or non-existent MIF signal. We think that systematic analysis of the candidate sulfate peaks T1-9 provides new and valuable information in the identification of the Toba eruption in the ice core record and provides new insights into its timing and relationship to other paleoclimate records.
In this study, the identification of events T1, T2 and/or T3 as resulting from the Toba eruption is based on two pieces of evidence: the precise timing and the stratospheric nature of the events. Would there be other eruptions that meet these criteria? Or, in other words, can we eliminate the possibility that other eruptions left the volcanic sulfate of these events? The fact that at least three events (T1, T2 and T3) meet these criteria, with the small possibility that they were all left by the same Toba eruption, suggests we cannot be highly confident that the answer is yes. In fact, there are reasons to suspect that none of the candidate events is Toba.
We have added further discussion regarding other eruptions which may have resulted in the T1-9 sulfate peaks in section 4.1, lines 280-296: "The combination of an incomplete geological record of past volcanism along with large uncertainties in dating of geological samples means that it is not possible to unambiguously associate a volcanic event with a sulfate deposition event in ice cores in the geological past unless there is a tephra confirmation of the source. We take the approach that given the age estimates of Toba, we can investigate all possible candidates within the age uncertainty and rule out candidate eruptions if they have a muted or weak MIF signal. Although we cannot rule out the possibility that other eruptions deposited these sulfate peaks, provided that the dates of the YTT are accurate, and that it emitted substantial sulfur, then at least one of the candidates we investigated is very likely to be Toba. Using the VOGRIPA database (Crosweller et al., 2012; Brown et al., 2014) we have identified other volcanic events over the age range of T1-9 (when considered on the AICC2012 age model). These volcanic events and their associated dates are detailed in the supplementary Table S4. There are 9 events with VEI ≥ 6 at around 74 ka in VOGRIPA; however, they often have large age uncertainties associated with the eruption dates (over 10 ka). Thus, there are many more peaks in addition to T1-T9 in the ice core record that could have been deposited by these eruptions. One of the few with a smaller error is a VEI 6 eruption from the Coatepeque Caldera dated to 72 ± 2 ka (Rose et al., 1999). However, the Toba eruption is the largest of the candidate eruptions over the age range encompassed by T1-T9, and in order to find an eruption with significantly larger S deposition than those considered here at EDC, one would have to extend the search to 79.5 ka, which is well outside the uncertainty in the age of the Toba eruption.
Therefore, unless the YTT age and its uncertainty are not accurate (or it had an exceedingly small sulfur emission for its eruptive size), Toba must have resulted in at least one of the T1-T9 candidates."
The Toba eruption ejected a huge amount of materials (3,800 km3 DRE; Costa et al., 2014). This is more than three orders of magnitude greater than that of the 1815 CE Tambora eruption (~1.2 km3 DRE; Self et al., 2004). Estimates of the sulfur (aerosol) output of Toba are also several orders of magnitude larger than that of Tambora. (However, I would discount the aerosol estimates from petrological/volcanological data or ice core data, as these rely on scaling factors (multipliers) that are poorly constrained.) One would expect that the Toba sulfate signal would be exponentially larger than that of Tambora in the same ice core. The volcanic sulfate flux/deposition of all three potential Toba events (Figure 2 and Table S1), except for T2 in EDML, is not overwhelmingly large: they are approximately 1-to-2 times that of Tambora.
The cited volume for Tambora is incorrect by a factor of 40 and the referencing is not up to date. From Kandlbauer and Sparks (2014), the Tambora erupted volume is 41 ± 4 km3 DRE, which is roughly two (not three) orders of magnitude smaller than that of Toba. The basis of some of the statements made here is unclear to us. We would not expect the Toba sulfate signal to be exponentially larger, and we are not aware of literature estimating the sulfur output of Toba as several orders of magnitude greater than Tambora; indeed, this does not make petrological or geochemical sense given the nature of the respective magmas.
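The order-of-magnitude comparison above can be checked with a short calculation. This is only an illustrative sketch using the DRE volumes quoted in this reply; the function and variable names are ours and are not part of the manuscript.

```python
import math

# Dense rock equivalent (DRE) volumes in km3, as quoted in this reply.
TOBA_DRE_KM3 = 3800.0            # Costa et al. (2014)
TAMBORA_DRE_KM3 = 41.0           # Kandlbauer and Sparks (2014)
TAMBORA_DRE_REVIEW_KM3 = 1.2     # value used in the reviewer comment


def orders_of_magnitude(larger, smaller):
    """Return log10 of the ratio between two erupted volumes."""
    return math.log10(larger / smaller)


updated = orders_of_magnitude(TOBA_DRE_KM3, TAMBORA_DRE_KM3)
review = orders_of_magnitude(TOBA_DRE_KM3, TAMBORA_DRE_REVIEW_KM3)

print(f"Toba/Tambora with 41 km3:  {TOBA_DRE_KM3 / TAMBORA_DRE_KM3:.0f}x "
      f"({updated:.1f} orders of magnitude)")
print(f"Toba/Tambora with 1.2 km3: {TOBA_DRE_KM3 / TAMBORA_DRE_REVIEW_KM3:.0f}x "
      f"({review:.1f} orders of magnitude)")
```

With the 41 km3 value the ratio is a little under 100, i.e. about two orders of magnitude, whereas the 1.2 km3 value gives more than three.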
With the addition of sulfate data from the B32 Antarctic ice core we have recalculated our sulfur loading estimates to account for the deposition to both EDML and EDC cores (L312-315). We have now included further discussion regarding the potential explanations for a lower sulfur yield for the Toba eruption, see below and lines 331-378 in the revised text: "The calculated dense rock equivalent (DRE) for the Toba eruption is considerably higher than events in the Common Era (~3800 km3 DRE compared to ~5 and ~40 km3 for Pinatubo and Tambora, respectively (Costa et al., 2014; Holasek et al., 1996; Kandlbauer and Sparks, 2014)). Previous sulfur yield estimates for Toba (Rose and Chesner, 1990; Chesner and Luhr, 2010) are based on experimental and petrological data. Chesner and Luhr (2010) analysed pre-eruptive melt inclusions from YTT magma and found low sulfur contents (< 32 ppm). Using 32 ppm, 3800 km3 DRE volume, a magma density of 2500 kg m-3, and a crystal content of 30% results in a calculation of 213 Tg S. Thus, the various estimates for Toba using petrological methods are only up to a factor of a few times greater than estimates for much smaller magnitude eruptions (e.g. Tambora 27-29 Tg S (Self et al., 2004) and Pinatubo ~10 Tg S (Guo et al., 2004)).
However, estimating the sulfur yield from explosive silicic eruptions is a complex matter and eruption magnitude is only one of several factors that control yield. In silicic magmatic systems, sulfur is partitioned between solid, melt, and fluid phases (Masotta et al., 2016). A major uncertainty in petrologically based estimates is how much sulfur is stored in exsolved fluids in the magma chamber. This latter form of sulfur may be significant or even dominant because of the strong partitioning of S between melt and fluid phase. Furthermore, this partitioning is strongly influenced by speciation of sulfur between sulfide and sulfate and is sensitive to redox conditions (Binder et al., 2018). Redox estimates for YTT suggest relatively reduced conditions around the Ni-NiO buffer (Chesner, 1998). In these conditions, sulfur is very strongly partitioned into the fluid phase from melt (Binder et al. 2018).
Neither the amount nor the composition of exsolved sulfur in the magma just prior to eruption is typically well constrained, leading to a major uncertainty in estimating yields. The exsolved sulfur source is very hard to estimate but could be comparable to or significantly greater than the sulfur content in the melt. Iacovino et al. (2016) developed a geochemical method to estimate exsolved sulfur in the 'Millennium Eruption' (ME) of Changbaishan, which has an estimated volume of 24 km3 DRE (Horn and Schmincke, 2000). They used incompatible trace element contents such as U and S content in melt inclusions to estimate a maximum yield of 46 Tg S with most (~90%) of the S in an exsolved fluid phase. The sulfur yield using only S contents in melt inclusions gives only about 5 Tg S. One significant problem with this approach is that large magma chambers that lead to major explosive eruptions take long periods of time to assemble, as exemplified by Toba with estimates of hundreds of ka (Reid and Vazquez, 2017). During this time there can be substantial losses of exsolved S from the magma chamber into the hydrothermal system and atmosphere.
Due to the issues discussed above, yields gleaned from ice core records are likely more reliable than petrologically based estimates. Three examples are instructive. For Tambora 1815, the ice core yield is estimated at 28.1 Tg S, in good agreement with the petrological estimate and implying that exsolved S did not make a major contribution. For Changbaishan we estimate a sulfur loading of 5.7 Tg S following the methodology of Toohey and Sigl (2017) from sulfur deposited in the ice core records (Sigl et al., 2015; Sun et al., 2014). This estimate is similar to that based on S dissolved in the melt and so does not support a large fraction of exsolved S. In contrast, for Krakatoa 1883 the S yield by the petrological method is only 2.8 Tg S (Mandeville et al., 1996) but the ice core estimate is 7.3 Tg S (Gao et al., 2007). One possible explanation for the discrepancy is the presence of exsolved S in the magma chamber.
To illustrate the variability of the relationship between sulfur loading and DRE, we have calculated a sulfur loading to DRE ratio for a variety of eruptions in Table 1 below. These ratios can vary over an order of magnitude between different events, governed by magmatic processes and conditions, plume dynamics, and preservation. In summary, scaling of sulfur yield with magnitude is not simple, even though one might expect sulfur yield to increase with magnitude, with all other controlling factors being equal. On present evidence we would expect Toba to have a yield a few times larger than Tambora and possibly more if exsolved S were significant. Our estimates for T1, T2 and T3 S yield are 5.5, 8.5 and 2.5 times greater than the Tambora eruption, respectively." An area of future study would be to look at S isotopes in Toba degassed matrix and melt inclusions to reconstruct S degassing of this magma body (Taylor, 1986).
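The petrological estimate of 213 Tg S quoted in the revised text above can be reproduced with a simple mass balance. The sketch below is illustrative only: it assumes that the 32 ppm is a mass fraction, that sulfur resides only in the melt (the non-crystal fraction), and that all of it is released. These simplifications are ours, as are the function and parameter names.

```python
def petrological_sulfur_yield_tg(volume_km3, density_kg_m3,
                                 crystal_fraction, melt_s_ppm):
    """Estimate sulfur yield (Tg S) from melt-inclusion sulfur content.

    Assumes (for illustration) that crystals carry no sulfur and that
    all sulfur dissolved in the melt is released on eruption.
    """
    volume_m3 = volume_km3 * 1e9                      # 1 km3 = 1e9 m3
    magma_mass_kg = volume_m3 * density_kg_m3
    melt_mass_kg = magma_mass_kg * (1.0 - crystal_fraction)
    sulfur_kg = melt_mass_kg * melt_s_ppm * 1e-6      # ppm by mass
    return sulfur_kg / 1e9                            # 1 Tg = 1e9 kg


# Values quoted in the revised text for the Toba (YTT) eruption.
toba_yield_tg = petrological_sulfur_yield_tg(
    volume_km3=3800.0,
    density_kg_m3=2500.0,
    crystal_fraction=0.30,
    melt_s_ppm=32.0,
)
print(f"Petrological sulfur yield for Toba: {toba_yield_tg:.0f} Tg S")
```

Because this calculation ignores any exsolved fluid phase, it should be read as a lower bound on the sulfur available in the magma rather than as an estimate of the atmospheric loading.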
For T2, the much smaller flux (46.2 mg per square m) for EDC suggests that the flux (424) in EDML (about 9 times that of Tambora) may be an outlier.
This could be the result of a preservation issue leading to a disparity between the two sites, due to the low accumulation rate at the EDC site. This preservation issue has been shown before; for example, some cores from the EDC site do not record the 1815 CE Tambora event (Gautier et al., 2016). This clarification has been included in the revised text at lines 177-180.
The sulfate flux data of these events in Greenland cores (Svensson et al., 2013) are also approximately 1-to-2 times that of Tambora. If the Toba eruption resulted in one of the three events, why is its sulfate flux so much smaller than what would be expected? Toba would have to be an exceptionally sulfur-poor eruption to leave one of the three volcanic sulfate signals in ice cores.
As detailed above, some eruptions, such as the Millennium Eruption, can deposit comparatively little sulfate to the ice core despite being of large magnitude. In addition, if the Toba eruption was comprised of multiple eruptions, the bulk rock estimates would suggest a single large event because of the temporal resolution of geological ages, while the ice cores could record multiple sulfur peaks. Totalling the sulfur loading estimated from the ice cores for T1, T2 and T3 returns estimates of nearly 8 times that of the sulfur loading due to Samalas (463 Tg S vs 59.4 Tg S) and over 16 times greater than Tambora (28.1 Tg S). This is further clarified at lines 446-449.
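The ratios quoted above follow directly from the loading values, as the short check below shows. It is a sketch only; the final consistency check assumes that the T1, T2 and T3 multiples quoted in the revised text (5.5, 8.5 and 2.5) are relative to the 28.1 Tg S Tambora ice core estimate, which is our reading rather than something stated explicitly.

```python
# Stratospheric sulfur loadings in Tg S, as quoted in this reply.
TOBA_T1_T3_TOTAL_TG = 463.0   # summed loading for T1, T2 and T3
SAMALAS_TG = 59.4
TAMBORA_TG = 28.1

ratio_to_samalas = TOBA_T1_T3_TOTAL_TG / SAMALAS_TG
ratio_to_tambora = TOBA_T1_T3_TOTAL_TG / TAMBORA_TG
print(f"T1+T2+T3 vs Samalas: {ratio_to_samalas:.1f}x")
print(f"T1+T2+T3 vs Tambora: {ratio_to_tambora:.1f}x")

# Consistency check (assumption: multiples are relative to 28.1 Tg S).
multiples_of_tambora = {"T1": 5.5, "T2": 8.5, "T3": 2.5}
summed_multiple = sum(multiples_of_tambora.values())
implied_total_tg = summed_multiple * TAMBORA_TG
print(f"Sum of multiples: {summed_multiple:.1f}x "
      f"-> {implied_total_tg:.0f} Tg S")
```

Under that assumption the summed multiples imply a total of roughly 464 Tg S, in line with the 463 Tg S quoted above.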
The much-smaller-than-expected sulfate signal could be the enigma for identifying Toba in ice cores.
Indeed, we agree this is a difficulty if the sulfate deposition to the ice cores due to Toba is small relative to its estimated magnitude. However, the resulting sulfate peaks would still need to be within the dating estimate and bipolar, leaving T1, T2 and T3 as the best candidates for the Toba eruption in the ice cores.

Estimating eruption plume altitude
The authors of this discussion paper use the extreme cap-delta-33-S values of T1 and T2 to infer that the plume altitude of the eruption clouds must be exceptionally high. In fact, they estimate the plume altitude to be at least 45 km for T1 and T2 (Lines 329-330). The estimate is derived or extrapolated from an empirical quantitative relationship between cap-delta-33-S and plume altitude (Figure 6). I question the validity of the extrapolation for two reasons. First, the quantitative relationship is based on four eruptions (Agung, Pinatubo, Samalas and Tambora) or data points with very large uncertainties. The maximum magnitude of cap-delta-33-S for a volcanic event depends strongly on the sampling resolution during the event, as the value of cap-delta-33-S evolves from positive to negative. This is analogous to peak height measurement dependent on sampling resolution during the peak. As a result, I suspect that the uncertainties for maximum cap-delta-33-S values are larger than seen in Figure 6.
Alongside alterations to Figure 5 as recommended by the first reviewer to show the sample resolution for each study cited there, we clarify in the revised text the effect of sampling resolution upon the magnitude of the S-MIF signal measured (L420-425). For Figure 6 we have therefore reported S-MIF values for studies with high sampling resolution, so that we are more likely to record close to the maximum Δ33S for a given eruption (Agung 1.13 years/sample, Pinatubo 0.7 years/sample, Tambora 0.16 years/sample and Samalas 0.23 years/sample). As an additional test we averaged values from the highest resolution eruptions (Tambora and Samalas) to mimic a reduction in sample resolution. In this test the magnitude of the Δ33S signal decreased slightly; we have included a figure in the supplement (Fig. S9) which demonstrates this. Following helpful discussions with Thomas Aubry (see below), we have also amended Figure 6 to show the distinction between the SO2 dispersion height and the plume top height, with additional discussion (L412-417).
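The effect of sampling resolution on the recorded peak can be illustrated with a toy calculation. The series below is entirely hypothetical (it is not measured data and does not reproduce Fig. S9), and an unweighted mean is used for simplicity, whereas averaging real isotope data would call for weighting by sulfate concentration.

```python
def block_average(values, block_size):
    """Average consecutive, non-overlapping blocks of samples."""
    averaged = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        averaged.append(sum(block) / len(block))
    return averaged


# Hypothetical D33S series (per mil) that evolves from positive to
# negative across an event, as described in the reviewer comment.
synthetic_d33s = [0.2, 1.6, 0.9, 0.1, -0.4, -0.8, -0.3, -0.1]

# Halve the sampling resolution by merging adjacent samples.
coarse_d33s = block_average(synthetic_d33s, block_size=2)

peak_fine = max(synthetic_d33s, key=abs)
peak_coarse = max(coarse_d33s, key=abs)
print(f"Peak at full resolution: {peak_fine:.2f}")
print(f"Peak at half resolution: {peak_coarse:.2f}")
```

In this toy case the recorded maximum drops when adjacent samples are merged; how large the reduction is for a real event depends on how sharply Δ33S varies between samples.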
Second, the authors cite the study of Lin et al. (2018)
We interpret the Lin et al. (2018) study as concluding that, because cosmogenic 35S is present in the samples, the sulfur measured derived from the stratosphere, even though it was collected in the troposphere. As written in Lin et al. (2018): "We do not rule out the possibility that there is an unknown SO2 oxidation mechanism which mass-independently enriches 33S in sulfate products in the free troposphere, but, at present, there is no evidence for the existence of such a process. Consequently, we favor the explanation by which downward transport of stratospheric sulfates is the most plausible source of positive Δ33S values in tropospheric sulfates (11-15)".

Second, Lin et al. explained that the relationship is the result of downward transport of stratospheric sulfate with non-zero cap-delta-33-S; this transport from the stratosphere is supported by an altitude-dependent trend of 35S, which is only produced in the stratosphere or above. I think it is on very shaky ground to use the altitude-S-MIF relationship found by Lin et al. to justify a similar relationship for volcanic sulfate in the stratosphere and to estimate the plume altitude of the volcanic eruption. In my view, interpretation of the volcanic S-MIF magnitude is premature; much more research is required to understand the significance of the volcanic S-MIF magnitude.
We consider that the simplest explanation for the Lin data is an altitude-dependent Δ33S in the modern atmosphere. As written in Lin et al. (2018): "The altitude-dependent variation of Δ33S revealed by enrichment of stratospherically sourced 35S indicates that sulfate aerosols originating from the higher atmosphere possess a greater Δ33S value than the boundary layer".
Although we agree that much more research is needed, the data from the eruptions for which we do have measurements support our interpretation. We have refrained from fitting a line to the data, since there are large uncertainties, but we use it to highlight future potential research avenues for S-MIF studies. We have specifically highlighted the large uncertainties associated with this interpretation in the text below. Further analysis, such as Δ17O measurements, would be required to make more robust conclusions regarding the plume height achieved by the Toba eruption. We agree that this interpretation is still in its speculative stage. Thus, we have edited the corresponding section of text to read as follows: "When we compare the S-MIF signals for our Toba candidates, particularly T1, T2, and T3, to previous studies of Common Era events we find that the larger magnitude events, such as Samalas and Tambora, have larger magnitude MIF signals (Fig. 5). Independent geological estimates of eruption plume height are available for numerous Common Era eruptions (Aubry et al., 2021), determined by a range of methods including using cloud positions from satellite sensors for the SO2 injection altitude (Guo et al., 2004) and modelling lithic clast dispersal for the plume top height (Sigurdsson and Carey, 1989 Costa et al., (2014). A more appropriate eruptive parameter to compare with the maximum Δ33S is the SO2 dispersion height, as this is the altitude of the sulfur plume and thus is likely to be the altitude at which the S-MIF is inherited. However, the SO2 dispersion height is not as well constrained for past eruptions as the plume top height (Aubry et al., 2021). Where available we have used the SO2 dispersion altitudes from the literature, compiled in the IVESPA database (Aubry et al., 2021; ivespa.co.uk). For the Tambora and Samalas eruptions we have calculated the SO2 dispersion height using the ratio of SO2 height to plume top height for the Pinatubo eruption (0.64). With these data we place the SO2 dispersion height for Toba at over 30 km (Fig. 6).
Although this tentative relationship between Δ33S and plume height supports the conclusion of an altitude dependence of S-MIF by Lin et al. (2018), there are some important caveats to note. For instance, the maximum magnitude of the Δ33S measured in the ice will depend on the sampling resolution and preservation in the core. We have tested the impact of decreasing sampling resolution by averaging samples from Tambora and Samalas from Burke et al. (2019). In these two instances, the lower sample resolution reduces the Δ33S volc signal slightly as expected, but the average remains within error of the measured maximum S-MIF value (see Fig. S9). This exercise further illustrates the importance of maximizing sampling resolution, which is made possible by measurement with MC-ICP-MS.
Furthermore, our estimates of plume height or SO2 dispersion height are based on a tentative relationship from only a handful of Common Era eruptions with large uncertainties associated with whether the S-MIF measurements captured the maximum magnitude of the S-MIF signal (see Supplementary Info, Fig. S9). Thus, this relationship should be further investigated and validated. One such method that could provide additional information on plume height is sulfate oxygen MIF measurements (Δ17O), since O-MIF in OH in the stratosphere varies with altitude (Zahn et al., 2006), which can then be inherited by sulfate during oxidation (Gautier et al., 2019)." (L402-432).
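The ratio-based scaling described in the quoted text, in which the SO2 dispersion height is taken as 0.64 times the plume top height, amounts to a one-line calculation. The sketch below uses a purely hypothetical plume top height as input, since the plume top heights adopted for Tambora and Samalas are not given in this reply; the function name is also ours.

```python
# Ratio of SO2 dispersion height to plume top height for Pinatubo,
# as quoted in the revised text.
PINATUBO_SO2_TO_PLUME_TOP_RATIO = 0.64


def so2_dispersion_height_km(plume_top_height_km,
                             ratio=PINATUBO_SO2_TO_PLUME_TOP_RATIO):
    """Scale a plume top height (km) to an SO2 dispersion height (km)."""
    return ratio * plume_top_height_km


# Hypothetical plume top height, for illustration only.
example_plume_top_km = 40.0
example_so2_height_km = so2_dispersion_height_km(example_plume_top_km)
print(f"Plume top {example_plume_top_km:.0f} km -> "
      f"SO2 dispersion height {example_so2_height_km:.1f} km")
```

Applying a single Pinatubo-derived ratio to other eruptions assumes that the ratio does not vary with eruption size or style, which is one more reason to treat the resulting heights as tentative.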

Recommendation
I would recommend that the paper be revised to (1) (2) reconsider including estimating eruption plume altitude from the S-MIF data.
Thank you for your thorough analysis and recommendations. To address point (1), if none of the sulfate signals measured were from Toba then either (a) the sulfur loading from Toba is even lower than we have suggested here or (b) the Ar/Ar dates are inaccurate and/or their uncertainties are underestimated. Since there are no other large bipolar events in the ice cores within the age uncertainty of the YTT date, one would have to extend the search to 79.5 ka, which is well outside the reported uncertainty in the age of the Toba eruption. Our comparison to radiometrically dated speleothem records shows that the absolute age of the AICC2012 timescale is within the uncertainty of the U-series ages (~200 years). Thus, if the Ar/Ar dates and uncertainties are accurate then at least one of these candidates must be associated with the Toba eruption. Given that other large eruptions such as the Millennium Eruption also have relatively low sulfur yields, we think that a low sulfur yield for Toba is a more likely explanation than inaccurate Ar/Ar ages. Regarding the estimate of plume altitude, as noted above we have expanded further upon the caveats associated with this estimate and reiterate the need for further research regarding the relationship between S-MIF and eruption plume altitude. Following discussions with Thomas Aubry we have modified Figure 6 to display the plume top height and have included further data with the SO2 dispersion height for each Common Era eruption, an important distinction for climate modelling.
Finally, we have included additional analysis of the B32 Antarctic core which, due to its proximity to the EDML ice core, has allowed us to further compare the Toba candidates preserved in EDML to Common Era eruptions in B32 and recalculate the sulfur loading due to Toba (L176-178; 312-315).