Demonstrating change from a drop-in engagement activity through pre- and post- graffiti walls: Quantitative linguistics and thematic analysis applied to a space soundscape exhibit

Impact evaluation in public engagement necessarily requires measuring change; however, this is extremely challenging for drop-in activities due to their very nature. We present a soundscape exhibit, where young families experienced the usually inaudible sounds of near-Earth space, which used a novel method of evaluation integrating pre- and post-graffiti walls into the activity. We apply two analysis techniques to the captured before and after data: 1) Quantitative linguistics: applying Zipf’s law (the power law statistics of words) reveals an increased diversity of language concerning space afterwards, highlighting that participants engaged with and reflected upon the sounds; 2) Thematic analysis: finding and grouping patterns in the qualitative data shows altered conceptions of space around aspects of sound, dynamism, emptiness and electricity, areas highly relevant to the underlying space plasma physics of the sonified data. Therefore, we demonstrate that this novel approach to drop-in activity evaluation has the power to capture change from before to after, and thus short-term impact, specifically in this case showing the power of data sonification in innately communicating science. We suggest the method could be adopted by others in their drop-in engagement activities more broadly.

activity, under typical usages (post-activity only) they are limited in their ability to routinely demonstrate change from, and thus the impact of, the engagement on participants in general.
This paper presents a novel implementation of graffiti walls integrated into both the start and end of a drop-in activity.
This was a soundscape experience surrounding current space science research that used sonified satellite data. We show that this evaluation method (through its design, data collection, and analysis) can indeed capture immediate impact: changed language and conceptions of space in this case. Appendices include details of the statistical and qualitative coding techniques employed throughout.

Background
A common misconception is that space is a true vacuum completely devoid of matter and thus there is no activity other than that of the celestial bodies, e.g. planets or asteroids. However, the solar system is permeated by tenuous plasmas: gases formed of electrically charged ions and electrons that generate and interact with electromagnetic fields (e.g. Baumjohann and Treumann, 2012). One such example is the solar wind streaming from the Sun, something which only 58 ± 2% of the UK adult population are aware of (3KQ and Collingwood Environmental Planning, 2015). The solar wind is highly dynamic and, as it buffets against Earth's magnetic field, generates plasma waves, analogues of ordinary sound, at ultra-low frequencies (fractions of a millihertz up to 1 Hz) that play key roles within space weather (e.g. Keiling et al., 2016). This contradicts the common belief, perhaps stemming from school science demonstrations such as the bell-jar experiment (see Caleon et al., 2013, for a nuanced discussion) or even popular culture such as the marketing for the movie 'Alien', that there is absolutely no sound in space due to it being "empty".
Sonification, the use of non-speech audio to convey information or perceptualise data (Kramer, 1994), can be used to convert satellite measurements of these usually inaudible space sounds into audible signals, simply by dramatically speeding up their playback. This has already been leveraged in public engagement projects for both scientific and artistic outputs (Archer et al., 2018; Archer, 2020). Sonification in general has been applied to various scientific datasets (Feder, 2012). Supper (2014) posits that through the public experiencing data in this way it can grip their imagination and produce sublime experiences because of sound's immersive and emotional nature. These arguments, however, are mostly based on reflections from researchers and artists, rather than through the evaluation of participants' own thoughts and feelings. This paper evaluates the short-term impact on participants of experiencing the sounds of space using pre- and post-graffiti walls.
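To make the idea of sonification by sped-up playback concrete, the following is a minimal sketch only, not the processing used for the exhibit. The function names, the 1 Hz sampling rate, the 0.01 Hz test wave and the speed-up factor of 44100 are all assumptions chosen for illustration. Playing a recording faster by some factor multiplies every frequency in it by that factor, which is how sub-hertz waves can be shifted into the audible range.

```python
import io
import math
import struct
import wave


def audible_frequency(original_hz, speedup):
    """Playing data `speedup` times faster multiplies each frequency by `speedup`."""
    return original_hz * speedup


def sonify_to_wav_bytes(samples, sample_rate_hz, speedup):
    """Encode samples (floats in [-1, 1]) as 16-bit mono WAV played `speedup` times faster.

    Hypothetical helper: the speed-up is applied purely by declaring a
    higher playback rate, the samples themselves are unchanged.
    """
    playback_rate = round(sample_rate_hz * speedup)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 16-bit samples
        wav.setframerate(playback_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
        )
        wav.writeframes(frames)
    return playback_rate, buffer.getvalue()


# Assumed example: a 0.01 Hz wave sampled once per second for 200 s.
signal = [math.sin(2 * math.pi * 0.01 * t) for t in range(200)]
rate, wav_bytes = sonify_to_wav_bytes(signal, sample_rate_hz=1.0, speedup=44100)
```

With these assumed numbers the 0.01 Hz oscillation would be heard at roughly 441 Hz, although the 200 s of data would last only a few milliseconds, so a real dataset would need to span a much longer interval.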

Space Soundscape Exhibit
The space soundscape exhibit was held at the free Science Museum in London (United Kingdom), whose informal learning adopts an inclusive, accessible 'science capital' approach that attracts a diverse range of audiences (Science Museum Group, 2017, 2020). The exhibit formed part of their 'Summer of Space Season', held in celebration of the 50th anniversary of the Apollo moon landings, for which the museum solicited drop-in space-themed activities aimed at young families. It ran between the hours 12:00-16:00 during the May 2019 half-term school holiday over the course of 4 days. Figure 1 shows the layout of the exhibit, which was integrated amongst the museum's usual collections, along with accompanying photos. The activity worked as follows:
1. Museum attendees are invited to participate at the entrance by undergraduate ambassadors. They are first asked to write or draw on a post-it note what they think space around our planet is like. Some younger children required further prompting beyond this broad question, however, with ambassadors often asking "what do you think space sounds like?" The participants place their responses on the pre-graffiti wall and are handed Bluetooth wireless headphones playing the sounds of space.
2. Participants go on a journey while listening to the sounds, following a set of coloured arrows marked out on the floor. A number of banner stands with further information about the sounds were placed along this path, though it was observed that few people read these. This may be either because participants preferred to listen to the sounds or because it was not clear the stands were part of the experience given the exhibit's location amongst other collections.
3. Near the end of the journey, researchers take participants' headphones and ask them to reflect on what they think about space after having listened to the sounds, again recording their thoughts on post-it notes and placing these on the post-graffiti wall. The researchers would use what they had written or drawn to prompt a short dialogue about aspects of the space environment around Earth and space weather research, a method informed by the 'science capital' research (Archer and DeWitt, 2017).
4. Finally, researchers would change the channel on the headphones so that participants could watch, on a large TV screen, a series of creative short films inspired by and incorporating the sounds, featuring epilogue text reinforcing the importance and relevance of space weather research (Archer, 2020). Surprisingly, these artistic films proved much more popular than anticipated.
While graffiti walls are a common evaluation tool, we are unaware of any public engagement activity that has captured data both before and after a drop-in activity using them. This makes the integrated evaluation within the exhibit novel. The space soundscape was experienced by 1,003 people, recorded using a tally counter. No characteristics about the participants were solicited, though the majority were in family groups (approximately three-quarters were children) with some independent adults as well. It was observed that in families typically only the children contributed to the graffiti walls and in many cases accompanying adults did not take headphones when offered, perceiving the activity as just for their children. There were 535 and 446 responses (predominantly textual) on the pre- and post-graffiti walls respectively, rates of 53 ± 2% and 44 ± 2%. These are some 3-10 times greater than reported for typical graffiti walls (Public Engagement with Research team, 2019), likely due to their integration into the soundscape activity here.

Results and Analysis
The data captured on the pre- and post-graffiti walls are displayed in Figure 2. Two approaches are taken in analysing them, namely quantitative linguistics and thematic analysis.

Quantitative linguistics
Quantitative linguistics investigates language using statistical methods and has uncovered several linguistic laws that mathematically formulate empirical properties of languages. One of these is Zipf's law: the frequency of a word is inversely proportional to its rank, i.e. the distribution is a power law with exponent −1 (Zipf, 1935, 1949). Zipf's law holds well for almost all languages as well as many other human-created systems (Piantadosi, 2014). The Zipf exponent is a measure of the diversity of words, and Baixeries et al. (2013) showed that children's exponents become less negative (shallower) with age, demonstrating increasing variety of language and thus linguistic complexity. However, we are not aware of Zipf's law being exploited in public engagement evaluation before.
Figure 3 shows rank-frequency plots of the textual responses to the soundscape before and after the experience. It is clear from these plots that the distributions follow broken power laws (apart from the top word, which is of similar frequency before and after), with the break points and exponents being ascertained by a piecewise regression (see Appendix A). Interestingly, the breaks in the two datasets occur at similar ranks, namely ∼2-3 and ∼9-10. While the exponents in the lowest ranked segments are consistent with one another, those in the higher rank segment show clear differences: the after dataset exhibits a much shallower exponent. This indicates a significant increase in the diversity of words following the soundscape, signifying that participants engaged with and reflected on the experience rather than perhaps drawing from common associations concerning space. We have therefore demonstrated language change in participants resulting from a public engagement activity through the novel usage of Zipf's law applied to graffiti wall responses.
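As an illustration of the rank-frequency approach described above, the sketch below counts words in a set of responses and estimates a power-law exponent as the least-squares slope in log-log space over a chosen range of ranks. It is a simplified stand-in, not the paper's analysis: the tokenisation rule, the function names and the example responses are assumptions, and it fits a single segment rather than the broken power law of Appendix A.

```python
import math
import re
from collections import Counter


def rank_frequency(responses):
    """Return word counts sorted from most to least frequent (rank 1 first)."""
    words = []
    for text in responses:
        # Assumed tokenisation: lower-case runs of letters and apostrophes.
        words.extend(re.findall(r"[a-z']+", text.lower()))
    return sorted(Counter(words).values(), reverse=True)


def loglog_slope(frequencies, first_rank=1, last_rank=None):
    """Least-squares slope of log10(frequency) against log10(rank)."""
    if last_rank is None:
        last_rank = len(frequencies)
    ranks = range(first_rank, last_rank + 1)
    xs = [math.log10(r) for r in ranks]
    ys = [math.log10(frequencies[r - 1]) for r in ranks]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical responses, only to show the counting step.
counts = rank_frequency(["dark and quiet", "quiet", "Dark, cold, quiet"])

# An ideal Zipf distribution (frequency proportional to 1/rank) has slope -1.
ideal = [1000 / rank for rank in range(1, 51)]
slope = loglog_slope(ideal)
```

A shallower (less negative) slope over the same ranks would correspond to words being used more evenly, which is the sense in which the exponent measures diversity.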

Thematic analysis
Thematic analysis (Braun and Clarke, 2006) was used to find, group, and analyse the meaning behind both textual and drawn responses. Instead of using pre-determined qualitative codes, the analysis drew on grounded theory (Robson, 2011; Silverman, 2010), allowing the themes to emerge from the data as outlined in Appendix B. The main themes and underlying codes determined by the first author were:
- Sound: an expression of space being either "silent"/"quiet" or "loud"/"noisy"
- Emptiness: relating to "nothing" in the "empty" vacuum or conversely filled with material or activity such as "wind"
- Dynamism: whether space is slow ("calm"/"peaceful") or highly dynamic exhibiting busy movement
An internal consistency test across the five themes gives a Cronbach's alpha of 0.23, indicating they are largely independent of one another as desired. We therefore quantify the number of responses in each theme and qualitative code (cf. Sandelowski, 2001; Sandelowski et al., 2009; Maxwell, 2010) to investigate any changes from before to after the soundscape experience, as shown in Figure 4 relative to the total responses (panel a) and within each theme (panel b).
The theme of sound is highly relevant to the activity and was commonly expressed both before and after the activity.
Responses before mostly considered space to be quiet/silent (61 ± 3% within the theme) but a non-negligible fraction thought it to be loud, which may be due to participants second-guessing the question because of the nature of the activity and/or the phrasing by undergraduate ambassadors. Nonetheless, the overwhelming majority (97 ± 1% within the theme) after the experience expressed space to be a noisy environment, a considerable change from beforehand. We note that the theme of dynamism exhibits quantitatively similar results to that of sound: a clear majority (59 ± 3% within the theme) thought space to be slow beforehand, whereas the vast majority (96 ± 1%) consider it highly dynamic afterwards.
The theme of emptiness (including both of its underlying codes) was quite common in responses beforehand; however, it was expressed much less often following the soundscape. As a proportion of all responses, space being full was communicated a similar number of times both before and after. In contrast, the prevailing opinion beforehand was that space is empty and this dramatically reduced following the soundscape, both relative to the total responses (from 47 ± 2% to 2 ± 1%) and within the theme (from 70 ± 3% to 8 ± 4%).
There was a clear increase in the proportion of responses relating to electricity following the event, from 5 ± 1% to 36 ± 2%. While common space objects (typically expressed through drawings) may appear at first glance of Figure 2 to be more frequent before the soundscape than after, as a fraction of the total number of responses this difference is small and not strictly statistically significant (p = 0.057).
We checked the reliability of all these trends resulting from the qualitative coding by applying log-linear analysis to a subset of the data additionally coded by the co-authors (see appendices for details). Using the notation that I denotes the qualitative codes, J the time (i.e. before or after), and K the different coders, for the results to be consistent one would expect that the IJ(K) test be statistically significant, constituting the reported trends in codes with time, but the IK(J) and JK(I) interactions should not be, indicating independence from individual coders. These statistics are displayed in Figure 4 for each theme (apart from space objects, which was less common), indicating the expected behaviour apart from in the case of emptiness: this theme showed some inconsistency between coders for the "full" code, whereas when only "empty" was considered coders were in agreement (G² = 32.2, 3.42, 2.06 respectively). Therefore, the main results of the paper are robust and hence we have demonstrated a change in conceptions of space resulting from a drop-in engagement activity.
https://doi.org/10.5194/gc-2020-41. Preprint. Discussion started: 13 October 2020. © Author(s) 2020. CC BY 4.0 License.

Conclusions
A challenge within public engagement is evaluating the impact of drop-in activities since this necessitates a measure of change that is appropriate to and commensurate with the engagement (Jensen, 2014;King et al., 2015;Grand and Sardo, 2017).
We have presented a novel implementation of a common evaluation tool, graffiti walls (e.g. Public Engagement with Research team, 2019), which were integrated both before and after a soundscape exhibit on space science research using sonified satellite data. The pre- and post-graffiti walls provided data on participants' conceptions of space and, through their integration into the activity itself, had much higher response rates than is typical. The captured data were analysed in two different ways.
We investigated the statistical properties of the words expressed using Zipf's law from quantitative linguistics: the frequencies of words in languages typically follow power laws whose exponents give a measure of the diversity of words, where shallower exponents indicate greater variety. The distributions from the graffiti walls showed that the exponent for the top ∼2-10 words became significantly shallower from before to after. This demonstrates increased linguistic complexity concerning participants' thoughts about space after the activity. We are unaware of Zipf's law being used in impact evaluation for public engagement before.
We also investigated themes present in the responses, which again yielded significant and robust changes from before to after.

While beforehand participants typically expressed common misconceptions of space being completely empty, silent, and with little activity, after experiencing the space sounds they thought space was a noisy and dynamic environment with electrical phenomena present. It is astounding that simply by listening to the sounds these aspects of the underlying space plasma physics were successfully communicated to participants. This therefore demonstrates the power of sonification for audiences, which had been argued by Supper (2014) based on reflections from researchers and artists; here, however, we have shown it by evaluating participants' experiences directly. The measured changes in conceptions will have been further reinforced by researchers drawing from participants' own reflections in the subsequent dialogues (cf. Archer and DeWitt, 2017).
Overall, integrating existing evaluation tools suitable for drop-in engagement activities, such as graffiti walls, both before and after a drop-in activity can enable practitioners to demonstrate changes resulting from the engagement and therefore its short-term impact. We suggest that our approach, both in terms of data capture and analysis, could be adopted for a range of different drop-in activities beyond just soundscape exhibits.

Appendix A: Statistical techniques
Statistical uncertainties in proportions are estimated using the Clopper and Pearson (1934) conservative method based on the binomial distribution, where standard (68%) errors are shown throughout.
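For readers who wish to reproduce uncertainties of this kind, the following is a minimal standard-library sketch of a Clopper-Pearson interval, found by bisection on the binomial distribution. The function names and the bisection approach are assumptions made for illustration, and the 68.27% level is used here as the usual "one sigma" convention; this is not necessarily how the quoted errors were computed.

```python
import math


def _binom_pmf(i, n, p):
    """Binomial probability of i successes in n trials, computed in log space."""
    if p <= 0.0:
        return 1.0 if i == 0 else 0.0
    if p >= 1.0:
        return 1.0 if i == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log1p(-p))
    return math.exp(log_pmf)


def _binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return min(1.0, sum(_binom_pmf(i, n, p) for i in range(k + 1)))


def _bisect(func, target, increasing):
    """Solve func(p) = target for p in [0, 1], with func monotonic in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, level=0.6827):
    """Exact (conservative) confidence interval for a proportion k / n."""
    alpha = 1.0 - level
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1.0 - _binom_cdf(k - 1, n, p), alpha / 2, increasing=True)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect(
        lambda p: _binom_cdf(k, n, p), alpha / 2, increasing=False)
    return lower, upper


# Response counts quoted in the text: 535 pre-wall responses from 1,003 people.
low, high = clopper_pearson(535, 1003)
```

The resulting interval brackets the observed proportion of about 53% and is somewhat wider than the normal approximation would give, which is what "conservative" refers to.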
A piecewise linear regression in log-log space was used to minimise the sum of squared errors between the data and a model made up of a specified number of line segments whose break points could be varied iteratively. This was performed for an increasing number of segments, each time calculating the degrees-of-freedom-adjusted $\bar{R}^2$, which accounts for the number of explanatory variables added to the model:
$$\bar{R}^2 = 1 - \left(1 - R^2\right) \frac{n - 1}{n - m - 1}$$
where $R^2$ is the usual coefficient of determination, $n$ is the number of samples, and $m = 2s - 1$ is the total number of explanatory variables in the piecewise linear model with $s$ segments. The final model was selected as the first peak in $\bar{R}^2$ with $s$. Any segments with only two datapoints are subsequently ignored. The statistical significance of the slopes was determined by ANCOVA with a multiple comparison procedure (Hochberg and Tamhane, 1987) quoting standard errors.
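The model-selection step can be sketched as follows. This is an illustrative reading of the procedure rather than the authors' code: the function names are assumptions, "first peak" is interpreted as the first number of segments whose adjusted value exceeds that of the next, and the R² values in the example are invented purely to show the arithmetic.

```python
def adjusted_r2(r2, n, s):
    """Degrees-of-freedom-adjusted R^2 for a piecewise model with s segments."""
    m = 2 * s - 1  # explanatory variables: s slopes plus s - 1 break points
    return 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)


def first_peak(scores):
    """Index of the first score that is followed by a lower one (else the last)."""
    for i in range(len(scores) - 1):
        if scores[i + 1] < scores[i]:
            return i
    return len(scores) - 1


# Hypothetical R^2 values for s = 1, 2, 3, 4 segments fitted to n = 20 points.
raw_r2 = [0.90, 0.97, 0.985, 0.986]
adjusted = [adjusted_r2(r2, n=20, s=s) for s, r2 in enumerate(raw_r2, start=1)]
selected_segments = first_peak(adjusted) + 1
```

In this made-up case the raw R² keeps rising with every extra segment, but the adjusted value falls between three and four segments, so three segments would be selected.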
Cronbach's alpha ($\alpha_C$) is a measure of internal consistency based on average inter-item covariances (Cho, 2016). It can be computed as
$$\alpha_C = \frac{k}{k - 1} \left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_T^2}\right)$$
where $\sigma_i^2$ denotes the variance of the item $i$ out of $k$ and $\sigma_T^2$ is the variance of the sum over all items. Cronbach's alpha is typically between 0 and 1, where a value of 1 indicates all items essentially measure the same underlying quantity and thus correlate, whereas 0 results from items being independent and uncorrelated.
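A short sketch of this calculation is given below, assuming each item is a list of scores with one entry per response (for example a 0/1 indicator of whether a response was coded under a given theme). The data layout and function name are assumptions for illustration, and the toy inputs are not taken from the study.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for k items, each a list of scores over the same responses."""
    k = len(items)
    # Total score of each response, summed over all items.
    totals = [sum(values) for values in zip(*items)]
    item_variance_sum = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1.0 - item_variance_sum / pvariance(totals))


# Two identical items measure the same quantity, giving alpha = 1.
alpha_same = cronbach_alpha([[0, 1, 1, 0], [0, 1, 1, 0]])

# Two uncorrelated items give alpha = 0.
alpha_independent = cronbach_alpha([[0, 0, 1, 1], [0, 1, 0, 1]])
```

Population variances are used throughout; using sample variances instead would give the same result, since the normalisation cancels in the ratio.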
Finally, log-linear analysis is employed to check the consistency of the changes in coding with time across the different coders. This extension of the χ² test of independence to higher dimensions uses a similarly distributed statistic, the deviance, given by
$$G^2 = 2 \sum O \ln\left(\frac{O}{E}\right)$$
where $O$ and $E$ are the observed and expected frequencies in each cell and the sum runs over all cells.
Table B1. Number of responses in each theme before and after the soundscape.
4. Reliability: Codes are applied to a subset of data by second coders to check reliability of results.
5. Finalisation: Theoretical interpretation and narrative are formulated from final coding.
Table B1 shows the number of coded responses across words and pictures in each theme and its underlying codes both before and after the soundscape experience. Table B2 shows the number of each coding for the top 16 words before (58% of responses) and the top 15 words after (49%) across the three coders, which is used in the log-linear analysis. The codes' association to the raw data can be found in the supplementary material, along with results from second coders applied to a subset of the data.
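The deviance statistic used in the log-linear analysis can be illustrated in its simplest setting, a two-way contingency table tested for independence. This is a deliberately reduced sketch: the paper's tests involve three factors (codes, time and coders), whereas the function below, its name and the example counts are assumptions covering only the two-factor case.

```python
import math


def deviance_g2(table):
    """Deviance G^2 = 2 * sum(O * ln(O / E)) for independence in a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing to the sum
                expected = row_totals[i] * col_totals[j] / total
                g2 += 2.0 * observed * math.log(observed / expected)
    return g2


# Hypothetical counts: rows are two codes, columns are before and after.
g2_no_change = deviance_g2([[10, 20], [20, 40]])   # proportions identical
g2_change = deviance_g2([[30, 10], [10, 30]])      # proportions reversed
```

Like the χ² statistic, the deviance is close to zero when the row proportions do not change between columns and grows as they diverge; for a 2 × 2 table it would be compared against a χ² distribution with one degree of freedom.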