Demonstrating change from a drop-in space soundscape exhibit by using graffiti walls both before and after

Impact evaluation in public engagement necessarily requires measuring change. However, this is extremely challenging for drop-in activities due to their very nature. We present a novel method of impact evaluation which integrates graffiti walls into the experience both before and after the main drop-in activity. The activity in question was a soundscape exhibit, where young families experienced the usually inaudible sounds of near-Earth space in an immersive and accessible way. We apply two analysis techniques to the captured before and after data: quantitative linguistics and thematic analysis. These analyses reveal significant changes in participants' responses after the activity compared to before, namely an increased diversity in language used to describe space and altered conceptions of what space is like. The results demonstrate that the soundscape was surprisingly effective at innately communicating key aspects of the underlying science simply through the act of listening. The impacts also highlight the power of sonification in stimulating public engagement, which, through reflection, can lead to altered associations, perceptions, and understanding. Therefore, we show that this novel approach to drop-in activity evaluation, using graffiti walls both before and after the activity and applying rigorous analysis to these data, has the power to capture change and, thus, short-term impact. We suggest that commonly used evaluation tools suitable for drop-in activities, such as graffiti walls, should be integrated both before and after the main activity in general, rather than only using them afterwards as is typically the case.


Introduction
Drop-in activities (short, interactive, two-way engagements) tend to form a significant fraction of all non-school public engagement, e.g. 31 ± 3 % of all public activities across the UK's South East Physics Network in 2017-2018 were less than 30 min in duration per individual (Galliano, 2018). The impact of such activities, however, is difficult to evaluate effectively, since this necessitates a measure of change in the participants (King et al., 2015). While surveys both before and after may be one of the most robust methods of impact evaluation in general (Jensen, 2014), these are neither appropriate for, nor commensurate with, drop-in activities. This is because participants are arriving all the time, the engagement duration is so short, and surveys risk affecting the participants' experience (Grand and Sardo, 2017). A number of evaluation tools more suitable for drop-in activities have been reported, including feedback cards, rating cards, snapshot interviews, and graffiti walls (e.g. Grand and Sardo, 2017; Public Engagement with Research team, 2019). Graffiti walls are large areas (often a wall, whiteboard, or large piece of paper) on which participants are free to write or draw responses in reaction to the engagement activity or some prompt question, either directly on the area itself or by sticking responses to it. All of these evaluation methods for drop-ins are particularly useful in process evaluation, i.e. assessing the implementation of the activity. Under typical usage (post-activity only), however, they are limited in their ability to routinely demonstrate a change caused by, and thus the impact of, the engagement activity on participants in general. This paper presents a novel implementation of graffiti walls for impact evaluation, integrating them into both the start and end of a drop-in activity. The activity was a soundscape experience surrounding current space science research that used geostationary satellite data converted into audible sound.
We show that this evaluation method (through its design, data collection, and analysis) can indeed capture immediate impact, which in this case was changed language and conceptions of space. The Appendices include details of the statistical and qualitative coding techniques employed throughout.

Background
A common misconception is that space is a true vacuum completely devoid of matter, and thus, there is no activity other than that of the celestial bodies, e.g. planets or asteroids. However, the universe is permeated by tenuous plasmas, i.e. gases formed from electrically charged ions and electrons that generate and interact with electromagnetic fields (e.g. Baumjohann and Treumann, 2012). One such example is the solar wind streaming at several hundreds of kilometres a second from the Sun to the edge of the heliosphere, something of which only 58 ± 2 % of the UK adult population is aware (3KQ and Collingwood Environmental Planning, 2015). Space plasmas are not just limited to our solar system, with other stars also having their own stellar winds (e.g. Lamers and Cassinelli, 1999), and the interstellar medium bridges the gap between these plasma bubbles in outer space (Gurnett et al., 2013).
The presence of a medium in space allows for plasma wave analogues to ordinary sound (pressure waves) that occur at ultra-low frequencies, from fractions of a millihertz up to 1 Hz. They are routinely measured by many space missions and can have perturbations that are significant fractions of the background values. For a further discussion of the equivalence of these plasma waves to sound, see Archer (2020a). One way in which ultra-low frequency waves are generated is through the highly dynamic solar wind buffeting against Earth's magnetic field. This process plays a key role within space weather and, thus, how phenomena from space can affect our everyday lives (e.g. Keiling et al., 2016). However, the belief by the public that space is completely empty in turn leads many to incorrectly think that there is absolutely no sound in space. This is reinforced by school science demonstrations such as the bell jar experiment (see Caleon et al., 2013 for a nuanced discussion of this experiment and sound in near-vacuum conditions) and even by popular culture, such as the marketing for the movie Alien (which stated "in space no one can hear you scream"). Public engagement with this research area may help correct this fallacy.
Sonification, the use of non-speech audio to convey information or perceptualise data (Kramer, 1994), can be used to convert satellite measurements of these usually inaudible space sounds into audible signals simply by dramatically speeding up their playback (Alexander et al., 2011, 2014). This has already been leveraged in public engagement projects for both scientific and artistic outputs (Archer et al., 2018; Archer, 2020b). Sonification in general has been applied to various scientific data sets (Feder, 2012). Supper (2014) posits that, through the public experiencing data in this way, it can grip their imagination and produce sublime experiences because of sound's immersive and emotional nature. These arguments, however, are mostly based on reflections from researchers and artists rather than being based on the evaluation of the participants' own thoughts and feelings. This paper evaluates the short-term impact on participants who experienced the sounds of space by using graffiti walls both before and after a soundscape.

Space soundscape exhibit
The space soundscape exhibit was held at the free Science Museum in London (United Kingdom), whose informal learning adopts an inclusive, accessible science capital approach that attracts a diverse range of audiences (Science Museum Group, 2017, 2020). Science capital is defined as the total science-related knowledge, attitudes, experiences, and resources that a person has built up over their life (Archer and DeWitt, 2017). This includes what science they know about, what they think and feel about science, the people they know and their relation to science, and the day-to-day engagement they have with science. The exhibit formed part of the museum's "Summer of Space Season", held in celebration of the 50th anniversary of the Apollo moon landings, for which the museum both solicited and subsequently hosted drop-in space-themed activities aimed at young families. It ran between the hours 12:00-16:00 local time, during the May 2019 half-term school holiday, over the course of 4 days.
The purpose of the space soundscape was primarily to provide young children and their parents/carers (as key influences upon them) with an accessible and immersive experience of space research that would enable participation and spark discussion. Such experiences may, when taken in conjunction with all the other formal and informal interactions with science afforded to a young person, contribute towards developing their science identity and, hence, help build their science capital. Using a generic learning outcomes framework (Hooper-Green, 2004), the main intentions of the activity fell within the realms of "enjoyment, inspiration, creativity" and "attitudes and values", with the explicit enhancement of "knowledge and understanding" being only a secondary aim. Figure 1 shows the layout of the exhibit, which was integrated amongst the museum's usual collections, along with accompanying photos.
The activity worked as follows:
1. Museum visitors were invited to participate in the activity at the entrance by undergraduate ambassadors. They were first asked to write or draw on a post-it note what they thought space around our planet is like. Some younger children required further prompting beyond this broad question, however, with ambassadors often asking "what do you think space sounds like?" The participants placed their responses on the pre-soundscape graffiti wall and were handed Bluetooth wireless headphones that played the sounds of space.
2. Participants went on a journey while listening to the sounds, following a set of coloured arrows marked out on the floor. A number of banner stands with further information about the sounds were placed along this path, though it was observed that few people read these. This may be either because participants preferred to listen to the sounds or because it was not clear the stands were part of the experience, given the exhibit's location amongst other collections.
3. Near the end of the journey, researchers took back the participants' headphones and asked them to reflect on what they thought about space after having listened to the sounds. Participants then recorded their thoughts on post-it notes again and placed these on the post-soundscape graffiti wall. The researchers would use what they had written or drawn to prompt a short dialogue about aspects of the space environment around Earth and space weather research. This method was informed by the science capital research (Archer and DeWitt, 2017), which recommends scientists use and value participants' own experiences within their engagement practice to help enable lower science capital audiences to feel included in science and feel that science is for "people like me". These discussions provided an opportunity to solidify, or in some cases clarify, the associations that participants made from the soundscape experience in a tailored and audience-focused way (e.g. only going into an appropriate level of detail, depending on the individual or group).
4. Finally, researchers would change the channel on the headphones so that the participants could watch a series of creative short films inspired by and incorporating the sounds on a large TV screen (Archer, 2020b). The films also featured epilogue text that reinforced the importance and relevance of space weather research. Surprisingly, these artistic films proved much more popular than anticipated.
The graffiti walls were used as an open opportunity for participants to reflect upon their perceptions and associations with space both before and after the soundscape, with this being intentionally left broad to elicit a wide range of possible responses and, thus, potential impacts. This method was chosen specifically due to its suitability for evaluating drop-in activities, its ability to be integrated within the activity itself, and its alignment with our intended overall experience for participants. While graffiti walls are a common evaluation tool, we are unaware of any published public engagement activity that has used them to capture and analyse data both before and after a drop-in activity. This makes our evaluation approach for the exhibit novel. Ethical considerations in the design of the exhibit and its evaluation followed the British Educational Research Association (BERA, 2018) guidelines and were discussed with institutional funders and the Science Museum before the activity occurred. All respondents consented to providing graffiti wall responses as these were not mandatory for participation in the soundscape exhibit. Children only participated in any of the activities when accompanied by their appropriate adult. All data collected were anonymous and no characteristics about participants were solicited. Overall, it was determined that (due to the nature of the exhibit, its design, and the types of responses being collected) there was very little risk of harm arising from participation.
The space soundscape was experienced by 1003 people, which was recorded using a tally counter. The majority were in family groups (approximately three-quarters were children, based on observations) with some independent adults too. It was observed that in families typically only the children contributed to the graffiti walls (with no substantive difference in respondents before and after), and in many cases, accompanying adults did not take headphones when offered, perceiving the activity as just for their children. There were 535 and 446 responses (predominantly textual) on the pre- and post-soundscape graffiti walls, respectively, corresponding to rates of 53 ± 2 % and 44 ± 2 %. This is some 3-10 times greater than reported for typical graffiti walls (Public Engagement with Research team, 2019), likely due to their integration into the overall activity here.

Results and analysis
The data captured on the pre- and post-soundscape graffiti walls are displayed in Fig. 2. However, simply presenting the data is insufficient to robustly demonstrate any potential changes and, thus, impacts. Instead, analysis is required, and two approaches are taken here, namely quantitative linguistics and thematic analysis.

Quantitative linguistics
Quantitative linguistics investigates language using statistical methods and has uncovered several linguistic laws that mathematically formulate the empirical properties of languages. One of these is Zipf's law, which states that the frequency of words is approximately inversely proportional to their rank (where the more often a word is used, the higher its rank, i.e. closer to 1; Zipf, 1935, 1949). An alternative way in which this law is stated is that the statistical distribution of word ranks follows a power law with an exponent that is typically quoted as −1. Zipf's law holds well for almost all languages and many other human-created systems (Piantadosi, 2014). The Zipf exponent, however, can vary and is a measure of the diversity of words. Baixeries et al. (2013) showed that children's Zipf exponents become less negative/shallower with age, demonstrating an increasing variety of language and, thus, linguistic complexity as they develop. However, we are not aware of Zipf's law being exploited in a public engagement evaluation before. Figure 3 shows the rank frequency plots of the textual responses to the soundscape before and after the experience. This particular analysis, thus, omits any purely pictorial responses. Ties in ranks have been accounted for by using standard competition ranking (also known as the "1224" ranking, where a gap is left following the tie). It is clear from these plots that the distributions follow broken power laws (apart from the top word which is of similar frequency before and after). Break points and exponents have been ascertained by a piecewise regression (see Appendix A). Interestingly, the breaks in the two data sets occur at similar ranks, namely ∼ 2-3 and ∼ 9-10. We are not concerned with the specific values of the Zipf exponents, which could depend on the demographics of the participants, but simply whether they changed from before to after and in what sense.
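As an illustration of the rank-frequency construction described above, the following is a minimal Python sketch, not the analysis code used for the exhibit. It assigns standard competition ("1224") ranks to word counts and estimates a single log-log slope by ordinary least squares. The function names and the example words are hypothetical, and a single straight-line fit is a simplification of the piecewise regression of Appendix A.

```python
import math
from collections import Counter


def competition_ranks(counts):
    """Standard competition ("1224") ranks for word counts, most frequent first."""
    ordered = sorted(counts.items(), key=lambda kv: -kv[1])
    ranked = []
    for position, (word, count) in enumerate(ordered, start=1):
        if ranked and ranked[-1][1] == count:
            rank = ranked[-1][2]  # tie: reuse the rank of the previous word
        else:
            rank = position       # otherwise the rank is the 1-based position
        ranked.append((word, count, rank))
    return ranked


def loglog_slope(ranked):
    """Least-squares slope of log10(frequency) against log10(rank)."""
    xs = [math.log10(rank) for _, _, rank in ranked]
    ys = [math.log10(count) for _, count, _ in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up responses (not the exhibit data) whose counts fall off as 1/rank.
words = ["dark"] * 12 + ["quiet"] * 6 + ["cold"] * 4 + ["big"] * 3
ranked = competition_ranks(Counter(words))
slope = loglog_slope(ranked)
```

In this toy example the counts are exactly inversely proportional to rank, so the fitted slope corresponds to the classic Zipf exponent; a shallower (less negative) slope would indicate a more diverse vocabulary.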
The exponents in the higher rank segments show clear differences: the after data set exhibits a much shallower exponent. The lowest ranked segments are, in contrast, consistent with one another. The top 10 ranks constitute 62 ± 2 % of the words before and 45 ± 3 % after, making the two entire distributions significantly different ($p = 8 \times 10^{-11}$ in a two-sample Kolmogorov-Smirnov test; see Appendix A). The overall result is that there was an increased diversity of words following the soundscape. We interpret this positive impact as signifying that the participants engaged with and reflected on the stimulating experience afterwards rather than continuing to draw from common associations concerning space, which they likely did beforehand. We have, therefore, demonstrated language change in participants resulting from a public engagement activity through the novel usage of Zipf's law applied to graffiti wall responses.
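The two-sample Kolmogorov-Smirnov comparison mentioned above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, the samples are invented word ranks (one entry per written word), and the critical value is the asymptotic expression quoted in Appendix A, which tends to be conservative for discrete data such as ranks.

```python
import math


def ks_statistic(sample_1, sample_2):
    """Largest gap between the two empirical cumulative distribution functions."""
    n, m = len(sample_1), len(sample_2)
    largest_gap = 0.0
    for x in sorted(set(sample_1) | set(sample_2)):
        cdf_1 = sum(value <= x for value in sample_1) / n
        cdf_2 = sum(value <= x for value in sample_2) / m
        largest_gap = max(largest_gap, abs(cdf_1 - cdf_2))
    return largest_gap


def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic critical value sqrt(-0.5 * ln(alpha / 2) * (n + m) / (n * m))."""
    return math.sqrt(-0.5 * math.log(alpha / 2) * (n + m) / (n * m))


# Invented samples of word ranks, not the graffiti wall data.
ranks_before = [1] * 50 + [2] * 30 + [3] * 20
ranks_after = [1] * 30 + [2] * 30 + [3] * 40
d = ks_statistic(ranks_before, ranks_after)
critical = ks_critical_value(len(ranks_before), len(ranks_after))
reject = d > critical  # True means the two distributions differ at level alpha
```

The default `alpha=0.05` is an assumption for the example only; the p-value reported in the text corresponds to a far stricter threshold.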

Thematic analysis
Thematic analysis (Braun and Clarke, 2006) was used to analyse the meaning behind both textual and drawn responses. This finds patterns, known as qualitative codes, in the data which are then grouped into broader related themes. Instead of using pre-determined codes, the analysis drew on grounded theory (Robson, 2011; Silverman, 2010), allowing the themes to emerge from the data, as outlined in Appendix B. This more exploratory and data-driven approach enables unexpected outcomes and impacts (whether positive or negative) to come to light, rather than analysing the qualitative data only through a particular lens based on specific intended outcomes. The main themes and underlying (typically antithetical) codes determined by the first author are given in Table 1.
We quantify the number of responses in each theme and qualitative code (see Sandelowski, 2001; Sandelowski et al., 2009; Maxwell, 2010) to investigate any changes from before to after the soundscape experience. These are shown in Fig. 4 relative to the total responses (Fig. 4a) and within each theme (Fig. 4b).
The theme of sound is highly relevant to the activity and was commonly expressed both before and after. Responses beforehand mostly considered space to be quiet and/or silent (61 ± 3 % within the theme). However, a non-negligible fraction thought it to be loud, which may be due to participants second-guessing the question because of the nature of the activity and/or the phrasing by undergraduate ambassadors. Nonetheless, the overwhelming majority (97 ± 1 % within the theme) of participants expressed that space is a noisy environment after the experience, a considerable change from beforehand. The perceived loudness of sound, both in terms of human hearing and measurement, necessitates logarithmic scales (Robinson and Dadson, 1956). Such scales, like the decibel, therefore require some reference base level. For sound, this is typically set at the threshold pressure for human hearing of 20 µPa (Roeser et al., 2007). One must remember though that pressure fluctuations depend on the background pressure level too (100 000 Pa at sea level). Therefore, while the absolute amplitude of variations in space is clearly small, relative to the background the variations are large (as was noted in Sect. 2), and thus, one can consider space to be noisy in this sense. Another equally valid perspective is that the process of sonification has revealed the presence of sound that would otherwise not be audible, and thus, participants have discovered, thanks to the exhibit, that space is noisier than they had previously imagined. We note that the theme of dynamism exhibits quantitatively similar results to that of sound: a clear majority (59 ± 3 % within the theme) thought space to be slow beforehand, whereas the vast majority (96 ± 1 %) considered it to be highly dynamic afterwards. The dynamism of Earth's magnetosphere is relative to the natural timescales of the system.
The typical periods of oscillations are of the order of several minutes to tens of minutes, and the properties of the waves (and even their drivers) can significantly change within just a few wave periods (e.g. Keiling et al., 2016). This is unlike most sounds we are used to on Earth, which often remain coherent for many hundreds or even thousands of oscillations. Therefore, just like with sound, space around our planet can be considered dynamic, both relative to the properties of the environment and relative to participants' prior expectations.
The theme of emptiness (including both of its underlying codes) was quite common in responses beforehand; however, it was expressed much less often following the soundscape. The prevailing opinion before was that space is empty and this dramatically reduced following the soundscape, both relative to the total responses (from 47 ± 2 % to 2 ± 1 %) and within the theme (from 70 ± 3 % to 8 ± 4 %). In contrast, the expression of space being full was communicated a similar number of times both before and after. Therefore, participants who had previously thought space was empty typically went on to write words that fell within a different theme, rather than a response signifying space as being filled with material. Since space is not absolutely devoid of material, as it is permeated by tenuous plasmas, the exhibit successfully challenged this common misconception.
There was a clear increase in the proportion of responses relating to electricity following the event, from 5 ± 1 % to 36 ± 2 %. Electricity is of fundamental importance to the plasma state, and thus, the increased realisation of this by participants is a welcome change resulting from the exhibit.
At first glance, common space objects, such as planets, stars, or satellites (typically expressed through drawings), may appear to be more frequent before the soundscape than after in Fig. 2. As a fraction of the total number of responses, though, this difference is small and not strictly statistically significant (p = 0.057).
We checked the reliability of all these trends resulting from the qualitative coding by applying a log-linear analysis to a subset of the data additionally coded by the coauthors (see the Appendices for details). We use the notation that $I$ denotes the qualitative codes, $J$ the time (i.e. before or after), and $K$ the different coders. For the results to be consistent, one would expect the $IJ(K)$ test to be statistically significant, constituting the reported trends in codes with time, whereas the $IK(J)$ and $JK(I)$ interactions should not be, indicating independence from individual coders. These statistics are displayed in Fig. 4 for each theme (apart from space objects, which were less common), indicating the expected behaviour, apart from in the case of emptiness. This theme showed some inconsistency between coders for the full code, whereas when only the empty code was considered, the coders were in agreement ($G^2 = 32.2$, 3.42, and 2.06 respectively). Therefore, the main results of the paper are robust, and hence, we have demonstrated a change in participants' conceptions of space, well-aligned with the underpinning research, that resulted from this drop-in engagement activity.
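The conditional test denoted $IJ(K)$ can be illustrated with a short Python sketch. It follows the description in Appendix A (the deviance is computed for the code-by-time table of each coder and then summed), but the function names and all counts below are hypothetical and are not taken from Table B2.

```python
import math


def g_squared(table):
    """Deviance G^2 = 2 * sum(O * ln(O / E)) for one two-way table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    total = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing to the sum
                expected = row_totals[i] * column_totals[j] / total
                g2 += 2.0 * observed * math.log(observed / expected)
    return g2


def conditional_g_squared(tables_by_coder):
    """Sum G^2 over coders (levels of K); also return the degrees of freedom."""
    g2 = sum(g_squared(table) for table in tables_by_coder)
    n_i = len(tables_by_coder[0])       # number of qualitative codes (I)
    n_j = len(tables_by_coder[0][0])    # number of times, before/after (J)
    degrees_of_freedom = (n_i - 1) * (n_j - 1) * len(tables_by_coder)
    return g2, degrees_of_freedom


# Hypothetical counts: rows are codes, columns are before/after, one table per coder.
coder_tables = [
    [[30, 5], [10, 40]],
    [[28, 6], [12, 39]],
    [[31, 4], [9, 41]],
]
g2, dof = conditional_g_squared(coder_tables)
```

The resulting statistic would then be compared against a χ² distribution with `dof` degrees of freedom; the other two tests, $IK(J)$ and $JK(I)$, follow by permuting which variable is used to split the data into separate tables.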

Conclusions
A challenge within public engagement is evaluating the impact of drop-in activities, since this necessitates a measure of change using evaluative tools that are appropriate to and commensurate with the engagement (Jensen, 2014; King et al., 2015; Grand and Sardo, 2017). We have presented a novel implementation and analysis stemming from a common evaluation tool, namely graffiti walls (e.g. Public Engagement with Research team, 2019). These were integrated both before and after a soundscape exhibit on space science research using sonified satellite data. The pre- and post-soundscape graffiti walls provided data on participants' conceptions of space and, through their integration into the activity itself, had much higher response rates than is typical. The captured data were analysed in two different ways.
We investigated the statistical properties of the words expressed by using Zipf's law from quantitative linguistics. This states that the frequency of words in languages typically follows a power law, for which the exponent gives a measure of the diversity of words and where shallower exponents indicate greater variety. The distributions from the graffiti walls showed that the exponent for the top ∼ 10 words (constituting 62 ± 2 % of the responses before and 45 ± 3 % after) became significantly shallower from before to after, whereas the exponents were consistent for the remaining words. This demonstrates an overall increased linguistic complexity concerning participants' thoughts about space following the activity. This positive result aligns with the exhibit's aims in the realm of "enjoyment, inspiration, creativity" (see Hooper-Green, 2004), since being exposed to the sounds of space led to stimulation, reflection, and, ultimately, a more diverse and creative set of words about space than had been expressed beforehand. We are unaware of Zipf's law being used in impact evaluation for public engagement before.
We also investigated themes present in the responses, which again yielded significant and robust positive changes from before to after. Beforehand, participants typically expressed common misconceptions of space being completely empty, silent, and with little activity. However, after experiencing the space sounds, they felt space was a noisy and dynamic environment with electrical phenomena present. It is astounding that, simply by listening to the sounds, these simple aspects of the underlying space plasma physics were successfully and innately communicated to participants before they even spoke to the researchers. This therefore demonstrates the power of sonification for audiences. While this had been argued by Supper (2014) based on reflections from researchers and artists, here we have shown it from a direct evaluation of the participants' experiences. Therefore, we have shown positive effects in the realms of "knowledge and understanding" and "attitudes and values" (see Hooper-Green, 2004) resulting from the soundscape. The measured changes in associations, conceptions, and perceptions will have been further reinforced by researchers drawing from participants' own reflections in their subsequent dialogues (see Archer and DeWitt, 2017).
Overall, integrating existing evaluation tools suitable for drop-in engagement activities, such as graffiti walls, both before and after a drop-in activity can enable practitioners to demonstrate changes resulting from the engagement and, therefore, its short-term impact. However, such tools are typically only used following activities, which limits the ability to demonstrate some measure of change and, thus, impact. We suggest that our approach, both in terms of data capture and analysis, should be adopted more regularly, not just for soundscape exhibits but for a range of different drop-in activities in general.

Appendix A
Statistical uncertainties in proportions are estimated using the Clopper and Pearson (1934) conservative method based on the binomial distribution, where standard (68 %) errors are shown throughout.
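A Clopper-Pearson interval can be computed without special libraries by inverting the binomial tail probabilities numerically. The sketch below is one possible way to do this, assuming a simple bisection search and a 68 % confidence level; the function names are hypothetical, and the example uses the 535 pre-soundscape responses out of 1003 participants reported in the text.

```python
import math


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def bisect_increasing(func, target, iterations=50):
    """Find p in (0, 1) at which an increasing function of p reaches target."""
    low, high = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if func(mid) < target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def clopper_pearson(k, n, confidence=0.68):
    """Exact (conservative) binomial interval for k successes out of n trials."""
    alpha = 1.0 - confidence
    # Lower limit: P(X >= k) rises with p and equals alpha / 2 at the limit.
    lower = 0.0 if k == 0 else bisect_increasing(
        lambda p: 1.0 - binomial_cdf(k - 1, n, p), alpha / 2)
    # Upper limit: P(X > k) rises with p and equals 1 - alpha / 2 at the limit.
    upper = 1.0 if k == n else bisect_increasing(
        lambda p: 1.0 - binomial_cdf(k, n, p), 1.0 - alpha / 2)
    return lower, upper


# Response rate on the pre-soundscape wall: 535 responses from 1003 participants.
lower, upper = clopper_pearson(535, 1003)
half_width = 0.5 * (upper - lower)
```

The interval is slightly wider than the usual normal approximation, which is why the method is described as conservative.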
A piecewise linear regression in log-log space was used to minimise the sum of the squared error between the data and a model made up of a specified number of line segments whose break points could be varied iteratively. This was performed for an increasing number of segments, each time calculating the degrees-of-freedom-adjusted $\bar{R}^2$, which accounts for the number of explanatory variables added to the model, as follows:
$$\bar{R}^2 = 1 - \left(1 - R^2\right) \frac{n - 1}{n - m - 1},$$
where $R^2$ is the usual coefficient of determination, $n$ is the number of samples, and $m = 2s - 1$ is the total number of explanatory variables in the piecewise linear model with $s$ segments. The final model was selected as the first peak in $\bar{R}^2$ with $s$. Any segments with only two data points were subsequently ignored. The statistical significance of the slopes was determined by an analysis of covariance (ANCOVA), with a multiple comparison procedure (Hochberg and Tamhane, 1987). The standard errors in the slopes quoted are derived from a propagation of uncertainty in the proportions within the linear regression.
A two-sample Kolmogorov-Smirnov test is used to nonparametrically test the equality of two probability distributions. It quantifies the distance between two 1D empirical (cumulative) distribution functions $F_{1,n}(x)$ and $F_{2,m}(x)$ as follows:
$$D_{n,m} = \sup_x \left| F_{1,n}(x) - F_{2,m}(x) \right|,$$
where sup is the supremum function (Massey, 1951). The critical value of this statistic is given by $\sqrt{-\tfrac{1}{2} \ln(\alpha/2)\,(m + n)/(mn)}$ for desired significance $\alpha$.
Finally, log-linear analysis is employed to check the consistency of the changes in coding with time across the different coders. This extension of the $\chi^2$ test of independence to higher dimensions uses a similarly distributed statistic, the deviance, given by the following:
$$G^2 = 2 \sum_{i,j,k} O_{ijk} \ln\left(\frac{O_{ijk}}{E_{ijk}}\right)$$
for observed $O_{ijk}$ and expected $E_{ijk}$ frequencies (Agresti, 2007). Here we assess conditionally independent models denoted as $IJ(K)$, which tests the two-way $IJ$ interaction with the effects of the $IK$ and $JK$ interactions removed.
Computationally, this calculates $G^2$ for each level of $K$ and sums the results, with $G^2$ having $(n_I - 1)(n_J - 1) n_K$ degrees of freedom.

Appendix B
Table B1 shows the number of responses (both unique and total) across words and pictures in each theme and its underlying codes both before and after the soundscape experience.
To ensure the reliability of the main qualitative coding of the entire data set, second coders applied the thematic analysis to a subset of the data. This subset constituted the top 16 words before (58 % of total responses) and 15 words after (49 %), with the slightly different number of words used in the two data sets being due to ties in the ranking of words making it impossible to have exactly the same number in both. Table B2 shows the totals of how these unique words were grouped across all three coders. These results are used in the log-linear analysis to test reliability, which we note does not require equally sized data sets. The codes' association with the raw data can be found in the Supplement, both for the main and second coders.
Table B2. Statistical comparison of the number of unique words in each qualitative code as judged by different coders across a subset of the data (the top 16 words before and 15 words after).
Data availability. Data supporting the findings are contained within the article and the Supplement.
Author contributions. MOA conceived the project and its evaluation, performed the analysis, and wrote the paper. ND and SB assisted with the analysis.