# SAXS (Small Angle X-ray Scattering)

• SAXS enables us to obtain X-ray scattering curves for various samples of our box (varying closing hinge torque and opening bracing angle). Diffraction peaks are the key features of these scattering curves and serve as the basis of all of our analysis
• Ab initio modelling suggests that an increase in opening bracing angle in brace position 2 corresponds to a more open box. However, the modelling also suggests that the brace in brace position 1 and the closing hinge torque may not be working as designed
• We developed a unique approach to fitting theoretical scattering curves to experimental data. By using a derivative-based algorithm instead of a value-to-value comparison method such as RMSD, the alignment of characteristic diffraction peaks is prioritized over an often encountered intersection problem
• A sawtooth trend in the phase shift (location) of mid $q$ diffraction peaks for boxes with increasing closing hinge torque suggests that the closing hinge torque may only be effective to an angle of 35 degrees before functional failure

Raw data, scripts and other SAXSy material can be found here

## Background

### Sample Analysis Using Electromagnetic Radiation (EMR)

When electromagnetic radiation (EMR) encounters an obstacle (a beaker filled with a solution, an object, a slit, etc.), it can be transmitted, absorbed, scattered, refracted or diffracted. Spectroscopy techniques (such as FRET) rely on the absorption and/or re-emission of EMR, but they cannot generate or verify an image, model or ‘molecular envelope’ (a hypothetical three-dimensional surface that surrounds a molecule – Figure 1) of our boxes. Images and models are crucial to verifying proper origami formation and characterising their behaviour, particularly if the data is obtained from the structures in solution.

Figure 1: The concept of a molecular envelope

In order to ‘see’ our boxes, our analysis technique must involve wavelengths comparable to, or smaller than, the size of our boxes (~10nm). TEM exploits the wave-particle duality of electrons, whose de Broglie wavelength is approximately $1.23/\sqrt{V}$ nm for an accelerating voltage of $V$ volts (neglecting relativistic corrections), which is well below 1nm at typical TEM voltages. With wavelengths between 0.01-10nm, X-rays are also an ideal candidate. X-ray crystallography and small angle X-ray scattering (SAXS) are two analysis techniques which might enable us to characterise our boxes. However, as there is no robust method for crystallising DNA origami structures and since we are interested in the behaviour of monodisperse boxes in solution, SAXS is the best approach we can take to obtain or verify a solution structure of the origami structures.

It is important to understand how X-rays are scattered by molecules in solution because the mechanism justifies the data analysis approach (specifically, the use of Fourier transforms). EMR can be scattered by a particle directly (Rayleigh and Mie scattering) or, in the case of X-ray scattering, by interactions with an electron cloud (Thomson scattering – Figure 2). In the latter case, absorption of a photon by an electron results in the electron being accelerated. An accelerating free charge (electron) in turn emits radiation in a spherical wave that has the same wavelength as the incident wave (elastic scattering).

Figure 2: Thomson scattering of X-rays. Note that while the wavelength of the scattered wave does not change, a spherical wave is re-emitted

### What is SAXS?

Small angle X-ray scattering (SAXS) is a ‘low resolution’ imaging technique that exploits the elastic scattering of X-rays to provide information about the shape and structure of a molecular envelope within a solution. A sample is exposed to what is ideally a monochromatic and collimated beam of X-rays (from an X-ray source such as a synchrotron). While most of the radiation is transmitted through the sample, some X-rays are scattered. The angle at which the X-rays are scattered depends on particle size – larger particles scatter at smaller angles and smaller particles scatter at larger angles. A detector records the intensity of radiation across a two-dimensional space.

A SAXS result is obtained by spherically integrating the intensity for a given magnitude of the scattering vector $q$ (i.e. across an iso-radial space – this should represent one angle of scattering from the solution). The intensity is then plotted against the magnitude of the scattering vector $q$ for one frame (one period of exposure) – the result is a curve in reciprocal space that represents intensity as a function of the magnitude of $q$ (see Figure 3). These curves serve as the basis for all the possible avenues of SAXS data analysis – as such, obtaining good quality data is crucial to the elucidation of the structure of the origami boxes in solution.

Figure 3: A SAXS scattering curve in reciprocal space, showing intensity as a function of the magnitude of the scattering vector $q$
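The reduction from a two-dimensional detector image to a one-dimensional curve can be sketched as follows. This is a minimal illustration, not the beamline's reduction software: it assumes the usual definition $q = 4\pi \sin(\theta)/\lambda$ (with $2\theta$ the scattering angle), and the function name, the beam centre, pixel size, sample-to-detector distance and wavelength parameters are all hypothetical.

```python
import numpy as np

def radial_average(image, centre, pixel_size_mm, distance_mm, wavelength_nm, n_bins=100):
    """Average detector intensity in rings around the beam centre; return (q, I)."""
    image = np.asarray(image, dtype=float)
    rows, cols = np.indices(image.shape)
    # Distance of every pixel from the beam centre, in mm
    r_mm = np.hypot(rows - centre[0], cols - centre[1]) * pixel_size_mm
    # Scattering angle 2*theta from the geometry, then q = 4*pi*sin(theta)/lambda
    two_theta = np.arctan2(r_mm, distance_mm)
    q = 4.0 * np.pi * np.sin(two_theta / 2.0) / wavelength_nm
    # Assign each pixel to a ring (bin) of similar q
    edges = np.linspace(0.0, q.max(), n_bins + 1)
    which = np.clip(np.digitize(q.ravel(), edges) - 1, 0, n_bins - 1)
    total = np.bincount(which, weights=image.ravel(), minlength=n_bins)
    count = np.bincount(which, minlength=n_bins)
    q_mid = 0.5 * (edges[:-1] + edges[1:])
    keep = count > 0  # drop empty rings
    return q_mid[keep], total[keep] / count[keep]
```

With the wavelength given in nm, the returned $q$ values are in nm$^{-1}$. A uniform image should reduce to a flat curve, which is a quick sanity check on the binning.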

### Sample Limitations and Constraints

Like any other analysis technique, SAXS has its limitations:

• Traditional SAXS analysis relies on mono-dispersity. If a plot of $\ln(I)$ vs. $q^2$ is non-linear at low $q$ values (i.e. the curve does not follow the Guinier relationship), it is likely that the sample is not mono-disperse or that the sample is aggregating/polymerising

• There needs to be sufficient contrast between the buffer and the sample. This means that there must be a large difference between the electron density of the buffer and the electron density of the origami. If this is not the case, the signal to noise ratio will be too low to elucidate any structural information about the boxes

## The Overall Aim of SAXS

The overall aim of SAXS is to functionally characterise our boxes, that is, investigate the effects of varying hinge torque and bracing angle. This can be achieved by looking at changes in the maximum observable dimension or the radius of gyration, a phase shift in characteristic diffraction peaks or by observing a change in the angle of opening of models that agree with experimental data.

## SAXS Data Analysis (Results and Discussion)

A traditional approach to SAXS data analysis involves carrying out a Fourier transformation of a scattering curve in reciprocal space. $I(q)$ is a product of a scaling factor $K$, a form factor $P(q)$ and a structure (or solution) factor $S(q)$ (i.e. $I(q)=K \times P(q) \times S(q)$). If it is shown that the sample is mono-disperse, then $S(q)=1$ and the scaled form factor curve ($I(q)=K \times P(q)$) can be used to generate a pair distance distribution function (or $P(r)$ curve) in real space. This distribution can give us information related to the shape, structure, radius of gyration and maximum diameter (or length) of our structures. Much of this analysis can be carried out by programs in the ATSAS software suite (particularly Primus).
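For reference, the textbook form of the transform pair linking the scattering curve to the pair distance distribution function is given below. The symbol $D_{max}$ (the maximum diameter of the particle) is introduced here for illustration, and the lower-case $p(r)$ is the same quantity referred to as the $P(r)$ curve in the text:

$$I(q) = 4\pi \int_0^{D_{max}} p(r)\,\frac{\sin(qr)}{qr}\,dr, \qquad p(r) = \frac{r^2}{2\pi^2} \int_0^{\infty} q^2\, I(q)\,\frac{\sin(qr)}{qr}\,dq$$

In practice $I(q)$ is only measured over a finite, noisy $q$ range, which is why indirect transform programs are used rather than evaluating the second integral directly.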

The Guinier relationship is used to determine if a sample contains aggregates. If a linear fit cannot be applied to a plot of $\ln(I)$ vs. $q^2$ at low $q$ values, it can be concluded that the sample is not mono-disperse. This is the case in Figure 4, which is representative of all of the sample curves obtained from the data processing procedure. As we have aggregates/polymeric boxes in solution, the traditional avenues of analysis are unavailable to us.

Figure 4: Primus attempts to apply a linear fit (red line) to the first 50 points of one set of experimental data (BM01_d1, blue dots) in a Guinier plot. It is clear that this sample data, much like the others, is not linear at low $q$ values.
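The Guinier check can be sketched in a few lines. This is an illustrative stand-in for what Primus does, not its actual implementation: it assumes the standard Guinier approximation $\ln I(q) = \ln I(0) - \frac{R_g^2}{3} q^2$, and the function name and the default of 50 points (taken from the Figure 4 caption) are choices made for this example.

```python
import numpy as np

def guinier_fit(q, intensity, n_points=50):
    """Fit ln(I) = ln(I0) - (Rg^2 / 3) * q^2 over the first n_points of the curve."""
    x = np.asarray(q[:n_points], dtype=float) ** 2
    y = np.log(np.asarray(intensity[:n_points], dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    # R^2 close to 1 indicates a linear Guinier region
    r_squared = 1.0 - residuals.var() / y.var()
    # A non-negative slope has no physical radius of gyration
    rg = float(np.sqrt(-3.0 * slope)) if slope < 0 else float("nan")
    return rg, float(np.exp(intercept)), float(r_squared)
```

A low `r_squared`, or an upturn in $\ln(I)$ towards low $q$, would be consistent with the aggregation described above. Any threshold for what counts as 'linear enough' is a judgment call and is not specified here.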

### Ab Initio Approach

Despite evidence for the presence of aggregates, clear diffraction peaks in the mid $q$ range suggest that information about the structure of our box was retained in the scattering data. As such, we generated a set of models representing our box open to varying degrees. Using these models, we were then able to generate theoretical scattering patterns and fit them to experimental data (i.e. align their diffraction peaks). This process enabled us to determine the best model for the experimental data corresponding to a particular sample. Once we have models for each sample, we can complete a rigorous functional characterisation of our boxes in solution.

We took an ab initio (lit. ‘from the beginning’) approach by using dummy atom models of our boxes to produce theoretical scattering curves. These models were produced by taking a .bild file of a control box (no torqued hinge, no brace) from the cANDo simulations and generating a .pdb file for every angular displacement between the two halves of the box in the range of 0 to 195 degrees, in 5 degree increments (Figure 5). This was completed by running two C shell scripts: cando2atom and FlyTrap.

Figure 5: A dummy atom model of a box with an angle of opening of 95 degrees
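The idea of opening the box by a given angle can be illustrated with a rotation of one half's dummy atoms about the hinge axis. This is a hedged sketch only: it is not the FlyTrap script, and the function name, the hinge axis point and the hinge direction are hypothetical inputs that would have to be read from the model.

```python
import numpy as np

def rotate_about_axis(points, axis_point, axis_dir, angle_deg):
    """Rotate points (N x 3) about a line through axis_point along axis_dir (Rodrigues' formula)."""
    k = np.asarray(axis_dir, dtype=float)
    k = k / np.linalg.norm(k)
    origin = np.asarray(axis_point, dtype=float)
    p = np.asarray(points, dtype=float) - origin
    t = np.radians(angle_deg)
    rotated = (p * np.cos(t)
               + np.cross(k, p) * np.sin(t)
               + np.outer(p @ k, k) * (1.0 - np.cos(t)))
    return rotated + origin

# One model per opening angle: 0 to 195 degrees in 5 degree increments
opening_angles = list(range(0, 200, 5))
```

Each rotated coordinate set would then be written out as its own .pdb file. The direction of positive rotation depends on how the hinge axis is oriented, so the sign convention should be checked against a rendered model.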

These models were then used to generate theoretical scattering curves (via CrySol, part of the ATSAS software suite) using a C shell script (crysol.sh contains all the parameters used in CrySol). The model scattering curves were scaled to the buffer-subtracted experimental data of interest by CrySol. After running a Python script that compared the theoretical scattering curves to the experimental data, it was determined that open box models produced scattering patterns that were consistent with experimental data. It also became apparent that the location of the mid $q$ region diffraction peaks in the theoretical scattering curves varied depending on the extent to which the model box was open.

This sort of analysis methodology relies on the model being accurate. It is possible that some incomplete or misassembled boxes were retained after PEG purification and shot with X-rays. If this is the case, the experimental curve could be represented as a linear combination of the scattering curves of complete and incomplete boxes. The resultant curve would have diminished peaks and troughs as the relative scattering intensity of a particular configuration in an inhomogeneous solution would be reduced. Furthermore, structural flexibility and solution scattering can be responsible for the diminishing or ‘muting’ of diffraction peaks. While these errors are minimised by averaging multiple frames (shots) of the sample, placing an emphasis on aligning the diffraction peaks (rather than value-to-value comparison) enables us to ignore solution effects and sample outliers.
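The muting effect of a mixed population can be illustrated with toy curves. The sketch below uses synthetic cosine curves, not real scattering data, and the function names and the 50:50 weighting are assumptions made purely for demonstration.

```python
import numpy as np

def mix_curves(curves, weights):
    """Weighted sum of scattering curves sampled on a common q grid."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise to population fractions
    return np.tensordot(weights, np.asarray(curves, dtype=float), axes=1)

def contrast(curve):
    """Peak-to-trough range, used here as a crude measure of how pronounced the peaks are."""
    return float(np.max(curve) - np.min(curve))

q = np.linspace(0.01, 0.1, 200)
complete = 1.0 + 0.5 * np.cos(200.0 * q)                # toy oscillating curve
incomplete = 1.0 + 0.5 * np.cos(200.0 * q + np.pi / 2)  # same shape, shifted peaks
mixed = mix_curves([complete, incomplete], [0.5, 0.5])
print(contrast(complete), contrast(mixed))
```

Because the peaks of the two populations do not coincide, the mixed curve has a smaller peak-to-trough range than either component, which is the behaviour described above.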

To compare theoretical curves to experimental scattering curves, and thereby fit models to sample data, we needed to implement a curve fitting/comparison algorithm (Figure 6). There are two methods which we can use for curve fitting – root mean square distance (RMSD) or a derivative-based fitting algorithm. A good fit has been achieved when the diffraction peaks of both the model and the sample are aligned.

Figure 6: By attempting to fit model scattering curves, we can characterize our samples, confirm brace and hinge functionality and make general observations from trends

#### Root Mean Square Distance

Initially, a root mean square distance (RMSD) algorithm was implemented in Python.

1. For a given value of $q=q_i$, $RMSD_i=\sqrt{(\hat y_i-y_i)^2}$, where $\hat y_i$ is the theoretical scattering intensity of the model and $y_i$ is the experimental scattering intensity of a sample

2. The total RMSD for a model fit to a particular sample scattering curve is given by $RMSD_{model}=\sum_{i} RMSD_i$

3. For a particular sample, the model with the lowest $RMSD_{model}$ is returned as the best fit (see Figure 10)
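The three steps can be sketched as follows. This is a minimal reconstruction rather than the original script: it reads the per-point term as $\sqrt{(\hat y_i-y_i)^2}$ (i.e. the absolute difference), assumes the model curves have already been scaled and sampled on the experimental $q$ grid, and uses hypothetical function names.

```python
import numpy as np

def rmsd_score(model_i, exp_i):
    """Sum over all q_i of sqrt((model - experiment)^2)."""
    model_i = np.asarray(model_i, dtype=float)
    exp_i = np.asarray(exp_i, dtype=float)
    return float(np.sum(np.sqrt((model_i - exp_i) ** 2)))

def best_model_by_rmsd(models, exp_i):
    """models maps opening angle (degrees) to an intensity array on the experimental q grid."""
    scores = {angle: rmsd_score(curve, exp_i) for angle, curve in models.items()}
    best_angle = min(scores, key=scores.get)
    return best_angle, scores
```

Note that this score only measures how close the values are. A model that weaves back and forth across the experimental curve can score well even if its peaks are in the wrong places.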

RMSD should, theoretically, return the best fitting model. However, due to the sinusoidal nature of both the model and experimental data, the RMSD algorithm can return an inappropriate fit due to the model curve repeatedly intersecting with or deviating about the experimental data. In Figure 7, the orange line (80 degrees) is the fit returned by the RMSD algorithm for this particular sample curve (brown). When we plot higher angle models (120-140 degrees), we see that they seem to offer a better fit (from inspection).

Figure 7: Experimental data (brown) plotted against the RMSD fit (orange) and higher model angles (blue, aqua, pink, yellow, etc.). It is clear that the RMSD fit does not match the sinusoidal characteristics of the experimental data

As the alignment of diffraction peaks is indicative of a good fit, a derivative-based algorithm seems like a logical choice in attempting to fit sample SAXS curves.

#### Derivative Approach

Since we were most interested in how well aligned the diffraction peaks were between theoretical and experimental scattering data, RMSD fitting is not the best approach (value-to-value comparison). The derivative of a function is often used to locate peaks and troughs and, in our case, this makes it invaluable in determining a goodness of fit.

Implementing a derivative-based approach is difficult as the signal to noise ratio of the scattering data decreases with higher $q$ values. As such, a ‘smoothened’ or ‘regularised’ curve must be obtained before applying any difference approximations to the experimental data. GNOM, another program within the ATSAS suite, uses a priori information and user input (maximum diameter) to produce a plausible $P(r)$ curve.

By using the auto-GNOM function and trial and error, it was found that a maximum diameter of 2900 angstroms (due to our boxes aggregating in solution) and an initialized alpha value of 0 gave a good but smooth fit to the experimental data. This initial $P(r)$ curve serves as the basis of an iterative loop that is terminated once a fit that is both accurate (least squares and RMSD) and smooth (Tikhonov regularization and a Lagrangian) is obtained. The bash script gnom_fit.bsh contains the parameters fed into GNOM for each experimental scattering curve. The $I(q)$ vs. $q$ curve that is produced in this process can then be fed into the derivative algorithm:

1. Using the forward difference approximation, calculate the derivative at $q_i$ by $\frac{df}{dq}\approx\frac{f(q_{i+1})-f(q_i)}{\Delta q}$, where $\Delta q=q_{i+1}-q_i$, for both the GNOM fit and the model scattering curves

2. For a given value of $q=q_i$, derivative error (DE) is calculated (difference between derivative of the GNOM fit and the derivative of the model). The derivative error takes derivatives of the same magnitude but opposite sign into account

3. The total DE for a model fit to a particular sample scattering curve is given by $DE_{model}=\sum_{i} DE_i$

4. For a particular sample, the model with the lowest $DE_{model}$ is returned as the best fit
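The four steps can be sketched as follows. The exact form of the per-point derivative error is not given above, so this example assumes it is the absolute difference of the signed derivatives, which penalises slopes of equal magnitude but opposite sign. The function names are hypothetical, and the model curves are assumed to share the $q$ grid of the GNOM fit.

```python
import numpy as np

def forward_difference(q, f):
    """Forward difference approximation of df/dq at every point except the last."""
    q = np.asarray(q, dtype=float)
    f = np.asarray(f, dtype=float)
    return (f[1:] - f[:-1]) / (q[1:] - q[:-1])

def derivative_error(q, gnom_fit, model):
    """Total derivative error between the smoothed experimental curve and one model."""
    d_fit = forward_difference(q, gnom_fit)
    d_model = forward_difference(q, model)
    # Assumption: DE_i = |f'_fit(q_i) - f'_model(q_i)|, using signed derivatives
    return float(np.sum(np.abs(d_fit - d_model)))

def best_model_by_derivative(q, gnom_fit, models):
    """models maps opening angle (degrees) to an intensity array on the same q grid."""
    scores = {angle: derivative_error(q, gnom_fit, curve) for angle, curve in models.items()}
    best_angle = min(scores, key=scores.get)
    return best_angle, scores
```

One consequence of this design is that a constant vertical offset between model and data does not change the score, whereas a shift in peak position does. That is the property that makes it better suited to peak alignment than a value-to-value comparison.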

Figure 8 is a flowchart of the derivative fitting process:

Figure 8: The modelling process for one sample scattering curve

From visual inspection, the returned fits were better than those offered by RMSD. Figures 9 and 10 are comparisons between the fits returned by the RMSD algorithm and the derivative error algorithm, and Figure 11 demonstrates the quality of the derivative error fit.

Figure 9: Experimental data (brown, BM08_d1) plotted against the RMSD fit (green) and the derivative algorithm fit (blue). A good fit is obtained when diffraction peaks are aligned

Figure 10: Experimental data (red, BM19_d1) plotted against the RMSD fit (green) and the derivative algorithm fit (blue). A good fit is obtained when diffraction peaks are aligned

Figure 11: A selection of Kratky plots ($I(q) \times q^2$ vs $q$). In each plot, the curve with noise is the experimental data, the solid line that follows it is the GNOM fit and the curve with data at high $q$ values is the model fit according to the derivative algorithm. A good fit is obtained when diffraction peaks are aligned. Note that the amplitude of a peak in the experimental data may be muted due to solution effects or flexibility in a particular component of the structure (third peak)

Results can be viewed in the Excel spreadsheet ‘Derivative Algorithm Error’ in the ‘SAXS Data + Script Repository’. There are two key outcomes from the ab initio approach:

1. Most of the experimental data (with varying closing hinge torques and varying opening bracing angles) was fit to the same, or similar, models (75 or 85 degrees between the two halves). This is likely to be a result of the subtle differences in diffraction peak phase and magnitude between the sample experimental curves

2. An increase in opening bracing angle (in both brace positions) and an increase in closing hinge torque were plotted against the fitted model’s angle of opening. As implied by the first result, there were no observable trends in the plots for brace position 1 (Figure 12) and closing hinge torque (Figure 13). This suggests that the braces in position 1 and the closing hinge torque may not be working as expected. However, a clear trend of increasing model opening angle was seen when increasing the opening bracing angle in brace position 2 (Figure 14). This suggests that the braces are functional in position 2 (they are able to force a higher mean distance between the two halves of the box)

Figure 12: An increase in opening bracing angle in Brace Position 1 does not necessarily correspond to a higher model opening angle for a given closing hinge torque

Figure 13: An increase in closing hinge torque does not correspond to a lower model opening angle

Figure 14: An increase in opening bracing angle in Brace Position 2 corresponds to a higher model opening angle for a given closing hinge torque

It should be noted that the primary limitation of this method of analysis is that it relies heavily on the use of an appropriate model, as poor theoretical models can lead to ambiguous conclusions (‘Garbage In, Garbage Out’). Whilst our derivative curve fitting strategy showed that our model scattering curves agreed with the position of the experimental diffraction peaks, none of the dummy atom models were able to account for the features of the sealed box (Figure 15).

Figure 15: A Kratky plot ($I(q) \times q^2$ vs $q$) of the sealed box scattering curve and its GNOM fit (bold blue and red respectively). The other curves are the scattering curves of all of the dummy atom models. Note that none of the models (including the closed dummy atom model box) account for the characteristics of the experimental data between $q$ values of 0.02 and 0.04

This suggests that our model can be improved. As such, several in silico experiments were conducted to determine the effect of varying model parameters and whether a better model could be found. Figures 16-21 depict three different experiments on the dummy atom models (filled cavity, dimerization and an increase in distance between the two halves of the box). None of the modified models (on their own) seemed to fit the sealed box sample scattering curves.

Figure 16: Dummy atom model of sealed box with filled cavity in PyMol

Figure 17: Scattering curves of the original sealed box (pink) and a sealed box with a filled cavity (orange). The diffraction peaks are aligned, but the box with the filled cavity is ‘scaled down’. Experimental data is included for scaling purposes

Figure 18: Dummy atom model of a sealed box dimer in PyMol

Figure 19: Scattering curves of the original sealed box (blue) and a sealed box dimer (green), with a pronounced trough in the mid $q$ region. Experimental data is included for scaling purposes

Figure 20: Dummy atom model of a sealed box with one half shifted 4nm away from the other half in PyMol

Figure 21: Scattering curves of the sealed sample BM23_d1 (gold), the original sealed box (red), a sealed box with one half translated away from the other by 2nm (blue) and a sealed box with one half translated away from the other by 4nm (pink). Translation seems to suggest an improved fit, but anything beyond 4nm is improbable (considering the two halves are bound by crossovers)

Other factors yet to be considered in the model and characterized using scattering curves from CrySol include interhelical spacing, helix length and the angle between the two halves. It may also be that the model is intrinsically flawed – the box might be structurally deformed or a dummy-atom model may be inappropriate (compared to a cylindrical model, for example). Possible future work in this direction includes developing a better structural model and an absolute measure of the goodness of fit in the fitting algorithm.

### Phase Shift in Diffraction Peaks

Between the various sample scattering curves, there is a small but noticeable phase shift in the mid $q$ diffraction peak (i.e. its location) and this phenomenon could be attributed to a change in the envelope of the box. The difference in the location of the diffraction peak between sample scattering curves is subtle yet significant and is not accounted for in the ab initio approach (which returns models with the same, or similar, opening angles for most of the samples). Figure 22 is an example of a possible physical phenomenon (closing of the box) associated with a lag (negative) phase shift in the diffraction peak between samples.

Figure 22: An example of a physical phenomenon that could possibly be tied to a phase shift in diffraction peaks between samples

Without any modelling or curve fitting, a rather simple analysis can give us an assessment of the efficacy of the hinge torque – by plotting the position (associated $q$ value) of the mid $q$ diffraction peak against hinge torque (i.e. unbraced samples), any directional trend will allow us to determine the efficacy of our hinge torque design.

Figures 23 and 24 depict the methodology and results of the experiment, respectively.

Figure 23: Magnified mid $q$ region of the scattering curves of boxes with varying hinge torque and no brace. The $q$ value associated with the diffraction peak is given by the intersection of the drop line from each curve with the x-axis

Figure 24: Position of the mid $q$ diffraction peak of the scattering curves of boxes with varying hinge torque and no brace
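Reading off the peak position can also be done numerically. The sketch below is one simple way to do it, assuming the peak is the point of maximum intensity inside a user-chosen mid $q$ window. The function name and the window limits are hypothetical and would have to be chosen by inspecting the curves.

```python
import numpy as np

def peak_position(q, intensity, q_min, q_max):
    """Return the q value of maximum intensity inside the window [q_min, q_max]."""
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    window = (q >= q_min) & (q <= q_max)
    if not window.any():
        raise ValueError("no data points inside the requested q window")
    index = np.argmax(intensity[window])
    return float(q[window][index])
```

Applying this to each unbraced sample and plotting the result against closing hinge torque would give a plot of the kind described for Figure 24. If a peak sits on a steeply decaying background, the maximum may fall on the edge of the window, so the result should be checked against the plotted curve.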

Figure 24 shows a clear lead (positive) phase shift between no closing hinge torque and 35 degrees closing hinge torque, suggesting that the increase in closing hinge torque may have changed the molecular envelope of the DNA box. However, the mid $q$ diffraction peak returns to, and remains close to, its original position in boxes with closing hinge torques of 70 degrees and 105 degrees (Figure 22). This implies that the closing hinge torque may only be effective up to a closing hinge torque of 35 degrees. This limit could be refined if there were a viable method to increase the closing hinge torque resolution (e.g. 40 degrees, 45 degrees, etc.).

We hypothesise that beyond this limit (i.e. at 70 degrees and 105 degrees closing hinge torque), the high degree of underwinding and overwinding of the DNA helix results in the hinge staples denaturing or zippering. This would be a result of the torsional strain on the structure being sufficient to disrupt base pairing interactions. Electrostatic repulsion between the two halves of the box may also contribute to the forces on this helix, thereby aiding the functional failure of the hinge. As such, a hinge may not have the desired functionality past a closing hinge torque of 35 degrees and latches may be required to overcome electrostatic repulsion between the two halves of the box.

## Obtaining Box Scattering Curves (Materials and Method)

SAXS was carried out at the SAXS/WAXS beamline at the Australian Synchrotron. All possible sample permutations (sealed or unsealed, varying hinge torque, varying brace angle as well as two samples incubated with EcoRI) were synthesised at a nominal DNA origami concentration of 15nM and PEG purified twice (see DNA Design for specific synthesis protocol). A dilution series was then prepared for each sample permutation (15nM, 2 x concentration, 4 x concentration). Samples were then pipetted into a 96 well plate which was then used to fill 1.5mm diameter quartz capillaries. The samples were then transferred from the capillaries into the sample holders on the beamline. Monochromatic X-rays were then fired through the beamline.

Each sample was shot multiple times, giving us multiple frames. From here, the scattering data needs to be processed in order to be analysed. Outlier frames (scattering due to striking the meniscus, air and those that may have been the result of shooting abnormalities/aggregation in solution) were discarded. The remaining frames were averaged to produce a single scattering curve for each sample dilution. Synthesis buffer was also shot as a control in order to account for any ‘background’/solution scattering. Three buffer solutions (without PEG, see Buffer Subtraction section) were shot and it was determined that they all produced similar scattering curves. As such, one buffer scattering curve was subtracted from each of the sample dilutions to give us buffer-subtracted SAXS data of our origami structures. Figure 25 depicts the data processing involved in producing our origami box SAXS curves.

Figure 25: SAXS Data Processing Flowchart
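The averaging and subtraction steps can be sketched as follows. The outlier criterion used here (total frame intensity more than 5% away from the median) is a hypothetical stand-in for the rejection of outlier frames described above, and the function names are likewise assumptions. All curves are assumed to share the same $q$ grid.

```python
import numpy as np

def average_frames(frames, tolerance=0.05):
    """Average frames (rows) after discarding those whose total intensity is far from the median."""
    frames = np.asarray(frames, dtype=float)
    totals = frames.sum(axis=1)
    median = np.median(totals)
    # Hypothetical criterion: keep frames within a relative tolerance of the median total
    keep = np.abs(totals - median) <= tolerance * abs(median)
    return frames[keep].mean(axis=0), keep

def subtract_buffer(sample_curve, buffer_curve):
    """Point-by-point buffer subtraction on a common q grid."""
    return np.asarray(sample_curve, dtype=float) - np.asarray(buffer_curve, dtype=float)
```

The boolean mask returned by `average_frames` records which frames were used, which makes it easier to audit the rejection step afterwards.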

### Buffer Subtraction

To obtain the scattering curve of our boxes, we need to subtract the background scattering (buffer scattering). This means that we need to shoot the buffer separately. Ideally, we would separate the origami from its buffer and shoot the buffer alone. This is especially important as errors in PEG purification can result in PEG being retained with the origami. However, separation would have been quite difficult (SEC columns or dialysis). Instead, we shot synthesis buffer (with no PEG) and various dilutions of the PEG buffer in synthesis buffer. If one microliter of PEG buffer were retained (realistic, as all but a minute amount is pipetted out during purification) and subsequently included in the resuspended solution, it would correspond to a fifty-fold dilution. After comparing the scattering curves of 64 times diluted PEG buffer (similar to a dilution factor of 50) and synthesis buffer, it appears as though PEG has no significant effect on X-ray scattering (see Figure 26). As such, SAXS data from synthesis buffer without PEG can be used in the data analysis process.

Figure 26: SAXS data for synthesis buffer with and without PEG

## References

Fischer, S., Hartl, C., Frank, K., Rädler, J., Liedl, T. and Nickel, B. (2016). Shape and Interhelical Spacing of DNA Origami Nanostructures Studied by Small-Angle X-ray Scattering. Nano Letters, 16(7), pp.4282-4287.

Glatter, O. (1977). A new method for the evaluation of small-angle scattering data. J Appl Cryst, 10(5), pp.415-421.

Glatter, O. (1988). Comparison of two different methods for direct structure analysis from small-angle scattering data. J Appl Cryst, 21(6), pp.886-890.

Schnablegger, H. and Singh, Y. (2013). The SAXS Guide. 3rd ed. Austria: Anton Paar GmbH.

Svergun, D. (1992). Determination of the regularization parameter in indirect-transform methods using perceptual criteria. J Appl Cryst, 25(4), pp.495-503.

Svergun, D., Koch, M., Timmins, P. and May, R. (n.d.). Small angle X-ray and neutron scattering from solutions of biological macromolecules.

Svergun, D., Barberato, C. and Koch, M. (1995). CRYSOL– a Program to Evaluate X-ray Solution Scattering of Biological Macromolecules from Atomic Coordinates. J Appl Cryst, 28(6), pp.768-773.