Interpreting scratch assays using pair density dynamics and approximate Bayesian computation

Quantifying the impact of biochemical compounds on collective cell spreading is an essential element of drug design, with various applications including developing treatments for chronic wounds and cancer. Scratch assays are a technically simple and inexpensive method used to study collective cell spreading; however, most previous interpretations of scratch assays are qualitative and do not provide estimates of the cell diffusivity, D, or the cell proliferation rate, λ. Estimating D and λ is important for investigating the efficacy of a potential treatment and provides insight into the mechanism through which the potential treatment acts. While a few methods for estimating D and λ have been proposed, these previous methods lead to point estimates of D and λ, and provide no insight into the uncertainty in these estimates. Here, we compare various types of information that can be extracted from images of a scratch assay, and quantify D and λ using discrete computational simulations and approximate Bayesian computation. We show that it is possible to robustly recover estimates of D and λ from synthetic data, as well as a new set of experimental data. For the first time, our approach also provides a method to estimate the uncertainty in our estimates of D and λ. We anticipate that our approach can be generalized to deal with more realistic experimental scenarios in which we are interested in estimating D and λ, as well as additional relevant parameters such as the strength of cell-to-cell adhesion or the strength of cell-to-substrate adhesion.


Lattice mapping
To map the positions of cells from the experimental image, where cell position is a continuous variable, to a discrete lattice, we first calculate the position of each cell. We then define a mapping from cell position (x_c, y_c) to lattice site (x_L, y_L) through the relationship where ⌈x⌉ denotes the ceiling function.
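As a minimal sketch of a ceiling-based mapping of this kind, the snippet below assumes that each coordinate is divided by a lattice spacing before the ceiling is taken. The parameter `delta`, the function name `to_lattice_site` and the example coordinates are hypothetical and are not taken from the text; the exact scaling used in the study may differ.

```python
import math


def to_lattice_site(x_c, y_c, delta):
    """Map a continuous cell position to lattice indices with the ceiling function.

    Assumption (not from the source): the lattice has uniform spacing `delta`
    in both directions and indices start at 1 for positions in (0, delta].
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    return math.ceil(x_c / delta), math.ceil(y_c / delta)


# Hypothetical cell centres, in the same length units as delta.
cells = [(3.2, 47.9), (25.0, 0.4), (61.7, 12.5)]
sites = [to_lattice_site(x, y, delta=25.0) for x, y in cells]
print(sites)  # [(1, 2), (1, 1), (3, 1)]
```

Under this assumption a position lying exactly on a lattice line, such as x_c = 25.0 with delta = 25.0, is assigned to the lower-indexed site.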

ABC Algorithm
Marjoram et al. [24] provide a full description of this approach and here we only give a brief outline of the algorithm used in our study.

R2
Simulate β′ from the model using θ′ and calculate the summary statistic S(β′).

R7
Return to R1 until M steps have been attempted.
Initially, we sample θ randomly from the prior distribution, until the corresponding summary statistic is sufficiently close to the experimental summary statistic. We define the transition kernel that proposes θ′ values as a bivariate uniform distribution, so that θ′ ∈ θ ± Γ, where Γ defines the width of the uniform distribution. The transition kernel ensures that P_m ∈ [0, 1] and P_p ∈ [0, 1] by truncating the bivariate uniform distribution at the boundaries of the parameter space, if necessary. To measure the differences between two summary statistics we define where S(β)_i is the i-th data point in S(β) and Q is the number of data points in S(β). We note that we take the average d[S(β)] value when there are summary statistics taken at multiple time points.
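The truncated uniform transition kernel and the averaging over time points can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the study: Γ is taken to be a pair of half-widths, one per parameter; truncation is implemented by clipping the sampling interval to [0, 1]; and the per-time-point distance is supplied by the caller. The names `propose` and `mean_distance` are hypothetical.

```python
import random


def propose(theta, gamma, rng=random):
    """Draw theta' uniformly from theta ± gamma, truncated to [0, 1] per parameter.

    Assumptions (not from the source): `theta` is (P_m, P_p) and `gamma` holds
    one half-width per parameter.
    """
    proposal = []
    for value, width in zip(theta, gamma):
        low = max(0.0, value - width)   # clip at the lower boundary
        high = min(1.0, value + width)  # clip at the upper boundary
        proposal.append(rng.uniform(low, high))
    return tuple(proposal)


def mean_distance(distance, simulated, observed):
    """Average a caller-supplied distance over summary statistics at several time points."""
    pairs = list(zip(simulated, observed))
    return sum(distance(s, o) for s, o in pairs) / len(pairs)


rng = random.Random(1)
theta_new = propose((0.25, 0.002), (0.05, 0.001), rng)
print(theta_new)
```

Passing a seeded `random.Random` instance keeps a chain reproducible. Near a boundary the clipped interval is narrower than 2Γ, so proposals never leave the parameter space.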

Distribution Convergence
To examine whether the posterior distribution generated from our ABC algorithm approximates f(θ|β), we consider posterior distributions generated with different ϵ values. The posterior distributions are calculated using the same data sets and identically prepared simulations. If the change in the posterior distribution between ϵ values is insignificant, then the estimated posterior distribution provides a close approximation to f(θ|β) [23]. In Figures 1(a)-(c) we present results using three different ϵ values that demonstrate that the posterior distribution approximately converges for ϵ = 0.012. We observe that the distribution is centred at approximately the same position, with regard to P_m and P_p, and that the spread of the distribution in the P_m and P_p directions is consistent between Figures 1(b)-(c). We repeat this process in Figures 2-7 for all distributions presented in this work and demonstrate that the chosen values of ϵ are appropriate.

Figure 6: Convergence of the averaged posterior distribution for ten identically prepared synthetic data sets for the summary statistic consisting of the counts of pair distances c(i). Synthetic data was generated with P_m = 0.25, P_p = 2 × 10⁻³. The maximum distance between summary statistics for θ to be accepted was (a) ϵ = 0.075, (b) ϵ = 0.065, (c) ϵ = 0.06. Red indicates high relative frequency while blue indicates low relative frequency.

Synthetic data was generated with P_m = 0.25, P_p = 2 × 10⁻³. The maximum distance between summary statistics for θ to be accepted was (a) ϵ = 0.02, (b) ϵ = 0.0175, (c) ϵ = 0.015. Red indicates high relative frequency while blue indicates low relative frequency.
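The comparison of centre and spread across ϵ values described in this section is made visually in the figures. A numerical stand-in, offered only as a sketch, is to summarize the accepted (P_m, P_p) samples for each ϵ by their mean and standard deviation and to report the largest relative change between successive ϵ values. The function names, the sample values and the use of a relative-change measure are assumptions, not part of the original analysis.

```python
import statistics


def summarize(samples):
    """Per-parameter mean and standard deviation of accepted (P_m, P_p) samples."""
    p_m = [s[0] for s in samples]
    p_p = [s[1] for s in samples]
    return {
        "mean": (statistics.fmean(p_m), statistics.fmean(p_p)),
        "std": (statistics.pstdev(p_m), statistics.pstdev(p_p)),
    }


def relative_change(a, b):
    """Largest relative change in mean or spread between two posterior summaries."""
    changes = []
    for key in ("mean", "std"):
        for old, new in zip(a[key], b[key]):
            scale = max(abs(old), 1e-12)  # guard against division by zero
            changes.append(abs(new - old) / scale)
    return max(changes)


# Hypothetical accepted samples for two successive tolerances.
loose = [(0.20, 0.0010), (0.30, 0.0030), (0.25, 0.0020)]
tight = [(0.22, 0.0015), (0.28, 0.0025), (0.25, 0.0020)]
print(relative_change(summarize(loose), summarize(tight)))
```

A small value between the two tightest tolerances would be consistent with the visual observation that the posterior has approximately converged; any threshold for "small" is a judgment call.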