Simulating connectivity.

Overview

Teaching: 30 min
Exercises: 0 min
Questions
  • What’s with that paper from Steve Smith?

  • How can I simulate functional connectivity between regions with R?

Objectives
  • Have a basic understanding of where the data comes from.

Steve Smith’s paper

In the previous episode, we’ve seen how to compute connectivity. If we indeed assume that the atlas is perfect, that a connection is present or absent, that there are no external inputs, the approach using correlations makes perfect sense and has therefore been used for years (decades?). There are also more advanced methods that can also estimate the direction of a certain connection, by counting on a time lag. If this time lag really exists and is constant, the approach should work. However, in 2011, Steve Smith did a huge simulation study. He investigated the performance of these methods under perfect circumstances, and looked at the influence of violations of the assumptions.

For example: we know that we don’t have the perfect atlas yet and people are working really hard to find a good delineation of the different parts of the brain, and even with all the hard work, it is only the question if it is even possible. As such, we are pretty sure that our signals are somewhat mixed. The Smith paper showed that mixing signals has a detrimental effect on the false positive rate.

In what follows, we will do Smith light, with very basic simulations. The goal is not to replicate the results, or find groundbreaking results. The goal is to enable you to think about assumptions that are potentially untrue, and to convince you to check the robustness of a method against model misspecification / deviations from the assumptions.

Loading necessary libraries

library(pcalg)

Simulating network data using rmvDAG

We’re going to simulate time series with predefined correlations

A <- matrix(
  c(0,1,0,0,0,
    0,0,1,0,0,
    0,0,0,1,0,
    0,0,0,0,1,
    0,0,0,0,0),
  nrow=5, byrow=TRUE
  )
levelplot(A,col.regions = heat.colors(100))

fmrifig

Now with the rmvDAG package we can simulate timeseries.

dag <- getGraph(A)
timeseries <- rmvDAG(300,dag,errDist="cauchy")
plot(timeseries[,1],col=1,type='l',ylim=c(-5,5),xlim=c(0,50))
for (t in 2:5){lines(timeseries[,t],col=t)}

fmrifig

Let’s estimate the partial correlations from the simulated time series.

estimate <- partial.cor(timeseries,tests=TRUE)
estimate
Partial correlations:
         1        2        3        4        5
1  0.00000  0.85156  0.00771 -0.02899  0.02567
2  0.85156  0.00000  0.46612 -0.00529  0.01003
3  0.00771  0.46612  0.00000  0.24697 -0.05361
4 -0.02899 -0.00529  0.24697  0.00000  0.88994
5  0.02567  0.01003 -0.05361  0.88994  0.00000
 
 Number of observations: 300
 
 Pairwise two-sided p-values:
  1      2      3      4      5

1        <.0001 0.8947 0.6187 0.6595
2 <.0001        <.0001 0.9276 0.8633
3 0.8947 <.0001        <.0001 0.3572
4 0.6187 0.9276 <.0001        <.0001
5 0.6595 0.8633 0.3572 <.0001
 
 Adjusted p-values (Holm's method)
  1      2      3      4      5
1        <.0001 1.0000 1.0000 1.0000
2 <.0001        <.0001 1.0000 1.0000
3 1.0000 <.0001        0.0001 1.0000
4 1.0000 1.0000 0.0001        <.0001
5 1.0000 1.0000 1.0000 <.0001

We want to save the false positives and the false negatives. For that, we can extract the p-values.

out <- estimate$P
out
1        2        3        4        5
1 ""       "<.0001" "1.0000" "1.0000" "1.0000"
2 "<.0001" ""       "<.0001" "1.0000" "1.0000"
3 "1.0000" "<.0001" ""       "0.0001" "1.0000"
4 "1.0000" "1.0000" "0.0001" ""       "<.0001"
5 "1.0000" "1.0000" "1.0000" "<.0001" ""

The p-values are strings and have "<.0001", so we need to clean that a bit.

out <- ifelse(out=="<.0001",0.0001,out)
class(out) <- "numeric"
levelplot(out,col.regions = heat.colors(100))

fmrifig

Now let’s only look at the upper triangle. The lower triangle is the same.

mask <- upper.tri(A)
input <- A[mask]
output <- out[mask]
print(input)
print(output)
> print(input)
 [1] 1 0 1 0 0 1 0 0 0 1
> print(output)
 [1] 0e+00 1e+00 0e+00 1e+00 1e+00 1e-04 1e+00 1e+00 1e+00 0e+00
alpha <- 0.05/9
TP <- sum(input==1 & output<alpha)
FP <- sum(input==0 & output<alpha)
paste("True positives:",TP,"- false positives:",FP)
"True positives: 4 - false positives: 0"
TPR <- TP/sum(input==1)
FPR <- FP/sum(input==0)
paste("True positive rate:",TPR,"- false positive rate:",FPR)
[1] "True positives: 4 - false positives: 0"

Let’s write this simulation in a function, so that we can re-use it multiple times.

simulated_pvals <- function(dag){
  timeseries <- rmvDAG(300,dag,errDist="cauchy")
  estimate <- partial.cor(timeseries,tests=TRUE)
  out <- estimate$P
  out <- ifelse(out=="<.0001",0,out)
  class(out) <- "numeric"
  out
}

Check if this works.

out <- simulated_pvals(dag)
levelplot(out)

fmrifig

Now we can run a simulation and repeat this 1000 times.

tp <- fp <- c()
for (sim in 1:1000){
  out <- simulated_pvals(dag)
  output <- out[mask]
  tp[sim] <- sum(input==1 & output<alpha)
  fp[sim] <- sum(input==0 & output<alpha)
}
print(paste("True positive rate:",mean(tp/sum(input==1)),
    "- false positive rate",mean(fp>0)))
[1] "True positive rate: 0.7695 - false positive rate 0.06"

Now let’s look what happens when we change our function so that two signals are somewhat mixed. This is parallel to an atlas where the boundary would be somewhat off.

simulated_pvals_mixed <- function(dag){
  timeseries <- rmvDAG(300,dag,errDist="cauchy")
  hlp <- 0.95*timeseries[,1]+0.05*timeseries[,3]
  hlp2 <- 0.05*timeseries[,1]+0.95*timeseries[,3]
  timeseries[,1] <- hlp
  timeseries[,3] <- hlp2
  estimate <- partial.cor(timeseries,tests=TRUE)
  out <- estimate$P
  out <- ifelse(out=="<.0001",0,out)
  class(out) <- "numeric"
  out
}
tp <- fp <- c()
for (sim in 1:1000){
  out <- simulated_pvals_mixed(dag)
  output <- out[mask]
  output <- ifelse(output<alpha,1,0)
  tp[sim] <- sum(input==1 & output==1)
  fp[sim] <- sum(input==0 & output==1)
}
print(paste("True positive rate:",mean(tp/sum(input==1)),
    "- false positive rate",mean(fp>0)))
[1] "True positive rate: 0.7475 - false positive rate 0.227"

Key Points

  • fMRI data is a time series of 3D images of the brain.

  • Visualising and manipulating fMRI nifti images can be done with neurobase.