Fast Algorithms for Membrane and Synapse Detection



We are four students from Harvard SEAS

Cole Diamond

I am a Master's student in IACS. I did my undergrad at Columbia in Computer Science.

Hallvard Nydal

I am a student in SEAS from Norway. I used to be a professional cross-country skier.

Raphaël Pestourie

I'm a PhD student in Applied Physics. I have lost count of all the degrees I have. Also, I am French.

Mina Nassif

I'm Mina. I'm a Master's student in IACS, originally from Egypt. Fun fact: passport control misspelled my name when I was applying for a visa, so now I am legally registered in the US as Mena instead of Mina.

Full Paper


Fast Algorithms for Membrane and Synapse Detection
Project Partner:
Teaching Fellow: Thouis Ray Jones
May 15, 2015
Hallvard Moian Nydal, Institute for Applied Computational Science, Harvard University
Cole Diamond, Institute for Applied Computational Science, Harvard University
Raphaël Pestourie, School of Engineering and Applied Science, Harvard University
Mina Nassif, Institute for Applied Computational Science, Harvard University
Abstract—Connectomics is the study of the structural
and functional connections among brain cells. By
comparing diseased connectomes and healthy connectomes,
physicians and researchers alike can gain valuable insight
into neurodegenerative disorders like Alzheimer's. As
automated approaches to scanning-electron microscopy at
nano-resolution have yielded increasingly larger data sets,
the need for scalable data processing has become crucial.
Using a set of manually annotated images, we perform
binary classification of the pixels to detect cell membranes
and synaptic clefts. Compared to a state-of-the-art pixel-wise
prediction method, we show that predicting multiple pixels
in batches cuts training and prediction time by a factor of
100, while sacrificing little by way of predictive accuracy.
Index Terms—Connectomics, Convolutional Neural Net-
works, Window Prediction
The connectome is the wiring diagram of the
brain and nervous system. Mapping the network of
neurons in organisms is vital for discovering the
underlying architecture of the brain and investigat-
ing the physical underpinning of cognition, intelli-
gence, and consciousness. It is also an important
step in understanding how connectivity patterns are
altered by mental illnesses and various pathologies.
Advancements in scanning electron microscopy have
made feasible the 3D reconstruction of neuronal
processes at the nano-scale (see figure 1).
However, connectomics at the microscale has
proven to be a trying task. With manual annotation
and reconstruction, mapping cannot be completed
within a realistic time frame [1,2]. As automated ap-
proaches to scanning-electron microscopy at nano-
resolution have yielded increasingly larger data-sets,
there is strong interest in finding more scalable
approaches to automating the identification of neurons
and synapses.
Fig. 1: Automated Reconstruction of Neuronal Processes.
Image Credit: Kaynig et al. [5]
The main contribution of our research is a fast and
scalable algorithm for detecting membrane edges
and synapses. By predicting multiple pixels simul-
taneously, we are able to cut training and prediction
time while sacrificing little accuracy.
Of seminal importance in the realm of large-scale
reconstruction of neuron mappings is the work done
by Kaynig et al. [5]. The authors propose a pipeline
for automatically reconstructing neuronal processes
from large-scale electron microscopy image data.
The pipeline is represented graphically below. Our
focus is on the stage of the pipeline concerning
recognition of cell membranes and synapses.
The proposed pipeline is a viable solution for
large-scale connectomics, although training time is
prohibitively long [5]. From critical-path analysis,
we see that the training bottleneck occurs during
the membrane-classification stage: in the implementation
by Kaynig et al., each pixel is classified individually.
Fig. 2: Complete workflow for large-scale neuron reconstruc-
tion. Image Credit: Kaynig et al. [5]
Drawing from the work of Hinton in road classifi-
cation, predicting patches of pixels simultaneously
may improve the throughput of a system with minor
compromise to accuracy [6]. We also draw from
Hinton in applying Gaussian thickening to edges to
improve predictive accuracy.
Recently, synapses have become the focus of
classification and region segmentation efforts. An
example synapse is shown in figure 3. In a bench-
mark study, Becker et al. achieved a high Jaccard
index across different values of 'exclusion-zone
thickness' for synapse classification [4]. Our work
applies batch prediction methods to membrane
detection as well as synapse detection, using
Becker et al. as a point of comparison.
Fig. 3: A synapse’s morphology is marked by vesicles, a
synaptic cleft and a post-synaptic region.
In order to augment the predictive power of our
models, we pre-process our training, testing and
validation data using a variety of techniques. Pre-
processing reduces noise from the original data and
underscores important features. We employ six
steps in our pre-processing pipeline: feature
normalization, edge detection, stochastic rotation,
adaptive histogram equalization, shuffling, and
dataset balancing.
A. Feature Normalization
The first step of our feature-normalization
procedure is reshaping our images to 1024×1024
pixels. We then perform standardization along the
columns, subtracting each column's mean and
dividing by its standard deviation: z = (x − μ)/σ.
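As a sketch (in NumPy, with a hypothetical helper name), the column-wise standardization can be written as:

```python
import numpy as np

# Hypothetical sketch: column-wise standardization of an image,
# z = (x - mu) / sigma, computed per column as described above.
def standardize_columns(img):
    mu = img.mean(axis=0)            # per-column mean
    sigma = img.std(axis=0) + 1e-8   # per-column std (epsilon avoids /0)
    return (img - mu) / sigma

img = np.random.rand(1024, 1024)
z = standardize_columns(img)         # each column now has mean ~0
```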
B. Edge-Detection
For cell-membrane detection, we use edge
detection to binarize our data. To do so, we perform
a 2D convolution using the Scharr kernel [10]. We
also utilize edge thickening as in Hinton [6] to
reduce the sparsity of the feature: we grow
the edges by converting all pixels in a neighborhood
of an existing edge into edge pixels. Thereafter, we
apply a Gaussian blur to our data to make our
convolutional neural network more robust to noise.
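A hedged sketch of this three-step pipeline using SciPy's ndimage routines; the threshold, neighborhood size, and blur width here are illustrative choices, not the paper's exact parameters:

```python
import numpy as np
from scipy import ndimage

# 3x3 Scharr kernel for the horizontal derivative; its transpose
# gives the vertical derivative.
scharr_x = np.array([[ 3, 0,  -3],
                     [10, 0, -10],
                     [ 3, 0,  -3]], dtype=float)
scharr_y = scharr_x.T

def edge_map(img, threshold=0.5, grow=3, sigma=1.0):
    gx = ndimage.convolve(img, scharr_x)
    gy = ndimage.convolve(img, scharr_y)
    mag = np.hypot(gx, gy)                                # gradient magnitude
    edges = (mag > threshold * mag.max()).astype(float)   # binarize
    thick = ndimage.grey_dilation(edges, size=(grow, grow))  # grow edges
    return ndimage.gaussian_filter(thick, sigma=sigma)       # Gaussian blur

img = np.zeros((64, 64)); img[:, 32:] = 1.0   # toy image: vertical step edge
blurred = edge_map(img)
```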
C. Stochastic Rotation
Rowley et al. showed that stochastic rotations
can introduce rotation invariance [2]. In order to
introduce rotation invariance, we randomly rotate
our training data by 90, 180 or 270 degrees.
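The rotation step can be sketched with `np.rot90`; the generator seed is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly rotate each training window by 90, 180, or 270 degrees
# (k = 1, 2, or 3 quarter-turns) to encourage rotation invariance.
def random_rotate(window):
    k = rng.integers(1, 4)
    return np.rot90(window, k)

w = np.arange(16.0).reshape(4, 4)
r = random_rotate(w)   # same pixels, rotated layout
```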
D. Shuffling
To avoid over-fitting, we shuffled our data using
two different schemes. At first, we permuted the
order of the sub-windows which we use for training
to ensure that we did not train on the same sample
twice within a batch. To simplify matters, we then
decided to randomly sample the training data to use
in our batch. Since the probability of training
on the same input twice within a batch becomes small
as the amount of data grows, we opted for the latter,
simpler shuffling scheme.
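The simpler scheme amounts to uniform random index sampling, sketched here with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw batch indices uniformly at random (with replacement); duplicate
# draws within one batch become unlikely as the dataset grows.
def sample_batch(data, batch_size):
    idx = rng.integers(0, len(data), size=batch_size)
    return data[idx]

data = np.arange(10_000.0).reshape(-1, 1)
batch = sample_batch(data, 32)
```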
E. Adaptive Histogram Equalization
Adaptive Histogram Equalization (AHE) is a
popular and effective algorithm for improving the
local contrast of images that contain bright or dark
regions. An example is shown in figure 4.
AHE differs from ordinary histogram
equalization in that AHE computes several
histograms, each corresponding to a distinct section
of the image; the method then uses the various
histograms to redistribute the lightness values of
the image. The algorithm is depicted in
Algorithm 1 in the appendix [15].
Fig. 4: Comparison of an image without (left) and with
(right) adaptive histogram equalization. Image credit: Sujin
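A simplified, hypothetical sketch of the idea: split the image into non-overlapping tiles and histogram-equalize each tile independently. Production AHE/CLAHE additionally interpolates between tiles and clips the histogram, which is omitted here:

```python
import numpy as np

def equalize(tile, bins=256):
    # Map each value through the tile's empirical CDF.
    hist, edges = np.histogram(tile, bins=bins, range=(0.0, 1.0))
    cdf = hist.cumsum() / tile.size
    return np.interp(tile, edges[:-1], cdf)

def adaptive_equalize(img, tile=64):
    # Equalize each tile independently (no inter-tile interpolation).
    out = np.empty_like(img)
    for i in range(0, img.shape[0], tile):
        for j in range(0, img.shape[1], tile):
            out[i:i+tile, j:j+tile] = equalize(img[i:i+tile, j:j+tile])
    return out

img = np.random.rand(256, 256)
eq = adaptive_equalize(img)   # values remain in [0, 1]
```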
F. Data Balancing
For both synapse detection and cell-membrane
detection, it is important that the training set con-
tains an equal number of positive and negative ex-
amples. Accordingly, we center half of our training
data on edges and synapses to effectively bias our
sampling toward an even class balance.
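A hypothetical sketch of this balancing: half of the window centers are drawn from labeled edge/synapse pixels (positives), half from random locations, assuming a square label mask:

```python
import numpy as np

rng = np.random.default_rng(0)

def balanced_centers(label_mask, n):
    # Half of the n centers sit on positive pixels, half are random.
    pos = np.argwhere(label_mask)
    pos_pick = pos[rng.integers(0, len(pos), n // 2)]
    neg_pick = rng.integers(0, label_mask.shape[0], (n - n // 2, 2))
    return np.vstack([pos_pick, neg_pick])

mask = np.zeros((128, 128), dtype=bool)
mask[60:68, 60:68] = True          # toy "synapse" region
centers = balanced_centers(mask, 32)
```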
A. Background
Convolutional neural networks have become state
of the art in a wide range of machine learning
problems, including object recognition and image
classification [9]. Convolutional neural networks
are able to achieve exceptional performance for a
multitude of different problems. Since convolutional
neural networks exhibit locality, translation invari-
ance, and hierarchical structure (see figure 5), they
make for a natural pairing for object recognition.
The challenge with convolutional neural networks
is the relatively long prediction and training time.
A typical convolutional network architecture con-
sists of several stages. In the first stage, a series of
convolutions are performed in parallel to produce a
set of presynaptic activations. In the detection stage,
each pre-synaptic activation is run through a nonlin-
ear activation function, such as the rectified linear
activation function. Thereafter, a pooling function
replaces the output of the net at a certain location
with a summary statistic of the nearby outputs.
Pooling makes features invariant to the location
in the input image, and activations less sensitive
to neural network structure.
Fig. 5: CNNs can capture the natural hierarchy of images.
Image credit: Lee et al., ICML, 2009
Often, an additional
stage is added implementing the max-out [18] and
dropout [19] techniques discussed in subsequent
sections. Figure 6 shows the architecture of a typical
convolutional network.
Fig. 6: A typical Convolutional Neural Network consist-
ing of six layers for MNIST classification. Image credit: eigenman
1) Rectified Linear Units: There are several
choices when it comes to activation functions in
neural networks. All are variations of the output of a
single node in a neural network, z
= x
+ b
where W 2 R
and b 2 R
are learned
parameters. In contrast to alternative activation func-
tions such as sigmodial functions, f(z)=(1+
and tanh functions, f (z)=tanh(z),
rectifier linear units are unbound and can represent
any non negative real value.
Rectifier linear units use the activation function
rect(z)=max(0,z). The activation function has
good sparsity properties due to having a real zero
activation value, and therefore, suffers less from Di-
minishing Gradient Flow. As a result, training time
decreases by a substantial factor. Krizhevsky
et al. [16] report training speed-ups of up to
600% on benchmark simulations on the ImageNet
data set. We therefore use rectified linear units
as the activation function of choice for our
convolutional neural networks.
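The three activation functions discussed can be compared numerically; note that ReLU is exactly zero for negative inputs (sparse activations) and unbounded above, unlike sigmoid (range 0 to 1) and tanh (range −1 to 1):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # rect(z) = max(0, z): exact zeros for z < 0, identity for z >= 0
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
s, t, r = sigmoid(z), np.tanh(z), relu(z)
```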
2) Dropout: The large number of neurons and
their associated connections makes neural networks
prone to over-fitting. Dropout is a stochastic tech-
nique to reduce architectural dependence wherein
neurons and their incoming and outgoing connec-
tions are omitted with a predefined probability.
Empirical results from Hinton et al. [19] show that
dropout yields a significant performance improve-
ment in predictive performance over a wide range
of data sets. Dropout can be understood as an effi-
cient way to perform model averaging with neural
networks and to prevent complex co-adaptation on
the training data.
Fig. 7: In dropout training, units are temporarily removed
from the network, along with all their incoming and outgoing
connections. Image credit: Joost van Doorn.
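As a sketch, dropout can be implemented with a random binary mask over activations. The 1/(1−p) rescaling shown here is the common "inverted dropout" variant, which keeps expected activations unchanged; that detail is our assumption, not stated in the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5):
    # Zero each unit with probability p; rescale survivors by 1/(1-p)
    # so the expected activation is unchanged at test time.
    keep = rng.random(activations.shape) >= p
    return activations * keep / (1.0 - p)

a = np.ones((4, 8))
dropped = dropout(a, p=0.5)   # entries are either 0.0 or 2.0
```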
3) Maxout: In a convolutional network, a maxout
feature map can be constructed by taking the maxi-
mum across k affine feature maps (i.e., pool across
channels). A single maxout unit can be interpreted
as making a piecewise linear approximation to an
arbitrary convex function (see figure 8). The maxout
activation function is designed to never have a
gradient of zero, allowing the network using maxout
and dropout to achieve a good approximation to
model averaging.
Fig. 8: The maxout activation function can implement the
rectified linear and absolute-value rectifier functions, and
approximate the quadratic activation function. Image credit:
Goodfellow et al.
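Maxout across channels can be sketched as an elementwise maximum over groups of k feature maps; the map count matches the first convolutional layer described later (64 maps, k = 2):

```python
import numpy as np

def maxout(feature_maps, k=2):
    # Group the channel axis into blocks of k maps and take the
    # elementwise maximum within each block.
    c, h, w = feature_maps.shape
    grouped = feature_maps.reshape(c // k, k, h, w)
    return grouped.max(axis=1)

maps = np.random.rand(64, 60, 60)   # 64 affine feature maps
pooled = maxout(maps, k=2)          # 32 maps remain
```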
4) Root Mean Square Propagation: RMSProp
[21] uses a moving average over the root mean
squared gradients to normalize the current gradient.
Letting f
(t) be the derivative of the cost function
with respect to the parameters at time step t, be the
step rate, and the decay term, we perform the fol-
lowing updates: r
=(1 )f
+ r
We use RMSProp instead
of stochastic gradient descent in our convolutional
neural network as it achieves greater classification
performance on our dataset
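The update rule above can be sketched on a toy quadratic objective; the step rate and decay values here are illustrative, not the paper's settings:

```python
import numpy as np

def rmsprop_step(theta, grad, r, eta=0.01, gamma=0.9, eps=1e-8):
    # r accumulates a decayed average of squared gradients;
    # the step is the gradient normalized by sqrt(r).
    r = gamma * r + (1.0 - gamma) * grad ** 2
    theta = theta - eta * grad / (np.sqrt(r) + eps)
    return theta, r

# Toy check: minimize f(theta) = theta^2, whose gradient is 2*theta.
theta, r = 5.0, 0.0
for _ in range(500):
    theta, r = rmsprop_step(theta, 2.0 * theta, r)
```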
B. Convolutional Neural Network Architecture
Our convolutional neural network consists of
twelve layers (see figure 9). For the input layer, we
take 64×64-pixel sub-windows from three image
slices located consecutively in the Z dimension.
Using image slices above and below the current im-
age slice enables us to leverage a three-dimensional
context for training and prediction.
1) Convolutional Filters: In our first convolu-
tional layer, we convolve our input with 64 5x5
kernels using a stride length of 1. As a result, we
reduce our input dimensions to (64-5+1)x(64-5+1)
= 60x60. For convenience, we notate this layer as
C(64, 5, 1). Thereafter, we apply max-out to reduce
the number of filters from 64 to 32. In doing so,
we combine filters by taking the maximum for each
pixel across 2 consecutive filters, noted M(2). This
reduces the number of filters by a factor of 2.
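The dimension arithmetic for the C and M stages above can be sketched as a quick check (the helper name is ours):

```python
def conv_out(n, k, s=1):
    # Valid convolution: output size is (n - k) // s + 1.
    return (n - k) // s + 1

size = conv_out(64, 5, 1)   # C(64, 5, 1): 64 -> (64-5+1) = 60
filters = 64 // 2           # M(2): maxout halves the filter count
```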
Thereafter, we sub-sample a filter with 2x2 windows
and pool the maximum, noted P(2). This process,
known as max-pooling, further reduces the input
space to 30x30. Finally, we repeat this process using
a different set of parameters. Using the notation
C(number of filters, kernel size, length of stride),
M(maxout pieces), P(pooling window shape), the