
HARVARD UNIVERSITY
APPLIED COMPUTATION 297R
CAPSTONE PROJECT

Fast Algorithms for Membrane and Synapse Detection

Authors: Hallvard Moian NYDAL, Cole DIAMOND, Raphaël PESTOURIE, Mina NASSIF
Professor: Pavlos PROTOPAPAS
Project Partner: Verena KAYNIG-FITTKAU
Teaching Fellow: Thouis Ray JONES

May 15, 2015

Fast Algorithms for Membrane and Synapse Detection

Hallvard Moian Nydal, Institute for Applied Computational Science, Harvard University
Cole Diamond, Institute for Applied Computational Science, Harvard University
Raphaël Pestourie, School of Engineering and Applied Science, Harvard University
Mina Nassif, Institute for Applied Computational Science, Harvard University

Abstract—Connectomics is the study of the structural and functional connections among brain cells. By comparing diseased connectomes with healthy connectomes, physicians and researchers alike can gain valuable insight into neurodegenerative disorders like Alzheimer's. As automated approaches to scanning-electron microscopy at nano-resolution have yielded increasingly large data sets, the need for scalable data processing has become crucial. Using a set of manually annotated images, we perform binary classification of pixels to detect cell membranes and synaptic clefts. Compared to the state-of-the-art pixel-wise prediction method, we show that predicting multiple pixels in batches cuts training and prediction time by a factor of 100, while sacrificing little by way of predictive accuracy.

Index Terms—Connectomics, Convolutional Neural Networks, Window Prediction

I. INTRODUCTION

The connectome is the wiring diagram of the brain and nervous system. Mapping the network of neurons in organisms is vital for discovering the underlying architecture of the brain and investigating the physical underpinnings of cognition, intelligence, and consciousness. It is also an important step in understanding how connectivity patterns are altered by mental illnesses and various pathologies. Advancements in scanning electron microscopy have made feasible the 3D reconstruction of neuronal processes at the nano-scale (see figure 1).

However, connectomics at the microscale has proven to be a trying task. With manual annotation and reconstruction, mapping cannot be completed within a realistic time frame [1,2]. As automated approaches to scanning-electron microscopy at nano-resolution have yielded increasingly large data sets, there is strong interest in finding more scalable approaches to automating the identification of neurons and synapses.

Fig. 1: Automated Reconstruction of Neuronal Processes. Image Credit: Kaynig et al. [5]

The main contribution of our research is a fast and scalable algorithm for detecting membrane edges and synapses. By predicting multiple pixels simultaneously, we are able to cut training and prediction time while sacrificing little accuracy.

II. PRIOR WORK

Of seminal importance in the realm of large-scale reconstruction of neuron mappings is the work done by Kaynig et al. [5]. The authors propose a pipeline for automatically reconstructing neuronal processes from large-scale electron microscopy image data. The pipeline is represented graphically in figure 2. Our focus is on the stage of the pipeline concerning recognition of cell membranes and synapses.

The proposed pipeline is a viable solution for large-scale connectomics, although training time is prohibitively long [5]. From critical path analysis, we see that the bottleneck of training occurs during the membrane classification stage. In the implementation by Kaynig et al., each pixel is classified individually.

Fig. 2: Complete workflow for large-scale neuron reconstruction. Image Credit: Kaynig et al. [5]

Drawing from the work of Hinton on road classification, predicting patches of pixels simultaneously may improve the throughput of a system with minor compromise to accuracy [6]. We also draw from Hinton in applying Gaussian thickening to edges to improve predictive accuracy.

Recently, synapses have become the focus of classification and region segmentation efforts. An example synapse is shown in figure 3. In a benchmark study by Becker et al., a high Jaccard index for different values of 'exclusion zone thickness' for synapse classification was achieved [4]. Our work attempts to apply batch prediction methods to membrane detection as well as synapse detection, and we use Becker et al. as a point of comparison.

Fig. 3: A synapse's morphology is marked by vesicles, a synaptic cleft and a post-synaptic region.

III. DATA SANITIZATION

In order to augment the predictive power of our models, we pre-process our training, testing and validation data using a variety of techniques. Pre-processing reduces noise in the original data and underscores important features. We employ six steps in our pre-processing pipeline: feature normalization, edge detection, stochastic rotation, shuffling, adaptive histogram equalization, and data balancing.

A. Feature Normalization

The first step of our feature-normalization procedure is reshaping our images to 1024 × 1024 pixels. We then perform standardization along the columns: we subtract the mean of each column and divide by its standard deviation, $z = \frac{x - \mu}{\sigma}$.
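For illustration, a minimal NumPy sketch of this column-wise standardization (the function name and the small epsilon guarding against zero-variance columns are our additions, not part of the pipeline described above):

    import numpy as np

    def standardize_columns(image):
        """Column-wise standardization: subtract each column's mean and
        divide by its standard deviation, z = (x - mu) / sigma."""
        mu = image.mean(axis=0)
        sigma = image.std(axis=0)
        return (image - mu) / (sigma + 1e-8)  # epsilon avoids division by zero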

B. Edge-Detection

For cell-membrane detection, we use edge detection to binarize our data. To do so, we perform a 2D convolution using the Scharr kernel [10]. We also utilize edge thickening as in Hinton [6] to reduce the sparsity of the feature: we grow the edges by converting all pixels in a neighborhood of an existing edge into edge pixels. Thereafter, we apply a Gaussian blur to the data to make our convolutional neural network more robust to noise.
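A sketch of this edge pre-processing using scikit-image and SciPy is given below; the threshold, dilation count and blur width are illustrative values and not the parameters used in our pipeline:

    import numpy as np
    from skimage.filters import scharr
    from scipy.ndimage import binary_dilation, gaussian_filter

    def binarize_edges(image, edge_threshold=0.1, grow_iters=2, blur_sigma=1.0):
        """Scharr edge magnitude, binarization, edge thickening by dilation,
        then Gaussian blur to soften the target for the network."""
        edges = scharr(image) > edge_threshold                    # 2D Scharr edges
        thick = binary_dilation(edges, iterations=grow_iters)     # grow edges
        return gaussian_filter(thick.astype(np.float32), sigma=blur_sigma)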

C. Stochastic Rotation

Rowley et al. showed that stochastic rotations can introduce rotation invariance [2]. To that end, we randomly rotate our training data by 90, 180 or 270 degrees.
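A minimal sketch of this augmentation step for a single 2D sub-window (the function name is ours):

    import numpy as np

    def random_right_angle_rotation(window, rng=np.random):
        """Rotate a training sub-window by a randomly chosen 90, 180 or 270 degrees."""
        k = rng.randint(1, 4)        # 1, 2 or 3 quarter turns
        return np.rot90(window, k=k)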

D. Shufﬂing

To avoid over-fitting, we shuffled our data using two different schemes. At first, we permuted the order of the sub-windows used for training, to ensure that we did not train on the same sample twice within a batch. To simplify matters, we then considered randomly sampling the training data for each batch. Since the probability of training on the same input twice within a batch becomes small as the amount of data grows, we opted for the latter, simpler shuffling scheme.
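The simpler scheme amounts to drawing a random batch of indices; a sketch under the assumption that the sub-windows and labels are stored as NumPy arrays:

    import numpy as np

    def sample_batch(windows, labels, batch_size, rng=np.random):
        """Draw a random training batch. With a large pool of sub-windows,
        the chance of drawing the same sample twice in a batch is negligible."""
        idx = rng.randint(0, len(windows), size=batch_size)
        return windows[idx], labels[idx]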

E. Adaptive Histogram Equalization

Adaptive Histogram Equalization (AHE) is a popular and effective algorithm for improving the local contrast of images that contain bright or dark regions. An example is shown in figure 4. AHE differs from ordinary histogram equalization in that it computes several histograms, each corresponding to a distinct section of the image; the method then uses these histograms to redistribute the lightness values of the image. The AHE algorithm is depicted in Algorithm 1 in the appendix [15].

Fig. 4: Comparison of an image without (left) and with (right) adaptive histogram equalization. Image credit: Sujin Philip, sujin@cs.utah.edu

F. Data Balancing

For both synapse detection and cell-membrane detection, it is important that the training set contains an equal number of positive and negative examples. Accordingly, we center half of our training data on edges and synapses, effectively biasing our sampling toward positive examples.
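One way to realize this balancing, sketched under the assumption that labels are available as a binary image (the names here are hypothetical):

    import numpy as np

    def balanced_window_centres(label_image, n_samples, rng=np.random):
        """Pick sub-window centres so that half lie on positive (edge/synapse)
        pixels and half on negative pixels, giving a balanced training set."""
        pos = np.argwhere(label_image > 0)
        neg = np.argwhere(label_image == 0)
        half = n_samples // 2
        pos_pick = pos[rng.randint(0, len(pos), size=half)]
        neg_pick = neg[rng.randint(0, len(neg), size=n_samples - half)]
        return np.vstack([pos_pick, neg_pick])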

IV. CONVOLUTIONAL NEURAL NETWORKS

A. Background

Convolutional neural networks have become state of the art in a wide range of machine learning problems, including object recognition and image classification [9], and achieve exceptional performance across a multitude of different problems. Since convolutional neural networks exhibit locality, translation invariance, and hierarchical structure (see figure 5), they are a natural fit for object recognition. Their main challenge is the relatively long prediction and training time.

A typical convolutional network architecture consists of several stages. In the first stage, a series of convolutions are performed in parallel to produce a set of presynaptic activations. In the detection stage, each pre-synaptic activation is run through a nonlinear activation function, such as the rectified linear activation function. Thereafter, a pooling function replaces the output of the net at a certain location with a summary statistic of the nearby outputs. Pooling makes features invariant to their location in the input image and makes activations less sensitive to the network structure. Often, an additional stage implements the max-out [18] and dropout [19] techniques discussed in subsequent sections. Figure 6 shows the architecture of a typical convolutional network.

Fig. 5: CNNs can capture the natural hierarchy of images. Image credit: Lee et al., ICML, 2009

Fig. 6: A typical Convolutional Neural Network consisting of six layers for MNIST classification. Image credit: https://engineering.purdue.edu/eigenman

1) Rectified Linear Units: There are several choices of activation function in neural networks. All are variations on the output of a single node in a neural network, $z_{ij} = x^T W_{ij} + b_{ij}$, where $W \in \mathbb{R}^{d \times m \times k}$ and $b \in \mathbb{R}^{m \times k}$ are learned parameters. In contrast to alternative activation functions such as sigmoidal functions, $f(z) = (1 + \exp(-z))^{-1}$, and the tanh function, $f(z) = \tanh(z)$, rectified linear units are unbounded and can represent any non-negative real value.

Rectified linear units use the activation function $\mathrm{rect}(z) = \max(0, z)$. This activation function has good sparsity properties, since it produces a true zero activation value, and therefore suffers less from diminishing gradient flow. As a result, training time decreases by a substantial factor: Krizhevsky et al. [16] report training speed-ups of up to 600% in benchmark simulations on the ImageNet data set. We therefore use rectified linear units as the activation function of choice for our convolutional neural networks.
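For concreteness, the three activation functions compared above can be written in a few lines of NumPy:

    import numpy as np

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))   # bounded in (0, 1)
    tanh = np.tanh                                  # bounded in (-1, 1)
    rect = lambda z: np.maximum(0.0, z)             # ReLU: unbounded, exactly zero for z <= 0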

2) Dropout: The large number of neurons and their associated connections makes neural networks prone to over-fitting. Dropout is a stochastic technique to reduce architectural dependence wherein neurons, together with their incoming and outgoing connections, are omitted with a predefined probability. Empirical results from Hinton et al. [19] show that dropout yields a significant improvement in predictive performance over a wide range of data sets. Dropout can be understood as an efficient way to perform model averaging with neural networks and to prevent complex co-adaptation on the training data.

Fig. 7: In dropout training, units are temporarily removed from a network, along with all their incoming and outgoing connections. Image credit: Joost van Doorn, J.vanDoorn-1@student.utwente.nl
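A minimal sketch of the dropout mechanism at training time; this uses the common "inverted dropout" rescaling, which is our choice of presentation rather than a detail stated in the report:

    import numpy as np

    def dropout(activations, p_drop=0.5, rng=np.random):
        """Zero each unit with probability p_drop; rescale survivors so the
        expected activation is unchanged (inverted dropout)."""
        mask = rng.rand(*activations.shape) >= p_drop
        return activations * mask / (1.0 - p_drop)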

3) Maxout: In a convolutional network, a maxout feature map can be constructed by taking the maximum across k affine feature maps (i.e., pooling across channels). A single maxout unit can be interpreted as making a piecewise linear approximation to an arbitrary convex function (see figure 8). The maxout activation function is designed to never have a gradient of zero, allowing a network using maxout together with dropout to achieve a good approximation to model averaging.

Fig. 8: The maxout activation function can implement the rectified linear and absolute value rectifier activation functions, and approximate the quadratic activation function. Image credit: Goodfellow et al. [18]
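Pooling across channels can be sketched directly in NumPy (the layout assumed here, channels first, is an illustrative choice):

    import numpy as np

    def maxout(feature_maps, k=2):
        """Maxout across channels: group the channel axis into blocks of k
        affine feature maps and keep the element-wise maximum of each block.
        feature_maps has shape (channels, height, width); channels % k == 0."""
        c, h, w = feature_maps.shape
        grouped = feature_maps.reshape(c // k, k, h, w)
        return grouped.max(axis=1)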

4) Root Mean Square Propagation: RMSProp [21] uses a moving average over the root mean squared gradients to normalize the current gradient. Letting $f'(\theta_t)$ be the derivative of the cost function with respect to the parameters at time step $t$, $\alpha$ the step rate, and $\gamma$ the decay term, we perform the following updates:

$r_t = (1 - \gamma) f'(\theta_t)^2 + \gamma r_{t-1}$
$\nu_{t+1} = \frac{\alpha}{\sqrt{r_t}} f'(\theta_t)$
$\theta_{t+1} = \theta_t - \nu_{t+1}$

We use RMSProp instead of stochastic gradient descent in our convolutional neural network, as it achieves greater classification performance on our dataset.
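The update above translates directly into code; the sketch below adds a small epsilon inside the square root, a standard numerical guard that is our addition rather than part of the formulation above:

    import numpy as np

    def rmsprop_update(theta, grad, r, alpha=1e-3, gamma=0.9, eps=1e-8):
        """One RMSProp step: r_t tracks a moving average of squared gradients,
        which normalizes the step nu_{t+1} applied to the parameters."""
        r = (1.0 - gamma) * grad**2 + gamma * r      # moving average of squared gradients
        nu = alpha * grad / (np.sqrt(r) + eps)       # normalized step
        return theta - nu, r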

B. Convolutional Neural Network Architecture

Our convolutional neural network consists of twelve layers (see figure 9). For the input layer, we take 64x64 pixel sub-windows from three image slices located consecutively in the Z dimension. Using the image slices above and below the current slice enables us to leverage three-dimensional context for training and prediction.

1) Convolutional Filters: In our first convolutional layer, we convolve our input with 64 5x5 kernels using a stride length of 1. As a result, the input dimensions are reduced to (64-5+1)x(64-5+1) = 60x60. For convenience, we denote this layer C(64, 5, 1). Thereafter, we apply max-out to reduce the number of filters from 64 to 32: we combine filters by taking the maximum for each pixel across 2 consecutive filters, denoted M(2), which reduces the number of filters by a factor of 2. Thereafter, we sub-sample each filter with 2x2 windows and pool the maximum, denoted P(2). This process, known as max-pooling, further reduces the input space to 30x30; the dimension arithmetic is sketched below. Finally, we repeat this process using a different set of parameters. Using the notation C(number of filters, kernel size, length of stride), M(maxout pieces), P(pooling window shape), the

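To make the dimension arithmetic of the first C(64, 5, 1) - M(2) - P(2) block concrete, a short sketch (the helper function is ours; the input is one 64x64 sub-window drawn from three consecutive z-slices, i.e. shape 3 x 64 x 64):

    def conv_output_size(input_size, kernel_size, stride=1):
        """Spatial size after a 'valid' convolution: (input - kernel) / stride + 1."""
        return (input_size - kernel_size) // stride + 1

    size = conv_output_size(64, 5, 1)   # C(64, 5, 1): 5x5 kernels -> 60x60
    filters = 64 // 2                   # M(2): maxout over pairs of filters -> 32 maps
    size = size // 2                    # P(2): 2x2 max-pooling -> 30x30
    print(filters, size)                # 32 30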