Fast Algorithms for Membrane and Synapse Detection


Team

We are four students from Harvard SEAS

Cole Diamond

I am a Master's student in IACS. I did my undergrad at Columbia in Computer Science.

Hallvard Nydal

I am a student in SEAS from Norway. I used to be a professional cross-country skier.

Raphaël Pestourie

I'm a PhD student in Applied Physics. I have lost count of all the degrees I have. Also, I am French.

Mina Nassif

I'm Mina. I'm a Master's student in IACS, originally from Egypt. Fun fact: passport control misspelled my name when I was applying for a visa, so now I am legally registered in the US as Mena instead of Mina.

Full Paper


HARVARD UNIVERSITY
APPLIED COMPUTATION 297R
CAPSTONE PROJECT
Fast Algorithms for Membrane and Synapse
Detection
Authors:
Hallvard Moian NYDAL
Cole DIAMOND
Raphaël PESTOURIE
Mina NASSIF
Professor:
Pavlos PROTOPAPAS
Project Partner:
Verena KAYNIG-FITTKAU
Teaching Fellow:
Thouis Ray JONES
May 15, 2015
Fast Algorithms for Membrane and Synapse
Detection
Hallvard Moian Nydal
Institute for Applied
Computational Science
Harvard University
Cole Diamond
Institute for Applied
Computational Science
Harvard University
Raphaël Pestourie
School of Engineering
and Applied Science
Harvard University
Mina Nassif
Institute for Applied
Computational Science
Harvard University
Abstract: Connectomics is the study of the structural
and functional connections among brain cells. By comparing
diseased and healthy connectomes, physicians and
researchers alike can gain valuable insight into
neurodegenerative disorders like Alzheimer's. As automated
approaches to scanning electron microscopy at
nano-resolution have yielded increasingly large data sets,
the need for scalable data processing has become crucial.
Using a set of manually annotated images, we perform
binary classification of the pixels to detect cell membranes
and synaptic clefts. Compared to state-of-the-art pixel-wise
prediction methods, we show that predicting multiple pixels
in batches cuts training and prediction time by a factor of
100, while sacrificing little predictive accuracy.
Index Terms—Connectomics, Convolutional Neural Net-
works, Window Prediction
I. INTRODUCTION
The connectome is the wiring diagram of the
brain and nervous system. Mapping the network of
neurons in organisms is vital for discovering the
underlying architecture of the brain and investigat-
ing the physical underpinning of cognition, intelli-
gence, and consciousness. It is also an important
step in understanding how connectivity patterns are
altered by mental illnesses and various pathologies.
Advancements in scanning electron microscopy have
made feasible the 3D reconstruction of neuronal
processes at the nano-scale (see figure 1).
However, connectomics at the microscale has
proven to be a trying task. With manual annotation
and reconstruction, mapping cannot be completed
within a realistic time frame [1,2]. As automated
approaches to scanning electron microscopy at
nano-resolution have yielded increasingly large data sets,
there is strong interest in finding more scalable
approaches to automating identification of neurons
and synapses.
Fig. 1: Automated Reconstruction of Neuronal Processes.
Image Credit: Kaynig et al. [5]
The main contribution of our research is a fast and
scalable algorithm for detecting membrane edges
and synapses. By predicting multiple pixels simul-
taneously, we are able to cut training and prediction
time while sacrificing little accuracy.
II. PRIOR WORK
Of seminal importance in the realm of large-scale
reconstruction of neuron mappings is the work done
by Kaynig et al. [5]. The authors propose a pipeline
for automatically reconstructing neuronal processes
from large-scale electron microscopy image data.
The pipeline is represented graphically below. Our
focus is on the stage of the pipeline concerning
recognition of cell membranes and synapses.
The proposed pipeline is a viable solution for
large-scale connectomics, although training time is
prohibitively long [5]. From critical path analysis,
we see that the bottleneck of training occurs during
the membrane classification stage. In the implemen-
tation by Kaynig et al., each pixel is classified.
Fig. 2: Complete workflow for large-scale neuron reconstruc-
tion. Image Credit: Kaynig et al. [5]
Drawing from the work of Mnih and Hinton in road
classification, predicting patches of pixels simultaneously
may improve the throughput of a system with minor
compromise to accuracy [6]. We also draw from
the same work in applying Gaussian thickening to edges to
improve predictive accuracy.
Recently, synapses have become the focus of
classification and region segmentation efforts. An
example synapse is shown in figure 3. In a benchmark
study, Becker et al. achieved a high Jaccard index
for different values of 'exclusion zone thickness'
in synapse classification [4]. Our work attempts to
apply batch prediction methods to membrane detection
as well as synapse detection, where we use Becker
et al. as a point of comparison.
Fig. 3: A synapse’s morphology is marked by vesicles, a
synaptic cleft and a post-synaptic region.
III. DATA SANITIZATION
In order to augment the predictive power of our
models, we pre-process our training, testing and
validation data using a variety of techniques.
Pre-processing reduces noise from the original data and
underscores important features. We employ six
steps in our pre-processing pipeline: feature
normalization, edge detection, stochastic rotation,
adaptive histogram equalization, shuffling, and
dataset balancing.
A. Feature Normalization
The first step of our feature-normalization procedure
is reshaping our images to 1024 × 1024 pixels.
Thereafter, we perform standardization along the
columns, where we subtract the mean of each column
and divide by its standard deviation: $z = (x - \mu)/\sigma$.
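As a minimal sketch of the column-wise standardization described above (assuming NumPy; the function name `standardize_columns` and the small `eps` guard against constant columns are illustrative additions, not part of the original pipeline):

```python
import numpy as np

def standardize_columns(image, eps=1e-8):
    """Column-wise z-score: subtract each column's mean, divide by its std."""
    image = np.asarray(image, dtype=np.float64)
    mu = image.mean(axis=0, keepdims=True)      # one mean per column
    sigma = image.std(axis=0, keepdims=True)    # one std per column
    # eps is an assumption added here to avoid division by zero
    return (image - mu) / (sigma + eps)

rng = np.random.default_rng(0)
demo = rng.uniform(0, 255, size=(1024, 1024))   # stand-in for one image slice
z = standardize_columns(demo)
```

After this step every column of `z` has approximately zero mean and unit standard deviation.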
B. Edge-Detection
For cell-membrane detection, we use edge detection
to binarize our data. To do so, we perform
a 2D convolution using the Scharr kernel [10]. We
also utilize edge-thickening as in Mnih and Hinton [6] to
reduce the sparsity of the feature: we grow
the edges by converting all pixels in a neighborhood
of an existing edge to edge pixels. Thereafter, we
apply a Gaussian blur to our data to make our
convolutional neural network more robust to noise.
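The three steps above (Scharr filtering, edge thickening, Gaussian blur) can be sketched with NumPy alone. This is an illustration under stated assumptions, not the authors' implementation: the threshold, the square neighborhood of radius 1, and the blur width `sigma` are all hypothetical choices, and the 3x3 filter is applied as a cross-correlation, which only flips the sign of the gradient and leaves its magnitude unchanged.

```python
import numpy as np

SCHARR_X = np.array([[3, 0, -3], [10, 0, -10], [3, 0, -3]], dtype=np.float64)
SCHARR_Y = SCHARR_X.T

def filter3x3(image, kernel):
    """Apply a 3x3 kernel with edge-replicating padding."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def scharr_edges(image, threshold):
    """Binarize the Scharr gradient magnitude."""
    gx = filter3x3(image, SCHARR_X)
    gy = filter3x3(image, SCHARR_Y)
    return np.hypot(gx, gy) > threshold

def thicken(edges, radius=1):
    """Turn every pixel within `radius` of an edge into an edge pixel."""
    h, w = edges.shape
    padded = np.pad(edges, radius, mode="constant", constant_values=False)
    out = np.zeros_like(edges)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def gaussian_blur(image, sigma=1.0):
    """Separable Gaussian blur built from 1D convolutions."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    blur = lambda v: np.convolve(v, k, mode="same")
    return np.apply_along_axis(blur, 1, np.apply_along_axis(blur, 0, image))

demo = np.zeros((16, 16))
demo[:, 8:] = 1.0                      # a vertical step edge
edges = scharr_edges(demo, threshold=1.0)
thick = thicken(edges, radius=1)
soft = gaussian_blur(thick.astype(float), sigma=1.0)
```

On the step image, the detected edge is two pixels wide and thickening widens it by one pixel on each side.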
C. Stochastic Rotation
Rowley et al. showed that stochastic rotations
of the training data can introduce rotation
invariance [2]. Accordingly, we randomly rotate
our training data by 90, 180 or 270 degrees.
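A short sketch of such a random right-angle rotation, assuming NumPy and a 2D training window (the helper name and the uniform choice among the three angles are assumptions):

```python
import numpy as np

def random_right_angle_rotation(window, rng):
    """Rotate a 2D window by 90, 180 or 270 degrees, chosen uniformly."""
    k = int(rng.integers(1, 4))          # number of quarter turns: 1, 2 or 3
    return np.rot90(window, k), k

rng = np.random.default_rng(0)
window = np.arange(16).reshape(4, 4)
rotated, k = random_right_angle_rotation(window, rng)
```

The same `k` would have to be applied to the label window so that inputs and targets stay aligned.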
D. Shuffling
To avoid over-fitting, we shuffled our data using
two different schemes. At first, we permuted the
order of the sub-windows which we use for training
to ensure that we did not train on the same sample
twice within a batch. To simplify matters, we then
decided to randomly sample the training data to use
in our batch. Since the probability that we train
on the same input twice within a batch is small as
the amount of data grows, we opted for the latter,
simpler shuffling scheme.
E. Adaptive Histogram Equalization
Adaptive Histogram Equalization (AHE) is a popular
and effective algorithm for improving the local
contrast of images which contain bright or dark
regions. An example is shown in figure 4.
AHE differs from ordinary histogram equalization
in that AHE computes several
histograms, each corresponding to a distinct section
of the image; the method then uses the various
histograms to redistribute the lightness values of
the image. The AHE algorithm is given in
Algorithm 1 in the appendix [15].
Fig. 4: Comparison of an image without (left) and with
(right) adaptive histogram equalization. Image credit: Sujin
Philip sujin@cs.utah.edu
F. Data Balancing
For both synapse detection and cell-membrane
detection, it is important that the training set con-
tains an equal number of positive and negative ex-
amples. Accordingly, we center half of our training-
data on edges and synapses to effectively bias our
samples.
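One way to realize this balancing, sketched under assumptions (NumPy, a binary label image, sampling with replacement, and a hypothetical `half` window radius that keeps windows inside the image), is to draw half of the window centers from positive pixels and half from negative pixels:

```python
import numpy as np

def sample_balanced_centers(labels, n_samples, half, rng):
    """Pick window centers: half on positive pixels, half on negative ones."""
    h, w = labels.shape
    positive = labels.astype(bool)
    interior = np.zeros_like(labels, dtype=bool)
    interior[half:h - half, half:w - half] = True    # windows must fit
    pos = np.argwhere(positive & interior)
    neg = np.argwhere(~positive & interior)
    n_pos = n_samples // 2
    pos_idx = rng.integers(0, len(pos), size=n_pos)
    neg_idx = rng.integers(0, len(neg), size=n_samples - n_pos)
    centers = np.concatenate([pos[pos_idx], neg[neg_idx]])
    rng.shuffle(centers)          # mix the two classes within the batch
    return centers

rng = np.random.default_rng(0)
labels = np.zeros((128, 128), dtype=bool)
labels[64, :] = True              # a toy "membrane" along one row
centers = sample_balanced_centers(labels, n_samples=30, half=32, rng=rng)
```

Because centers are drawn at random with replacement, this sketch also corresponds to the simpler of the two shuffling schemes described in the Shuffling subsection.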
IV. CONVOLUTIONAL NEURAL NETWORKS
A. Background
Convolutional neural networks have become state
of the art in a wide range of machine learning
problems, including object recognition and image
classification [9]. Convolutional neural networks
are able to achieve exceptional performance for a
multitude of different problems. Since convolutional
neural networks exhibit locality, translation invari-
ance, and hierarchical structure (see figure 5) , they
make for a natural pairing for object recognition.
The challenge with convolutional neural networks
is the relatively long prediction and training time.
A typical convolutional network architecture con-
sists of several stages. In the first stage, a series of
convolutions are performed in parallel to produce a
set of presynaptic activations. In the detection stage,
each pre-synaptic activation is run through a nonlin-
ear activation function, such as the rectified linear
activation function. Thereafter, a pooling function
replaces the output of the net at a certain location
with a summary statistic of the nearby outputs.
Pooling makes features invariant from the location
Fig. 5: CNN’s can capture the natural hierarchy of images.
Image credit: Lee et al., ICML, 2009
in the input image, and activations less sensitive
to neural network structure. Often, an additional
stage is added implementing the max-out [18] and
dropout [19] techniques discussed in subsequent
sections. Figure 6 shows the architecture of a typical
convolutional network.
Fig. 6: A typical Convolutional Neural Network consist-
ing of six layers for MNIST classification. Image credit:
https://engineering.purdue.edu/ eigenman
1) Rectified Linear Units: There are several
choices when it comes to activation functions in
neural networks. All act on the output of a
single node in a neural network, $z_{ij} = x^T W_{ij} + b_{ij}$,
where $W \in \mathbb{R}^{d \times m \times k}$ and $b \in \mathbb{R}^{m \times k}$ are learned
parameters. In contrast to alternative activation
functions such as sigmoidal functions, $f(z) = (1 + \exp(-z))^{-1}$,
and tanh functions, $f(z) = \tanh(z)$,
rectified linear units are unbounded and can represent
any non-negative real value.
Rectified linear units use the activation function
$\mathrm{rect}(z) = \max(0, z)$. The activation function has
good sparsity properties due to having a true zero
activation value and, because it does not saturate
for positive inputs, suffers less from diminishing
gradient flow. As a result, training time
decreases by a substantial factor. Krizhevsky
et al. [9] report training speed-ups of up to
600% on benchmark simulations on the ImageNet
data set. Ergo, we use rectified linear units as the
activation function of choice for our convolutional
neural networks.
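To make the contrast concrete, the following NumPy sketch evaluates the three activation functions and their derivatives on a few inputs. The function names are illustrative; the point is that the sigmoid and tanh derivatives shrink toward zero for large inputs while the rectifier's derivative stays at one for any positive input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rect(z):
    return np.maximum(0.0, z)

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])

sigmoid_grad = sigmoid(z) * (1.0 - sigmoid(z))   # saturates for large |z|
tanh_grad = 1.0 - np.tanh(z) ** 2                # saturates for large |z|
rect_grad = (z > 0).astype(float)                # constant 1 for z > 0

print("rect(z)      =", rect(z))
print("sigmoid grad =", sigmoid_grad)
print("tanh grad    =", tanh_grad)
print("rect grad    =", rect_grad)
```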
2) Dropout: The large number of neurons and
their associated connections makes neural networks
prone to over-fitting. Dropout is a stochastic tech-
nique to reduce architectural dependence wherein
neurons and their incoming and outgoing connec-
tions are omitted with a predefined probability.
Empirical results from Hinton et al. [19] show that
dropout yields a significant improvement in
predictive performance over a wide range
of data sets. Dropout can be understood as an effi-
cient way to perform model averaging with neural
networks and to prevent complex co-adaptation on
the training data.
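A minimal dropout sketch in NumPy follows. It uses the "inverted" variant, which rescales the surviving activations during training so that nothing needs to change at prediction time; that rescaling is an assumption made here for simplicity and is not necessarily how the network in this paper applies dropout.

```python
import numpy as np

def dropout(activations, p_drop, rng, train=True):
    """Zero each unit with probability p_drop during training."""
    if not train or p_drop == 0.0:
        return activations                       # no-op at prediction time
    keep = rng.random(activations.shape) >= p_drop
    # Rescale so the expected activation is unchanged (inverted dropout).
    return activations * keep / (1.0 - p_drop)

rng = np.random.default_rng(0)
acts = np.ones(100_000)
dropped = dropout(acts, p_drop=0.5, rng=rng)
```

With `p_drop=0.5`, roughly half of the units are zeroed and the rest are doubled, so the mean activation stays close to one.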
Fig. 7: In dropout training, units are temporarily removed
from a network, along with all their incoming and outgoing
connections. Image credit: Joost Van Doorm, J.vanDoorn-
1@student.utwente.nl
3) Maxout: In a convolutional network, a maxout
feature map can be constructed by taking the maxi-
mum across k affine feature maps (i.e., pool across
channels). A single maxout unit can be interpreted
as making a piecewise linear approximation to an
arbitrary convex function (see figure 8). The maxout
activation function is designed to never have a
gradient of zero, allowing the network using maxout
and dropout to achieve a good approximation to
model averaging.
Fig. 8: Maxout activation function can implement the rec-
tified linear, absolute value rectifier, and approximate the
quadratic activation function. Image credit: Goodfellow et al.
[18]
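Pooling across channels can be sketched in a few lines of NumPy. The layout `(channels, height, width)` and the grouping of `k` consecutive channels are assumptions chosen to mirror the description above:

```python
import numpy as np

def maxout_channels(feature_maps, k):
    """Max over groups of k consecutive channels of a (C, H, W) array."""
    c, h, w = feature_maps.shape
    assert c % k == 0, "channel count must be divisible by k"
    return feature_maps.reshape(c // k, k, h, w).max(axis=1)

maps = np.random.default_rng(0).normal(size=(64, 60, 60))
pooled = maxout_channels(maps, k=2)      # 64 feature maps -> 32
```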
4) Root Mean Square Propagation: RMSProp
[21] uses a moving average over the root mean
squared gradients to normalize the current gradient.
Letting $f'(\theta_t)$ be the derivative of the cost function
with respect to the parameters at time step $t$, $\alpha$ be the
step rate, and $\gamma$ the decay term, we perform the
following updates:
$r_t = (1 - \gamma) f'(\theta_t)^2 + \gamma r_{t-1}$
$v_{t+1} = \frac{\alpha}{\sqrt{r_t}} f'(\theta_t)$
$\theta_{t+1} = \theta_t - v_{t+1}$
We use RMSProp instead
of stochastic gradient descent in our convolutional
neural network as it achieves greater classification
performance on our dataset.
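The update can be illustrated on a toy quadratic cost. In this sketch `alpha` is the step rate and `gamma` the decay term; their values, the small `eps` in the denominator, and the toy cost are assumptions for illustration only.

```python
import numpy as np

def rmsprop_step(theta, grad, r_prev, alpha=0.1, gamma=0.9, eps=1e-8):
    """One RMSProp update; returns the new parameters and moving average."""
    r = (1.0 - gamma) * grad**2 + gamma * r_prev   # moving average of grad^2
    v = alpha / (np.sqrt(r) + eps) * grad          # normalized step
    return theta - v, r

theta = np.array([5.0, -3.0])
r = np.zeros_like(theta)
for _ in range(1000):
    grad = 2.0 * theta              # gradient of the toy cost sum(theta**2)
    theta, r = rmsprop_step(theta, grad, r)
```

Because each step is normalized by the recent gradient magnitude, the parameters approach the minimum at roughly `alpha` per iteration regardless of the raw gradient scale, and then hover near it.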
B. Convolutional Neural Network Architecture
Our convolutional neural network consists of
twelve layers (see figure 9). For the input layer, we
take 64x64 pixel sub-windows from three image
slices located consecutively in the Z dimension.
Using image slices above and below the current im-
age slice enables us to leverage a three-dimensional
context for training and prediction.
1) Convolutional Filters: In our first convolutional
layer, we convolve our input with 64 5x5
kernels using a stride length of 1. As a result, we
reduce our input dimensions to (64-5+1)x(64-5+1)
= 60x60. For convenience, we denote this layer as
C(64, 5, 1). Thereafter, we apply max-out to reduce
the number of filters from 64 to 32. In doing so,
we combine filters by taking the maximum for each
pixel across 2 consecutive filters, denoted M(2).
Thereafter, we sub-sample each filter with 2x2 windows
and pool the maximum, denoted P(2). This process,
known as max-pooling, further reduces the input
space to 30x30. Finally, we repeat this process using
different sets of parameters. Using the notation
C(number of filters, kernel size, length of stride),
M(maxout pieces), P(pooling window shape), the
remaining convolution sequence can be expressed as
C(64, 3, 1), M(2), P(2), and C(128, 3, 1), M(4),
P(2).
Fig. 9: An illustration of the architecture of our convolutional network.
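The shape bookkeeping for the three C, M, P stages can be traced with a few lines of Python. The sketch assumes unpadded ("valid") convolutions and non-overlapping pooling, which matches the 64 to 60 to 30 arithmetic given for the first stage; the later stages are extrapolated under the same assumption.

```python
def conv_out(size, kernel, stride=1):
    """Spatial size after an unpadded convolution."""
    return (size - kernel) // stride + 1

# (filters, kernel, stride, maxout pieces, pooling window) per stage
stages = [(64, 5, 1, 2, 2), (64, 3, 1, 2, 2), (128, 3, 1, 4, 2)]

size = 64                     # 64x64 input sub-window
shapes = []
for filters, kernel, stride, pieces, pool in stages:
    size = conv_out(size, kernel, stride)    # C(filters, kernel, stride)
    channels = filters // pieces             # M(pieces)
    size //= pool                            # P(pool)
    shapes.append((channels, size, size))

flat = channels * size * size
print(shapes, flat)
```

Under these assumptions the stages produce 32 maps of 30x30, 14x14 and 6x6, so the flattened output of the last stage has 32 * 6 * 6 = 1152 values.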
2) Fully-Connected Network: After the convo-
lution stage, our input is forwarded into a fully-
connected network. The network consists of one
hidden layer containing 1152 hidden units, and an
output layer containing 48 × 48 = 2304 units. The
output layer uses a logistic regression function to
output predictions on a 48 × 48 window of pixels.
3) Cost Function: Drawing from the work of
Rudin et al., we use total variation regularization
(TVR) for our cost function to reduce noise
[22]. The TVR problem amounts to minimizing
the following discrete function over the signal $y_n$:
$E(x, y) + V(y)$. Since our signal is anisotropic, we
implement TVR as follows:
$V_{aniso}(y) = \sum_{i,j} \sqrt{|y_{i+1,j} - y_{i,j}|^2} + \sqrt{|y_{i,j+1} - y_{i,j}|^2} = \sum_{i,j} |y_{i+1,j} - y_{i,j}| + |y_{i,j+1} - y_{i,j}|.$
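The anisotropic penalty reduces to summed absolute differences between vertical and horizontal neighbors, which is short to write in NumPy. The sketch below computes only the regularization term $V_{aniso}$ for a 2D prediction window; the data term $E$ is not shown.

```python
import numpy as np

def tv_aniso(y):
    """Anisotropic total variation of a 2D array."""
    vertical = np.abs(np.diff(y, axis=0)).sum()     # |y[i+1,j] - y[i,j]|
    horizontal = np.abs(np.diff(y, axis=1)).sum()   # |y[i,j+1] - y[i,j]|
    return vertical + horizontal

flat_window = np.zeros((4, 4))
step_window = np.zeros((4, 4))
step_window[:, 2:] = 1.0                            # one vertical boundary

tv_flat = tv_aniso(flat_window)
tv_step = tv_aniso(step_window)
```

A constant window has zero penalty, while the step window is penalized once per row crossing the boundary, so noisy, speckled predictions cost more than clean edges.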
Lastly, we use root mean square propagation to
update our parameters. The entire architecture is
summarized in table I.
C. Implementation
Theano [17] is a Python library for defining,
optimizing and evaluating expressions involving
high-level operations on tensors. Developed to facilitate
research in machine learning, Theano is able to
attain speeds rivaling hand-crafted C implementations
for problems involving huge amounts of data.
It can also surpass C on a CPU by many orders
of magnitude by leveraging GPU computing. That
said, Theano can also generate customized C code
for many mathematical operations. In combining
aspects of a computer algebra system (CAS) with
aspects of an optimizing compiler, Theano is
particularly useful for tasks in which complicated
mathematical expressions are evaluated repeatedly and
evaluation speed is critical. Consequently, Theano
is the software of choice for implementing deep
learning solutions.
TABLE I: Parameters of the Convolutional Deep
Neural Network
Parameter | Value
Batch Size | 30
Conv Layers | 3
Conv Out Channels | [64,64,128]
Conv Kernel Shape | [5,3,3]
Kernel Stride | [1,1,1]
Fully Connected Layers | 1
Max Pooling | [2,2,2]
Maxout | [2,2,4]
Dropout | [0.2,0.2,0.2,0.5]
Learning Rate | 1.00E-07
Learning Rule | RMSProp
Regularization coefficient | 1
V. EVALUATION METHODOLOGY
A. Evaluating Cell Membrane Segmentation
1) Watershed Segmentation: For membrane detection,
we first post-process the predicted membranes
using watershed segmentation. The watershed transform
treats its input as a topographic map, and
simulates flooding over the topography with water.
The watershed regions are the parts of the map
which "hold water" without spilling into other
regions. Figure 10 below illustrates the output of the
watershed segmentation on our predicted membranes.
Fig. 10: The watershed of an image is similar to the notion
of a catchment basin in a contour-relief map; a drop of water
follows the gradient of a hill, flowing along a path to reach a
local minimum.
2) Variation of Information: Once we have obtained
a segmentation using the watershed algorithm,
we compare it to the ground truth. To do so,
we use variation of information, which is defined as
follows:
Suppose we have two partitions of an image A:
1) X - our predicted segmentation and 2) Y - the
labeled segmentation, or ground truth.
$X = \{X_1, X_2, \ldots, X_k\}$, $Y = \{Y_1, Y_2, \ldots, Y_l\}$.
Let $n = \sum_i |X_i| = \sum_j |Y_j| = |A|$, $p_i = |X_i|/n$,
$q_j = |Y_j|/n$, $r_{ij} = |X_i \cap Y_j|/n$. Then, the variation
of information between the two partitions is:
$VI(X; Y) = -\sum_{i,j} r_{ij} [\log(r_{ij}/p_i) + \log(r_{ij}/q_j)]$
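The definition translates directly into NumPy once both segmentations are given as integer label images of the same shape. This is a sketch using natural logarithms; the function name and the label-image representation are assumptions.

```python
import numpy as np

def variation_of_information(x_labels, y_labels):
    """VI between two segmentations given as label arrays of equal shape."""
    x = np.asarray(x_labels).ravel()
    y = np.asarray(y_labels).ravel()
    n = x.size
    _, xi = np.unique(x, return_inverse=True)    # relabel to 0..k-1
    _, yi = np.unique(y, return_inverse=True)    # relabel to 0..l-1
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)              # |X_i intersect Y_j|
    r = joint / n
    p = r.sum(axis=1, keepdims=True)             # p_i = |X_i| / n
    q = r.sum(axis=0, keepdims=True)             # q_j = |Y_j| / n
    nz = r > 0                                   # skip empty intersections
    terms = r[nz] * (np.log((r / p)[nz]) + np.log((r / q)[nz]))
    return float(-terms.sum())

same = variation_of_information([0, 0, 1, 1], [5, 5, 7, 7])
crossed = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Identical partitions (up to relabeling) give a VI of zero; the two "crossed" partitions above share no structure and give 2 ln 2.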
B. Evaluating Synapse Prediction
For synapse prediction, we use the F1 score,
which is the harmonic mean of precision and recall.
More formally, the F1 score is defined as
$F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$.
Precision is defined
as the ratio of true positives to the sum of true
positives and false positives, and recall as the ratio
of true positives to the sum of true positives and
false negatives.
To maximize the F1 score, we need to choose an
optimal threshold for binary classification. Although
the ROC (Receiver Operating Characteristic) curve in
figure 11 gives us valuable insights into where this
threshold might occur, our method is more systematic.
By iterating over equally spaced threshold values
from zero to one and computing an F1 score at each
gradation, we can choose the threshold which yields
the highest F1 score on average.
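The threshold sweep can be sketched as follows. For brevity this version scores a single set of pixel probabilities rather than averaging over images, and the number of thresholds (101) is an arbitrary choice; both are assumptions.

```python
import numpy as np

def f1_score(pred, truth):
    """F1 from boolean prediction and ground-truth arrays."""
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(probs, truth, n_steps=101):
    """Sweep equally spaced thresholds in [0, 1]; keep the best F1."""
    thresholds = np.linspace(0.0, 1.0, n_steps)
    scores = [f1_score(probs >= t, truth) for t in thresholds]
    best = int(np.argmax(scores))
    return thresholds[best], scores[best]

probs = np.array([0.05, 0.40, 0.35, 0.80])
truth = np.array([False, False, True, True])
threshold, score = best_threshold(probs, truth)
```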
Fig. 11: Figure showing the ROC curve for all the points
in all the images. The threshold should be around the point
where the slope changes.
VI. RESULTS
A. Batch Prediction
As we have seen, window-wise prediction allows
us to use total variation regularization as a de-noising
mechanism. We also leverage window-wise prediction
to perform prediction averaging, resulting in
empirically smoother and more accurate output. In
order to compute multiple predictions for each pixel,
we use a stride of 24 × 12 for our 48 × 48 prediction
window. Consequently, we have 4 to 16 predictions
per pixel, depending on a pixel's location within the
image slice. Figure 13 shows the qualitative impact
of the overlapping-window prediction scheme.
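Prediction averaging over overlapping windows can be sketched as an accumulate-and-divide loop. The sketch assumes the 48x48 output window is centered inside the 64x64 input window, uses a single scalar stride of 24, and takes a hypothetical `predict` callable standing in for the trained network; none of these details are confirmed by the paper.

```python
import numpy as np

def average_overlapping(predict, image, in_size=64, out_size=48, stride=24):
    """Average window predictions; predict maps in_size^2 -> out_size^2."""
    margin = (in_size - out_size) // 2       # assumed centered output window
    h, w = image.shape
    total = np.zeros((h, w))
    count = np.zeros((h, w))
    for top in range(0, h - in_size + 1, stride):
        for left in range(0, w - in_size + 1, stride):
            patch = image[top:top + in_size, left:left + in_size]
            r, c = top + margin, left + margin
            total[r:r + out_size, c:c + out_size] += predict(patch)
            count[r:r + out_size, c:c + out_size] += 1
    covered = count > 0
    total[covered] /= count[covered]         # mean prediction per pixel
    return total, count

# Stand-in "network": returns the central 48x48 crop of its input.
center_crop = lambda patch: patch[8:56, 8:56]
image = np.random.default_rng(0).uniform(size=(160, 160))
averaged, count = average_overlapping(center_crop, image)
```

With a stride of half the output window, interior pixels receive four predictions each; a smaller stride increases that count at the cost of more forward passes.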
B. Computational Performance
To evaluate the effect of window predictions on
computational speed, we computed predictions on
a Tesla C2070 GPU for various window sizes.
The predictions used non-overlapping windows for
the sake of expediency. As shown in Figure 14,
the results demonstrate a significant speedup with
increasing window sizes. For the largest window,
a speedup of 140 times the prediction speed of
single-pixel output is achieved.
Fig. 12: Figure illustrating window-wise prediction. Each
64 × 64 pixel input yields a 48 × 48 pixel output prediction.
Fig. 13: (Left) Predicted membranes without overlapping
windows. (Right) Predicted membranes with overlapping
windows of 48 × 48 and a stride of 24 × 12.
Fig. 14: Prediction speed vs output window sizes on Tesla
C2070 GPU.
Interestingly, the increasing window sizes did not
seem to impact predictive accuracy in a meaningful
way (see figure 15). The implication of this result is
that we can perform window-wise prediction without
sacrificing accuracy.
C. Membrane Prediction
A sample output for cell membrane prediction
is shown in Figure 16. Although the results from
simulations with overlapping windows and TVR
look qualitatively better than simulations without
these methods, no difference is seen in pixel-wise
accuracy for either membranes or synapses.
For this reason, we believe pixel-wise accuracy to
be a flawed evaluation metric.
Fig. 15: Effect of output window size of 64 × 64 pixels
and 48 × 48 pixels on accuracy for membrane detection with
CNNs.
D. Synapse Prediction
A sample output for synapse prediction is
shown in Figure 17. Overall, our F1 score for
synapse prediction is 0.48. Bear in mind that our
predictions are on image slices of dimension 85 × 85
pixels rather than over the original 1024 × 1024
image. Examination of figure 17 reveals that our
prediction methodology can benefit from a
post-processing step to filter out the noise in the
output prediction.
Fig. 17: Labeled synapses (left) and predicted synapses
(right). Prediction is performed using an output window
of size 48 × 48 and a stride length of 24 × 12.
E. Multi-dimensional Context Cues
We compare the performance of a classifier using
two-dimensional context against a classifier using
three-dimensional context from neighboring image
slices above and below the current slice. Figure
18 shows that the classifier using three-dimensional
context outperforms the one using two-dimensional context.
Fig. 16: Input image (left), labeled membranes (middle) and predicted membranes (right). Prediction is performed using an
output window of 48 × 48 and a stride of 24 × 12.
Fig. 18: Accuracy of synapse classification with
three-dimensional context versus two-dimensional context
VII. CONCLUSION
We presented the design and architecture of a
convolutional neural network that utilizes batch
prediction methodologies to classify multiple pixels
simultaneously for synapse and cell membrane detection. The
proposed method is at least 100 times faster than
state-of-the-art pixel-wise prediction. We
achieve an F1 score of 0.48 for synapse detection.
Using batch prediction on output window sizes up
to 48 × 48, we claim no qualitative loss of accuracy,
nor loss of pixel-wise accuracy.
VIII. FUTURE WORK
Additional gains in efficiency can be gleaned
from simultaneous prediction of both synapse and
cell membrane class membership. We propose two
approaches for simultaneous prediction. First, we
can re-architect our network to add more neurons
in the logistic regression layer, affording us the
predictive power to classify both edges and synapses
using just one network. Alternatively, we can im-
plement a new convolutional network to extract
various neuronal features such as pre-synaptic and
post-synaptic clefts, and vesicles. Thereafter, we
would wire the output of our multi-feature network
to the input of a second network which outputs
simultaneous multi-class classifications.
ACKNOWLEDGMENT
The authors would like to thank Professor Pro-
topapas for his valuable feedback and our Teach-
ing Fellow, Thouis Ray Jones, for his mentorship
throughout the course of the project. We also ac-
knowledge the efforts of our project partner, Verena
Kaynig-Fittkau.
REFERENCES
[1] Irimia, Andrei; Chambers, M.C., Torgerson, C.M., Filippou, M.,
Hovda, D.A., Alger, J.R., Gerig, G., Toga, A.W., Vespa, P.M.,
Kikinis, R., Van Horn, J.D. (6 February 2012). ”Patient-tailored
connectomics visualization for the assessment of white matter at-
rophy in traumatic brain injury”. Frontiers in Neurotrauma 3: 10.
doi:10.3389/fneur.2012.00010. PMC 3275792. PMID 22363313.
[2] Henry A. Rowley , Shumeet Baluja , Takeo Kanade, Neural
Network-Based Face Detection, IEEE Transactions on Pattern
Analysis and Machine Intelligence, v.20 n.1, p.23-38, January
1998
[3] J. G. White, E. Southgate, J. N. Thomson, S. Brenner ”The
Structure of the Nervous System of the Nematode Caenorhabditis
elegans” Phil. Trans. R. Soc. Lond. B: 1986 314 1-340; DOI:
10.1098/rstb.1986.0056. Published 12 November 1986
[4] Becker, C.; Ali, K.; Knott, G.; Fua, P., "Learning Context
Cues for Synapse Segmentation", Medical Imaging, IEEE
Transactions on, vol.32, no.10, pp.1864-1877, Oct. 2013. doi:
10.1109/TMI.2013.2267747
[5] V. Kaynig, A. Vazquez-Reina, S. Knowles-Barley, M. Roberts,
T. Jones, N. Kasthuri, E. Miller, J. W. Lichtman, and H. Pfister.
Large-scale automatic reconstruction of neuronal processes from
electron microscopy images. arXiv:1303.7186 [q-bio.NC],
2013.
[6] Mnih, V. and Hinton, G. E. (2012). "Learning to Label Aerial Images
from Noisy Data". In ICML, icml.cc / Omnipress
[7] Hinton, G., Salakhutdinov, R. (2006). ”Reducing the dimension-
ality of data with neural networks”. Science, 313(5786), 504-507
[8] Li Deng and Dong Yu (2014). ”DEEP LEARNING:
Methods and Applications”. Microsoft Research,
http://research.microsoft.com/apps/pubs/default.aspx?id=209355
[9] Alex Krizhevsky and Sutskever, Ilya and Geoffrey E. Hin-
ton (2012). ”ImageNet Classification with Deep Convolutional
Neural Networks”. Advances in Neural Information Processing
Systems 25, 1097–1105
[10] B. Jaehne, H. Scharr, and S. Koerkel. Principles of filter design.
In Handbook of Computer Vision and Applications. Academic
Press, 1999.
[11] Stark, L. (1980) Biological cybernetics
[12] Wiesel, T N (1968) ”Receptive Fields and Functional Archi-
tecture”. J. Physiol.215–243
[13] Hinton, Geoffrey E. and Srivastava, Nitish and Krizhevsky, Alex
and Sutskever, Ilya and Salakhutdinov, Ruslan R. (2012) Im-
proving neural networks by preventing co-adaptation of feature
detectors arXiv:1207.0580
[14] Mnih, Volodymyr and Hinton, Geoffrey E. (2010) Learning
to detect roads in high-resolution aerial images Lecture Notes
in Computer Science (including subseries Lecture Notes in
Artificial Intelligence and Lecture Notes in Bioinformatics)
[15] Wang Zhiming, Tao Jianhua. A Fast Implementation of Adaptive
Histogram Equalization. 8th international Conference on Signal
Processing, Nov, 2006.
[16] Karpathy, A. and Toderici, G. and Shetty, S. and Leung, T. and
Sukthankar, R. and Li Fei-Fei, Large-Scale Video Classification
with Convolutional Neural Networks, 10.1109/CVPR.2014.223
pages 1725-1732, 2014, June
[17] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G.
Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. Theano:
A CPU and GPU math compiler in python. In S. van der Walt
and J. Millman, editors, Proceedings of the 9th Python in Science
Conference, pages 3-10, 2010.
[18] I. J. Goodfellow, D. Warde-farley, and A. Courville. Maxout
Networks. 2013.
[19] Hinton, Geoffrey E. and Srivastava, Nitish and Krizhevsky,
Alex and Sutskever, Ilya and Salakhutdinov, Ruslan R. Improving
neural networks by preventing co-adaptation of feature detectors
arXiv:1207.0580 (2012)
[20] Deng, Li. "The MNIST database of handwritten digit images
for machine learning research". IEEE Signal Processing Magazine
29.6 (2012): 141-142.
[21] Dauphin, Y., de Vries, H., Chung, J., Bengio, Y. "RMSProp and
equilibrated adaptive learning rates for non-convex optimization",
CoRR, abs/1502.04390, 2015, http://arxiv.org/abs/1502.04390
[22] Rudin, L. I.; Osher, S.; Fatemi, E. (1992). ”Nonlinear total
variation based noise removal algorithms”. Physica D 60:
259–268. doi:10.1016/0167-2789(92)90242-f.
APPENDIX
Algorithm 1 Algorithm for Adaptive Histogram
Equalization
for every pixel i (with grey level l) in image do
    Initialize array Hist to zero
    for every contextual pixel j do
        Hist[g(j)] = Hist[g(j)] + 1
    end for
    Sum: $CHist_l = \sum_{k=0}^{l} Hist(k)$
    $l' = CHist_l \cdot L / W^2$
end for
Algorithm 2 Sketch of the Watershed Algorithm
1) A set of markers, pixels where the flooding
shall start, are chosen. Each is given a differ-
ent label.
2) The neighboring pixels of each marked area
are inserted into a priority queue with a pri-
ority level corresponding to the gray level of
the pixel.
3) The pixel with the highest priority level is
extracted from the priority queue. If the neigh-
bors of the extracted pixel that have already
been labeled all have the same label, then
the pixel is labeled with their label. All non-
marked neighbors that are not yet in the
priority queue are put into the priority queue.
4) Redo step 3 until the priority queue is empty.

Contact

For more information, contact us!

52 Oxford Street, Cambridge, MA 02138, USA

617-495-1814