XMBSMDSJ

2026

< Back to index

Deep Learning

Background

What is deep learning?

Some typical architectures

Applications

Building blocks

Optimization mathods

Second order methods can take better steps, however, each step will take longer to calculate

Building blocks of Deep Networks

Deep netowrks combine smaller networks as building blocks. Here are some building blocks:

Some architectures

Deep belief networks (Overtaken by CNN)

DBNs are composed of layers of RBMs for the pretrain phase and then a feed-forward network for the fine-tuning phase.

Feature extraction with RBM Layers

We ask RBM to reconstruct, it generates something pretty close to the original input vector. (Machine dream about data)

This is to learn these high level featues of a dataset in an unsupervised training fashion.

Initializing the feed-forward network

We then use these layers of features as initial weights in a traditional bp driven feed-forward NN. These initializations help training algorithms guide the parameters of the traditional NN towards better regions of parameter search space. This phase is known as fine-tune phase.

Gentle bp

Generative Adversarial Networks

GANs are an example of a network that uses unsupervised learning to train two models in parallel. A key aspect of GANs is how they use a parameter count that is significantly smaller than normal wrt the amount of data on which they’re training the network. The network is forced to efficiently represent the training data, making it more efficient data similar to the training data.

Given a large corpus of training images, we could build a generative NN that outputs images. We’d consider these generated output images to be samples from the model. The generative model in GANs generages such images while a secondary discriminiator network tries to classify these generated images.

The discriminator network is typically a standard CNN. Using a secondary NN as discriminator network allows GAN to train both NN in parallel in an unsupervised fashion. These discriminator networks takes images as input and output a classification.

The generative network in GANs generates data with a special kind of layer called deconvolutional layer. During training, use BP on both net. The goal is to update the generative net’s parameters such that it fools the discrinimator net, because the output is so realistic compared to the real images.

CNN

Three major groups:

ReLU

Why use ReLU in CNN?

ReLU (Rectified Linear Unit) crop the value at 0. It is easy to calculate and can accelerate the convergence due to linear property. But the train can die due to large gradient flow.

Leaky ReLU introduce a very small slope when $x<0$

Activation Map

Also referred to as feature map, calculated by sliding the kernel.

Start doing deep learning

CNN

Convolutional neural networks is used for image classification. It consists of following components:

Estimator in tensorflow

In tensorflow you can use either pre-made estimators, which are fully baked, or customized estimators, which can be instanciated by overriding some functions. In particular, the only difference between pre-made estimator and customized ones is that you will need to write the model function.

There are three methodes to be handled in a model function, which are TRAIN, PREDICT and EVALUATE. The function must reuturn corresponding EstimatorSpec for each case.

How to use Estimator

This section uses training phase for example.

Basically there are two functions we are going to define. First one is the model function. The model function is where we define the logic of the model, say the computational graph. We will have features, labels and mode as input and define layers, loss function and optimizers in this function.

Here is an example of a simplest CNN model function in the training phase

def cnn_model_fn(features, labels, mode):
    input_layer = tf.reshape(features, [-1, 128, 128, 1])
    conv1 = tf.layers.conv2d(
        inputs=input_layer,
        filters=32,
        kernel_size=(5, 5),
        padding='same',
        activation=tf.nn.relu
    )
    # Define the first convolutional layer
    pool1 = tf.layers.max_pooling2d(inputs=conv1,
                                    pool_size=(2, 2),
                                    strides=2,
                                    )
    # Do max pooling
    conv2 = tf.layers.conv2d(
        inputs=pool1,
        filters=32,
        kernel_size=(5, 5),
        padding='same',
        activation=tf.nn.relu
    )

    pool2 = tf.layers.max_pooling2d(
        inputs=conv2,
        pool_size=(2, 2),
        strides=2
    )

    pool2flat = tf.reshape(pool2, (-1, 32*32*32))
    # Flatten the pooling layer to on dimentional tensor


    dense = tf.layers.dense(
        inputs=pool2flat,
        units=1024,
        activation=tf.nn.relu
    )
    dropout = tf.layers.dropout(
        inputs=dense,
        rate=0.4,
        training=mode == tf.estimator.ModeKeys.TRAIN)
    # Define the unit dropping out
    logits = tf.layers.dense(inputs=dropout, units=10)
    # One hot class representation at the output layer

    labels = tf.convert_to_tensor(np.array([1]))
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    # Define the loss function
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    # Put loss function into the optimizer
    train_op = optimizer.minimize(
        loss=loss,
        global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
    # Every model function must return EstimatorSpec for each phase

The second function is the input function. This function returns a tuple which is (features_dictionary, target), where target can be a feature. Tensorflow has some built-in input function, for example, numpy_input_fn, which accpets the features, target, batch-size, number of epoches… and returns a input function instance.

In this section, we will use an example of image training dataset streamed from a very large .tar file, which is impossible to fit into computer memory.

First of all, we have to implement a generator which streams data from tar file. Click here for dataset

def stream_tar_dataset(filename):
    t = tarfile.open(filename,
                 mode='r|*')

    while True:
        m = t.next()
        if m is None:
            break
        if m.isfile():
            name = m.name
            # Name in the form of test/personid/imagename
            name_structure = name.split('/')
            person_id = name_structure[1]
            image = t.extractfile(m)
            im0 = Image.open(image)
            im0 = im0.resize((128, 128), Image.ANTIALIAS).convert("L")
            im0 = np.array(list(im0.getdata()))
            yield tuple(im0) + (person_id,)

On thing we should notice here is that streaming dataset in tensorflow does not support any array types. For example, if we stream a 128*128 image out, the output must be converted to a tuple of length 16384. Then we can utilize this function to build up out input function:


def input_func_gen():

    stream_data = tf.data.Dataset.from_generator(
        generator=generator,
        output_types=(tf.float32,) * 16384 + (tf.string,),
    )

    line = stream_data.make_one_shot_iterator().get_next()
    f = line[:16384]
    l = line[-1]
    return f, l

This is a simplest input function which only output one data instance at one time. If techniques like minibatch is used, some advanced operations must be done here.