Motivation from Biological Neuron

Introduction

Artificial neurons, the foundational units in artificial neural networks, find their roots in biological neurons, a term coined in the 1890s to describe the brain’s processing units.

Biological Neurons

Components

Dendrite: Functions as a signal receiver from other neurons.
Synapse: The connection point between neurons.
Soma: The central processing unit for information.
Axon: Transmits processed information to other neurons.

Illustration

In a simplified depiction, sense organs interact with the external environment, and neurons process this information, potentially resulting in physical responses, such as laughter.

Neural Network Architecture

Layered Structure: Neurons are organized into layers.
Interconnected Network: The human brain comprises approximately 100 billion neurons.
Division of Work: Neurons may specialize in processing specific information types.
Example: Neurons responding to visual, auditory, or textual stimuli.

Multi-Layer Perceptrons (MLPs)

Definition

Neural networks with multiple layers.

Information Processing

Initial neurons interact with sensory organs, and subsequent layers perform increasingly intricate processing.

Demonstration

Using a cartoon illustration: Neurons in the visual cortex detect edges, form features, and recognize objects.

Layer Functions

Layer 1: Detects edges and corners.
Subsequent Layers: Organize information into features and recognize complex objects.

Abstraction

Each layer processes more abstract representations of the input.

Information Flow

Input traverses through layers, resulting in a physical response.

McCulloch-Pitts Neuron and Thresholding Logic

Introduction

Objective: Comprehend the McCulloch-Pitts neuron, a simplified computational model inspired by biological neurons.
Historical Context: Proposed in 1943 by McCulloch (neuroscientist) and Pitts (logician).
Purpose: Emulate the brain’s complex processing for decision-making.

Neuron Structure

Components: Divided into two parts - g and f.
g (Aggregation): Aggregates binary inputs via a simple summation process.
f (Decision): Makes a binary decision based on the aggregation.
Excitatory and Inhibitory Inputs: Inputs can be either excitatory (positive) or inhibitory (negative).

Functionality

Aggregation Function g(x):
- Represents the sum of all inputs using the formula \(g(x) = \sum_{i=1}^{n} x_i\), where \(x_i\) is a binary input (0 or 1).
Decision Function f(g(x)):
- Utilizes a threshold parameter \(\theta\) to determine firing.
- Decision is \(f(g(x)) = \begin{cases} 1 & \text{if } g(x) \geq \theta \\ 0 & \text{otherwise} \end{cases}\).

Boolean Function Implementation

Examples:
- Implemented using McCulloch-Pitts neuron for boolean functions like AND, OR, NOR, and NOT.
- Excitatory and inhibitory inputs utilized based on boolean function logic.

Geometric Interpretation

In 2D:
- Draws a line to separate input space into two halves.
In 3D:
- Uses a plane for separation.
For n Inputs:
- Utilizes a hyperplane for linear separation.

Linear Separability

Definition: Boolean functions representable by a single McCulloch-Pitts neuron are linearly separable.
Implication: Implies the existence of a plane (or hyperplane) separating points with output 0 and 1.

Perceptrons and Boolean Functions

Introduction

Perceptrons, introduced by Frank Rosenblatt circa 1958, extend the concept of McCulloch-Pitts neurons with non-Boolean inputs, input weights, and a learning algorithm for weight adjustment.

Perceptron Model

Mathematical Representation

The perceptron is represented as \(y = 1\) if \(\sum_{i=1}^{n} w_i x_i \geq \text{threshold}\); otherwise, \(y = 0\).

Notable Differences

Inputs can be real, not just Boolean.
Introduction of weights, denoted by \(w_i\), indicating input importance.
Learning algorithm to adapt weights based on data.

Neater Formulation

The equation is rearranged for simplicity: \(\sum_{i=0}^{n} w_i x_i \geq 0\), where \(x_0 = 1\) and \(w_0 = -\text{threshold}\).

Motivation for Boolean Functions

Boolean functions provide a foundation for understanding perceptrons. For instance, predicting movie preferences using Boolean inputs such as actor, director, and genre.

Importance of Weights

Weights signify the importance of specific inputs in decision-making. Learning from data helps adjust weights, reflecting user preferences. For example, assigning a high weight to the director may heavily influence the decision to watch a movie.

Bias (\(w_0\))

\(w_0\) acts as a bias or prior, influencing decision-making. It represents the initial bias or prejudice in decision-making. Adjusting \(w_0\) alters the decision threshold, accommodating user preferences.

Implementing Boolean Functions

Perceptrons can implement Boolean functions with linear decision boundaries. For instance, implementing the OR function with a perceptron involves a geometric interpretation where a line separates positive and negative regions based on inputs.

Errors and Adjustments

Errors arise when the decision boundary misclassifies inputs. The learning algorithm adjusts weights iteratively to minimize errors and enhance accuracy. It’s an iterative process where weights are modified until the desired decision boundary is achieved.

Errors and Error Surfaces

Introduction

This section delves into errors within the context of perceptrons and introduces error surfaces as a recurring theme in the course, with a focus on understanding errors related to linear separability.

Perceptron for AND Function

Consideration of the AND function showcases an output of 1 for a specific input (green) and 0 for others (red). The decision is based on \(w_0 + w_1x_1 + w_2x_2 \geq 0\), with \(w_0\) fixed at -1. Exploration of the impact of \(w_1\) and \(w_2\) on the decision boundary is undertaken.

Errors and Decision Boundaries

Demonstration of errors occurs with specific \(w_1\) and \(w_2\) values, showcasing misclassified points due to incorrect decision boundaries. Variability in errors is noted based on different weight values.

Error Function

Viewing error as a function of \(w_1\) and \(w_2\) is introduced. The concept of error surfaces is brought in, where error is plotted against \(w_1\) and \(w_2\) values, each region on the surface corresponding to a distinct error level.

Visualizing the Error Surface

The error surface is plotted for \(w_1\) and \(w_2\) values in the range -4 to +4. Each region on the surface corresponds to a distinct error level, highlighting the utility of visualizations in comprehending perceptron behavior.

Perceptron Learning Algorithm

Exploration of the necessity for an algorithmic approach to finding optimal \(w_1\) and \(w_2\) values is undertaken. Limitations in visual inspection, especially in higher dimensions, are acknowledged. A teaser for the upcoming module on the perceptron learning algorithm is provided as a solution for finding suitable weight values algorithmically.

Perceptron Learning Algorithm

Overview

This module focuses on the Perceptron Learning Algorithm, building upon the perceptron’s concept and introducing a method to iteratively adjust weights for accurate binary classification.

Motivation

The perceptron, initially designed for boolean functions, finds practical application in real-world scenarios. Consider a movie recommendation system based on past preferences, where features include both boolean and real-valued inputs. The goal is to learn weights that enable accurate predictions for new inputs.

Algorithm

Notations

\(p\): Inputs with label 1 (positive points)
\(n\): Inputs with label 0 (negative points)

Convergence

Convergence is achieved when all positive points satisfy \(\sum w_i x_i > 0\) and all negative points satisfy \(\sum w_i x_i < 0\).

Steps

Initialization: Randomly initialize weights \(w\).
Iterative Update:
- While not converged:
  - Pick a random point \(x\) from \(p \cup n\).
  - If \(x\) is in \(p\) and \(w^T x < 0\), update \(w = w + x\).
  - If \(x\) is in \(n\) and \(w^T x \geq 0\), update \(w = w - x\).

Geometric Interpretation

Understanding the geometric relationship involves recognizing that the angle between \(w\) and a point on the decision boundary is 90 degrees. Positive points’ angles should be acute (< 90 degrees), and negative points’ angles should be obtuse (> 90 degrees). Iteratively adjusting \(w\) aligns it better with correctly classified points.

Perceptron Convergence Proof

Introduction

The objective of this lecture is to present a formal proof establishing the convergence of the perceptron learning algorithm. The primary focus is to rigorously determine whether the algorithm exhibits convergence or continues weight updates indefinitely.

Definitions

Absolutely Linearly Separable Sets
- Consider two sets, \(P\) and \(N\), in an \(n\)-dimensional space. They are deemed absolutely linearly separable if there exist \(n + 1\) real numbers \(w_0\) to \(w_n\) such that the following conditions hold: \[ w_0x_0 + w_1x_1 + \ldots + w_nx_n \geq 0 \quad \text{for every } \mathbf{x} \in P \] \[ w_0x_0 + w_1x_1 + \ldots + w_nx_n < 0 \quad \text{for every } \mathbf{x} \in N \]
Perceptron Learning Algorithm Convergence Theorem
- If sets \(P\) and \(N\) are finite and linearly separable, the perceptron learning algorithm will update the weight vector a finite number of times. This implies that after a finite number of steps, the algorithm will find a weight vector \(\mathbf{w}\) capable of separating sets \(P\) and \(N\).

Proof

Setup

Define \(P'\) as the union of \(P\) and the negation of \(N\). Normalize all inputs for convenience.

Assumptions and Definitions

Assume the existence of a normalized solution vector \(\mathbf{w^*}\). Define the minimum dot product, \(\delta\), as the minimum value obtained by dot products between \(\mathbf{w^*}\) and points in \(P'\).

Perceptron Learning Algorithm

The perceptron learning algorithm can be expressed as follows:

Initialization:
- Initialize weight vector \(\mathbf{w}\) randomly.
Iteration:
- At each iteration, randomly select a point \(\mathbf{p}\) from \(P'\).
- If the condition \(\mathbf{w}^T\mathbf{p} \geq 0\) is not satisfied, update \(\mathbf{w}\) by \(\mathbf{w} = \mathbf{w} + \mathbf{p}\).

Normalization and Definitions

Normalize all inputs, ensuring the norm of \(\mathbf{p}\) is 1. Define the numerator of \(\cos \beta\) as the dot product between \(\mathbf{w^*}\) and the updated weight vector at each iteration.

Numerator Analysis

Show that the numerator is greater than or equal to \(\delta\) for each iteration.

For a randomly selected \(\mathbf{p}\), if \(\mathbf{w}^T\mathbf{p} < 0\) and an update is performed, the numerator is:

\[ \mathbf{w^*} \cdot (\mathbf{w} + \mathbf{p}) \geq \delta \]

Denominator Analysis

Expand the denominator, the square of the norm of the updated weight vector:

\[ \|\mathbf{w} + \mathbf{p}\|^2 = \|\mathbf{w}\|^2 + 2\mathbf{w}^T\mathbf{p} + \|\mathbf{p}\|^2 \]

Show that the denominator is less than or equal to a value involving \(k\), the number of updates made:

\[ \|\mathbf{w} + \mathbf{p}\|^2 \leq \|\mathbf{w^*}\|^2 + k \]

Combining Numerator and Denominator

Use the definition of \(\cos \beta\) to conclude that \(\cos \beta\) is greater than or equal to a certain quantity involving the square root of \(k\):

\[ \cos \beta \geq \frac{\delta}{\sqrt{k}} \]

Conclusion

In this week’s lectures on deep learning, we explored fundamental concepts related to artificial neural networks, focusing on the motivation drawn from biological neurons, the structure of McCulloch-Pitts neurons, and the evolution into perceptrons. We delved into the mathematical representations, functionalities, and the learning algorithm associated with perceptrons. Additionally, we discussed errors and error surfaces, highlighting their significance in understanding the behavior of perceptrons.

The exploration continued with an in-depth examination of the Perceptron Learning Algorithm, emphasizing its application in real-world scenarios. The algorithm’s convergence was rigorously presented, providing a formal proof for its finite number of weight updates when dealing with linearly separable data.

Points to Remember

Biological Neurons: Understanding the components of biological neurons, including dendrites, synapses, soma, and axons, served as the foundation for artificial neural networks.
McCulloch-Pitts Neuron: The simplified computational model, inspired by biological neurons, introduced the aggregation and decision functions, showcasing its application in implementing boolean functions.
Perceptrons: Extending the McCulloch-Pitts model, perceptrons introduced real-valued inputs, weights, and a learning algorithm for weight adjustments, emphasizing their significance in decision-making.
Perceptron Learning Algorithm: The iterative algorithm for adjusting weights in perceptrons was presented, highlighting its application in scenarios with boolean and real-valued inputs.
Convergence Proof: A formal proof established the convergence of the Perceptron Learning Algorithm for linearly separable data, emphasizing its practical applicability in real-world scenarios.
Geometric Interpretation: Recognizing the geometric relationship between the weight vector and decision boundaries provided insights into how the algorithm aligns with correctly classified points.
Error Surfaces: Visualizing error surfaces proved essential in comprehending perceptron behavior, showcasing the impact of weight adjustments on error levels.
Practical Applications: Motivation for the algorithm was drawn from real-world applications, such as movie recommendation systems, emphasizing the practicality and relevance of the discussed concepts.
Necessity of Learning Algorithm: The iterative nature of weight adjustments was emphasized, recognizing the limitations of visual inspection, especially in higher dimensions.
Implications of Convergence Proof: The convergence proof provided assurance that, when dealing with linearly separable data, the perceptron learning algorithm will converge after a finite number of updates.