Perceptron
Introduction
The Perceptron Learning Algorithm (PLA) is a supervised learning algorithm widely employed for binary classification tasks. Its primary objective is to determine a decision boundary that effectively separates the two classes in the dataset. This algorithm belongs to the class of discriminative classification methods as it focuses on modeling the boundary between classes instead of characterizing the underlying probability distribution of each class.
Let $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ denote the training dataset, where $\mathbf{x}_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$. The perceptron predicts $\hat{y} = \operatorname{sign}(\mathbf{w}^\top \mathbf{x})$, where $\operatorname{sign}(z) = +1$ if $z \ge 0$, otherwise $\operatorname{sign}(z) = -1$.
The algorithm is founded on the following assumptions:
- Linear Separability Assumption: The Linear Separability Assumption is a fundamental assumption made in various machine learning algorithms, including the Perceptron Learning Algorithm. It posits that the classes to be classified can be accurately separated by a linear decision boundary. In other words, there exists a hyperplane in the feature space that effectively segregates the data points of the two classes.
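As a simple illustration: in $\mathbb{R}^2$, the points $\mathbf{x}_1 = (1, 1)$ with $y_1 = +1$ and $\mathbf{x}_2 = (-1, -1)$ with $y_2 = -1$ are linearly separable, since the hyperplane defined by $\mathbf{w} = (1, 1)$ satisfies $y_i\, \mathbf{w}^\top \mathbf{x}_i = 2 > 0$ for both points.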
The objective function is defined as follows:

$$L(\mathbf{w}) = \sum_{i=1}^{n} \max\left(0,\; -y_i\, \mathbf{w}^\top \mathbf{x}_i\right),$$

which is zero exactly when every training example is classified correctly, and penalizes each misclassified example in proportion to how far its score lies on the wrong side of the boundary.
Even if a weight vector attains the minimum value $L(\mathbf{w}) = 0$, it need not be unique: any separating hyperplane achieves zero loss, and the algorithm simply returns whichever one it reaches first.
Under the Linear Separability Assumption, assuming the existence of a separating weight vector $\mathbf{w}^*$, the algorithm proceeds as follows (a runnable sketch is given after the pseudocode):
- Initialize the weight vector: $\mathbf{w} \leftarrow \mathbf{0}$
- Until convergence:
  - Select a $(\mathbf{x}_i, y_i)$ pair from the dataset
  - If $y_i\, \mathbf{w}^\top \mathbf{x}_i > 0$:
    - Do nothing
  - Else:
    - Update the weight vector: $\mathbf{w} \leftarrow \mathbf{w} + y_i\, \mathbf{x}_i$
- End
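The following is a minimal sketch of this loop in Python with NumPy, assuming labels in $\{-1, +1\}$; the function name, the epoch cap, and the in-order sweep over examples are illustrative choices rather than part of the algorithm's specification.

```python
import numpy as np

def perceptron_train(X, y, max_epochs=1000):
    """Perceptron Learning Algorithm (illustrative sketch).

    X: (n, d) array of feature vectors.
    y: (n,) array of labels in {-1, +1}.
    Returns a weight vector w separating the data, if one is found.
    """
    n, d = X.shape
    w = np.zeros(d)                           # initialize w <- 0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):                    # select (x_i, y_i) pairs
            if y[i] * np.dot(w, X[i]) <= 0:   # misclassified (or on boundary)
                w += y[i] * X[i]              # update: w <- w + y_i x_i
                mistakes += 1
        if mistakes == 0:                     # a full clean pass => converged
            return w
    raise RuntimeError("no convergence; data may not be linearly separable")
```

For example, on the two-point dataset from the introduction:

```python
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1, -1])
w = perceptron_train(X, y)   # returns array([1., 1.]) for this data
```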
Analysis of the Update Rule
For a given training example $(\mathbf{x}_i, y_i)$:

- If the perceptron’s prediction on $\mathbf{x}_i$ is correct (i.e., $y_i\, \mathbf{w}^\top \mathbf{x}_i > 0$), no update is performed.
- If the perceptron’s prediction on $\mathbf{x}_i$ is incorrect (i.e., $y_i\, \mathbf{w}^\top \mathbf{x}_i \le 0$), the weights are updated by adding the product of the input vector and the correct output to the current weight vector: $\mathbf{w} \leftarrow \mathbf{w} + y_i\, \mathbf{x}_i$.
- It is important to note that the update occurs solely in response to the current data point. Consequently, data points that were previously classified correctly may become misclassified in later iterations.
This update rule adjusts the decision boundary in the direction of correct classification for the misclassified example. The algorithm is guaranteed to converge to a separating hyperplane if the data is indeed linearly separable. However, if the data is not linearly separable, the perceptron algorithm never converges: some example is always misclassified, so updates continue indefinitely.
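To make the effect of the update explicit, compare the score $y_i\, \mathbf{w}^\top \mathbf{x}_i$ before and after an update on a misclassified example (a short derivation the text above leaves implicit):

$$y_i\, (\mathbf{w} + y_i\, \mathbf{x}_i)^\top \mathbf{x}_i = y_i\, \mathbf{w}^\top \mathbf{x}_i + y_i^2\, \|\mathbf{x}_i\|^2 = y_i\, \mathbf{w}^\top \mathbf{x}_i + \|\mathbf{x}_i\|^2,$$

since $y_i^2 = 1$. Each update therefore raises the score of the offending example by $\|\mathbf{x}_i\|^2 > 0$; a single update need not flip the prediction, but repeated updates push it in the correct direction.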
Further Assumptions
We introduce three additional assumptions:
- Linear Separability with $\gamma$-Margin: A dataset is considered linearly separable with a $\gamma$-margin if there exists $\mathbf{w}^*$ such that $y_i\, \mathbf{w}^{*\top} \mathbf{x}_i \ge \gamma$ holds for all $i$, where $\gamma > 0$.
- Radius Assumption: Let $R$ be a constant such that $\|\mathbf{x}_i\| \le R$ for all $i$. In other words, $R$ denotes the length of the data point farthest from the origin.
- Normal Length for $\mathbf{w}^*$: Assume that $\mathbf{w}^*$ has unit length, i.e., $\|\mathbf{w}^*\| = 1$.
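For a concrete dataset these quantities are easy to compute; the helper below is a minimal sketch assuming NumPy arrays and a candidate separator `w_star` (the function and variable names are illustrative):

```python
import numpy as np

def margin_and_radius(X, y, w_star):
    """Compute the margin gamma and radius R of data (X, y)
    relative to a separator w_star, normalized to unit length."""
    w_star = w_star / np.linalg.norm(w_star)   # enforce ||w*|| = 1
    gamma = np.min(y * (X @ w_star))           # min_i y_i <w*, x_i>
    R = np.max(np.linalg.norm(X, axis=1))      # max_i ||x_i||
    return gamma, R
```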
Proof of Convergence of the Perceptron Algorithm
We denote the current mistake number as $k$, and let $\mathbf{w}^{(k)}$ be the weight vector after the $k$-th mistake, with $\mathbf{w}^{(0)} = \mathbf{0}$. Suppose the $(k+1)$-th mistake occurs on example $(\mathbf{x}_i, y_i)$, so that $y_i\, \mathbf{w}^{(k)\top} \mathbf{x}_i \le 0$ and $\mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} + y_i\, \mathbf{x}_i$. Taking the inner product with $\mathbf{w}^*$ and applying the $\gamma$-margin assumption:

$$\mathbf{w}^{*\top} \mathbf{w}^{(k+1)} = \mathbf{w}^{*\top} \mathbf{w}^{(k)} + y_i\, \mathbf{w}^{*\top} \mathbf{x}_i \ge \mathbf{w}^{*\top} \mathbf{w}^{(k)} + \gamma.$$

By induction, after $k$ mistakes:

$$\mathbf{w}^{*\top} \mathbf{w}^{(k)} \ge k\gamma. \tag{1}$$
Furthermore, we have:

$$\|\mathbf{w}^{(k+1)}\|^2 = \|\mathbf{w}^{(k)}\|^2 + 2\, y_i\, \mathbf{w}^{(k)\top} \mathbf{x}_i + \|\mathbf{x}_i\|^2 \le \|\mathbf{w}^{(k)}\|^2 + R^2,$$

where the inequality uses $y_i\, \mathbf{w}^{(k)\top} \mathbf{x}_i \le 0$ (the example was misclassified) together with the Radius Assumption. By induction:

$$\|\mathbf{w}^{(k)}\|^2 \le k R^2. \tag{2}$$
Combining equations (1) and (2) via the Cauchy–Schwarz inequality and $\|\mathbf{w}^*\| = 1$:

$$k\gamma \le \mathbf{w}^{*\top} \mathbf{w}^{(k)} \le \|\mathbf{w}^*\|\, \|\mathbf{w}^{(k)}\| \le \sqrt{k}\, R,$$

which rearranges to $k \le R^2 / \gamma^2$.
Consequently, the above bound establishes an upper limit of $R^2 / \gamma^2$ on the number of mistakes for datasets conforming to the Linear Separability with $\gamma$-Margin assumption. For instance, if $R = 1$ and $\gamma = 0.1$, the perceptron makes at most $100$ mistakes, regardless of the number of data points or the dimension of the feature space.
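As a sanity check, the mistake bound can be verified empirically on synthetic separable data; the following self-contained script is an illustrative sketch (the data-generation scheme and all names are assumptions, not from the text above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linearly separable data with a known unit-length separator.
d, n = 5, 200
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)          # enforce ||w*|| = 1
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)
y[y == 0] = 1

# Keep only points with margin at least gamma_min, guaranteeing a gamma-margin.
gamma_min = 0.1
keep = y * (X @ w_star) >= gamma_min
X, y = X[keep], y[keep]

gamma = np.min(y * (X @ w_star))          # actual margin w.r.t. w_star
R = np.max(np.linalg.norm(X, axis=1))     # radius

# Run the perceptron, counting total mistakes (updates) until convergence.
w = np.zeros(d)
mistakes = 0
converged = False
while not converged:
    converged = True
    for i in range(len(y)):
        if y[i] * np.dot(w, X[i]) <= 0:
            w += y[i] * X[i]
            mistakes += 1
            converged = False

print(f"mistakes = {mistakes}, bound R^2/gamma^2 = {R**2 / gamma**2:.1f}")
assert mistakes <= R**2 / gamma**2        # the bound from the proof above
```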