# Partial derivative of a multiclass linear classifier

I'm currently looking at the objective for a multiclass linear classifier of the form

$$\sum\limits_{i=1}^{N_I} \sum\limits_{k=1,\atop k \neq y_i}^{N_K} L(1+ \mathbf{w_k}\cdot\mathbf{x_i}-\mathbf{w_{y_i}}\cdot\mathbf{x_i})$$

where

$X$ is an $N_I \times N_F$ data matrix

$y$ is a vector of class labels

$W$ is an $N_K \times N_F$ matrix whose rows each hold the weights of the hyperplane separating one class from the rest

$L$ is some appropriate loss function returning a real number, e.g. squared, absolute, hinge, etc.
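For concreteness, the objective above can be evaluated numerically. Here is a minimal sketch, assuming the squared hinge $L(z) = \max(z,0)^2$ purely for illustration (the function name and the toy data are mine, not from any particular library):

```python
import numpy as np

def objective(W, X, y, L=lambda z: np.maximum(z, 0.0) ** 2):
    """Sum over examples i and classes k != y_i of
    L(1 + w_k.x_i - w_{y_i}.x_i). L defaults to the squared
    hinge, an arbitrary choice for illustration."""
    total = 0.0
    for i in range(X.shape[0]):
        s = X[i] @ W.T                 # class scores w_k.x_i, shape (N_K,)
        vals = L(1.0 + s - s[y[i]])    # one margin term per class k
        vals[y[i]] = 0.0               # the k == y_i term is excluded
        total += vals.sum()
    return total

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))    # N_I = 5 examples, N_F = 3 features
y = rng.integers(0, 4, size=5)     # labels in {0, ..., N_K - 1}, N_K = 4
W = rng.standard_normal((4, 3))    # one weight row per class
print(objective(W, X, y))
```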

I've been told that the derivative of the above objective looks like

$$\partial_{w_k} = \sum\limits_{i=1 \atop y_i \neq k}^{N_I} \ L'(1+ \mathbf{w_k}\cdot\mathbf{x_i}-\mathbf{w_{y_i}}\cdot\mathbf{x_i})\cdot\mathbf{x_i} - \sum\limits_{i=1 \atop y_i =k}^{N_I} \sum\limits_{l=1,\atop l \neq k}^{N_K} L'(1+ \mathbf{w_l}\cdot\mathbf{x_i}-\mathbf{w_k}\cdot\mathbf{x_i})\cdot\mathbf{x_i}$$

where $L'$ is the derivative of the loss function.

This derivative doesn't make sense to me. Where does the second summation ($\sum\limits_{i=1 \atop y_i = k}^{N_I} \sum\limits_{l=1,\atop l \neq k}^{N_K}$) come from? After all, in the first formula, when $i$ is such that $y_i = k$, we skip over that term. How then can this case contribute to the partial derivative? When I've tried the derivation myself, I've obtained the first part of the derivative but not the second.
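One way to at least test the claimed gradient is to compare it against finite differences of the objective. A minimal sketch, again assuming the squared hinge for $L$ (so $L'(z) = 2\max(z,0)$, which is continuous); all names and data here are my own illustration:

```python
import numpy as np

def objective(W, X, y, L=lambda z: np.maximum(z, 0.0) ** 2):
    # Sum over i and k != y_i of L(1 + w_k.x_i - w_{y_i}.x_i).
    total = 0.0
    for i in range(X.shape[0]):
        s = X[i] @ W.T
        vals = L(1.0 + s - s[y[i]])
        vals[y[i]] = 0.0              # the k == y_i term is skipped
        total += vals.sum()
    return total

def grad_wk(W, X, y, k, Lp=lambda z: 2.0 * np.maximum(z, 0.0)):
    # The claimed partial derivative with respect to w_k.
    g = np.zeros(W.shape[1])
    for i in range(X.shape[0]):
        s = X[i] @ W.T
        if y[i] != k:
            g += Lp(1.0 + s[k] - s[y[i]]) * X[i]       # first sum
        else:
            for l in range(W.shape[0]):                # second sum
                if l != k:
                    g -= Lp(1.0 + s[l] - s[k]) * X[i]
    return g

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))
y = rng.integers(0, 4, size=6)
W = rng.standard_normal((4, 3))

# Central finite differences of the objective w.r.t. each entry of w_k.
k, eps = 2, 1e-5
numeric = np.zeros(W.shape[1])
for j in range(W.shape[1]):
    Wp, Wm = W.copy(), W.copy()
    Wp[k, j] += eps
    Wm[k, j] -= eps
    numeric[j] = (objective(Wp, X, y) - objective(Wm, X, y)) / (2 * eps)

print(np.allclose(numeric, grad_wk(W, X, y, k), atol=1e-4))
```

If the two agree, the second summation is doing real work: for examples with $y_i = k$, the weight $\mathbf{w_k}$ appears (as $\mathbf{w_{y_i}}$) inside every one of the $N_K - 1$ margin terms of that example, which is precisely where the inner sum over $l \neq k$ comes from.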
