It was not until recently that I realized how important matrix differentiation is when using matrix representations for computation. After searching through some relevant materials and lecture notes, I found more useful formulas than I expected. Here I list some of them that are pretty handy for me and may be helpful for you one day.
Definition
The notation \(\frac{\partial y}{\partial x}\) =
\[\begin{bmatrix}\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\\vdots & \vdots & \ddots & \vdots \\\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \\\end{bmatrix}\]
denotes the $m \times n$ matrix of first-order partial derivatives of the transformation from $x$ to $y$. Such a matrix is called the Jacobian matrix of the transformation $\varphi$, where $y=\varphi(x)$, $y$ is an $m\times1$ vector, and $x$ is an $n\times1$ vector.
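As a quick way to ground the definition, here is a small NumPy sketch of my own (not from the lecture notes I found) that builds this Jacobian entry by entry with central differences; the example transformation $\varphi$ is made up purely for illustration.

```python
import numpy as np

# Minimal sketch (mine, not from the notes): build the Jacobian of y = phi(x)
# entry by entry with central differences, so J[i, j] ~ d y_i / d x_j.
def numerical_jacobian(phi, x, eps=1e-6):
    x = np.asarray(x, dtype=float)
    m, n = np.atleast_1d(phi(x)).size, x.size
    J = np.zeros((m, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        J[:, j] = (phi(x + dx) - phi(x - dx)) / (2 * eps)
    return J

# An example transformation from R^3 to R^2, chosen only for illustration.
phi = lambda x: np.array([x[0] * x[1], np.sin(x[2])])
print(numerical_jacobian(phi, [1.0, 2.0, 0.5]))  # rows index y, columns index x
```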
Proposition
(1) Let $y=Ax$, where $A$ does not depend on $x$; then:
\[\frac{\partial y}{\partial x}=A\]
(2) Let $y=Ax$, where $x$ is a function of $z$ and $A$ does not depend on $z$; then:
\[\frac{\partial y}{\partial z}=A\frac{\partial x}{\partial z}\]
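To convince myself that (1) and (2) hold, the following sketch (my own, with a randomly generated $A$ and an assumed dependence $x=Bz$) compares finite-difference Jacobians against the closed forms.

```python
import numpy as np

# Finite-difference check of (1) and (2); A and B are random, and x = Bz is an
# assumed dependence introduced only for this sketch.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # constant, independent of x and z
B = rng.standard_normal((4, 2))   # x = Bz, so dx/dz = B

def jac(f, v, eps=1e-6):
    cols = [(f(v + dv) - f(v - dv)) / (2 * eps)
            for dv in eps * np.eye(v.size)]
    return np.stack(cols, axis=1)

x0 = rng.standard_normal(4)
z0 = rng.standard_normal(2)

# (1): y = Ax  =>  dy/dx = A
print(np.allclose(jac(lambda x: A @ x, x0), A))            # expect True

# (2): y = A x(z) with x = Bz  =>  dy/dz = A dx/dz = A B
print(np.allclose(jac(lambda z: A @ (B @ z), z0), A @ B))   # expect True
```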
(3) For the scalar $\alpha = y^TAx$, where $A$ is independent of $x$ and $y$:
\[\frac{\partial \alpha}{\partial x}=y^TA\]
\[\frac{\partial \alpha}{\partial y}=x^TA^T\]
(4) For the scalar $\alpha = x^TAx$, where $A$ is independent of $x$:
\[\frac{\partial \alpha}{\partial x}=x^T(A+A^T)\]
(5) If, in addition to (4), $A$ is a symmetric matrix, then:
\[\frac{\partial \alpha}{\partial x}=2x^TA\]
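Propositions (3) through (5) can be checked the same way; in the sketch below the gradient of a scalar is treated as a $1\times n$ row vector, matching the convention above, and all matrices and vectors are randomly generated for illustration.

```python
import numpy as np

# Check of (3)-(5): gradients of scalars are taken as 1 x n row vectors here,
# matching the Jacobian convention above; A, x, y are random illustrations.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
y = rng.standard_normal(3)
x = rng.standard_normal(4)

def grad(f, v, eps=1e-6):
    # row vector of partial derivatives of the scalar f at v
    return np.array([(f(v + dv) - f(v - dv)) / (2 * eps)
                     for dv in eps * np.eye(v.size)])

# (3): alpha = y^T A x
print(np.allclose(grad(lambda x_: y @ A @ x_, x), y @ A))      # d alpha / dx
print(np.allclose(grad(lambda y_: y_ @ A @ x, y), x @ A.T))    # d alpha / dy

# (4): alpha = x^T A x with a square, non-symmetric A
S = rng.standard_normal((4, 4))
print(np.allclose(grad(lambda x_: x_ @ S @ x_, x), x @ (S + S.T)))

# (5): the symmetric case collapses to 2 x^T A
Sym = S + S.T
print(np.allclose(grad(lambda x_: x_ @ Sym @ x_, x), 2 * x @ Sym))
```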
(6) For the scalar $\alpha = y^Tx$, where $x$ and $y$ are both functions of $z$:
\[\frac{\partial \alpha}{\partial z}= x^T\frac{\partial y}{\partial z} + y^T\frac{\partial x}{\partial z}\]
(7) For the scalar $\alpha = x^Tx$, where $x$ is a function of $z$:
\[\frac{\partial \alpha}{\partial z}=2x^T\frac{\partial x}{\partial z}\]
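Here is a similar sanity check of (6) and (7), where I simply assume linear dependences $x=Bz$ and $y=Cz$, so that $\frac{\partial x}{\partial z}=B$ and $\frac{\partial y}{\partial z}=C$; both matrices are made up for the sketch.

```python
import numpy as np

# Check of (6) and (7) under the assumed linear dependences x = Bz, y = Cz.
rng = np.random.default_rng(2)
B = rng.standard_normal((4, 3))   # dx/dz
C = rng.standard_normal((4, 3))   # dy/dz
z = rng.standard_normal(3)

def grad(f, v, eps=1e-6):
    return np.array([(f(v + dv) - f(v - dv)) / (2 * eps)
                     for dv in eps * np.eye(v.size)])

x, y = B @ z, C @ z

# (6): alpha = y^T x  =>  d alpha / dz = x^T dy/dz + y^T dx/dz
print(np.allclose(grad(lambda z_: (C @ z_) @ (B @ z_), z), x @ C + y @ B))

# (7): alpha = x^T x  =>  d alpha / dz = 2 x^T dx/dz
print(np.allclose(grad(lambda z_: (B @ z_) @ (B @ z_), z), 2 * x @ B))
```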
(8) Let the scalar $\alpha = y^TAx$, where $x$ and $y$ are functions of $z$ and $A$ does not depend on $z$; then:
\[\frac{\partial \alpha}{\partial z} = x^TA^T\frac{\partial y}{\partial z} + y^TA\frac{\partial x}{\partial z}\]
(9) Let the scalar $\alpha = x^TAx$, where $x$ is a function of $z$ and $A$ does not depend on $z$; then:
\[\frac{\partial \alpha}{\partial z} = x^T(A+A^T)\frac{\partial x}{\partial z}\]
(10) If, in addition to (9), $A$ is a symmetric matrix, then:
\[\frac{\partial \alpha}{\partial z} = 2x^TA\frac{\partial x}{\partial z}\]
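And a quick check of (8) through (10) under the same made-up dependences $x=Bz$ and $y=Cz$:

```python
import numpy as np

# Check of (8)-(10), again assuming x = Bz and y = Cz with A independent of z.
rng = np.random.default_rng(3)
A = rng.standard_normal((3, 4))
C = rng.standard_normal((3, 2))   # dy/dz
B = rng.standard_normal((4, 2))   # dx/dz
z = rng.standard_normal(2)

def grad(f, v, eps=1e-6):
    return np.array([(f(v + dv) - f(v - dv)) / (2 * eps)
                     for dv in eps * np.eye(v.size)])

x, y = B @ z, C @ z

# (8): alpha = y^T A x  =>  d alpha / dz = x^T A^T dy/dz + y^T A dx/dz
print(np.allclose(grad(lambda z_: (C @ z_) @ A @ (B @ z_), z),
                  x @ A.T @ C + y @ A @ B))

# (9): alpha = x^T A x with square A  =>  d alpha / dz = x^T (A + A^T) dx/dz
S = rng.standard_normal((4, 4))
print(np.allclose(grad(lambda z_: (B @ z_) @ S @ (B @ z_), z),
                  x @ (S + S.T) @ B))

# (10): symmetric A  =>  2 x^T A dx/dz
Sym = S + S.T
print(np.allclose(grad(lambda z_: (B @ z_) @ Sym @ (B @ z_), z),
                  2 * x @ Sym @ B))
```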
Definition
If the elements of the $m\times n$ matrix $A$ are functions of a scalar parameter $\alpha$, then the derivative of $A$ with respect to $\alpha$ is taken elementwise:
\[\frac{\partial A}{\partial \alpha} = \begin{bmatrix}\frac{\partial a_{11}}{\partial \alpha} & \frac{\partial a_{12}}{\partial \alpha} & \cdots & \frac{\partial a_{1n}}{\partial \alpha} \\\vdots & \vdots & \ddots & \vdots \\\frac{\partial a_{m1}}{\partial \alpha} & \frac{\partial a_{m2}}{\partial \alpha} & \cdots & \frac{\partial a_{mn}}{\partial \alpha} \\\end{bmatrix}\]
Proposition
Let $A$ be a nonsingular $m\times m$ matrix whose elements are functions of a scalar parameter $\alpha$; then:
\[\frac{\partial A^{-1}}{\partial \alpha} = -A^{-1}\frac{\partial A}{\partial \alpha}A^{-1}\]
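A quick numerical sanity check of this identity, using an arbitrary matrix-valued function $A(\alpha)$ that I made up for the sketch:

```python
import numpy as np

# Check of d(A^{-1})/d alpha = -A^{-1} (dA/d alpha) A^{-1}, with an arbitrary
# nonsingular matrix-valued function A(alpha) made up for this sketch.
def A_of(a):
    return np.array([[2.0 + a, np.sin(a)],
                     [a ** 2,  3.0 - a]])

alpha, eps = 0.7, 1e-6
dA = (A_of(alpha + eps) - A_of(alpha - eps)) / (2 * eps)
dAinv = (np.linalg.inv(A_of(alpha + eps))
         - np.linalg.inv(A_of(alpha - eps))) / (2 * eps)

Ainv = np.linalg.inv(A_of(alpha))
print(np.allclose(dAinv, -Ainv @ dA @ Ainv))   # expect True
```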
Summary
This article lists some useful matrix differentiation formulas that helped me in understanding the least squares approximation of linear systems. Still, everything here stays within the scope of linear algebra and calculus.