Disclaimer: These notes are compiled from the material available here and from Wikipedia.

Conjugate gradient is effective for solving systems of the form:

$\mathbf{A}x=b$

$x$ is an unknown vector,
$\mathbf{A}$ is a known symmetric positive-definite (and therefore square) matrix, and
$b$ is a known vector.

The CG method is most effective for systems where $\mathbf{A}$ is sparse. When $\mathbf{A}$ is dense, the solution is usually found by factorizing $\mathbf{A}$ and using back-substitution; when $\mathbf{A}$ is sparse and large, factorization and back-substitution become too expensive.

Now, suppose we seek the minimizer of the quadratic form $f(x)$:

$f(x)=\frac{1}{2} x^T \mathbf{A}x-b^Tx+c$

For a symmetric positive-definite matrix $\mathbf{A}$, $f(x)$ is minimized at the point where its gradient vanishes. Since $f'(x)=\mathbf{A}x-b$, solving $f'(x)=0$, i.e. $\mathbf{A}x=b$, gives the critical point (and minimizer) of $f(x)$.
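As a quick numerical check of this fact, the sketch below solves $\mathbf{A}x=b$ directly and confirms that $f$ is larger at neighbouring points; the 2x2 matrix and vector are illustrative assumptions, not values from the text.

```python
# Verify that the solution of Ax = b minimizes f(x) = 1/2 x^T A x - b^T x.
# A and b are illustrative; any symmetric positive-definite A works.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]

def f(x):
    # f(x) = 1/2 x^T A x - b^T x  (the constant c does not affect the minimizer)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
    return 0.5 * quad - sum(b[i] * x[i] for i in range(2))

# Solve Ax = b directly (Cramer's rule for the 2x2 case).
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
x_star = [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
          (b[1] * A[0][0] - b[0] * A[1][0]) / det]

# f is strictly larger at nearby points than at x_star, so x_star is the minimizer.
assert all(f([x_star[0] + dx, x_star[1] + dy]) > f(x_star)
           for dx, dy in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.1, -0.1)])
```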

So why use CG?

The idea is to solve the system not by finding the intersection of hyperplanes but by iteratively minimizing $f(x)$.

Method of Steepest Descent

The method of steepest descent starts from an arbitrary point $x_0$ and repeatedly steps in the direction opposite to the gradient $f'(x)$, which is $-f'(x_i)=b-\mathbf{A}x_i$, to reach the minimizer of $f(x)$. It is convenient to write:

$x_{k+1} = x_k+ \alpha_k p_k$

where $\alpha_k$ is the step length taken along the search direction $p_k=-f'(x_k)$.

Define the error $e_i=x_i-x^*$ and the residual $r_i=b-\mathbf{A}x_i$; note that $r_i=-\mathbf{A}e_i$. The residual $r_i$ is precisely the direction of steepest descent, $-f'(x_i)$. The step length $\alpha_i$ can then be found by an exact line search, which for this quadratic gives $\alpha_i=\frac{r_i^T r_i}{r_i^T \mathbf{A} r_i}$.
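The update rule above can be sketched in a few lines of plain Python; the 2x2 system is an illustrative assumption, and the exact line-search step $\alpha_i = r_i^T r_i / (r_i^T \mathbf{A} r_i)$ for this quadratic is used.

```python
# Steepest descent for f(x) = 1/2 x^T A x - b^T x.
# Each step moves along the residual r = b - Ax with the exact
# line-search step length alpha = (r^T r) / (r^T A r).
A = [[4.0, 1.0], [1.0, 3.0]]   # illustrative SPD matrix
b = [1.0, 2.0]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x = [0.0, 0.0]                                           # arbitrary start x_0
for _ in range(100):
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]   # r_i = b - A x_i
    if dot(r, r) < 1e-20:                                # residual small: done
        break
    alpha = dot(r, r) / dot(r, matvec(A, r))             # exact line search
    x = [xi + alpha * ri for xi, ri in zip(x, r)]        # x_{i+1} = x_i + alpha r_i

# x converges to the solution of Ax = b, here (1/11, 7/11).
```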

Eigenvectors and Eigenvalues

An eigenvector $v$ of a matrix $\mathbf{B}$ is a nonzero vector that $\mathbf{B}$ does not rotate: multiplication by $\mathbf{B}$ only changes its length or reverses its direction. That is, $\mathbf{B}v=\lambda v$ for some scalar $\lambda$, called the eigenvalue associated with $v$.

Spectral radius: $\rho(\mathbf{B})=\max_i|\lambda_i|$, the largest magnitude among the eigenvalues of $\mathbf{B}$. An iteration driven by $\mathbf{B}$ converges only if $\rho(\mathbf{B})<1$, and a smaller spectral radius indicates faster convergence.
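A common way to estimate the spectral radius is power iteration, sketched below for an illustrative 2x2 symmetric matrix (an assumption, not from the text); the growth factor per application of $\mathbf{B}$ approaches $\rho(\mathbf{B})$.

```python
# Estimate the spectral radius rho(B) = max_i |lambda_i| by power iteration:
# repeatedly applying B makes the dominant eigenvector component dominate,
# so the growth factor per step approaches rho(B). B is an illustrative choice.
B = [[0.5, 0.2], [0.2, 0.3]]

v = [1.0, 1.0]
rho = 0.0
for _ in range(200):
    w = [sum(B[i][j] * v[j] for j in range(2)) for i in range(2)]
    rho = max(abs(wi) for wi in w)          # growth factor, since max|v| == 1
    v = [wi / rho for wi in w]              # renormalise so max|v| == 1

# For this B the eigenvalues are (0.8 +/- sqrt(0.2)) / 2, so
# rho(B) ~ 0.6236 < 1: an iteration driven by this B would converge.
```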

Jacobi iterative method

To solve $\mathbf{A}x=b$, we split $\mathbf{A}$ into its diagonal part $\mathbf{D}$ and its off-diagonal part $\mathbf{E}$, so that $\mathbf{A}=\mathbf{D}+\mathbf{E}$:

$\mathbf{D}x=-\mathbf{E}x+b$
$x=-\mathbf{D}^{-1}\mathbf{E}x+\mathbf{D}^{-1}b$
$x_{k+1}=-\mathbf{D}^{-1}\mathbf{E}x_k+\mathbf{D}^{-1}b$
with $\mathbf{B}=-\mathbf{D}^{-1}\mathbf{E}$ and $z=\mathbf{D}^{-1}b$,
$x_{k+1}=\mathbf{B}x_k+z$
$x_{k+1}=\mathbf{B}(x^*+e_k)+z$, where the error $e_k=x_k-x^*$ (so $x_k=x^*+e_k$)
Now, at the optimal point, $x_k=x_{k+1}$ hence, $x^*=\mathbf{B}x^*+z$
$x_{k+1}-x^*=\mathbf{B}e_k$
$e_{k+1}=\mathbf{B}e_k$
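The iteration above can be run directly. The sketch below (matrix values are an illustrative assumption) builds $\mathbf{B}$ and $z$ from the splitting and also checks that $\rho(\mathbf{B})<1$ before iterating.

```python
import math

# Jacobi iteration x_{k+1} = B x_k + z with B = -D^{-1} E, z = D^{-1} b,
# where D is the diagonal of A and E holds the off-diagonal entries.
A = [[4.0, 1.0], [1.0, 3.0]]   # illustrative diagonally dominant SPD matrix
b = [1.0, 2.0]
n = 2

D_inv = [1.0 / A[i][i] for i in range(n)]
B = [[-D_inv[i] * A[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
z = [D_inv[i] * b[i] for i in range(n)]

# Spectral radius of B (this 2x2 B has zero diagonal, so its eigenvalues
# are +/- sqrt(B01 * B10)).
rho = math.sqrt(abs(B[0][1] * B[1][0]))
assert rho < 1.0    # rho(B) < 1 guarantees convergence; smaller is faster

x = [0.0, 0.0]
for _ in range(200):
    x = [sum(B[i][j] * x[j] for j in range(n)) + z[i] for i in range(n)]

# x approaches the solution of Ax = b, here (1/11, 7/11).
```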

Now, expanding $e_k$ in the basis of eigenvectors of $\mathbf{B}$, each component is multiplied by its eigenvalue at every iteration, so the eigenvalues determine the rate of convergence of the error. The smaller the magnitude of an eigenvalue $\lambda_k$, the faster the corresponding component of the error vanishes.

Convergence
As a simple case, suppose $e_k$ lies along an eigenvector of $\mathbf{A}$ with eigenvalue $\lambda$. Then a single steepest-descent step with $\alpha=\lambda^{-1}$ reaches the optimal point in one attempt.
Similarly, any error $e$ can be decomposed into its eigenvector components. If all eigenvalues of $\mathbf{A}$ are equal (the level sets of $f$ are spherical rather than ellipsoidal), the optimal solution is again reached in one step; with several unequal eigenvalues, convergence becomes harder.
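The one-step claim is easy to verify numerically: start at a point whose error is an eigenvector of $\mathbf{A}$ and take a single step of length $\lambda^{-1}$. The matrix and chosen optimum below are illustrative assumptions.

```python
# With error e = x - x^*, the residual is r = b - Ax = -A e. If e is an
# eigenvector of A with eigenvalue lam, then r = -lam * e, and one steepest
# descent step with alpha = 1/lam gives x + (1/lam) r = x - e = x^*.
A = [[3.0, 0.0], [0.0, 1.0]]     # illustrative SPD matrix, eigenvalues 3 and 1
x_star = [2.0, 5.0]              # chosen optimum (assumption); b = A x^*
b = [3.0 * x_star[0], 1.0 * x_star[1]]

lam = 3.0
e = [1.0, 0.0]                   # eigenvector of A for eigenvalue lam
x = [x_star[0] + e[0], x_star[1] + e[1]]     # start so that x - x^* = e

r = [b[0] - 3.0 * x[0], b[1] - 1.0 * x[1]]   # r = b - A x = -lam * e
x = [x[0] + r[0] / lam, x[1] + r[1] / lam]   # single step with alpha = 1/lam

assert x == x_star               # the optimum is reached in exactly one step
```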
So the steepest descent algorithm is rougher and not the most efficient approach, while the Jacobi method is smoother because every eigenvector component of the error is reduced at every iteration.
The Method of Conjugate Directions
A common problem with the steepest descent method is that it repeatedly steps along the same directions, which drives up the cost of the optimisation. It would be better to fold all the steps taken in one direction into a single effective step. The method of conjugate directions achieves this by choosing search directions $d_i$ that are A-orthogonal to each other and requiring the new error $e_{i+1}$ to be A-orthogonal to $d_i$:
$d_i^T \mathbf{A}e_{i+1}=0$
$d_i^T \mathbf{A}(e_i+\alpha_i d_i)=0$
$\alpha_i=\frac{-d_i^T \mathbf{A}e_i}{d_i^T \mathbf{A}d_i}$
$\alpha_i=\frac{d_i^T r_i}{d_i^T \mathbf{A}d_i}$
where $d_i$ is the search direction and the last line uses $\mathbf{A}e_i=-r_i$ (with $e_i=x_i-x^*$).
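Building on this, the conjugate gradient method generates the A-orthogonal search directions from successive residuals. A standard textbook sketch follows; the 2x2 test system is an illustrative assumption.

```python
# Conjugate gradient: each search direction d is kept A-orthogonal to the
# previous ones, so an n x n SPD system is solved in at most n steps exactly.
A = [[4.0, 1.0], [1.0, 3.0]]   # illustrative SPD matrix
b = [1.0, 2.0]
n = 2

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x = [0.0, 0.0]
r = [bi - axi for bi, axi in zip(b, matvec(A, x))]      # r_0 = b - A x_0
d = r[:]                                                # first direction: steepest descent
for _ in range(n):
    Ad = matvec(A, d)
    alpha = dot(r, r) / dot(d, Ad)                      # step length along d
    x = [xi + alpha * di for xi, di in zip(x, d)]
    r_new = [ri - alpha * adi for ri, adi in zip(r, Ad)]
    beta = dot(r_new, r_new) / dot(r, r)                # Gram-Schmidt coefficient
    d = [rni + beta * di for rni, di in zip(r_new, d)]  # keep d A-orthogonal
    r = r_new

# x is now the exact solution of Ax = b (up to rounding), here (1/11, 7/11).
```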