Jacobi vs. Gauss Seidel - Notes for Dummies

Author: Bojun Feng

Summary:

  1. Intro
  2. The Coding Explanation
  3. The Mathy Explanation

Intro

In numerical analysis, Jacobi Iteration and the Gauss Seidel method are classic examples of iterative methods. However, since most classes focus mainly on deriving the equations, it is easy for the underlying intuition to be buried under a mountain of mathematical symbols and pseudocode. After reading the explanations online, I felt like I understood each step of the derivation, but was still confused about the more practical side of things: How are these two algorithms different? What are the pros and cons of each? This post contains some of my notes.

Also, shoutout to Dillon Grannis for clarifying a lot of the mathematical concepts!

That's enough context. Let's get started.

The Coding Explanation

Essentially, we have a system $Ax = b$, where $A$ is a matrix and $x, b$ are vectors. We know the values of matrix $A$ and vector $b$ but not vector $x$:

[figure: the system Ax = b, annotated with which values are known and which are unknown]

Under this condition, here is a crash course on the two methods:

  • If $A$ satisfies certain conditions, Jacobi Iteration can find a better approximation of $x$ given $A$, $b$, and a current approximation $x'$.
  • If $A$ satisfies slightly stricter conditions, the Gauss Seidel method can do the same as Jacobi, but with better performance.
  • Practically, we start from some approximation and run one of the two methods many times until we are satisfied with the result (see the sketch right after this list).
  • Usually, we denote the $k^{th}$ guess $x^{(k)}$ to keep track of the iterations. In this language, each method takes in $A$, $b$, $x^{(k-1)}$ and produces $x^{(k)}$ as a better guess.
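To make "run many times until we are satisfied" concrete, here is a minimal sketch of the overall loop in Python/NumPy. The `step` argument stands for whichever of the two update methods we pick (one full pass over the vector), and the tolerance is an arbitrary choice of mine:

```python
import numpy as np

def solve_iteratively(A, b, step, tol=1e-8, max_iters=1000):
    """Run one of the two update methods until the residual is small."""
    x = np.zeros_like(b, dtype=float)        # initial guess x^(0)
    for _ in range(max_iters):
        x = step(A, b, x)                    # produce x^(k) from x^(k-1)
        if np.linalg.norm(A @ x - b) < tol:  # are we satisfied with the result?
            break
    return x
```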

What's the Difference?

Both methods provide a formula to compute the $i^{th}$ term of $x^{(k)}$, which we call $x^{(k)}_i$, using the subscript to indicate the elements within the vector.

In the two methods, the formula (denoted "f" in the image) is actually exactly the same; the only difference is the order in which we update the values.

In Jacobi Iteration we compute the entire $x^{(k)}$ from $x^{(k-1)}$:

[figure: Jacobi iteration — every entry of x^{(k)} is computed from the old vector x^{(k-1)}]

Then we throw away the $x^{(k-1)}$ vector as a whole, replacing it with the new guess $x^{(k)}$.

In the Gauss Seidel method, however, we throw away $x^{(k-1)}_i$ right after computing $x^{(k)}_i$.

[figure: Gauss Seidel method — each new entry x^{(k)}_i immediately replaces x^{(k-1)}_i]

After finishing the computation, we just need to replace the last term before starting the new iteration.
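As a rough sketch of this difference in NumPy (the helper name `new_entry` is my own; it stands for the shared formula "f", whose exact form is derived in the mathy section below):

```python
import numpy as np

def new_entry(A, b, x, i):
    """The shared formula f: recompute entry i of the guess from vector x."""
    # x_i = (b_i - sum over j != i of A_ij * x_j) / A_ii
    off_diagonal = A[i] @ x - A[i, i] * x[i]
    return (b[i] - off_diagonal) / A[i, i]

def jacobi_step(A, b, x_old):
    x_new = np.empty_like(x_old)
    for i in range(len(b)):
        x_new[i] = new_entry(A, b, x_old, i)  # every entry reads the OLD vector
    return x_new

def gauss_seidel_step(A, b, x):
    x = x.copy()
    for i in range(len(b)):
        x[i] = new_entry(A, b, x, i)          # entry i immediately overwrites x[i]
    return x
```

Either function can be plugged in as the `step` argument of the loop sketched in the intro above.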

Which is Better?

Pros of Jacobi Over Gauss Seidel:

  • Jacobi has less strict convergence requirements than Gauss Seidel, so certain classes of problems can only be computed with Jacobi Iteration.
  • Jacobi is very good for distributed programming, since $x^{(k)}_i$ for different values of $i$ can be computed simultaneously on different computers.

Pros of Gauss Seidel Over Jacobi:

  • Gauss Seidel generally has better performance than Jacobi, since it uses updated values immediately and doesn't have to wait for the entire vector to finish.
  • Gauss Seidel can save a lot of space, since we don't need to keep track of both $x^{(k)}$ and $x^{(k-1)}$ at the same time. Gauss Seidel only needs one vector of memory, while Jacobi needs two.

The Mathy Explanation

To begin with, this is NOT a rigorous proof! We will be making many assumptions throughout this section without diving into the details, and a math textbook would be a much better source of information if you are looking for Lemmas and Theorems.

Instead, this section contains a quick rundown of the intuition behind these algorithms, with some equations sprinkled throughout.

Linear Algebra 101: How to Approximate

Let's start simple. Given any invertible matrix $M$:

MM^{-1} = I

Similarly, given any invertible matrix $(I-M)$:

(I-M)(I-M)^{-1} = I

That's a good warmup. Now we can look at something more interesting. Let's replace $(I-M)^{-1}$ with a sum of powers of $M$ and see what we get:

\begin{aligned} &\hspace{0.2in} (I-M)(M^0 + M^1 + M^2)\\ &= (I-M)(I + M + M^2)\\ &= (I-M)I + (I-M)M + (I-M)M^2\\ &= (I-M) + (M-M^2) + (M^2-M^3)\\ &= I-M^3 \end{aligned}

Note how the terms cancel out with each other! The same rule holds if we add more powers of $M$. Since all consecutive terms cancel out except for the first and the last, we have:

(I-M) (M^0 + M^1 + M^2 + ... + M^k) = I-M^{k+1}

Or, if we want to write it formally and save some space:

(I-M) \sum_{i=0}^k M^i = I-M^{k+1}

We now make an assumption about $M$: suppose $M^k$ becomes smaller and smaller as its power goes up. In other words, $M^k$ gets smaller as $k$ gets bigger, and it gets super close to zero as $k$ gets super large. (This happens exactly when every eigenvalue of $M$ has magnitude less than 1, i.e. the spectral radius of $M$ is less than 1.)

More formally in terms of limit, we can write it as:

\lim_{k\to\infty}M^{k} = 0

We can make a very similar claim regarding $M^{k+1}$:

\lim_{k\to\infty}M^{k+1} = 0

Now we can plug this limit into the above equation, where the right hand side is $I-M^{k+1}$, which gets super close to just $I$ as $k$ gets super large.

More formally in terms of limit, we can write it as:

\lim_{k\to\infty} (I-M) \sum_{i=0}^k M^i = \lim_{k\to\infty} \left(I - M^{k+1}\right) = I

Recall that we have seen another equation with II on the right hand side:

(I-M)(I-M)^{-1} = I

Putting the pieces together, we have:

(I-M)(I-M)^{-1} = \lim_{k\to\infty} (I-M) \sum_{i=0}^k M^i
(I-M)(I-M)^{-1} = (I-M) \lim_{k\to\infty} \sum_{i=0}^k M^i
(I-M)^{-1} = \lim_{k\to\infty} \sum_{i=0}^k M^i

And, if $k$ is large enough:

(I-M)^{-1} \approx \sum_{i=0}^k M^i

As long as $M$ is a nice matrix, in the sense that $(I-M)^{-1}$ exists and $M^k$ gets smaller as $k$ gets bigger, we can use the above logic to approximate $(I-M)^{-1}$ with sums of $M^i$.
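If you want to convince yourself numerically, here is a tiny NumPy check. The matrix below is an arbitrary example of mine whose powers shrink; it is not tied to any particular system:

```python
import numpy as np

# An arbitrary small matrix whose powers shrink toward zero.
M = np.array([[0.2, 0.1],
              [0.0, 0.3]])
I = np.eye(2)

# Sum the first k+1 powers of M: M^0 + M^1 + ... + M^k.
k = 50
approx = sum(np.linalg.matrix_power(M, i) for i in range(k + 1))

exact = np.linalg.inv(I - M)
print(np.allclose(approx, exact))  # True: the sum of powers approximates (I-M)^{-1}
```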

Approximate, but in Linear Systems

Now that we know how to approximate things, we can try to apply what we've learned to a linear system. Consider the following (familiar) scenario, where we know the values of matrix $A$ and vector $b$ but not vector $x$:

Ax = b

Let's try to reframe it into something more familiar.

Define a matrix $M = I-A$ so we have $A = I-M$:

(I-M)x = b \implies x = (I-M)^{-1}b \approx (\sum_{i=0}^k M^i)b

That looks familiar! Rewriting the above equation in an easier-to-read format, we essentially have:

x \approx (M^0 + M^1 + M^2 + M^3 + M^4 + ...)b

The more terms of $M^i$ we include, the more accurate the result will be!
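Here is the same idea as a naive solver sketch in NumPy, with an arbitrary example system of mine. It only works because the powers of $M = I - A$ happen to shrink for this particular $A$:

```python
import numpy as np

# Arbitrary example system Ax = b for which M = I - A has shrinking powers.
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
b = np.array([1.0, 2.0])

M = np.eye(2) - A
x = np.zeros_like(b)
term = b.copy()               # starts at M^0 b
for _ in range(100):
    x += term                 # accumulate (M^0 + M^1 + ...) b
    term = M @ term           # next term: M^{i+1} b

print(np.allclose(A @ x, b))  # True: x approximately solves Ax = b
```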

One step further: Jacobi and Gauss Seidel

Finally, we can get to the meat: What is exactly going on in Jacobi and Gauss Seidel?

They essentially do what we did above, but take it one step further by breaking the matrix $A$ into two parts:

A = G + B

$G$ is the "good" matrix and $B$ is the "bad" matrix. Both matrices are of the same size, but there is a key difference: Only the good matrix is guaranteed to be easily invertible. The bad matrix can be either hard or impossible to invert.

As a result, we are okay with computing $G^{-1}$ but not $B^{-1}$.

After we break $A$ down, let's look at the linear system again:

(G + B)x = b

Since we are okay with calculating $G^{-1}$, we can multiply both sides by it:

\begin{aligned} G^{-1}(G + B)x &= G^{-1}b\\ (G^{-1}G + G^{-1}B)x &= G^{-1}b\\ (I + G^{-1}B)x &= G^{-1}b \end{aligned}

This is looking a bit more familiar, but not quite enough.

Let's rewrite it some more. Defining $M = -G^{-1}B$ and $b' = G^{-1}b$, we plug them into the equation:

\begin{aligned} (I - M) x &= b'\\ x &= (I - M)^{-1} b' \end{aligned}

Applying the formula from the last section:

x \approx (M^0 + M^1 + M^2 + ...)b'

Voila! We have our approximation. Or, expanding the formula into $G$ and $B$:

x \approx \Big( (-G^{-1}B)^0 + (-G^{-1}B)^1 + (-G^{-1}B)^2 + ... \Big) G^{-1}b

Essentially, this is what Jacobi and Gauss Seidel are doing, although they break $A$ down differently (a quick sketch of the two splittings follows the list):

  • Jacobi chooses $G$ to be just the diagonal, and $B$ is everything but the diagonal.
  • Gauss Seidel chooses $G$ to be the diagonal together with everything below it, and $B$ is everything above the diagonal.
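In NumPy, the two splittings look roughly like this (a sketch, with helper names of my own):

```python
import numpy as np

def split_jacobi(A):
    # G = just the diagonal of A; B = everything but the diagonal.
    G = np.diag(np.diag(A))
    B = A - G
    return G, B

def split_gauss_seidel(A):
    # G = the diagonal plus everything below it; B = everything above the diagonal.
    G = np.tril(A)
    B = np.triu(A, k=1)
    return G, B
```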

Tying the Math to the Code

The above explanation still raises a question: how do we compute that, exactly?

Both Jacobi and Gauss Seidel are iterative methods, so they perform the computation in iterations. Every new iteration makes an improvement on top of the previous one and produces a more reliable result.

The intuition is for us to add one term of $M^i$ in every iteration:

x^{(k-1)} = (M^0 + M^1 + ... + M^{k-1})b
x^{(k)} = (M^0 + M^1 + ... + M^k)b

However, we don't want to recompute the entire sum from scratch on every step, since that wastes a lot of computation. How can we compute the iteration while minimizing the cost?

Ideally, we can just use the values of $M$, $b$, and $x^{(k-1)}$ to perform the computation. The key is in the following equation:

\begin{aligned} Mx^{(k-1)} &= M(M^0 + M^1 + ... + M^{k-1})b\\ &= (M^1 + M^2 + ... + M^k)b\\ &= (M^0 + M^1 + M^2 + ... + M^k)b - M^0b\\ &= x^{(k)} - M^0b\\ &= x^{(k)} - b\\ \end{aligned}

After shuffling the terms, we have:

\begin{aligned} x^{(k)} &= Mx^{(k-1)} + b\\ \end{aligned}

And that is it! Of course, when we are applying Jacobi or Gauss Seidel, we are operating with the good and the bad matrix under the hood:

\begin{aligned} x^{(k)} &= (-G^{-1}B)x^{(k-1)} + G^{-1}b\\ \end{aligned}

This can also be written as:

\begin{aligned} x^{(k)} &= G^{-1}b - (G^{-1}B)x^{(k-1)}\\ \end{aligned}

This would be the general form we need to perform the computation.
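As a rough sketch, here is that general form in NumPy. Instead of explicitly forming $G^{-1}$, it solves a system with $G$ on each iteration, which computes the same thing; the test matrix is an arbitrary example of mine:

```python
import numpy as np

def iterate(G, B, b, x_prev):
    """One step of x^(k) = G^{-1} b - G^{-1} B x^(k-1), written as a solve against G."""
    return np.linalg.solve(G, b - B @ x_prev)

# Arbitrary test system, using the Jacobi splitting (G = diagonal, B = the rest).
A = np.array([[4.0, 1.0],
              [2.0, 5.0]])
b = np.array([1.0, 2.0])
G = np.diag(np.diag(A))
B = A - G

x = np.zeros_like(b)
for _ in range(50):
    x = iterate(G, B, b, x)
print(np.allclose(A @ x, b))  # True once the iteration has converged
```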

The exact formulas of Jacobi and Gauss Seidel are then derived by writing each term down and expanding, which is some painstaking calculation that I am not going to type up in LaTeX. If you trust me, here are the formulas for solving $Ax=b$.

Jacobi:

x_i^{(k)}=\frac{1}{a_{i i}}\left[\sum_{\substack{j=1, j \neq i}}^n\left(-a_{i j} x_j^{(k-1)}\right)+b_i\right]

Gauss Seidel:

x_i^{(k)}=\frac{1}{a_{i i}}\left[-\sum_{j=1}^{i-1}\left(a_{i j} x_j^{(k)}\right)-\sum_{j=i+1}^n\left(a_{i j} x_j^{(k-1)}\right)+b_i\right]

In Gauss Seidel, we save quite a bit of work by using the already computed terms of $x^{(k)}$ right away instead of the corresponding old terms of $x^{(k-1)}$, which is why it tends to converge in fewer iterations.

In coding practice, the similarity in formulas makes Gauss Seidel almost the same as Jacobi: the only difference is that Gauss Seidel throws away $x^{(k-1)}_i$ immediately after $x^{(k)}_i$ is computed, while Jacobi waits until the entire vector $x^{(k)}$ is computed before throwing away $x^{(k-1)}$ altogether.