{{Short description|Improved reduction for specific matrices}} |
In [[numerical linear algebra]], the '''tridiagonal matrix algorithm''', also known as the '''Thomas algorithm''' (named after [[Llewellyn Thomas]]), is a simplified form of [[Gaussian elimination]] that can be used to solve [[Tridiagonal matrix|tridiagonal systems of equations]]. A tridiagonal system for ''n'' unknowns may be written as

:<math>a_i x_{i-1} + b_i x_i + c_i x_{i+1} = d_i,</math>

where <math>a_1 = 0</math> and <math>c_n = 0</math>.

:<math>
\begin{bmatrix}
b_1 & c_1 & & & 0 \\
a_2 & b_2 & c_2 & & \\
& a_3 & b_3 & \ddots & \\
& & \ddots & \ddots & c_{n-1} \\
0 & & & a_n & b_n
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2 \\
x_3 \\
\vdots \\
x_n
\end{bmatrix}
=
\begin{bmatrix}
d_1 \\
d_2 \\
d_3 \\
\vdots \\
d_n
\end{bmatrix}
.
</math>

For such systems, the solution can be obtained in <math>O(n)</math> operations instead of the <math>O(n^3)</math> required by [[Gaussian elimination]]. A first sweep eliminates the <math>a_i</math>'s, and then an (abbreviated) backward substitution produces the solution. Examples of such matrices commonly arise from the discretization of the 1D [[Poisson equation]] and natural cubic [[spline interpolation]].

Thomas' algorithm is not [[numerical stability|stable]] in general, but is so in several special cases, such as when the matrix is [[diagonally dominant]] (either by rows or columns) or [[symmetric positive definite]];<ref name="Niyogi2006">{{cite book |author=Pradip Niyogi |title=Introduction to Computational Fluid Dynamics |year=2006 |publisher=Pearson Education India |isbn=978-81-7758-764-7 |page=76}}</ref><ref name="Datta2010"/> for a more precise characterization of the stability of Thomas' algorithm, see Higham Theorem 9.12.<ref name="Higham2002">{{cite book |author=Nicholas J. Higham |title=Accuracy and Stability of Numerical Algorithms: Second Edition |year=2002 |publisher=SIAM |isbn=978-0-89871-802-7 |page=175}}</ref> If stability is required in the general case, [[Gaussian elimination]] with [[partial pivoting]] (GEPP) is recommended instead.<ref name="Datta2010">{{cite book |author=Biswa Nath Datta |title=Numerical Linear Algebra and Applications, Second Edition |year=2010 |publisher=SIAM |isbn=978-0-89871-765-5 |page=162}}</ref>

==Method==
The forward sweep consists of the computation of new coefficients as follows, denoting the new coefficients with primes:

: <math>c'_i =
\begin{cases}
\cfrac{c_i}{b_i}, & i = 1, \\
\cfrac{c_i}{b_i - a_i c'_{i - 1}}, & i = 2, 3, \dots, n - 1
\end{cases}
</math>

and

: <math>d'_i =
\begin{cases}
\cfrac{d_i}{b_i}, & i = 1, \\
\cfrac{d_i - a_i d'_{i - 1}}{b_i - a_i c'_{i - 1}}, & i = 2, 3, \dots, n.
\end{cases}
</math>

The solution is then obtained by back substitution:

: <math>x_n = d'_n,</math>
: <math>x_i = d'_i - c'_i x_{i + 1}, \quad i = n - 1, n - 2, \ldots, 1.</math>
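
As a worked illustration (an added example with values chosen for easy checking), take <math>n = 3</math> with <math>b = (2, 2, 2)</math>, <math>a = (0, -1, -1)</math>, <math>c = (-1, -1, 0)</math> and <math>d = (1, 0, 1)</math>. The forward sweep gives

: <math>c'_1 = -\tfrac{1}{2}, \quad d'_1 = \tfrac{1}{2}, \quad c'_2 = \frac{-1}{2 - (-1)(-\tfrac{1}{2})} = -\tfrac{2}{3}, \quad d'_2 = \frac{0 - (-1)\tfrac{1}{2}}{2 - (-1)(-\tfrac{1}{2})} = \tfrac{1}{3}, \quad d'_3 = \frac{1 - (-1)\tfrac{1}{3}}{2 - (-1)(-\tfrac{2}{3})} = 1,</math>

and back substitution then yields <math>x_3 = 1</math>, <math>x_2 = \tfrac{1}{3} + \tfrac{2}{3} = 1</math> and <math>x_1 = \tfrac{1}{2} + \tfrac{1}{2} = 1</math>.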

The method above does not modify the original coefficient vectors, but must also keep track of the new coefficients. If the coefficient vectors may be modified, then an algorithm with less bookkeeping is:

For <math>i = 2, 3, \dots, n,</math> do

: <math>w = \cfrac{a_i}{b_{i-1}},</math>
: <math>b_i := b_i - w c_{i-1},</math>
: <math>d_i := d_i - w d_{i-1},</math>

followed by the back substitution

: <math>x_n = \cfrac{d_n}{b_n},</math>
: <math>x_i = \cfrac{d_i - c_i x_{i+1}}{b_i} \quad \text{for } i = n - 1, n - 2, \dots, 1.</math>
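
A minimal [[C (programming language)|C]] sketch of this in-place variant (an illustrative addition; the name <code>thomas_in_place</code> is ours), which overwrites <code>b</code> and the right-hand side stored in <code>x</code>, and needs no scratch space:

<syntaxhighlight lang="c">
/* solves Ax = d in place: on entry x holds d; on exit it holds the solution.
   b is overwritten; a and c are left untouched. 0-based analogue of the
   1-based formulas above. */
void thomas_in_place(const int X, double x[restrict X],
                     const double a[restrict X], double b[restrict X],
                     const double c[restrict X]) {
    /* forward elimination: i = 2, ..., n in the notation above */
    for (int ix = 1; ix < X; ix++) {
        const double w = a[ix] / b[ix - 1];
        b[ix] -= w * c[ix - 1];
        x[ix] -= w * x[ix - 1];
    }

    /* back substitution */
    x[X - 1] /= b[X - 1];
    for (int ix = X - 2; ix >= 0; ix--)
        x[ix] = (x[ix] - c[ix] * x[ix + 1]) / b[ix];
}
</syntaxhighlight>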

The implementation as a [[C (programming language)|C]] function, which uses scratch space to avoid modifying its inputs for a, b, c, allowing them to be reused:

<syntaxhighlight lang="c">
void thomas(const int X, double x[restrict X],
            const double a[restrict X], const double b[restrict X],
            const double c[restrict X], double scratch[restrict X]) {
    /*
     solves Ax = d, where A is a tridiagonal matrix consisting of vectors a, b, c
     X = number of equations
     x[] = initially contains the right-hand side d, and returns the solution x. indexed from [0, ..., X - 1]
     a[] = subdiagonal, indexed from [1, ..., X - 1]
     b[] = main diagonal, indexed from [0, ..., X - 1]
     c[] = superdiagonal, indexed from [0, ..., X - 2]
     scratch[] = scratch space of length X, provided by the caller, allowing a, b, c to be const
     not performed in this example: manual expensive common subexpression elimination
    */
    scratch[0] = c[0] / b[0];
    x[0] = x[0] / b[0];

    /* loop from 1 to X - 1 inclusive; scratch[X - 1] is never read, so skip computing it */
    for (int ix = 1; ix < X; ix++) {
        if (ix < X - 1)
            scratch[ix] = c[ix] / (b[ix] - a[ix] * scratch[ix - 1]);
        x[ix] = (x[ix] - a[ix] * x[ix - 1]) / (b[ix] - a[ix] * scratch[ix - 1]);
    }

    /* loop from X - 2 to 0 inclusive */
    for (int ix = X - 2; ix >= 0; ix--)
        x[ix] -= scratch[ix] * x[ix + 1];
}
</syntaxhighlight>
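
A minimal driver for this function (an illustrative example, using the same system as the worked illustration in the Method section):

<syntaxhighlight lang="c">
#include <stdio.h>

/* assumes the thomas() function above is in scope */
int main(void) {
    /* diagonals (-1, 2, -1), right-hand side d = (1, 0, 1), exact solution (1, 1, 1) */
    double a[] = { 0.0, -1.0, -1.0};  /* subdiagonal; a[0] is unused */
    double b[] = { 2.0,  2.0,  2.0};  /* main diagonal */
    double c[] = {-1.0, -1.0,  0.0};  /* superdiagonal; c[2] is unused */
    double x[] = { 1.0,  0.0,  1.0};  /* right-hand side d, overwritten with the solution */
    double scratch[3];

    thomas(3, x, a, b, c, scratch);
    printf("%g %g %g\n", x[0], x[1], x[2]);  /* prints: 1 1 1 */
    return 0;
}
</syntaxhighlight>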

==Derivation==
The derivation of the tridiagonal matrix algorithm is a special case of Gaussian elimination.

Suppose that the unknowns are <math>x_1,\ldots, x_n</math>, and that the equations to be solved are:

:<math>\begin{alignat}{4}
& && b_1 x_1 && + c_1 x_2 && = d_1 \\
& a_i x_{i - 1} && + b_i x_i && + c_i x_{i + 1} && = d_i \,, \quad i = 2, \ldots, n - 1 \\
& a_n x_{n - 1} && + b_n x_n && && = d_n \,.
\end{alignat}
</math>

Consider modifying the second (<math>i = 2</math>) equation with the first equation as follows:

:<math>(\text{equation 2}) \cdot b_1 - (\text{equation 1}) \cdot a_2,</math>

which would give:

:<math>(b_2 b_1 - c_1 a_2) x_2 + c_2 b_1 x_3 = d_2 b_1 - d_1 a_2.</math>

Note that <math>x_1</math> has been eliminated from the second equation. Using a similar tactic with the modified second equation on the third equation yields:

:<math>(b_3 (b_2 b_1 - c_1 a_2) - c_2 b_1 a_3) x_3 + c_3 (b_2 b_1 - c_1 a_2) x_4 = d_3 (b_2 b_1 - c_1 a_2) - (d_2 b_1 - d_1 a_2) a_3.</math>

This time <math>x_2</math> was eliminated. If this procedure is repeated until the <math>n</math>th row, the (modified) <math>n</math>th equation will involve only one unknown, <math>x_n</math>. This may be solved for and then used to solve the <math>(n - 1)</math>th equation, and so on until all of the unknowns are solved for.

Clearly, the coefficients on the modified equations get more and more complicated if stated explicitly. By examining the procedure, the modified coefficients (notated with tildes) may instead be defined recursively:

:<math>\tilde b_1 = b_1, \qquad \tilde b_i = b_i \tilde b_{i - 1} - \tilde c_{i - 1} a_i,</math>
:<math>\tilde c_1 = c_1, \qquad \tilde c_i = c_i \tilde b_{i - 1},</math>
:<math>\tilde d_1 = d_1, \qquad \tilde d_i = d_i \tilde b_{i - 1} - \tilde d_{i - 1} a_i.</math>

To further hasten the solution process, <math>\tilde b_i</math> may be divided out (if there is no division-by-zero risk); the newer modified coefficients, each notated with a prime, will be:

:<math>c'_1 = \frac{c_1}{b_1}, \qquad c'_i = \frac{c_i}{b_i - a_i c'_{i - 1}},</math>
:<math>d'_1 = \frac{d_1}{b_1}, \qquad d'_i = \frac{d_i - a_i d'_{i - 1}}{b_i - a_i c'_{i - 1}}.</math>

This gives the following system with the same unknowns and with coefficients defined in terms of the original ones above:

:<math>\begin{array}{lcl}
x_i + c'_i x_{i + 1} = d'_i \qquad &;& \ i = 1, \ldots, n - 1 \\
x_n = d'_n. &
\end{array}</math>

The last equation involves only one unknown. Solving it in turn reduces the next-to-last equation to one unknown, so that this backward substitution can be used to find all of the unknowns:

:<math>x_n = d'_n\,</math>
:<math>x_i = d'_i - c'_i x_{i + 1} \qquad ; \ i = n - 1, n - 2, \ldots, 1.</math>

==Variants==
In some situations, particularly those involving periodic boundary conditions, a slightly perturbed form of the tridiagonal system may need to be solved:

:<math>
\begin{alignat}{4}
& a_1 x_{n} && + b_1 x_1 && + c_1 x_2 && = d_1 \\
& a_i x_{i - 1} && + b_i x_i && + c_i x_{i + 1} && = d_i \,, \quad i = 2, \ldots, n - 1 \\
& a_n x_{n - 1} && + b_n x_n && + c_n x_1 && = d_n \,.
\end{alignat}
</math>

In this case, we can make use of the [[Sherman–Morrison formula]] to avoid the additional operations of Gaussian elimination and still use the Thomas algorithm. The method requires solving a modified non-cyclic version of the system for both the input and a sparse corrective vector, and then combining the solutions. This can be done efficiently if both solutions are computed at once, as the forward portion of the pure tridiagonal matrix algorithm can be shared.

If we indicate by:
<math display="block">
A=\begin{bmatrix}
b_1 & c_1 & & & a_1 \\
a_2 & b_2 & c_2 & & \\
& a_3 & b_3 & \ddots & \\
& & \ddots & \ddots & c_{n-1} \\
c_n & & & a_n & b_n
\end{bmatrix}
, x=
\begin{bmatrix}
x_1 \\
x_2 \\
x_3 \\
\vdots \\
x_n
\end{bmatrix}
, d=
\begin{bmatrix}
d_1 \\
d_2 \\
d_3 \\
\vdots \\
d_n
\end{bmatrix}
</math>
then the system to be solved is: <math display="block">Ax = d .</math>

In this case the coefficients <math> a_1 </math> and <math> c_n </math> are, generally speaking, non-zero, so their presence does not allow the Thomas algorithm to be applied directly. We can therefore consider <math> B\in\mathbb{R}^{n\times n} </math> and <math> u,v\in\mathbb{R}^{n} </math> as follows:
<math display="block">
B=\begin{bmatrix}
b_1-\gamma & c_1 & & & 0 \\
a_2 & b_2 & c_2 & & \\
& a_3 & b_3 & \ddots & \\
& & \ddots & \ddots & c_{n-1} \\
0 & & & a_n & b_n-\frac{c_n a_1}{\gamma}
\end{bmatrix},
u=\begin{bmatrix}
\gamma \\
0 \\
0 \\
\vdots \\
c_n
\end{bmatrix}
, v=
\begin{bmatrix}
1 \\
0 \\
0 \\
\vdots \\
a_1/\gamma
\end{bmatrix}
,
</math>
where <math> \gamma\in\mathbb{R} </math> is a parameter to be chosen (for instance, the C implementation below takes <math>\gamma = -b_1</math>). The matrix <math> A </math> can be reconstructed as <math> A = B + u v^\mathsf{T}</math>. The solution is then obtained in the following way:<ref>{{Cite journal |last1=Batista |first1=Milan |last2=Ibrahim Karawia |first2=Abdel Rahman A. |date=2009 |title=The use of the Sherman–Morrison–Woodbury formula to solve cyclic block tri-diagonal and cyclic block penta-diagonal linear systems of equations |url=http://dx.doi.org/10.1016/j.amc.2009.01.003 |journal=Applied Mathematics and Computation |volume=210 |issue=2 |pages=558–563 |doi=10.1016/j.amc.2009.01.003 |issn=0096-3003}}</ref> first we solve two tridiagonal systems of equations applying the Thomas algorithm:
<math display="block">By=d \qquad \qquad Bq=u .</math>

Then we reconstruct the solution <math>x</math> using the [[Sherman–Morrison formula]]:
<math display="block">
x = A^{-1}d
= (B+uv^\mathsf{T})^{-1}d
= B^{-1}d-\frac{B^{-1}uv^\mathsf{T}B^{-1}}{1+v^\mathsf{T}B^{-1}u}d
= y-\frac{q v^\mathsf{T} y}{1+v^\mathsf{T} q} .
</math>

The implementation as a [[C (programming language)|C]] function, which uses scratch space to avoid modifying its inputs for a, b, c, allowing them to be reused:

<syntaxhighlight lang="c">
void cyclic_thomas(const int X, double x[restrict X], const double a[restrict X], const double b[restrict X], const double c[restrict X], double cmod[restrict X], double u[restrict X]) {
    /*
     solves Ax = d, where A is a cyclic tridiagonal matrix consisting of vectors a, b, c
     X = number of equations
     x[] = initially contains the right-hand side d, and returns the solution x. indexed from [0, ..., X - 1]
     a[] = subdiagonal, regularly indexed from [1, ..., X - 1], a[0] is the lower left corner
     b[] = main diagonal, indexed from [0, ..., X - 1]
     c[] = superdiagonal, regularly indexed from [0, ..., X - 2], c[X - 1] is the upper right corner
     cmod[], u[] = scratch vectors, each of length X
    */

    /* lower left and upper right corners of the cyclic tridiagonal system respectively */
    const double alpha = a[0];
    const double beta = c[X - 1];

    /* arbitrary, but chosen such that division by zero is avoided */
    const double gamma = -b[0];

    cmod[0] = alpha / (b[0] - gamma);
    u[0] = gamma / (b[0] - gamma);
    x[0] /= (b[0] - gamma);

    /* loop from 1 to X - 2 inclusive */
    for (int ix = 1; ix + 1 < X; ix++) {
        const double m = 1.0 / (b[ix] - a[ix] * cmod[ix - 1]);
        cmod[ix] = c[ix] * m;
        u[ix] = (0.0 - a[ix] * u[ix - 1]) * m;
        x[ix] = (x[ix] - a[ix] * x[ix - 1]) * m;
    }

    /* handle X - 1 */
    const double m = 1.0 / (b[X - 1] - alpha * beta / gamma - beta * cmod[X - 2]);
    u[X - 1] = (alpha - a[X - 1] * u[X - 2]) * m;
    x[X - 1] = (x[X - 1] - a[X - 1] * x[X - 2]) * m;

    /* loop from X - 2 to 0 inclusive */
    for (int ix = X - 2; ix >= 0; ix--) {
        u[ix] -= cmod[ix] * u[ix + 1];
        x[ix] -= cmod[ix] * x[ix + 1];
    }

    const double fact = (x[0] + x[X - 1] * beta / gamma) / (1.0 + u[0] + u[X - 1] * beta / gamma);

    /* loop from 0 to X - 1 inclusive */
    for (int ix = 0; ix < X; ix++)
        x[ix] -= fact * u[ix];
}
</syntaxhighlight>
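
A minimal driver for this function (an illustrative example; every row of the chosen cyclic matrix sums to 6, so a right-hand side of all 6's has the exact solution of all 1's):

<syntaxhighlight lang="c">
#include <stdio.h>

/* assumes the cyclic_thomas() function above is in scope */
int main(void) {
    double a[] = {1.0, 1.0, 1.0, 1.0};  /* subdiagonal; a[0] is the lower left corner */
    double b[] = {4.0, 4.0, 4.0, 4.0};  /* main diagonal */
    double c[] = {1.0, 1.0, 1.0, 1.0};  /* superdiagonal; c[3] is the upper right corner */
    double x[] = {6.0, 6.0, 6.0, 6.0};  /* right-hand side d, overwritten with the solution */
    double cmod[4], u[4];

    cyclic_thomas(4, x, a, b, c, cmod, u);
    printf("%g %g %g %g\n", x[0], x[1], x[2], x[3]);  /* prints: 1 1 1 1 */
    return 0;
}
</syntaxhighlight>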

There is also another way to solve the slightly perturbed form of the tridiagonal system considered above.<ref>{{Citation |last1=Ryaben’kii |first1=Victor S. |last2=Tsynkov |first2=Semyon V. |title=Introduction |date=2006-11-02 |url=http://dx.doi.org/10.1201/9781420011166-1 |work=A Theoretical Introduction to Numerical Analysis |pages=1–19 |publisher=Chapman and Hall/CRC |isbn=978-0-429-14339-7 |access-date=2022-05-25 |doi=10.1201/9781420011166-1}}</ref> Let us consider two auxiliary linear systems of dimension <math>(n-1) \times (n-1)</math>:

<math display="block">
\begin{align}
\qquad \ \ \ \ \ b_2 u_{2} + c_2 u_3 &= d_2 \\
a_3 u_2 + b_3 u_3 + c_3 u_4 &= d_3 \\
a_i u_{i-1} + b_i u_i + c_i u_{i+1} &= d_i \\
\dots \\
a_n u_{n - 1} + b_n u_n \qquad &= d_n \,
\end{align}
\quad i = 4, \ldots, n - 1
\qquad \qquad
\begin{align}
\qquad \ \ \ \ \ b_2 v_{2} + c_2 v_3 &= -a_2 \\
a_3 v_2 + b_3 v_3 + c_3 v_4 &= 0 \\
a_i v_{i-1} + b_i v_i + c_i v_{i+1} &= 0 \\
\dots \\
a_n v_{n - 1} + b_n v_n \qquad &= -c_n \,.
\end{align}
\quad i = 4, \ldots, n - 1
</math>

For convenience, we additionally define <math>u_1 = 0</math> and <math>v_1 = 1</math>. We can now find the solutions <math>\{u_2,u_3,\dots,u_n\}</math> and <math>\{v_2,v_3,\dots,v_n\}</math> by applying the Thomas algorithm to the two auxiliary tridiagonal systems.

The solution <math>\{x_1,x_2,\dots,x_n\}</math> can then be represented in the form:

<math display="block">x_i = u_i + x_1 v_i \qquad i=1,2,\dots, n .</math>

Indeed, multiplying each equation of the second auxiliary system by <math>x_1</math>, adding it to the corresponding equation of the first auxiliary system and using the representation <math>x_i = u_i + x_1 v_i</math>, we immediately see that equations <math>2</math> through <math>n</math> of the original system are satisfied; it only remains to satisfy equation <math>1</math>. To do so, consider this representation for <math>i = 2</math> and <math>i = n</math> and substitute <math>x_2 = u_2 + x_1 v_2</math> and <math>x_n = u_n + x_1 v_n</math> into the first equation of the original system. This yields one scalar equation for <math>x_1</math>:

<math display="block"> b_1 x_1 + c_1 (u_2 + x_1 v_2) + a_1 (u_n + x_1 v_n) = d_1 .</math>

As such, we find:

<math display="block"> x_1 = \frac{d_1 - a_1 u_n - c_1 u_2}{b_1 + a_1 v_n + c_1 v_2} .</math>

The implementation as a [[C (programming language)|C]] function, which uses scratch space to avoid modifying its inputs for a, b, c, allowing them to be reused:

<syntaxhighlight lang="c">
void cyclic_thomas(const int X, double x[restrict X], const double a[restrict X], const double b[restrict X], const double c[restrict X], double cmod[restrict X], double v[restrict X]) {
    /* first solve a system of length X - 1 for two right hand sides, ignoring ix == 0 */
    cmod[1] = c[1] / b[1];
    v[1] = -a[1] / b[1];
    x[1] = x[1] / b[1];

    /* loop from 2 to X - 2 inclusive */
    for (int ix = 2; ix < X - 1; ix++) {
        const double m = 1.0 / (b[ix] - a[ix] * cmod[ix - 1]);
        cmod[ix] = c[ix] * m;
        v[ix] = (0.0 - a[ix] * v[ix - 1]) * m;
        x[ix] = (x[ix] - a[ix] * x[ix - 1]) * m;
    }

    /* handle X - 1 */
    const double m = 1.0 / (b[X - 1] - a[X - 1] * cmod[X - 2]);
    cmod[X - 1] = c[X - 1] * m;
    v[X - 1] = (-c[0] - a[X - 1] * v[X - 2]) * m;
    x[X - 1] = (x[X - 1] - a[X - 1] * x[X - 2]) * m;

    /* loop from X - 2 to 1 inclusive */
    for (int ix = X - 2; ix >= 1; ix--) {
        v[ix] -= cmod[ix] * v[ix + 1];
        x[ix] -= cmod[ix] * x[ix + 1];
    }

    /* solve the first equation for x[0], then propagate it through the representation x_i = u_i + x_1 v_i */
    x[0] = (x[0] - a[0] * x[X - 1] - c[0] * x[1]) / (b[0] + a[0] * v[X - 1] + c[0] * v[1]);

    /* loop from 1 to X - 1 inclusive */
    for (int ix = 1; ix < X; ix++)
        x[ix] += x[0] * v[ix];
}
</syntaxhighlight>

In both cases the auxiliary systems to be solved are genuinely tridiagonal, so the overall computational complexity of solving the system <math> Ax = d </math> remains linear with respect to the dimension of the system <math> n </math>, that is, <math>O(n)</math> arithmetic operations.

In other situations, the system of equations may be '''block tridiagonal''' (see [[block matrix]]), with smaller submatrices arranged as the individual elements in the above matrix system (e.g., the 2D [[Poisson equation discretized into block tridiagonal|Poisson problem]]). Simplified forms of Gaussian elimination have been developed for these situations.<ref name="Quarteroni2007">{{cite book |author-link1=Alfio Quarteroni |last1=Quarteroni |first1=Alfio |last2=Sacco |first2=Riccardo |last3=Saleri |first3=Fausto |year=2007 |title=Numerical Mathematics |publisher=Springer, New York |isbn=978-3-540-34658-6 |chapter=Section 3.8}}</ref>

The textbook ''Numerical Mathematics'' by [[Alfio Quarteroni]], Sacco and Saleri lists a modified version of the algorithm which avoids some of the divisions (using multiplications instead), which is beneficial on some computer architectures.
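
A minimal sketch of that idea applied to the <code>thomas</code> function above (an illustration with a name of our choosing, not necessarily the formulation given in ''Numerical Mathematics''): the reciprocal of each pivot is computed once and reused, so each forward-sweep row costs one division instead of two:

<syntaxhighlight lang="c">
/* same interface and result as thomas(), but the per-row divisions are
   traded for one reciprocal and two multiplications */
void thomas_reciprocal(const int X, double x[restrict X],
                       const double a[restrict X], const double b[restrict X],
                       const double c[restrict X], double scratch[restrict X]) {
    double m = 1.0 / b[0];
    scratch[0] = c[0] * m;
    x[0] = x[0] * m;

    for (int ix = 1; ix < X; ix++) {
        m = 1.0 / (b[ix] - a[ix] * scratch[ix - 1]);  /* reciprocal of the pivot */
        if (ix < X - 1)
            scratch[ix] = c[ix] * m;
        x[ix] = (x[ix] - a[ix] * x[ix - 1]) * m;
    }

    for (int ix = X - 2; ix >= 0; ix--)
        x[ix] -= scratch[ix] * x[ix + 1];
}
</syntaxhighlight>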

Parallel tridiagonal solvers have been published for many vector and parallel architectures, including GPUs.<ref name="ChangHwu14">{{cite conference |last1=Chang |first1=L.-W. |last2=Hwu |first2=W.-M. |title=A guide for implementing tridiagonal solvers on GPUs |book-title=Numerical Computations with GPUs |publisher=Springer |isbn=978-3-319-06548-9 |editor=V. Kidratenko |year=2014}}</ref><ref name="Venetis">{{cite journal |last1=Venetis |first1=I.E. |last2=Kouris |first2=A. |last3=Sobczyk |first3=A. |last4=Gallopoulos |first4=E. |last5=Sameh |first5=A. |title=A direct tridiagonal solver based on Givens rotations for GPU architectures |journal=Parallel Computing |volume=49 |pages=101–116 |year=2015 |doi=10.1016/j.parco.2015.03.008}}</ref>

For an extensive treatment of parallel tridiagonal and block tridiagonal solvers, see Gallopoulos, Philippe and Sameh.<ref name="GPS.16">{{cite book |last1=Gallopoulos |first1=E. |last2=Philippe |first2=B. |last3=Sameh |first3=A.H. |year=2016 |title=Parallelism in Matrix Computations |publisher=Springer |isbn=978-94-017-7188-7 |chapter=Chapter 5}}</ref>

==References==
{{Wikibooks|Algorithm Implementation|Linear Algebra/Tridiagonal matrix algorithm|Tridiagonal matrix algorithm}}
{{reflist}}

* {{cite book |author1=Conte, S. D. |author2=de Boor, C. |year=1972 |title=Elementary Numerical Analysis |url=https://archive.org/details/elementarynumeri00samu |url-access=registration |publisher=McGraw-Hill, New York |isbn=0070124469}}
* {{CFDWiki|name=Tridiagonal_matrix_algorithm_-_TDMA_(Thomas_algorithm)}}
* {{Cite book |last1=Press |first1=W. H. |last2=Teukolsky |first2=S. A. |last3=Vetterling |first3=W. T. |last4=Flannery |first4=B. P. |year=2007 |title=Numerical Recipes: The Art of Scientific Computing |edition=3rd |publisher=Cambridge University Press |location=New York |isbn=978-0-521-88068-8 |chapter=Section 2.4 |chapter-url=http://apps.nrbook.com/empanel/index.html?pg=56}}

{{Numerical linear algebra}}

{{DEFAULTSORT:Tridiagonal Matrix Algorithm}}
[[Category:Numerical linear algebra]]