Gödel machine

From Wikipedia, the free encyclopedia

A Gödel machine is a hypothetical self-improving computer program that solves problems in an optimal way.[clarification needed] It uses a recursive self-improvement protocol in which it rewrites its own code when it can prove that the new code provides a better strategy.[1][2] The machine was invented by Jürgen Schmidhuber (first proposed in 2003[3]), but is named after Kurt Gödel, who inspired the mathematical theories.[4]

The Gödel machine is often discussed when dealing with issues of meta-learning, also known as "learning to learn." Applications include automating human design decisions and transferring knowledge between multiple related tasks, and may lead to the design of more robust and general learning architectures.[5] Though theoretically possible, no full implementation has been created.[6]

The Gödel machine is often compared with Marcus Hutter's AIXItl, another formal specification for an artificial general intelligence. Schmidhuber points out that the Gödel machine could start out by implementing AIXItl as its initial sub-program, and self-modify after it finds a proof that another algorithm for its search code will be better.[7]

Limitations

Traditional problems solved by a computer require only one input and provide some output. Computers of this sort had their initial algorithm hardwired.[8] This does not take into account the dynamic natural environment, a limitation the Gödel machine was designed to overcome.

The Gödel machine has limitations of its own, however. Any formal system that encompasses arithmetic is either inconsistent or allows for statements that are true but unprovable.[3] Hence even a Gödel machine with unlimited computational resources must ignore those self-improvements whose effectiveness it cannot prove.[3]

Variables of interest

There are three variables that are particularly useful in the run time of the Gödel machine.[3]

  • At some time t, the variable time will have the binary equivalent of t. This is incremented steadily throughout the run time of the machine.
  • Any input meant for the Gödel machine from the natural environment is stored in variable x. It is likely the case that x will hold different values for different values of the variable time.
  • The outputs of the Gödel machine are stored in variable y, where y(t) would be the output bit-string at some time t.

At any given time t, where 1 ≤ t ≤ T, the goal is to maximize future success or utility. A typical utility function follows the pattern u(s, Env) : S × E → ℝ:

    u(s, \mathrm{Env}) = E_\mu \Bigg[ \sum_{\tau=\text{time}}^T r(\tau) \mid s, \mathrm{Env} \Bigg]

where r(t) is a real-valued reward input (encoded within s(t)) at time t, E_μ[· | ·] denotes the conditional expectation operator with respect to some possibly unknown distribution μ from a set M of possible distributions (M reflects whatever is known about the possibly probabilistic reactions of the environment), and the above-mentioned time = time(s) is a function of state s which uniquely identifies the current cycle.[3] Note that this takes into account the possibility of extending the expected lifespan through appropriate actions.[3]
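
To make the notation concrete, here is a minimal sketch (not taken from Schmidhuber's paper) of the three run-time variables and of a Monte Carlo estimate of the expected utility; the sample_reward callable is a hypothetical placeholder for drawing rewards r(τ) under one environment sampled from μ:

<syntaxhighlight lang="python">
import random
from dataclasses import dataclass

@dataclass
class MachineState:
    """Hypothetical stand-in for the Gödel machine state s."""
    time: int = 0      # cycle counter, incremented steadily at run time
    x: str = ""        # latest input bit-string from the environment
    y: str = ""        # latest output bit-string y(t)

def estimate_utility(state, sample_reward, horizon, n_samples=1000):
    """Monte Carlo estimate of u(s, Env) = E_mu[ sum_{tau=time..T} r(tau) | s, Env ].

    sample_reward(tau) is a placeholder for drawing the reward r(tau) under one
    environment sampled from the (possibly unknown) distribution mu."""
    total = 0.0
    for _ in range(n_samples):
        total += sum(sample_reward(tau) for tau in range(state.time, horizon + 1))
    return total / n_samples

s = MachineState(time=3)
print(estimate_utility(s, sample_reward=lambda tau: random.random(), horizon=10))
</syntaxhighlight>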

Instructions used by proof techniques

The nature of the six proof-modifying instructions below makes it impossible to insert an incorrect theorem into a proof, which trivializes proof verification.[3]

get-axiom(n)

Appends the n-th axiom as a theorem to the current theorem sequence. Below is the initial axiom scheme:

  • Hardware Axioms formally specify how components of the machine could change from one cycle to the next.
  • Reward Axioms define the computational cost of hardware instructions and the physical cost of output actions. Related Axioms also define the lifetime of the Gödel machine as scalar quantities representing all rewards/costs.
  • Environment Axioms restrict the way new inputs x are produced from the environment, based on previous sequences of outputs y.
  • Uncertainty Axioms/String Manipulation Axioms are standard axioms for arithmetic, calculus, probability theory, and string manipulation that allow for the construction of proofs related to future variable values within the Gödel machine.
  • Initial State Axioms contain information about how to reconstruct parts or all of the initial state.
  • Utility Axioms describe the overall goal in the form of utility function u.
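
As a rough illustration of get-axiom(n), the sketch below represents the proof as a plain list of theorem strings and the initial axiom scheme as an indexed table; both encodings are hypothetical simplifications rather than the machine's actual formal language:

<syntaxhighlight lang="python">
# Hypothetical, heavily simplified encoding: axioms and theorems are plain strings.
AXIOMS = {
    1: "hardware: s(t+1) is computed from s(t) by the hardware transition function",
    2: "reward: every hardware instruction has a fixed computational cost",
    3: "environment: new inputs x depend only on previous outputs y",
    # ... the remaining axioms of the initial scheme would be enumerated here
}

def get_axiom(n, proof):
    """Append the n-th axiom of the initial scheme to the current theorem sequence."""
    proof.append(AXIOMS[n])

proof = []
get_axiom(1, proof)
print(proof)
</syntaxhighlight>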

apply-rule(k, m, n)

Takes in the index k of an inference rule (such as Modus tollens, Modus ponens), and attempts to apply it to the two previously proved theorems m and n. The resulting theorem is then added to the proof.
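
A sketch of apply-rule(k, m, n) under the same toy string encoding; the rule table and the theorem format are assumptions of this illustration, with modus ponens shown as rule 0:

<syntaxhighlight lang="python">
def modus_ponens(thm_a, thm_b):
    """If thm_a is 'P' and thm_b is 'P -> Q', derive 'Q' (toy string encoding)."""
    premise, arrow, conclusion = thm_b.partition(" -> ")
    if arrow and premise == thm_a:
        return conclusion
    raise ValueError("modus ponens does not apply to these theorems")

RULES = {0: modus_ponens}   # index k -> inference rule (hypothetical rule table)

def apply_rule(k, m, n, proof):
    """Apply inference rule k to the previously proved theorems at indices m and n;
    the resulting theorem is appended to the proof."""
    new_theorem = RULES[k](proof[m], proof[n])
    proof.append(new_theorem)
    return new_theorem

proof = ["it rains", "it rains -> the street is wet"]
print(apply_rule(0, 0, 1, proof))   # 'the street is wet'
</syntaxhighlight>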

delete-theorem(m)

Deletes the theorem stored at index m in the current proof. This helps to mitigate storage constraints caused by redundant and unnecessary theorems. Deleted theorems can no longer be referenced by the above apply-rule function.
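
A minimal sketch of delete-theorem(m) on the toy proof list; clearing the slot (rather than shifting later entries) is an assumption made here so that theorem indices stay stable:

<syntaxhighlight lang="python">
def delete_theorem(m, proof):
    """Clear the theorem at index m: it frees storage and can no longer be
    referenced by apply-rule (later indices stay valid in this toy encoding)."""
    proof[m] = None

def get_theorem(m, proof):
    """Helper used by apply-rule in this sketch: refuse deleted theorems."""
    if proof[m] is None:
        raise ValueError(f"theorem {m} was deleted and cannot be referenced")
    return proof[m]

proof = ["A", "A -> B", "B"]
delete_theorem(1, proof)
print(proof)   # ['A', None, 'B']
</syntaxhighlight>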

set-switchprog(m, n)

Replaces switchprog by S^p_{m:n}, provided it is a non-empty substring of S^p.
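
A sketch of set-switchprog(m, n), treating the machine's software S^p as a plain string and the switchprog as a slice of it; this encoding is purely illustrative:

<syntaxhighlight lang="python">
def set_switchprog(m, n, software):
    """Return the candidate switchprog S^p_{m:n}, provided it is a non-empty
    substring of the current software S^p (toy encoding: S^p is a plain string
    and the indices m..n are taken as inclusive)."""
    candidate = software[m:n + 1]
    if not candidate:
        raise ValueError("S^p_{m:n} must be a non-empty substring of S^p")
    return candidate

software = "...search loop...proof verifier...scheduler..."
print(set_switchprog(3, 13, software))   # 'search loop'
</syntaxhighlight>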

check()

Verifies whether the goal of the proof search has been reached. A target theorem states that, given the current axiomatized utility function u (the Utility Axioms above), the utility of a switch from p to the current switchprog would be higher than the utility of continuing the execution of p (which would keep searching for alternative switchprogs).[3] This is demonstrated in Schmidhuber's description of the decoded check() function.[3]
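
In rough outline, and with hypothetical switchprog and execute_switch placeholders standing in for the actual self-rewrite, the comparison performed by check() can be sketched as follows (an illustration, not Schmidhuber's original pseudocode):

<syntaxhighlight lang="python">
# Illustrative sketch only: the target theorem, switchprog and execute_switch are
# hypothetical placeholders, not Schmidhuber's formal encoding.
TARGET = "u(switch to current switchprog) > u(continue executing p)"

def check(proof, switchprog, execute_switch):
    """If the proof sequence contains a target theorem stating that switching to the
    current switchprog has higher utility than continuing with p, perform the
    self-rewrite; otherwise p keeps running and keeps searching for switchprogs."""
    if TARGET in proof:
        execute_switch(switchprog)   # rewrite the machine's own code
        return True                  # goal of the proof search reached
    return False

proof = ["axiom: ...", TARGET]
print(check(proof, switchprog="new search routine", execute_switch=print))
</syntaxhighlight>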

state2theorem(m, n)

Takes in two arguments, m and n, and attempts to convert the contents of S_{m:n} into a theorem.
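
A sketch of state2theorem(m, n), reading a region of a hypothetical bit-string storage S and recording its observed contents as a theorem about the current state:

<syntaxhighlight lang="python">
def state2theorem(m, n, S, proof, time):
    """Attempt to convert the contents of S[m:n] into a theorem (toy encoding: the
    storage S is a string of bits). The new theorem simply asserts what those
    cells contained at the current time."""
    contents = S[m:n]
    theorem = f"at time {time}, S[{m}:{n}] = {contents!r}"
    proof.append(theorem)
    return theorem

proof = []
print(state2theorem(2, 6, S="0110100111", proof=proof, time=42))
</syntaxhighlight>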

Example applications

Time-limited NP-hard optimization

The initial input to the Gödel machine is the representation of a connected graph with a large number of nodes linked by edges of various lengths. Within given time T it should find a cyclic path connecting all nodes. The only real-valued reward will occur at time T. It equals 1 divided by the length of the best path found so far (0 if none was found). There are no other inputs. The by-product of maximizing expected reward is to find the shortest path findable within the limited time, given the initial bias.[3]
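
The following sketch illustrates only the reward structure of this example (not the Gödel machine's proof search), using a standard Euclidean tour length; the node coordinates and tour are hypothetical:

<syntaxhighlight lang="python">
import math

def tour_length(points, order):
    """Length of the cyclic path that visits every node in the given order."""
    total = 0.0
    for i in range(len(order)):
        (x1, y1) = points[order[i]]
        (x2, y2) = points[order[(i + 1) % len(order)]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total

def final_reward(points, best_order_found):
    """Single reward at time T: 1 / length of the best cyclic path found so far,
    or 0 if no path was found (illustration of the reward only, not of the search)."""
    if best_order_found is None:
        return 0.0
    return 1.0 / tour_length(points, best_order_found)

# Hypothetical 4-node instance: the unit square, visited in order, has length 4.
nodes = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(final_reward(nodes, best_order_found=[0, 1, 2, 3]))   # 0.25
</syntaxhighlight>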

Fast theorem proving

Prove or disprove as quickly as possible that every even integer greater than 2 is the sum of two primes (Goldbach’s conjecture). The reward is 1/t, where t is the time required to produce and verify the first such proof.[9]
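
The sketch below illustrates only the 1/t reward bookkeeping; the brute-force check of small even numbers stands in for "produce and verify a proof" and is of course not a proof of the conjecture:

<syntaxhighlight lang="python">
import time

def is_goldbach_sum(n):
    """Check whether the even integer n > 2 is a sum of two primes (brute force)."""
    def is_prime(k):
        if k < 2:
            return False
        return all(k % d for d in range(2, int(k ** 0.5) + 1))
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

# Stand-in for "produce and verify the first such proof": a finite brute-force check.
start = time.time()
verified = all(is_goldbach_sum(n) for n in range(4, 1000, 2))
t = time.time() - start
reward = 1.0 / t if verified and t > 0 else 0.0
print(f"checked even numbers below 1000: {verified}, reward 1/t = {reward:.1f}")
</syntaxhighlight>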

Maximizing expected reward with bounded resources

A cognitive robot that needs at least 1 liter of gasoline per hour interacts with a partially unknown environment, trying to find hidden, limited gasoline depots to occasionally refuel its tank. It is rewarded in proportion to its lifetime, and dies after at most 100 years or as soon as its tank is empty or it falls off a cliff, and so on. The probabilistic environmental reactions are initially unknown but assumed to be sampled from the axiomatized Speed Prior, according to which hard-to-compute environmental reactions are unlikely. This permits a computable strategy for making near-optimal predictions. One by-product of maximizing expected reward is to maximize expected lifetime.[3]

See also

References

  1. ^ Mahmud, M. M. Hassan (2008). Universal Transfer Learning. ProQuest. pp. 16–18. ISBN 9780549909880.
  2. ^ Anderson, Michael L.; Tim Oates (Spring 2007). "A review of recent research in metareasoning and metalearning". AI Magazine. 28 (1): 7.
  3. ^ a b c d e f g h i j Schmidhuber, Jürgen (December 2006). Gödel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements (PDF). Retrieved 10 November 2014.
  4. ^ "Gödel machine".
  5. ^ Schaul, Tom; Schmidhuber, Juergen (2010). "Metalearning". Scholarpedia. 5 (6): 4650. doi:10.4249/scholarpedia.4650. Retrieved 10 November 2014.
  6. ^ Steunebrink, Bas R.; Schmidhuber, Jürgen (2011). A Family of Gödel Machine Implementations. Lecture Notes in Computer Science. Vol. 6830. pp. 275–280. CiteSeerX 10.1.1.300.3076. doi:10.1007/978-3-642-22887-2_29. ISBN 978-3-642-22886-5.
  7. ^ Schmidhuber, Jürgen (5 March 2009). "Ultimate Cognition à la Gödel" (PDF). Cognitive Computation. 1 (2): 177–193. CiteSeerX 10.1.1.218.3323. doi:10.1007/s12559-009-9014-y. Retrieved 13 November 2014.
  8. ^ Schmidhuber, Jürgen (5 March 2009). "Ultimate Cognition à la Gödel" (PDF). Cognitive Computation. 1 (2): 177–193. CiteSeerX 10.1.1.218.3323. doi:10.1007/s12559-009-9014-y. Retrieved 13 November 2014.
  9. ^ Schmidhuber, Jürgen (5 March 2009). "Ultimate Cognition à la Gödel". Cognitive Computation. 1 (2): 177–193. CiteSeerX 10.1.1.218.3323. doi:10.1007/s12559-009-9014-y.