MediaWiki to LaTeX Converter


Hugo Vincent

13 Dec 2004

1:48 a.m.

Hi everyone,

I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/) and I need to extract the content from it and convert it into LaTeX syntax for printed documentation. I have googled for a suitable OSS solution, but found nothing.

I would prefer a script written in Python, but any recommendations would be very welcome.

Do you know of anything suitable?

Kind Regards, Hugo Vincent, Bluewater Systems.


Magnus Manske

13 Dec

3:43 a.m.

Hugo Vincent wrote:

...


I don't know of an existing solution, *but* you could help with the wiki2xml parser (in bison format), which was started by Timwi.

It should be (relatively) easy to convert from XML to LaTeX within the MediaWiki software. I have already started a demo XML-to-XHTML parser (in CVS HEAD). The output could be adjusted to generate LaTeX, PDF, RTF, or even wiki code (wikitext beautifier!).

That would be a long-term investment, so to speak, but I'm certain it will pay off.

Magnus
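Converting an XML rendering of the wikitext into LaTeX, as Magnus suggests, amounts to a recursive walk over the element tree. A minimal Python sketch follows; the element names (`heading`, `paragraph`, `bold`, `italics`) are hypothetical stand-ins rather than the actual wiki2xml vocabulary, and only a handful of elements are handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the real wiki2xml output may use different ones.
INLINE = {"bold": r"\textbf", "italics": r"\emph"}

# Characters that must be escaped in LaTeX body text.
SPECIAL = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape(text):
    """Escape LaTeX special characters; None (no text) becomes ''."""
    return "".join(SPECIAL.get(ch, ch) for ch in (text or ""))


def to_latex(elem):
    """Recursively convert an element and its children to LaTeX."""
    inner = escape(elem.text)
    for child in elem:
        # A child's tail is the text that follows it inside this element.
        inner += to_latex(child) + escape(child.tail)
    if elem.tag in INLINE:
        return f"{INLINE[elem.tag]}{{{inner}}}"
    if elem.tag == "heading":
        return f"\\section{{{inner}}}\n"
    if elem.tag == "paragraph":
        return inner + "\n\n"
    return inner  # unknown elements: keep their content, drop the markup


doc = ET.fromstring(
    "<article><heading>Intro</heading>"
    "<paragraph>Some <bold>bold</bold> text &amp; 50%</paragraph></article>"
)
print(to_latex(doc))
```

Because the tree is already well-formed XML, the converter never has to deal with wikitext's irregular syntax; that work is left to the wiki2xml parser.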

Hugo Vincent

14 Dec

1:36 a.m.

Thanks everyone,

I decided to write my own, using regex substitutions in Python. It's about 90% there; I will post it online somewhere when I am done.

Kind Regards, Hugo Vincent.
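A regex-substitution converter of the kind Hugo describes might look roughly like the following Python sketch. It is not his script, which isn't shown in the thread; it covers only an assumed subset of wikitext (section headings, bold, italics, internal links) and escapes LaTeX specials first so that the later substitutions can emit real braces.

```python
import re

# Escape these LaTeX specials first, so the braces emitted below stay real.
# (Backslash, ~ and ^ are not handled in this sketch.)
SPECIALS = re.compile(r"([&%$#_{}])")

# Applied in order: longer markup before shorter (=== before ==, ''' before '').
RULES = [
    (re.compile(r"^===[ \t]*(.+?)[ \t]*===[ \t]*$", re.M), r"\\subsection{\1}"),
    (re.compile(r"^==[ \t]*(.+?)[ \t]*==[ \t]*$", re.M), r"\\section{\1}"),
    (re.compile(r"'''(.+?)'''"), r"\\textbf{\1}"),
    (re.compile(r"''(.+?)''"), r"\\emph{\1}"),
    # [[Target|label]] or [[Target]]: keep only the visible text.
    (re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]+)\]\]"), r"\1"),
]


def wiki_to_latex(text):
    text = SPECIALS.sub(r"\\\1", text)
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

The ordering of the rules matters: if the `''` rule ran before the `'''` rule, bold markup would be misread as italics plus a stray apostrophe.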

Elisabeth Bauer

13 Dec

6:05 a.m.

Hiho,

Hugo Vincent wrote:

> I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/) and I need to extract the content from it and convert it into LaTeX syntax for printed documentation. I have googled for a suitable OSS solution, but found nothing.

Try http://sourceforge.net/projects/wikipdf/ (see also http://de.wikipedia.org/wiki/Wikipedia:PDF-Generator). It generates LaTeX files and runs them through pdflatex.

I don't know whether it is usable in its current state, though (the last time I checked, it lacked table and image support).

greetings, elian
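The generate-LaTeX-then-run-pdflatex pipeline can be driven from Python roughly as follows. This is a sketch under assumptions: pdflatex is on the PATH and the file names are illustrative. pdflatex is run twice because the table of contents is only filled in on the second pass.

```python
import subprocess
from pathlib import Path


def pdflatex_command(tex_file: Path, outdir: Path) -> list[str]:
    """Build a non-interactive pdflatex invocation for one .tex file."""
    return [
        "pdflatex",
        "-interaction=nonstopmode",  # never stop to prompt on errors
        "-halt-on-error",            # stop at the first error instead of continuing
        f"-output-directory={outdir}",
        str(tex_file),
    ]


def compile_pdf(tex_file: Path, outdir: Path) -> Path:
    outdir.mkdir(parents=True, exist_ok=True)
    cmd = pdflatex_command(tex_file, outdir)
    # Two passes: the first writes the .aux/.toc files, the second uses them.
    for _ in range(2):
        subprocess.run(cmd, check=True, capture_output=True)
    return outdir / (tex_file.stem + ".pdf")
```

`check=True` turns a failed LaTeX run into a Python exception, so a broken conversion does not silently yield a stale PDF.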

Dirk Hünniger

16 Jun 2012

2:21 p.m.

Hugo Vincent <hugo <at> bluewatersys.com> writes:

...


This problem is actually solved: there is an easy way to export MediaWiki articles to LaTeX and PDF.

See http://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf

Yours Dirk Hünniger

Svip

3:33 p.m.

On 16 June 2012 10:51, Dirk Hünniger wrote:

> This problem is actually solved: there is an easy way to export MediaWiki articles to LaTeX and PDF.
>
> See http://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf

Interesting, but why is it so large? Is the source code available?

Dirk Hünniger

3:55 p.m.

On 06/16/2012 12:03 PM, Svip wrote:

> Interesting, but why is it so large? Is the source code available?

The source code is available here:

http://wb2pdf.svn.sourceforge.net/viewvc/wb2pdf/

The binary is large because it contains everything necessary to compile the generated LaTeX code, which is basically a full installation of MiKTeX.

Yours Dirk Hünniger

Platonides

10:19 p.m.

On 16/06/12 12:25, Dirk Hünniger wrote:

> The source code is available here:
>
> http://wb2pdf.svn.sourceforge.net/viewvc/wb2pdf/
>
> The binary is large because it contains everything necessary to compile the generated LaTeX code, which is basically a full installation of MiKTeX.

Have you heard of dependencies? You have to download a 364 MB file, which extracts to 898 MB. Of those, 94 MB are Linux-specific. The rest includes MiKTeX files, object files, DLLs, EXEs, ImageMagick, Tcl/Tk, the Olson database... The real code seems to live in trunk/wb2pdf/trunk/src, at just 4 MB.

And the Linux version isn't any better. Not only does it place everything into a subfolder of /usr/bin, it also copies everything (90 MB) to /tmp on each run. That is completely oblivious to security: running this program on a shared system is a vulnerability in itself.

Why don't you make a package with just the wb2pdf-specific files? Also, temporary build files don't belong in a release.
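The usual fix for the /tmp problem is to stage each run in a freshly created, private directory instead of a fixed, world-visible path, and to leave large read-only resources where they are installed. A minimal Python sketch, assuming a hypothetical `compile_fn` callback that turns a .tex file into a PDF (for example by invoking pdflatex):

```python
import tempfile
from pathlib import Path


def run_conversion(tex_source: str, compile_fn) -> bytes:
    """Stage one conversion in a private, randomly named temp directory.

    `compile_fn` is a hypothetical callback mapping a .tex path to a PDF path.
    Large read-only resources such as fonts are expected to be referenced
    in place rather than copied on every run.
    """
    # TemporaryDirectory uses mkdtemp: unpredictable name and mode 0o700, so
    # other local users can neither guess the path nor pre-create files in it.
    with tempfile.TemporaryDirectory(prefix="wb2pdf-") as tmp:
        tex = Path(tmp) / "document.tex"
        tex.write_text(tex_source, encoding="utf-8")
        pdf = compile_fn(tex)
        return pdf.read_bytes()  # read before the directory is deleted
```

The directory and everything in it are removed when the `with` block exits, even if the compiler raises an exception.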

Dirk Hünniger

10:44 p.m.

On 06/16/2012 06:49 PM, Platonides wrote:

...


I provide one download that is easy to use for any user of both Linux and Windows. Thus it obviously contains files unnecessary for each of the two operating systems.

I have heard of dependencies, and the .deb lists a lot of them; they are downloaded when it is installed. I can produce a higher-quality .deb file. It will still be 90 MB, because I need a full Unicode font. To be precise, I need twelve variants of it, and that's the 90 MB.

I essentially did the /tmp trick to avoid the work of researching where to install each file, fixing the path names in the code properly, and testing that.

So for now you can run the software and test every feature you want, and if you or somebody else decides to use it, I will make a .deb file that fits your needs. This will probably take two weeks, with most of the time spent choosing proper directories.

Yours Dirk

Platonides

17 Jun

5:20 a.m.

On 16/06/12 19:14, Dirk Hünniger wrote:

> I provide one download that is easy to use for any user of both Linux and Windows. Thus it obviously contains files unnecessary for each of the two operating systems.

If it were just a few extra MB, I could agree. But 94 MB out of 800 MB is, IMHO, past the point where you should split per OS.

> I have heard of dependencies and the .deb contains a lot of them, and they are downloaded when it is installed. I can produce a higher quality .deb file.

> It will still be 90 MB because I need a full Unicode font. To be precise, I need twelve variants of it, and that's the 90 MB.

You mean the mega font? That's actually 207 MB uncompressed :) It should probably go into a separate package that this one depends on. I don't see why it couldn't fall back to another available font if that one isn't installed, though. Many wikis are written in just a tiny subset of Unicode.

It seems you're creating it from wqyzenhei + unifont + freeserif fonts. Why do you need to merge them?
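Whether a given wiki needs only a small subset of Unicode can be checked mechanically: collect the characters a document uses and pick the first font whose coverage includes all of them. In this Python sketch the font names and their code-point ranges are hypothetical; a real tool would read coverage from the font files themselves. It selects one font for the whole document, so no font switching inside the LaTeX run is needed.

```python
from typing import Optional

# Hypothetical coverage table: inclusive code-point ranges per font.
FONT_COVERAGE = {
    "LatinFont": [(0x0000, 0x024F)],                  # Basic Latin .. Latin Extended-B
    "CJKFont": [(0x0000, 0x007F), (0x4E00, 0x9FFF)],  # ASCII + CJK Unified Ideographs
}


def covers(font: str, ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in FONT_COVERAGE[font])


def pick_font(text: str, preferred: list[str]) -> Optional[str]:
    """Return the first preferred font covering every character, else None."""
    needed = set(text)
    for font in preferred:
        if all(covers(font, ch) for ch in needed):
            return font
    return None  # no single font suffices; a merged "mega font" would be needed
```

A `None` result is exactly the mixed-script case (say, German and Chinese in one article) that the merged font was built for.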

> I essentially did the tmp trick in order to get around the work of researching where to install each file and to properly fix the path names in the code and to test that.

When in doubt, you should have placed the folder in /usr/lib, though a number of files would be better placed in /usr/share. But I'm not sure what many of the files are. For instance, what is the purpose of the geturl and pa programs?

And why do you have copies in both bin/ and dist/build? And why are they different? There are also plenty of build artifacts in there.

> So for now you can run the software, you can test every feature you want, and if you or somebody else decides to use it, I will make a .deb file that fits your needs. This will probably take two weeks, with most of the time spent choosing proper directories.

I feel a bit wary of running that :S

Dirk Hünniger

12:51 p.m.

> You mean the mega font? That's actually 207 MB uncompressed :) That should probably go to a different package (and depend on it). I don't see why it couldn't fall back to another available font if it's not available, though.

The point is that the font change has to happen within a single run of the LaTeX compiler. I tried that, and it sometimes works, but often the compiler produces no output at all when I do. So the best approach is to give the compiler one font for the whole document and let it run with that.

> It seems you're creating it from wqyzenhei + unifont + freeserif fonts. Why do you need to merge them?

I merged them because changing the font in LaTeX does not always work, especially inside headings, which become part of the table of contents.

> When in doubt, you should have placed the folder in /usr/lib, though a number of files would be better placed in /usr/share. But I'm not sure what many of the files are. For instance, what is the purpose of the geturl and pa programs?

The main part of the program is written in Haskell, a wonderful and easy-to-learn purely functional programming language. Some minor parts are written in Python 3, and these two parts need to communicate. Currently pa and geturl are binaries built by the Haskell compiler GHC. pa is essentially a compiler for the MediaWiki markup language: it parses the input into a tree and writes that tree out as LaTeX. The problem with MediaWiki markup is that it allows improperly nested tags, so it is not context-free; hence there is no BNF grammar for it, which rules out all the usual parsers, and you need a more obscure technique such as monadic parser combinators in Haskell.
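The improper bracketing problem is easiest to see with overlapping inline tags such as `<b>a<i>b</b>c</i>`. wb2pdf handles this with Haskell parser combinators; the Python sketch below swaps in a different, simpler technique, a stack that closes and then reopens overlapping tags so the output nests properly. It supports only `<b>` and `<i>` and does not escape the text.

```python
import re

# Only <b> and <i> are handled in this sketch; text is not LaTeX-escaped.
LATEX_OPEN = {"b": r"\textbf{", "i": r"\textit{"}
TAG = re.compile(r"<(/?)(b|i)>")


def misnested_to_latex(src: str) -> str:
    out = []
    stack = []  # names of currently open tags, innermost last
    pos = 0
    for m in TAG.finditer(src):
        out.append(src[pos:m.start()])  # plain text before this tag
        pos = m.end()
        closing, name = m.group(1) == "/", m.group(2)
        if not closing:
            stack.append(name)
            out.append(LATEX_OPEN[name])
        elif name in stack:
            # Close every tag opened after `name`, remembering them ...
            reopened = []
            while stack[-1] != name:
                reopened.append(stack.pop())
                out.append("}")
            stack.pop()
            out.append("}")
            # ... then reopen them so the overlap becomes proper nesting.
            for inner in reversed(reopened):
                stack.append(inner)
                out.append(LATEX_OPEN[inner])
        # A closing tag with no matching opener is silently dropped.
    out.append(src[pos:])
    out.append("}" * len(stack))  # close whatever is still open at the end
    return "".join(out)
```

For example, `<b>a<i>b</b>c</i>` becomes `\textbf{a\textit{b}}\textit{c}`: the italic run is split in two so that every LaTeX group closes in the order it was opened.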

But since you seem to have a good idea of where each file should go, maybe you could give me some hints; that would make my work much easier.
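Under the Filesystem Hierarchy Standard, internal programs not meant to be run directly by users go in /usr/lib/<package>, architecture-independent data in /usr/share/<package>, and Debian's font policy places TrueType fonts under /usr/share/fonts/truetype/. The Python sketch below shows one way for a program to locate its installed files instead of copying them to /tmp; the layout keys and the `WB2PDF_PREFIX` override variable are hypothetical, and the font entry assumes the merged font is TrueType.

```python
import os
from pathlib import Path

# Hypothetical layout, relative to the installation prefix.
LAYOUT = {
    "helpers": "lib/wb2pdf",                 # internal binaries, e.g. pa, geturl
    "data": "share/wb2pdf",                  # templates, style files
    "fonts": "share/fonts/truetype/wb2pdf",  # the merged Unicode font
}


def resource_dir(kind: str, prefix: str = "/usr") -> Path:
    """Return where files of the given kind are installed.

    The (hypothetical) WB2PDF_PREFIX environment variable overrides the
    prefix, e.g. to run from a build tree during development.
    """
    root = os.environ.get("WB2PDF_PREFIX", prefix)
    return Path(root) / LAYOUT[kind]
```

Keeping every path behind one lookup function means a packager only has to change the layout table, not path names scattered through the code.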

> And why do you have copies in both bin/ and dist/build? And why are they different? There are also plenty of build artifacts in there.

I will remember this for future versions of the .deb file. Essentially I only need the files in the bin directory; the ones in the build directory are just created by the GHC build tools.

Yours Dirk

Dirk Hünniger

18 Jun

5:01 p.m.

> You mean the mega font? That's actually 207 MB uncompressed :) That should probably go to a different package (and depend on it). I don't see why it couldn't fall back to another available font if it's not available, though.

I could indeed work without that font. But in that case I would have to generate font-switching commands in the LaTeX file. This means it won't compile with pdflatex, since that does not allow font switching inside headings. Furthermore, the LaTeX file would become significantly less readable. I also cannot move the fonts into a separate package, since, as I have just found out, the Debian project is not going to accept such a package. So, from my point of view, it is essentially not possible to create a significantly better .deb file.

Yours Dirk

Platonides

16 Jun

9:23 p.m.

On 16/06/12 10:51, Dirk Hünniger wrote:

> This problem is actually solved: there is an easy way to export MediaWiki articles to LaTeX and PDF.
>
> See http://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf

How does it compare with http://www.mediawiki.org/wiki/Extension:Wiki2LaTeX ?

Also, are you aware that you're replying to an 8-year-old thread?

Dirk Hünniger

10:01 p.m.

On 06/16/2012 05:53 PM, Platonides wrote:

> How does it compare with http://www.mediawiki.org/wiki/Extension:Wiki2LaTeX ?

I invested much more time in its development, so it is probably more complete. If you really want to know, I can make a feature-by-feature list, but it's going to be very long.

Just to give you an idea of how deeply I went into detail, here is a question I had to think about. If a table is very wide, it has to be set in landscape, but if it is nested inside another table it must not be. If it is very long, it has to span several pages. And if it begins with a run of consecutive rows that each contain at least one header cell, those rows have to be repeated at the top of each new page of the table. And, by the way, what happens if these cells contain footnotes?

Sounds like fun?
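The table rules listed above can be written down as a small decision function. In this Python sketch the table model, the page width and the rows-per-page threshold are hypothetical (a real converter would measure typeset widths and heights), and the footnote question is left open.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    text: str
    is_header: bool = False


@dataclass
class Table:
    rows: list[list[Cell]]
    nested: bool = False   # True if this table sits inside another table's cell
    width_cm: float = 0.0  # estimated natural width (hypothetical measurement)


@dataclass
class Layout:
    landscape: bool
    multipage: bool
    repeated_header_rows: int


PAGE_WIDTH_CM = 16.0    # hypothetical usable text width in portrait
MAX_ROWS_PER_PAGE = 40  # hypothetical; a real tool would measure row heights


def plan_layout(table: Table) -> Layout:
    # Very wide tables are rotated to landscape, but never nested ones.
    landscape = table.width_cm > PAGE_WIDTH_CM and not table.nested
    # Very long tables must be allowed to break across pages.
    multipage = len(table.rows) > MAX_ROWS_PER_PAGE
    # The leading run of rows that each contain at least one header cell
    # is repeated at the top of every page the table continues on.
    header_rows = 0
    for row in table.rows:
        if not any(cell.is_header for cell in row):
            break
        header_rows += 1
    return Layout(landscape, multipage, header_rows if multipage else 0)
```

Keeping these decisions separate from LaTeX generation makes each rule testable on its own before any document is typeset.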

An important advantage for users is that they can use it immediately with Wikipedia, Wikibooks, etc., because it runs on the client side.

Wiki2LaTeX, on the other hand, runs on the server side, which means it has to be installed by the wiki's administrator.

I will also provide a server-side version of my software if requested.
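Running on the client side means a converter only needs what any reader can get: the article's wikitext, which MediaWiki serves via `index.php?title=<Page>&action=raw`. A minimal Python sketch of that first step follows; the base URL and User-Agent string are placeholders, and this is not necessarily how wb2pdf's geturl helper works.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def raw_url(script_path: str, title: str) -> str:
    """URL of a page's raw wikitext, e.g. script_path='https://en.wikipedia.org/w'."""
    query = urlencode({"title": title, "action": "raw"})
    return f"{script_path.rstrip('/')}/index.php?{query}"


def fetch_wikitext(script_path: str, title: str) -> str:
    # Wikimedia sites expect a descriptive User-Agent; this one is a placeholder.
    req = Request(
        raw_url(script_path, title),
        headers={"User-Agent": "wiki-export-sketch/0.1 (contact: you@example.org)"},
    )
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Because this uses only a public read endpoint, nothing has to be installed on the wiki, which is the trade-off against a server-side extension like Wiki2LaTeX.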

Yours Dirk

participants (6)

Dirk Hünniger
Elisabeth Bauer
Hugo Vincent
Magnus Manske
Platonides
Svip