Exporting MediaWiki sites to Google Sites

In the last post I talked about moving files from SharePoint to Google Sites.

What about a MediaWiki site?
How do you export it and upload it?
I’ve already covered uploading files to Google Sites via the Google Sites API:
you can write a script in Java or Python to mass-upload all your files and HTML pages to your new Google site.
But first you need to export everything out of the MediaWiki site.

Here is a working method to export your site. These are the steps:

  1. Install Python
  2. Download the mw2html script
  3. Export the MediaWiki site

So let’s start:

Installing Python

If you are working on a Linux distro with yum, it’s as easy as typing:
yum install python

For Windows, you will need to download and install it from the Python site:
the main Python download page for Windows
or you can just grab the latest version as of September 4, 2011, Python 3.2.2
(mw2html is an older script, so check which Python version it expects before you choose; my own install below is Python 2.7).
Download and install the MSI package.

The next step is to add the installation to the computer’s Path.
In Windows 7, open the Start menu and type in the search box:
“view advanced system settings”

Click the “Environment Variables” button.

In the System Variables section, find the “Path” line and choose Edit.

Now add the path where you installed Python to the end of the line.
For example, I installed it to C:\Python27, so I add this at the end of the Path line:
;C:\Python27
Notice the “;”: it separates the items on the Path line.

Now if you type “python -V” on your command line, it will run the Python binary and show you your version.
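
If “python -V” is not recognized, the Path entry is the first thing I would check. Here is a minimal sketch (written for Python 3; the function name is_on_path is my own and not part of any tool mentioned here) that tests whether a directory appears in a Path-style string:

```python
import os


def is_on_path(directory, path_value=None, sep=os.pathsep):
    """Return True if `directory` is one of the entries of a PATH-style string.

    path_value defaults to the current PATH environment variable, and
    sep defaults to the platform separator (";" on Windows, ":" elsewhere).
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    wanted = os.path.normcase(os.path.normpath(directory))
    for entry in path_value.split(sep):
        entry = entry.strip().strip('"')  # tolerate spaces and quoted entries
        if entry and os.path.normcase(os.path.normpath(entry)) == wanted:
            return True
    return False


if __name__ == "__main__":
    # C:\Python27 is the install folder used in this post; change it to yours.
    print(is_on_path(r"C:\Python27"))
```

Remember to open a new command prompt after editing the Path, since windows that are already open keep the old value.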

Now that we have Python working, let’s move to the next step:

Download the mw2html script

You can find it here:

on Connelly Barnes’ blog

This script crawls your whole MediaWiki site and grabs all the HTML pages and all the attachments there.

Exporting the MediaWiki site

Usage:
url – URL of mediawiki page to convert to static HTML.
outdir – Output directory.

-f, --force – Overwrite existing files in outdir.
--no-flatten – Do not flatten directory structure.
--no-lower – Retain original case for output filenames and dirs.
--no-clean – Do not clean up filenames (clean replaces
non-alphanumeric chars with _, renames math thumbs).
--no-hack-skin – Do not modify skin CSS and HTML for looks.
--no-made-by – Suppress “generated by” comment in HTML source.
--no-move-href – Disable <movehref> tag. [1]
--no-remove-png – Retain external link PNG icons.
--no-remove-history – Retain image history and links to information.
-l, --left=a.html – Paste HTML fragment file into left sidebar.
-t, --top=a.html – Paste HTML fragment file into top horiz bar.
-b, --bottom=a.html – Paste HTML fragment file into footer horiz bar.
-i, --index=filename – Move given filename in outdir to index.html.

Example Usage:
mw2html http://127.0.0.1/mywiki/ out -f -i main_page.html -l sidebar.html
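
If you prefer to start the export from a script, the example command above can be assembled like this. This is only a sketch: it assumes the script is saved as mw2html.py in the current directory and that “python” on your Path is the interpreter mw2html needs. The helper names are hypothetical, and only the -f, -i and -l options from the usage list are covered.

```python
import subprocess


def build_mw2html_command(url, outdir, force=False, index=None, left=None,
                          python_exe="python", script="mw2html.py"):
    """Build the argument list for one mw2html run.

    python_exe and script are assumptions about your setup; adjust them
    to wherever Python and the downloaded script actually live.
    """
    cmd = [python_exe, script, url, outdir]
    if force:
        cmd.append("-f")        # overwrite existing files in outdir
    if index:
        cmd += ["-i", index]    # move this file in outdir to index.html
    if left:
        cmd += ["-l", left]     # paste this HTML fragment into the left sidebar
    return cmd


def run_export(cmd):
    """Run the command and return its exit code."""
    return subprocess.call(cmd)


if __name__ == "__main__":
    command = build_mw2html_command("http://127.0.0.1/mywiki/", "out",
                                    force=True, index="main_page.html",
                                    left="sidebar.html")
    print(" ".join(command))
    # Uncomment to actually start the crawl:
    # run_export(command)
```

Passing the arguments as a list avoids quoting problems when the URL contains characters such as “&”, which matters for the orphan pages link further down.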

 Important note:

This script crawls your wiki site by following links, so it misses every page that nothing links to:
the “orphan” pages. To solve this, instead of pointing the mw2html script at the main page, you can point it at the orphaned pages list.
It will then get all the regular pages from your site as well as the orphan pages.
The link looks like:

http://yourwikisite/wiki/index.php?title=Special:Lonelypages&limit=500&offset=0
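
A link like this can also be built in code, which helps if your wiki has more orphans than one page of the list can show. The sketch below is Python 3; lonely_pages_url is a name I made up, and index_php_url stands for wherever your wiki’s index.php lives.

```python
from urllib.parse import urlencode


def lonely_pages_url(index_php_url, limit=500, offset=0):
    """Return the URL of the orphaned pages list for a wiki.

    index_php_url is the address of the wiki's index.php, for example
    http://yourwikisite/wiki/index.php (a placeholder, as in the post).
    """
    query = urlencode(
        {"title": "Special:Lonelypages", "limit": limit, "offset": offset},
        safe=":",  # keep the colon in "Special:Lonelypages" readable
    )
    return index_php_url + "?" + query


if __name__ == "__main__":
    print(lonely_pages_url("http://yourwikisite/wiki/index.php"))
    # Second page of the list, if the first 500 entries are not enough:
    print(lonely_pages_url("http://yourwikisite/wiki/index.php", offset=500))
```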

Now that you have a folder with all your wiki pages, you can edit its contents:
remove all the files starting with the word “image_”,
remove all the unneeded JS files,
and make internal changes in the HTML files, such as changing the links that point to the original wiki site so they point to the new Google Sites URL.
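
The cleanup steps above can be sketched as a small Python 3 script. It rests on several assumptions of mine: the exported pages are UTF-8, every .js file is unneeded unless you list it in keep_js, and links can be fixed with a plain text replacement of the old base URL. The function name and both example URLs are placeholders.

```python
import os


def clean_export(outdir, old_base, new_base, keep_js=()):
    """Tidy an exported wiki folder in place.

    Deletes files whose names start with "image_", deletes .js files that
    are not named in keep_js, and replaces old_base with new_base in every
    .html/.htm file. Returns (removed_paths, rewritten_paths).
    """
    removed, rewritten = [], []
    for root, _dirs, files in os.walk(outdir):
        for name in files:
            path = os.path.join(root, name)
            lower = name.lower()
            if lower.startswith("image_") or (
                    lower.endswith(".js") and name not in keep_js):
                os.remove(path)
                removed.append(path)
            elif lower.endswith((".html", ".htm")):
                # Assumes UTF-8; undecodable bytes are replaced, not preserved.
                with open(path, encoding="utf-8", errors="replace") as fh:
                    text = fh.read()
                updated = text.replace(old_base, new_base)
                if updated != text:
                    with open(path, "w", encoding="utf-8") as fh:
                        fh.write(updated)
                    rewritten.append(path)
    return removed, rewritten


if __name__ == "__main__":
    gone, changed = clean_export(
        "out",
        "http://yourwikisite/wiki/",               # old wiki base URL (placeholder)
        "https://sites.google.com/site/yoursite/", # new site URL (placeholder)
    )
    print(len(gone), "files removed,", len(changed), "pages rewritten")
```

Run it on a copy of the export folder first, since it deletes and rewrites files in place.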

Now you can use the Google Sites API to upload all your HTML to your new site.

  
