After a few days of trying various ways to import a large number of pages into my local MediaWiki (including writing my own import script, before I realized MediaWiki has built-in export/import capabilities), I've come up with a procedure that works. Below are the steps, plus solutions to common problems that occur when importing.
- Go to the Special:Export page of the source wiki (e.g. http://en.wikipedia.org/wiki/Special:Export) and enter the articles you want to export. Leave "Include templates" unchecked, then click Export. When I did select "Include templates", the XML file simply contained the template syntax in the pages that use templates, but _not_ the actual source templates. The result was a lot of ugly syntax in the imported pages, which made me wonder whether the import had worked. (Incidentally, most pages on Wikipedia seem to use templates extensively; I only realized this when I imported with the "Include templates" option.)
- Now go to the Special:Import page of the target wiki, choose the XML file you just downloaded, and press Import.
Common problems and their fixes:
- Session timeout - the file is too big. Try importing in smaller batches. (The same applies if you get a maximum file upload size error.)
- Fatal error: Allowed memory size of nnnnnnn bytes exhausted (tried to allocate nnnnnnnn bytes) - this is a PHP error; raising memory_limit in php.ini usually resolves it.
- Error in fetchObject()": Illegal mix of collations for operation ' IN ' (localhost) - this is a MediaWiki problem. Luckily there's a quick fix: set $wgDBmysql5 = false; in LocalSettings.php.
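The export step and the "smaller batches" advice above can be scripted. The following is only a rough sketch and not something from my original procedure: it assumes the source wiki accepts the `pages`, `curonly`, `templates` and `action=submit` parameters described in MediaWiki's documentation for Special:Export, and the helper names (`batches`, `build_export_request`), the example titles and the batch size are made up for illustration. It builds the requests without sending them; passing each one to `urllib.request.urlopen` would fetch the XML.

```python
from urllib.parse import urlencode
from urllib.request import Request


def batches(titles, size):
    """Yield successive lists of at most `size` titles."""
    for start in range(0, len(titles), size):
        yield titles[start:start + size]


def build_export_request(index_url, titles, include_templates=False):
    """Build (but do not send) a POST request to Special:Export.

    index_url is the wiki's index.php URL; that layout is an assumption
    and differs between wikis.
    """
    url = index_url + "?" + urlencode({"title": "Special:Export"})
    form = {
        "pages": "\n".join(titles),  # one page title per line
        "curonly": "1",              # current revision only
        "action": "submit",
    }
    if include_templates:
        form["templates"] = "1"
    return Request(url, data=urlencode(form).encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical titles; a small batch size keeps each XML file small.
    titles = ["Python (programming language)", "MediaWiki", "Wiki"]
    for number, batch in enumerate(batches(titles, 2), start=1):
        request = build_export_request("https://en.wikipedia.org/w/index.php", batch)
        print(number, request.get_method(), request.full_url, len(batch), "titles")
```

Keeping each batch small is what avoids the session timeout and upload size errors on the import side; the right size depends on your PHP limits, so treat the value here as a placeholder.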
For completeness, here's the Python import script I wrote. It seems to work, but no guarantees. It's always better to try the MediaWiki import first.
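To give an idea of what any such script has to do first, here is a minimal sketch of reading a Special:Export XML file. It is not the script mentioned above, just an illustration under a few assumptions: the function names and the sample document are hypothetical, the XML namespace URI differs between MediaWiki versions (so the code ignores it), and what to do with each page afterwards is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed export file; real exports carry more fields.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><text>Hello {{Infobox}}</text></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_pages(xml_text):
    """Return (title, wikitext) pairs, one per <page> element.

    If a page has several revisions, the last <text> listed wins.
    """
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        title, text = None, ""
        for node in page.iter():
            name = local_name(node.tag)
            if name == "title":
                title = node.text
            elif name == "text":
                text = node.text or ""
        pages.append((title, text))
    return pages


if __name__ == "__main__":
    for title, text in read_pages(SAMPLE):
        print(title, len(text), "characters of wikitext")
```

Note that the wikitext comes back with its template calls (such as `{{Infobox}}` in the sample) intact, which is the "ugly syntax" you see on the target wiki when the templates themselves are missing.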
I keep coming up with this error:
Error in fetchObject()": Illegal mix of collations for operation ' IN ' (localhost)
Unfortunately I couldn't find the "quick fix" at the link you provided. Can you help me out?
Any chance you know what steps I would need to make to download ALL TEMPLATES FROM WIKIPEDIA? I would like them all... yes. I've looked everywhere... starting to get desperate...
uno: I am very sorry for the late reply, I hope this can still be of help. It appears that the page I was linking to has changed, but I looked up the revision history and the fix is the following: you need to set $wgDBmysql5 = false; in LocalSettings.php.
randomblink: I didn't attempt to download the actual templates so I'm not sure how you would go about doing it; perhaps someone on the #mediawiki IRC channel might know.
Thanks a lot for this post. Helped me to sort an issue I had importing infobox templates into my wiki installation.