MWoffliner

MWoffliner is a tool for making a local offline HTML snapshot of any online MediaWiki instance. It goes through all online articles (or a selection, if specified) and creates the corresponding ZIM file. It has mainly been tested against Wikimedia projects like Wikipedia and Wiktionary, but it should also work with any recent MediaWiki.

Read CONTRIBUTING.md to know more about MWoffliner development.

User Help is available in the form of a FAQ.

Features

  • Scrape with or without image thumbnails
  • Scrape with or without audio/video multimedia content
  • S3 cache (optional)
  • Image size optimiser / WebP converter
  • Scrape all articles in the main namespace, or only those from a title list
  • Specify additional/non-main namespaces to scrape

Run mwoffliner --help to get all the possible options.
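
For instance, here is a minimal sketch of a selection-based invocation; the flag names mirror the parameter names used in the API example below (mwUrl, adminEmail, articleList, format), user@example.com is just a placeholder, and the exact flag set is best confirmed via --help:

mwoffliner \
  --mwUrl="https://es.wikipedia.org" \
  --adminEmail="user@example.com" \
  --articleList="./articleList" \
  --format="nopic"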

Prerequisites

  • *NIX Operating System (GNU/Linux, macOS, ...)
  • Redis
  • Node.js version 24 (we support only a single Node.js version; other versions may or may not work)
  • Libzim (on GNU/Linux & macOS we download it automatically)
  • Various build tools which are probably already installed on your machine (packages libjpeg-dev, libglu1, autoconf, automake, gcc on Debian/Ubuntu)

... and an online MediaWiki with its API available.
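
On Debian/Ubuntu, for example, Redis and the build tools listed above can usually be installed in one go. This is a sketch only: redis-server as a package name is our assumption, and Node.js 24 is best installed separately (e.g. via nvm, assuming nvm is available):

# Redis plus the build tools mentioned in the list above
sudo apt-get install redis-server libjpeg-dev libglu1 autoconf automake gcc
# Node.js 24, e.g. via nvm
nvm install 24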

Usage

To install the latest released MWoffliner version from the NPM repository (use -g to install globally, not only in the current folder):

npm i -g mwoffliner

Warning

Note that you might need to run this command with sudo, depending on how your npm / OS is configured. npm permission checking can be a bit annoying for newcomers. Please read the documentation carefully if you hit problems: https://docs.npmjs.com/cli/v7/using-npm/scripts#user

Then you can run the scraper:

mwoffliner --help

To use MWoffliner with an S3 cache, you should provide an S3 URL like this:

--optimisationCacheUrl="https://wasabisys.com/?bucketName=my-bucket&keyId=my-key-id&secretAccessKey=my-sac"

Contribute

If you've retrieved mwoffliner source code (e.g. with a git clone of our repo), you can then install and run it locally (including with your local modifications):

npm i
npm run mwoffliner -- --help

Detailed contribution documentation and guidelines are available.

API

MWoffliner also provides an API and can therefore be used as a Node.js library. Here is a stub example that could go in your index.mjs file:

import * as mwoffliner from 'mwoffliner';

const parameters = {
    mwUrl: "https://es.wikipedia.org",
    adminEmail: "[email protected]",
    verbose: true,
    format: "nopic",
    articleList: "./articleList"
};
mwoffliner.execute(parameters); // returns a Promise
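
Since execute() returns a Promise, you will typically await it and handle failures. A minimal sketch, reusing the parameters object above (the error handling here is ours, not part of the MWoffliner API):

try {
    // Top-level await works in an .mjs module on Node.js 24
    await mwoffliner.execute(parameters);
    console.log('ZIM file successfully created');
} catch (err) {
    console.error('Scrape failed:', err);
    process.exit(1);
}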

Background

Complementary information about MWoffliner:

  • MediaWiki software is used by thousands of wikis, the most famous being those of Wikimedia, including Wikipedia.
  • MediaWiki is a PHP wiki runtime engine.
  • Wikitext is the name of the markup language that MediaWiki uses.
  • MediaWiki includes a parser that converts wikitext into HTML; this parser creates the HTML pages displayed in your browser.
  • Have a look at the scraper functional architecture.

License

GPLv3 or later, see LICENSE for more details.

Acknowledgements

This project received funding through NGI Zero Core, a fund established by NLnet with financial support from the European Commission's Next Generation Internet program. Learn more at the NLnet project page.
