Projects

Over the years I have contributed to a number of projects. I list a few of them below with links where possible.

System development projects

  • The Translate Toolkit — a software library and toolset for localisation engineering with support for file formats, quality checks, translation management, translation memory, terminology matching, etc. It also forms the basis of several other localisation tools, such as Pootle and Virtaal.
  • Pootle — a web based localisation management system also supporting online translation. Python, Django, jQuery.
  • Virtaal — a desktop translation tool for which I did much of the design and programming. Python, GTK+, SQLite.
  • The Amagama translation memory server — a large scale translation memory service for software localisation, including its dataset. Python, Flask, PostgreSQL.
  • Terminator — a web based system for terminology development. Python, Django.
  • Django compression middleware — middleware for the Django web framework to implement content encoding (compression).
  • A system for plagiarism detection in university programming assignments.
  • The claims processing system at Discovery Health. The Java software was backed by an Oracle database, and processed claims at the biggest medical aid company in South Africa.
  • A Java system backed by a PostgreSQL database at the Nova Institute for processing interview data and extracting data for reporting and strategic planning.

Projects creating language technology and language resources

  • The Afrikaans Wikipedia
  • The Afrikaans spell checker used in OpenOffice.org/LibreOffice and many other programs (such as Opera, macOS, the Enchant framework and OmegaT)
  • Hyphenation rules for Afrikaans and Zulu for OpenOffice.org/LibreOffice
  • The creation of a bilingual Zulu–Xhosa corpus
  • The Afrikaans dictionary for predictive text on Mozilla’s mobile operating system, Firefox OS
  • Digitisation of old books in South African languages
  • Machine translation between Zulu and English
  • The translations in my software localisation work also contribute to the available resources for Afrikaans in terms of translation memories and terminology.

Localisation projects

Over the years I have contributed Afrikaans translations to many projects. This is part of my work on trying to create and provide as much of an Afrikaans computing environment as I can.

Some contributions were once-off, but with some projects I have stayed involved over many years and releases. In some cases this involved collaboration with others, and often I took over translations from previous localisers. All of these are Free and Open Source software available to anyone. Not all of the translations are constantly kept up to date, though. I’ll be very happy to collaborate with people who want to get involved.

  • Mozilla — Firefox, several Mozilla websites, Firefox OS, etc.
  • LibreOffice — the premier free office suite and successor to OpenOffice.org
  • The GNOME project — popular desktop software for Unix type systems
  • LXDE — a lightweight desktop environment
  • Freedesktop.org (PulseAudio, udisks, shared MIME info) — shared components common to many desktop systems
  • Mageia — a complete distribution of a Linux based system
  • Fedora — another complete distribution of a Linux based system
  • Pidgin — a chat client supporting many networking protocols
  • Audacity — an audio editor
  • Abiword — a simple word processor
  • VLC — a video and audio player
  • GCompris — a suite of educational games for children
  • Tuxpaint — a fun drawing program for kids. An excellent choice for starting with software localisation!
  • NextCloud — a web platform for file sharing and communication
  • Jitsi Meet — a free video conferencing system using WebRTC technology
  • MediaWiki — the software behind Wikipedia and many other sites
  • Django and some Django applications — a Python web framework
  • Nikola — a static blog generator
  • Squid — a web caching proxy
  • elinks — a text based web browser
  • Pootle and Virtaal — localisation tools should be localised too!

Training projects

  • Wikipedia — I have trained people in numerous workshops and one-on-one sessions over the years, among them at Unisa, the Suid-Afrikaanse Akademie vir Wetenskap en Kuns (the Afrikaans academy of science and arts) and SADiLaR.
  • Software localisation
    • I have given training at multiple Translate@thons, among them in Accra (Ghana), Kampala (Uganda), Grahamstown and Durban (South Africa).
    • In-depth in-house training of interns at Translate.org.za.
    • In 2010 I presented the “local language” module for the TILP course (The Institute of Localisation Professionals).
    • On two occasions I mentored students in the GNOME outreach program for women (nowadays called Outreachy).
    • I wrote the book Effecting Change Through Localisation specifically to support training in software localisation with a focus on Free and Open Source Software.
    • In 2011 I presented the workshop “Any language properly supported in CAT tools” at the Conference on Human Language Technology for Development (HLTD) in Alexandria, Egypt.
  • Keyboard input and typing the diacritics of South African languages. Unisa. 2016.
  • Teaching Information Technology and Computer Science. For example, I was a temporary part-time Lecturer in Information Technology at a private university for a few semesters in 2003–2004, and supervised Honours research projects at Unisa. I also do some work as an external examiner.

My preferred tools

Although I’ve done quite a bit of work in Java and C over the years, I use Python most. For my research and machine learning projects I’ve used numpy, scikit-learn, pandas and the surrounding ecosystem quite a bit. For performance improvements, I’ve often reached for multiprocessing, pypy or cython after profiling and benchmarking.

Many scripts have been extended and combined with bash, grep, xargs, sed and the GNU coreutils. I’ve used Linux distributions all of my professional life and am very comfortable with using them as workstation and server. On my own machines I’ve mainly used Mageia and its predecessors (Mandriva, Mandrake), but in virtual machines and on servers I’ve used a lot of other systems too, such as Debian, Ubuntu, Fedora, RHEL and OpenSolaris. For desktop tools I’ve mainly worked in GTK+ code bases, with a bit of exposure to Qt as well.

In the database space I admire SQLite and PostgreSQL in particular. Both of those represent excellent tools with very different goals. I’ve also worked with their full-text indexing, and try to stay up to date with their development news.

For writing I keep getting back to vim. For most of my academic writing I use LaTeX, although I have used LibreOffice Writer quite a bit, including for a complete book. My slides are usually prepared in LibreOffice Impress, although it doesn’t feel quite up to the standard of other components such as Writer. I prefer simple, terse slides, which is mostly why I don’t publish slides for my talks and lectures.

For translation I mostly prefer the tool I helped to design and build myself: Virtaal. Although it is in need of some updates (porting to GTK3 and Python3), it still feels miles ahead of anything else, particularly the online translation tools. Dynamic equivalence of the translation with the source text is often central to my choices. And I obsessively use dictionaries, even when coining new terms related to older ones. These days I’m increasingly using corpus searches too.