fmmap version 2.0.0

I introduced fmmap on my blog about two months ago. fmmap is a Python module that can be used instead of the built-in mmap module and offers better performance. I recently released version 2.0.0, and I thought I would share some of the improvements of the last two months.

I implemented an .rfind() method that is usually much faster than the built-in implementation. There is a benchmark script in the repository to compare the performance of .find() and .rfind() to CPython. Future versions might give even more attention to performance.
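
In case you haven’t tried it yet, here is a minimal sketch of how this is meant to be used, on the assumption that fmmap really is the drop-in replacement for the built-in module that it aims to be (the file name is just an example):

    import fmmap as mmap  # drop-in replacement for the built-in mmap module

    with open("big.log", "rb") as f:  # hypothetical file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            first = mm.find(b"ERROR")   # accelerated forward search
            last = mm.rfind(b"ERROR")   # the new, faster backward search
            print(first, last)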

The project started out on just my Linux workstation, but it is now tested on multiple operating systems:

  • Linux
  • FreeBSD, NetBSD, OpenBSD
  • Solaris (I tested on illumos/OpenIndiana)
  • macOS
  • Windows

The continuous integration systems currently test on Ubuntu 16.04 and 18.04, macOS 10.15 and Windows Server 2019. I do further manual checks on Mageia 7, RHEL 7, FreeBSD 11.3, NetBSD 8.2, OpenBSD 6.6, OpenIndiana 5.11, macOS 10.14 and Windows 10. The continuous integration systems and many of the Unix systems are tested on virtual machines.

The project got support for platforms that don’t have the C library functions that fmmap uses for searching. This was required to support Solaris, macOS and Windows.

Advice flags for madvise(2) were added for all supported platforms, both as a backport for older Python versions and to expose the system-specific values that CPython does not support.
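
A sketch of what that enables, assuming fmmap exposes the MADV_* constants in the same way CPython’s mmap module does (the file name is again just an example):

    import fmmap as mmap

    with open("big.log", "rb") as f:  # hypothetical file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # .madvise() only arrived in CPython 3.8; fmmap backports it and
            # the advice constants, so this also works on older Python versions.
            mm.madvise(mmap.MADV_SEQUENTIAL)
            data = mm.read()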

The project now supports Python versions 3.4–3.9. Not all combinations of operating system and Python version are tested, but of the 7 × 6 matrix (operating systems × Python versions, 42 combinations), 23 combinations are tested. Although there is some room for improvement, most of the untested combinations involve Python 3.4 (which most platforms no longer support) and Python 3.9 (which is not officially released yet). If we ignore those two versions, 20 of the 28 combinations are tested.

I have a few more things in mind that I would like to do going forward. Although I am mostly happy with the searching performance, there is still some room for improvement due to varying performance characteristics of the functions in the C libraries. (I read through the relevant implementations of all the C libraries with code available.) I am not sure if it is worth spending much more time on performance, but maybe there are some meaningful gains left. Although fast searching was my reason for starting the project, I am also considering expanding the API slightly to support a few more use cases for memory maps.

Django compression middleware mentioned in the Django speed handbook

A while ago a colleague pointed me to the Django speed handbook. This performance guide for Django has advice for different layers of a typical Django application: the database, the network and the front-end. It looks like a collection of good advice for Django developers to speed up their web sites.

My project, Django compression middleware, is recommended for the compression of responses from the server. It was nice to see my project referred to as “the excellent django-compression-middleware library”. Thank you to Shibel K. Mansour!

Introducing fmmap

I recently polished some code I had lying around and can now introduce fmmap. It is a Python module that can be used instead of the built-in mmap module and offers better performance. My own interest was specifically in a faster .find() method. The “f” in fmmap might refer to “find”, “fast”, or someone’s name.

Memory mapping is a way of accessing a file as if it were just an array in memory. No explicit file reading or writing is required: as you access this area of memory, the operating system manages the input and output to the underlying file as necessary. In some circumstances this can result in better performance.
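
A minimal illustration with the built-in module (the file name is made up):

    import mmap

    with open("data.bin", "r+b") as f:  # hypothetical file
        with mmap.mmap(f.fileno(), 0) as mm:
            header = mm[:4]            # slice the file like a bytes-like object
            mm[4:8] = b"\x00" * 4      # writes are carried through to the file
            offset = mm.find(b"\x7fELF")  # search without reading it all in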

A few years ago I tried mmap in a toy program, and got some performance gains. Then I noticed that the .find() method in CPython, while implemented in C, used a naive algorithm, and I wanted to see if I could improve performance more. First I tried to implement some of glibc’s algorithms in a Cython module, but eventually got the best performance by using the optimised functions in glibc. The whole process also refreshed and improved some of my knowledge of the C APIs for strings and memory.
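
To give an idea of what “using the optimised functions in glibc” means, here is a rough ctypes sketch of delegating a search to the C library’s memmem. This is not how fmmap does it; it only shows the idea, and note that memmem is not available in every C library:

    import ctypes
    import ctypes.util

    # find_library() can return None on some platforms, and memmem may be
    # missing from the C library, in which case this sketch will not work there.
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    libc.memmem.restype = ctypes.c_void_p
    libc.memmem.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                            ctypes.c_void_p, ctypes.c_size_t]

    def memmem_find(haystack: bytes, needle: bytes) -> int:
        """Offset of needle in haystack, or -1, via the C library's memmem."""
        hay = ctypes.create_string_buffer(haystack, len(haystack))
        ndl = ctypes.create_string_buffer(needle, len(needle))
        found = libc.memmem(ctypes.cast(hay, ctypes.c_void_p), len(haystack),
                            ctypes.cast(ndl, ctypes.c_void_p), len(needle))
        return found - ctypes.addressof(hay) if found else -1

    print(memmem_find(b"hello world", b"world"))  # 6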

I then decided to take this code out into a project of its own, and figured that providing a drop-in replacement for the built-in mmap module would be the most pleasant way to expose it to other people. This of course brings all of the advantages and the not so pleasant overhead of project infrastructure, tests, CI setup, and rediscovering how to package Python packages in an ever-changing environment. However, I’m subclassing the built-in class, so I only had to implement the parts I’m interested in (or so I thought). In an attempt to do this well, I took the tests from the standard library (CPython) to test my implementation against. The tests from the PyPy project are not exactly the same, so I decided to drop those in as well. That ended up being a good decision, as they caught a few bugs that the tests in the standard library did not.

The test suite in the standard library develops over time, and the current version in git is aimed at the upcoming release (Python 3.9). In an attempt to be a good citizen of the Python world, I try to support as many Python versions as are feasible. However, each Python version added tests that don’t work on previous versions due to bug fixes, features and API changes, so I can’t test older Python versions (not even Python 3.8) with this single test suite without some difficulty. And so my project, which wanted to expose one function to the world, became a project to backport all the latest features. According to the test suite, it now works on Python 3.5–3.8, and the one remaining failure on Python 3.4 should be easy to fix as part of providing an improved version of .rfind().

I haven’t yet done proper benchmarking, but you might find .find() substantially faster with fmmap. The exact performance is heavily dependent on your C library. Let me know how it works for you!

My talk at PostgresConfZA

I presented a talk about PostgreSQL performance in the amaGama project today at South Africa’s Postgres Conference. The talk gave some background about translation memory systems and covered how the amaGama project uses PostgreSQL’s full-text functionality.

As you can read in the summary, I managed to make a vast improvement to the database performance. Users of our hosted instance have already benefited from the improved performance over the last few weeks.

An interesting aspect of this work is how the partially overlapping partial indexes are complemented by a better physical layout on disk (achieved with the CLUSTER command). Performance is improved with both a hot and a cold disk cache.

I trust a video will be made available in due course. Congratulations to the organizers on a very nice event!

(New in October 2020: a video of the talk is available now.)

Refreshing amaGama

I recently started working again on improving the amaGama translation memory server and service. The project provides a translation memory system that is used by translation tools such as Pootle and Virtaal. The web service that the Translate project hosts contains translations of several popular pieces of free and open source software. This provides translators in over a hundred languages with suggestions from previous translation work in FOSS localisation. Several areas of amaGama require work, and I wanted to prioritise well so as to reach a number of goals.

Firstly, the server itself didn’t receive the attention it needed over the last while. The service was not responding at all, and a number of updates were necessary. I’ve already upgraded the operating system, but a review of the system configuration was also required. Users of Virtaal will be happy to know that I implemented the necessary changes so that the amaGama plugin in Virtaal is working again. On the server, things are working at least as well as before, and better in a few areas.

Performance of the service has been inconsistent for many years. There are a number of reasons, including the server configuration and the code itself. I’ve often seen some requests taking more than ten seconds. A translation memory response arriving that late is unlikely to be useful: by that time I have probably translated the whole segment from scratch and moved on. I believe most users need a response in less than a second. Since network latency alone can take a significant share of that, we really need the web service itself to be as fast as possible.

I hope to write soon about interesting changes in the code to improve performance, but I already improved things with simple configuration changes. While certain database queries are still slow, handling requests concurrently means that other users are mostly unaffected, which reduces the impact of a performance problem in any one request. Before, the server only handled a single request at a time; I have no idea why it was configured like that. Some requests still take more than ten seconds, but this does not happen as frequently any more. The slow responses deserve a blog post or two of their own, and I’m still working on that. (Update: Since then I spoke about this at PostgresConfZA.)
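
I won’t describe the exact server stack here, but as a purely hypothetical illustration: if the WSGI service were served by gunicorn, the difference between one request at a time and concurrent requests can be as small as a few lines of configuration. All names and numbers below are illustrative, not the actual amaGama settings:

    # gunicorn.conf.py -- hypothetical illustration, not the actual setup
    import multiprocessing

    # Several worker processes instead of one, so that a slow database query
    # in one request no longer blocks every other request.
    workers = multiprocessing.cpu_count() * 2 + 1
    threads = 2              # a little extra concurrency per worker
    timeout = 30             # give up on pathologically slow requests
    bind = "127.0.0.1:8000"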

The current database (the memory of translations) on the server is pretty old by now. I’ve started working on refreshing all of that data as well. That is almost a whole project in itself! Many projects moved their version control systems in the last few years, and in some cases I can’t easily find some of the things we included in the database before. If there are specific projects you think should be included in amaGama, feel free to contact me.

Another goal in all of my work is to invest in making things easier in future. The server configuration is simpler, the configuration of the web service has been moved out of the code, and so on. Hopefully this means that a small volunteer group (even if it is just me) can keep this going for a long while still.

Django compression middleware now supports Zstandard

I think I first learnt about Zstandard (zstd) in a blog post by Gregory Szorc. At some stage I saw that zstd is also registered with IANA’s content coding registry for HTTP and I tried to find out how much of the web ecosystem already supported it. At that time there was a patch for zstd support in Nginx, but nothing else, as I recall.

Things are not much better right now, but zstd has continued maturing and has been adopted for non-web use by many projects. I recently checked and found one HTTP client, wget2, that claims support for zstd. So I decided to add support for zstd to Django compression middleware and to test it with wget2. With wget2 I can be sure that at least one web client is able to consume what the middleware provides.

I released version 0.3.0 of Django compression middleware a few days ago with support for zstd. Since I don’t know of any browsers that support it yet, I don’t expect many people to be excited about this. There isn’t even information on the Can I use … website about zstd yet (github issue). However, I see this as my small contribution to the ecosystem.
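
For Django users who want to try it, enabling the middleware is a small change to settings. To the best of my knowledge the import path below is correct, but check the project’s README for the authoritative version:

    # settings.py
    MIDDLEWARE = [
        # Put the compression middleware near the top so that it compresses
        # the response produced by the rest of the stack.
        "compression_middleware.middleware.CompressionMiddleware",
        "django.middleware.security.SecurityMiddleware",
        # ... the rest of your middleware ...
    ]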

It is not clear that Zstandard will provide a massive win over the alternatives in all cases, but my testing on multiple HTML and JSON files suggests that it is mostly equal to or better than Brotli and basically always better than gzip with the defaults I currently use. “Better” here means a smaller payload produced in the same or less time.
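
My testing was not a formal benchmark, but the gist of the size comparison is easy to reproduce with something like the sketch below. It assumes the third-party zstandard and brotli packages are installed, and it uses the libraries’ default levels rather than whatever defaults the middleware happens to use:

    import gzip

    import brotli     # pip install brotli
    import zstandard  # pip install zstandard

    with open("page.html", "rb") as f:  # any HTML or JSON file you care about
        payload = f.read()

    sizes = {
        "gzip":   len(gzip.compress(payload)),
        "brotli": len(brotli.compress(payload)),
        "zstd":   len(zstandard.ZstdCompressor().compress(payload)),
    }
    for name, size in sorted(sizes.items(), key=lambda item: item[1]):
        print(f"{name:7} {size:10,d} bytes ({size / len(payload):.1%} of original)")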

Django compression middleware now supports zstd, Brotli and gzip.

Does the internet understand your language?

I heard an advertisement on the radio this morning where a son is talking to his father about some commercial service. He points the dad to the web address, and—since the ad is in Afrikaans and he spells the address in English—mentions “Die internet verstaan nie Afrikaans nie” (“The internet doesn’t understand Afrikaans”). I’m thinking of the reasons why whoever wrote the ad felt it would somehow improve things to add that bit to the copy.

Of course, I mostly agree—the Internet doesn’t understand Afrikaans, but neither does it understand English or any other language. Maybe the organisation just feels a bit bad that they don’t have an Afrikaans presence on the web, or might not even know how easy it is to register another domain name as an alias to their main website.

On the other hand, software processing information on the web is able to do amazing things with it, in English, Afrikaans and other languages. I’m not trying to belittle the fact that technology support for languages is not equal, but domain names are just characters: you can type in whatever you want (ignoring the complexities of internationalised domain names for now).

Working with language data is my bread and butter, so it was an unfortunate reminder of the common perceptions about language and technology. I hope some people listening to that questioned it, or at least started thinking about how it could be changed.

My paper at OLC / DEASA

Yesterday I presented at the Open Learning Conference of the Distance Education Association of Southern Africa. The title of my paper is “Re-evaluation of multilingual terminology”. I tried to make the case that terminological resources can serve as more than reference resources, and I showed concrete examples of how they can also assist with conceptual modelling.

Ontology engineering is big business in the field of natural language processing, but I routinely still meet academics who think that terms with translations (maybe with definitions) are the highest goal we should strive for. My presentation was an attempt to provide a broader vision.