fmmap version 2.0.0

I introduced fmmap on my blog about two months ago. fmmap is a Python module that can be used instead of the built-in mmap module and offers better performance. I recently released version 2.0.0, and I thought I would share some of the improvements of the last two months.

I implemented an .rfind() method that is usually much faster than the built-in implementation. There is a benchmark script in the repository that compares the performance of .find() and .rfind() against CPython's built-in mmap. Future versions might give even more attention to performance.
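To give a feel for what such a comparison involves, here is a minimal sketch. It is not the benchmark script from the repository. The helper names make_test_file and time_search are made up for illustration, and the sketch assumes that fmmap.mmap accepts the same constructor arguments as the built-in mmap.mmap. If fmmap is not installed, only the built-in module is timed.

```python
import mmap as std_mmap
import os
import tempfile
import timeit

try:
    import fmmap  # third-party; assumed to mirror the built-in mmap API
except ImportError:
    fmmap = None


def make_test_file(size=1 << 20, needle=b"NEEDLE"):
    """Write a file of zero bytes with `needle` in the middle; return its path."""
    fd, path = tempfile.mkstemp()
    half = size // 2
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * half + needle + b"\0" * half)
    return path


def time_search(module, path, needle=b"NEEDLE", repeat=20):
    """Time .find() and .rfind() on a read-only mapping created by `module`."""
    with open(path, "rb") as f:
        m = module.mmap(f.fileno(), 0, access=std_mmap.ACCESS_READ)
        try:
            offset = m.find(needle)
            # The needle occurs exactly once, so both searches must agree.
            if offset != m.rfind(needle):
                raise AssertionError("find() and rfind() disagree")
            return {
                "offset": offset,
                "find": timeit.timeit(lambda: m.find(needle), number=repeat),
                "rfind": timeit.timeit(lambda: m.rfind(needle), number=repeat),
            }
        finally:
            m.close()


if __name__ == "__main__":
    path = make_test_file()
    try:
        print("built-in mmap:", time_search(std_mmap, path))
        if fmmap is not None:
            print("fmmap:        ", time_search(fmmap, path))
    finally:
        os.remove(path)
```

The timings depend heavily on the platform's C library, so numbers from one machine say little about another.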

The project started out on my Linux workstation, but it is now tested on multiple operating systems:

  • Linux
  • FreeBSD, NetBSD, OpenBSD
  • Solaris (I tested on illumos/OpenIndiana)
  • macOS
  • Windows

The continuous integration systems currently test on Ubuntu 16.04 and 18.04, macOS 10.15 and Windows Server 2019. I do further manual checks on Mageia 7, RHEL 7, FreeBSD 11.3, NetBSD 8.2, OpenBSD 6.6, OpenIndiana 5.11, macOS 10.14 and Windows 10. The continuous integration systems and many of the Unix systems run in virtual machines.

The project gained support for platforms that lack the C library functions that fmmap uses for searching. This was required to support Solaris, macOS and Windows.

Advice flags for madvise(2) were added on all supported platforms, both as a backport for older Python versions and to support the system-specific values that CPython does not expose.
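As a rough illustration of how advice flags are used, the sketch below relies only on the built-in mmap module, where .madvise() and the MADV_* constants exist from Python 3.8 on platforms that provide madvise(2). The helper advise_sequential is a hypothetical name of mine. I would expect the same call to work on an fmmap mapping, but treat that as an assumption rather than documented behaviour.

```python
import mmap


def advise_sequential(m):
    """Hint sequential access if this Python and platform support it.

    Returns True if the hint was passed to the kernel, False otherwise.
    """
    flag = getattr(mmap, "MADV_SEQUENTIAL", None)
    if flag is None or not hasattr(m, "madvise"):
        # Older Python, or a platform without madvise(2): nothing to do.
        return False
    m.madvise(flag)
    return True


# An anonymous one-page mapping is enough to try the call.
with mmap.mmap(-1, mmap.PAGESIZE) as m:
    advised = advise_sequential(m)
```

Checking for the constant at runtime keeps the code portable, since each operating system defines its own set of MADV_* values.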

The project now supports Python versions 3.4–3.9. Not all combinations of operating system and Python version are tested, but of the 7 × 6 matrix (operating systems × Python versions, 42 combinations), 23 combinations are tested. Although there is some room for improvement, many of the untested combinations are for Python 3.4 (which most platforms no longer support) and for Python 3.9 (which is not officially released yet). If we ignore those two versions, 20 of the 28 combinations are tested.
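The arithmetic behind those counts can be checked in a few lines. This uses only the totals quoted above. The post does not list which individual combinations are tested, so the sketch does not try to reconstruct the matrix itself.

```python
platforms = 7           # Linux, FreeBSD, NetBSD, OpenBSD, Solaris, macOS, Windows
versions = 6            # Python 3.4 through 3.9
tested_all = 23         # tested combinations in the full matrix
tested_core = 20        # tested combinations for Python 3.5 through 3.8

total_all = platforms * versions          # full matrix
total_core = platforms * (versions - 2)   # without Python 3.4 and 3.9

coverage_all = round(100 * tested_all / total_all)
coverage_core = round(100 * tested_core / total_core)

# Whatever is left must belong to Python 3.4 and 3.9.
tested_edge = tested_all - tested_core
total_edge = total_all - total_core

print(f"all versions: {tested_all}/{total_all} ({coverage_all}%)")
print(f"3.5 to 3.8:   {tested_core}/{total_core} ({coverage_core}%)")
print(f"3.4 and 3.9:  {tested_edge}/{total_edge}")
```

In other words, roughly 55% of the full matrix is covered, about 71% once the two outlying versions are set aside, and only 3 of the 14 combinations for Python 3.4 and 3.9 are tested.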

I have a few more things in mind that I would like to do going forward. I am mostly happy with the searching performance, but there is still some room for improvement because the search functions in the various C libraries have different performance characteristics. (I read through the relevant implementations in all the C libraries whose source code is available.) I am not sure if it is worth spending much more time on performance, but maybe there are some meaningful gains left. Although fast searching was my reason for starting the project, I am also considering expanding the API slightly to support a few more use cases for memory maps.