Python: high performance backend #8

@imagovrn

More Efficient Python Implementation

The current flatdata-py implementation is pure Python. So far we have used it only for processing smaller datasets and for inspection and debugging, and we have noticed that it is quite slow on large datasets. It would be useful to have an implementation whose performance is not too far from that of the C++ one. To achieve that, we could do the following:

  • Benchmark the two implementations on the same data to measure the gap, and monitor the benchmarks in CI (see Performance benchmarks #9).
  • Optimize the pure-Python implementation.
  • Introduce parallel processing in the pure-Python implementation (or ease integration with a library that would do it for us, such as dask).
  • As an alternative approach, create a flatdata-py-ext implementation that builds and uses binary extensions to improve performance.
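
To make the benchmarking and optimization points concrete, here is a minimal sketch of how such a comparison could be set up. It does not use the flatdata-py API or the flatdata binary format: the two-field record layout and the names `make_buffer`, `read_per_field`, `read_batched`, and `benchmark` are assumptions made purely for illustration. It contrasts decoding every field in a Python loop with handing the whole buffer to C-implemented decoding from the standard library.

```python
import struct
import timeit

# Hypothetical fixed-width record layout: two little-endian uint32 fields.
# This is NOT the flatdata format, only a stand-in for benchmarking.
RECORD = struct.Struct("<II")


def make_buffer(n):
    """Pack n synthetic records (i, 2*i) into one contiguous bytes object."""
    return b"".join(RECORD.pack(i, 2 * i) for i in range(n))


def read_per_field(buf):
    """Naive reader: slice and decode every field separately in Python."""
    out = []
    for off in range(0, len(buf), RECORD.size):
        a = int.from_bytes(buf[off:off + 4], "little")
        b = int.from_bytes(buf[off + 4:off + 8], "little")
        out.append((a, b))
    return out


def read_batched(buf):
    """Batched reader: let the struct module decode all records in C."""
    return list(RECORD.iter_unpack(buf))


def benchmark(n=100_000, repeat=3):
    """Return the best-of-`repeat` wall time in seconds for each reader."""
    buf = make_buffer(n)
    # Both readers must agree before their timings are worth comparing.
    assert read_per_field(buf) == read_batched(buf)
    return {
        fn.__name__: min(timeit.repeat(lambda: fn(buf), number=1, repeat=repeat))
        for fn in (read_per_field, read_batched)
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds:.4f} s")
```

A harness of this shape could be run in CI against the same input used for the C++ benchmark, so that the gap is tracked over time. The actual readers, data, and thresholds would have to come from the real implementations.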
