Python code profiling

Python code consumes CPU (roughly, time) and memory (roughly, space). Slow code is difficult to use with a lot of data, but in theory it still finishes. Excessive memory usage, however, will crash the program as soon as memory runs out, even for an instant. So we need to monitor the code during execution to be able to predict its performance: run the code on a small sample of data, then extrapolate the measurements to the full dataset.
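The sample-then-extrapolate idea can be sketched as follows. This is only a minimal illustration: process is a hypothetical stand-in for the real code, the sample sizes are arbitrary, and the straight-line fit assumes the run time grows linearly with the number of rows, which has to be checked for each real workload.

```python
import time


def process(rows):
    # Hypothetical workload: stands in for the real code being profiled.
    return sum(x * x for x in rows)


def time_once(n):
    # Time a single run of process() on n rows of synthetic data.
    data = list(range(n))
    start = time.perf_counter()
    process(data)
    return time.perf_counter() - start


def fit_line(xs, ys):
    # Ordinary least-squares fit of y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Measure on small samples (sizes chosen arbitrarily for the sketch).
sample_sizes = [10_000, 20_000, 40_000, 80_000]
timings = [time_once(n) for n in sample_sizes]

# Extrapolate to the full dataset, assuming linear scaling.
slope, intercept = fit_line(sample_sizes, timings)
full_size = 10_000_000
estimate = slope * full_size + intercept
print(f"estimated run time for {full_size} rows: {estimate:.2f} s")
```

The same approach applies to peak memory: measure it on a few sample sizes and extrapolate, keeping in mind that a single timing is noisy and that non-linear code will break the estimate.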

How code is processed by the machine

Source code is human readable, but the machine executes low-level instructions that are ultimately just bits. Looking at the assembly gives a good idea of the complexity of the code and of how it is executed.
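Python is not compiled to assembly, but the standard library's dis module shows the bytecode that the CPython interpreter runs, which is the closest equivalent. A small sketch, where add_tax is a hypothetical function used only to have something to disassemble:

```python
import dis


def add_tax(price, rate):
    # Hypothetical function, used only as something to disassemble.
    return price + price * rate


# Print the bytecode instructions the interpreter will execute.
dis.dis(add_tax)

# The same instructions as objects, which is convenient for counting them.
instructions = list(dis.get_instructions(add_tax))
print(len(instructions), "bytecode instructions")
```

The exact instruction names and counts differ between Python versions, so the listing is best read as a rough indication of how much work a line hides, not as a precise cost.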

How different types of algorithms are processed

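  • If statements are not as cheap as they look. In our mind we only consider the condition that ends up selected, but the machine tests the conditions one by one until one matches, so conditions that can never be met still cost time when they are checked first. When the same logic is vectorized, for example with numpy.where, every branch is computed for all the rows even if its condition is never met.
  • For loops are the most popular way of programming things, to the point where the same code ends up duplicated or re-run far more often than necessary. This can be seen in the profiler.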
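The two points above can be made visible by counting how many conditions are actually tested. In this minimal sketch, bucket, is_between and the thresholds are all hypothetical, and the counter exists only for the demonstration:

```python
counter = {"checks": 0}


def is_between(value, low, high):
    # Count every condition that gets evaluated.
    counter["checks"] += 1
    return low <= value < high


def bucket(value):
    # Conditions are tested top to bottom until one of them is true.
    if is_between(value, 0, 10):
        return "small"
    elif is_between(value, 10, 100):
        return "medium"
    elif is_between(value, 100, 1000):
        return "large"
    return "other"


def count_checks(values):
    counter["checks"] = 0
    for value in values:  # the loop repeats the whole chain for every item
        bucket(value)
    return counter["checks"]


small_first = count_checks([5] * 1000)    # first condition always matches
large_only = count_checks([500] * 1000)   # two failed tests before each match
print(small_first, large_only)            # 1000 3000
```

Putting the most frequent case first in the chain is the cheapest fix. For large datasets, replacing the loop with a vectorized operation is usually the bigger win.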

Types of data

In short, there are two main types of data:

  • Numeric data is the most common type of data to operate on. This kind of data has a rather limited memory footprint.
  • String data is the most useful data type for human readability. This kind of data has a rather large memory footprint and adds complexity.
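A quick way to compare the two footprints is sys.getsizeof. The sketch below assumes CPython; the sizes are implementation details, getsizeof only reports the object itself, and the values are made up for the comparison:

```python
import sys

# The same value stored as a number and as text.
number = 12345.678
text = "12345.678"

number_bytes = sys.getsizeof(number)
text_bytes = sys.getsizeof(text)
print(f"float: {number_bytes} bytes, str: {text_bytes} bytes")

# The gap adds up over a whole column of values.
numbers = [float(i) for i in range(100_000)]
strings = [str(x) for x in numbers]
numeric_total = sum(sys.getsizeof(x) for x in numbers)
string_total = sum(sys.getsizeof(s) for s in strings)
print(f"numeric column: {numeric_total / 1e6:.1f} MB")
print(f"string column:  {string_total / 1e6:.1f} MB")
```

Libraries such as NumPy and pandas store numeric columns far more compactly than a list of Python floats, so in practice the gap between numeric and string data is often larger than this shows.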

Profiling CPU speed

CPU time can be measured with Sciagraph. The outcome is a list of lines of code with the time each takes to execute. This can be used to decide where to optimize the code.
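For a quick look without installing anything, the standard library's cProfile gives a similar kind of report. To be clear, this is not Sciagraph: cProfile reports time per function, not per line, and slow_square_sum and main are hypothetical workloads for the sketch.

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    # Hypothetical hot spot: a plain Python loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    return [slow_square_sum(50_000) for _ in range(20)]


# Profile only the call we care about.
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Render the five most expensive entries, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The ncalls and cumtime columns are usually the first ones to read: a function called many times from a loop stands out immediately.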

Profiling memory usage

Memory usage can be measured with the Fil profiler. The outcome is also text, with details on peak usage.
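Peak usage can also be checked with the standard library's tracemalloc. This is a stand-in for illustration, not Fil: tracemalloc only sees memory allocated through Python's own allocators, and build_table is a hypothetical workload.

```python
import tracemalloc


def build_table(n):
    # Hypothetical workload: builds a list of strings.
    return [str(i) * 10 for i in range(n)]


tracemalloc.start()
table = build_table(100_000)
# get_traced_memory() returns (current, peak) in bytes since start().
current_bytes, peak_bytes = tracemalloc.get_traced_memory()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

print(f"current: {current_bytes / 1e6:.1f} MB, peak: {peak_bytes / 1e6:.1f} MB")

# The lines responsible for the most memory still allocated.
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```

The peak is the number that matters for crashes, since running out of memory for an instant is enough to kill the process.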

Product life cycle

We can see how valuable profiling is over the lifetime of a snippet or app. However, due to the abstract and complex nature of profiling, in practice this step tends to be done only once, at the beginning of the project. What if we managed to store profiling data over time and also integrate it into the CI/CD pipeline?
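One possible shape for this, as a rough sketch only: each pipeline run measures a workload, appends the result to a history file, and fails if the run is much slower than the best run on record. The file format (one JSON record per line), the function names and the 1.5 tolerance are all assumptions made for the illustration.

```python
import json
import time
import tracemalloc
from pathlib import Path


def workload():
    # Hypothetical code under test.
    return sum(i * i for i in range(200_000))


def measure(func):
    # Time and memory are measured in separate runs, because
    # tracemalloc slows the code down while it is tracing.
    start = time.perf_counter()
    func()
    seconds = time.perf_counter() - start

    tracemalloc.start()
    func()
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"timestamp": time.time(), "seconds": seconds, "peak_bytes": peak_bytes}


def load_history(path):
    # One JSON record per line; a missing file means no history yet.
    path = Path(path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def append_record(path, record):
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def within_budget(record, history, tolerance=1.5):
    # Pass when there is no history, or when this run is at most
    # `tolerance` times slower than the fastest run on record.
    if not history:
        return True
    baseline = min(item["seconds"] for item in history)
    return record["seconds"] <= baseline * tolerance


def ci_check(history_path):
    # Intended to be called from a pipeline step; returns False on regression.
    history = load_history(history_path)
    record = measure(workload)
    append_record(history_path, record)
    return within_budget(record, history)
```

A pipeline step would call ci_check with the path of a history file kept between runs and exit with a non-zero status when it returns False. Timings on shared CI machines are noisy, so the tolerance needs tuning, and peak memory is often the more stable number to track.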

Reference

pandas-vectorization